TESTING FOR RANDOM UTILITY MAXIMIZATION

by J. Scott Shonkwiler and Darek J. Nalle*

The use of multi-response qualitative response models is prevalent in all fields of microeconomic inquiry. The most common formulation is the conditional logit model, which has been widely embraced to represent qualitative response variables arising from the choice of a single alternative from a set of J alternatives. Extensions of the conditional logit model typically are chosen to better account for more flexible patterns of substitution among the J alternatives. Such extensions comprise the nested logit, generalized nested logit, mixed multinomial logit, and the multinomial probit estimators. Fundamental to these estimators is the maintained assumption of random utility maximization. While random utility maximization might be a desirable construct in terms of generating utility theoretic welfare measures, its imposition may result in a misspecified estimator of the data-generating process.

Most tests of the conditional logit model have been concerned with detecting violations of the independence from irrelevant alternatives assumption. Curiously, little work has been directed at inferring consistency with random utility maximization in qualitative response models as contrasted to the extensive literature on testing for optimizing behavior of agents in the allocation of continuous choice variables. A principal reason for the lack of testing the random utility maximization hypothesis has been a dearth of alternative qualitative choice models that are well defined, flexible, and unconstrained by certain restrictions of random utility maximization. In particular, to test the Hotelling symmetry condition of random utility maximization, the universal logit and the dogit models have been proposed as schemes for incorporating the characteristics of other alternatives in the choice model of a given alternative.
Yet the universal logit model is typically computationally infeasible in its general form and the dogit model is fairly inflexible (McFadden 1984, p. 1415; McFadden 1981, footnote 33).

We propose the multinomial discrete normal joint probability mass function as a flexible alternative to the set of statistical models consistent with random utility maximization. We will show that the multinomial discrete normal is a member of the exponential family and that it nests the conditional logit model. We will discuss its parametric identification and describe how it fails to satisfy the conditions required for random utility maximization. We will consider more flexible alternatives to the conditional logit and adopt tests for discriminating between the multinomial discrete normal and conditional logit specifications. We then implement our approach by analyzing a data set of recreational visits to reservoirs in the Columbia River Basin. First, however, we will review the key features of random utility maximization for the linear in income case and illustrate the requirements of random utility maximization for the conditional logit model.

* J. Scott Shonkwiler and Darek J. Nalle are Professor and Assistant Professor, respectively, at the Department of Resource Economics, University of Nevada, Reno. This chapter has benefited from discussions with Alan Ker, Klaus Moeltner, Kenneth Train, and Roger von Haefen. Work was supported in part by the Nevada Agricultural Experiment Station and NRI Grant No. 2002-01815.

ESSAYS IN HONOR OF STANLEY R. JOHNSON

Random Utility Modeling

Assume that a relevant population chooses from i = 1,2,...,J mutually exclusive alternatives on a given choice occasion. Let an indirect utility function represent the individual's (n = 1,2,...,N) level of satisfaction associated with the ith alternative, according to

$$U_{ni} = f_i(I_n - p_{ni}) + \beta' x_i + \varepsilon_{ni} = \mu_{ni} + \varepsilon_{ni},$$

where $I_n$ is money income, $p_{ni}$ is individual n's cost of obtaining the ith alternative, and x is a vector of attributes associated with each alternative. Assume that the indirect utility function can be defined to be linear and additive in income and attributes. The probability that the ith alternative (suppressing individual subscripts) yields maximum utility across the J alternatives is given by

$$P_i = \mathrm{Prob}\left[U(I, p_i, x_i, \varepsilon_i) > U(I, p_h, x_h, \varepsilon_h)\right] \quad \text{for all } h \neq i.$$

Obtaining an operational estimator requires finding a joint distribution for the errors such that evaluation of the probability above is not computationally burdensome. Assume the errors are independently and identically distributed according to the joint extreme value cumulative distribution function F(ε); then the probability of selecting the ith alternative results in the multinomial logit model with probabilities given by

$$P_i = \frac{e^{\mu_i}}{\sum_{h=1}^{J} e^{\mu_h}}.$$

Under the random utility model $U_i = \mu_i + \varepsilon_i$ with ε distributed F(ε),

$$E[\max U] = \ln \sum_{h=1}^{J} e^{\mu_h} + E,$$

where E is Euler's constant, i.e., E = 0.577215665 (McFadden 1999).

In the additive income random utility setup, $\mu_{ni}$ is typically parameterized according to

$$\mu_{ni} = \beta_p p_{ni} + \beta' x_{ni}.$$

In this formulation the parameters do not vary across alternatives; therefore, the statistical model is termed the conditional logit under the iid extreme value assumption for the $\varepsilon_h$. It is important to note that there is an equivalent representation of the estimator using either utility levels or utility differences, i.e., the translation invariance property.
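The logit probabilities, the log-sum expression for expected maximum utility, and translation invariance can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the function names, the price coefficient, and the prices are hypothetical values chosen for illustration and are not taken from the chapter.

```python
import numpy as np

EULER_CONSTANT = 0.577215665  # the constant E quoted in the text


def mnl_probabilities(mu):
    """P_i = exp(mu_i) / sum_h exp(mu_h), computed in a numerically stable way."""
    mu = np.asarray(mu, dtype=float)
    z = np.exp(mu - mu.max())  # subtracting a constant leaves P unchanged
    return z / z.sum()


def expected_max_utility(mu):
    """E[max U] = ln(sum_h exp(mu_h)) + E under iid extreme value errors."""
    mu = np.asarray(mu, dtype=float)
    m = mu.max()
    return m + np.log(np.exp(mu - m).sum()) + EULER_CONSTANT


# Hypothetical inputs: a price coefficient and prices for J = 3 alternatives.
beta_p = -0.07
prices = np.array([10.0, 25.0, 40.0])
mu = beta_p * prices

probs = mnl_probabilities(mu)
# Same model written in utility differences relative to alternative 2.
probs_from_differences = mnl_probabilities(mu - mu[1])
```

Because only differences in the systematic utilities matter, `probs` and `probs_from_differences` agree, which is the translation invariance property in computational form.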
For example, in the two-alternative case, the probability of alternative one may be written either as

$$P_{n1} = \frac{\exp(\mu_{n1})}{\exp(\mu_{n1}) + \exp(\mu_{n2})}$$

or, if instead we consider the differences in the systematic components of utility, as

$$P_{n1} = \frac{\exp(\mu_{n1} - \mu_{n2})}{\exp(\mu_{n1} - \mu_{n2}) + \exp(\mu_{n2} - \mu_{n2})},$$

which is estimably equivalent to the former expression. As Hanemann (1999) notes, it is this characteristic that is necessary for an estimator to be consistent with random utility maximization (RUM). Violation of this condition can occur when the joint distribution function of ε depends in some manner on one or more elements of µ (Koning and Ridder 2003). The literature provides no test of this condition.

A symmetry restriction imposed by RUM is that $\partial P_{ni}/\partial p_{nj} = \partial P_{nj}/\partial p_{ni}$ (McFadden 1981; Daly and Zachary 1978). This relationship follows from applying Roy's Identity to a proper indirect utility function. Termed the Hotelling condition by Daly and Zachary, we have in the MNL (multinomial logit) case

$$\frac{\partial P_{ni}}{\partial \mu_{nj}} \frac{\partial \mu_{nj}}{\partial p_{nj}} = -P_{ni} P_{nj} \frac{\partial \mu_{nj}}{\partial p_{nj}} = -P_{ni} P_{nj} \beta_p.$$

This condition can be tested empirically by allowing $\mu_h$ to depend on prices other than $p_h$.

Finally, the non-negativity requirement is that each of the j ≤ J-1 mixed partials of $P_i$ with respect to the components of µ, excluding $\mu_i$, have signs given by

$$(-1)^j \frac{\partial^j P_i}{\partial \mu_1 \cdots \partial \mu_{i-1}\, \partial \mu_{i+1} \cdots \partial \mu_j} \geq 0.$$

This guarantees that the implied density function is non-negative (Daly and Zachary 1978) and also has the interpretation that if all other alternatives become more attractive, the probability of choosing the ith alternative cannot increase (Koning and Ridder 2003).

Borsch-Supan (1990) examines the non-negativity condition because its violation can arise in the estimation of nested logit models.
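The Hotelling symmetry condition for the MNL case stated above can be checked numerically by comparing finite-difference price effects with the closed form $-P_{ni} P_{nj} \beta_p$. This is a minimal sketch, assuming NumPy; the coefficient, prices, and alternative-specific terms are hypothetical illustration values.

```python
import numpy as np


def mnl_probabilities(prices, beta_p, other_utility):
    """MNL probabilities with mu_j = beta_p * p_j + other_utility_j."""
    mu = beta_p * np.asarray(prices, dtype=float) + other_utility
    z = np.exp(mu - mu.max())
    return z / z.sum()


def price_effects(prices, beta_p, other_utility, step=1e-6):
    """Central finite differences: entry [i, j] approximates dP_i / dp_j."""
    prices = np.asarray(prices, dtype=float)
    J = prices.size
    effects = np.empty((J, J))
    for j in range(J):
        up, down = prices.copy(), prices.copy()
        up[j] += step
        down[j] -= step
        effects[:, j] = (mnl_probabilities(up, beta_p, other_utility)
                         - mnl_probabilities(down, beta_p, other_utility)) / (2 * step)
    return effects


# Hypothetical values chosen only for illustration.
beta_p = -0.07
prices = np.array([12.0, 30.0, 18.0, 25.0])
other_utility = np.array([0.0, 0.0, -1.1, 0.55])

P = mnl_probabilities(prices, beta_p, other_utility)
numeric = price_effects(prices, beta_p, other_utility)
# Closed form for the cross effects (i != j) only: -P_i * P_j * beta_p.
analytic_cross = -beta_p * np.outer(P, P)
off_diagonal = ~np.eye(P.size, dtype=bool)
```

With a negative price coefficient, every cross-price effect in `numeric` is positive and the matrix is symmetric, which is the pattern the MNL imposes regardless of the data.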
Borsch-Supan introduces the distinction between global compatibility and local compatibility with the conditions for random utility maximization and suggests that for the non-negativity condition the Daly-Zachary–McFadden conditions for the validity of random utility maximization may be unnecessarily strong. He develops a weaker validity condition which depends on the magnitudes of the systematic components of utility (see Herriges and Kling, 1996, for an extension of this work).

Koning and Ridder (2003) generalize the relationships between global and local compatibility under stochastic utility maximization. In an empirical application their tests suggest that even with the fewest maintained assumptions the hypothesis of random utility maximization cannot be supported. In fact, they claim that the conditions for stochastic utility maximization are much stronger than those underpinning models of a representative agent facing a continuous choice problem. However, they do not develop parametric tests for the key assumptions of random utility maximization. To accomplish this, a more general model that nests the conditional logit but is not bound by the random utility maximization conditions would be needed. We propose the multinomial discrete normal distribution for this purpose.

Kemp's Discrete Normal Distribution

This study investigates a new model for multinomial data based on the discrete normal distribution. The discrete normal distribution, introduced by Kemp (1997), has support on all integers, not just non-negative integers. The intriguing characteristic of this distribution is its generalization to the multivariate discrete normal distribution. The discrete normal distribution is at the center of the proposed estimator and has recently been reparameterized by Szablowski (2001) and Shonkwiler (2001) so that location and scale parameters are estimated directly. This resultant probability mass function is easily shown to be a member of the exponential family.

Kemp (1997) recognized that the discrete normal distribution (i) is analogous to the normal distribution in that it is the only two-parameter discrete distribution on (−∞, ∞) for which the first two moment equations are the maximum-likelihood equations; (ii) is log-concave (Johnson, Kotz, and Balakrishnan 1997, p. 27); and (iii) has either a single mode or a joint mode at y and y-1. Unfortunately, Kemp's characterization does not permit closed form expressions of the mean and variance of Y. The discrete normal may be represented as

$$P(Y = y) = \frac{e^{-.5(y-\mu)^2/\sigma^2}}{\sum_{Y} e^{-.5(Y-\mu)^2/\sigma^2}}$$

(Szablowski 2001; Shonkwiler 2001), such that $E(Y) \approx \mu$ and $V(Y) \approx \sigma^2$. While there do not exist closed form expressions for the mean and variance, the location and scale parameters, µ and σ², are the approximate mean and variance respectively. We say approximate because problems arise when σ² → 0. Essentially, in these situations, the available points of support are not sufficient to reproduce a mass function with mean and variance equal to µ and σ² respectively (see also Szablowski 2001). For σ² > 1, µ and σ² can effectively be considered the mean and variance of the discrete normal distribution (with accuracy better than about $10^{-4}$ using the bounds provided by Szablowski).

Multinomial Discrete Normal Distribution

The univariate discrete normal distribution can be generalized to the multivariate case (Shonkwiler 2001) using the representation

$$P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_m = y_m) = \frac{\exp\left(-.5(\mathbf{y}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{y}-\boldsymbol{\mu})\right)}{\sum_{Y_m} \cdots \sum_{Y_2} \sum_{Y_1} \exp\left(-.5(\mathbf{Y}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{Y}-\boldsymbol{\mu})\right)}, \qquad y_i = \ldots, -2, -1, 0, 1, 2, \ldots \ \forall i,$$

where Σ is a positive (semi) definite symmetric matrix and quantities in bold denote the corresponding m-element vectors. Empirical application of this distribution can be computationally intensive because the entire support of the distribution enters the normalizing factor.
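The claim that µ and σ² are only approximately the mean and variance of the univariate discrete normal, with the approximation deteriorating as σ² shrinks, can be illustrated by direct summation. This is a minimal sketch, assuming NumPy; truncating the infinite support to a wide window of integers is an assumption made for computation, and the parameter values are hypothetical.

```python
import numpy as np


def discrete_normal_moments(mu, sigma2, half_width=200):
    """Mean and variance of the discrete normal by summing over a wide integer grid.

    The infinite support is truncated to integers within half_width of mu,
    which is harmless when sigma2 is small relative to half_width.
    """
    center = int(round(mu))
    y = np.arange(center - half_width, center + half_width + 1)
    log_kernel = -0.5 * (y - mu) ** 2 / sigma2
    w = np.exp(log_kernel - log_kernel.max())
    pmf = w / w.sum()
    mean = float((pmf * y).sum())
    var = float((pmf * (y - mean) ** 2).sum())
    return mean, var


# Hypothetical parameter values: one with sigma2 > 1, one with sigma2 near zero.
mean_wide, var_wide = discrete_normal_moments(mu=0.3, sigma2=2.0)
mean_tight, var_tight = discrete_normal_moments(mu=0.3, sigma2=0.05)
```

In the first case the computed moments sit very close to (0.3, 2.0); in the second, nearly all mass falls on the single integer 0, so neither the mean nor the variance can match the location and scale parameters.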
We see that this problem can be mitigated under certain types of truncation of the distribution. When each $y_j$ is truncated such that $y_h = 1$ and $y_{j \neq h} = 0$, the support of the multivariate discrete normal for multinomial data with J alternatives is an identity matrix of dimension J. As a consequence, the application of the multinomial discrete normal requires relatively few computations to obtain a normalizing factor. In contrast, the multinomial probit model requires integration of order J-1 (Hausman and Wise 1978).

The multinomial discrete normal has several attractive properties. First, each conditioning variable can exhibit both positive and negative marginal cross effects, unlike the multinomial logit, which restricts all cross effects of a conditioning variable to have the same sign. Second, the multivariate binary discrete normal coincides with the MNL when Σ is an identity matrix (see below). Third, while the multinomial probit can accommodate both positive and negative correlations, it is typically more computationally intensive than the multinomial discrete normal distribution.

The multinomial discrete normal joint probability mass function is

$$P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_J = y_J) = \frac{\exp\left(-.5(\mathbf{y}-\boldsymbol{\mu})'\Sigma^{+}(\mathbf{y}-\boldsymbol{\mu})\right)}{\displaystyle\sum_{\substack{Y_J = 1 \\ Y_{j \neq J} = 0}} \cdots \sum_{\substack{Y_2 = 1 \\ Y_{j \neq 2} = 0}} \ \sum_{\substack{Y_1 = 1 \\ Y_{j \neq 1} = 0}} \exp\left(-.5(\mathbf{Y}-\boldsymbol{\mu})'\Sigma^{+}(\mathbf{Y}-\boldsymbol{\mu})\right)}.$$

This form, however, obscures the density's relationship to the multinomial. This leads to

Proposition 1. The likelihood for the nth observation may be written as

$$\prod_{j=1}^{J} P(y_{nj} = 1)^{y_{nj}},$$

such that

$$P(y_{nj} = 1) = \frac{\exp\left(\boldsymbol{\mu}_n'[\Sigma^{+}]_{\cdot j} - .5[\Sigma^{+}]_{jj}\right)}{\sum_{h=1}^{J} \exp\left(\boldsymbol{\mu}_n'[\Sigma^{+}]_{\cdot h} - .5[\Sigma^{+}]_{hh}\right)},$$

where $[\Sigma^{+}]_{jj}$ denotes the jth diagonal element of $\Sigma^{+}$, and $[\Sigma^{+}]_{\cdot j}$ denotes the jth column of $\Sigma^{+}$.

Proposition 2. When Σ = I the multinomial discrete normal and multinomial logit models coincide.

Proof in Appendix.
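Propositions 1 and 2, and the claim that marginal cross effects may take either sign, can be illustrated with a short numerical sketch. It assumes NumPy and, as a simplifying assumption not stated in the text, takes $\Sigma^{+}$ to be the ordinary inverse of a positive definite Σ. The values of µ and Σ are hypothetical.

```python
import numpy as np


def mdn_probabilities(mu, sigma):
    """Proposition 1: P_j proportional to exp(mu' S[:, j] - 0.5 * S[j, j]).

    S stands in for Sigma^+; here it is the ordinary inverse, which assumes
    sigma is positive definite.
    """
    mu = np.asarray(mu, dtype=float)
    S = np.linalg.inv(np.asarray(sigma, dtype=float))
    score = mu @ S - 0.5 * np.diag(S)
    z = np.exp(score - score.max())
    return z / z.sum()


def mdn_by_enumeration(mu, sigma):
    """Same probabilities from the quadratic form evaluated at each unit vector."""
    mu = np.asarray(mu, dtype=float)
    S = np.linalg.inv(np.asarray(sigma, dtype=float))
    kernel = np.array([np.exp(-0.5 * (e - mu) @ S @ (e - mu)) for e in np.eye(mu.size)])
    return kernel / kernel.sum()


def softmax(mu):
    """Multinomial logit probabilities."""
    z = np.exp(mu - np.max(mu))
    return z / z.sum()


# Hypothetical values; sigma has Sigma_11 = 1 and Sigma_21 = 0.
mu = np.array([0.0, 1.0, 0.0])
sigma = np.array([[1.0, 0.0, -0.5],
                  [0.0, 1.0, 0.0],
                  [-0.5, 0.0, 1.0]])

h = 1e-6
bump = np.array([0.0, 0.0, h])
# Effect on P_1 of raising mu_3, by central finite differences.
cross_mdn = (mdn_probabilities(mu + bump, sigma)[0]
             - mdn_probabilities(mu - bump, sigma)[0]) / (2 * h)
cross_mnl = (softmax(mu + bump)[0] - softmax(mu - bump)[0]) / (2 * h)
```

For these particular values the cross effect of $\mu_3$ on $P_1$ is positive under the multinomial discrete normal and negative under the logit, while setting `sigma` to the identity reproduces the logit probabilities exactly.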
Identification in the Discrete Normal Multinomial Model

Shonkwiler and Nalle (2003) discuss the conditions necessary for identifying the elements of Σ and $\boldsymbol{\mu}_n$. Typically the jth element of the vector $\boldsymbol{\mu}_n$ will be parameterized as $\mu_{nj} = x_{nj}\beta$ so that both the elements of the β vector and the (potentially) J(J+1)/2 unique elements of Σ will be estimated. For the J > 2 case, the identification requirement is that one diagonal element of Σ be normalized to unity and one off-diagonal element be normalized to zero (Shonkwiler and Nalle). We choose to require that $\Sigma_{11} = 1$ and $\Sigma_{21} = 0$. One reason for such a choice is that if the Cholesky decomposition is employed to form Σ = AA' (where A is a lower triangular matrix), then the identification requirement is satisfied by imposing $A_{11} = 1$ and $A_{21} = 0$.

In terms of identifying the elements of β, we consider the restrictions that identify the full multinomial model. If there are K variables that change over observations and alternatives, then (J-1)K parameters can be identified along with J-1 constant terms. As J increases, the number of over-identifying restrictions increases as well. Consequently the model is distinct from the universal logit.

Violation of the Daly-Zachary–McFadden Conditions

Because the conditional logit is nested within the multinomial discrete normal model, we show which conditions of stochastic utility maximization may fail to hold for the multinomial discrete normal in its general form. Consider the fundamental case of model equivalence in terms of utility differences. In the two-alternative case,

$$P_{n1} = \frac{\exp(\mu_{n1} - 0.5)}{\exp(\mu_{n1} - 0.5) + \exp\left(\mu_{n2} s_{22}^{-1} - 0.5 s_{22}^{-2}\right)},$$

and only if $s_{22} = 1$ can the probability be equivalently written in terms of differences of the $\mu_h$. Consequently the multinomial discrete normal distribution does not satisfy translation invariance.

The symmetry (Hotelling) property in the multinomial discrete normal case is

$$\frac{\partial P_{ni}}{\partial \mu_{nj}} \frac{\partial \mu_{nj}}{\partial p_{nj}} = P_{ni}\left(\Sigma^{ij} - \mathbf{P}_n'\Sigma^{\cdot j}\right) \frac{\partial \mu_{nj}}{\partial p_{nj}} = P_{ni}\left(\Sigma^{ij} - \mathbf{P}_n'\Sigma^{\cdot j}\right)\beta_p,$$

where $\Sigma^{\cdot j}$ denotes the jth column of $\Sigma^{-1}$, $\Sigma^{ij}$ its (i, j) element, and $\mathbf{P}_n$ denotes the J×1 vector of probabilities.¹ In general, there is no symmetry of cross-price effects, nor are cross-price effects restricted to be positive as in the conditional logit model. Thus the non-negativity condition may not be satisfied either.

¹ More generally the symmetry condition relates to the systematic effects and requires that $\partial P_{ni}/\partial \mu_{nj} = \partial P_{nj}/\partial \mu_{ni}$, a property not maintained by the multinomial discrete normal. This condition is satisfied in the conditional logit when attributes of the other h ≠ i alternatives do not enter $\mu_i$. This assumption is known as weak complementarity.

Generalizing and Testing the MNL Model

Extensions of the conditional logit model typically are chosen to better account for more flexible patterns of substitution among the J alternatives and to relax the independence from irrelevant alternatives property which the MNL model possesses. Such models comprise the nested logit, generalized nested logit, mixed multinomial logit, and the multinomial probit estimators. Generally these models can be made consistent with requirements of random utility maximization. But specification issues arise in the case of nested logit models due to the lack of theory to guide the choice of nesting structures; typically the selection of the nesting structure is somewhat arbitrary. Similarly, generalizations which may violate the conditions of RUM, such as the universal logit, dogit, and approximate generalized extreme value (GEV) models (Small 1994), have forms that are subject to the analyst's tastes as opposed to being dictated by theory or convention.

In contrast, the mixed (or random) parameters logit model operates with the same specification of the systematic component of utility as the MNL and introduces flexibility by randomizing the parameters of the conditioning variables.
Consequently the IIA property does not hold. According to McFadden and Train (2000), mixed multinomial logit can “closely approximate a very broad class of RUM models” (p. 451). Additionally, it can be considered as an error components type model and thus is attractive for modeling panel data (Train 2003). And unlike generalized extreme value RUMs, mixed multinomial logit models are nested relative to multinomial discrete normal models with identical random parameters. We will exploit this property in testing the random utility hypothesis implied by mixed logit models as well as the constant parameter logit model.

In the empirical application, we entertain parameters of the form

$$\beta_{kn} = \beta_k + \Gamma_{k\cdot}\,\eta_n,$$

where Γ is a lower triangular matrix of dimension K×K and η denotes a standard normal random error vector. If Γ is a diagonal matrix, then the random parameters are uncorrelated. For the mixed logit, the probability of the jth alternative for the nth individual is given by

$$P_{nj} = \int \frac{e^{\mu(\beta)_{nj}}}{\sum_{h} e^{\mu(\beta)_{nh}}}\, \phi(\beta, \Gamma\Gamma')\, d\beta = \int L(\beta)_{nj}\, \phi(\beta, \Gamma\Gamma')\, d\beta$$

(Train 2003) when the β's are assumed normally distributed with mean vector β and variance-covariance ΓΓ'. The estimator is implemented by taking R draws and evaluating

$$\hat{P}_{nj} = \frac{1}{R} \sum_{r=1}^{R} L(\beta_r)_{nj}, \qquad j = 1, 2, \ldots, J,$$

for each observation. The $\hat{P}_{nj}$'s are used to form the maximum simulated likelihood. Estimation of the random parameter multinomial discrete normal is performed analogously.

Testing Strategy

Given that either constant or random parameter conditional logit models are nested relative to constant or random parameter multinomial discrete normal models, our tests of compatibility of the random utility restrictions embodied by the logit models rest on the hypotheses that $\Sigma_{22} = \Sigma_{33} = \cdots = \Sigma_{JJ} = 1$ and $\Sigma_{ij} = 0$ for all $i \neq j$, excluding the pair (i, j) = (1, 2), which is already fixed at zero for identification. This yields a total of ½J(J+1)−2 restrictions.
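The restriction count can be checked directly, and a nested comparison of this kind is commonly summarized by a likelihood ratio statistic. The sketch below uses only the Python standard library; the function names are illustrative, and the chi-square tail formula used is the closed form that holds for an even number of degrees of freedom only. The two log likelihoods are the values reported in Tables 1 and 3 of this chapter.

```python
import math


def rum_restriction_count(J):
    """(J - 1) unit variances plus J(J-1)/2 - 1 zero covariances = J(J+1)/2 - 2."""
    return J * (J + 1) // 2 - 2


def likelihood_ratio(loglik_restricted, loglik_unrestricted):
    """LR = 2 * (logL_unrestricted - logL_restricted)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


J = 4  # four reservoirs in the application
df = rum_restriction_count(J)
# Log likelihoods as reported in Table 1 (logit) and Table 3 (unconstrained Sigma).
lr = likelihood_ratio(-740.30, -677.71)
p_value = chi2_sf_even_df(lr, df)
```

With J = 4 the count is eight restrictions, and the statistic computed from the two reported log likelihoods reproduces the likelihood ratio value of 125.18 quoted in the constant parameter results.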
Under the assumption that the multinomial discrete normal distribution characterizes the data-generating process, likelihood ratio tests may be used to discriminate between the two models. Under misspecification of the data-generating process, however, robust tests of the restrictions associated with the conditional logit model are obtained by using the Wald test given in Theorem 3.4 of White (1982). We will present both tests.

Empirical Application

The data used to estimate the model were obtained from a subset of data collected using a mail survey of a sample of individuals who live in the Pacific Northwest. The larger data set was developed for use in examination of water reallocation policy issues (see Callaway et al. 1995), the most important being related to flushing salmon smolts down the Columbia River from spawning areas. The survey questionnaire focuses mainly on reservoirs on the Columbia, and we select four such reservoirs as destinations for the analysis: Lake Roosevelt (behind Grand Coulee Dam), Dworshak, Lower Granite, and Lake Pend Oreille. We use only the actual behavior data (as opposed to contingent behavior data, which were also collected) for the analysis below. Respondents reported their visits each summer month to each of the four reservoirs. For this analysis, 234 randomly sampled respondents reported a total of 1,396 trips to these reservoirs.

Because we use travel costs to sites as a proxy for the price of a trip, one can interpret the model below as a recreation demand, or travel cost model (Shonkwiler 1999; Shonkwiler and Shaw 2003), and, as such, satisfaction of the requirements of optimizing behavior is necessary if conventional measures of welfare changes are to be valid.
In addition to the travel cost variable, which mainly drives the empirical model, we also use the monthly average deviation of each reservoir's water level from its full pool level (e.g., a negative ten means ten feet below full pool). We recognize that our sample contains different types of recreators (e.g., anglers, water skiers, people picnicking, etc.), and so we have no a priori expectation on the direction of influence this variable should have on trips. One might hypothesize that shore-oriented users prefer more shore to be available, suggesting a negative coefficient on a deviation below full pool, and the reverse might be true for boaters who may prefer high water.

Constant Parameter Estimation Results

Under the assumption that the multiple trips made by respondents are independent of each other, a conventional multinomial model was estimated with the systematic component of utility modeled for the nth observation on the jth site as

$$\mu_{nj} = \beta_p p_{nj} + \beta_d\, dev_{nj} + \beta_3 d_{3j} + \beta_4 d_{4j},$$

where p represents travel cost, dev represents deviation from full pool, and the $d_{ij}$, $d_{ij} = I(i=j)$, are constants representing the Lower Granite and Lake Pend Oreille sites, respectively. Although the model is not fully saturated, it is nearly so because dev has only 8 unique values (some reservoir levels did not change every month). Table 1 reports the parameter estimates and associated robust (White 1982) standard errors for the multinomial model under independence.

Two multinomial discrete normal counterparts were also estimated under the independence assumption. Table 2 reports the results for a model in which Σ is restricted to be diagonal, and Table 3 provides estimates for the model when Σ is unconstrained (aside from those restrictions required for identification). In each table the average price effects matrix is included to provide information regarding the sensitivity of $E[Y_i] = \pi_i$ to changes in own and cross prices.

Table 1. Multinomial Logit. Log Likelihood: -740.30

Parameter   Estimate    Std. Error   Asymptotic t
βp          -0.06908    0.00282      -24.49460
βd           0.00849    0.00408        2.08096
β3          -1.12527    0.15292       -7.35832
β4           0.55066    0.08601        6.40226

Average Price Effects Matrix
         ∂p1       ∂p2       ∂p3       ∂p4
∂π1   -0.0062    0.0004    0.0006    0.0052
∂π2    0.0004   -0.0040    0.0031    0.0005
∂π3    0.0006    0.0031   -0.0042    0.0005
∂π4    0.0052    0.0005    0.0005   -0.0061

Table 2. Multinomial Discrete Normal with Diagonal Σ Matrix. Log Likelihood: -718.54

Parameter   Estimate    Std. Error   Asymptotic t
βp          -0.08799    0.00488      -18.04770
βd           0.01212    0.00547        2.21698
β3          -2.14192    0.36279       -5.90410
β4          -0.54365    0.24551       -2.21436
Σ22          1.06465    0.05921       17.97966
Σ33          1.21849    0.05240       23.25269
Σ44          1.31493    0.05265       24.97494

Average Price Effects Matrix
         ∂p1       ∂p2       ∂p3       ∂p4
∂π1   -0.0076    0.0003    0.0006    0.0049
∂π2    0.0003   -0.0046    0.0033    0.0004
∂π3    0.0008    0.0038   -0.0045    0.0005
∂π4    0.0064    0.0005    0.0006   -0.0058

In the independent multinomial case we can test the restrictions necessary for random utility maximization by considering the estimated elements of the Σ matrix. For the model reported in Table 2, a test of RUM is that $\Sigma_{jj} = 1$ (j = 2,3,4).² The robust Wald test (White 1982) and likelihood ratio tests of this restriction yield test statistics of 42.90 (p=.000) and 43.52 (p=.000), respectively. A joint test that $\Sigma_{jj} = 1$ (j = 2,3,4) and $\Sigma_{ij} = 0$, i≠j, determined from the model reported in Table 3, produces robust Wald and likelihood ratio statistics of 1179.8 (p=.000) and 125.18 (p=.000), respectively. Therefore, we conclude that under the hypothesis of independence with a constant-parameters specification, the data-generating process is not consistent with the logit specification.

² In the diagonal Σ specification, note that $\partial P_i/\partial \mu_j = -P_i P_j/\sigma_{jj}$.

Random Parameter Estimation Results

Of course the conclusion above is not particularly surprising considering that the logit model is of a restrictive form and the panel nature of the data set has been ignored. Given that most of the individuals in the data set make repeated choices, allowing parameters that vary over individuals but not over choice situations for an individual is a defensible notion as long as the individual's tastes are stable over the choice period. All the β's then were assumed to be normally distributed, and a set of 1,500 standard normal draws for each of the 234 individuals was created. Specifically, the kth parameter for the nth individual making a choice in the tth period is

$$\beta_{knt} = \beta_k + \Gamma_{k\cdot}\,\eta_n,$$

where, as before, Γ is a lower triangular matrix and η denotes a standard normal random error vector. When the individual makes multiple choices, the probabilities for each choice occasion are calculated from a draw of β. Then the products of these probabilities over choice occasions are averaged (Train 2003, p. 150). The same set of draws is used in estimating each of the random parameter models. The maximum simulated likelihood estimator (Train, p. 148) was programmed in GAUSS.

We first consider the case where the random coefficients are assumed uncorrelated (Γ is diagonal). The estimation results for the mixed logit model are presented in Table 4. Note the substantial increase in log likelihood due to introducing the panel structure via the random parameters. The variance-covariance matrix of the random parameters is obtained from the expression ΓΓ'. Therefore the (absolute values of) estimated Γ values may be interpreted as standard deviations since Γ is diagonal in this case. Note that the coefficient mean on the deviation from full pool variable is not significantly different from zero at conventional levels (p=.121) based on the robust statistic; however, its associated standard deviation is significantly greater than zero (p=.025), so this variable should remain in the model.
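The panel simulation step described above, in which each draw of β is held fixed across an individual's choice occasions and the product of the occasion probabilities is averaged over draws, can be sketched as follows. This is an illustration in Python with NumPy rather than the GAUSS program used for the chapter; the data, parameter values, and function names are hypothetical.

```python
import numpy as np


def logit_probabilities(X, beta):
    """X has shape (T, J, K); returns (T, J) conditional logit probabilities."""
    mu = X @ beta
    z = np.exp(mu - mu.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def simulated_panel_likelihood(X, chosen, beta_mean, gamma, eta):
    """Average over draws of the product of choice probabilities across occasions.

    eta has shape (R, K): standard normal draws held fixed for the individual,
    so each draw gives beta_r = beta_mean + gamma @ eta_r for all T occasions.
    """
    T = X.shape[0]
    total = 0.0
    for eta_r in eta:
        beta_r = beta_mean + gamma @ eta_r
        probs = logit_probabilities(X, beta_r)
        total += np.prod(probs[np.arange(T), chosen])
    return total / eta.shape[0]


# Hypothetical data: T = 3 trips, J = 4 sites, K = 2 attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4, 2))
chosen = np.array([0, 2, 2])
beta_mean = np.array([-0.1, 0.02])
gamma = np.diag([0.05, 0.03])          # diagonal: uncorrelated random parameters
eta = rng.standard_normal((1500, 2))   # R = 1,500 draws, as in the text

sim_lik = simulated_panel_likelihood(X, chosen, beta_mean, gamma, eta)
```

Summing the logarithm of this quantity over individuals gives a simulated log likelihood; when `gamma` is a zero matrix the expression collapses to the ordinary product of logit probabilities, which is a convenient correctness check.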
We contrast the results in Table 4 to the random parameter multinomial discrete normal model reported in Table 5. Because of its identical random parameters 12 ESSAYS IN HONOR OF STANLEY R. JOHNSON Table 3. Multinomial Discrete Normal Estimator. Log Likelihood: -677.71 Parameter Estimate Std. Error Asymptotic t βp -0.03436 0.03000 -1.14525 βd 0.00372 0.00447 0.83251 β3 -1.21937 0.50556 -2.41191 β4 -0.10902 0.57583 -0.18932 Σ22 1.11504 0.18152 6.14266 Σ33 1.78413 0.38381 4.64850 Σ44 1.16697 0.25340 4.60529 Σ13 0.87635 0.52767 1.66080 Σ14 0.42366 0.18327 2.31173 Σ23 -0.08902 0.13482 -0.66030 Σ24 0.73195 0.21993 3.32816 Σ34 0.13016 0.09937 1.30983 Average Price Effects Matrix ∂π1 ∂π2 ∂π3 ∂π4 ∂p1 -0.0081 -0.0030 0.0041 0.0070 ∂p2 -0.0037 -0.0040 0.0030 0.0048 ∂p3 0.0035 0.0019 -0.0031 -0.0024 ∂p4 0.0064 0.0035 -0.0026 -0.0073 specification, the mixed logit is nested relative to it. The test that Σ = I yields robust Wald and likelihood ratio statistics of 172.34 (p=.000) and 72.60 (p=.000), respectively. Therefore it appears that the assumption of RUM implied by the mixed logit model is inconsistent with the data-generating mechanism. A more flexible mixed logit specification still exists, however – we can allow the random parameters to be correlated. This introduces six more parameters. Table 6 provides the estimation results for the correlated parameter mixed logit model. Since the parameter variancecovariance matrix is not estimated directly, the implied parameter correlation matrix is also presented. The robust Wald and likelihood ratio tests of the hypothesis that Γij = 0, i≠j, yields test statistics of 78.64 (p=.000) and 24.26 (p=.000), respectively, suggesting that the introduction of non-zero covariances among random parameters produces a better model. This model is nested relative to the random parameter TESTING FOR RANDOM UTILITY MAXIMIZATION 13 Table 4. Mixed Multinomial Logit. Log Likelihood: -546.55 Parameter Estimate Std. 
Error -0.13156 0.01587 βp 0.01390 0.00897 βd -2.04596 0.77970 β3 0.62461 0.36965 β4 0.06652 0.01480 Γ11 0.02902 0.01476 Γ22 2.96007 0.77072 Γ33 2.51068 0.47783 Γ44 Asymptotic t -8.28729 1.54987 -2.62405 1.68974 4.49386 1.96581 3.84066 5.25435 Average Price Effects Matrix ∂π1 ∂π2 ∂π3 ∂π4 ∂p1 -0.0139 0.0012 0.0020 0.0107 ∂p2 0.0012 -0.0083 0.0055 0.0016 ∂p3 0.0020 0.0055 -0.0101 0.0026 ∂p4 0.0107 0.0016 0.0026 -0.0148 discrete normal model, whose results are shown in Table 7. A test that Σ = I yields robust Wald and likelihood ratio test statistics of 163.87 (p=.000) and 60.60 (p=.000), respectively. Again, we conclude that the random utility maximization restrictions embedded in the correlated mixed logit model are not consistent with the data-generating mechanism. Note that the multinomial discrete normal model is always more highly parameterized than the corresponding logit model, and under a poorly specified conditional mean function one could argue that this over-parameterization works in favor of rejecting RUM. But comparison of the correlated mixed logit with the uncorrelated random parameter multinomial discrete normal makes the degrees-of-freedom critique less of an issue. Here the latter model has only two more parameters than the former. Since the models are non-nested, we use Vuong’s (1989) test to discriminate between the two. The z-score for testing the superiority of the random parameter multinomial discrete normal is 5.00 (p=.000), providing strong evidence for its selection.3 3 We could also consider whether the uncorrelated random parameter discrete normal model is to be preferred to its counterpart with correlated random parameters. The robust Wald and likelihood ratio tests of the hypothesis that Γij = 0, i≠j, yield test statistics of 26.34 (p=.000) and 12.26 (p=.056), respectively. 
Because of the conflicting criteria and the fact that the likelihood ratio test is not robust to distributional misspecification, a robust Lagrange multiplier test of this hypothesis was formulated as a 14 ESSAYS IN HONOR OF STANLEY R. JOHNSON Table 5. Random Parameter Multinomial Discrete Normal. Log Likelihood: −510.25 Parameter Estimate Std. Error Asymptotic t -0.09999 0.01884 -5.30628 βp 0.02570 0.02113 1.21633 βd -2.28663 1.42798 -1.60130 β3 -2.31452 0.90489 -2.55780 β4 0.01075 0.01834 0.58609 Γ11 0.05859 0.03349 1.74937 Γ22 3.65608 0.88749 4.11959 Γ33 1.39979 0.34476 4.06014 Γ44 1.70904 0.36332 4.70395 Σ22 1.65143 0.29767 5.54780 Σ33 1.41251 0.10368 13.62381 Σ44 0.19257 0.32860 0.58603 Σ13 0.77832 0.07346 10.59450 Σ14 -0.04919 0.20416 -0.24093 Σ23 0.32941 0.22376 1.47215 Σ24 0.26567 0.20988 1.26585 Σ34 Average Price Effects Matrix ∂π1 ∂π2 ∂π3 ∂π4 ∂p1 -0.0173 -0.0016 0.0019 0.0171 ∂p2 -0.0022 -0.0043 0.0025 0.0039 ∂p3 0.0007 0.0026 -0.0052 0.0019 ∂p4 0.0133 0.0023 0.0008 -0.0163 score test. It produced a test statistic of 9.082 (p=.169). Given that there were no compelling reasons for including random parameter covariances, we would likely choose the more parsimonious model. TESTING FOR RANDOM UTILITY MAXIMIZATION 15 Table 6. Mixed Multinomial Logit with Correlated Parameters. Log Likelihood: −534.42 Parameter Estimate Std. 
Error Asymptotic t -0.13882 0.02043 -6.79645 βp 0.02379 0.00892 2.66735 βd -2.93706 0.95972 -3.06033 β3 0.58871 0.39381 1.49489 β4 0.06854 0.01452 4.72106 Γ11 0.02358 0.00764 3.08538 Γ22 2.83812 0.52919 5.36317 Γ33 -0.15355 0.55384 -0.27724 Γ44 -0.03049 0.00884 -3.44772 Γ21 1.70795 0.85570 1.99596 Γ31 -0.59317 0.39354 -1.50726 Γ32 0.91033 0.52099 1.74731 Γ41 -2.39703 0.34666 -6.91461 Γ42 -0.58727 0.67463 -0.87050 Γ43 Implied Parameter Correlation Matrix βp βd β3 β4 βp 1.000 βd -0.791 1.000 β3 0.508 -0.509 1.000 β4 0.345 -0.830 0.148 1.000 ∂p2 0.0008 -0.0084 0.0061 0.0014 ∂p3 0.0022 0.0061 -0.0110 0.0027 ∂p4 0.0111 0.0014 0.0027 -0.0152 Average Price Effects Matrix ∂π1 ∂π2 ∂π3 ∂π4 ∂p1 -0.0141 0.0008 0.0022 0.0111 16 ESSAYS IN HONOR OF STANLEY R. JOHNSON Table 7. Multinomial Discrete Normal with Correlated Random Parameters. Log Likelihood: −504.12 Parameter Estimate Std. Error Asymp. t -0.11997 0.02319 -5.17254 βp 0.03260 0.01835 1.77675 βd -4.18226 2.13511 -1.95880 β3 -3.55298 1.00296 -3.54249 β4 0.04021 0.01012 3.97235 Γ11 0.02543 0.04257 0.59749 Γ22 2.20081 4.21866 0.52168 Γ33 0.85609 0.24511 3.49270 Γ44 -0.03372 0.02224 -1.51592 Γ21 1.12011 0.88323 1.26819 Γ31 3.17119 2.25504 1.40627 Γ32 2.01848 0.50048 4.03306 Γ41 -0.00634 0.97266 -0.00652 Γ42 -0.52692 0.49827 -1.05749 Γ43 1.46969 0.54769 2.68342 Σ22 1.81879 0.40344 4.50822 Σ33 1.38213 0.12569 10.99609 Σ44 0.11408 0.29094 0.39211 Σ13 0.74669 0.1207 6.18655 Σ14 -0.09926 0.28881 -0.34369 Σ23 0.23594 0.18487 1.27624 Σ24 0.30563 0.25176 1.21399 Σ34 Implied Parameter Correlation Matrix βp βd β3 β4 βp 1.000 βd -0.798 1.000 β3 0.279 0.253 1.000 β4 0.895 -0.716 0.119 1.000 TESTING FOR RANDOM UTILITY MAXIMIZATION 17 Average Price Effects Matrix ∂π1 ∂π2 ∂π3 ∂π4 ∂p1 -0.0198 -0.0005 0.0013 0.0189 ∂p2 -0.0020 -0.0050 0.0033 0.0037 ∂p3 -0.0005 0.0026 -0.0052 0.0031 ∂p4 0.0154 0.0012 0.0013 -0.0179 Discussion It is tempting to speculate about the features of this data set that lead to the demonstrated superiority of the 
multinomial discrete normal estimator. As mentioned, previous studies using data from the same survey found that own price explained most of the variation in visitation patterns. We can analyze own price effects by examining the diagonal elements of the Average Price Effects Matrix provided for each estimated model. Upon inspection, there appear to be no uniform differences between the magnitudes of own price effects across comparable models. On the other hand, note that for all multinomial discrete normal models estimated (excepting, of course, the one with diagonal Σ), there are at least two off-diagonal elements of the Average Price Effects Matrix that are negative. Considering that every row of the Average Price Effects Matrix (the responses of the four choice probabilities to a single price change) is constrained to sum to zero and that the diagonal elements are logically negative, this seems to be an important finding. The multinomial discrete normal models apparently capture a complementarity relationship between certain alternatives, a relationship that is precluded by models that must satisfy the conditions of random utility maximization.

Summary

Random utility maximization is a commonly maintained hypothesis in multi-response qualitative response models. Most practitioners would likely claim that it is a benign restriction that can be satisfied by proper (or flexible) specification of the qualitative response model. Such confidence is manifested by the work of McFadden and Train (2000), who claim that the mixed multinomial logit model can closely approximate any random utility model and thus, by extension, can represent behavioral choice in general. But random utility maximization is in fact a very stringent requirement that is not satisfied when attributes of other alternatives affect the utility of one alternative or when marginal cross effects are not identically signed and symmetric.
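The sign and symmetry conditions on marginal cross effects can be checked directly against the reported average price effects. The following is a minimal sketch, not part of the original analysis: it copies the matrices reported with Tables 6 and 7, and the helper name `summarize_price_effects` is hypothetical.

```python
import numpy as np

# Average price effects copied from the reported matrices.
# Row i: the price p_i being changed; column j: response of probability pi_j.
mixed_logit = np.array([
    [-0.0141,  0.0008,  0.0022,  0.0111],
    [ 0.0008, -0.0084,  0.0061,  0.0014],
    [ 0.0022,  0.0061, -0.0110,  0.0027],
    [ 0.0111,  0.0014,  0.0027, -0.0152],
])  # Table 6: mixed multinomial logit with correlated parameters

discrete_normal = np.array([
    [-0.0198, -0.0005,  0.0013,  0.0189],
    [-0.0020, -0.0050,  0.0033,  0.0037],
    [-0.0005,  0.0026, -0.0052,  0.0031],
    [ 0.0154,  0.0012,  0.0013, -0.0179],
])  # Table 7: multinomial discrete normal with correlated random parameters


def summarize_price_effects(m):
    """Hypothetical helper: adding-up, sign, and symmetry diagnostics."""
    off_diag = ~np.eye(m.shape[0], dtype=bool)
    return {
        # effects of one price change should sum to zero across alternatives
        "max_abs_row_sum": float(np.abs(m.sum(axis=1)).max()),
        # negative cross effects indicate complementarity
        "negative_cross_effects": int((m[off_diag] < 0).sum()),
        # zero when cross effects are symmetric
        "max_asymmetry": float(np.abs(m - m.T).max()),
    }


for name, m in [("mixed logit", mixed_logit), ("discrete normal", discrete_normal)]:
    print(name, summarize_price_effects(m))
```

For the reported values, the mixed logit matrix is symmetric with no negative cross effects, whereas the Table 7 matrix contains three negative cross effects and is not symmetric. Both matrices satisfy the adding-up condition up to rounding.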
Certainly more complicated GEV models may exist that satisfy the Daly–Zachary–McFadden properties for additive-in-income random utility maximization. These models would not likely be nested relative to a multinomial discrete normal estimator, so non-nested hypothesis tests would be needed in order to discriminate between them. Current practice has, however, gravitated away from generalized extreme value models in favor of mixed logit models, largely because of the specification and estimation difficulties GEV models present.

This study has introduced a new statistical distribution for estimating multi-response qualitative choice models. The multinomial discrete normal distribution appears to be an ideal device for testing MNL models with either constant or random coefficients since these latter models are nested in the former. Unlike previous models that have been promulgated to verify RUM, the multinomial discrete normal estimator has a well-defined form. Estimation is straightforward and it permits the systematic introduction of attributes of alternatives. Based on the empirical example, the multinomial discrete normal estimator provides strong evidence against random utility maximization and produces substantially better fitting models than its MNL counterparts.

References

Börsch-Supan, A. “On the Compatibility of Nested Logit Models with Utility Maximization.” Journal of Econometrics 43(1990): 373-388.
Callaway, J.M., S. Ragland, S. Keefe, T.A. Cameron, and W.D. Shaw. “Columbia River Systems Operation Review of Recreation Impacts: Demand Model and Simulation Results.” Report prepared for the U.S. Army Corps of Engineers by RGC/Hagler Bailly, Boulder, CO, 1995.
Daly, A.J., and S. Zachary. “Improved Multiple Choice Models.” In D. Hensher and Q. Dalvi, eds., Determinants of Travel Choice. Farnborough, UK: Saxon House, 1978.
Hanemann, W.M. “Welfare Analysis with Discrete Choice Models.” In J. Herriges and C. Kling, eds., Valuing Recreation and the Environment. Northampton, MA: Edward Elgar, 1999.
Hausman, J.A., and D.A. Wise. “A Conditional Probit Model for Qualitative Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences.” Econometrica 46(1978): 403-426.
Herriges, J.A., and C.L. Kling. “Testing the Consistency of Nested Logit Models with Utility Maximization.” Economics Letters 50(1996): 33-39.
Johnson, N.L., S. Kotz, and N. Balakrishnan. Discrete Multivariate Distributions. New York: John Wiley, 1997.
Kemp, A. “Characterizations of a Discrete Normal Distribution.” Journal of Statistical Planning and Inference 63(1997): 223-229.
Koning, R.H., and G. Ridder. “Discrete Choice and Stochastic Utility Maximization.” Econometrics Journal 6(2003): 1-27.
Lindsey, J.K. Parametric Statistical Inference. New York: Oxford University Press, 1996.
McFadden, D. “Econometric Models of Probabilistic Choice.” In C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications. Cambridge, MA: MIT Press, 1981.
McFadden, D. “Econometric Analysis of Qualitative Response Models.” In Z. Griliches and M. Intriligator, eds., Handbook of Econometrics (Vol. II). Amsterdam: North Holland, 1984.
McFadden, D. “Computing Willingness-to-Pay in Random Utility Models.” In J.R. Melvin, J.C. Moore, and R. Riezman, eds., Trade, Theory and Econometrics. New York: Routledge, 1999.
McFadden, D., and K. Train. “Mixed MNL Models for Discrete Response.” Journal of Applied Econometrics 15(2000): 447-470.
Shonkwiler, J.S. “Recreation Demand Systems for Multiple Site Count Data Travel Cost Models.” In J. Herriges and C. Kling, eds., Valuing Recreation and the Environment. Northampton, MA: Edward Elgar, 1999.
Shonkwiler, J.S. “New Estimators for the Analysis of Univariate and Multivariate Discrete Data.” Working paper, Applied Economics and Statistics Department, University of Nevada, 2001.
Shonkwiler, J.S., and D.J. Nalle. “An Empirical Test of the Random Utility Hypothesis.” Working paper, Applied Economics and Statistics Department, University of Nevada, 2003.
Shonkwiler, J.S., and W.D. Shaw. “A Finite Mixture Approach to Analyzing Income Effects in Random Utility Models: Reservoir Recreation along the Columbia River.” In N. Hanley, W.D. Shaw, and R.E. Wright, eds., The New Economics of Outdoor Recreation. Northampton, MA: Edward Elgar, 2003.
Small, K.A. “Approximate Generalized Extreme Value Models of Discrete Choice.” Journal of Econometrics 62(1994): 351-382.
Szablowski, P.J. “Discrete Normal Distribution and Its Relationship with Jacobi Theta Functions.” Statistics and Probability Letters 52(2001): 289-299.
Train, K.E. Discrete Choice Methods with Simulation. Cambridge, UK: Cambridge University Press, 2003.
Vuong, Q.H. “Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses.” Econometrica 57(1989): 307-333.
White, H. “Maximum Likelihood Estimation of Misspecified Models.” Econometrica 50(1982): 1-25.

Appendix

Proof of Proposition 1. We show the result for the case of J = 2; the case of J > 2 follows by induction.

\[
P(Y_1 = y_1, Y_2 = y_2) = \frac{\exp\left(-0.5\,(y-\mu)'\Sigma(y-\mu)\right)}{\sum_{Y_1=0}^{1}\sum_{Y_2=0}^{1}\exp\left(-0.5\,(Y-\mu)'\Sigma(Y-\mu)\right)}
\]

Because the alternatives are mutually exclusive, only the outcomes $(Y_1, Y_2) = (1, 0)$ and $(0, 1)$ enter the sum in the denominator.

If $y_1 = 1$,

\[
P(y_1 = 1, y_2 = 0) = \frac{\exp\left(-\left[(1-\mu_1)^2\Sigma_{11} - 2\mu_2(1-\mu_1)\Sigma_{12} + \mu_2^2\Sigma_{22}\right]/2\right)}{\exp\left(-\left[(1-\mu_1)^2\Sigma_{11} - 2\mu_2(1-\mu_1)\Sigma_{12} + \mu_2^2\Sigma_{22}\right]/2\right) + \exp\left(-\left[\mu_1^2\Sigma_{11} - 2\mu_1(1-\mu_2)\Sigma_{12} + (1-\mu_2)^2\Sigma_{22}\right]/2\right)}.
\]

If $y_2 = 1$,

\[
P(y_1 = 0, y_2 = 1) = \frac{\exp\left(-\left[\mu_1^2\Sigma_{11} - 2\mu_1(1-\mu_2)\Sigma_{12} + (1-\mu_2)^2\Sigma_{22}\right]/2\right)}{\exp\left(-\left[(1-\mu_1)^2\Sigma_{11} - 2\mu_2(1-\mu_1)\Sigma_{12} + \mu_2^2\Sigma_{22}\right]/2\right) + \exp\left(-\left[\mu_1^2\Sigma_{11} - 2\mu_1(1-\mu_2)\Sigma_{12} + (1-\mu_2)^2\Sigma_{22}\right]/2\right)}.
\]

Note that terms involving $\mu_1^2\Sigma_{11}$, $\mu_1\mu_2\Sigma_{12}$, and $\mu_2^2\Sigma_{22}$ cancel from both numerator and denominator, yielding

\[
P(y_1 = 1, y_2 = 0) = \frac{\exp\left((\mu_1 - 0.5)\Sigma_{11} + \mu_2\Sigma_{12}\right)}{\exp\left((\mu_1 - 0.5)\Sigma_{11} + \mu_2\Sigma_{12}\right) + \exp\left(\mu_1\Sigma_{12} + (\mu_2 - 0.5)\Sigma_{22}\right)},
\]

and analogously for $P(y_1 = 0, y_2 = 1)$.
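The cancellation step in the proof of Proposition 1 can be made explicit by expanding the two quadratic forms. In the sketch below, $Q$ and $C$ are shorthand introduced here for illustration and do not appear in the original proof; $\Sigma$ is taken to be symmetric.

```latex
\begin{align*}
Q(1,0) &= (1-\mu_1)^2\Sigma_{11} - 2\mu_2(1-\mu_1)\Sigma_{12} + \mu_2^2\Sigma_{22}
        = \Sigma_{11} - 2\mu_1\Sigma_{11} - 2\mu_2\Sigma_{12} + C, \\
Q(0,1) &= \mu_1^2\Sigma_{11} - 2\mu_1(1-\mu_2)\Sigma_{12} + (1-\mu_2)^2\Sigma_{22}
        = \Sigma_{22} - 2\mu_2\Sigma_{22} - 2\mu_1\Sigma_{12} + C, \\
C      &= \mu_1^2\Sigma_{11} + 2\mu_1\mu_2\Sigma_{12} + \mu_2^2\Sigma_{22}, \\
\exp\left(-Q(1,0)/2\right) &= \exp(-C/2)\,\exp\left((\mu_1 - 0.5)\Sigma_{11} + \mu_2\Sigma_{12}\right), \\
\exp\left(-Q(0,1)/2\right) &= \exp(-C/2)\,\exp\left(\mu_1\Sigma_{12} + (\mu_2 - 0.5)\Sigma_{22}\right).
\end{align*}
```

The common factor $\exp(-C/2)$ multiplies the numerator and both terms of the denominator, so it drops out of each probability.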
Proof of Proposition 2. Trivial given Proposition 1, upon recognizing that all $\Sigma_{ij} = 0$ for $i \neq j$ and $\Sigma_{jj} = 1$, and multiplying the numerator and denominator by $\exp(0.5)$.
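Both propositions can be checked numerically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the values of `mu` and `Sigma` are arbitrary illustrative numbers rather than estimates, and `Sigma` is assumed symmetric and used directly in the quadratic form, as in the expressions of the proof.

```python
import numpy as np


def mdn_probs(mu, Sigma):
    """Normalize exp(-0.5 (y - mu)' Sigma (y - mu)) over the J unit vectors,
    i.e. over outcomes in which exactly one alternative is chosen."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    dev = np.eye(mu.size) - mu                      # row i holds e_i - mu
    q = np.einsum("ij,jk,ik->i", dev, Sigma, dev)   # quadratic form per outcome
    w = np.exp(-0.5 * (q - q.min()))                # shift for numerical stability
    return w / w.sum()


def mdn_probs_simplified(mu, Sigma):
    """Same probabilities after cancelling the common term mu' Sigma mu.
    For J = 2 the first index is (mu_1 - 0.5) Sigma_11 + mu_2 Sigma_12."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    v = Sigma @ mu - 0.5 * np.diag(Sigma)
    w = np.exp(v - v.max())
    return w / w.sum()


def logit_probs(mu):
    """Conditional logit probabilities exp(mu_i) / sum_j exp(mu_j)."""
    mu = np.asarray(mu, dtype=float)
    w = np.exp(mu - mu.max())
    return w / w.sum()


mu = np.array([0.3, -0.2])                    # illustrative values only
Sigma = np.array([[1.5, 0.4], [0.4, 0.8]])    # illustrative, symmetric

p_full = mdn_probs(mu, Sigma)                 # direct evaluation
p_short = mdn_probs_simplified(mu, Sigma)     # Proposition 1 form
p_identity = mdn_probs(mu, np.eye(2))         # Sigma = I
p_logit = logit_probs(mu)                     # Proposition 2 target

print(p_full, p_short)
print(p_identity, p_logit)
```

The first pair of vectors should agree (Proposition 1), and the second pair should agree when `Sigma` is the identity matrix (Proposition 2).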