Semiparametric Estimation of the Random Utility Model with Rank-Ordered Choice Data ∗

Jin Yan † Hong Il Yoo ‡

September, 2015

Abstract

We propose semiparametric methods for estimating the random utility model exploiting rank-ordered choice data. The term semiparametric refers to the fact that the preference parameters of interest are finite dimensional but the error term in the random utility function has an unspecified distribution. We allow for a flexible form of heteroskedasticity across individuals. The case in which random coefficients can be allowed is also discussed. We show the strong consistency of the proposed generalized maximum score (GMS) estimator. The asymptotic distribution of the GMS estimator is nonstandard. For inference purposes, we propose the smoothed GMS (SGMS) estimator. The SGMS estimator is strongly consistent and asymptotically normal, making inference straightforward. Monte Carlo experiments provide evidence that the proposed estimators outperform the rank-ordered logit model in the presence of heteroskedasticity. An illustrative empirical application using a bank account choice data set suggests that the proposed estimators are capable of finding plausible and robust solutions.

∗ We thank Xu Cheng, Liran Einav, Bruce Hansen, Han Hong, Arthur Lewbel, Taisuke Otsu, Joris Pinkse, Jack Porter and seminar participants at the 2015 Tsinghua Econometric Conference and Academia Sinica for valuable comments and discussions. We acknowledge funding support provided by the Hong Kong Research Grants Council General Research Fund 2014/2015 (Project No.14413214) and six anonymous referee reports for the project proposal. All errors are ours.
† Department of Economics, The Chinese University of Hong Kong. Email: [email protected].
‡ Durham University Business School, Durham University. Email: [email protected].
Keywords: Rank-ordered; Random utility; Semiparametric estimation; Smoothing; Laplace; Markov Chain Monte Carlo

JEL Classification: C14, C35

1 Introduction

This paper develops semiparametric methods for the estimation of random utility models for rank-ordered choice data. The random utility function of interest has a typical structure, comprising a deterministic component or utility index that depends on finite-dimensional preference parameters, and an additive stochastic component or error term. The methods are semiparametric in that they allow one to estimate the preference parameters by combining mild nonparametric restrictions on the stochastic component with the usual parametric specification of the deterministic component.

Rank-ordered choice data are available in similar empirical contexts as multinomial choice data, wherein the economic agent faces a finite set of mutually exclusive alternatives. Multinomial choice data reveal the agent's choice or most preferred alternative from the set. Rank-ordered choice data reveal more about the agent's preference ordering over the set, such as her second and third preferences, thereby revealing what the agent's counterfactual choices would have been if her most preferred alternative were not available. Having a rank-ordered choice observation on an agent thus offers a similar advantage as having several repeated multinomial choice observations on that agent, in that it allows one to reduce sampling variation in the estimated preference parameters.

The parametric methods for rank-ordered choices are as well established as the parametric methods for multinomial choices. Every well-known parametric multinomial choice model has its rank-ordered choice counterpart, both of which permit maximum likelihood estimation. One can derive the likelihood of an observed preference ordering by assuming a parametric form of the stochastic component's distribution. Beggs et al.
(1981) follow this approach to specify the rank-ordered logit (ROL) model that postulates the same distribution as McFadden's (1974) multinomial logit (MNL) model. Assuming a flexible distribution that allows for correlated errors over alternatives could make deriving the rank-ordered choice likelihood algebraically difficult. In such cases, as McFadden (1986) suggests on the basis of Falmagne (1978) and Barberá and Pattanaik (1986), one can exploit logical links between a rank-ordered choice likelihood and several multinomial choice likelihoods to express the former as a combination of the latter.1 Using this approach, Layton and Levine (2003) specify the rank-ordered probit model and Dagsvik and Liu (2009) specify the nested rank-ordered logit model. These two models postulate the same stochastic distribution as the multinomial probit model and the nested logit model, respectively.

The semiparametric methods for rank-ordered choices are much less developed than the semiparametric methods for multinomial choices. To the best of our knowledge, the study of Hausman and Ruud (1987) is both the earliest and the only precedent of developing a semiparametric estimator for rank-ordered choice data. In the multinomial choice context, the prior works of Ruud (1983, 1986) have established conditions under which a weighted M-estimator (WME) can be applied to the MNL model to estimate the ratios of slope coefficients consistently despite stochastic misspecification. Hausman and Ruud construct an extension of this WME to the rank-ordered choice context and the ROL model. Much like Ruud's multinomial choice WME, however, their rank-ordered choice WME is subject to two limitations that affect its applicability in many empirical situations: its consistency is limited to coefficients on continuous independent variables, and its asymptotic distribution is unknown outside a special case where the population moments of the independent variables are known.
A random utility model for rank-ordered choices can be represented as a multivariate system of latent dependent variables, much as that for multinomial choices involving three or more alternatives. Most of the existing semiparametric estimators for multinomial choices focus on the special case of binomial choices between two alternatives (Manski, 1975; Han, 1987; Horowitz, 1992; Klein and Spady, 1993; Lewbel, 2000) that allows one to work with a univariate latent dependent variable instead. As Thompson (1993) points out, however, the results for this special case do not easily carry over to more general cases involving three or more alternatives. Lee (1995) and Fox (2007) are among the few that develop semiparametric multinomial choice estimators for these general cases.

This paper proposes two types of semiparametric estimators for rank-ordered choices. We call them the generalized maximum score (GMS) estimator and the smoothed generalized maximum score (SGMS) estimator respectively, and prove the main asymptotic properties of each. Roughly put, each estimator allows the consistent estimation of suitably normalized coefficients, be they on continuous or discrete independent variables, so long as there is one continuous independent variable. In addition, the SGMS estimator can be shown to be asymptotically normal, though it relies on somewhat stronger assumptions than the GMS estimator. We also provide preliminary estimation results from Monte Carlo simulations and an illustrative empirical application involving bank account products.

1 As a simple example, consider a random preference ordering $B \succ A \succ C$ over three alternatives $A, B, C$. Note that $\Pr(A \succ C) = [\Pr(A \succ C \succ B) + \Pr(A \succ B \succ C)] + \Pr(B \succ A \succ C)$ and the two terms in $[\cdot]$ add up to the trinomial choice probability of $A$. The functional form of $\Pr(B \succ A \succ C)$ is thus known whenever the functional forms of the binomial choice probability $\Pr(A \succ C)$ and the trinomial choice probability $\Pr(A \succ B \text{ and } A \succ C)$ are known.
These preliminary results suggest that our estimators have promising finite sample properties, and can be very useful additions to the empirical practitioner's toolbox.

The GMS estimator generalizes Manski's (1975) MS estimator for binomial choices, and more directly, Fox's (2007) pairwise MS estimator for multinomial choices. Like those two estimators, the GMS estimator is motivated by the identifying assumption that when considering a pair of alternatives, the agent's more likely choice is the one that yields the higher deterministic utility. A multinomial choice observation involving $J$ alternatives allows one to learn the implied outcomes of $J - 1$ pairwise comparisons, where each pair comprises the agent's actual choice and one other alternative. A rank-ordered choice observation allows one to learn the implied outcomes of additional pairwise comparisons. GMS intuitively extends Fox's MS by incorporating that additional information. We show that this intuitive extension allows the GMS estimator to inherit the attractive interpersonal heterogeneity features of Fox's MS estimator: it can accommodate arbitrary forms of heteroskedasticity across agents, and also different error distributions across agents. The former feature makes the estimator particularly appealing since the empirical evidence from parametric studies (Hensher et al., 1999; Fiebig et al., 2010) suggests that heteroskedasticity is the most important aspect of interpersonal heterogeneity in choice behavior, and yet it is often, if not always, unclear how heteroskedasticity is to be modeled.

The SGMS estimator complements the GMS estimator by addressing the latter's two practical limitations, in return for making some additional assumptions. Specifically, the GMS estimator is $N^{1/3}$-consistent, where $N$ is the number of agents, thereby exhibiting a slow rate of convergence, and also follows a non-standard asymptotic distribution of Kim and Pollard (1990).
These limitations are shared by the MS estimators of Manski (1975) and Fox (2007), and arise from the fact that the MS criterion function involves step functions. To address these issues in the context of Manski's binomial choice estimator, Horowitz (1992) develops a smoothed maximum score (SMS) estimator that replaces the step functions with smooth kernels. Yan (2013) applies the same technique to derive a smoothed version of Fox's multinomial choice estimator. Our SGMS estimator is also based on the same approach, and offers similar benefits as its precedents: we show that the SGMS estimator's convergence rate can be made arbitrarily close to the usual $N^{-1/2}$, and also that its asymptotic distribution is normal.

A further novel aspect of our results is that the benefit of using rank-ordered choice data over multinomial choice data in the semiparametric framework is not limited to efficiency gains, whereas it is in the parametric framework. In the special case of fully rank-ordered choice data that have all alternatives in a choice set ranked from most to least preferred, one can replace the exchangeability assumption of Fox (2007, p.1006) with a much weaker zero conditional median assumption, without affecting the strong consistency of the GMS estimator. In this case, the rank-ordered choice estimator can be consistent even when the multinomial choice estimator is not, since the former is robust to a wider class of stochastic distributions. Most importantly, this expanded class includes many popular parametric models that the multinomial choice estimator rules out, such as the nested logit model, the probit model with an unrestricted covariance matrix, and the mixed logit model.
Together with the fact that the relevant special case commonly holds in empirical applications (Calfee et al., 2001; Capparos et al., 2008; Scarpa et al., 2011; Yoo and Doiron, 2013), this makes the use of semiparametric methods more attractive in the rank-ordered choice data environment than in the multinomial choice data environment.

The remainder of this paper is organized as follows. Section 2 develops the GMS estimator and discusses its asymptotic properties. Section 3 develops the SGMS estimator. Section 4 presents our preliminary Monte Carlo evidence on the finite sample properties of those two estimators. Section 5 presents an illustrative empirical application using bank account choice data. Section 6 concludes.

2 The Model and the Estimation

2.1 A Random Utility Framework and Rank-Ordered Choice Data

Consider a standard random utility model. Each individual in the population of interest faces a finite collection of alternatives. Let $\mathcal{J} = \{1, \ldots, J\}$ denote the common choice set of alternatives and $J \geq 2$ be the number of alternatives contained in $\mathcal{J}$. The utility obtained by individual $n$ from choosing alternative $j$, $u_{nj}$, is assumed as follows:

$$u_{nj} = x_{nj}'\beta + \varepsilon_{nj} \quad \forall\, j \in \mathcal{J}, \tag{1}$$

where $x_{nj}$ is an observed $q$-vector containing the characteristics of individual $n$ and alternative $j$, $\beta \in \mathbb{R}^q$ is the preference parameter vector of interest, and $\varepsilon_{nj}$ is the unobserved component of utility. The utility index, $x_{nj}'\beta$, is often referred to as systematic (or deterministic) utility.

Based on the utilities associated with all the alternatives, each individual reveals her best $M$ ($1 \leq M \leq J - 1$) alternatives and ranks those $M$ alternatives. When $M = 1$, we have multinomial choice data; when $M = J - 1$, we have a complete ranking of all the alternatives. In case any utility ties occur, let $T(n, j)$ be the set of alternatives with the same utility as alternative $j$, and let $A(\cdot, T(n, j))$ map the elements of $T(n, j)$ one-to-one onto the integers $\{0, \ldots, |T(n, j)| - 1\}$, where $|T|$ is the number of alternatives in the set $T$. For any two alternatives $k, l \in T(n, j)$, $A(k, T(n, j)) < A(l, T(n, j))$ if and only if $k < l$.2 Let $\mathcal{M}_n \subset \mathcal{J}$ denote the set of the best $M$ alternatives of individual $n$ and $r_{nj}$ denote the ranking of alternative $j$, that is,

$$r_{nj} = \begin{cases} L(n, j) + 1 + A(j, T(n, j)) & \text{if } j \in \mathcal{M}_n, \\ M + 1 & \text{if } j \in \mathcal{J} \setminus \mathcal{M}_n, \end{cases} \tag{2}$$

where $L(n, j)$ denotes the number of alternatives with strictly larger utility than alternative $j$ for individual $n$. Let the vector $r_n \equiv (r_{n1}, \ldots, r_{nJ})' \in \mathbb{N}^J$ denote the ranking of all alternatives in the choice set $\mathcal{J}$. For example, if individual $n$ ranks four alternatives as $u_{n3} > u_{n4} > \max\{u_{n1}, u_{n2}\}$, then $r_n \equiv (3, 3, 1, 2)$. Denote $X_n \equiv (x_{n1}, \ldots, x_{nJ})' \in \mathbb{R}^{J \times q}$ and $\varepsilon_n \equiv (\varepsilon_{n1}, \ldots, \varepsilon_{nJ})' \in \mathbb{R}^J$. Although utilities are not observable, for each individual we can observe the ranking and the explanatory vectors for all the alternatives, $(r_n, X_n)$. Next, we make the sampling assumption.

2 The function $A$ has the effect of breaking utility ties and is introduced purely for technical convenience.

Assumption 1. $\{(r_n, X_n) : n = 1, \ldots, N\}$ is a random sample of $(r, X)$, where $r \equiv (r_1, \ldots, r_J)' \in \mathbb{N}^J$, $X \equiv (x_1, \ldots, x_J)' \in \mathbb{R}^{J \times q}$, and $r_j$ is the utility ranking of alternative $j$ defined in (2).

Assumption 1 states that $(r_n, X_n)$ are independently and identically distributed (i.i.d.). For this reason, we sometimes drop the subscript $n$ for simplicity in the following analysis.

2.2 Relation to Literature

Identification of $\beta$ is subject to the restriction on the conditional probability of observing an arbitrary ranking $r$ given the explanatory vectors $X$. Parametric methods assume particular functional forms for this conditional probability of observing any rankings, among which the most popular models are the rank-ordered logit (ROL) model and the rank-ordered probit (ROP) model.
Both models assume that the error terms in $\varepsilon$ are independent of the explanatory vectors $X$, ruling out the possibility of heteroskedasticity across individuals, i.e., the possibility that the conditional distribution of $\varepsilon$ varies across individuals.

The ROP model assumes that the error terms $\varepsilon$ are jointly normal and allows correlation in error terms across alternatives. The computation cost of the ROP model increases rapidly as the number of alternatives rises, due to the calculation of multivariate integrals and the estimation of the covariance matrix of the error terms. For example, suppose an individual ranks four alternatives as $r = (3, 3, 1, 2)$ given $X = (x_1, \ldots, x_4)'$. The ROP probability is

$$\begin{aligned} P_{ROP}(r \mid X) &= P(u_3 > u_4 > \max\{u_1, u_2\}) \\ &= P(u_3 > u_4 > u_1 > u_2) + P(u_3 > u_4 > u_2 > u_1) \\ &= \int_{-\infty}^{\infty} d\varepsilon_2 \int_{x_2'\beta - x_1'\beta + \varepsilon_2}^{\infty} d\varepsilon_1 \int_{x_1'\beta - x_4'\beta + \varepsilon_1}^{\infty} d\varepsilon_4 \int_{x_4'\beta - x_3'\beta + \varepsilon_4}^{\infty} \phi(\varepsilon)\, d\varepsilon_3 \\ &\quad + \int_{-\infty}^{\infty} d\varepsilon_1 \int_{x_1'\beta - x_2'\beta + \varepsilon_1}^{\infty} d\varepsilon_2 \int_{x_2'\beta - x_4'\beta + \varepsilon_2}^{\infty} d\varepsilon_4 \int_{x_4'\beta - x_3'\beta + \varepsilon_4}^{\infty} \phi(\varepsilon)\, d\varepsilon_3, \end{aligned} \tag{3}$$

where $\phi(\varepsilon)$ is a multivariate normal density function. The complexity of computing $P_{ROP}(r \mid X)$ is especially high when the number of alternatives is large and alternatives are partially ranked.

The ROL model, first introduced by Beggs, Cardell and Hausman (1981), assumes the elements of $\varepsilon$ are i.i.d. extreme value type I distributed. The ROL model does not allow for any form of heteroskedasticity nor any correlation across alternatives, but it has the ease of computation. The conditional probability of observing the ranking $r = (3, 3, 1, 2)$ given $X = (x_1, \ldots, x_4)'$ has a closed-form solution:

$$\begin{aligned} P_{ROL}(r \mid X) &= P(u_3 > u_4 > \max\{u_1, u_2\}) \\ &= P(r_3 = 1 \mid X) \cdot P(r_4 = 2 \mid r_3 = 1, X) \\ &= \frac{e^{x_3'\beta}}{e^{x_1'\beta} + e^{x_2'\beta} + e^{x_3'\beta} + e^{x_4'\beta}} \cdot \frac{e^{x_4'\beta}}{e^{x_1'\beta} + e^{x_2'\beta} + e^{x_4'\beta}}, \end{aligned} \tag{4}$$

eliminating the necessity of calculating multivariate integrals. The MLE objective function is concave, so optimization is standard. The low computation cost explains the popularity of the ROL model over the ROP model in empirical work, especially when the choice set is large.
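As a rough illustration of the closed-form expression (4), the sketch below evaluates the ROL probability of a partial ranking as a product of sequential logit probabilities. The function name `rol_ranking_probability` and the zero-based indexing of alternatives are assumptions made for this example only; it is not code from the paper.

```python
import math


def rol_ranking_probability(v, ranked):
    """ROL probability that ranked[0] is best, ranked[1] second best, and so on.

    v      : list of deterministic utilities x_j' * beta, one per alternative
    ranked : zero-based indices of the top-M alternatives, best first
    """
    remaining = list(range(len(v)))
    prob = 1.0
    for j in ranked:
        # Logit probability of choosing j among the alternatives not yet ranked.
        denom = sum(math.exp(v[k]) for k in remaining)
        prob *= math.exp(v[j]) / denom
        remaining.remove(j)
    return prob


# Ranking r = (3, 3, 1, 2): alternative 3 is best and alternative 4 second best
# (zero-based indices 2 and 3). Equal utilities are used for a simple check.
v_equal = [0.0, 0.0, 0.0, 0.0]
p_example = rol_ranking_probability(v_equal, ranked=[2, 3])
```

With equal deterministic utilities the two factors in (4) are 1/4 and 1/3, so `p_example` equals 1/12.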
However, the closed-form expression (4) arises from the independence of irrelevant alternatives (IIA) assumption, which is a very strong hypothesis, as many authors point out.3 When the IIA assumption is violated, both the probability that alternative 3 is the best alternative ($P(r_3 = 1 \mid X)$) and the conditional probability that alternative 4 is the second best alternative ($P(r_4 = 2 \mid r_3 = 1, X)$) are misspecified. By contrast, only the probability that alternative 3 is the best alternative ($P(r_3 = 1 \mid X)$) is misspecified in the MNL model when the IIA assumption is problematic. The ROL model gains efficiency over the MNL model by incorporating information on rankings, but at the cost of suffering from more misspecification than its MNL counterpart. Yan and Yoo (2014) provide analytical examples and Monte Carlo evidence to illustrate the inconsistency of the ROL estimator under misspecification.

The preference parameters are not estimated consistently by the parametric models under misspecification. However, since the scale of utilities is not identified, we cannot identify the scale of the preference parameters either.4 The substantive information in utility functions is the ranking of alternatives by preference and the ratio of preference parameters, also known as the marginal rates of substitution, that underlie such rankings.5 It is the ratio of preference parameters that is the focus of our structural estimation.6 Many semiparametric estimators have been proposed to estimate the ratio of preference parameters using multinomial choice data.

3 See Hausman and McFadden (1984).
4 The ranking still holds if $\beta$ and $\varepsilon$ are multiplied by any positive constant.
5 The marginal rates of substitution are sometimes called equivalent or implicit prices when they are calculated as ratios with the coefficient that has a monetary interpretation.
6 Some research focuses on forecasting the choice probabilities instead of the structural preference parameters. In that case, the ROL (or ROP) model may not be a good replacement for the MNL (or MNP) model. This is because both the ROP and ROL estimators are quasi-maximum likelihood estimators (QMLE) in the presence of misspecification of the conditional distribution of error terms. The QMLE converges to the parameter vector that minimizes the Kullback-Leibler Information Criterion (KLIC). The ROL (or ROP) estimator has its limiting parameter vector distorted in a way to mimic the probability of rankings instead of the probability of choosing the best alternative. Therefore, the ROL (or ROP) model may not be a good replacement for the MNL (or MNP) model when the research focus is to approximate the choice probabilities of each alternative.

Rank-ordered data allow considerably more information to be gathered from a given observation than multinomial choice data, in which only the best alternative is revealed. Since sampling costs are the major portion of a project to evaluate the preference for new products and non-market goods, it is important to develop a robust estimation method that utilizes ranking information. To the best of our knowledge, the only semiparametric estimator that exploits rank-ordered choice data is developed by Hausman and Ruud (1987). Ruud (1983) finds that under misspecification the MNL model provides a consistent estimator for the ratio of slope coefficients if the regressors have linear conditional expectations. Taking advantage of this result, Ruud (1986) constructs a weighted M-estimator (WME) that achieves the same consistency in situations where the conditional expectations are not linear. Hausman and Ruud (1987) generalize the WME for multinomial choice data to rank-ordered choice data. However, consistent estimation is limited to coefficients of continuous explanatory variables and the asymptotic distribution of the WME is still unknown, which limits the use of the WME in applications where some (if not most) variables are categorical.

2.3 The Generalized Maximum Score (GMS) Estimator

In this section, we develop a method to estimate the preference parameter $\beta$ without specifying the distribution function form for the error terms. As the scale of $\beta$ is not identified, we need to impose some normalization first. Subject to the prior knowledge that at least one parameter is non-zero, we can normalize the magnitude of that entry of $\beta$ to be one. Define $\beta \equiv (\beta_1, \tilde{\beta}')' \in \mathbb{R}^q$. Without loss of generality, we assume that $|\beta_1| = 1$ as stated in Assumption 2.

Assumption 2. $\beta \in B$, where $B \equiv \{-1, 1\} \times \tilde{B}$ and $\tilde{B}$ is a compact subset of $\mathbb{R}^{q-1}$, where $q \geq 2$.

The normalization given by Assumption 2 is widely used in semiparametric estimation for the preference parameter vector in random utility functions. The vector $\tilde{\beta} \in \tilde{B}$ is our estimation focus because the estimator of the sign of $\beta_1$ converges at a much faster rate.

Without knowing the joint density of $\varepsilon$ given $X$, we rely on the relationship between the ranking of utilities and the ranking of systematic utilities stated in Assumption 3 to identify the preference parameters. For any alternative $j \in \mathcal{J}$, denote the deterministic utility of alternative $j$ as $v_j \equiv x_j'\beta$.

Assumption 3. For any pair of alternatives $j, k \in \mathcal{J}$,

$$v_j > v_k \text{ if and only if } P(r_j < r_k \mid X) > P(r_k < r_j \mid X), \tag{5}$$

i.e., if alternative $j$ has higher deterministic utility than alternative $k$, then, conditional on the explanatory variables, alternative $j$ is more likely to be preferred to alternative $k$ ($r_j < r_k$) than alternative $k$ is to be preferred to alternative $j$ ($r_k < r_j$).
Assumption 3 immediately implies that $P(r_j < r_k \mid X) = P(r_j > r_k \mid X)$ when $v_j = v_k$, i.e., if alternative $j$ and alternative $k$ have the same deterministic utility, then the chance that alternative $j$ is ranked better than alternative $k$ is the same as the chance that alternative $k$ is ranked better than alternative $j$.

Two special cases are worth mentioning here. The first is $M = J - 1$, i.e., individuals reveal their complete ranking of all the alternatives in the choice set. With the complete ranking of alternatives, we can compare the utilities of any two alternatives. Therefore, alternative $j$ is ranked better than alternative $k$ if and only if the utility from choosing alternative $j$ is higher than the utility from choosing alternative $k$. We have

$$\begin{aligned} P(r_j < r_k \mid X) &= P(u_j > u_k \mid X) \\ &= P(\varepsilon_k - \varepsilon_j < x_j'\beta - x_k'\beta \mid X). \end{aligned} \tag{6}$$

The first line of (6) does not hold if we only observe a partial ranking. For example, when neither alternative $j$ nor alternative $k$ belongs to the set of the best $M$ alternatives, both of them have rank $M + 1$ no matter whether they yield the same utility level for the individual or not. Assume that the conditional distribution function of $\varepsilon_j - \varepsilon_k$ is strictly increasing. Then the well-known conditional median zero restriction, $\mathrm{median}(\varepsilon_j - \varepsilon_k \mid X) = 0$, is a sufficient and necessary condition for Assumption 3.

The second special case is $M = 1$, i.e., only the best alternative is revealed, meaning that we have multinomial discrete choice data. For multinomial choice data, alternative $j$ is ranked better than alternative $k$ if and only if $j$ is ranked as the best alternative. So we have

$$P(r_j < r_k \mid X) = P(r_j = 1 \mid X). \tag{7}$$

In this case, Assumption 3 is known as the monotonicity of choice probabilities property, i.e., the ranking of the choice probabilities of alternatives is the same as the ranking of the deterministic utilities of alternatives.

Next, we describe the intuition of applying Assumption 3 to identify the parameter vector $\beta$.
Let $1(\cdot)$ be the indicator function that equals one if the event in the parentheses is true and zero otherwise, and let $b \equiv (b_1, \tilde{b}')'$ be any vector in the parameter space $B$, where $\tilde{b} \in \tilde{B}$.7 Under Assumption 3, if $x_j'\beta > x_k'\beta$ is true, then the event $r_j < r_k$ is more likely to be true than the event $r_k < r_j$; if $x_k'\beta > x_j'\beta$ is true, then the event $r_k < r_j$ is more likely to be true than the event $r_j < r_k$; if $x_j'\beta = x_k'\beta$ is true, then the event $r_j < r_k$ has the same chance to be true as the event $r_k < r_j$. Therefore, the expected value of the following match

$$m_{jk}(b) \equiv 1(r_j < r_k) \cdot 1(x_j'b \geq x_k'b) + 1(r_k < r_j) \cdot 1(x_k'b > x_j'b), \tag{8}$$

where $b \in B$, should be maximized at the true preference parameter vector $\beta$. Define $x_{nj}'b$ as the $b$-utility index for individual $n$ and alternative $j$. Applying the analogy principle, we propose a semiparametric estimator, $b_N \equiv (b_{N,1}, \tilde{b}_N')' \in B$, for the preference parameter vector $\beta$ defined by (9):

$$b_N \in \operatorname*{argmax}_{b \in B} Q_N(b), \tag{9}$$

where

$$Q_N(b) = N^{-1} \sum_{n=1}^{N} \sum_{1 \leq j < k \leq J} \left\{ 1(r_{nj} < r_{nk}) \cdot 1(x_{nj}'b \geq x_{nk}'b) + 1(r_{nk} < r_{nj}) \cdot 1(x_{nk}'b > x_{nj}'b) \right\}. \tag{10}$$

In the special case that $M = 1$, i.e., we have multinomial choice data, the estimator $b_N$ defined by (9) becomes the well-known maximum score (MS) estimator. For this reason, we name the estimator $b_N$ a Generalized Maximum Score (GMS) estimator.8

7 See Manski (1975), Fox (2007) and Yan (2013).

When all the explanatory variables are discrete, we can always find another parameter vector in the neighborhood of $\beta$ that generates the same ranking of $b$-utility indexes as the true parameter vector. To get point identification, we need to impose an extra assumption on the explanatory variables, namely, we need a continuous explanatory variable conditional on the other explanatory variables. We introduce some notation and state the restrictions on the explanatory variables formally in Assumption 4. In random utility models, only the differences in utilities matter.
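To make the sample criterion (10) concrete, the following sketch counts, for each individual and each pair of alternatives with distinct ranks, whether the ordering of the $b$-utility indexes agrees with the observed ranking. The function name `gms_objective` and the array layout (`X` of shape `(N, J, q)`, `r` of shape `(N, J)` with rank 1 as best) are assumptions for illustration; the maximization over $B$, which requires a global search, is not shown.

```python
import numpy as np


def gms_objective(b, X, r):
    """Sample GMS criterion Q_N(b) as written in equation (10).

    b : (q,) candidate parameter vector
    X : (N, J, q) explanatory vectors x_nj
    r : (N, J) integer ranks r_nj (1 = best; unranked alternatives share M + 1)
    """
    v = X @ b                      # (N, J) matrix of b-utility indexes x_nj' b
    n_obs, n_alt = r.shape
    total = 0.0
    for j in range(n_alt):
        for k in range(j + 1, n_alt):
            j_better = r[:, j] < r[:, k]
            k_better = r[:, k] < r[:, j]
            # A match is scored when the index ordering agrees with the ranking.
            total += np.sum(j_better & (v[:, j] >= v[:, k]))
            total += np.sum(k_better & (v[:, k] > v[:, j]))
    return total / n_obs


# One individual with ranking r = (3, 3, 1, 2), as in the example of Section 2.1.
X_demo = np.array([[[0.0, 0.0], [0.0, 0.0], [2.0, 0.0], [1.0, 0.0]]])
r_demo = np.array([[3, 3, 1, 2]])
q_plus = gms_objective(np.array([1.0, 0.5]), X_demo, r_demo)
q_minus = gms_objective(np.array([-1.0, 0.5]), X_demo, r_demo)
```

In this toy case five of the six pairs have distinct ranks (alternatives 1 and 2 are tied at rank $M + 1 = 3$), so `q_plus` is 5.0 when the indexes order the alternatives as observed and `q_minus` is 0.0 when the sign of the first coefficient is flipped.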
Normalize $x_J = 0$.9 Let $x_{jk}$ denote the difference in the explanatory vectors between alternatives $j$ and $k$, that is, $x_{jk} = x_j - x_k$. In Assumption 2, we assumed that the first parameter has a nonzero value. For each alternative $j \in \mathcal{J}$, factor the vector $x_j$ into $(x_{j,1}, \tilde{x}_j')'$, where $x_{j,1}$ is the first element of $x_j$ and $\tilde{x}_j$ refers to the remaining elements of $x_j$. So the first element of $x_{jk}$ is $x_{jk,1} = x_{j,1} - x_{k,1}$ and its remaining elements are $\tilde{x}_{jk} = \tilde{x}_j - \tilde{x}_k$. Denote $\tilde{X} \equiv (\tilde{x}_1, \ldots, \tilde{x}_J)' \in \mathbb{R}^{J \times (q-1)}$. Vectors $x_{nj}$, $x_{njk}$, and $\tilde{x}_{njk}$ are the $n$th observation of vectors $x_j$, $x_{jk}$, and $\tilde{x}_{jk}$, respectively. Matrices $X_n$ and $\tilde{X}_n$ are the $n$th observation of matrices $X$ and $\tilde{X}$, respectively.

Assumption 4. The following statements are true.
(a) For any pair of alternatives $j, k \in \mathcal{J}$, $g_{jk}(x_{jk,1} \mid \tilde{x}_{jk})$ denotes the density function of $x_{jk,1}$ conditional on $\tilde{x}_{jk}$, and $g_{jk}(x_{jk,1} \mid \tilde{x}_{jk})$ is nonzero everywhere on $\mathbb{R}$ for almost every $\tilde{x}_{jk}$.
(b) For any constant vector $c \equiv (c_1, \ldots, c_q)' \in \mathbb{R}^q$, $Xc = 0$ with probability one if and only if $c = 0$.

Assumption 4 is sufficient to show that other vectors $b \in B$ would yield different values for the probability limit of the objective function $Q_N(b)$ from the true parameter vector $\beta$. Assumption 4(a) avoids a local failure of identification, which is important in the semiparametric setting. Assumption 4(b) is analogous to the full-rank condition for the binary choice model, which prevents a global failure of identification. The following theorem establishes the strong consistency of the GMS estimator.

Theorem 1. Let Assumptions 1-4 hold. The GMS estimator $b_N$ that solves the problem

$$\max_{b \in B} Q_N(b) \tag{11}$$

converges almost surely to $\beta$, the true parameter in the data generating process.

8 The GMS estimator is different from the maximum rank estimator that Han and Sherman study. Though the objective function (10) includes comparison of rankings, it is the comparison within each individual, allowing for heteroskedasticity across individuals. The maximum rank estimator compares ranks across individuals, ruling out heteroskedasticity across individuals.
9 If $x_J \neq 0$, we can subtract $x_J$ from each $x_j$.

Similar to the MS estimator, the complexity of the GMS estimator $b_N$ and its slow rate of convergence are due to the discontinuity of the indicator function in (10). Kim and Pollard (1990) have shown that $N^{1/3}$ times the centered MS estimator converges in distribution to the random variable that maximizes a certain Gaussian process for the binary choice model. The estimator that Kim and Pollard study is a special case of the GMS estimator. This asymptotic distribution result is too complicated to be used for inference in applications. Abrevaya and Huang (2005) prove that the standard bootstrap is not consistent for the MS estimator. Delgado et al. (2001) show that subsampling consistently estimates the asymptotic distribution of the test statistic of the MS estimator for the binary choice model. Subsampling has efficiency loss and its computational cost is high for our estimator because a global search method is needed to solve the maximization problem for each subsample. In Section 3, we introduce two methods of smoothing over the objective function $Q_N(b)$ to (at least partly) overcome the theoretical and computational difficulty.

2.4 A Special Case: Random Coefficients

There is a special case in which we can incorporate random coefficients into the random utility model. Consider a random utility model with random coefficients as follows:

$$u_{nj} = x_{nj}'\beta_n + e_{nj} = x_{nj}'\beta + \varepsilon_{nj}, \quad \text{where } \varepsilon_{nj} = x_{nj}'(\beta_n - \beta) + e_{nj}. \tag{12}$$

The random coefficient $\beta_n$ reflects unobserved heterogeneity in tastes for observable characteristics. The parameter vector of interest, $\beta$, now represents a certain central tendency measure (mean or median) of the random coefficients for the population of interest.
The parameter vector $\beta$ can be estimated using parametric methods under several restrictive assumptions. For example, the popular mixed-logit model assumes that $e_{nj}$ are i.i.d. extreme value type I distributed, $\beta_n$ is independent of $X_n$ and $e_n$, and $\beta_n$ follows a known distribution subject to a few parameters. The mixed-logit model has many good properties, such as allowing for heteroskedasticity and correlation across alternatives, and unrestricted substitution patterns. It is straightforward to generalize the mixed-logit model for multinomial choice data to one for rank-ordered choice data by replacing the choice probabilities in the likelihood function with the probability of rankings. However, like other quasi-maximum likelihood estimators, the mixed-logit estimator may be inconsistent when the distribution function forms of $e_{nj}$ and $\beta_n$ are misspecified.

Estimating $\beta$ in the random coefficient model (12) semiparametrically, i.e., leaving the distribution of $e_{nj}$ and $\beta_n$ unspecified, is a challenging task. When we only have multinomial choice data, the MS estimator does not guarantee consistent estimation for $\beta$ in (12) because the ranking of deterministic utilities of some alternatives may not be the same as the ranking of their choice probabilities. Given the same level of deterministic utility, the alternative with higher variance has more frequent draws from the right tail of the error term distribution.10 When we have fully rank-ordered choice data, i.e., we know the ranking for all the alternatives, semiparametric estimation of $\beta$ in the random coefficient model can be achieved using the GMS estimator.

Next, we describe the intuition of consistency of the GMS estimator for $\beta$ in the random coefficient model (12). When the alternatives in the choice set are fully ranked, $r_{nj} < r_{nk}$ if and only if $u_{nj} > u_{nk}$. A sufficient and necessary condition for Assumption 3 is

$$\mathrm{median}(\varepsilon_{nj} - \varepsilon_{nk} \mid X) \equiv \mathrm{median}[(x_{nj} - x_{nk})'(\beta_n - \beta) + e_{nj} - e_{nk} \mid X] = 0. \tag{13}$$
Condition (13) is much weaker than the assumptions imposed on the mixed-logit model. For example, condition (14),

$$\beta_n \perp e_n \mid X, \qquad \mathrm{median}(e_{nj} - e_{nk} \mid X) = 0 \ \text{ for any } j, k, \tag{14}$$

which is satisfied by the mixed-logit model, is sufficient for (13). By Theorem 1, the GMS estimator is consistent for β in the random coefficient model (12) under Assumptions 1, 2, and 4 together with the mild location restriction (13) when we have complete ranking data. This property is important because in many applications with a small choice set (Calfee et al., 2001; Capparos et al., 2008; Scarpa et al., 2011; Yoo and Doiron, 2013) the alternatives are fully ranked. The GMS estimator does not require parametric assumptions on the distributional form of the individual heterogeneity $\beta_n$ or the error term $e_n$, making it more robust than the mixed-logit model. Meanwhile, the GMS estimator keeps the good properties of the mixed-logit model, allowing for unobserved heterogeneity in tastes, flexible substitution patterns across alternatives, and a general form of heteroskedasticity across both individuals and alternatives.

[10] See Fox (2007) and Yan (2013).

3 The Smoothed GMS Estimator

The maximum score type estimator is $N^{1/3}$-consistent and follows the non-standard asymptotic distribution of Kim and Pollard (1990). Kim and Pollard have shown that $N^{1/3}$ times the centered MS estimator converges in distribution to the random variable that maximizes a certain Gaussian process for binary choice data. This asymptotic distribution result is too complicated to be used for inference in applications. Abrevaya and Huang (2005) prove that the standard bootstrap is not consistent for the MS estimator. Delgado et al. (2001) show that subsampling consistently estimates the asymptotic distribution of the test statistic of the MS estimator for binary choice data.
Subsampling entails an efficiency loss, and its computational cost is high for the MS or GMS estimator because a global search method is needed to solve the maximization problem for each subsample.

In this section, we propose an estimator that complements the GMS estimator by addressing these practical limitations, in return for making some additional assumptions. In the context of Manski's binary choice MS estimator, Horowitz (1992) develops a smoothed maximum score (SMS) estimator that replaces the step functions with smooth functions. Yan (2012) applies this technique to derive a smoothed version of Fox's multinomial choice MS estimator. We use the same approach to derive a smoothed GMS (SGMS) estimator that offers similar benefits to its predecessors: we show that, under extra smoothness conditions, the SGMS estimator's convergence rate is faster than $N^{-1/3}$, and that it is asymptotically normally distributed.

3.1 The Smoothed GMS Estimator and its Asymptotic Properties

The objective function in (10) can be rewritten as

$$Q_N(b) = N^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} \left\{ [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot 1(x_{nj}'b \ge x_{nk}'b) + 1(r_{nk} < r_{nj}) \right\}. \tag{15}$$

The indicator function of $b$ in (15) can be replaced with a sufficiently smooth function $K(\cdot)$, where $K(\cdot)$ is analogous to a cumulative distribution function. Let $h_N$ be a positive bandwidth that goes to zero as the sample size $N$ goes to infinity. Application of the smoothing idea in Horowitz (1992) to the right-hand side of (15) yields a smoothed GMS (SGMS) estimator

$$b_N^S \in \arg\max_{b \in B} Q_N^S(b, h_N), \tag{16}$$

where

$$Q_N^S(b, h_N) = N^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} \left\{ [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot K\!\left( (x_{nj}'b - x_{nk}'b)/h_N \right) + 1(r_{nk} < r_{nj}) \right\}. \tag{17}$$

The next condition states the requirement on the smoothing function $K(\cdot)$ for the SGMS estimator, $b_N^S$, to be a consistent estimator for β.

Condition 1. Let $\{h_N : N = 1, 2, \ldots\}$ be a sequence of strictly positive real numbers satisfying $\lim_{N \to \infty} h_N = 0$ and let $K(x)$ be a function on $\mathbb{R}$ such that:
(a) $|K(x)| < C$ for some finite $C$ and all $x \in (-\infty, \infty)$; and
(b) $\lim_{x \to -\infty} K(x) = 0$ and $\lim_{x \to \infty} K(x) = 1$.

Theorem 2. Let Assumptions 1-4 and Condition 1 hold. The SGMS estimator $b_N^S \in B$ defined in (16) converges almost surely to the true preference parameter β.

Since the objective function (17) of the SGMS estimator $b_N^S$ is a smooth function, the derivation of the asymptotic distribution of the SMS estimator for the binary choice model in Horowitz (1992) is applicable to it. Denote the first element of $b_N^S$ as $b_{N,1}^S$ and the remaining elements as $\tilde{b}_N^S$. Next, we define a few terms and give a sketch of the derivation of the asymptotic distribution of the SGMS estimator. Define the first-order derivative of the smoothed objective function $Q_N^S(b, h_N)$ with respect to $\tilde{b}$ as $t_N(b, h_N) \equiv \partial Q_N^S(b, h_N)/\partial \tilde{b}$ and the second-order derivative of $Q_N^S(b, h_N)$ with respect to $\tilde{b}$ as $H_N(b, h_N) \equiv \partial^2 Q_N^S(b, h_N)/\partial \tilde{b}\, \partial \tilde{b}'$. Assume that the function $K(\cdot)$ is twice differentiable; then we have the vector

$$t_N(b, h_N) = (N h_N)^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} \left\{ [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot K'\!\left( x_{njk}'b/h_N \right) \tilde{x}_{njk} \right\} \tag{18}$$

and the matrix

$$H_N(b, h_N) = (N h_N^2)^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} \left\{ [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot K''\!\left( x_{njk}'b/h_N \right) \tilde{x}_{njk} \tilde{x}_{njk}' \right\}. \tag{19}$$

Assumption 5. $\tilde{\beta}$ is an interior point of $\tilde{B}$.

If Assumption 5 is true, then with probability approaching 1 as $N \to \infty$, $b_{N,1}^S = \beta_1$ and $t_N(b_N^S, h_N) = 0$ by the first-order condition. A Taylor series expansion of $t_N(b_N^S, h_N)$ around the true parameter β yields

$$t_N(b_N^S, h_N) = t_N(\beta, h_N) + H_N(b_N^*, h_N)(\tilde{b}_N^S - \tilde{\beta}) = 0, \tag{20}$$

where $b_N^*$ is a vector between $b_N^S$ and β. Suppose there is a function $\rho(N)$ such that $\rho(N) t_N(\beta, h_N)$ converges in distribution and also that $H_N(b_N^*, h_N)$ converges in probability to a nonsingular, nonstochastic matrix $H$. Then,

$$\rho(N)(\tilde{b}_N^S - \tilde{\beta}) = -H^{-1} \rho(N) t_N(\beta, h_N) + o_p(1). \tag{21}$$
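The objective functions (15) and (17) and the score vector (18) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: $K(\cdot)$ is taken to be the standard normal distribution function, $x_{njk}$ is read as $x_{nj} - x_{nk}$, $\tilde{x}_{njk}$ is read as $x_{njk}$ without its first element, and all function names and the simulated data are hypothetical.

```python
import math
import numpy as np

# Standard normal CDF and PDF, used here as K and K' (an assumption).
norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def pairwise_terms(ranks, x, b):
    """Build the ingredients of (15)-(18) for all pairs j < k.

    ranks: (N, J) integer ranks, 1 = most preferred; ties are allowed.
    x:     (N, J, q) explanatory variables.
    """
    j_idx, k_idx = np.triu_indices(x.shape[1], k=1)
    # sign = 1(r_nj < r_nk) - 1(r_nk < r_nj)
    sign = np.sign(ranks[:, k_idx] - ranks[:, j_idx])
    worse = (ranks[:, k_idx] < ranks[:, j_idx]).astype(float)
    xd = x[:, j_idx, :] - x[:, k_idx, :]        # x_njk, assumed x_nj - x_nk
    index = xd @ np.asarray(b, dtype=float)     # x_nj'b - x_nk'b
    return sign, worse, xd, index

def gms_objective(b, ranks, x):
    """Step-function objective Q_N(b) in (15)."""
    sign, worse, _, index = pairwise_terms(ranks, x, b)
    return float(np.sum(sign * (index >= 0) + worse) / x.shape[0])

def sgms_objective(b, ranks, x, h):
    """Smoothed objective Q_N^S(b, h_N) in (17)."""
    sign, worse, _, index = pairwise_terms(ranks, x, b)
    return float(np.sum(sign * norm_cdf(index / h) + worse) / x.shape[0])

def sgms_score(b, ranks, x, h):
    """t_N(b, h_N) in (18): derivative with respect to all but the first element of b."""
    sign, _, xd, index = pairwise_terms(ranks, x, b)
    weights = sign * norm_pdf(index / h)
    return np.einsum("np,npq->q", weights, xd[:, :, 1:]) / (x.shape[0] * h)

# Hypothetical data, for illustration only.
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(50, 4, 2))
u_demo = x_demo @ np.array([1.0, 1.0]) + rng.gumbel(size=(50, 4))
ranks_demo = (-u_demo).argsort(axis=1).argsort(axis=1) + 1

b0 = np.array([1.0, 0.7])
q_step = gms_objective(b0, ranks_demo, x_demo)
q_smooth_tiny_h = sgms_objective(b0, ranks_demo, x_demo, h=1e-9)
score = sgms_score(b0, ranks_demo, x_demo, h=0.5)
```

With a very small bandwidth the smoothed objective reproduces the step function, while a larger bandwidth gives a differentiable surface on which a gradient-based search can use `sgms_score`.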
Equation (21) implies that it is essential to derive the limiting distribution of $\rho(N) t_N(\beta, h_N)$ to establish the asymptotic distribution of the estimator $b_N^S$. The mean and variance of $\rho(N) t_N(\beta, h_N)$ depend on the probabilities of different rankings. Unlike the binary choice setting, where the choice probabilities depend on a single difference between explanatory vectors, the probabilities of rankings rely on multiple differences between explanatory vectors. How the difference in the explanatory vectors of any two alternatives affects the difference in their rankings is key to the derivation of the asymptotic distribution of $\rho(N) t_N(\beta, h_N)$.

To formalize the idea of deriving the asymptotic distribution of the estimator $b_N^S$, we introduce some notation. Let $v_j \equiv x_j'\beta$ be the systematic utility of choosing alternative $j$. Denote $v \equiv (v_1, \ldots, v_{J-1}, v_J)'$. $v_J$ is normalized to be zero. Define $v_{-J} \equiv (v_1, \ldots, v_{J-1})'$. There is a one-to-one correspondence between $X$ and $(v_{-J}, \tilde{X})$ for fixed β. Denote $\iota_J \equiv (1, \ldots, 1)' \in \mathbb{R}^J$. For any alternative $j \in J$, let $v_{-j}$ be the vector $v - \iota_J v_j$ excluding its $j$th element. For example, when $1 < j < J$,

$$v_{-j} = (v_1 - v_j, \ldots, v_{j-1} - v_j, v_{j+1} - v_j, \ldots, v_J - v_j)'.$$

In other words, $v_{-j}$ is the systematic utility vector normalized by the systematic utility of alternative $j$. For any pair of alternatives $j, k \in J$, define $v_{-j,k} = v_k - v_j$ and $\tilde{v}_{-j,k}$ as the vector that consists of all of the elements of $v_{-j}$ excluding $v_{-j,k}$. For example, when $1 < j < k < J$,

$$\tilde{v}_{-j,k} \equiv (v_1 - v_j, \ldots, v_{j-1} - v_j, v_{j+1} - v_j, \ldots, v_{k-1} - v_j, v_{k+1} - v_j, \ldots, v_J - v_j)'.$$

If $J > 2$, for any three different alternatives $j, k, l \in J$, define $\tilde{v}_{-j,kl}$ as the vector that consists of all of the elements of $v_{-j}$ excluding $v_{-j,k}$ and $v_{-j,l}$. For example, when $1 < j < k < l < J$,

$$\tilde{v}_{-j,kl} \equiv (v_1 - v_j, \ldots, v_{j-1} - v_j, v_{j+1} - v_j, \ldots, v_{k-1} - v_j, v_{k+1} - v_j, \ldots, v_{l-1} - v_j, v_{l+1} - v_j, \ldots, v_J - v_j)'.$$
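To make the normalized-utility notation concrete, the following small sketch computes $v_{-j}$, $v_{-j,k}$ and $\tilde{v}_{-j,k}$ for a numerical utility vector. The helper names and the example values are hypothetical and serve only to illustrate the definitions above.

```python
import numpy as np

def v_minus_j(v, j):
    """v_{-j}: the vector v - iota_J * v_j with its j-th element removed (j is 1-based)."""
    v = np.asarray(v, dtype=float)
    return np.delete(v - v[j - 1], j - 1)

def split_pair(v, j, k):
    """Return (v_{-j,k}, tilde v_{-j,k}) for two different alternatives j and k (1-based)."""
    v = np.asarray(v, dtype=float)
    diff = v - v[j - 1]
    v_jk = diff[k - 1]                        # v_k - v_j
    rest = np.delete(diff, [j - 1, k - 1])    # remaining elements of v_{-j}
    return v_jk, rest

# Example with J = 4 alternatives; v_J is normalized to zero.
v = [0.4, -0.3, 1.1, 0.0]
vm2 = v_minus_j(v, 2)            # (v1 - v2, v3 - v2, v4 - v2)
v23, vt23 = split_pair(v, 2, 3)  # v3 - v2 and (v1 - v2, v4 - v2)
```

Here `vm2` has $J - 1 = 3$ elements and `vt23` has $J - 2 = 2$, matching the dimensions implied by the definitions.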
Let $p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})$ denote the conditional density of $v_{-j,k}$ given $(\tilde{v}_{-j,k}, \tilde{X})$. Define the derivatives

$$p_{jk}^{(i)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X}) = \partial^i p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X}) / \partial (v_{-j,k})^i$$

and $p_{jk}^{(0)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X}) \equiv p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})$. Let $p_{jkl}(v_{-j,k}, v_{-j,l} \mid \tilde{v}_{-j,kl}, \tilde{X})$ denote the joint density of $(v_{-j,k}, v_{-j,l})$ conditional on $(\tilde{v}_{-j,kl}, \tilde{X})$.

Given any pair of alternatives $j, k \in J$, there is a one-to-one correspondence between $X$ and $(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$ for fixed $\beta \in B$. The conditional probability of alternative $j$ being ranked better than alternative $k$ depends on the explanatory matrix $X$, or equivalently, $(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$. Next, for any alternatives $j, k \in J$, define

$$P(r_j < r_k \mid v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \equiv F_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \tag{22}$$

and

$$P(r_j < r_k \mid v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) - P(r_k < r_j \mid v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \equiv \bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}). \tag{23}$$

For any integer $i > 0$, define the following derivatives:

$$\bar{F}_{jk}^{(i)}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \equiv \partial^i \bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) / \partial (v_{-j,k})^i$$

whenever the derivatives exist. Define the scalar constants $k_d$ and $k_\Omega$ by

$$k_d = \int_{-\infty}^{\infty} x^d K'(x)\,dx \quad \text{and} \quad k_\Omega = \int_{-\infty}^{\infty} [K'(x)]^2\,dx,$$

whenever these quantities exist. Define the $(q-1)$ vector $a$ and the $(q-1) \times (q-1)$ matrices $\Omega$ and $H$ as follows:

$$a = k_d \sum_{1 \le j < k \le J} \sum_{i=1}^{d} \frac{1}{i!(d-i)!} E\left[ \bar{F}_{jk}^{(i)}(0, \tilde{v}_{-j,k}, \tilde{X})\, p_{jk}^{(d-i)}(0 \mid \tilde{v}_{-j,k}, \tilde{X})\, \tilde{x}_{jk} \right],$$

$$\Omega = 2 k_\Omega \sum_{1 \le j < k \le J} E\left[ F_{jk}(0, \tilde{v}_{-j,k}, \tilde{X})\, p_{jk}(0 \mid \tilde{v}_{-j,k}, \tilde{X})\, \tilde{x}_{jk} \tilde{x}_{jk}' \right],$$

and

$$H = \sum_{1 \le j < k \le J} E\left[ \bar{F}_{jk}^{(1)}(0, \tilde{v}_{-j,k}, \tilde{X})\, p_{jk}(0 \mid \tilde{v}_{-j,k}, \tilde{X})\, \tilde{x}_{jk} \tilde{x}_{jk}' \right]$$

whenever these quantities exist.

In addition to Condition 1, we make the following additional restrictions on the smoothing function $K(\cdot)$.

Condition 2. The following statements are true.
(a) $K(x)$ is twice differentiable for $x \in \mathbb{R}$, $|K'(x)|$ and $|K''(x)|$ are uniformly bounded, and the integrals $\int_{-\infty}^{\infty} [K'(x)]^4\,dx$, $\int_{-\infty}^{\infty} x^2 |K''(x)|\,dx$, and $\int_{-\infty}^{\infty} [K''(x)]^2\,dx$ are finite.
(b) For some integer $d \ge 2$, $\int_{-\infty}^{\infty} |x^d K'(x)|\,dx < \infty$, $k_d \in (0, \infty)$, and $k_\Omega \in (0, \infty)$. For any integer $i$ ($1 \le i < d$), $\int_{-\infty}^{\infty} |x^i K'(x)|\,dx < \infty$ and $\int_{-\infty}^{\infty} x^i K'(x)\,dx = 0$.
(c) For any integer $i$ ($0 \le i \le d$), any $\eta > 0$, and any sequence $\{h_N\}$ converging to 0,

$$\lim_{N \to \infty} h_N^{i-d} \int_{|h_N x| > \eta} |x^i K'(x)|\,dx = 0 \quad \text{and} \quad \lim_{N \to \infty} h_N^{-1} \int_{|h_N x| > \eta} |K''(x)|\,dx = 0.$$

We need additional assumptions to derive the asymptotic distribution of the estimator $b_N^S$.

Assumption 6. For any pair of alternatives $j, k \in J$, and for $v_{-j,k}$ in a neighborhood of zero, $\bar{F}_{jk}^{(i)}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$ exists, is a continuous function of $v_{-j,k}$, and is bounded by $C$ for almost every $(\tilde{v}_{-j,k}, \tilde{X})$, where $C < \infty$ and $i$ is an integer ($1 \le i \le d$).

Assumption 6 states the differentiability of the conditional distribution function of the error term ε.

Assumption 7. The following statements are true.
(a) For any pair of alternatives $j, k \in J$, $p_{jk}^{(i)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})$ exists and is a continuous function of $v_{-j,k}$ satisfying $|p_{jk}^{(i)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})| < C$ for $v_{-j,k}$ in a neighborhood of zero, almost every $(\tilde{v}_{-j,k}, \tilde{X})$, some $C < \infty$, and any integer $i$ ($1 \le i \le d-1$). In addition, $|p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})| < C$ for all $v_{-j,k}$ and almost every $(\tilde{v}_{-j,k}, \tilde{X})$.
(b) For any three different alternatives $j, k, l \in J$, $p_{jkl}(v_{-j,k}, v_{-j,l} \mid \tilde{v}_{-j,kl}, \tilde{X}) < C$ for all $(v_{-j,k}, v_{-j,l})$ and almost every $(\tilde{v}_{-j,kl}, \tilde{X})$.
(c) The components of the matrices $\tilde{X}$, $\mathrm{vec}(\tilde{X})\mathrm{vec}(\tilde{X})'$, and $\mathrm{vec}(\tilde{X})\mathrm{vec}(\tilde{X})'\mathrm{vec}(\tilde{X})\mathrm{vec}(\tilde{X})'$ have finite first absolute moments.

Assumption 7 imposes regularity conditions on the explanatory variables. Assumptions 6-7 guarantee the existence of the parameters in the limiting distribution of the estimator.

Assumption 8. $(\log N)/(N h_N^4) \to 0$ as $N \to \infty$.

Assumptions 6-8 together with Condition 2 are analogous to assumptions in kernel density estimation. A higher convergence rate of the SGMS estimator can be achieved using a higher-order kernel ($K'(\cdot)$) when the required derivatives of $\bar{F}$ and $p$ exist.

Assumption 9. The matrix $H$ is negative definite.
The matrix $H$ is analogous to the Hessian information matrix in the quasi-MLE. The main results concerning the asymptotic distribution of the SGMS estimator are given by the following theorem.

Theorem 3. Let Assumptions 1-9 and Conditions 1-2 hold for some integer $d \ge 2$ and let $\{b_N^S\}$ be a sequence of solutions to problem (16).
(a) If $N h_N^{2d+1} \to \infty$ as $N \to \infty$, then $h_N^{-d}(\tilde{b}_N^S - \tilde{\beta}) \xrightarrow{p} -H^{-1}a$.
(b) If $N h_N^{2d+1}$ has a finite limit λ as $N \to \infty$, then

$$(N h_N)^{1/2}(\tilde{b}_N^S - \tilde{\beta}) \xrightarrow{d} MVN\left(-\lambda^{1/2} H^{-1} a,\; H^{-1} \Omega H^{-1}\right).$$

(c) Let $h_N = (\lambda/N)^{1/(2d+1)}$ with $0 < \lambda < \infty$; $W$ be any nonstochastic, positive semidefinite matrix such that $a'H^{-1}WH^{-1}a \neq 0$; $E_A$ denote the expectation with respect to the asymptotic distribution of $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$; and $MSE \equiv E_A[(\tilde{b}_N^S - \tilde{\beta})'W(\tilde{b}_N^S - \tilde{\beta})]$. The $MSE$ is minimized by setting

$$\lambda = \lambda^* \equiv \mathrm{trace}\left(\Omega H^{-1} W H^{-1}\right) / \left(2d\, a'H^{-1}WH^{-1}a\right),$$

in which case $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$ converges in distribution to $MVN\left(-(\lambda^*)^{d/(2d+1)} H^{-1} a,\; (\lambda^*)^{-1/(2d+1)} H^{-1} \Omega H^{-1}\right)$.

Theorem 3 implies that the fastest rate of convergence of the estimator is $N^{-d/(2d+1)}$, because $h_N^{-d}/N^{d/(2d+1)} = (N h_N^{2d+1})^{-d/(2d+1)} \to 0$ if $N h_N^{2d+1} \to \infty$ as $N \to \infty$, and $(N h_N)^{1/2}/N^{d/(2d+1)} = (N h_N^{2d+1})^{1/(4d+2)} \to 0$ if $N h_N^{2d+1} \to 0$ as $N \to \infty$. Choosing the bandwidth $h_N = (\lambda/N)^{1/(2d+1)}$, where $\lambda \in (0, \infty)$, is sufficient to achieve the fastest rate of convergence. Theorem 3(c) shows that $\lambda^*$ minimizes the $MSE$ of the asymptotic distribution of $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$.

To make the result of Theorem 3 useful in applications, it is necessary to be able to estimate the parameters in the limiting distribution, $a$, $\Omega$, and $H$, consistently from observations of $(r, X)$. The next theorem shows how this can be done.

Theorem 4. Let Assumptions 1-9 and Conditions 1-2 hold for some integer $d \ge 2$ and let the vector $b_N^S$ be a consistent estimator based on $h_N \propto N^{-1/(2d+1)}$. Let $h_N^* \propto N^{-\delta/(2d+1)}$, where $\delta \in (0, 1)$. Then
(a) $\hat{a}_N \equiv (h_N^*)^{-d}\, t_N(b_N^S, h_N^*)$ converges in probability to $a$.
(b) For $b \in B$ and $n = 1, \ldots, N$, define

$$t_{Nn}(b, h_N) = \sum_{1 \le j < k \le J} [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})]\, K'\!\left(x_{njk}'b/h_N\right) \tilde{x}_{njk}\, h_N^{-1};$$

the matrix

$$\hat{\Omega}_N \equiv (h_N/N) \sum_{n=1}^{N} t_{Nn}(b_N^S, h_N)\, t_{Nn}(b_N^S, h_N)'$$

converges in probability to $\Omega$.
(c) $H_N(b_N^S, h_N)$ converges in probability to $H$.

By Theorem 3(c), the asymptotic bias of $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$ is $-\lambda^{d/(2d+1)} H^{-1} a$ when the bandwidth is $h_N = (\lambda/N)^{1/(2d+1)}$. It follows from Theorem 4 that the bias term $-\lambda^{d/(2d+1)} H^{-1} a$ can be estimated consistently by $-\lambda^{d/(2d+1)} H_N(b_N^S, h_N)^{-1} \hat{a}_N$. Therefore, define

$$\tilde{b}_N^u = \tilde{b}_N^S + (\lambda/N)^{d/(2d+1)} H_N(b_N^S, h_N)^{-1} \hat{a}_N \tag{24}$$

as the bias-corrected SGMS estimator.

3.2 Bandwidth Selection

Theorem 3(c) provides a way to choose the bandwidth for the SGMS estimator. To achieve the minimum $MSE$, the optimal $\lambda^*$ can be consistently estimated using the conclusions of Theorem 4. Therefore, one possible way of choosing the bandwidth is to set $h_N = (\hat{\lambda}/N)^{1/(2d+1)}$, where $\hat{\lambda}$ is a consistent estimator for $\lambda^*$. In Monte Carlo experiments and empirical applications, the choice of bandwidth can be implemented by taking the following steps.

Step 1. Given $d$, choose $h_N \propto N^{-1/(2d+1)}$ and $h_N^* \propto N^{-\delta/(2d+1)}$ for $\delta \in (0, 1)$.

Step 2. Compute the SGMS estimator $b_N^S$ using $h_N$. Use $b_N^S$ and $h_N^*$ to compute $\hat{a}_N$. Use $b_N^S$ and $h_N$ to compute $\hat{\Omega}_N$ and $H_N(b_N^S, h_N)$.

Step 3. Estimate $\lambda^*$ by

$$\hat{\lambda}_N = \mathrm{trace}\left\{ \hat{\Omega}_N H_N(b_N^S, h_N)^{-1} H_N(b_N^S, h_N)^{-1} \right\} \cdot \left[ 2d\, \hat{a}_N' H_N(b_N^S, h_N)^{-1} H_N(b_N^S, h_N)^{-1} \hat{a}_N \right]^{-1}.$$

Step 4. Calculate the estimated bandwidth $h_N^e = (\hat{\lambda}_N/N)^{1/(2d+1)}$.

After deriving the bandwidth $h_N^e$, an SGMS estimator can be calculated based on it. This method of choosing the bandwidth is analogous to the plug-in method of kernel density estimation.

4 Monte Carlo Experiments

In this section, we provide small-scale Monte Carlo simulation results to demonstrate some finite-sample properties of the proposed estimators $b_N$ and $b_N^S$. In each experiment, we compare the estimation results of $b_N$ and $b_N^S$ with the ROL estimator. We consider three data generating processes (DGPs).
In the first process, the ROL model is correctly specified, i.e., the error terms are i.i.d. type-I extreme value distributed and are independent of the explanatory variables. In the second process, the error terms are i.i.d. draws from a mixed normal distribution and are independent of the explanatory variables. In the last process, we allow for heteroskedasticity: the conditional distribution of the error terms varies across agents. For each agent $n$, data are generated from the random utility model

$$u_{nj} = x_{nj,1}\beta_1 + x_{nj,2}\beta_2 + \varepsilon_{nj} \quad \text{for } j \in J,$$

where $\beta_1 = \beta_2 = 1$.

A gradient-based algorithm is used to search for the ROL estimator owing to the concavity of the log-likelihood function. The challenge in finding $b_N$ is that its objective function is a step function, and hence every point is a local extremum, as shown in Figure 1. Smoothing over the step function yields the objective function of $b_N^S$. It is differentiable, but the difficulty in finding the maximum persists. In all Monte Carlo experiments, we use the differential evolution random search method (Storn and Price, 1995) to search for $b_N$ and a gradient-based algorithm with different initial search points to search for $b_N^S$.

We consider two sample sizes ($N = 100$ and $500$) and three ranking depths ($M = 1$, $2$, and $4$) for each DGP. The number of alternatives is fixed at $J = 5$. There are 1000 replications for each experiment. The bandwidth is chosen by $h_N = N^{-1/5}$. The smoothing function $K(\cdot)$ is the standard normal distribution function. The three DGPs are as follows.

• DGP 1: $x_{nj,1}$ and $x_{nj,2}$ are i.i.d. N(0, 2). $\varepsilon_{nj}$ are i.i.d. EV(0, 1, 0).

• DGP 2: $x_{nj,1}$ and $x_{nj,2}$ are i.i.d. N(0, 2). $\varepsilon_{nj}$ are i.i.d. with density function given by

$$f(\varepsilon) = \frac{0.369}{\sqrt{2\pi \times 0.184}} \exp\left[-\frac{(\varepsilon + 1)^2}{2 \times 0.184}\right] + \frac{1 - 0.369}{\sqrt{2\pi \times 0.193}} \exp\left[-\frac{(\varepsilon - 1.5)^2}{2 \times 0.193}\right]. \tag{25}$$

• DGP 3: $x_{nj,1}$ are i.i.d. N(0, 2). $x_{nj,2} = w_{nj}/c_n$, where $w_{nj}$ are i.i.d. N(0, 2) and $c_n$ are i.i.d. unif(1/5, 5). $\varepsilon_{nj} = 0.004(1 + 2c_n^2 + c_n^4)e_{nj}$, where $e_{nj}$ are i.i.d. EV(0, 1, 0).
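The three DGPs can be simulated along the following lines. This is a sketch under several labeled assumptions rather than the authors' code: N(0, 2) is read as a normal distribution with variance 2, EV(0, 1, 0) is read as the standard Gumbel (type-I extreme value) distribution, and a ranking of depth $M$ is encoded by tying all alternatives ranked below $M$ at rank $M + 1$. The function name `simulate_dgp` is hypothetical.

```python
import numpy as np

def simulate_dgp(dgp, n_agents, n_alts=5, depth=4, seed=0):
    """Draw (x, ranks) from DGP 1, 2 or 3 with beta1 = beta2 = 1.

    Assumptions: N(0, 2) has variance 2; EV(0, 1, 0) is standard Gumbel;
    alternatives ranked below `depth` are tied at rank depth + 1.
    """
    rng = np.random.default_rng(seed)
    shape = (n_agents, n_alts)
    sd = np.sqrt(2.0)
    x1 = rng.normal(0.0, sd, size=shape)
    if dgp == 1:
        x2 = rng.normal(0.0, sd, size=shape)
        eps = rng.gumbel(0.0, 1.0, size=shape)
    elif dgp == 2:
        x2 = rng.normal(0.0, sd, size=shape)
        # Two-component normal mixture implied by the density in (25)
        first = rng.random(size=shape) < 0.369
        eps = np.where(first,
                       rng.normal(-1.0, np.sqrt(0.184), size=shape),
                       rng.normal(1.5, np.sqrt(0.193), size=shape))
    elif dgp == 3:
        # Agent-specific scale c_n drives both x2 and the error variance
        c = rng.uniform(0.2, 5.0, size=(n_agents, 1))
        x2 = rng.normal(0.0, sd, size=shape) / c
        eps = 0.004 * (1.0 + 2.0 * c ** 2 + c ** 4) * rng.gumbel(0.0, 1.0, size=shape)
    else:
        raise ValueError("dgp must be 1, 2 or 3")
    u = x1 + x2 + eps                                    # beta1 = beta2 = 1
    full = (-u).argsort(axis=1).argsort(axis=1) + 1      # rank 1 = most preferred
    ranks = np.minimum(full, depth + 1)                  # observe only the top `depth`
    x = np.stack([x1, x2], axis=2)
    return x, ranks

x_sim, ranks_sim = simulate_dgp(dgp=3, n_agents=100, depth=2, seed=42)
x_full, ranks_full = simulate_dgp(dgp=1, n_agents=100, depth=4, seed=42)
```

With $J = 5$ and depth 4 the ranking is complete, since the position of the last alternative is implied by the other four.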
The results of the experiments using the three DGPs are summarized in Tables 1-3, respectively. The normalization of the ROL model is imposed on the variance of the error terms. Therefore, we estimate both parameters $\beta_1$ and $\beta_2$ for the ROL model. For $b_N$ and $b_N^S$, the normalization is imposed on the first parameter: $|\beta_1| = 1$. Therefore, we only estimate $\beta_2$.[11] For each table, columns 1-2 show the number of agents and rankings; columns 3-4 show the bias of $\hat{\beta}_1$ and $\hat{\beta}_2$ obtained from the ROL model; columns 5-6 show the bias and RMSE of the ratio $\hat{\beta}_2/\hat{\beta}_1$ of the ROL estimators; columns 7-8 show the bias and RMSE of $b_N$; columns 9-10 show the bias and RMSE of $b_N^S$; and the last column shows the time.

[11] The estimate of the sign converges at a faster rate, so there is no need to analyze its finite-sample property.

Table 1 illustrates the efficiency loss of using semiparametric methods when the ROL model is correctly specified. The RMSE of the semiparametric estimators is larger than that of the parametric estimator, as expected. However, $b_N^S$ always has a smaller RMSE than $b_N$.

Table 2 illustrates the performance of the three estimators when the error terms are i.i.d. draws from a mixed normal distribution and homoskedasticity still holds. Because the ROL model is misspecified, the ROL estimator may be inconsistent. Columns 3-4 show the bias of the ROL estimators. This bias does not vanish when the sample size increases, which suggests that the ROL estimators are inconsistent. However, if our interest is in comparing the relative importance of the explanatory variables $x_{nj,1}$ and $x_{nj,2}$ in determining the utility of choosing alternative $j$, then it seems that the ROL performs well, achieving very small bias and RMSE as shown in columns 5-6.

Table 3 illustrates the performance of the three estimators in the presence of heteroskedasticity. Because the ROL model is misspecified, the ROL estimators may be inconsistent. Columns 3-4 show the bias of the ROL estimators.
This bias does not vanish when the sample size increases, which suggests inconsistency of the ROL estimators. Now the ROL estimators cannot predict the relative importance of the factors that affect an agent's utility either, as shown in columns 5-6: the bias of the ratio of the parameters does not vanish when the sample size increases. However, the semiparametric estimators still perform well under heteroskedasticity. The RMSE of $b_N^S$ decreases faster than that of $b_N$ as the sample size increases.

In empirical work, heteroskedasticity might be relevant (e.g., the variance of the error term of students with high family income may be different from the variance of the error term of students with low family income in a college choice problem). As shown in the Monte Carlo experiments, the MNL estimators might be inconsistent, and therefore analysis based on them could be misleading in the presence of heteroskedasticity. The results of the Monte Carlo experiments suggest considering $b_N$ and $b_N^S$, which allow for a flexible form of heteroskedasticity, as alternatives in estimating the random utility model with rank-ordered data.

Table 1: Monte Carlo results of DGP 1

| N | M | ROL Bias (β̂1) | ROL Bias (β̂2) | ROL Bias (β̂2/β̂1) | ROL RMSE (β̂2/β̂1) | b_N Bias (β̂2) | b_N RMSE (β̂2) | b_N^S Bias (β̂2) | b_N^S RMSE (β̂2) | time |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | 0.021 | 0.018 | 0.008 | 0.152 | 0.053 | 0.297 | 0.074 | 0.258 | 6.35s |
| 100 | 2 | 0.012 | 0.014 | 0.007 | 0.107 | 0.033 | 0.222 | 0.055 | 0.170 | 11.15s |
| 100 | 4 | 0.009 | 0.012 | 0.005 | 0.078 | 0.023 | 0.178 | 0.047 | 0.127 | 24.84s |
| 500 | 1 | 0.006 | 0.003 | -0.001 | 0.064 | 0.010 | 0.155 | 0.026 | 0.117 | 25.4s |
| 500 | 2 | 0.001 | 0.001 | 0.001 | 0.047 | 0.010 | 0.125 | 0.025 | 0.085 | 61.9s |
| 500 | 4 | 0 | 0.002 | 0.002 | 0.036 | 0.007 | 0.102 | 0.025 | 0.066 | 142.7s |

Table 2: Monte Carlo results of DGP 2

| N | M | ROL Bias (β̂1) | ROL Bias (β̂2) | ROL Bias (β̂2/β̂1) | ROL RMSE (β̂2/β̂1) | b_N Bias (β̂2) | b_N RMSE (β̂2) | b_N^S Bias (β̂2) | b_N^S RMSE (β̂2) | time |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | 0.200 | 0.201 | 0.010 | 0.138 | 0.034 | 0.223 | 0.057 | 0.180 | 8.5s |
| 100 | 2 | 0.014 | 0.016 | 0.008 | 0.106 | 0.022 | 0.174 | 0.051 | 0.133 | 11s |
| 100 | 4 | -0.135 | -0.133 | 0.007 | 0.092 | 0.017 | 0.152 | 0.050 | 0.115 | 25.5s |
| 500 | 1 | 0.176 | 0.178 | 0.003 | 0.061 | 0.008 | 0.111 | 0.026 | 0.078 | 32s |
| 500 | 2 | 0.004 | 0.006 | 0.003 | 0.048 | 0 | 0.093 | 0.022 | 0.060 | 83.9s |
| 500 | 4 | -0.141 | -0.141 | 0.001 | 0.042 | 0.004 | 0.082 | 0.023 | 0.053 | 108.3s |

Table 3: Monte Carlo results of DGP 3

| N | M | ROL Bias (β̂1) | ROL Bias (β̂2) | ROL Bias (β̂2/β̂1) | ROL RMSE (β̂2/β̂1) | b_N Bias (β̂2) | b_N RMSE (β̂2) | b_N^S Bias (β̂2) | b_N^S RMSE (β̂2) | time |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | 0.490 | 0.710 | 0.159 | 0.227 | 0.013 | 0.160 | 0.046 | 0.134 | 8.19s |
| 100 | 2 | 0.512 | 0.718 | 0.143 | 0.175 | 0.007 | 0.089 | 0.042 | 0.085 | 14.5s |
| 100 | 4 | 0.539 | 0.723 | 0.124 | 0.143 | 0.002 | 0.050 | 0.040 | 0.064 | 25.2s |
| 500 | 1 | 0.447 | 0.664 | 0.152 | 0.166 | 0.003 | 0.057 | 0.021 | 0.047 | 41.1s |
| 500 | 2 | 0.486 | 0.685 | 0.135 | 0.142 | -0.001 | 0.032 | 0.021 | 0.034 | 82.6s |
| 500 | 4 | 0.512 | 0.701 | 0.120 | 0.124 | 0 | 0.019 | 0.021 | 0.028 | 137.6s |

Size of t-test*: test of hypothesis β2 = 1 at the nominal 0.05 level.
Size of t-test**: test of hypothesis β2 = β1 at the nominal 0.05 level.

5 Illustrative application: preferences for bank services

This section provides an illustrative empirical application of the generalized maximum score (GMS) estimator and the smoothed generalized maximum score (SGMS) estimator that we have proposed earlier. We apply those methods to estimate a semiparametric rank-ordered choice model using the bank account choice data set that is distributed on the website of the software package Latent GOLD 5.0.[12]
[12] The web address is: http://statisticalinnovations.com/technicalsupport/choice_datasets.html#bank

As in our Monte Carlo experiment, we compare the results with the maximum likelihood estimates of a popular parametric model, rank-ordered logit (ROL).

The data set includes authentic empirical observations that originate from the stated preference study of Kamakura et al. (1994). The sample includes N = 256 customers of a large bank in the United States. Every customer has faced a choice set of J = 9 hypothetical alternatives, each of which describes a checking account product with different characteristics. There are four alternative-specific characteristics in total, as follows:

• minbal: minimum balance (in $'00s) required to exempt the customer from a monthly service fee (0, 5, or 10, i.e. $0, $500 or $1000)

• costpch: amount charged per check issue in $s (0, 0.15, or 0.35)

• fee: monthly service fee in $s (0, 3, or 6)

• atm: availability and cost of automatic teller machines (N/A; available at $0.75 per transaction; available for free).

In our model specification, we include two alternative-specific constants, atm($0.75) and atm(N/A), keeping "available for free" as the base category. In addition to the alternative-specific characteristics, we incorporate one customer-specific characteristic into our illustrative specification.

• bal: average monthly balance (in $'000s) kept in the customer's account during the past 6 months.[13]

[13] The average balance varies across 256 customers from $7 to $9999, with the mean of $1158.

Each customer has ranked all 9 alternatives from most to least preferred. In line with our notation introduced in Section 2, the depth of available rankings is M = 8. We apply each estimation method at four different depth levels. The starting level M = 1 provides the same amount of information as multinomial choice data. The next levels M = 3 and M = 6 mimic partial ranking data, where one observes up to the third and sixth preferences of each customer, respectively. The final level M = 8 lets us analyze the data as fully rank-ordered choices, as they actually are. In a rank-ordered choice analysis, one may consider N × M as the effective sample size that takes into account the additional information from the use of deeper preferences. The data set thus allows us to investigate in an empirical setting how the performance of the semiparametric estimators varies across relatively small (256 × 1 = 256) to large (256 × 8 = 2048) samples.

The random utility model of interest is specified as:

$$u_{nj} = \alpha \times \mathit{minbal}_{nj} \times \mathit{bal}_n + x_{nj}'\beta + \varepsilon_{nj}, \qquad n = 1, 2, \cdots, 256; \quad j = 1, 2, \cdots, 9,$$

where n and j index customers and alternatives, respectively; $x_{nj} = [\mathit{minbal}_{nj}\ \ \mathit{costpch}_{nj}\ \ \mathit{fee}_{nj}\ \ \mathit{atm(N/A)}_{nj}\ \ \mathit{atm(\$0.75)}_{nj}]'$; α and β are preference parameters; and $\varepsilon_{nj}$ is the stochastic component of utility. We center the variable $\mathit{bal}_n$ around its sample mean so that the estimated coefficient on $\mathit{minbal}_{nj}$ can be interpreted as the taste coefficient for someone with the sample mean average balance ($1158).

One may expect all coefficients in β to be negative, as the attributes in $x_{nj}$ capture an increase in cost or inconvenience. The coefficient α, on the other hand, can be expected to be positive, since someone who usually keeps a larger balance is less likely to be sensitive to an increase in the minimum balance requirement. The ROL estimates at all depth levels conform to these expectations. In our GMS and SGMS applications, we normalize α to 1 and estimate β.[14] As in the Monte Carlo experiment, the differential evolution algorithm is used to compute the GMS estimates, and a gradient-based algorithm with several initial starting points is used to compute the SGMS estimates.[15]

[14] The interaction term $\mathit{minbal}_{nj} \times \mathit{bal}_n$ takes 477 distinct values across alternative-customer pairs.

[15] The differential evolution algorithm configuration uses an amplification factor of 0.4, a cross-over probability of 0.8, and a population size of 200. The algorithm runs up to 2000 generations. At each depth level, the reported set of GMS estimates corresponds to the maximum that has been found from restarting the algorithm 200 times. At each depth level, the reported set of SGMS estimates corresponds to the maximum that has been found by experimenting with 201 starting points that include the ROL estimates and 200 sets of GMS estimates.

Table 4 reports the estimates of the coefficients α and β. All estimates have the expected signs.
The results suggest that there are potentially large gains from using deeper ranking information, in terms of statistical precision. All sets of point estimates change visibly between the M = 1 and M = 3 cases, or when the effective sample size increases from 256 to 768, but much less between the M = 6 and M = 8 cases, or when the effective sample size increases from 1536 to 2048 observations. The faster convergence rate of SGMS over GMS has also played out vividly. The SGMS estimates at M = 3 differ much less from the corresponding estimates at M = 8, in comparison with the GMS estimates at those two depth levels.

Table 5 transforms the results in Table 4 into equivalent prices, to facilitate the discussion of how substantive economic results vary across different sets of estimates. These equivalent prices have been computed by using the coefficient on $\mathit{fee}_{nj}$ to divide the other coefficients, and measure the monthly fee equivalents of unit increases in the relevant attributes. For instance, the SGMS equivalent price of costpch at M = 8 is 16.41, which indicates that raising the amount charged per check issue by $1 is equivalent to a $16.41 increase in the monthly fee. Such coefficient ratios feature as the primary parameters of interest in several empirical studies: see for example Layton (2000), Calfee et al. (2001), and Small et al. (2005).
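The equivalent-price calculation described above amounts to dividing each coefficient by the fee coefficient. The short sketch below applies it to the SGMS estimates at M = 8 from Table 4; the function and dictionary key names are hypothetical, and because the inputs are the rounded published coefficients, the results can differ from Table 5 in the second decimal place.

```python
def equivalent_prices(coefs, fee_key="fee"):
    """Divide every coefficient by the fee coefficient (monthly-fee equivalents)."""
    fee = coefs[fee_key]
    return {name: value / fee for name, value in coefs.items() if name != fee_key}

# SGMS estimates at M = 8, as reported (rounded) in Table 4
sgms_m8 = {
    "minbal_x_bal": 1.00,
    "minbal": -6.13,
    "costpch": -58.89,
    "fee": -3.59,
    "atm_0.75": -14.04,
    "atm_na": -22.96,
}

prices = equivalent_prices(sgms_m8)
for name, price in prices.items():
    print(f"{name}: {price:.2f}")
```

Cost-increasing attributes have negative coefficients, as does the fee, so their equivalent prices come out positive, while the positive coefficient on the interaction term yields a negative equivalent price.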
The equivalent prices illustrate the advantage of the semiparametric methods over the popular parametric method. In an empirical application, one may expect the ROL estimates to be more sensitive across ranks than the semiparametric estimates. The former rely on a stringent distributional assumption, while the latter are consistent for a wide class of stochastic distributions. As the earlier Monte Carlo results suggest, the misspecification bias of the ROL coefficient ratios can vary with the depth of rankings used in the estimation. One may also expect the GMS estimates and SGMS estimates to be rather similar in a large sample, since their primary practical difference is in the convergence rates, even though we need a few more technical assumptions to derive the asymptotic properties of the SGMS estimator. Indeed, Table 5 shows that the ROL estimates imply different equivalent prices at M = 6 and M = 8, despite the relatively large sample sizes. By contrast, the GMS estimates imply identical equivalent prices at M = 6 and M = 8, and show only a minimal difference from the SGMS estimates at those two depth levels. The faster convergence of the SGMS estimator becomes even more evident when the estimates are expressed as equivalent prices: apart from atm(N/A), the SGMS equivalent prices vary by less than $1 between the M = 3 and M = 6 cases, whereas the GMS equivalent prices vary by as much as $4.13.

Outside Monte Carlo settings, it is admittedly difficult to confirm whether the semiparametric estimates are indeed closer to the true parameters than the ROL estimates. In this particular application, however, we are able to provide further circumstantial evidence that this is likely to be the case. Specifically, note that the equivalent price of atm($0.75) measures the monthly fee equivalent of raising the ATM transaction cost from $0 to $0.75 per transaction. The data set includes a variable NATM that measures the number of ATM transactions that each customer makes per month, even though we have not incorporated it into our model specification. Since the sample mean of this variable is 5.25 transactions, $0.75 × 5.25 = $3.94 would be a reasonable guess about the equivalent price of atm($0.75). Both GMS and SGMS estimates are close to this amount at M = 6 and M = 8, whereas the ROL estimates differ by more than $2 at M = 6 and $1 at M = 8.

We also note that in the fully ranked case of M = 8, the GMS estimator and the SGMS estimator admit several random coefficient models that allow their equivalent prices to be interpreted as the population average equivalent prices.

6 Conclusions

The parametric methods for rank-ordered choices are as well established as the parametric methods for multinomial choices. The semiparametric methods for rank-ordered choices are much less developed than the semiparametric methods for multinomial choices. To the best of our knowledge, the study of Hausman and Ruud (1987) is both the earliest and the only semiparametric analysis of rank-ordered choices.

We propose two types of semiparametric estimators for rank-ordered choices. We call them the generalized maximum score (GMS) estimator and the smoothed generalized maximum score (SGMS) estimator, respectively. The term semiparametric refers to the fact that the preference parameters of interest are finite dimensional but the error term in the random utility function has an unspecified distribution. Both types of estimators admit very flexible forms of interpersonal heteroskedasticity, and a wide class of stochastic distributions including those that motivate most of the popular parametric models. We show the strong consistency of the proposed GMS estimator. The asymptotic distribution of the GMS estimator is nonstandard. To facilitate inference, we propose the SGMS estimator.
We show that the SGMS estimator is strongly consistent and asymptotically normal, making inference straightforward. The preliminary results from Monte Carlo experiments and an illustrative empirical application show that the estimators have promising finite sample properties, and can be very useful additions to the empirical practitioner's toolbox.

Rank-ordered choices allow one to gather considerably more preference information from a given sample of agents than multinomial choices. The practical gains from this extra information can be quite substantial. Train and Winston (2007), for example, exploit data from a market survey that asks for the consumer's most recent vehicle purchase and preference ordering of other vehicles that the consumer has considered for buying. They find that the use of rank-ordered choices is essential for estimating flexible parametric models. Scarpa et al. (2011) administer a non-market valuation survey eliciting mountain visitors' preferences for land management plans. In the context where obtaining a large on-site sample of the target population is difficult, they find that the use of rank-ordered choices reduces their sample size requirements up to a third of comparable multinomial choice data.

At the same time, however, the parametric analysis of rank-ordered choices can be expected to be more sensitive to stochastic misspecification than that of multinomial choices, since one needs to use maintained stochastic assumptions more heavily to derive likelihood functions. The simulation study of Yan and Yoo (2014), for example, finds that an empirical regularity taken as the classic symptom of poor-quality rank-ordered responses (Chapman and Staelin, 1982; Ben-Akiva et al., 1992) may simply result from estimating the rank-ordered logit model under misspecification. Since the sampling cost, in terms of both time and money, is a major consideration for any project aiming at estimating individual preferences using choice data, it is important to develop robust estimation methods that utilize ranking information, as we propose.

Table 4: Utility coefficients: the rank-ordered logit (ROL), generalized maximum score (GMS) and smoothed GMS (SGMS) estimators

ROL                 M = 1      M = 3      M = 6      M = 8
minbal × bal         0.04       0.07       0.04       0.04
minbal              -0.33      -0.28      -0.22      -0.22
costpch            -15.74      -4.61      -2.22      -2.49
fee                 -0.53      -0.20      -0.16      -0.18
atm($0.75)          -0.75      -0.60      -0.98      -0.88
atm(N/A)            -0.73      -1.22      -1.31      -1.17
log-likelihood     -303.3      -1051      -2170      -2566

GMS                 M = 1      M = 3      M = 6      M = 8
minbal × bal         1.00       1.00       1.00       1.00
minbal              -7.11     -16.76      -5.69      -5.69
costpch           -345.53    -295.46     -56.45     -56.45
fee                -11.93     -21.24      -3.37      -3.37
atm($0.75)         -72.29    -169.23     -12.93     -12.93
atm(N/A)           -69.26    -207.16     -21.39     -21.39
score                7.25      18.22       26.7      28.82

SGMS                M = 1      M = 3      M = 6      M = 8
minbal × bal         1.00       1.00       1.00       1.00
minbal              -7.13      -5.25      -6.17      -6.13
costpch           -350.72     -99.09     -58.73     -58.89
fee                -11.97      -5.77      -3.59      -3.59
atm($0.75)         -71.32     -18.61     -14.30     -14.04
atm(N/A)           -68.50     -27.92     -23.22     -22.96
score                7.25      18.07      26.65      28.77

Table 5: Equivalent prices: the rank-ordered logit (ROL), generalized maximum score (GMS) and smoothed GMS (SGMS) estimators

ROL                 M = 1      M = 3      M = 6      M = 8
minbal × bal        -0.08      -0.34      -0.27      -0.24
minbal               0.62       1.45       1.40       1.26
costpch             29.48      23.59      14.02      14.23
atm($0.75)           1.40       3.05       6.18       5.00
atm(N/A)             1.38       6.24       8.27       6.70
log-likelihood     -303.3      -1051      -2170      -2566

GMS                 M = 1      M = 3      M = 6      M = 8
minbal × bal        -0.08      -0.05      -0.30      -0.30
minbal               0.60       0.79       1.69       1.69
costpch             28.97      13.91      16.77      16.77
atm($0.75)           6.06       7.97       3.84       3.84
atm(N/A)             5.81       9.75       6.36       6.35
score                7.25      18.22       26.7      28.82

SGMS                M = 1      M = 3      M = 6      M = 8
minbal × bal        -0.08      -0.17      -0.28      -0.28
minbal               0.60       0.91       1.72       1.71
costpch             29.30      17.17      16.36      16.41
atm($0.75)           5.96       3.22       3.98       3.91
atm(N/A)             5.72       4.84       6.47       6.40
score                7.25      18.07      26.65      28.77

A Proof of Theorem 1

In Appendix A, we provide the proofs of Theorem 1 and of Lemmas 1-3. Lemma 1 establishes the identification condition.
Lemma 2 verifies the continuity property of the probability limit of the GMS estimator's objective function $Q_N(b)$. Lemma 3 shows the uniform convergence of $Q_N(b)$ to its probability limit. Throughout, for $b \in \mathbb{R}^q$, let

\[
Q^*(b) \equiv E\Big\{ \sum_{1 \le j < k \le J} \big[ 1(r_j < r_k) \cdot 1(x_j'b \ge x_k'b) + 1(r_k < r_j) \cdot 1(x_k'b > x_j'b) \big] \Big\} \tag{A1}
\]

denote the probability limit of $Q_N(b)$.

Lemma 1. Under Assumptions 2-4, the true preference parameter vector $\beta$ uniquely maximizes $Q^*(b)$ for $b \in B$.

Proof. Applying the law of iterated expectations to the right-hand side of (A1) yields

\[
\begin{aligned}
Q^*(b) &= E \sum_{1 \le j < k \le J} \big[ P(r_j < r_k | X) \cdot 1(x_{jk}'b \ge 0) + P(r_k < r_j | X) \cdot 1(x_{kj}'b > 0) \big] \\
&= E \sum_{1 \le j < k \le J} \big\{ [P(r_j < r_k | X) - P(r_k < r_j | X)] \cdot 1(x_{jk}'b \ge 0) + P(r_k < r_j | X) \big\}.
\end{aligned}
\]

It follows from Assumption 3 that $\beta$ globally maximizes $Q^*(b)$ because the sign of $P(r_j < r_k | X) - P(r_k < r_j | X)$ is the same as the sign of $x_{jk}'\beta$ for any $X$.

Next, we show that $\beta$ is a unique global maximizer of $Q^*(b)$. Consider a different parameter vector $\beta^- \in B$. If, for values of $X$ with positive probability, $\beta$ and $\beta^-$ yield different rankings of systematic utilities, then $\beta^-$ will not maximize $Q^*(b)$. In other words, if we observe that $x_{jk}'\beta$ and $x_{jk}'\beta^-$ have opposite signs for some pair of alternatives $j, k \in J$ with positive probability, then we can conclude $Q^*(\beta) > Q^*(\beta^-)$. We will show this argument for $\beta_1^- = 1$; the argument for $\beta_1^- = -1$ is similar. If $\beta_1 = 1$, the set of points where $\beta$ and $\beta^-$ yield different rankings of systematic utility is

\[
D(\beta, \beta^-) = \{X \,|\, x_{jk}'\beta < 0 < x_{jk}'\beta^- \text{ for some } j, k \in J\} = \{X \,|\, \tilde{x}_{jk}'\tilde{\beta} < -x_{jk,1} < \tilde{x}_{jk}'\tilde{\beta}^- \text{ for some } j, k \in J\}.
\]

By Assumption 4(a), the probability of $D(\beta, \beta^-)$ equals zero if and only if $\tilde{x}_{jk}'\tilde{\beta} = \tilde{x}_{jk}'\tilde{\beta}^-$ with probability one for any pair of alternatives $j, k \in J$, which implies that $X\beta = X\beta^-$ with probability one. This contradicts Assumption 4(b). If $\beta_1 = -1$, the set of points where $\beta$ and $\beta^-$ give different predictions is

\[
D(\beta, \beta^-) = \{X \,|\, x_{jk,1} < \min(\tilde{x}_{jk}'\tilde{\beta}, -\tilde{x}_{jk}'\tilde{\beta}^-) \text{ for some } j, k \in J\}.
\]

The probability of $D(\beta, \beta^-)$ is positive by Assumption 4(a). Thus, we have proved that the true preference parameter vector $\beta$ uniquely maximizes $Q^*(b)$ for $b \in B$.

Lemma 2. Under Assumptions 2 and 4, $Q^*(b)$ is continuous in $b \in B$.

Proof. For any pair of alternatives $j, k \in J$, define

\[
Q^*_{jk}(b) \equiv E\big\{ [P(r_j < r_k | X) - P(r_k < r_j | X)] \cdot 1(x_{jk}'b \ge 0) + P(r_k < r_j | X) \big\}.
\]

Assume that $b_1 = 1$. The argument for $b_1 = -1$ is symmetric. Calculate

\[
Q^*_{jk}(b) = \int \Big\{ \int_{-\tilde{x}_{jk}'\tilde{b}}^{\infty} [P(r_j < r_k | X) - P(r_k < r_j | X)] \, g_{jk}(x_{jk,1} | \tilde{x}_{jk}) \, dx_{jk,1} \Big\} dP(\tilde{x}_{jk}) + P(r_k < r_j), \tag{A2}
\]

where $P(\tilde{x}_{jk})$ denotes the cumulative distribution function of $\tilde{x}_{jk}$. The inner integral in curly brackets on the right-hand side of (A2) is a function of $\tilde{x}_{jk}$ and $b$ that is continuous in $b \in B$. Therefore, $Q^*(b) = \sum_{1 \le j < k \le J} Q^*_{jk}(b)$ is also continuous in $b \in B$.

Lemma 3. Under Assumptions 1-2 and 4, $Q_N(b)$ converges almost surely to $Q^*(b)$ uniformly over $b \in B$.

Proof. For any pair of alternatives $j, k \in J$, define

\[
Q_{Njk}(b) \equiv N^{-1} \sum_{n=1}^{N} \big\{ [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot 1(x_{njk}'b \ge 0) + 1(r_{nk} < r_{nj}) \big\},
\]

and $Q^*_{jk}(b) \equiv E[Q_{Njk}(b)]$. Thus, we have

\[
Q_N(b) = \sum_{1 \le j < k \le J} Q_{Njk}(b) \quad \text{and} \quad Q^*(b) = \sum_{1 \le j < k \le J} Q^*_{jk}(b).
\]

By Lemma 4 of Manski (1985), with probability one,

\[
\lim_{N \to \infty} \sup_{b \in \mathbb{R}^q} \big| Q_{Njk}(b) - Q^*_{jk}(b) \big| = 0.
\]

Because $Q_N(b)$ is the sum of a finite number of terms $Q_{Njk}(b)$, $Q_N(b)$ converges almost surely to $Q^*(b)$ uniformly over $b \in B$.

Proof. (THEOREM 1) The proof of strong consistency involves verifying the conditions of Theorem 2.1 in Newey and McFadden (1994):
(1) $Q^*(b)$ is uniquely maximized at $\beta$;
(2) The parameter space $B$ is compact;
(3) $Q^*(b)$ is continuous in $b$; and
(4) $Q_N(b)$ converges almost surely to its probability limit $Q^*(b)$ uniformly over $b \in B$.
Conditions (1), (3), and (4) are verified by Lemmas 1, 2, and 3, respectively. Condition (2) is guaranteed by Assumption 2. Therefore, the GMS estimator that maximizes $Q_N(b)$ converges to $\beta$ almost surely under Assumptions 1-4.

B Proof of Theorems 2-4

In Appendix B, we provide the proofs of Theorems 2-4 and of Lemmas 4-9. Lemma 4 establishes the uniform convergence of the SGMS objective function to its probability limit. Lemmas 5-6 establish the asymptotic distribution of the normalized forms of $t_N(\beta, h_N)$. Lemmas 7-9 justify that $H_N(b^*_N, h_N)$ converges to a nonstochastic matrix in probability. By applying a Taylor series expansion, Lemmas 5-7 can be used to derive the asymptotic distribution of the centered, properly normalized SGMS estimator for the random utility model.

Lemma 4. Under Assumptions 1-4 and Condition 1, $Q^S_N(b, h_N)$ converges almost surely to $Q^*(b)$ uniformly over $b \in B$.

Proof. First, we show that $Q^S_N(b, h_N)$ converges almost surely to $Q_N(b)$ uniformly over $b \in B$, following the method in Lemma 4 of Horowitz (1992). We calculate

\[
\big| Q^S_N(b, h_N) - Q_N(b) \big| \le \frac{1}{N} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} \big| 1(x_{njk}'b > 0) - K(x_{njk}'b / h_N) \big|. \tag{B1}
\]

The right-hand side of (B1) is the sum of $c_{N1}(\eta)$ and $c_{N2}(\eta)$, where

\[
c_{N1}(\eta) \equiv \frac{1}{N} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} \big| 1(x_{njk}'b > 0) - K(x_{njk}'b / h_N) \big| \cdot 1(|x_{njk}'b| \ge \eta),
\]

\[
c_{N2}(\eta) \equiv \frac{1}{N} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} \big| 1(x_{njk}'b > 0) - K(x_{njk}'b / h_N) \big| \cdot 1(|x_{njk}'b| < \eta),
\]

and $\eta \in \mathbb{R}$ is a positive number. Condition 1(b) implies that for any $\delta > 0$, there exists $c > 0$ such that $|K(x) - 1| < \delta \cdot J^{-2}$ and $|K(-x)| < \delta \cdot J^{-2}$ for any $x > c$. As $h_N \to 0$, there exists $N_0 \in \mathbb{N}$ such that $\eta / h_N > c$ for any $N > N_0$. Therefore, $c_{N1}(\eta) < \delta$ for any $N > N_0$. We have shown that for each $\eta > 0$, $c_{N1}(\eta) \to 0$ uniformly over $b \in B$ as $N \to \infty$. Next consider $c_{N2}(\eta)$. By Condition 1(a), there is a finite $C$ such that

\[
c_{N2}(\eta) \le \sum_{1 \le j < k \le J} \Big[ \frac{C}{N} \sum_{n=1}^{N} 1(|x_{njk}'b| < \eta) \Big]. \tag{B2}
\]

Horowitz (1992) shows that the bracketed part of the right-hand side of (B2) converges to 0 uniformly over $b \in B$. Because $J$ is finite, $c_{N2}(\eta)$ also converges to 0 uniformly over $b \in B$ as $N \to \infty$. Hence the right-hand side of (B1) converges to 0 uniformly over $b \in B$. Moreover,

\[
\begin{aligned}
\sup_{b \in B} \big| Q^S_N(b, h_N) - Q^*(b) \big| &\le \sup_{b \in B} \big[ |Q^S_N(b, h_N) - Q_N(b)| + |Q_N(b) - Q^*(b)| \big] \\
&\le \sup_{b \in B} \big| Q^S_N(b, h_N) - Q_N(b) \big| + \sup_{b \in B} \big| Q_N(b) - Q^*(b) \big|,
\end{aligned} \tag{B3}
\]

and we have proved that the right-hand side of (B3) converges to 0 almost surely (the second term by Lemma 3). Therefore, $Q^S_N(b, h_N)$ converges almost surely to $Q^*(b)$ uniformly over $b \in B$.

Proof. (THEOREM 2) The proof of strong consistency involves verifying the conditions of Theorem 2.1 in Newey and McFadden (1994):
(1) $Q^*(b)$ is uniquely maximized at $\beta$;
(2) The parameter space $B$ is compact;
(3) $Q^*(b)$ is continuous in $b$; and
(4) $Q^S_N(b, h_N)$ converges uniformly almost surely to its probability limit $Q^*(b)$.
Conditions (1), (3), and (4) are verified by Lemmas 1, 2, and 4, respectively. Condition (2) is guaranteed by Assumption 2. Therefore, the SGMS estimator that maximizes $Q^S_N(b, h_N)$ converges to $\beta$ almost surely under Assumptions 1-4 and Condition 1.

Lemma 5. Let Assumptions 1-4, 6-7 and Conditions 1-2 hold. Then
(a) $\lim_{N \to \infty} E[h_N^{-d} t_N(\beta, h_N)] = a$; and
(b) $\lim_{N \to \infty} \operatorname{Var}[(N h_N)^{1/2} t_N(\beta, h_N)] = \Omega$.

Proof. First, under Assumption 1 we calculate that

\[
E[h_N^{-d} t_N(\beta, h_N)] = \sum_{1 \le j < k \le J} E\big\{ [1(r_j < r_k) - 1(r_k < r_j)] K'(x_{jk}'\beta / h_N) \tilde{x}_{jk} h_N^{-d-1} \big\} = \sum_{1 \le j < k \le J} d_{jk}, \tag{B4}
\]

where

\[
d_{jk} \equiv E\big\{ [1(r_j < r_k) - 1(r_k < r_j)] K'(x_{jk}'\beta / h_N) \tilde{x}_{jk} h_N^{-d-1} \big\}. \tag{B5}
\]

By the law of iterated expectations,

\[
\begin{aligned}
d_{jk} &= E\big\{ [P(r_j < r_k | v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) - P(r_k < r_j | v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})] \cdot K'(-v_{-j,k} / h_N) \tilde{x}_{jk} h_N^{-d-1} \big\} \\
&= E\big\{ \bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \cdot K'(-v_{-j,k} / h_N) \tilde{x}_{jk} h_N^{-d-1} \big\}.
\end{aligned} \tag{B6}
\]

Application of the Taylor series expansion to $\bar{F}_{jk}$ on the right-hand side of (B6) around $v_{-j,k} = 0$ yields

\[
\bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) = \bar{F}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) + \sum_{i=1}^{d-1} \frac{1}{i!} \bar{F}^{(i)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) (v_{-j,k})^i + \frac{1}{d!} \bar{F}^{(d)}_{jk}(\xi, \tilde{v}_{-j,k}, \tilde{X}) (v_{-j,k})^d, \tag{B7}
\]

where $\xi$ is between 0 and $v_{-j,k}$. By Assumption 3, $\bar{F}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) = 0$. Taylor expansion of $p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X})$ around $v_{-j,k} = 0$ yields

\[
p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X}) = \sum_{c=0}^{d-i-1} \frac{1}{c!} p^{(c)}_{jk}(0 | \tilde{v}_{-j,k}, \tilde{X}) (v_{-j,k})^c + \frac{1}{(d-i)!} p^{(d-i)}_{jk}(\xi_i | \tilde{v}_{-j,k}, \tilde{X}) (v_{-j,k})^{d-i}, \tag{B8}
\]

where $\xi_i$ is between 0 and $v_{-j,k}$. Combining (B7) and (B8) yields

\[
\begin{aligned}
\bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X})
&= \sum_{i=1}^{d-1} \frac{1}{i!(d-i)!} \bar{F}^{(i)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p^{(d-i)}_{jk}(\xi_i | \tilde{v}_{-j,k}, \tilde{X}) (v_{-j,k})^d \\
&\quad + \frac{1}{d!} \bar{F}^{(d)}_{jk}(\xi, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X}) (v_{-j,k})^d \\
&\quad + \sum_{i=1}^{d-1} \sum_{c=0}^{d-i-1} \frac{1}{i! \, c!} \bar{F}^{(i)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p^{(c)}_{jk}(0 | \tilde{v}_{-j,k}, \tilde{X}) (v_{-j,k})^{i+c}
\end{aligned} \tag{B9}
\]

whenever the derivatives exist. Assumptions 6-7 imply that all of the derivatives on the right-hand side of (B9) exist and are uniformly bounded for almost every $(\tilde{v}_{-j,k}, \tilde{X})$ if $|v_{-j,k}| \le \eta$ for some $\eta > 0$. Let $\zeta_{jk} = -v_{-j,k} / h_N$. Decompose $d_{jk}$ into two parts: $d_{jk} \equiv d_{jk1} + d_{jk2}$, where

\[
d_{jk1} = h_N^{-d} \int_{|h_N \zeta_{jk}| > \eta} \bar{F}_{jk}(-\zeta_{jk} h_N, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(-\zeta_{jk} h_N | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} K'(\zeta_{jk}) \, d\zeta_{jk} \, dP(\tilde{v}_{-j,k}, \tilde{X}), \tag{B10}
\]

and

\[
d_{jk2} = h_N^{-d} \int_{|h_N \zeta_{jk}| \le \eta} \bar{F}_{jk}(-\zeta_{jk} h_N, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(-\zeta_{jk} h_N | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} K'(\zeta_{jk}) \, d\zeta_{jk} \, dP(\tilde{v}_{-j,k}, \tilde{X}). \tag{B11}
\]

Under Assumption 7 and Condition 2,

\[
|d_{jk1}| \le C h_N^{-d} \int_{|h_N \zeta_{jk}| > \eta} |\tilde{x}_{jk}| \cdot |K'(\zeta_{jk})| \, d\zeta_{jk} \, dP(\tilde{v}_{-j,k}, \tilde{X}) \to 0,
\]

where $|d_{jk1}|$ denotes the vector of the absolute values of $d_{jk1}$. Plugging (B9) into (B11) and using the assumption that $K'(\cdot)$ is a $d$th order kernel yield the result that, as $N \to \infty$,

\[
d_{jk2} \to k_d \sum_{i=1}^{d} \frac{1}{i!(d-i)!} E\big[ \bar{F}^{(i)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p^{(d-i)}_{jk}(0 | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} \big] \tag{B12}
\]

by Lebesgue's dominated convergence theorem. Therefore, by (B4) we have proved part (a).

Next consider $\operatorname{Var}[(N h_N)^{1/2} t_N(\beta, h_N)]$. By Assumption 1,

\[
\operatorname{Var}[(N h_N)^{1/2} t_N(\beta, h_N)] = h_N \operatorname{Var}\Big\{ \sum_{1 \le j < k \le J} [1(r_j < r_k) - 1(r_k < r_j)] K'(x_{jk}'\beta / h_N) \tilde{x}_{jk} h_N^{-1} \Big\}.
\]

Denote

\[
e_N \equiv \sum_{1 \le j < k \le J} [1(r_j < r_k) - 1(r_k < r_j)] K'(x_{jk}'\beta / h_N) \tilde{x}_{jk} h_N^{-1}, \tag{B13}
\]

then $\operatorname{Var}[(N h_N)^{1/2} t_N(\beta, h_N)] = h_N E(e_N e_N') - h_N E(e_N) E(e_N')$. In part (a), we show that $E[h_N^{-d} e_N] = O(1)$, so

\[
h_N E(e_N) E(e_N') = o(1). \tag{B14}
\]

Since the binary choice setting $J = 2$ has been discussed in Horowitz (1992), the following discussion focuses on the case where $J \ge 3$.
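Define

\[
h_N E(e_N e_N') \equiv L_{N1} + L_{N2}, \tag{B15}
\]

where

\[
\begin{aligned}
L_{N1} = \sum_{1 \le j < k < l \le J} 2 h_N^{-1} E\big\{ & [1(r_j < r_k) - 1(r_k < r_j)] [1(r_j < r_l) - 1(r_l < r_j)] K'(x_{jk}'\beta / h_N) K'(x_{jl}'\beta / h_N) \tilde{x}_{jk} \tilde{x}_{jl}' \\
& + [1(r_j < r_k) - 1(r_k < r_j)] [1(r_k < r_l) - 1(r_l < r_k)] K'(x_{jk}'\beta / h_N) K'(x_{kl}'\beta / h_N) \tilde{x}_{jk} \tilde{x}_{kl}' \\
& + [1(r_j < r_l) - 1(r_l < r_j)] [1(r_k < r_l) - 1(r_l < r_k)] K'(x_{jl}'\beta / h_N) K'(x_{kl}'\beta / h_N) \tilde{x}_{jl} \tilde{x}_{kl}' \big\},
\end{aligned} \tag{B16}
\]

and

\[
L_{N2} = \sum_{1 \le j < k \le J} h_N^{-1} E\big\{ [1(r_j < r_k) - 1(r_k < r_j)]^2 [K'(x_{jk}'\beta / h_N)]^2 \tilde{x}_{jk} \tilde{x}_{jk}' \big\}. \tag{B17}
\]

Let $\zeta_{jk} = -v_{-j,k} / h_N$ for any pair of alternatives $j, k \in J$. By the law of iterated expectations,

\[
L_{N2} = \sum_{1 \le j < k \le J} \int \big[ F_{jk}(-h_N \zeta_{jk}, \tilde{v}_{-j,k}, \tilde{X}) + F_{kj}(h_N \zeta_{jk}, \tilde{v}_{-j,k} + h_N \zeta_{jk} \iota_{J-2}, \tilde{X}) \big] \tilde{x}_{jk} \tilde{x}_{jk}' \, p_{jk}(-h_N \zeta_{jk} | \tilde{v}_{-j,k}, \tilde{X}) [K'(\zeta_{jk})]^2 \, d\zeta_{jk} \, dP(\tilde{v}_{-j,k}, \tilde{X}). \tag{B18}
\]

By Assumptions 3, 6, 7 and Lebesgue's dominated convergence theorem, $L_{N2} \to \Omega$ when $N \to \infty$. By Assumption 7,

\[
\begin{aligned}
|L_{N1}| \le \sum_{1 \le j < k < l \le J} 2 C h_N \Big[ & \int |K'(\zeta_{jk}) K'(\zeta_{jl}) \tilde{x}_{jk} \tilde{x}_{jl}'| \, d\zeta_{jk} \, d\zeta_{jl} \, dP(\tilde{v}_{-j,kl}, \tilde{X}) \\
& + \int |K'(\zeta_{jk}) K'(\zeta_{kl}) \tilde{x}_{jk} \tilde{x}_{kl}'| \, d\zeta_{kj} \, d\zeta_{kl} \, dP(\tilde{v}_{-k,jl}, \tilde{X}) \\
& + \int |K'(\zeta_{jl}) K'(\zeta_{kl}) \tilde{x}_{jl} \tilde{x}_{kl}'| \, d\zeta_{lj} \, d\zeta_{lk} \, dP(\tilde{v}_{-l,jk}, \tilde{X}) \Big].
\end{aligned} \tag{B19}
\]

Thus, by Assumptions 6-7, $L_{N1} \to 0$ when $N \to \infty$. We have proved part (b).

Lemma 6. Let Assumptions 1-4, 6-7 and Conditions 1-2 hold. Then
(a) If $N h_N^{2d+1} \to \infty$ as $N \to \infty$, then $h_N^{-d} t_N(\beta, h_N) \xrightarrow{p} a$.
(b) If $N h_N^{2d+1} \to \lambda$, where $\lambda \in (0, \infty)$, as $N \to \infty$, then $(N h_N)^{1/2} t_N(\beta, h_N) \xrightarrow{d} MVN(\lambda^{1/2} a, \Omega)$.

Proof. If $N h_N^{2d+1} \to \infty$ as $N \to \infty$, then

\[
\operatorname{Var}[h_N^{-d} t_N(\beta, h_N)] = \frac{1}{N h_N^{2d+1}} \operatorname{Var}[(N h_N)^{1/2} t_N(\beta, h_N)] \to 0
\]

by Lemma 5(b). This result, Lemma 5(a), and Chebyshev's Theorem together imply Lemma 6(a).

Next consider part (b). Define $w_N = (N h_N)^{1/2} \{ t_N(\beta, h_N) - E[t_N(\beta, h_N)] \}$. Lemma 5(a) implies that

\[
(N h_N)^{1/2} E[t_N(\beta, h_N)] = (N h_N^{2d+1})^{1/2} E[h_N^{-d} t_N(\beta, h_N)] \to \lambda^{1/2} a,
\]

so it suffices to prove that $\gamma' w_N$ is asymptotically distributed as $N(0, \gamma'\Omega\gamma)$ for any nonstochastic $(q-1) \times 1$ vector $\gamma$ such that $\gamma'\gamma = 1$. Write

\[
t_N(\beta, h_N) \equiv N^{-1} \sum_{n=1}^{N} t_{Nn},
\]

where

\[
t_{Nn} \equiv t_{Nn}(\beta, h_N) = \sum_{1 \le j < k \le J} [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] K'(x_{njk}'\beta / h_N) \tilde{x}_{njk} h_N^{-1}.
\]

So we have

\[
\gamma' w_N = (h_N / N)^{1/2} \gamma' \sum_{n=1}^{N} [t_{Nn} - E(t_{Nn})].
\]

Let $CF_N(\cdot)$ denote the characteristic function of $\gamma' w_N$. Then by Assumption 1, $CF_N(\tau) = [\Psi_N(\tau)]^N$, where

\[
\Psi_N(\tau) = E \exp\big\{ i\tau (h_N / N)^{1/2} \gamma' [t_{Nn} - E(t_{Nn})] \big\},
\]

and $n \le N$ is arbitrary. $\Psi_N(0) = 1$ and $\Psi_N'(0) = 0$ by construction. We can calculate the second order derivative of $\Psi_N(\tau)$ as

\[
\Psi_N''(\tau) = E\big\{ -h_N N^{-1} \Psi_N(\tau) \gamma' [t_{Nn} - E(t_{Nn})] [t_{Nn} - E(t_{Nn})]' \gamma \big\}.
\]

For each $N$,

\[
\lim_{\tau \to 0} \Psi_N''(\tau) = -N^{-1} \gamma' [h_N \operatorname{Var}(t_{Nn})] \gamma = -N^{-1} \gamma' \operatorname{Var}[(N h_N)^{1/2} t_N(\beta, h_N)] \gamma.
\]

Lemma 5(b) implies that

\[
\lim_{\tau \to 0} \Psi_N''(\tau) = -N^{-1} [\gamma'\Omega\gamma + o(1)].
\]

Therefore, a Taylor series expansion of $\Psi_N(\tau)$ around $\tau = 0$ yields

\[
\Psi_N(\tau) = 1 - \gamma'\Omega\gamma (\tau^2 / 2N) + o(\tau^2 / N)
\]

and

\[
CF_N(\tau) = \big[ 1 - \gamma'\Omega\gamma (\tau^2 / 2N) + o(\tau^2 / N) \big]^N.
\]

Applying the methods used in the proof of the Lindeberg-Levy central limit theorem yields the result that

\[
\lim_{N \to \infty} CF_N(\tau) = \exp(-\gamma'\Omega\gamma \tau^2 / 2),
\]

which is the same as the characteristic function of $N(0, \gamma'\Omega\gamma)$.

Lemma 7. Let Assumptions 1-4, 6-8 and Conditions 1-2 hold. For any pair of alternatives $j, k \in J$, assume that $\|\tilde{x}_{jk}\| \le c$ for some $c > 0$. Let $\eta$ be some positive real number such that $p^{(1)}_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X})$, $\bar{F}^{(1)}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$, and $\bar{F}^{(2)}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$ exist and are uniformly bounded for almost every $(\tilde{v}_{-j,k}, \tilde{X})$ if $|v_{-j,k}| \le \eta$. For $\theta \in \mathbb{R}^{q-1}$, define

\[
t^*_N(\theta) = (N h_N^2)^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] K'(x_{njk}'\beta / h_N + \tilde{x}_{njk}'\theta) \tilde{x}_{njk}.
\]

Define the sets $\Theta_N$, $N = 1, 2, \ldots$, by

\[
\Theta_N = \{ \theta : \theta \in \mathbb{R}^{q-1}, \; h_N \|\theta\| \le \eta / 2c \}.
\]

(a) Then

\[
\operatorname*{plim}_{N \to \infty} \sup_{\theta \in \Theta_N} \| t^*_N(\theta) - E[t^*_N(\theta)] \| = 0. \tag{B20}
\]

(b) There are finite numbers $\alpha_1$ and $\alpha_2$ such that for all $\theta \in \Theta_N$

\[
\| E[t^*_N(\theta)] - H\theta \| \le o(1) + \alpha_1 h_N \|\theta\| + \alpha_2 h_N \|\theta\|^2 \tag{B21}
\]

uniformly over $\theta \in \Theta_N$.

Proof.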
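Define

\[
g_{Nn}(\theta) = \sum_{1 \le j < k \le J} \Big\{ [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] K'(x_{njk}'\beta / h_N + \tilde{x}_{njk}'\theta) \tilde{x}_{njk} - E\big[ [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] K'(x_{njk}'\beta / h_N + \tilde{x}_{njk}'\theta) \tilde{x}_{njk} \big] \Big\}. \tag{B22}
\]

The remaining part of the proof of (B20) follows the proof of (A15) in Lemma 7 of Horowitz (1992). Next, we prove (B21). Define

\[
E[t^*_N(\theta)] \equiv \sum_{1 \le j < k \le J} t^*_{Njk}(\theta),
\]

where

\[
t^*_{Njk}(\theta) = h_N^{-2} E\big[ \bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) K'(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta) \tilde{x}_{jk} \big].
\]

Decompose $t^*_{Njk}(\theta)$ into two parts: $t^*_{Njk}(\theta) \equiv t^*_{Njk1} + t^*_{Njk2}$, where

\[
t^*_{Njk1} = h_N^{-2} \int_{|v_{-j,k}| > \eta} \bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) K'(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta) \, \tilde{x}_{jk} \, p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X}) \, dv_{-j,k} \, dP(\tilde{v}_{-j,k}, \tilde{X})
\]

and

\[
t^*_{Njk2} = h_N^{-2} \int_{|v_{-j,k}| \le \eta} \bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) K'(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta) \, \tilde{x}_{jk} \, p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X}) \, dv_{-j,k} \, dP(\tilde{v}_{-j,k}, \tilde{X}).
\]

For some finite $C > 0$, by Assumption 7,

\[
|t^*_{Njk1}| \le C h_N^{-2} \int_{|v_{-j,k}| > \eta} \big| K'(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta) \big| \, dv_{-j,k} \, dP(\tilde{v}_{-j,k}, \tilde{X}).
\]

Let $\zeta_{jk} = -v_{-j,k} / h_N + \tilde{x}_{jk}'\theta$. Since $h_N \|\theta\| \le \eta / 2c$ and $\|\tilde{x}_{jk}\| \le c$, $|v_{-j,k}| > \eta$ implies that $|\zeta_{jk}| > |-v_{-j,k} / h_N| - |\tilde{x}_{jk}'\theta| > \eta / 2h_N$ and

\[
|t^*_{Njk1}| \le C h_N^{-1} \int_{|\zeta_{jk}| > \eta / 2h_N} |K'(\zeta_{jk})| \, d\zeta_{jk}. \tag{B23}
\]

We have

\[
\lim_{N \to \infty} \sup_{\theta \in \Theta_N} |t^*_{Njk1}| = 0, \tag{B24}
\]

because the term on the right-hand side of (B23) converges to 0 by Condition 2. Next, we consider $t^*_{Njk2}$. If $|v_{-j,k}| \le \eta$, substitution of $d = 2$ into the right-hand side of (B9) yields

\[
\begin{aligned}
\bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X})
&= \bar{F}^{(1)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(0 | \tilde{v}_{-j,k}, \tilde{X}) \, v_{-j,k} \\
&\quad + \bar{F}^{(1)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p^{(1)}_{jk}(\xi_1 | \tilde{v}_{-j,k}, \tilde{X}) (v_{-j,k})^2 \\
&\quad + (1/2) \bar{F}^{(2)}_{jk}(\xi, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X}) (v_{-j,k})^2,
\end{aligned} \tag{B25}
\]

where $\xi$ and $\xi_1$ are between 0 and $v_{-j,k}$. Decompose $t^*_{Njk2}$ into two parts, $t^*_{Njk2} \equiv s_{Njk1} + s_{Njk2}$, where

\[
s_{Njk1} = h_N^{-2} \int_{|v_{-j,k}| \le \eta} \bar{F}^{(1)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(0 | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} \, v_{-j,k} K'(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta) \, dv_{-j,k} \, dP(\tilde{v}_{-j,k}, \tilde{X}),
\]

and

\[
\begin{aligned}
s_{Njk2} = h_N^{-2} \int_{|v_{-j,k}| \le \eta} & \big[ \bar{F}^{(1)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p^{(1)}_{jk}(\xi_1 | \tilde{v}_{-j,k}, \tilde{X}) + (1/2) \bar{F}^{(2)}_{jk}(\xi, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X}) \big] \tilde{x}_{jk} \\
& \cdot (v_{-j,k})^2 K'(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta) \, dv_{-j,k} \, dP(\tilde{v}_{-j,k}, \tilde{X}).
\end{aligned}
\]

Let $\zeta_{jk} = -v_{-j,k} / h_N + \tilde{x}_{jk}'\theta$, then

\[
s_{Njk1} = \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta| \le \eta / h_N} \bar{F}^{(1)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(0 | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} (\zeta_{jk} - \tilde{x}_{jk}'\theta) K'(\zeta_{jk}) \, d\zeta_{jk} \, dP(\tilde{v}_{-j,k}, \tilde{X}).
\]

Because $\int \zeta K'(\zeta) \, d\zeta = 0$ and $|\tilde{x}_{jk}'\theta h_N| \le \eta / 2$,

\[
\Big| \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta| \le \eta / h_N} \zeta_{jk} K'(\zeta_{jk}) \, d\zeta_{jk} \Big| = \Big| \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta| > \eta / h_N} \zeta_{jk} K'(\zeta_{jk}) \, d\zeta_{jk} \Big| \le \int_{|\zeta_{jk}| > \eta / 2h_N} |\zeta_{jk} K'(\zeta_{jk})| \, d\zeta_{jk}. \tag{B26}
\]

By Condition 2, the right-hand term of (B26) is bounded uniformly over $\theta \in \Theta_N$ and it converges to 0. Therefore, by Lebesgue's dominated convergence theorem,

\[
\lim_{N \to \infty} \sup_{\theta \in \Theta_N} \Big| \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta| \le \eta / h_N} \bar{F}^{(1)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(0 | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} \zeta_{jk} K'(\zeta_{jk}) \, d\zeta_{jk} \, dP(\tilde{v}_{-j,k}, \tilde{X}) \Big| = 0. \tag{B27}
\]

In addition,

\[
\begin{aligned}
\Big\| \tilde{x}_{jk}'\theta \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta| \le \eta / h_N} K'(\zeta_{jk}) \, d\zeta_{jk} - \tilde{x}_{jk}'\theta \Big\|
&\le |\tilde{x}_{jk}'\theta h_N| \, h_N^{-1} \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta| > \eta / h_N} |K'(\zeta_{jk})| \, d\zeta_{jk} \\
&\le (\eta / 2) h_N^{-1} \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta| > \eta / h_N} |K'(\zeta_{jk})| \, d\zeta_{jk}.
\end{aligned} \tag{B28}
\]

By Condition 2, the right-hand side of (B28) is bounded uniformly over $\theta \in \Theta_N$ and it converges to 0. Therefore, by Lebesgue's dominated convergence theorem and the definition of $H$,

\[
\lim_{N \to \infty} \sup_{\theta \in \Theta_N} \Big\| \sum_{1 \le j < k \le J} \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta| \le \eta / h_N} \bar{F}^{(1)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(0 | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} \tilde{x}_{jk}'\theta \, \zeta_{jk} K'(\zeta_{jk}) \, d\zeta_{jk} \, dP(\tilde{v}_{-j,k}, \tilde{X}) - H\theta \Big\| = 0. \tag{B29}
\]

For some finite $C > 0$,

\[
\|s_{Njk2}\| \le C h_N \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta| \le \eta / h_N} (\zeta_{jk} - \tilde{x}_{jk}'\theta)^2 |K'(\zeta_{jk})| \, d\zeta_{jk} \, dP(\tilde{v}_{-j,k}, \tilde{X}) \le o(1) + \alpha_{jk1} h_N \|\theta\| + \alpha_{jk2} h_N \|\theta\|^2 \tag{B30}
\]

for some finite $\alpha_{jk1}$ and $\alpha_{jk2}$. So part (b) is established by combining (B24), (B27), (B29), and (B30).

Lemma 8. Let Assumptions 1-9 and Conditions 1-2 hold and define $\theta_N = (\tilde{b}^S_N - \tilde{\beta}) / h_N$, where $b^S_N$ is an SGMS estimator. Then the probability limit of $\theta_N$ is zero.

Proof. Given any $\delta > 0$, choose $\gamma$ to be a finite number such that $\Pr(\|\tilde{x}_{jk}\| \le \gamma \text{ for any } 1 \le j < k \le J) \ge 1 - \delta$. Let $P_\delta$ be the probability distribution of $X$ conditional on the event $C_\gamma \equiv \{X : \|\tilde{x}_{jk}\| \le \gamma \text{ for any } 1 \le j < k \le J\}$. Let $C_\gamma'$ denote the complement of $C_\gamma$. The remaining part of the proof of Lemma 8 follows the proof of Lemma 8 of Horowitz (1992).

Lemma 9. Let Assumptions 1-8 and Conditions 1-2 hold. Let $\{\beta_N = (\beta_{N1}, \tilde{\beta}_N')'\}$ be any sequence in $B$ such that $(\beta_N - \beta) / h_N \to 0$ as $N \to \infty$. Then the probability limit of $H_N(\beta_N, h_N)$ is $H$.

Proof. Assume that $\beta_{N1} = \beta_1$ because $\beta_{N1} = \beta_1 \in \{-1, 1\}$ for sufficiently large $N$. Denote $\theta_N = (\tilde{\beta}_N - \tilde{\beta}) / h_N$. Let $\{a_N\}$ be a sequence such that $a_N \to \infty$ and $a_N \theta_N \to 0$ as $N \to \infty$. Define $\tilde{X}_N = \{\tilde{X} : \|\tilde{x}_{jk}\| \le a_N \text{ for any } 1 \le j < k \le J\}$. For any $\epsilon > 0$,

\[
\lim_{N \to \infty} P\big[ |H_N(\beta_N, h_N) - H| > \epsilon \big] = \lim_{N \to \infty} P\big[ |H_N(\beta_N, h_N) - H| > \epsilon \,\big|\, \tilde{X}_N \big].
\]

Therefore, it suffices to show that $E[H_N(\beta_N, h_N) | \tilde{X}_N] \to H$ and $\operatorname{Var}[H_N(\beta_N, h_N) | \tilde{X}_N] \to 0$ by Chebyshev's Theorem. Consider $E[H_N(\beta_N, h_N) | \tilde{X}_N]$ first. Define $E_N \equiv E[H_N(\beta_N, h_N) | \tilde{X}_N]$, then $E_N = \sum_{1 \le j < k \le J} E_{Njk}$, where

\[
E_{Njk} = h_N^{-2} \int \bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} \tilde{x}_{jk}' \, K''(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta_N) \, dv_{-j,k} \, dP_{Njk}(\tilde{v}_{-j,k}, \tilde{X}), \tag{B31}
\]

and $P_{Njk}$ denotes the distribution of $(\tilde{v}_{-j,k}, \tilde{X})$ conditional on $\tilde{X}_N$. By Assumptions 7-8, there exists an $\eta$ such that $\bar{F}^{(1)}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$, $\bar{F}^{(2)}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$, and $p^{(1)}_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X})$ exist and are almost surely uniformly bounded if $|v_{-j,k}| \le \eta$. Therefore, substitution of (B25) into (B31) yields $E_{Njk} = I_{Njk1} + I_{Njk2} + I_{Njk3}$, where

\[
I_{Njk1} = h_N^{-2} \int_{|v_{-j,k}| \le \eta} \bar{F}^{(1)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(0 | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} \tilde{x}_{jk}' \, v_{-j,k} K''(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta_N) \, dv_{-j,k} \, dP_{Njk}(\tilde{v}_{-j,k}, \tilde{X}),
\]

\[
\begin{aligned}
I_{Njk2} = h_N^{-2} \int_{|v_{-j,k}| \le \eta} & \big[ \bar{F}^{(1)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p^{(1)}_{jk}(\xi_1 | \tilde{v}_{-j,k}, \tilde{X}) + (1/2) \bar{F}^{(2)}_{jk}(\xi, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X}) \big] \tilde{x}_{jk} \tilde{x}_{jk}' \\
& \cdot (v_{-j,k})^2 K''(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta_N) \, dv_{-j,k} \, dP_{Njk}(\tilde{v}_{-j,k}, \tilde{X}),
\end{aligned}
\]

and

\[
I_{Njk3} = h_N^{-2} \int_{|v_{-j,k}| > \eta} \bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(v_{-j,k} | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} \tilde{x}_{jk}' \, K''(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta_N) \, dv_{-j,k} \, dP_{Njk}(\tilde{v}_{-j,k}, \tilde{X}). \tag{B32}
\]

Define $\zeta_{jk} = -v_{-j,k} / h_N + \tilde{x}_{jk}'\theta_N$. Then

\[
I_{Njk1} = \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta_N| \le \eta / h_N} \bar{F}^{(1)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(0 | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} \tilde{x}_{jk}' (\zeta_{jk} - \tilde{x}_{jk}'\theta_N) K''(\zeta_{jk}) \, d\zeta_{jk} \, dP_{Njk}(\tilde{v}_{-j,k}, \tilde{X}).
\]

Because $|\tilde{x}_{jk}'\theta_N| \le a_N \|\theta_N\| \to 0$, by Assumptions 6-7 and Conditions 1-2,

\[
I_{Njk1} \to E\big[ \bar{F}^{(1)}_{jk}(0, \tilde{v}_{-j,k}, \tilde{X}) \, p_{jk}(0 | \tilde{v}_{-j,k}, \tilde{X}) \, \tilde{x}_{jk} \tilde{x}_{jk}' \big]. \tag{B33}
\]

For some finite $C > 0$,

\[
|I_{Njk2}| \le C h_N \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta_N| \le \eta / h_N} |\tilde{x}_{jk} \tilde{x}_{jk}'| \cdot (\zeta_{jk} - \tilde{x}_{jk}'\theta_N)^2 |K''(\zeta_{jk})| \, d\zeta_{jk} \, dP_{Njk}(\tilde{v}_{-j,k}, \tilde{X}) \to 0. \tag{B34}
\]

Next we calculate (B32). By Condition 2,

\[
|I_{Njk3}| \le C h_N^{-1} \int_{|\zeta_{jk} - \tilde{x}_{jk}'\theta_N| > \eta / h_N} |\tilde{x}_{jk} \tilde{x}_{jk}'| \cdot |K''(\zeta_{jk})| \, d\zeta_{jk} \, dP_{Njk}(\tilde{v}_{-j,k}, \tilde{X}). \tag{B35}
\]

Under Assumption 7, the right-hand side of (B35) converges to 0 as $N$ goes to infinity. Combination of (B33), (B34), and (B35) establishes that

\[
E_N = \sum_{1 \le j < k \le J} E_{Njk} = \sum_{1 \le j < k \le J} (I_{Njk1} + I_{Njk2} + I_{Njk3}) \to H.
\]

Now consider $\operatorname{Var}[H_N(\beta_N, h_N) | \tilde{X}_N]$:

\[
\begin{aligned}
\operatorname{Var}[H_N(\beta_N, h_N) | \tilde{X}_N] &= N^{-1} \operatorname{Var}\Big\{ \sum_{1 \le j < k \le J} [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] K''(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta_N) \, \tilde{x}_{jk} \tilde{x}_{jk}' h_N^{-2} \,\Big|\, \tilde{X}_N \Big\} \\
&= N^{-1} E[r_N(\theta_N) r_N(\theta_N)' | \tilde{X}_N] + O(N^{-1}),
\end{aligned} \tag{B36}
\]

where

\[
r_N(\theta_N) = \sum_{1 \le j < k \le J} [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] K''(-v_{-j,k} / h_N + \tilde{x}_{jk}'\theta_N) \operatorname{vec}(\tilde{x}_{jk} \tilde{x}_{jk}') h_N^{-2}.
\]

Let $\zeta_{jk} = -v_{-j,k} / h_N + \tilde{x}_{jk}'\theta_N$. For some finite $C$,

\[
\begin{aligned}
N^{-1} E[r_N(\theta_N) r_N(\theta_N)' | \tilde{X}_N] \le{} & C h_N (N h_N^4)^{-1} \sum_{1 \le j < k \le J} \int |\operatorname{vec}(\tilde{x}_{jk} \tilde{x}_{jk}') \operatorname{vec}(\tilde{x}_{jk} \tilde{x}_{jk}')'| \cdot [K''(\zeta_{jk})]^2 \, d\zeta_{jk} \, dP_{Njk}(\tilde{v}_{-j,k}, \tilde{X}) \\
& + C h_N^2 (N h_N^4)^{-1} \sum_{k \ne l; \; k, l \in J \setminus \{j\}} \int |\operatorname{vec}(\tilde{x}_{jk} \tilde{x}_{jk}') \operatorname{vec}(\tilde{x}_{jl} \tilde{x}_{jl}')'| \cdot K''(\zeta_{jk}) K''(\zeta_{jl}) \, d\zeta_{jk} \, d\zeta_{jl} \, dP_{Njkl}(\tilde{v}_{-j,kl}, \tilde{X})
\end{aligned} \tag{B37}
\]

by Assumption 7, where $P_{Njkl}$ is the distribution of $(\tilde{v}_{-j,kl}, \tilde{X})$ conditional on $\tilde{X}_N$. The right-hand side of (B37) converges to zero by Assumptions 7-8 and Condition 2. Therefore, it follows from (B36) that $\operatorname{Var}[H_N(\beta_N, h_N) | \tilde{X}_N]$ converges to zero.

Proof. (THEOREM 3) By Theorem 2, $b^S_{N,1} = \beta_1$ and $\tilde{b}^S_N$ is an interior point of $\tilde{B}$ with probability approaching 1 as $N \to \infty$. So $t_N(b^S_N, h_N) = 0$ and a Taylor series expansion yields

\[
t_N(\beta, h_N) + H_N(b^*_N, h_N)(\tilde{b}^S_N - \tilde{\beta}) = 0 \tag{B38}
\]

with probability approaching 1, where $b^*_N$ is between $\beta$ and $b^S_N$.

Part (a): By (B38),

\[
h_N^{-d} t_N(\beta, h_N) + H_N(b^*_N, h_N) h_N^{-d} (\tilde{b}^S_N - \tilde{\beta}) = 0
\]

with probability approaching 1 as $N \to \infty$. Lemmas 8-9 imply that $\operatorname{plim} H_N(b^*_N, h_N) = H$. Because $H$ is nonsingular by Assumption 9, we have

\[
h_N^{-d} (\tilde{b}^S_N - \tilde{\beta}) = -H^{-1} h_N^{-d} t_N(\beta, h_N) + o_p(1).
\]

Part (a) now follows from Lemma 6(a).

Part (b): By (B38),

\[
(N h_N)^{1/2} t_N(\beta, h_N) + H_N(b^*_N, h_N) (N h_N)^{1/2} (\tilde{b}^S_N - \tilde{\beta}) = 0
\]

with probability approaching 1 as $N \to \infty$. So, by Lemmas 8-9 and Assumption 9,

\[
(N h_N)^{1/2} (\tilde{b}^S_N - \tilde{\beta}) = -H^{-1} (N h_N)^{1/2} t_N(\beta, h_N) + o_p(1).
\]

Part (b) now follows from Lemma 6(b).

Part (c): By the property of the matrix trace,

\[
E_A[(\tilde{b}^S_N - \tilde{\beta})' W (\tilde{b}^S_N - \tilde{\beta})] = \operatorname{Tr}\{ W E_A[(\tilde{b}^S_N - \tilde{\beta})(\tilde{b}^S_N - \tilde{\beta})'] \}.
\]

Part (b) implies that

\[
E_A[(\tilde{b}^S_N - \tilde{\beta})(\tilde{b}^S_N - \tilde{\beta})'] = N^{-2d/(2d+1)} \big[ \lambda^{-1/(2d+1)} H^{-1} \Omega H^{-1} + \lambda^{2d/(2d+1)} H^{-1} a a' H^{-1} \big].
\]

Therefore, by the definition of MSE,

\[
MSE = N^{-2d/(2d+1)} \operatorname{Tr}\big\{ W H^{-1} \big[ \lambda^{-1/(2d+1)} \Omega + \lambda^{2d/(2d+1)} a a' \big] H^{-1} \big\}. \tag{B39}
\]

To minimize MSE, differentiate the right-hand side of (B39) with respect to $\lambda$. From the first order condition, we show that MSE is minimized by setting $\lambda = \lambda^*$, where

\[
\lambda^* = [\operatorname{trace}(W H^{-1} \Omega H^{-1})] / [\operatorname{trace}(2d \, W H^{-1} a a' H^{-1})].
\]

It follows from Part (b) that

\[
N^{d/(2d+1)} (\tilde{b}^S_N - \tilde{\beta}) \tag{B40}
\]

has the asymptotic distribution $MVN(-(\lambda^*)^{d/(2d+1)} H^{-1} a, \; (\lambda^*)^{-1/(2d+1)} H^{-1} \Omega H^{-1})$.

Proof. (THEOREM 4) Part (a): Applying a Taylor expansion to $t_N(b^S_N, h^*_N)$ around $\beta$ yields

\[
(h^*_N)^{-d} t_N(b^S_N, h^*_N) = (h^*_N)^{-d} t_N(\beta, h^*_N) + [\partial t_N(b^*_N, h^*_N) / \partial \tilde{b}'] \, (h^*_N)^{-d} (\tilde{b}^S_N - \tilde{\beta}) \tag{B41}
\]

with probability approaching one as $N \to \infty$, where $b^*_N$ is between $b^S_N$ and $\beta$. By Lemma 8, $(\tilde{b}^S_N - \tilde{\beta}) / h_N = o_p(1)$. Therefore, $(\tilde{b}^S_N - \tilde{\beta}) / h^*_N = o_p(1)$. Lemma 9 implies that $\operatorname{plim}[\partial t_N(b^*_N, h^*_N) / \partial \tilde{b}'] = H$. In addition, $(h^*_N)^{-d} (\tilde{b}^S_N - \tilde{\beta}) = o_p(1)$ because $(h_N)^{-d} (\tilde{b}^S_N - \tilde{\beta}) = O_p(1)$ by Theorem 3(b) and $(h^*_N / h_N)^{-d} \to 0$. Finally, $\operatorname{plim}[(h^*_N)^{-d} t_N(\beta, h^*_N)] = a$ by Lemma 5(a). Part (a) now follows by taking probability limits as $N \to \infty$ of each side of (B41).

Part (b): By Chebyshev's Theorem, it suffices to show that $E(\hat{\Omega}_N) \to \Omega$ and $\operatorname{Var}(\hat{\Omega}_N) \to 0$. First consider $E(\hat{\Omega}_N)$:

\[
E(\hat{\Omega}_N) = h_N E[t_{Nn}(b^S_N, h_N) t_{Nn}(b^S_N, h_N)'] \equiv L^*_{N1} + L^*_{N2}, \tag{B42}
\]

where

\[
L^*_{N1} = h_N^{-1} \sum_{1 \le j < k \le J} E\Big\{ \big[ [1(r_j < r_k) - 1(r_k < r_j)] K'(x_{jk}' b^S_N / h_N) \big]^2 \tilde{x}_{jk} \tilde{x}_{jk}' \Big\},
\]

and

\[
\begin{aligned}
L^*_{N2} = \sum_{1 \le j < k < l \le J} 2 h_N^{-1} E\big\{ & [1(r_j < r_k) - 1(r_k < r_j)] [1(r_j < r_l) - 1(r_l < r_j)] K'(x_{jk}' b^S_N / h_N) K'(x_{jl}' b^S_N / h_N) \tilde{x}_{jk} \tilde{x}_{jl}' \\
& + [1(r_j < r_k) - 1(r_k < r_j)] [1(r_k < r_l) - 1(r_l < r_k)] K'(x_{jk}' b^S_N / h_N) K'(x_{kl}' b^S_N / h_N) \tilde{x}_{jk} \tilde{x}_{kl}' \\
& + [1(r_j < r_l) - 1(r_l < r_j)] [1(r_k < r_l) - 1(r_l < r_k)] K'(x_{jl}' b^S_N / h_N) K'(x_{kl}' b^S_N / h_N) \tilde{x}_{jl} \tilde{x}_{kl}' \big\}.
\end{aligned}
\]

Let $\theta_N = (\tilde{b}^S_N - \tilde{\beta}) / h_N$ and $\zeta_{jk} = -v_{-j,k} / h_N + \tilde{x}_{jk}'\theta_N$. Then

\[
\begin{aligned}
L^*_{N1} = \sum_{1 \le j < k \le J} \int & \big\{ F_{jk}[h_N(-\zeta_{jk} + \tilde{x}_{jk}'\theta_N), \tilde{v}_{-j,k}, \tilde{X}] + F_{kj}[h_N(-\zeta_{jk} + \tilde{x}_{jk}'\theta_N), \tilde{v}_{-j,k} + h_N(-\zeta_{jk} + \tilde{x}_{jk}'\theta_N) \iota_{J-2}', \tilde{X}] \big\} \\
& \cdot p_{jk}[h_N(-\zeta_{jk} + \tilde{x}_{jk}'\theta_N) | \tilde{v}_{-j,k}, \tilde{X}] \, \tilde{x}_{jk} \tilde{x}_{jk}' [K'(\zeta_{jk})]^2 \, d\zeta_{jk} \, dP(\tilde{v}_{-j,k}, \tilde{X}).
\end{aligned} \tag{B43}
\]

By Assumptions 3, 6-7 and Lebesgue's dominated convergence theorem, the right-hand side of (B43) converges to $\Omega$ when $N \to \infty$. Under Assumption 7,

\[
\begin{aligned}
|L^*_{N2}| \le \sum_{1 \le j < k < l \le J} 2 M h_N \Big[ & \int |K'(\zeta_{jk}) K'(\zeta_{jl}) \tilde{x}_{jk} \tilde{x}_{jl}'| \, d\zeta_{jk} \, d\zeta_{jl} \, dP(\tilde{v}_{-j,kl}, \tilde{X}) \\
& + \int |K'(\zeta_{jk}) K'(\zeta_{kl}) \tilde{x}_{jk} \tilde{x}_{kl}'| \, d\zeta_{kj} \, d\zeta_{kl} \, dP(\tilde{v}_{-k,jl}, \tilde{X}) \\
& + \int |K'(\zeta_{jl}) K'(\zeta_{kl}) \tilde{x}_{jl} \tilde{x}_{kl}'| \, d\zeta_{lj} \, d\zeta_{lk} \, dP(\tilde{v}_{-l,jk}, \tilde{X}) \Big].
\end{aligned} \tag{B44}
\]

Therefore, the right-hand side of (B44) converges to 0 when $N \to \infty$ by Assumption 7 and Condition 2. So $E(\hat{\Omega}_N) \to \Omega$ by (B42). Next consider $\operatorname{Var}(\hat{\Omega}_N)$.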
By Assumption 1, we can calculate 0 S S V ar(Ω̂N ) = V ar tN n bN , hN tN n bN , hN 0 S S 2 = ( hN /N E vec tN n bN , hN tN n bN , hN ) 0 0 S S vec tN n bN , hN tN n bN , hN + o(1) h2N /N = (B45) (N h2N )−1 E [cc0 ] + o(1), where c≡ X cjklm , 1≤j<k≤J 1≤l<m≤J and cjklm ≡ [1(rj < rk ) − 1(rk < rj )] [1(rl < rm ) − 1(rm < rl )] K 0 (x0jk bSN /hN )K 0 (x0lm bSN /hN )vec(x̃jk x̃0lm ). The right-hand side of (B45) converges to 0 under Assumptions 8-10. Part (c): This is implied by Lemma 9. 57 References [1] Abrevaya, J., and Huang, J. 2005. On the Bootstrap of the Maximum Score Estimator. Econometrica 73: 1175-1204. [2] Barbera S, Pattanaik PK. 1986. Falmagne and the rationalizability of stochastic choices in terms of random orderings. Econometrica 54: 707-715 [3] Basile R, Castellani D, Zanfei A. 2008. Location choices of multinational rms in Europe: the role of EU cohesion policy. Journal of International Economics 74: 328-340. [4] Beggs S, Cardell S, Hausman J. 1981. Assessing the potential demand for electric cars. Journal of Econometrics 17: 1-19 [5] Ben-Akiva, M., T. Morikawa, and F. Shiroish. 1992. Analysis of the reliability of preference ranking data. Journal of Business Research 24: 149-164. [6] Calfee, J., C. Winston, and R. Stempski. 2001. Econometric issues in estimating consumer preferences from stated preference data: a case study of the value of automobile travel time. Review of Economics and Statistics 83(4): 699-707. [7] Caparrós A, Oviedo JL, Campos P. 2008. Would you choose your preferred option? Comparing choice and recoded ranking experiments. American Journal of Agricultural Economics 90: 843-855 [8] Chapman RG, Staelin R. 1982. Exploiting rank ordered choice set data within the stochastic utility model. Journal of Marketing Research 19: 288-301 [9] Dagsvik JK, Liu G. 2009. A framework for analyzing rank-ordered data with application to automobile demand. Transportation Research Part A 43: 1-12 [10] Falmagne JC. 1978. 
A representation theorem for nite scale systems. Journal of Mathematical Psychology 18: 52-72 [11] Fiebig D, Keane M, Louviere J, Wasi N. 2010. The generalized multinomial logit model: Accounting for scale and coecient heterogeneity. Marketing Science 29: 393-421 58 [12] Fox J. 2007. Semiparametric estimation of multinomial discrete-choice models using a subset of choices. RAND Journal of Economics 38: 1002-1019. [13] Goeree, J.K., Holt, C.A., and Palfrey, T.R. 2005. Regular Quantal Response Equilibrium. Experimental Economics 8: 347-367. [14] Han, A.K. 1987. Non-parametric Analysis of a Generalized Regression Model. Journal of Econometrics 35: 303-316. [15] Hausman J, McFadden D. 1984. Specication tests for the multinomial logit model. Econometrica 52: 1219-1240. [16] Hausman J, Ruud P. 1987. Specifying and testing econometric models for rank-ordered data. Journal of Econometrics 34: 83-104. [17] Hausman, J, and Wise, D. 1978. A Conditional Probit Model for Qualitative Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences. Econometrica 46: 403-426. [18] Horowitz JL. 1992. A smoothed maximum score estimator for the binary response model. Econometrica 60: 505-531. [19] Kim, J., and Pollard, D. 1990. Cube Root Asymptotics. Annals of Statistics 18: 191-219. [20] Layton D. 2000. Random coecient models for stated preference surveys. Journal of Environmental Economics and Management 40: 21-36. [21] Layton D., Levine, R. 2003. How Much Does the Far Future Matter? A Hierarchical Bayesian Analysis of the Public's Willingness to Mitigate Ecological Impacts of Climate Change. Journal the American Statistical Association 98: 533-544. [22] Luce D.R. and Suppes P. 1965. Preference, Utility and Subjective Probability, Handbook of Mathematical Psychology. Vol 3, New York: John Wiley & Sons, Inc., 249-410. [23] Lee L-F. 1995. Semiparametric maximum likelihood estimation of polychotomous and sequential choice models. 
Journal of Econometrics 65: 381-428.
[24] Lewbel A. 2000. Semiparametric qualitative response model estimation with unknown heteroskedasticity or instrumental variables. Journal of Econometrics 97: 145-177.
[25] Manski CF. 1975. Maximum score estimation of the stochastic utility models of choice. Journal of Econometrics 3: 205-228.
[26] McFadden D. 1974. Conditional logit analysis of qualitative choice behavior. In: Zarembka P (ed) Frontiers in econometrics. Academic Press, New York.
[27] McFadden D. 1986. The choice theory approach to market research. Marketing Science 5: 275-297.
[28] Newey, W.K., and McFadden, D. 1994. Large Sample Estimation and Hypothesis Testing. Handbook of Econometrics, Vol 4: 2111-2245.
[29] Rao R.R. 1962. Relations between weak and uniform convergence of measures with applications. Annals of Mathematical Statistics 33: 659-680.
[30] Ruud PA. 1983. Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models. Econometrica 51: 225-228.
[31] Ruud PA. 1986. Consistent estimation of limited dependent variable models despite misspecification of distribution. Journal of Econometrics 32: 157-187.
[32] Scarpa R, Notaro S, Louviere J, Raffaelli R. 2011. Exploring scale effects of best/worst rank ordered choice data to estimate benefits of tourism in Alpine Grazing Commons. American Journal of Agricultural Economics 93: 813-828.
[33] Sherman, R.P. 1993. The Limiting Distribution of the Maximum Rank Correlation Estimator. Econometrica 61: 123-137.
[34] Small KA, Winston C, Yan J. 2005. Uncovering the distribution of motorists' preferences for travel time and reliability. Econometrica 73: 1367-1382.
[35] Storn, R., and Price, K. 1997. Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization 11: 341-359.
[36] Thompson S. 1993. Some efficiency bounds for semiparametric discrete choice models.
Journal of Econometrics 58: 257-274.
[37] Train KE, Winston C. 2007. Vehicle choice behavior and the declining market share of U.S. automakers. International Economic Review 48: 1469-1496.
[38] Yan J. 2012. A smoothed maximum score estimator for multinomial discrete choice models. Working paper.
[39] Yan J, Yoo H. 2014. The seeming unreliability of rank-ordered data as a consequence of model misspecification. MPRA Paper No. 56285.
[40] Yoo HI, Doiron D. 2013. The use of alternative preference elicitation methods in complex discrete choice experiments. Journal of Health Economics 32: 1166-1179.