Decision, Risk & Operations Working Papers Series An Expectation-Maximization Algorithm to Estimate a General Class of Non-Parametric Choice Models G. van Ryzin and G. Vulcano December 2011 DRO-2011-07 An expectation-maximization algorithm to estimate a general class of non-parametric choice models Garrett van Ryzin ∗ Gustavo Vulcano † December 2011 Abstract We propose an approach for estimating customer preferences for a set of substitutable products using only readily-available sales transactions and product availability data. Our approach combines a general, non-parametric discrete choice model with a Bernoulli process of arrivals over time. The discrete choice model, which is compatible with any random utility model, is defined simply by a discrete probability mass function (pmf) on a given set of possible preference rankings of alternatives (including the no-purchase alternative). When faced with a given choice set, a customer is assumed to purchase the available option that ranks highest in her preference list – or does not purchase at all if no product ranks higher than the no-purchase alternative. The problem we address is how to jointly estimate the arrival rate of customers and the pmf of the rank-based choice model. We apply the EM method to solve this estimation problem and show that it leads to a simple and highly efficient estimation procedure. All limit points of the procedure are provably stationary points of the associated incomplete data log-likelihood function and the estimates produced are maximum likelihood estimates (MLE), so they inherit all the desirable statistical properties of a MLE. We also develop a market discovery mechanism that allows us to start with a parsimonious set of possible customer types and enlarge this initial set by progressively and automatically generating new types that provably improve the likelihood function. The approach is based on column generation ideas from mathematical programming and requires solving a mixed integer program (MIP) at each stage to identify new customer types to add. Our numerical experiments confirm the potential of our estimation approach to rapidly identify a good set of customer types and produce corresponding MLE estimates. Key words: Demand estimation, demand untruncation, choice behavior, EM method. ∗ † Graduate School of Business, Columbia University, New York, NY 10027, [email protected]. Leonard N. Stern School of Business, New York University, New York, NY 10012, [email protected]. 1 1 Introduction Demand estimation is a central task in retail operations and revenue management (RM). For simplicity, many demand models in practice assume each product receives an independent stream of demand. However, if products are substitutes then the demand for a given product will be a function of the set of alternatives available to consumers when they make their purchase decisions (their choice set). Such choice behavior is a topic of great interest among revenue management researchers and practitioners alike because it has obvious and important revenue consequences. Estimating demand is particularly challenging when product availability varies over time due to either inventory stockouts or deliberate availability controls applied by the seller. Two consequences of changing availability are the occurrence of spilled and recaptured demand. The former refers to the demand lost when a consumer’s first choice is unavailable. 
Recapture refers to demand for a product that results when customers substitute because their preferred product is unavailable. Both behaviors can complicate demand estimation.

Spilled demand could potentially be observed if a seller were to record all initial customer requests; however, this is rarely the case. In many physical retail stores, customer shopping is self-service (buying off a shelf) and there is no realistic opportunity for a seller to elicit customer preferences. Even when customers interact with a sales agent, other issues make eliciting preferences difficult. For instance, in the case of hotel operations, Queenan et al. [30] mention multiple availability inquiries from the same customer, incorrect categorization of turndowns by reservations agents, and customer requests that arrive through a channel not controlled by the firm. Clearly, if historical sales data were left uncorrected for spill, the true underlying demand would be underestimated. This bias contributes to the well-known spiral-down phenomenon, in which a firm's expected revenue decreases over time (see Cooper et al. [15]) due to a cycle of lower estimates of demand leading to progressively lower availability. Correcting this bias is important for preventing such spiral-down problems.

Recaptured demand due to substitution, in contrast, produces an increase in observed sales of alternative available products. Ignoring this behavior leads to an overestimation bias among the products that are available. For example, in retailing customers may buy an alternative color, size or flavor if their first preference is out of stock. A similar phenomenon occurs among airline passengers, who may buy a ticket on an alternative flight with a different departure time or routing. Empirical studies from different industries confirm that such substitution is significant. Gruen et al. [19] report recapture rates of 45% across eight categories at retailers worldwide, while for airline passengers recapture rates are acknowledged to be in the range of 15%-55% (see Ja et al. [22]).

Various statistical techniques have been proposed to correct for both spill and recapture effects. Collectively, these techniques are known as demand uncensoring methods. One of the most popular such methods is the expectation-maximization (EM) algorithm. The EM method works using alternating steps of computing conditional expected values of the parameter estimates based on the observable data to obtain an expected log-likelihood function (the E-step), and then maximizing this function to obtain improved estimates (the M-step). Traditionally, demand estimation methods that employ the EM approach have been limited to uncensoring the sales history of individual products and disregard choice behavior effects. Some recent exceptions in the literature are discussed below.

Estimating customer choice behavior first requires specifying a choice model, either parametric or nonparametric. Parametric models assume a specific functional form relating choice alternatives and product attributes to purchase probabilities. Given such a form, one then estimates the parameters of this functional form from data. One widely used example of a parametric model is the multinomial logit model (MNL) (e.g., see Ben-Akiva and Lerman [3] and Train [37]). A convenient aspect of the MNL model is that the likelihood of purchase can be readily recalculated if the mix of available related products changes (e.g., due to another item being sold out or restocked).
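To illustrate this convenience, the following minimal sketch recomputes MNL choice probabilities under two different offer sets. The utility values are hypothetical and purely illustrative; the sketch is not part of the estimation procedure developed in this paper.

```python
import math

def mnl_probabilities(utilities, offer_set):
    """MNL purchase probabilities over an offer set.

    `utilities` maps product id -> deterministic utility; product 0 is the
    no-purchase option. Probabilities re-normalize over whatever is offered,
    which is what makes recomputation under changing availability easy.
    """
    weights = {j: math.exp(utilities[j]) for j in offer_set}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

# Hypothetical utilities for products 1-3 plus the no-purchase option 0.
v = {0: 0.0, 1: 1.0, 2: 0.8, 3: 0.5}

print(mnl_probabilities(v, {0, 1, 2, 3}))  # full assortment
print(mnl_probabilities(v, {0, 2, 3}))     # product 1 sold out: mass shifts to 2, 3 and 0
```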
Yet one major disadvantage is that the MNL exhibits the independence from irrelevant alternatives (IIA) property, which can lead to unrealistic substitution patterns in cases where alternatives have shared attributes, despite promising results reported in the literature (e.g., see Vulcano et al. [39]). Other parametric models such as the mixed logit or nested logit model help overcome this IIA property (again, see Ben-Akiva and Lerman [3] and Train [37]). Still, the drawback of any parametric model is that one must make assumptions about the structure of preferences and the relevant attributes that influence it, which in turn requires significant expert judgement and extensive trial-and-error testing. There is also a balance between specification and estimation error when selecting a parametric model; a complex model with many attributes, latent classes and nests may be able to better approximate a wide range of choice behavior, but estimating the parameters of such a model from a finite set of data may lead to significant estimation error. A more parsimonious model, while potentially less faithful in terms of representing underlying choice behavior, is less prone to estimation error. The overall accuracy of a model in predicting demand requires the modeler to balance specification and estimation error. In this paper, we take a non-parametric approach. We use a simple, though quite general, non-parametric choice model, originally introduced in Mahajan and van Ryzin [25] and later used by van Ryzin and Vulcano [38] and more recently by Farias et al. [17]. In it, customer types are defined by their rank ordering of all alternatives (along with the no-purchase alternative). When faced with a choice, a customer is assumed to purchase the available product that ranks highest in her preference list – or she does not purchase at all if the no-purchase alternative ranks higher than any available product. With a finite number of alternatives, there are a finite number of rankings and hence a finite number of customer types. Demand is then described by a discrete pmf on the set of customer types. As pointed out by Mahajan and van Ryzin [25], several common demand processes studied in the literature can be modeled as special cases of this 3 choice model (e.g., MNL, Markovian second choice, universal backup, Lancaster demand, and the independent demand model). To complete the demand model, we assume customers arrive in time according to a Bernoulli process process with constant rate, which must be estimated along with the customer type pmf. Of course, the potential number of preference lists (customer types) in this rank-based choice model is exponential in the number of alternatives. Indeed, specifying the subset of relevant types to use in our model is analogous to the problem of specifying the form of model used in the parametric case and involves similar trade-offs; using the complete set of possible customer types provides maximum flexibility in modeling the choice behavior of a given population of customers, but the large number of parameters involved can lead to significant estimation error (over-fitting). In contrast, a more parsimonious set of customer types, while more restrictive in terms of representing choice behavior, will have fewer parameters to estimate and hence is less prone to estimation error. 
In addition, and consistent with realistic transactions data, it is often not possible to distinguish between a period with no arrival and a period with an arriving customer who decides not to purchase. This is particularly true in brick-and-mortar retailers, but also in online settings that record transactions rather than visits. Our procedure also corrects for this incompleteness in the data. The problem we address, in summary, is how to jointly estimate the arrival rate of customers and the pmf of rank-based choice model that best explains a set of observed transactions. The only required inputs are observed historical sales, product availability data, and a set of potential customer types. We apply the EM method to this model and show that it leads to an efficient, iterative procedure for estimating the demand parameters. All limit points of the algorithm are stationary points of the associated incomplete data log-likelihood function. Since our estimates are maximum likelihood estimates (MLEs), they inherit the desirable statistical properties of maximum likelihood estimation, i.e., they are consistent, asymptotically unbiased and asymptotically efficient (i.e., attain the Cramer-Rao lower bound for the variance, asymptotically). Since our MLE estimates for the pmf of types and arrival rate are computed using a given set of customer types, their overall quality depends on this input set. We assume initially the set of possible types is chosen exogenously by the modeler based on judgement and experience, but we subsequently develop a market discovery procedure for automatically augmenting an initial set of types following a column generation approach. Using our procedure, one can start from a very limited set of customer types (e.g., types based on the independent demand assumption) and successively add new types automatically that increase the likelihood function. Our numerical experiments show that after adding only a handful of new types, the likelihood function decreases rapidly and then exhibits only marginal improvements thereafter, confirming the practical potential of our procedure to reduce the inherent complexity of the rank-based choice model. 4 The remainder of this paper is organized as follows. In Section 2, we review the related literature. In Section 3 we introduce the demand model and the maximum likelihood estimation problem. Section 4 presents our EM approach. The market discovery mechanism is described in Section 5. Section 6 shows our numerical experiments, and we present our conclusions in Section 7. 2 Literature review Since the late 1990s, there has been growing interest in the operations and marketing science literature on estimating choice behavior and lost sales in the context of retailing. Anupindi et al. [2] present a method for estimating consumer demand when the first choice variant is not available. They assume a continuous time model of demand and develop an EM method to uncensor times of stock-outs for a periodic review policy, with the constraint that at most two products stock-out in order to handle a manageable number of variables. The authors find maximum likelihood estimates of arrival rates and substitution probabilities. Other papers in this area are Swait and Erdem [34], Borle et al. [5], Chintagunta and Dubé [13], Kalyanam et al. [23], Bruno and Vilcassim [7], and more recently, Musalem et al. [28]. However, they do not use EM as the estimation procedure. The paper by Kök and Fisher [24] is related to ours from a methodological standpoint. 
They develop an EM method for estimating demand and substitution probabilities under a hierarchical model of consumer purchase behavior at a retailer. This consumer behavior model is quite standard in the marketing literature; see e.g. Bucklin and Gupta [8], Chintagunta [12], and Campo et al. [9]. In their setting, upon arrival, a consumer decides: 1) whether or not to buy from a subcategory (purchase-incidence), 2) which variant to buy given the purchase incidence (choice), and 3) how many units to buy (quantity). Product choice is modeled with the MNL framework. Like in our work, they also analyze the problem at the individual consumer level, but in constrast they assume that the number of customers who visit the store without purchasing is negligible (see Section 4.3 therein). The outcome of the estimation procedure are estimates of the parameters of the incidence purchase decision, the parameters of the MNL model for the first choice, and the coefficients for the substitution matrix. Due to the complexity of the likelihood function, the EM procedure requires the use of non-linear optimization techniques in its M-step. Conlon and Mortimer [14] develop an EM algorithm to account for missing data in a periodic review inventory system under a continuous time model of demand, where for every period they attempt to uncensor the fraction of consumers not affected by stock-outs. Their aim is to demonstrate how to incorporate data from short-term variations in the choice set in order to identify substitution patterns, even when the changes to the choice set are not fully observed. A limitation of this work is that the E-step becomes difficult to implement when multiple products are simultaneously stocked-out since it requires estimating an exponential number of parameters 5 (see Conlon and Mortimer [14, Appendix A.2]). Interest in estimation and forecasting techniques has been growing in the RM research literature, which traditionally focused more on optimization of RM controls. Queenan et al.[30] review unconstraining methods for univariate demand models that are used commonly in RM practice. Talluri and van Ryzin [35, Section 5] develop an EM method to jointly estimate arrival rates and parameters of a MNL choice model based on consumer level panel data under unobservable no-purchases. Vulcano et al. [39] provide empirical evidence of the potential of that approach. Recently, Newman et al. [29] present a heuristic that accelerates the computational times of [35] but leads to estimates that are not MLEs. Ratliff et al. [31] provide a comprehensive review of the demand untruncation and choice behavior estimation literature in the context of RM settings. They also propose a heuristic to jointly estimate spill and recapture across numerous flight classes, by using balance equations that generalize the proposal of Anderson [1]. A similar approach was presented before by Ja et al. [22]. Talluri [36] proposes an alternative estimation method to the EM in [35]. It relies on a finitepopulation model and a heuristic that exploits the functional form of the MNL model, the variety of offer sets in a typical RM setting, and qualitative knowledge of arrival rates. He also provides experiments on synthetic data that show some promising features of the resulting estimates. Stefanescu [33] proposes a class of multivariate demand models that capture both time and product dimensions of demand correlation. 
The models use the multivariate normal distribution to account for product demand correlation and include a latent random term common to all time periods that induces intertemporal demand correlation over the selling horizon. She also develops a methodology for estimating the parameters of demand models from censored sales data, which is performed in a maximum likelihood framework using the EM method. Haensel and Koole [20] present an EM-based procedure to estimate a nonparametric choice model that is quite similar to ours, consisting also of a set of customer types that are defined by preference orderings. However, they do not explicitly take into account the incompleteness of the data with respect to arrival and no-arrival periods and instead use ad-hoc extrapolation to fill in for the lack of observations in a given time interval. The model for the arrivals is more general than ours; they assume a non-homogeneous Poisson arrival process for each type that follows an exponential arrival intensity over time, the parameters of which must be estimated. This leads to a more complex M-step in which one must solve multiple nonlinear optimization problems. Also different from our work, their model allows for multiple sales per period, which leads to a discrete Poisson model of demand that requires an approximate rounding procedure in the E-step. In a follow-up paper, Haensel et al. [21], the procedure is tested on real airline booking data, showing a slight overestimation bias. Lastly, the work does not address the important issue of how to efficiently construct the set of customer types used in the nonparametric choice model. 6 In our previous paper [40], we analyzed a model of demand that combines a MNL choice model with non-homogeneous Bernoulli arrivals over multiple periods. The problem addressed is how to jointly estimate the preference weights of the products and the arrival rates of customers when the availability of products changes over time. The key idea is to view the problem in terms of primary (or first-choice) demand and to treat the observed sales as incomplete observations of primary demand. The EM method is then applied to this primary demand model and it is shown that the algorithm shares the convergence properties of the procedure that we present here. Different from [40], in our new proposal we work on individual purchase records rather than on demand data aggregated over long periods (e.g., a day or a week). But the main difference in our work here is the generality of the choice model; indeed, the MNL model is just a special case of our nonparametric choice model. From a choice modeling point of view, our paper is closest to Farias et al. [17], who also proposes procedures for estimating a non-parametric choice model which is essentially identical to ours. However, the approach to estimation is very different. While we find maximum likelihood estimates of the choice demand model, Farias et al. [17] use a robust approach, finding the distribution that produces the worst-case revenue consistent with the observed data assuming a given fixed assortment. These robust revenues are computed approximately using constraint sampling. They then show that the demand model constructed by the robust approach is approximately the sparsest-choice model, where sparsity is measured by the number of customer types that occur with positive probability in the population. Farias et al. 
[17] show that the sparsity grows with the amount of observed data, but do not discuss statistical properties of the estimates obtained. Another related model in the context of assortment pricing is the rank pricing model of Rusmevichientong et al. [32], but their approach requires access to samples of entire customer preference lists which are unlikely to be available in real-world applications. Rank-based choice models can be used as input for revenue optimization routines. Such optimization routines were explored by Mahajan and van Ryzin [25] in the context of a retail assortment optimization problem, and by Zhang and Cooper [42], van Ryzin and Vulcano [38], Chen and Homem-de-Mello [11], and Chaneton and Vulcano [10] in the context of airline RM. 3 Model description A set of n substitutable products is sold over T purchase periods indexed by t = 1, 2, . . . , T . The full set of products is denoted N = {0, 1, . . . , n}, n ≥ 2, where 0 stands for the outside or no-purchase option. We assume customers arrive according to a discrete-time, homogeneous Bernoulli arrival process, in which in each time period t an arrival occurs with probability λ < 1, which we call the arrival rate. The parameter λ must be estimated from data. If λ << 1, this arrival process can be considered a discrete-time approximation to a Poisson arrival process. To avoid a limiting 7 and non interesting case, we assume there is at least one period with an observed transaction. In cases where the arrival rate may not be homogeneous throughout the selling horizon1 , one can partition time periods into multiple segments in such a way that within each segment the arrival rate is constant (e.g., see Chaneton and Vulcano [10, Section 6.2.3]). For ease of exposition, however, we will consider only the case where arrival rate is constant over time periods. Customers are assumed to have a rank-based preference relation for products. That is, each customer has a preference list (or total ranking), σ, of the products in N . Each preference list σ defines a customer type. A customer of type σ prefers product h to product j if and only if σ(h) < σ(j). The market, in turn, is assumed to be composed of a pre-defined set of customer types σ = {σ (1) , . . . , σ (N ) }. This set could perhaps be defined based on judgement and knowledge of the market or on surveys. Further, below we develop a procedure to “discover” new customer types that improve the likelihood function, which makes it possible to sequentially augment the set σ starting from an initial, minimal set of types. However, for now we assume the σ is given. Since the set of types used in the estimation model may differ from the true set of types present in the market, the set σ is a potential source of specification error in our model. Note that the meaningful rankings are only the ones truncated at the position of product 0. That is, a preference list σ will have as many elements as σ(0), where we will assume that σ(0) ≥ 2, since a customer type whose first preference is not to purchase anything is irrelevant. Considering all such meaningful rankings, the number of customer types in the market is at P n most |σ| ≤ n+1 i=1 i−1 (i − 1)! = O(n!). Arriving customers are assumed to be of type σ (i) with probability P(σ (i) ), i = 1, . . . , N . This probability mass function (pmf) for customer types is denoted P(σ), and it too must be estimated from the data. Customer types are assumed independent across periods. 
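To make the demand model concrete, the following sketch (in Python, purely for illustration) encodes two hypothetical customer types as truncated preference lists and simulates one period of the Bernoulli arrival process with the rank-based choice rule. The type set, pmf and arrival rate shown are arbitrary placeholders, not estimates.

```python
import random

# A customer type is a preference list: product ids in decreasing preference,
# truncated at the no-purchase option 0 (anything not listed is ranked below 0).
# The two types, their pmf and the arrival rate below are hypothetical placeholders.
types = [(1, 2), (1, 2, 3, 4, 5)]
type_pmf = [0.4, 0.6]      # P(sigma^(i))
lam = 0.7                  # Bernoulli arrival probability per period

def choice(pref_list, offer_set):
    """The most preferred offered product, or 0 (no purchase) if none is listed."""
    for j in pref_list:
        if j in offer_set:
            return j
    return 0

def simulate_period(offer_set, rng=random):
    """One period of the demand model: Bernoulli arrival, then rank-based choice."""
    if rng.random() >= lam:
        return None                                    # no arrival this period
    sigma = rng.choices(types, weights=type_pmf)[0]    # sample the arriving type
    return choice(sigma, offer_set)                    # 0 = arrival that does not buy

print(simulate_period({4, 5}))   # a type-(1,2) arrival would not buy; the other type buys 4
```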
Upon arrival, each customer is offered a set of products St ⊂ N , with 0 ∈ St , i.e. the no-purchase alternative is always assumed available. We will further assume that |St | ≥ 2, so that at least one product other than the no-purchase option is available in each period, else if St = {0}, then the period is not informative. An arriving customer chooses her most preferred product among those available in St . In symbols, if there is an arrival of type σ in period t, that customer chooses jt = arg minj∈St σ(j). We assume sales transactions are the only available date. That is, the only data available for each period are purchase incidence (i.e., a purchase was observed or not), the product purchased, and the set of products available during the period. Specifically, we assume data is not available on no-purchase outcomes. This is a fundamental incompleteness in the data which is a common in many retail settings. Our objective is to compute MLE estimates for the parameters θ = (P(σ), λ), for a given set of rankings σ. Note that it is not possible to precisely identify customer types or to distinguish a period without an arrival from a period in which a customer arrived but chose not to purchase. 1 For some empirical evidence in the airline industry, we refer the reader to Haensel and Koole [20]. 8 For each period with an observed purchase, the most we can say is that the customer’s choice is consistent with a subset of the possible types (i.e., those types who rank the purchased product higher than all other available products). For each period with no observed purchase, we can only say that if there was an arrival, the customer who arrived preferred the no-purchase alternative to any of the products offered. Hence, sales data provide only incomplete observations of the choice model we wish to estimate. More precisely, for jt ∈ St consider the set: Mt (jt , St ) = {i : σ (i) (jt ) < σ (i) (k), ∀k ∈ St , k 6= jt }. In words, Mt (jt , St ) is the set of customer types for which product jt (i.e., the product in St chosen in period t) is ranked highest among the available products in St . The definition also covers the case jt = 0, and the corresponding set Mt (0, St ). That is, Mt (jt , St ) is the set of customer types that is compatible with the observed transaction jt and the offer set St . To illustrate the model and definition, consider the following example: Illustrative example. The selling horizon has T = 10 periods and there are n = 5 products (plus the always-available no-purchase alternative 0). The market consists of two customer types with rank preferences σ (1) = (1, 2) and σ (2) = (1, 2, 3, 4, 5), respectively. For instance, if product indices follow an increasing price order, type 1 customers (price sensitive customers) prefer only the cheapest products 1 and 2, while type 2 customers (price insensitive customers) are willing to buy up to the most expensive product 5. Table 1 describes the parameters of our model. A label “Yes” in position (j, t) means that product j is available in period t. The table also reports the observed sales (a zero means no-purchase). Following this information, the sets Mt (jt , St ) of compatible customer types for each period that could be inferred from the previous data are given. Lastly, the table lists the unobservable data: the specific type of the customer that arrived in each period (or the no arrival incidence, represented by a zero). For instance, in period 1 a unit of product 1 was purchased. The offer set was S1 = {0, 1, 2, 3, 4}. 
Since product 1 is the top choice for both customer types, we cannot distinguish which type indeed arrived and purchased the product, i.e., M1(1, {0, 1, 2, 3, 4}) = {1, 2}. It turned out that in this period the (unobserved) arrival was of type 1.

In the second period, the full assortment was available, but no transaction was observed. No customer type is compatible with this outcome, since an arrival of either type 1 or type 2 would have resulted in a purchase of product 1. Therefore, M2(0, {0, 1, 2, 3, 4, 5}) = ∅. In this period, we can say with certainty that there was no arrival.

For period 3, product 4 was purchased. The only type compatible with this outcome is type 2, since it is the only type that ranks product 4 higher than not purchasing and higher than product 5, i.e., σ(2)(4) < σ(2)(0) and σ(2)(4) < σ(2)(5). Hence, M3(4, {0, 4, 5}) = {2}.

Period 8 shows yet a different situation. Product 3 is the only available one, and no transaction was observed. This could be because a type-1 customer arrived or because no customer arrived. (The latter happened in this case.) Note that we could not have had an arrival of a type-2 customer, since she would have purchased the product. So here, M8(0, {0, 3}) = {1}.

Table 1: Availabilities and purchases for the Illustrative Example

Observable data: availabilities and transactions

                            Period
  Product         1     2     3     4     5     6     7     8     9    10
  1              Yes   Yes   No    Yes   No    No    No    No    Yes   No
  2              Yes   Yes   No    No    No    Yes   No    No    Yes   Yes
  3              Yes   Yes   No    No    Yes   Yes   No    Yes   No    Yes
  4              Yes   Yes   Yes   No    No    No    No    No    Yes   No
  5              No    Yes   Yes   Yes   Yes   Yes   Yes   No    No    Yes
  Transactions    1     0     4     1     3     2     5     0     1     0

Compatible customer types

  Sets Mt(jt,St) {1,2}   ∅   {2}  {1,2}  {2}  {1,2}  {2}   {1}  {1,2}   ∅

Unobservable data

  Cust. types     1  No arr.  2     1     2     1     2  No arr.  2  No arr.

If we were able to observe the no-arrivals and the customer types when an arrival occurs, we could easily compute MLEs from the input data. For this particular case, we would come up with estimates λ̂ = 7/10, P̂(σ(1)) = 3/7, and P̂(σ(2)) = 4/7. However, the incompleteness in the data creates a challenging estimation problem.

Let P and P̄ be, respectively, the sets of periods with and without purchases. Let xi = P(σ(i)) denote the probability that a customer is of type i. The incomplete data log-likelihood function for our problem is then given by

\[
L_I(x, \lambda) = \sum_{t \in \mathcal{P}} \Big( \log \lambda + \log \sum_{i \in M_t(j_t, S_t)} x_i \Big)
+ \sum_{t \in \bar{\mathcal{P}}:\, M_t(0, S_t) \neq \emptyset} \log \Big( \lambda \sum_{i \in M_t(0, S_t)} x_i + (1 - \lambda) \Big)
+ \sum_{t \in \bar{\mathcal{P}}:\, M_t(0, S_t) = \emptyset} \log(1 - \lambda). \tag{1}
\]

The first term accounts for the likelihood of the observed transactions in the periods with purchases (e.g., periods 1, 3-7, and 9 in Table 1). The second term accounts for the no-purchase periods in which either an arriving customer preferred not to buy (i.e., customer types in Mt(0, St)) or no customer arrived at all (e.g., period 8 in Table 1). The third term accounts for the periods where none of the customer types would have picked a product from the available assortment, and hence |Mt(0, St)| = 0. This last case indicates with certainty that no arrival occurred (e.g., periods 2 and 10 in Table 1).

The MLE estimation problem can be formulated as follows:

\[
\max_{x, \lambda} \; L_I(x, \lambda) \quad \text{s.t.} \quad \sum_{i=1}^{N} x_i = 1, \quad x \geq 0, \quad 0 \leq \lambda \leq 1. \tag{2}
\]

This constrained, nonlinear optimization problem is hard to solve. In fact, one can show the objective function is not even quasi-concave in general, so we cannot preclude the possibility that there may be local optima.
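As a concrete check of (1), the following sketch (in Python, for illustration only; not the authors' implementation) evaluates the incomplete data log-likelihood on the data of Table 1, using the compatible sets Mt(jt, St) listed there.

```python
import math

# Compatible type sets M_t(j_t, S_t) and purchase indicators for the ten periods
# of the illustrative example (type indices 1 and 2, as in Table 1).
periods = [
    ({1, 2}, True), (set(), False), ({2}, True), ({1, 2}, True), ({2}, True),
    ({1, 2}, True), ({2}, True), ({1}, False), ({1, 2}, True), (set(), False),
]

def incomplete_loglik(x, lam):
    """Evaluate L_I(x, lam) of equation (1); x maps a type index to its probability."""
    ll = 0.0
    for M_t, purchase in periods:
        if purchase:                 # first term: periods with an observed sale
            ll += math.log(lam) + math.log(sum(x[i] for i in M_t))
        elif M_t:                    # second term: no sale, but an arrival is possible
            ll += math.log(lam * sum(x[i] for i in M_t) + (1.0 - lam))
        else:                        # third term: no sale and no arrival possible
            ll += math.log(1.0 - lam)
    return ll

# Evaluated at the complete-information estimates computed above.
print(incomplete_loglik({1: 3 / 7, 2: 4 / 7}, 0.7))
```

Evaluating (1) at a given point is thus straightforward; the difficulty lies in maximizing it over the feasible set.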
While this lack of quasi-concavity is not entirely surprising, we formalize the property in the following proposition:

Proposition 1 The function LI(x, λ) in the optimization problem (2) is not quasiconcave in general.

Proof. We exhibit an example where the objective function in (2) is not quasiconcave over the feasible set. Consider a case where there are three customer types and, given the product availability and transaction data, the objective function is

\[
L_I(x, \lambda) = 3\big(\log \lambda + \log x_1\big) + \log \lambda + \log(x_2 + x_3) + \log \lambda + \log x_3
+ 2 \log\big(\lambda (x_2 + x_3) + 1 - \lambda\big) + 100 \log\big(\lambda (x_1 + x_2) + 1 - \lambda\big) + 3 \log(1 - \lambda).
\]

Take the following two points (x, λ): the first one is (x^1, λ^1) = (0.08, 0.15, 0.77, 0.1), with LI(x^1, λ^1) = −12.07; the second one is (x^2, λ^2) = (0.07, 0.64, 0.29, 0.7), with LI(x^2, λ^2) = −16.747. If we take α = 0.4, we get an intermediate point (x^α, λ^α) = α(x^1, λ^1) + (1 − α)(x^2, λ^2) = (0.074, 0.444, 0.482, 0.46), with LI(x^α, λ^α) = −17.316, which is smaller than the value at either of the previous two points. Therefore, quasiconcavity is violated.

In order to simplify the solution of (2), we next utilize the EM method of Dempster et al. [16]. The EM method helps overcome two fundamental kinds of incompleteness in the likelihood function (1): 1) the fact that it is impossible to distinguish a period with no arrival from a period with an arrival but a no-purchase outcome; and 2) the fact that we do not know the customer type that arrives in each period.

4 The EM solution approach

The building block for our simplified approach to solving the estimation problem (2) is to consider the complete data log-likelihood function; that is, the likelihood function we would get if we were able to directly observe whether an arrival occurred as well as the precise type of each customer who did arrive. As shown below, under these assumptions, a reformulation of (1) turns into a trivial optimization problem. The EM algorithm exploits this simplification by using the complete data log-likelihood, but replacing the complete data with beliefs (expectations) about their values based on the incomplete data.

4.1 The complete data log-likelihood function

For the periods with no observed transactions, let at = 1 if there is an arrival in the period, and at = 0 otherwise. Note that at = 1 accounts for an arrival that either buys an outside option or does not buy at all. Define xt = P(σt), the probability that a type σt arrival occurs in period t. The complete data log-likelihood function is

\[
L(x, \lambda) = \sum_{t \in \mathcal{P}} \big(\log \lambda + \log x_t\big) + \sum_{t \in \bar{\mathcal{P}}} \big(a_t \log \lambda + a_t \log x_t\big) + \sum_{t \in \bar{\mathcal{P}}} (1 - a_t) \log(1 - \lambda). \tag{3}
\]

The first term in (3) accounts for the observed purchases; there is one term for every period in which there is a sale. The second term accounts for customer arrivals that do not purchase at all. The third term accounts for periods with no arrivals. Note that the function L is separable in x and λ. Our objective is to maximize L(x, λ) subject to the constraints in (2).

Define the function F(λ) as

\[
F(\lambda) = |\mathcal{P}| \log \lambda + \sum_{t \in \bar{\mathcal{P}}} a_t \log \lambda + |\bar{\mathcal{P}}| \log(1 - \lambda) - \sum_{t \in \bar{\mathcal{P}}} a_t \log(1 - \lambda).
\]

This function is globally concave in λ. Taking the derivative and setting it equal to zero, we obtain the unique maximizer

\[
\lambda^* = \frac{|\mathcal{P}| + \sum_{t \in \bar{\mathcal{P}}} a_t}{|\mathcal{P}| + |\bar{\mathcal{P}}|} = \frac{|\mathcal{P}| + \sum_{t \in \bar{\mathcal{P}}} a_t}{T},
\]

which is simply the number of arrivals divided by the total number of periods. Clearly, λ* satisfies 0 ≤ λ* ≤ 1.

Define

\[
H(x) = \sum_{i=1}^{N} m_i \log x_i, \quad \text{where} \quad m_i = \sum_{t \in \mathcal{P}} I\{\sigma_t = \sigma^{(i)}\} + \sum_{t \in \bar{\mathcal{P}}} a_t \, I\{\sigma_t = \sigma^{(i)}\}. \tag{4}
\]

That is, mi counts the number of occurrences of a type σ(i) arrival, both in periods with and without purchases.
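For concreteness, and assuming the complete data of the illustrative example in Section 3 (the "unobservable" row of Table 1), the sketch below computes λ* and the counts mi; it also evaluates the proportion maximizer x*i = mi / Σk mk, which is derived next. It is a minimal Python illustration, not part of the estimation procedure itself.

```python
from collections import Counter

# Complete data for the illustrative example: the type of the customer arriving
# in each period, or None when no customer arrived ("unobservable" row of Table 1).
arrivals = [1, None, 2, 1, 2, 1, 2, None, 2, None]

T = len(arrivals)
m = Counter(t for t in arrivals if t is not None)     # m_i: arrivals of each type, cf. (4)
num_arrivals = sum(m.values())

lam_star = num_arrivals / T                           # lambda* = (number of arrivals) / T
x_star = {i: m[i] / num_arrivals for i in sorted(m)}  # x_i* = m_i / sum_k m_k (derived next)

print(lam_star, x_star)                               # 0.7, {1: 0.428..., 2: 0.571...}
```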
For mi > 0, i = 1, . . . , N, the problem of maximizing H(x) can be posed as follows:

\[
\max_{x} \; \sum_{i=1}^{N} m_i \log x_i, \quad \text{s.t.} \quad \sum_{i=1}^{N} x_i = 1, \quad x_i \geq 0, \; \forall i.
\]

The KKT conditions give the unique maximizers

\[
x^*_i = \frac{m_i}{\sum_{k=1}^{N} m_k},
\]

which is simply the number of type i customers observed divided by the total number of customers observed.

4.2 The two main steps of the EM algorithm

The EM method works by starting with arbitrary initial estimates of the parameters θ̂ = (x̂, λ̂). These estimates are then used to compute the conditional expected value of L, E[L(θ) | θ̂] (the "E", expectation, step). Effectively, this replaces the missing data (i.e., the indicators at for the no-purchase time periods, and the number of occurrences of each preference list) by their expected values conditioned on the current estimates. The resulting expected log-likelihood function is then maximized to generate new estimates θ̂ (the "M", maximization, step), and the procedure is repeated until convergence. We next describe the two steps of each iteration.

4.2.1 The E-step

In the E-step, the unknown data values are the at for all t ∈ P̄. However, given current estimates θ̂ = (x̂, λ̂), we can determine the expected values of these arrival indicators. First, we use Bayes' theorem to update the probability mass function over the set of rankings:

\[
P(\sigma^{(i)} \mid j_t, S_t, \hat{\theta}) = \frac{P(j_t \mid \sigma^{(i)}, S_t, \hat{\theta}) \, P(\sigma^{(i)} \mid \hat{\theta})}{P(j_t \mid S_t, \hat{\theta})} = \frac{I\{i \in M_t(j_t, S_t)\} \, \hat{x}_i}{\sum_{h \in M_t(j_t, S_t)} \hat{x}_h}, \tag{5}
\]

where P(jt | σ(i), St, θ̂) in the numerator of (5) stands for the conditional probability of choosing product jt ∈ N. In particular, the case jt = 0 implements the update for the no-purchase probability.

Again using Bayes' theorem, we get the update for ât. As is implicit in (3), if t ∈ P, then ât = 1. For t ∈ P̄,

\[
\hat{a}_t = E[a_t \mid t \in \bar{\mathcal{P}}, \hat{\theta}] = P(a_t = 1 \mid t \in \bar{\mathcal{P}}, \hat{\theta}) = \frac{P(t \in \bar{\mathcal{P}} \mid a_t = 1, \hat{\theta}) \, P(a_t = 1 \mid \hat{\theta})}{P(t \in \bar{\mathcal{P}} \mid \hat{\theta})} = \frac{\hat{\lambda} \, P(0 \mid t, S_t, \hat{\theta})}{\hat{\lambda} \, P(0 \mid t, S_t, \hat{\theta}) + (1 - \hat{\lambda})}, \tag{6}
\]

where

\[
P(0 \mid t, S_t, \hat{\theta}) = \sum_{i \in M_t(0, S_t)} \hat{x}_i
\]

represents the conditional probability of a no-purchase. If Mt(0, St) = ∅, then P(0 | t, St, θ̂) = 0, and therefore ât = 0. This is the case when all products are offered and no transaction occurs, which reveals that no arrival occurred.

From (4), we get the estimates

\[
\hat{m}_i = E[m_i \mid \hat{\theta}] = \sum_{t \in \mathcal{P}} P(\sigma^{(i)} \mid j_t, S_t, \hat{\theta}) + \sum_{t \in \bar{\mathcal{P}}} \hat{a}_t \, P(\sigma^{(i)} \mid 0, S_t, \hat{\theta}). \tag{7}
\]

In words, m̂i represents the conditional expected number of occurrences of the preference list σ(i). Note that Σ_{i=1}^{N} m̂i = Σ_{t=1}^{T} ât. Substituting ât and m̂i into (3), we obtain the expected, complete data log-likelihood function

\[
E[L(x, \lambda) \mid \hat{\theta}] = \sum_{i=1}^{N} \hat{m}_i \log x_i + \Big(|\mathcal{P}| + \sum_{t \in \bar{\mathcal{P}}} \hat{a}_t\Big) \log \lambda + \Big(|\bar{\mathcal{P}}| - \sum_{t \in \bar{\mathcal{P}}} \hat{a}_t\Big) \log(1 - \lambda). \tag{8}
\]

This is the function to be maximized in the current iteration.

4.2.2 The M-step

In order to determine a maximizer θ̂* of (8), we use the results derived for (3) above. The function E[L(x, λ) | θ̂] is globally concave in (x, λ), separable in both variables, and has unique, closed-form maximizers:

\[
\hat{\lambda}^* = \frac{|\mathcal{P}| + \sum_{t \in \bar{\mathcal{P}}} \hat{a}_t}{T}, \quad \text{and} \quad \hat{x}^*_i = \frac{\hat{m}_i}{\sum_{k=1}^{N} \hat{m}_k}, \quad i = 1, \ldots, N. \tag{9}
\]

The simplicity of this maximization step is the most appealing feature of the EM algorithm.

4.3 Pseudocode

We next summarize the entire EM algorithm using pseudocode.

EM algorithm for estimating the rank-based choice model
[Input data]: Number of customer types N, availability information represented by sets St and transaction data jt, t = 1, . . . , T.
[Initialization]: Set xi := 1/N, and λ := 0.5.
Based on the availability information and transaction data, build the sets Mt (jt , St ) := {i : σ (i) (jt ) < σ (i) (k), ∀k ∈ St , k 6= jt }. Set at := 0, t = 1, . . . , T . Repeat Set mi := 0, xit := 0, i = 1, . . . , N, t = 1, . . . , T . [E-step]: For t := 1, . . . , T do [Update probabilities of customer types] For i ∈ Mt (jt , St ) do P Set xit := xi /( h∈Mt (jt ,St ) xh ). Endfor [Update estimates at ] If jt > 0 Set at := 1, Else (i.e., jt = 0) If Mt (0, St ) = ∅ Set at := 0, Else P P Set at := λ i∈Mt (0,St ) xi /(λ i∈Mt (0,St ) xi + (1 − λ)) 14 Endif Endif Endfor [Compute estimates for mi ] For i := 1 to N do For t := 1 to T do Set mi := mi + at xit Endfor Endfor [M-step]: For i := 1 to N do PT Set xi := mi / t=1 at . Endfor PT Set λ := t=1 at /T . Until Stopping criteria is met. A few remarks on the implementation are in order. The initialization of λ and xi , i = 1, . . . , N , is arbitrary; we merely need starting values different from zero. In our numerical experiments, we started from uniform values (i.e., λ = 0.5, xi = 1/N ). (See Section 6.2 for further discussion.) The stopping criteria can be based on various measures of numerical convergence, e.g., that the difference between all values xi and the value λ from two consecutive iterations of the algorithm is less than a small constant or that the Euclidean norm of the difference between two consecutive vectors (x, λ) is less than . Alternatively, we could set a maximum number of iterations. In all of our experiments we observed very fast convergence, so it would appear that the stopping criteria is not critical. 4.4 Convergence The concavity of the complete data log-likelihood function guarantees that our procedure is indeed an EM algorithm, as opposed to a so-called Generalized EM algorithm (GEM). In the case of GEM, the M-step only requires that we generate an improved set of estimates over the current ones (i.e., it requires to find improved estimates (x, λ) such that E[L(x, λ)|θ̂] ≥ E[L(x̂, λ̂)|θ̂]), and the conditions for convergence are more stringent (e.g., see McLachlan and Krishnan [27, Chapter 3] for further discussion.) The following result argues that since our EM method satisfies a mild regularity condition, the likelihood converges to a stationary value of the incomplete-data log-likelihood function (1): Theorem 1 The conditional expected value E[L(x, λ)|θ̂] in (8) is continuous both in (x, λ) > 0 and (x̂, λ̂) > 0, and hence all limit points of any instance {(x̂(k) , λ̂(k) ), k = 1, 2, . . .} of the EM algorithm are stationary points of the corresponding incomplete-data log-likelihood function LI (x, λ), and {LI (x̂(k) , λ̂(k) ), k = 1, 2, . . .} converges monotonically to a value L∗I = LI (x∗ , λ∗ ), for some stationary point (x∗ , λ∗ ). 15 Proof. The result follows from the fact that m̂i in (7) and ât in (6) are continuous in x̂ and λ̂ according to equation (5). Clearly, E[L(x, λ)|θ̂] is also continuous in (x, λ). Therefore, from Theorem 2 in Wu [41] (see also McLachlan and Krishnan [27, Theorem 3.2]) the EM algorithm, given the unique maximizer found in the M-step, indeed generates an implied sequence {(x̂(k) , λ̂(k) ), k = 1, 2, . . .}. As pointed out by Wu [41, Section 2.2], the convergence of {LI (x̂(k) , λ̂(k) )} to LI (x∗ , λ∗ ), for some stationary point (x∗ , λ∗ ), does not automatically imply the convergence of {(x̂(k) , λ̂(k) )} to a point (x∗ , λ∗ ). Convergence of the EM estimator in the later sense usually requires more stringent regularity conditions. 
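As a practical complement to this discussion, the sketch below is an illustrative Python transcription of the pseudocode in Section 4.3 (not the authors' MATLAB implementation); it can be used to monitor the convergence of the iterates numerically, here on the data of the illustrative example from Section 3.

```python
def compatible_types(types, j, offer_set):
    """Indices i in M_t(j, S_t): type i's most preferred available product is j
    (j = 0 means the type would not buy anything in the offer set)."""
    M = set()
    for i, pref in enumerate(types):
        top = next((p for p in pref if p in offer_set), 0)
        if top == j:
            M.add(i)
    return M

def em_estimate(types, offers, sales, max_iters=100000, tol=1e-4):
    """EM iterations of Section 4.3; offers[t] lists the products offered in
    period t (the no-purchase option 0 is implicit), sales[t] is the product
    sold in period t, with 0 meaning no observed sale."""
    T, N = len(sales), len(types)
    M = [compatible_types(types, sales[t], offers[t]) for t in range(T)]
    x, lam = [1.0 / N] * N, 0.5
    for _ in range(max_iters):
        m, a_sum = [0.0] * N, 0.0
        for t in range(T):
            mass = sum(x[i] for i in M[t])
            if sales[t] > 0:                       # observed purchase: a_t = 1
                a_t = 1.0
            elif M[t]:                             # no purchase, arrival possible
                a_t = lam * mass / (lam * mass + 1.0 - lam)
            else:                                  # no purchase, no arrival possible
                a_t = 0.0
            a_sum += a_t
            for i in M[t]:                         # E-step: m_i += a_t * posterior
                m[i] += a_t * x[i] / mass
        new_x = [m_i / a_sum for m_i in m]         # M-step, equation (9)
        new_lam = a_sum / T
        done = abs(new_lam - lam) < tol and all(
            abs(a - b) < tol for a, b in zip(new_x, x))
        x, lam = new_x, new_lam
        if done:
            break
    return x, lam

# Illustrative example of Section 3: two types, ten periods (see Table 1).
types = [(1, 2), (1, 2, 3, 4, 5)]
offers = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {4, 5}, {1, 5}, {3, 5},
          {2, 3, 5}, {5}, {3}, {1, 2, 4}, {2, 3, 5}]
sales = [1, 0, 4, 1, 3, 2, 5, 0, 1, 0]
print(em_estimate(types, offers, sales))
```

Conditions under which the sequence of parameter iterates itself is guaranteed to converge are examined next.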
Boyles [6] further investigates these requirements. In his Theorem 2, he states sufficient conditions under which the sequence of iterates (x̂(k) , λ̂(k) ) converges to a compact, connected component of stationary points of the incomplete data log-likelihood function with value L∗I . Most of the conditions are rather mild and are satisfied by our algorithm. The stringent condition is the one that requires (k+1) (k) θ̂ − θ̂ → 0, as k → ∞. (10) As Boyles [6] comments, it is difficult to verify (10) in general. But under this condition, the (k) sequence {θ̂ } will seek an isolated plateau of stationary points, and for k sufficiently large will not leap over valleys to neighboring plateaux. See also Wu [41, Theorem 5]. Nevertheless, as a practical matter, the convergence of the sequence of points {(x̂(k) , λ̂(k) )} can be checked numerically as part of the EM procedure. In our experiments reported in Section 6, we consistently observed that the sequence of points visited by EM converges to a limit point. Another caveat is the fact that the stationary point of LI (x, λ) is not guaranteed to be a global maximum, but this feature is also shared by any standard nonlinear optimization method working directly on the incomplete log-likelihood function. 5 A market discovery procedure Our EM algorithm guarantees convergence to a stationary value of the incomplete data loglikelihood function for a predefined set of customer types σ = {σ (1) , . . . , σ (N ) }. To keep the procedure tractable, this subset will typically have a small cardinality. But one might ask: is there an additional customer type σ (N +1) that, when added to the set σ, increases the likelihood of the observed data? We call this process of finding customer types that improve the likelihood the market discovery problem, and below we provide a procedure to solve it. 5.1 Finding relevant customer types P For the analysis below, let yt = i∈Mt (jt ,St ) xi be the aggregated likelihood of the customer types that would pick alternative jt in period t. We assume yt = 0 if Mt (0, St ) = ∅. Then, 16 rewrite the incomplete data log-likelihood function as a function of (y, λ). Define the matrix A ∈ RT ×N , with elements ati = 1 if i ∈ Mt (jt , St ) (here, jt could also be zero), and ati = 0 otherwise. Consider the problem: max LI (y, λ) y ,λ s.t.: 1 − N X xi ≥ 0, (11) i=1 Ax − y ≥ 0, 1 − λ ≥ 0, x, y, λ ≥ 0. Take a stationary point (x∗ , λ∗ ) of function (1) that satisfies the constraints in (2). Note that the P point (x∗ , y ∗ , λ∗ ), where yt∗ = i∈Mt (jt ,St ) x∗i , is a feasible and stationary point of problem (11). In particular, the first two inequalities are binding at (x∗ , y ∗ , λ∗ ). Define the Lagrangian φ(x, y, λ, σ, µ, γ) = LI (y, λ) + σ(1 − N X xi ) + µT (Ax − y) + γ(1 − λ). i=1 Let β ∗ , µ∗ , and γ ∗ be the dual variables corresponding to the first three constraints, respectively. Their values are still unknown, but conditioned on them, define the restricted Lagrangian G(x, y, λ) = φ(x, y, λ, β ∗ , µ∗ , γ ∗ ) = LI (y, λ) + β ∗ (1 − N X xi ) + µ∗ T (Ax − y) + γ ∗ (1 − λ). (12) i=1 The KKT necessary conditions for optimality of (11) are: ∇G(x, y, λ) = 0, σ(1 − N X (13) xi ) = 0, i=1 µt (AT t x − yt ) = 0, for all t, γ(1 − λ) = 0, σ, µ, γ ≥ 0, where At is the t-th row of matrix A. These conditions are satisfied at the point (x∗ , y ∗ , λ∗ , β ∗ , µ∗ , γ ∗ ), and can be used to find the specific values β ∗ , µ∗ and γ ∗ : • For t ∈ P: 1 ∂G(x, y, λ) = ∗ − µ∗t = 0, ∗ ∗ ∗ ∂yt yt (x,y ,λ)=(x ,y ,λ ) 17 and then µ∗t = 1 . 
yt∗ • For t ∈ P̄, Mt (0, St ) 6= ∅: λ∗ ∂G(x, y, λ) = − µ∗t = 0, ∂yt λ∗ yt∗ + 1 − λ∗ (x,y ,λ)=(x∗ ,y ∗ ,λ∗ ) and then µ∗t = λ∗ . λ∗ yt∗ + 1 − λ∗ • For t ∈ P̄, Mt (0, St ) = ∅: ∂G(x, y, λ) = −µ∗t = 0, ∂yt (x,y ,λ)=(x∗ ,y ∗ ,λ∗ ) and then µ∗t = 0. • The value β ∗ can be computed from the derivative with respect to any xi : ∂G(x, y, λ) = −β ∗ + ∂xi X µ∗t = 0, for all i. (14) t:i∈Mt (jt ,St ) • The value γ ∗ can be computed from the derivative with respect to λ: ∂G(x, y, λ) ∗ |P| = −γ + ∗+ ∂λ λ (x,y ,λ)=(x∗ ,y ∗ ,λ∗ ) X t∈P̄, Mt (0,St )6=∅ yt∗ − 1 − λ∗ yt∗ + 1 − λ∗ X t∈P̄, Mt (0,St )=∅ 1 = 0. 1 − λ∗ Going back to our Lagrangian in (12), we have G(x, y, λ) ≥ LI (y, λ), for all (x, y, λ) feasible of (11), while G(x∗ , y ∗ , λ∗ ) = LI (y ∗ , λ∗ ). Hence, if (x∗ , y ∗ , λ∗ ) is a maximizer of G(x, y, λ), it will also be a maximizer of LI (y, λ). Consider the gradient ∇G(x, y, λ) in (13), and focus again on the derivative (14). Suppose that we could find a new, augmented point (x, xN +1 ), with corresponding column for the matrix A, A(N +1) , such that ∂G(x, y, λ) = −β ∗ + ∂xN +1 X µ∗t > 0. (15) t:σ (N +1) (jt )<σ (N +1) (i), ∀i∈St ,i6=jt Then, customer type (N + 1) defines a direction of improvement for the reduced Lagrangian, and hence becomes a candidate for increasing the value of the incomplete data log-likelihood function. The condition (15) has an intuitive interpretation in terms of our problem: We are seeking a customer type for which the marginal increase in the likelihood, LI (x, λ), from shifting probability mass to the new type more than compensates for the marginal reduction, β ∗ , in likelihood due to taking away that probability mass from the current set of types. 18 5.2 Complexity of the discovery problem Determining such a customer type (N + 1) can be posed as an optimization problem: Given sets St and coefficients µ∗t , 1 ≤ t ≤ T , we are interested in maximizing the second term in (15). This problem is closely related to a single-machine scheduling problem with (n + 1) jobs of unit length (recall that there are n products plus the no-purchase option) called the linear ordering problem (e.g., see Martı́ and Reinelt [26]). An important difference between our formulation and the standard linear ordering problem in the literature is that in our case, the cost structure is not defined at the job-pair level (i.e., there is no explicit cost for having job i before job j), but rather it is defined at the bundle level (i.e., there is a cost for having a job scheduled earlier than a set of other jobs). In addition, here we are maximizing benefits rather than minimizing cost. The linear ordering problem is known to be NP-Hard in its general form, but given the particular structure of the variation that we are considering here, its complexity is worth verifying formally. The following proposition establishes that the NP-hard, maximum independent set problem (see Garey and Johnson [18])) can be reduced to (15), so that our market discovery problem is also NP-hard. Therefore, it is unlikely that there exists an efficient algorithm for solving it. Proposition 2 The optimization problem max T X µ∗t wt t=1 s.t.: wt = I{σ(jt ) < σ(i), ∀i ∈ St , i 6= jt }, t = 1, . . . , T, (16) is NP-Hard. Proof. Given a connected graph G = (V, E), with nodes V = {1, 2, . . . , v}, v ≥ 2, and arcs E ⊂ {(i, j) ∈ V × V, i < j}, a maximum independent set of G is a subset V 0 ⊂ V such that there is no arc between two nodes in V 0 , and V 0 is of maximum cardinality. Let I be an instance of maximum independent set. 
We will construct an instance J of problem (16) corresponding to I. There will be n = v products and T = v binary decision variables w1 , . . . , wv . The variable wi represents node i ∈ V . For each node i ∈ V , we consider the set of adjacent nodes Γ(i) = {j : (i, j) ∈ E}. Then, for every node j ∈ V , we define the S “transaction” jt := j and the availability set St := Γ(j) {i, 0}. Finally, we set the coefficients µ∗t := 1. Note that in the optimum of J, we cannot have wi0 = wj0 = 1, for an arc (i0 , j0 ) ∈ E. Arguing by contradiction: Take an arc (i0 , j0 ) ∈ E. If wi0 = 1, and since j0 ∈ Γ(i0 ), then σ(i0 ) < σ(j0 ). Similarly, if wj0 = 1, and since i0 ∈ Γ(j0 ), then σ(j0 ) < σ(i0 ), which gives 19 an inconsistency in the order σ. Therefore, the output of the optimization problem (16) for instance J is an independent set of maximum cardinality. 5.3 A MIP formulation to discover new types Finding a new customer type that increases the likelihood of the observed data reduces to finding a type that maximizes the sum in (15). Of course, a simple necessary condition for such a type P to exist is that Tt=1 µ∗t > β ∗ . If this condition is satisfied, then the problem of finding a new customer type may be formulated as a MIP. Define binary linear ordering variables xji , which are equal to 1 if product j goes before i in the sequence (i.e., preference list), and equal to zero otherwise. In addition, define binary variables wt , which are equal to 1 if the dual variable u∗t must be added to the sum in (15), and equal to zero otherwise. Overall, there are O(n2 + T ) binary variables. The MIP is as follows: max T X wt u∗t t=1 s.t.: xji + xij = 1, ∀j, i, 0 ≤ j < i ≤ n, xji + xil + xlj ≤ 2, ∀j, i, l, j 6= i 6= l, 0 ≤ j, i, l ≤ n n X xj0 ≥ 1, (17) j=1 wt ≤ xjt ,i , ∀i ∈ St , jt 6= i, and ∀t, xji , wt ∈ {0, 1}, 0 ≤ j < i ≤ n, 1 ≤ t ≤ T. The first set of equalities ensure that either product j goes before product i in the preference list, or product i goes before product j. The second constraints represent transitivity constraints that ensure a linear ordering between three products. The third constraint precludes the no purchase alternative from being the most preferred option. The fourth set of constraints guarantees that in period t, product jt must be preferred over all other available products in St for the proposed type to make a contribution to the objective function in period t. Note that wt = 1 only if xjt ,i = 1, for all i ∈ St , i 6= jt . If the optimal objective function value is bigger than σ ∗ , then we have found a customer type that will increase the value LI (x, λ). The position of product j in the new ranking, σ (N +1) (j), can be determined from: X σ (N +1) (j) = xij + 1. i∈N ,i6=j 5.4 Implementation The discovery mechanism starts from an initial set σ = {σ (1) , . . . , σ (N ) } of customer types. This initial set can be built upon an informed guess of the market composition. For instance, we can 20 assume an independent demand model, where we have a customer type σ (jt ) = (jt ) for each of the transactions jt we observe in the market, leading to N ≤ n singleton types. Then, we apply our EM algorithm, and get initial proportions xi , 1 ≤ i ≤ N . In our experience, in order to ensure numerical stability of the procedure, we run a filtering procedure that eliminates the customer types with very low values of xi (i.e., those less relevant in the market). In our examples, we set this tolerance value at 1e-4. 
The procedure also involves P 0 0 readjusting the remaining proportions xi so that N i=1 xi = 1, where N ≤ N . Next, we apply our market discovery proposal using the MIP formulation (17), followed by the filtering procedure to discard irrelevant customer types. This sequence of discovery-filter steps is repeated until a convergence criteria (e.g., the difference of the log-likelihood values of two consecutive iterations) or a maximum number of new types is reached. 6 Numerical examples We next present the results of some tests of our EM algorithm on both simulated and real-world data sets. In all the examples, we set a stopping criteria based on the difference between the vectors (x̂, λ̂) produced by two consecutive iterations of the EM method, halting the procedure as soon as the absolute value of all the elements of the difference vector was smaller than 1e-4. We also set a maximum number of iterations at 1e7. In all the experiments below we verified the numerical convergence of the of the EM iterates to a limit point, though such convergence in theory is not guaranteed. The algorithm was implemented using the MATLAB2 procedural language, in which the method detailed in Section 4.3 is simple to code. 6.1 Comparison with respect to direct MLE Our first experiment is on simulated data, in which we know the underlying set of types that generated the data. We also assume complete information in the sense that the modeler knows the description of the customer types existing in the market, though he does not know the proportions of each type nor the overall volume of demand. The objective of the experiments is to assess the performance of our EM method relative to the direct maximization of the incomplete data log-likelihood function (labeled Direct Max in what follows). For the latter, we used the built-in MATLAB function “fmincon”, which finds a constrained minimum of a function of several variables, in our case subject to linear constraints and 0-1 bounds for the decision variables (x, λ). We set sequential quadratic programming (SQP) as the algorithm to be used by the MATLAB function. SQP closely mimics Newton’s method for constrained optimization just as is done for unconstrained optimization. At each major iteration, an approximation is 2 MATLAB is a trademark of The MathWorks, Inc. We used version 7.10.0.499 (R2010a) for Microsoft Windows 7 on a CPU with Intel Core i7 processor and 4Gb of RAM. 21 made of the Hessian of the Lagrangian function using a quasi-Newton updating method. This is then used to generate a quadratic programming subproblem whose solution is used to form a search direction for a line search procedure. We set both the iteration and function evaluation limits at 1,000, and the tolerance (i.e., the difference between two consecutive function values) at 1e-3. The starting point for both EM and Direct Max was the same: xi = 1/N , λ = 0.5. We fixed n = 10 and the arrival rate at λ ∈ {0.2, 0.8}. We generated between 100 and 500 instances of transaction data for different combinations of length of selling season T ∈ {1000, 2000, 4000, 8000}, and number of customer types N ∈ {30, 50, 100, 130, 150, 200}. The types themselves were generated by sampling random permutations of the n products along with the no-purchase alternative. For any instance, each product was available randomly and independently in each period with probability 0.7. Then, based on the true customer types, we apply our EM and Direct Max to maximize the incomplete data log-likelihood function. 
Table 2 and Table 3 summarize the results, fixing the values provided by Direct Max as the baseline. We report the difference between the log-likelihood values and root mean squared errors (RMSEs)3 of our EM algorithm with respect to Direct Max; desirable outcomes are positive values in the log-likelihood difference, and negative values in the RMSE difference. The results show a slight dominance of EM on the final log-likelihood values and on the RMSEs. However, the differences are small in all cases (note that zero is included in 16 of the 48 reported 95% CIs for the mean RMSE difference). The main substantial improvement provided by our EM method is the computational time: EM is generally 10 times faster than Direct Max, and this factor may scale up to 20 in some cases. If one thinks of a major carrier in an airline revenue management setting estimating hundreds of thousands of O-D pairs on a daily or even more frequent basis, such differences in computing times are indeed significant. 6.2 The effect of the starting point on EM Given the non-quasiconcavity of the incomplete data log-likelihood function identified in Proposition 1, convergence of the EM method could well be affected by the starting point. So we next study the effect that different starting points have on the performance of the EM method.4 To this end, we consider again the setting in Section 6.1, but here we generate a single random instance for each combination (T, N ). Then, for each such instance, we solved the problem starting from 50 randomly generated starting points. We summarize the results for the log-likelihood and RMSE values obtained in Table 4 for the low demand case (λ = 0.2) and in Table 5 for the high demand case (λ = 0.8). We observe that in general, there is little variability with respect to the level set of the log3 The RMSE measures the difference between the total observed and the expected number of purchases throughout the horizon, given the varying availability of products. 4 Recall that, as we discussed in Section 4.4, the local convergence property of the EM method is also shared by any standard steepest ascent nonlinear optimization algorithm. 22 Table 2: Comparative results with respect to Direct Max (n = 10, λ = 0.2). Results are 95% CI for the mean values. Times are in seconds. T N Log value diff. Center Width RMSE diff. 
Table 2: Comparative results with respect to Direct Max (n = 10, λ = 0.2). Results are 95% CIs (center, width) for the mean values; times are in seconds.

T      N     Log value diff.     RMSE diff.        Direct Max time    EM time
             Center    Width     Center    Width   Center    Width    Center   Width
1000   30     0.00%    0.00%     -0.40%    0.24%     0.92     0.02     0.14    0.01
1000   50     0.01%    0.00%     -0.93%    0.33%     1.70     0.03     0.22    0.00
1000   100    0.01%    0.00%     -0.15%    0.32%     4.09     0.07     0.44    0.01
1000   130    0.01%    0.00%     -0.81%    0.36%     5.92     0.11     0.58    0.02
1000   150    0.01%    0.00%     -0.76%    0.40%     6.93     0.11     0.67    0.02
1000   200    0.01%    0.00%     -0.90%    0.37%    10.66     0.18     0.92    0.03
2000   30     0.00%    0.00%     -0.34%    0.38%     1.96     0.03     0.26    0.01
2000   50     0.01%    0.00%     -0.53%    0.41%     3.73     0.05     0.45    0.02
2000   100    0.01%    0.00%     -0.05%    0.20%     8.89     0.13     0.92    0.03
2000   130    0.02%    0.00%     -0.67%    0.42%    12.20     0.18     1.17    0.04
2000   150    0.02%    0.00%     -0.76%    0.45%    14.66     0.20     1.41    0.04
2000   200    0.02%    0.00%     -0.81%    0.48%    21.89     0.30     1.98    0.08
4000   30     0.01%    0.00%     -0.26%    0.44%     4.13     0.06     0.63    0.03
4000   50     0.01%    0.00%     -0.42%    0.48%     8.07     0.10     1.03    0.00
4000   100    0.02%    0.00%     -0.14%    0.56%    19.79     0.25     2.05    0.06
4000   130    0.02%    0.00%     -0.55%    0.60%    27.49     0.26     3.05    0.10
4000   150    0.02%    0.00%     -0.74%    0.54%    32.46     0.34     3.46    0.10
4000   200    0.03%    0.00%     -0.61%    0.62%    47.44     0.46     4.85    0.12
8000   30     0.00%    0.00%     -0.06%    0.70%     9.07     0.09     1.29    0.07
8000   50     0.01%    0.00%     -1.03%    0.59%    17.60     0.20     2.19    0.07
8000   100    0.02%    0.00%     -1.03%    0.30%    43.96     0.40     4.34    0.14
8000   130    0.03%    0.00%     -0.92%    0.82%    57.93     0.67     6.45    0.19
8000   150    0.03%    0.00%     -0.77%    0.76%    71.76     0.72     7.45    0.24
8000   200    0.03%    0.00%     -0.08%    0.71%    99.38     1.05     8.85    0.23

6.3 The effect of incomplete market information

Here, we assess the value of market information, in the sense of knowing (or not knowing) the set of customer types that make up the market. To this end, we generate random instances of transactions as described in Section 6.1, but with the probability of a product being available in each period equal to 0.5. The key distinction is that in these computations the modeler does not know the description of all types; that is, the estimation is performed with a set of types that differs from the set of types that generated the data. The aim is to understand how misspecification of the set of customer types affects the performance of the estimation.
Table 3: Comparative results with respect to Direct Max (n = 10, λ = 0.8). Results are 95% CIs (center, width) for the mean values; times are in seconds.

T      N     Log value diff.     RMSE diff.        Direct Max time    EM time
             Center    Width     Center    Width   Center    Width    Center   Width
1000   30     0.01%    0.00%     -0.28%    0.20%     1.27     0.01     0.15    0.00
1000   50     0.02%    0.00%     -0.57%    0.24%     2.68     0.02     0.24    0.00
1000   100    0.03%    0.00%     -0.77%    0.29%     7.10     0.04     0.48    0.01
1000   130    0.04%    0.00%     -0.45%    0.66%    10.30     0.13     0.61    0.02
1000   150    0.04%    0.00%     -1.37%    0.70%    13.10     0.18     0.74    0.02
1000   200    0.05%    0.00%     -1.47%    0.74%    20.04     0.24     0.98    0.03
2000   30     0.01%    0.00%     -0.10%    0.26%     2.47     0.01     0.27    0.01
2000   50     0.02%    0.00%      0.16%    0.32%     5.27     0.03     0.46    0.01
2000   100    0.03%    0.00%     -0.74%    0.36%    14.84     0.08     0.98    0.01
2000   130    0.04%    0.00%     -1.07%    0.80%    20.75     0.21     1.26    0.03
2000   150    0.05%    0.00%     -1.03%    0.79%    25.60     0.22     1.53    0.04
2000   200    0.06%    0.00%     -2.20%    0.78%    40.75     0.50     2.17    0.06
4000   30     0.04%    0.00%     -0.45%    0.30%    10.30     0.06     0.61    0.01
4000   50     0.04%    0.00%     -1.37%    0.31%    13.10     0.08     0.74    0.01
4000   100    0.05%    0.00%     -1.47%    0.33%    20.04     0.11     0.98    0.01
4000   130    0.05%    0.00%     -1.11%    1.26%    39.66     0.36     2.82    0.08
4000   150    0.05%    0.00%     -1.61%    1.32%    46.21     0.43     3.46    0.10
4000   200    0.06%    0.00%     -2.02%    1.33%    74.03     0.71     4.42    0.11
8000   30     0.04%    0.00%     -1.07%    0.36%    20.75     0.09     1.26    0.01
8000   50     0.05%    0.00%     -1.03%    0.35%    25.60     0.10     1.53    0.02
8000   100    0.06%    0.00%     -2.20%    0.35%    40.75     0.22     2.17    0.03
8000   130    0.05%    0.00%      1.49%    2.06%    87.79     0.78     5.38    0.20
8000   150    0.06%    0.00%      2.25%    2.46%   108.93     0.96     6.51    0.25
8000   200    0.07%    0.00%      2.27%    2.06%   173.07     1.50     8.00    0.31

We consider two cases. In the first, the modeler assumes an underlying independent demand model, represented by preference lists of size one: for n = 10, there are ten customer types, each a singleton of the form σ(k) = (k), k = 1, . . . , 10. In the second case, the modeler assumes the independent demand model (i.e., the ten previous types) plus 10 additional customer types, corresponding to the ten most probable types among the randomly generated N true types. The rationale is that the modeler may not be aware of the fine granularity of the market composition, but is perhaps aware of the most common customer types in the market.

Table 6 summarizes the results for a low demand intensity scenario (λ = 0.2) and a high demand intensity scenario (λ = 0.8). In this experiment, we conducted an out-of-sample test: the value of T reported in Table 6 corresponds to the in-sample (training) phase, in which we estimate the parameters.
Table 4: Performance of EM under different starting points (n = 10, λ = 0.2).

T      N        Log value                                     RMSE
             Min         Max         Mean        St. Dev.   Min     Max     Mean    St. Dev.
1000   30    -836.29     -836.25     -836.26     0.01       1.25    1.27    1.26    0.00
1000   50    -838.58     -838.48     -838.52     0.02       1.18    1.23    1.20    0.01
1000   100   -881.82     -881.77     -881.80     0.01       1.14    1.23    1.19    0.02
1000   130   -849.13     -848.98     -849.03     0.03       2.00    2.12    2.06    0.03
1000   150   -771.51     -771.44     -771.47     0.02       1.23    1.41    1.33    0.03
1000   200   -891.27     -891.02     -891.11     0.06       2.11    2.18    2.14    0.02
2000   30    -1655.11    -1654.98    -1655.01    0.03       3.15    3.19    3.18    0.01
2000   50    -1679.00    -1678.86    -1678.90    0.03       2.10    2.18    2.14    0.02
2000   100   -1710.47    -1710.13    -1710.24    0.08       1.41    1.48    1.44    0.02
2000   130   -1755.93    -1755.39    -1755.52    0.09       1.39    1.47    1.44    0.02
2000   150   -1870.90    -1870.52    -1870.69    0.10       2.24    2.37    2.32    0.02
2000   200   -1703.58    -1703.19    -1703.38    0.11       1.52    1.67    1.58    0.03
4000   30    -3413.08    -3412.87    -3412.94    0.05       1.71    1.94    1.79    0.05
4000   50    -3550.82    -3550.28    -3550.45    0.11       3.51    3.79    3.68    0.04
4000   100   -3497.67    -3496.89    -3497.14    0.19       2.27    2.54    2.39    0.04
4000   130   -3495.93    -3494.95    -3495.32    0.26       1.64    1.81    1.72    0.03
4000   150   -3361.43    -3360.62    -3360.97    0.20       1.89    2.03    1.96    0.03
4000   200   -3398.68    -3397.85    -3398.26    0.17       2.17    2.42    2.29    0.06
8000   30    -6778.86    -6778.63    -6778.73    0.04       6.07    6.23    6.13    0.04
8000   50    -6900.93    -6899.96    -6900.37    0.24       2.88    3.27    3.06    0.08
8000   100   -6895.88    -6894.53    -6894.96    0.33       2.56    2.85    2.68    0.06
8000   130   -6974.29    -6972.09    -6972.81    0.46       3.30    3.65    3.54    0.06
8000   150   -7054.32    -7052.29    -7053.22    0.49       2.37    2.62    2.50    0.06
8000   200   -7109.21    -7106.18    -7107.10    0.56       1.97    2.41    2.19    0.10

Then, we apply those parameter estimates to 100 randomly generated instances from the true underlying demand model over 2,000 additional, out-of-sample periods. The baseline for the RMSE is the one obtained under the true market description. We report 95% CIs for the mean RMSE differences obtained when using the 20 types (labeled "choice under incomplete information") and when using only the 10 singleton types (labeled "independent demand"); in both cases, the estimates were computed using EM.

We observe substantially different behavior in the two demand scenarios. In the low intensity case, the choice model under incomplete information and the independent demand model perform equally well relative to the true underlying demand model: the choice-based estimation is off by around 50%, while the independent demand model is off by 52%. So in this case, augmenting the independent demand model with additional customer types provides minimal benefit. In contrast, in the high intensity case the difference between the two demand models is dramatic: the RMSE is on average 430% higher than under the true underlying demand model for the choice-based model, and on average 1427% higher for the independent demand model.
Table 5: Performance of EM under different starting points (n = 10, λ = 0.8).

T      N        Log value                                         RMSE
             Min          Max          Mean         St. Dev.   Min     Max     Mean    St. Dev.
1000   30    -1884.50     -1884.13     -1884.19     0.05       2.84    3.09    2.99    0.03
1000   50    -1997.04     -1996.62     -1996.81     0.11       2.34    2.49    2.40    0.04
1000   100   -1972.00     -1971.04     -1971.35     0.17       1.97    2.32    2.22    0.06
1000   130   -1975.68     -1975.02     -1975.27     0.15       0.94    1.15    1.07    0.04
1000   150   -1974.82     -1974.01     -1974.40     0.19       1.26    1.36    1.31    0.02
1000   200   -1980.70     -1980.10     -1980.42     0.13       0.80    0.94    0.87    0.03
2000   30    -3831.58     -3831.18     -3831.40     0.09       2.41    2.62    2.51    0.04
2000   50    -3932.34     -3930.84     -3931.30     0.24       2.83    3.17    3.05    0.06
2000   100   -3939.98     -3938.27     -3938.97     0.45       1.48    1.76    1.65    0.06
2000   130   -3969.46     -3967.58     -3968.35     0.42       2.70    3.08    2.91    0.08
2000   150   -3984.72     -3982.70     -3983.59     0.50       1.66    1.96    1.85    0.08
2000   200   -4009.07     -4006.75     -4007.73     0.51       1.75    2.14    1.96    0.07
4000   30    -7884.72     -7880.98     -7881.30     0.54       4.83    5.63    5.37    0.11
4000   50    -7855.22     -7853.69     -7854.16     0.30       5.57    6.33    5.92    0.15
4000   100   -8015.32     -8011.10     -8012.61     0.93       4.80    5.62    5.00    0.15
4000   130   -8046.40     -8041.32     -8043.12     1.05       3.30    3.78    3.56    0.11
4000   150   -8025.75     -8020.91     -8022.95     1.08       3.55    4.17    3.91    0.12
4000   200   -7973.06     -7966.80     -7969.40     1.57       2.30    3.03    2.76    0.12
8000   30    -15038.08    -15037.11    -15037.60    0.28       5.46    6.10    5.84    0.14
8000   50    -15700.30    -15694.79    -15696.06    1.02       3.41    4.77    4.03    0.26
8000   100   -16171.93    -16163.55    -16167.05    2.04       2.25    3.41    2.71    0.25
8000   130   -16119.54    -16111.52    -16115.07    2.01       1.60    2.54    2.15    0.22
8000   150   -15891.41    -15881.56    -15886.66    2.16       2.56    3.66    3.20    0.24
8000   200   -16126.69    -16113.81    -16119.09    2.80       1.88    3.44    2.55    0.27

In summary, for both intensity cases, choice estimation under incomplete information works better than the independent demand assumption, but the difference is minimal in the low demand case, where few transactions are observed throughout the horizon and hence knowing the preference order of the customers is less relevant. In the high intensity case, many transactions are observed, and understanding the market composition is critical.

Table 6: RMSE comparison between choice under incomplete information and the independent demand assumption. The baseline is the RMSE obtained under the true underlying choice model. Results are 95% CIs (center, width) for the mean difference.

                     λ = 0.2                                       λ = 0.8
T      N     Choice under incomp. inf.  Independent demand   Choice under incomp. inf.  Independent demand
             Center     Width           Center     Width     Center      Width          Center       Width
1000   30    48.55%     9.97%           48.99%     10.00%    183.06%     23.71%         1386.13%     91.20%
1000   50    49.18%     9.73%           51.16%     10.34%    351.59%     30.36%         1451.24%     81.16%
1000   100   51.07%     10.93%          52.70%     11.36%    494.88%     39.95%         1441.92%     79.29%
1000   130   48.29%     10.51%          49.60%     10.49%    532.44%     38.07%         1472.40%     72.94%
1000   150   51.54%     9.42%           51.82%     9.52%     536.07%     49.08%         1429.45%     87.92%
1000   200   51.42%     10.87%          52.55%     11.38%    566.19%     49.29%         1480.30%     86.71%
2000   30    45.98%     8.80%           50.52%     10.08%    164.33%     17.90%         1320.58%     77.47%
2000   50    57.98%     9.05%           62.27%     9.71%     312.61%     30.18%         1386.60%     73.44%
2000   100   45.56%     8.50%           46.99%     9.11%     480.03%     48.66%         1481.61%     91.22%
2000   130   44.37%     11.83%          45.56%     11.89%    561.12%     57.32%         1483.49%     92.93%
2000   150   48.54%     9.21%           49.89%     9.46%     492.07%     40.71%         1388.06%     74.19%
2000   200   57.45%     11.62%          58.24%     11.85%    568.76%     39.68%         1490.92%     78.70%
4000   30    48.88%     12.49%          51.35%     11.22%    185.93%     18.88%         1455.95%     76.90%
4000   50    53.38%     10.41%          56.24%     11.63%    312.20%     29.37%         1430.55%     73.07%
4000   100   55.07%     12.84%          56.27%     13.42%    464.25%     36.53%         1409.72%     76.61%
4000   130   60.26%     12.69%          62.92%     13.75%    456.78%     32.98%         1370.59%     69.26%
4000   150   53.56%     10.52%          53.68%     10.61%    538.68%     46.69%         1497.11%     89.16%
4000   200   43.43%     9.03%           43.66%     8.92%     554.84%     48.00%         1452.82%     81.01%
8000   30    49.16%     12.10%          48.77%     12.07%    170.84%     20.84%         1299.80%     74.77%
8000   50    41.37%     10.12%          47.93%     10.94%    292.61%     30.75%         1362.90%     71.43%
8000   100   51.21%     11.60%          53.24%     12.42%    475.71%     41.94%         1398.92%     76.16%
8000   130   58.75%     12.57%          59.80%     12.26%    526.16%     50.92%         1460.88%     90.99%
8000   150   52.58%     10.70%          55.31%     11.54%    541.85%     41.34%         1462.92%     89.47%
8000   200   49.73%     10.31%          52.05%     11.07%    564.84%     47.59%         1449.15%     78.42%

6.4 The effect of discovering new types

In this section, we test the market discovery mechanism proposed in Section 5. The procedure is initialized from the observed transactions under an independent demand model: we add a customer type σ(jt) = (jt) for each transaction jt observed during the horizon. Using this set of independent demand types as a starting point, we then generate new types with the MIP-based procedure described in Section 5. The MIP for the discovery mechanism was implemented using the MATLAB interface to IBM ILOG CPLEX v12.3.
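The initialization just described effectively introduces one singleton type per distinct product observed in the transactions. A minimal MATLAB sketch, with example data and illustrative variable names, is:

    % Initialize the type set with one singleton per product observed in the data.
    sales = [3 0 1 3 0 7 0 1]';              % example: sales(t) = product sold in period t, 0 = no purchase
    sold = unique(sales(sales > 0));         % distinct products appearing in the transactions
    types0 = arrayfun(@(j) [j 0], sold, 'UniformOutput', false);   % singleton list (j), then no-purchase
    x0 = ones(numel(types0),1) / numel(types0);                    % initial (uniform) proportions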
Figure 1 illustrates an example of the evolution of the procedure on five random instances of data. The initial set of customer types is built using the independent demand assumption from the observed transactions, as described above, and the performance of this initial set corresponds to the first data point of each series (zero on the horizontal axis). The graphs show the value of the incomplete data log-likelihood function (left) and the RMSE (right) as a function of the number of types discovered by our mechanism. In this case, we stopped after seven new types were added to the original set, although for one of the instances the mechanism could only find two new types. We observe that the beneficial effect of new types tends to decrease in the number of types considered, although not monotonically.

[Figure 1 here. Left panel: "Log-likelihood Value for a Market Discovery Example" (log-likelihood value vs. number of new customer types); right panel: "RMSE for a Market Discovery Example" (RMSE vs. number of new customer types).]

Figure 1: An example based on synthetic data generated for n = 10 products, N = 5 customer types, T = 1000 periods, and λ = 0.7 arrivals per period. Each product is available in each period with probability 0.5. The initial customer types are defined by the independent demand assumption built upon the observed transactions.
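Each discovery step is followed by the filtering and renormalization described at the beginning of this section: types with very low estimated probability are dropped and the remaining proportions are rescaled to sum to one. A minimal MATLAB illustration, with made-up proportions and an illustrative threshold, is:

    % Filter negligible types and renormalize the remaining proportions.
    x = [0.42 0.00003 0.31 0.00001 0.26996]';   % example estimated type proportions
    keep = x > 1e-4;                            % drop types with negligible probability (illustrative threshold)
    x = x(keep) / sum(x(keep));                 % remaining proportions again sum to 1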
Table 7 reports the performance of our market discovery mechanism on randomly generated instances of data. For these experiments, the probability of a product being available in each period is 0.5. The baselines are the true underlying demand model used to generate the data (30 instances for each experiment) and the independent demand model built from the observed transactions. Desirable outcomes are positive log-value differences and negative RMSE differences, meaning that our market discovery mechanism was able to increase the log-likelihood values and reduce the RMSEs.

Table 7: Effect of the application of the market discovery mechanism (n = 10, T = 1000). Comparative results with respect to the true underlying demand model and the independent demand model. Results are 95% CIs (center, width) for the mean difference.

                 Log-value difference                               RMSE difference
λ      N     True demand model     Independent demand model    True demand model     Independent demand model
             Center     Width      Center     Width            Center      Width     Center      Width
0.2    5     11.42%     10.75%     17.21%     10.07%           -61.08%     9.80%      38.43%     78.19%
0.2    10     5.83%      6.37%      8.78%      6.18%           -65.41%     5.65%      41.33%     23.72%
0.2    15    12.74%     10.59%     15.02%     10.32%           -60.17%     12.04%     37.37%     33.79%
0.5    5      4.08%      6.48%     14.04%      5.89%           -58.32%     8.59%     -78.80%     5.33%
0.5    10    24.17%     15.23%     29.13%     14.24%           -72.45%     8.53%     -86.04%     3.56%
0.5    15    44.35%     17.71%     46.52%     17.02%           -83.43%     6.15%     -88.65%     4.17%
0.8    5      7.13%      9.04%     25.76%      7.43%           -47.23%     9.22%     -94.07%     1.10%
0.8    10    10.63%     10.84%     23.05%      9.41%           -53.78%     9.09%     -94.69%     0.97%
0.8    15    23.85%     15.29%     32.54%     13.56%           -60.55%     10.30%    -95.97%     0.93%

Note that our procedure is able to match (or even beat) the performance of the true underlying demand model in terms of both the log-likelihood and the RMSE differences; indeed, in terms of the RMSE difference, our mechanism consistently beats the true underlying demand model. Compared with the independent demand model, our mechanism dominates from a log-likelihood point of view. Note, however, that these tests are within sample, so the improved performance relative to the true model is likely due to some degree of over-fitting. Looking at the RMSE difference, we see that for λ = 0.2 (i.e., a low demand volume) the independent demand model dominates. This is consistent with our observation in Section 6.3 that there is little benefit in considering a more sophisticated demand model in a low-volume scenario. Under moderate or high demand volumes, however, our market discovery mechanism performs very well, providing significant gains in the quality of the estimates.

While the market discovery mechanism performs well, it is computationally intensive, since it requires solving a MIP whose difficulty increases with the underlying market complexity (i.e., the number of base types N and the demand intensity λ). For instance, in our implementation, the average computational time to discover one additional type increased from 2.3 seconds for the case (N = 5, λ = 0.2) to 10 seconds for the case (N = 15, λ = 0.8).

6.5 Examples based on real data

To illustrate the performance of our method on real-world data, in this section we present results from applying it to two data sets used in our previous paper, Vulcano et al. [40, Section 5.2] (one for an airline market and one for a retail market), and to a hotel data set publicly available from Bodea et al. [4].

6.5.1 Airline and retail data

The data we obtained for testing the single-class multinomial logit (MNL) model in our previous paper were collected over day-long periods and therefore contain multiple transactions per period. Our approach here is to transform these data into a form compatible with our model, in which there is at most one transaction per period.
To do so, we used the market share estimate provided by the data owners to subdivide time periods with multiple observations as follows: if on a given day we observe two sales of product 1 and three sales of product 4, for a total of five transactions, and the market share estimate is 0.5, then we split the day into (2 + 3)/0.5 = 10 sub-periods, two of them with a sale of product 1, three with a sale of product 4, and the remaining five with no transaction. When a full period in the original data registers no transaction, it is replicated just once. The product availability observed on the original day is replicated in each of its sub-periods. Note that this ensures that the arrival intensity is approximately constant across periods, which is consistent with the underlying arrival process of our model, as opposed to the non-homogeneous Bernoulli process in our prior work [40]. (A code sketch of this transformation is given at the end of the airline example below.)

For each of the examples below, we tested the performance of our market discovery mechanism, starting from the independent demand assumption, against the single-class MNL tested in [40]. For the market discovery mechanism, we set a limit of at most 50 types to discover. Given the somewhat ad-hoc data transformation described above, the comparative results should be taken with some caution. Nevertheless, they give a sense of the estimation properties of our non-parametric, rank-based model relative to a more standard parametric MNL model.

Airline market example. The first example is based on data from a leading commercial airline serving a sample O-D market with two daily flights. The original data set covered 77 observation days, with 7 products (fare-class combinations) on one flight and 8 on the other, for a total of 15 products. As in our previous paper, we computed two sets of demand estimates under different assumptions: in the joint, multi-flight case, we assumed customers chose between both flights in the day, so the consideration set consisted of all 15 products, with a market share of 50%; in the independent-flight case, we assumed customers were interested in only one of the two flights, implying two disjoint consideration sets, each with a market share of 25%. Table 8 shows the results. Besides reporting the value of LI(x, λ) at each estimate (x̂, λ̂) under the independent demand assumption and under our market discovery mechanism (the log-likelihood function for the MNL is different), we aggregated the observed and the predicted bookings across all periods and computed RMSEs.

Table 8: Estimation results for the airline market example.

Consideration set    Measure      Indep.     Market disc.    MNL
Flight 1             LI(x, λ)     -652.99    -564.30         --
Flight 1             RMSE            1.41       5.57          2.18
Flight 2             LI(x, λ)     -734.49    -605.21         --
Flight 2             RMSE            4.43       4.21          2.00
Joint                LI(x, λ)     -891.15    -717.85         --
Joint                RMSE            6.37       4.17          6.31

In terms of the log-likelihood value, the general choice model clearly dominates the independent demand model, confirming the improvement provided by our market discovery mechanism. In terms of RMSE, however, the general choice model is dominated by either the independent demand model or the MNL in the single-flight cases. This is perhaps because the data in these cases are too sparse: with only a 25% market share and with T = 737 and T = 829 periods, respectively, we have only around 200 transactions in total, and this does not seem to be enough to calibrate a choice model well. This is also consistent with our observation in Table 7 for the simulated data, where the independent demand model dominates in the low arrival-intensity case.

As Table 8 shows, both the likelihood and the RMSE performance of our demand model clearly improve in the joint-flight case. In this case, the number of products in the consideration set and the number of observed transactions are larger, since T = 525 and the market share is 50%, giving approximately 260 transactions. Beyond the increase in the number of transactions, the improvement in performance is likely due to the fact that our rank-based model is better at representing customer preferences in this joint-flight case than the restrictive MNL model. Indeed, a nested logit model, in which a customer first chooses a flight and then a particular fare class within that flight, would likely be a more appropriate parametric model for this case.

It is interesting to observe the final composition of customer types produced by our market discovery mechanism. In all cases, we started from the singletons of the independent demand assumption. In general, most of the improvement in log-likelihood values is obtained after adding about 5 to 10 types. Yet the final composition of types contained either zero or one singleton in each of the three cases (recall that the filtering process drops types with very low probabilities). The hardest case was the joint-flight estimation, where the market ended up with 37 customer types and the procedure stopped because of the limit on the maximum number of types. Finding a new customer type in this case took about 5 seconds.
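A minimal MATLAB sketch of the period-splitting transformation described at the beginning of this subsection is given below; the record layout (a vector of daily sales counts per product plus the day's availability vector) and the function name are illustrative assumptions.

    % Split one day with multiple transactions into single-transaction sub-periods.
    % dailySales(j) = units of product j sold that day; availDay = availability vector
    % for that day; share = market share estimate (e.g., 0.5).
    function [sales, avail] = splitDay(dailySales, availDay, share)
        k = sum(dailySales);                 % total transactions observed that day
        P = max(1, round(k / share));        % number of sub-periods (1 if no transaction)
        sales = zeros(P, 1);                 % 0 denotes a sub-period with no purchase
        idx = 1;
        for j = 1:numel(dailySales)
            for s = 1:dailySales(j)
                sales(idx) = j;              % one sub-period per observed sale of product j
                idx = idx + 1;
            end
        end
        avail = repmat(availDay(:)', P, 1);  % same availability in every sub-period
    end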
Retail market example. This example illustrates our EM method and market discovery mechanism applied to sales data from a retail chain. We consider sales observed during eight weeks of a sample selling season, and assume a choice set of 6 substitutable products within the same small subcategory of SKUs. The market share of this retail location is estimated at 48%. Applying a transformation similar to the one in the previous example, we obtain a data set with at most one transaction per period and a selling horizon of length T = 245. Here we also tested the performance of our market discovery mechanism against the independent demand model and the MNL model. Market discovery increased the log-likelihood value of the independent demand model from −286.79 to −241.34 and reduced the RMSE from 3.81 to 1.61, arriving at a final composition of 12 customer types with only one singleton. This RMSE beats the 1.86 reported for the MNL model in [40]. Again, the higher density of the data appears to be a critical factor in the success of our market discovery mechanism.

6.5.2 Hotel data

This example is based on the publicly available data used by Bodea et al. [4]. The booking records correspond to transient customers (predominantly business travelers) with check-in dates between March 12, 2007, and April 15, 2007, at one of five continental U.S. hotels. For every hotel, a minimum booking horizon of four weeks for each check-in date was considered. Rate and room-type availability information present at the time of booking was recorded for reservations made via the hotel or customer relationship officers (CROs), the hotels' web sites, and off-line travel agencies. Due to the memory limitations of our MATLAB-linked CPLEX optimizer, we focus in particular on the properties labeled Hotel 4 and Hotel 5, each with a market share of 20%.

We define a product as a room type (e.g., king non-smoking, queen smoking, etc.) and a period as a (booking date, check-in date) pair. Since periods with no purchases were not recorded in the original data set, we use the market share information to augment the data with no-purchase periods, similarly to what we did in the previous examples: for each booking record we created four no-purchase records with the same product availability as the booking record.
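This augmentation is consistent with the 20% market share (one observed booking per five periods). A minimal MATLAB sketch, assuming the bookings are stored as a vector sales0 with an aligned availability matrix avail0 (both illustrative), is:

    % Augment hotel bookings with the no-purchase periods implied by a 20% share.
    sales0 = [2 5 2 1]';                       % example bookings (room type booked in each record)
    avail0 = ones(numel(sales0), 9);           % example availability rows (9 room types)
    reps = round(1/0.20) - 1;                  % four no-purchase records per booking
    T0 = numel(sales0);
    sales = zeros(T0*(reps+1), 1);
    avail = zeros(T0*(reps+1), size(avail0,2));
    for t = 1:T0
        rows = (t-1)*(reps+1) + (1:reps+1);    % the booking record followed by its copies
        sales(rows(1)) = sales0(t);            % the observed booking
        avail(rows,:) = repmat(avail0(t,:), reps+1, 1);   % same availability in all five records
    end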
Table 9 summarizes further details of the data and the estimation results of our EM method and market discovery mechanism relative to the independent demand model.

Table 9: Estimation results for the hotel example.

Feature                                              Hotel 4                  Hotel 5
Number of products n                                 9                        8
Number of original bookings                          373                      310
Number of bookings after preprocessing               288                      245
Number of periods T                                  1440                     1225
Initial number of singletons under Indep. Demand     9                        8
Indep. Demand RMSE                                   1.34                     1.40
Indep. Demand Log-value                              -977.48                  -1092.46
Final number of types under Market Discovery         21, with 1 singleton     22, with 2 singletons
Market Discovery RMSE                                1.08                     0.89
Market Discovery Log-value                           -948.83                  -1062.05

We note that for both hotels, our market discovery mechanism clearly outperforms the independent demand model, in terms of both log-likelihood and RMSE. The RMSE improvements here occur even under a low market share (i.e., a low value of λ). Intuitively, the difference relative to the previous data sets is that products here are more widely available (on average, 75% of the time), and therefore the booking behavior is better explained by a richer choice model. This can be observed in the final number of customer types discovered by our mechanism and in the fact that most of the original singletons of the independent demand assumption are removed from the final set of types.

7 Conclusions

Our demand model combines a Bernoulli process of arrivals with a non-parametric, rank-based choice model over multiple periods, and allows the availability of products to vary over time. This specification is quite general and essentially compatible with any utility-maximizing model of preferences. The method can be used to untruncate demand and correct for spill and recapture effects across an entire set of substitutable products. More generally, it serves as a useful and general model of customer demand in a potentially wide range of retail settings.

Given a set of potential customer types (rankings), our EM method provides a very efficient algorithm for computing maximum likelihood estimates of the probability of each customer type and of the arrival rate. The only required inputs are observed historical sales, product availability data, and the given set of customer types. Through an extensive numerical study, we showed that our method is effective in reliably maximizing the likelihood function and is an order of magnitude faster computationally than direct maximization of the incomplete data log-likelihood function.

Importantly, we also provide a market discovery mechanism that automatically augments the initial set of potential customer types, based on a column-generation-like approach that involves solving a novel MIP to discover new types that improve the likelihood. We show that our market discovery mechanism can significantly improve the quality of the estimates obtained when we start from the singletons of the independent demand assumption as the initial set of customer types.
Sample testing on real-world data sets shows that the market discovery mechanism is effective relative to parametric models, especially in cases with larger, more complex choice sets. We feel that the ability to automate the specification of the set of customer types in this way further enhances the practical appeal of the method.

References

[1] S.E. Andersson. Passenger choice analysis for seat capacity control: A pilot project in Scandinavian Airlines. International Transactions in Operational Research, 5:471–486, 1998.
[2] R. Anupindi, M. Dada, and S. Gupta. Estimation of consumer demand with stock-out based substitution: An application to vending machine products. Marketing Science, 17:406–423, 1998.
[3] M. Ben-Akiva and S. Lerman. Discrete Choice Analysis: Theory and Applications to Travel Demand. The MIT Press, Cambridge, MA, sixth edition, 1994.
[4] T.D. Bodea, M.E. Ferguson, and L.A. Garrow. Choice-based revenue management: Data from a major hotel chain. M&SOM, 11(2):356–361, 2009.
[5] S. Borle, P. Boatwright, J. Kadane, J. Nunes, and S. Galit. The effect of product assortment changes on customer retention. Marketing Science, 24:616–622, 2005.
[6] R. Boyles. On the convergence of the EM algorithm. Journal of the Royal Statistical Society (Series B), 45:47–50, 1983.
[7] H. Bruno and N. Vilcassim. Structural demand estimation with varying product availability. Marketing Science, 27:1126–1131, 2008.
[8] R. Bucklin and S. Gupta. Brand choice, purchase incidence, and segmentation: An integrated modeling approach. Journal of Marketing Research, 29:201–215, 1992.
[9] K. Campo, E. Gijsbrechts, and P. Nisol. The impact of retailer stockouts on whether, how much, and what to buy. International Journal of Research in Marketing, 20:273–286, 2003.
[10] J. Chaneton and G. Vulcano. Computing bid-prices for revenue management under customer choice behavior. M&SOM, 13(4):452–470, 2011.
[11] L. Chen and T. Homem de Mello. Mathematical programming models for revenue management under customer choice. European Journal of Operational Research, 203:294–305, 2010.
[12] P. Chintagunta. Investigating purchase incidence, brand choice and purchase quantity decisions on households. Marketing Science, 12:184–208, 1993.
[13] P. Chintagunta and J.-P. Dubé. Estimating a SKU-level brand choice model that combines household panel data and store data. Journal of Marketing Research, 42:368–379, 2005.
[14] C. Conlon and J. Mortimer. Demand estimation under incomplete product availability. Working paper, Department of Economics, Harvard University, Cambridge, MA, 2010.
[15] W. Cooper, T. Homem de Mello, and A. J. Kleywegt. Models of the spiral-down effect in revenue management. Operations Research, 54(5):968–987, 2006.
[16] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39:1–38, 1977.
[17] V. Farias, S. Jagabathula, and D. Shah. A non-parametric approach to modeling choice with limited data. Working paper, MIT Sloan School, Cambridge, MA, 2010.
[18] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, San Francisco, CA, 1979.
[19] T. Gruen, D. Corsten, and S. Bharadwaj. Retail out-of-stocks: A worldwide examination of causes, rates, and consumer responses. Grocery Manufacturers of America, 2002.
[20] A. Haensel and G. Koole. Estimating unconstrained demand rate functions using customer choice sets. Journal of Revenue and Pricing Management, 9:1–17, 2010.
[21] A. Haensel, G. Koole, and J. Erdman. Estimating unconstrained customer choice set demand: A case study on airline reservation data. Working paper, VU University Amsterdam, The Netherlands, 2011.
[22] S. Ja, B. V. Rao, and S. Chandler. Passenger recapture estimation in airline revenue management. In AGIFORS 41st Annual Symposium, Sydney, Australia, August 2001. Presentation.
[23] K. Kalyanam, S. Borle, and P. Boatwright. Deconstructing each item's category contribution. Marketing Science, 26:327–341, 2007.
[24] G. Kök and M. Fisher. Demand estimation and assortment optimization under substitution: Methodology and application. Operations Research, 55:1001–1021, 2007.
[25] S. Mahajan and G.J. van Ryzin. Stocking retail assortments under dynamic consumer substitution. Operations Research, 49:334–351, 2001.
[26] R. Martí and G. Reinelt. The Linear Ordering Problem: Exact and Heuristic Methods in Combinatorial Optimization. Springer, first edition, 2011.
[27] G. McLachlan and T. Krishnan. The EM Algorithm and Extensions. John Wiley & Sons, New York, NY, 1996.
[28] A. Musalem, M. Olivares, E. Bradlow, C. Terwiesch, and D. Corsten. Structural estimation of the effect of out-of-stocks. Management Science, 56:1180–1197, 2010.
[29] J. Newman, M. Ferguson, and L. Garrow. Estimation of choice-based models using sales data from a single firm. In INFORMS Revenue Management and Pricing Section Conference, New York, NY, June 2011. Presentation.
[30] C. Queenan, M. Ferguson, J. Higbie, and R. Kapoor. A comparison of unconstraining methods to improve revenue management systems. Production and Operations Management, 16:729–746, 2007.
[31] R. Ratliff, B. Rao, C. Narayan, and K. Yellepeddi. A multi-flight recapture heuristic for estimating unconstrained demand from airline bookings. Journal of Revenue and Pricing Management, 7:153–171, 2008.
[32] P. Rusmevichientong, B. van Roy, and P. Glynn. A nonparametric approach to multiproduct pricing. Operations Research, 54(1):82–98, 2006.
[33] C. Stefanescu. Multivariate customer demand: Modeling and estimation from censored sales. Working paper, European School of Management and Technology, Berlin, Germany, 2009.
[34] J. Swait and T. Erdem. The effects of temporal consistency of sales promotions and availability on consumer choice behavior. Journal of Marketing Research, 39:304–320, 2002.
[35] K. T. Talluri and G. J. van Ryzin. Revenue management under a general discrete choice model of consumer behavior. Management Science, 50:15–33, 2004.
[36] K.T. Talluri. A finite-population revenue management model and a risk-ratio procedure for the joint estimation of population size parameters. Working paper, Department of Economics and Business, Universitat Pompeu Fabra, Barcelona, Spain, 2009.
[37] K. Train. Discrete Choice Methods with Simulation. Cambridge University Press, New York, NY, 2003.
[38] G. van Ryzin and G. Vulcano. Computing virtual nesting controls for network revenue management under customer choice behavior. M&SOM, 10(3):448–467, 2008.
[39] G. Vulcano, G. van Ryzin, and W. Chaar. Choice-based revenue management: An empirical study of estimation and optimization. M&SOM, 12(3):371–392, 2010.
[40] G. Vulcano, G. van Ryzin, and R. Ratliff. Estimating primary demand for substitutable products from sales transaction data. Leonard N. Stern School of Business, New York University, New York, NY. Forthcoming in Operations Research, 2008.
[41] C.F. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11:95–103, 1983.
[42] D. Zhang and W. Cooper. Revenue management for parallel flights with customer-choice behavior. Operations Research, 53(3):415–431, 2006.