An expectation-maximization algorithm to estimate a general class

Decision, Risk & Operations
Working Papers Series
An Expectation-Maximization Algorithm to Estimate a
General Class of Non-Parametric Choice Models
G. van Ryzin and G. Vulcano
December 2011
DRO-2011-07
An expectation-maximization algorithm to estimate
a general class of non-parametric choice models
Garrett van Ryzin
∗
Gustavo Vulcano
†
December 2011
Abstract
We propose an approach for estimating customer preferences for a set of substitutable
products using only readily-available sales transactions and product availability data. Our
approach combines a general, non-parametric discrete choice model with a Bernoulli process
of arrivals over time. The discrete choice model, which is compatible with any random
utility model, is defined simply by a discrete probability mass function (pmf) on a given set
of possible preference rankings of alternatives (including the no-purchase alternative). When
faced with a given choice set, a customer is assumed to purchase the available option that
ranks highest in her preference list – or does not purchase at all if no product ranks higher
than the no-purchase alternative.
The problem we address is how to jointly estimate the arrival rate of customers and the
pmf of the rank-based choice model. We apply the EM method to solve this estimation
problem and show that it leads to a simple and highly efficient estimation procedure. All
limit points of the procedure are provably stationary points of the associated incomplete data
log-likelihood function and the estimates produced are maximum likelihood estimates (MLE),
so they inherit all the desirable statistical properties of a MLE. We also develop a market
discovery mechanism that allows us to start with a parsimonious set of possible customer
types and enlarge this initial set by progressively and automatically generating new types
that provably improve the likelihood function. The approach is based on column generation
ideas from mathematical programming and requires solving a mixed integer program (MIP)
at each stage to identify new customer types to add. Our numerical experiments confirm the
potential of our estimation approach to rapidly identify a good set of customer types and
produce corresponding MLE estimates.
Key words: Demand estimation, demand untruncation, choice behavior, EM method.
∗
†
Graduate School of Business, Columbia University, New York, NY 10027, [email protected].
Leonard N. Stern School of Business, New York University, New York, NY 10012, [email protected].
1
1
Introduction
Demand estimation is a central task in retail operations and revenue management (RM). For
simplicity, many demand models in practice assume each product receives an independent stream
of demand. However, if products are substitutes then the demand for a given product will be a
function of the set of alternatives available to consumers when they make their purchase decisions
(their choice set). Such choice behavior is a topic of great interest among revenue management
researchers and practitioners alike because it has obvious and important revenue consequences.
Estimating demand is particularly challenging when product availability varies over time
due to either inventory stockouts or deliberate availability controls applied by the seller. Two
consequences of changing availability are the occurrence of spilled and recaptured demand. The
former refers to the demand lost when a consumer’s first choice is unavailable. Recapture refers
to demand for a product that results when customers substitute because their preferred product
is unavailable. Both behaviors can complicate demand estimation.
Spilled demand could potentially be observed if a seller were to record all initial customer
requests; however, this is rarely the case. In many physical retail stores, customers shopping
is self-service (buying off a shelf) and their is no realistic opportunity for a seller to elicit customer preferences. Even when customers interact with a sales agent, other issues make eliciting
preferences difficult. For instance, in the case of hotel operations, Queenan et al. [30] mention
multiple availability inquiries from the same customer, incorrect categorization of turndowns by
reservations agents, and customer requests that arrive through a channel not controlled by the
firm.
Clearly, if historical sales data were left uncorrected for spill, the true underlying demand
would be underestimated. This bias contributes to the well-known spiral-down phenomenon,
in which a firms expected revenue decreases over time (see Cooper et al. [15]) due to a cycle
of lower estimates of demand leading to progressively lower availability. Correcting this bias is
important for preventing such spiral-down problems.
Recaptured demand due to substitution, in contrast, produces an increase in observed sales
of alternative available products. Ignoring this behavior leads to an overestimation bias among
the products that are available. For example, in retailing customers may buy an alternative
color, size or flavor if their first preference is out of stock. A similar phenomenon occurs among
airline passengers who may buy a ticket on an alternative flight with a different departure time or
routing. Empirical studies from different industries confirm that such substitution is significant.
Gruen et al. [19] report recapture rates of 45% across eight categories at retailers worldwide,
while for airline passengers recapture rates are acknowledged to be in the range of 15%-55% (see
Ja et al. [22]).
Various statistical techniques have been proposed to correct for both spill and recapture
effects. Collectively, these techniques are known as demand uncensoring methods. One of
the most popular such methods is the expectation-maximization (EM) algorithm. The EM
2
method works using alternating steps of computing conditional expected values of the parameter
estimates based on the observable data to obtain an expected log-likelihood function (the E-step),
and then maximizing this function to obtain improved estimates (the M-step). Traditionally,
demand estimates that employ the EM approach have been limited to uncensoring sales history
for individual products and disregard choice behavior effects. Some recent exceptions in the
literature are discussed below.
Estimating customer choice behavior first requires specifying a choice model, either parametric or nonparametric. Parametric models assume a specific functional form relating choice
alternatives and product attributes to purchase probabilities. Given such a form, one then
estimates the parameters of this functional form from data. One widely used example of a parametric model is the multinomial logit model (MNL) (e.g., see Ben-Akiva and Lerman [3] and
Train [37]). A convenient aspect of the MNL model is that the likelihood of purchase can be
readily recalculated if the mix of available related products changes (e.g., due to another item
being sold out or restocked). Yet one major disadvantage is that the MNL exhibits the independence from irrelevant alternatives (IIA) property, which can lead to unrealistic substitution
patterns in cases where alternatives have shared attributes, despite promising results reported
in the literature (e.g., see Vulcano et al. [39]). Other parametric models such as the mixed logit
or nested logit model help overcome this IIA property (again, see Ben-Akiva and Lerman [3]
and Train [37]). Still, the drawback of any parametric model is that one must make assumptions
about the structure of preferences and the relevant attributes that influence it, which in turn
requires significant expert judgement and extensive trial-and-error testing.
There is also a balance between specification and estimation error when selecting a parametric
model; a complex model with many attributes, latent classes and nests may be able to better
approximate a wide range of choice behavior, but estimating the parameters of such a model
from a finite set of data may lead to significant estimation error. A more parsimonious model,
while potentially less faithful in terms of representing underlying choice behavior, is less prone
to estimation error. The overall accuracy of a model in predicting demand requires the modeler
to balance specification and estimation error.
In this paper, we take a non-parametric approach. We use a simple, though quite general,
non-parametric choice model, originally introduced in Mahajan and van Ryzin [25] and later used
by van Ryzin and Vulcano [38] and more recently by Farias et al. [17]. In it, customer types are
defined by their rank ordering of all alternatives (along with the no-purchase alternative). When
faced with a choice, a customer is assumed to purchase the available product that ranks highest
in her preference list – or she does not purchase at all if the no-purchase alternative ranks higher
than any available product. With a finite number of alternatives, there are a finite number of
rankings and hence a finite number of customer types. Demand is then described by a discrete
pmf on the set of customer types. As pointed out by Mahajan and van Ryzin [25], several
common demand processes studied in the literature can be modeled as special cases of this
3
choice model (e.g., MNL, Markovian second choice, universal backup, Lancaster demand, and
the independent demand model). To complete the demand model, we assume customers arrive
in time according to a Bernoulli process process with constant rate, which must be estimated
along with the customer type pmf.
Of course, the potential number of preference lists (customer types) in this rank-based choice
model is exponential in the number of alternatives. Indeed, specifying the subset of relevant
types to use in our model is analogous to the problem of specifying the form of model used in
the parametric case and involves similar trade-offs; using the complete set of possible customer
types provides maximum flexibility in modeling the choice behavior of a given population of
customers, but the large number of parameters involved can lead to significant estimation error
(over-fitting). In contrast, a more parsimonious set of customer types, while more restrictive in
terms of representing choice behavior, will have fewer parameters to estimate and hence is less
prone to estimation error.
In addition, and consistent with realistic transactions data, it is often not possible to distinguish between a period with no arrival and a period with an arriving customer who decides not
to purchase. This is particularly true in brick-and-mortar retailers, but also in online settings
that record transactions rather than visits. Our procedure also corrects for this incompleteness
in the data.
The problem we address, in summary, is how to jointly estimate the arrival rate of customers
and the pmf of rank-based choice model that best explains a set of observed transactions. The
only required inputs are observed historical sales, product availability data, and a set of potential
customer types. We apply the EM method to this model and show that it leads to an efficient,
iterative procedure for estimating the demand parameters. All limit points of the algorithm are
stationary points of the associated incomplete data log-likelihood function. Since our estimates
are maximum likelihood estimates (MLEs), they inherit the desirable statistical properties of
maximum likelihood estimation, i.e., they are consistent, asymptotically unbiased and asymptotically efficient (i.e., attain the Cramer-Rao lower bound for the variance, asymptotically).
Since our MLE estimates for the pmf of types and arrival rate are computed using a given
set of customer types, their overall quality depends on this input set. We assume initially the
set of possible types is chosen exogenously by the modeler based on judgement and experience,
but we subsequently develop a market discovery procedure for automatically augmenting an
initial set of types following a column generation approach. Using our procedure, one can
start from a very limited set of customer types (e.g., types based on the independent demand
assumption) and successively add new types automatically that increase the likelihood function.
Our numerical experiments show that after adding only a handful of new types, the likelihood
function decreases rapidly and then exhibits only marginal improvements thereafter, confirming
the practical potential of our procedure to reduce the inherent complexity of the rank-based
choice model.
4
The remainder of this paper is organized as follows. In Section 2, we review the related
literature. In Section 3 we introduce the demand model and the maximum likelihood estimation
problem. Section 4 presents our EM approach. The market discovery mechanism is described
in Section 5. Section 6 shows our numerical experiments, and we present our conclusions in
Section 7.
2
Literature review
Since the late 1990s, there has been growing interest in the operations and marketing science
literature on estimating choice behavior and lost sales in the context of retailing. Anupindi
et al. [2] present a method for estimating consumer demand when the first choice variant is
not available. They assume a continuous time model of demand and develop an EM method
to uncensor times of stock-outs for a periodic review policy, with the constraint that at most
two products stock-out in order to handle a manageable number of variables. The authors find
maximum likelihood estimates of arrival rates and substitution probabilities. Other papers in
this area are Swait and Erdem [34], Borle et al. [5], Chintagunta and Dubé [13], Kalyanam et
al. [23], Bruno and Vilcassim [7], and more recently, Musalem et al. [28]. However, they do not
use EM as the estimation procedure.
The paper by Kök and Fisher [24] is related to ours from a methodological standpoint.
They develop an EM method for estimating demand and substitution probabilities under a
hierarchical model of consumer purchase behavior at a retailer. This consumer behavior model
is quite standard in the marketing literature; see e.g. Bucklin and Gupta [8], Chintagunta [12],
and Campo et al. [9]. In their setting, upon arrival, a consumer decides: 1) whether or not to buy
from a subcategory (purchase-incidence), 2) which variant to buy given the purchase incidence
(choice), and 3) how many units to buy (quantity). Product choice is modeled with the MNL
framework. Like in our work, they also analyze the problem at the individual consumer level, but
in constrast they assume that the number of customers who visit the store without purchasing
is negligible (see Section 4.3 therein). The outcome of the estimation procedure are estimates of
the parameters of the incidence purchase decision, the parameters of the MNL model for the first
choice, and the coefficients for the substitution matrix. Due to the complexity of the likelihood
function, the EM procedure requires the use of non-linear optimization techniques in its M-step.
Conlon and Mortimer [14] develop an EM algorithm to account for missing data in a periodic
review inventory system under a continuous time model of demand, where for every period they
attempt to uncensor the fraction of consumers not affected by stock-outs. Their aim is to
demonstrate how to incorporate data from short-term variations in the choice set in order to
identify substitution patterns, even when the changes to the choice set are not fully observed. A
limitation of this work is that the E-step becomes difficult to implement when multiple products
are simultaneously stocked-out since it requires estimating an exponential number of parameters
5
(see Conlon and Mortimer [14, Appendix A.2]).
Interest in estimation and forecasting techniques has been growing in the RM research literature, which traditionally focused more on optimization of RM controls. Queenan et al.[30]
review unconstraining methods for univariate demand models that are used commonly in RM
practice. Talluri and van Ryzin [35, Section 5] develop an EM method to jointly estimate arrival
rates and parameters of a MNL choice model based on consumer level panel data under unobservable no-purchases. Vulcano et al. [39] provide empirical evidence of the potential of that
approach. Recently, Newman et al. [29] present a heuristic that accelerates the computational
times of [35] but leads to estimates that are not MLEs.
Ratliff et al. [31] provide a comprehensive review of the demand untruncation and choice
behavior estimation literature in the context of RM settings. They also propose a heuristic to
jointly estimate spill and recapture across numerous flight classes, by using balance equations
that generalize the proposal of Anderson [1]. A similar approach was presented before by Ja et
al. [22].
Talluri [36] proposes an alternative estimation method to the EM in [35]. It relies on a finitepopulation model and a heuristic that exploits the functional form of the MNL model, the variety
of offer sets in a typical RM setting, and qualitative knowledge of arrival rates. He also provides
experiments on synthetic data that show some promising features of the resulting estimates.
Stefanescu [33] proposes a class of multivariate demand models that capture both time and
product dimensions of demand correlation. The models use the multivariate normal distribution
to account for product demand correlation and include a latent random term common to all time
periods that induces intertemporal demand correlation over the selling horizon. She also develops
a methodology for estimating the parameters of demand models from censored sales data, which
is performed in a maximum likelihood framework using the EM method.
Haensel and Koole [20] present an EM-based procedure to estimate a nonparametric choice
model that is quite similar to ours, consisting also of a set of customer types that are defined by
preference orderings. However, they do not explicitly take into account the incompleteness of the
data with respect to arrival and no-arrival periods and instead use ad-hoc extrapolation to fill in
for the lack of observations in a given time interval. The model for the arrivals is more general
than ours; they assume a non-homogeneous Poisson arrival process for each type that follows an
exponential arrival intensity over time, the parameters of which must be estimated. This leads
to a more complex M-step in which one must solve multiple nonlinear optimization problems.
Also different from our work, their model allows for multiple sales per period, which leads to
a discrete Poisson model of demand that requires an approximate rounding procedure in the
E-step. In a follow-up paper, Haensel et al. [21], the procedure is tested on real airline booking
data, showing a slight overestimation bias. Lastly, the work does not address the important
issue of how to efficiently construct the set of customer types used in the nonparametric choice
model.
6
In our previous paper [40], we analyzed a model of demand that combines a MNL choice
model with non-homogeneous Bernoulli arrivals over multiple periods. The problem addressed is
how to jointly estimate the preference weights of the products and the arrival rates of customers
when the availability of products changes over time. The key idea is to view the problem in terms
of primary (or first-choice) demand and to treat the observed sales as incomplete observations
of primary demand. The EM method is then applied to this primary demand model and it is
shown that the algorithm shares the convergence properties of the procedure that we present
here. Different from [40], in our new proposal we work on individual purchase records rather than
on demand data aggregated over long periods (e.g., a day or a week). But the main difference
in our work here is the generality of the choice model; indeed, the MNL model is just a special
case of our nonparametric choice model.
From a choice modeling point of view, our paper is closest to Farias et al. [17], who also
proposes procedures for estimating a non-parametric choice model which is essentially identical
to ours. However, the approach to estimation is very different. While we find maximum likelihood estimates of the choice demand model, Farias et al. [17] use a robust approach, finding the
distribution that produces the worst-case revenue consistent with the observed data assuming
a given fixed assortment. These robust revenues are computed approximately using constraint
sampling. They then show that the demand model constructed by the robust approach is approximately the sparsest-choice model, where sparsity is measured by the number of customer
types that occur with positive probability in the population. Farias et al. [17] show that the
sparsity grows with the amount of observed data, but do not discuss statistical properties of
the estimates obtained. Another related model in the context of assortment pricing is the rank
pricing model of Rusmevichientong et al. [32], but their approach requires access to samples of
entire customer preference lists which are unlikely to be available in real-world applications.
Rank-based choice models can be used as input for revenue optimization routines. Such
optimization routines were explored by Mahajan and van Ryzin [25] in the context of a retail
assortment optimization problem, and by Zhang and Cooper [42], van Ryzin and Vulcano [38],
Chen and Homem-de-Mello [11], and Chaneton and Vulcano [10] in the context of airline RM.
3
Model description
A set of n substitutable products is sold over T purchase periods indexed by t = 1, 2, . . . , T .
The full set of products is denoted N = {0, 1, . . . , n}, n ≥ 2, where 0 stands for the outside or
no-purchase option.
We assume customers arrive according to a discrete-time, homogeneous Bernoulli arrival
process, in which in each time period t an arrival occurs with probability λ < 1, which we call
the arrival rate. The parameter λ must be estimated from data. If λ << 1, this arrival process
can be considered a discrete-time approximation to a Poisson arrival process. To avoid a limiting
7
and non interesting case, we assume there is at least one period with an observed transaction. In
cases where the arrival rate may not be homogeneous throughout the selling horizon1 , one can
partition time periods into multiple segments in such a way that within each segment the arrival
rate is constant (e.g., see Chaneton and Vulcano [10, Section 6.2.3]). For ease of exposition,
however, we will consider only the case where arrival rate is constant over time periods.
Customers are assumed to have a rank-based preference relation for products. That is, each
customer has a preference list (or total ranking), σ, of the products in N . Each preference
list σ defines a customer type. A customer of type σ prefers product h to product j if and
only if σ(h) < σ(j). The market, in turn, is assumed to be composed of a pre-defined set of
customer types σ = {σ (1) , . . . , σ (N ) }. This set could perhaps be defined based on judgement and
knowledge of the market or on surveys. Further, below we develop a procedure to “discover”
new customer types that improve the likelihood function, which makes it possible to sequentially
augment the set σ starting from an initial, minimal set of types. However, for now we assume
the σ is given. Since the set of types used in the estimation model may differ from the true set
of types present in the market, the set σ is a potential source of specification error in our model.
Note that the meaningful rankings are only the ones truncated at the position of product 0.
That is, a preference list σ will have as many elements as σ(0), where we will assume that
σ(0) ≥ 2, since a customer type whose first preference is not to purchase anything is irrelevant.
Considering all such meaningful rankings, the number of customer types in the market is at
P
n
most |σ| ≤ n+1
i=1 i−1 (i − 1)! = O(n!).
Arriving customers are assumed to be of type σ (i) with probability P(σ (i) ), i = 1, . . . , N .
This probability mass function (pmf) for customer types is denoted P(σ), and it too must be
estimated from the data. Customer types are assumed independent across periods.
Upon arrival, each customer is offered a set of products St ⊂ N , with 0 ∈ St , i.e. the
no-purchase alternative is always assumed available. We will further assume that |St | ≥ 2, so
that at least one product other than the no-purchase option is available in each period, else if
St = {0}, then the period is not informative. An arriving customer chooses her most preferred
product among those available in St . In symbols, if there is an arrival of type σ in period t, that
customer chooses jt = arg minj∈St σ(j).
We assume sales transactions are the only available date. That is, the only data available for
each period are purchase incidence (i.e., a purchase was observed or not), the product purchased,
and the set of products available during the period. Specifically, we assume data is not available
on no-purchase outcomes. This is a fundamental incompleteness in the data which is a common
in many retail settings.
Our objective is to compute MLE estimates for the parameters θ = (P(σ), λ), for a given set
of rankings σ. Note that it is not possible to precisely identify customer types or to distinguish
a period without an arrival from a period in which a customer arrived but chose not to purchase.
1
For some empirical evidence in the airline industry, we refer the reader to Haensel and Koole [20].
8
For each period with an observed purchase, the most we can say is that the customer’s choice is
consistent with a subset of the possible types (i.e., those types who rank the purchased product
higher than all other available products). For each period with no observed purchase, we can only
say that if there was an arrival, the customer who arrived preferred the no-purchase alternative
to any of the products offered. Hence, sales data provide only incomplete observations of the
choice model we wish to estimate.
More precisely, for jt ∈ St consider the set:
Mt (jt , St ) = {i : σ (i) (jt ) < σ (i) (k), ∀k ∈ St , k 6= jt }.
In words, Mt (jt , St ) is the set of customer types for which product jt (i.e., the product in St
chosen in period t) is ranked highest among the available products in St . The definition also
covers the case jt = 0, and the corresponding set Mt (0, St ). That is, Mt (jt , St ) is the set of
customer types that is compatible with the observed transaction jt and the offer set St .
To illustrate the model and definition, consider the following example:
Illustrative example. The selling horizon has T = 10 periods and there are n = 5 products
(plus the always-available no-purchase alternative 0). The market consists of two customer types
with rank preferences σ (1) = (1, 2) and σ (2) = (1, 2, 3, 4, 5), respectively. For instance, if product
indices follow an increasing price order, type 1 customers (price sensitive customers) prefer only
the cheapest products 1 and 2, while type 2 customers (price insensitive customers) are willing
to buy up to the most expensive product 5.
Table 1 describes the parameters of our model. A label “Yes” in position (j, t) means that
product j is available in period t. The table also reports the observed sales (a zero means
no-purchase). Following this information, the sets Mt (jt , St ) of compatible customer types for
each period that could be inferred from the previous data are given. Lastly, the table lists the
unobservable data: the specific type of the customer that arrived in each period (or the no
arrival incidence, represented by a zero).
For instance, in period 1 a unit of product 1 was purchased. The offer set was S1 =
{0, 1, 2, 3, 4}. Since product 1 is the top choice for both customer types, we cannot distinguish which type indeed arrived and purchased the product, i.e., M1 (1, {0, 1, 2, 3, 4}) = {1, 2}.
It turned out that in this period the (unobserved) arrival was of type 1.
In the second period, the full assortment was available, but no transaction was observed. No
customer type is compatible with this outcome, since an arrival from either type 1 or 2 would
have resulted in a purchase of product 1. Therefore, M2 (0, {0, 1, 2, 3, 4, 5}) = ∅. In this period,
we can say with certainty that there was no arrival.
For period 3, product 4 was purchased. The only compatible type with this outcome is type 2,
since it is the only type who has product 4 ranked higher than not purchasing and ranked higher
than product 5, i.e., with σ (2) (4) < σ (2) (0) and σ (2) (4) < σ (2) (5). Hence, M3 (4, {4, 5}) = {2}
9
Period 8 shows yet a different situation. Product 3 is the only available one, and no transaction was observed. This could be due to the fact that a type-1 customer arrived or no customer
arrived. (The latter happened in this case.) Note that we could not have had an arrival of a
type-2 customer since she would have purchased the product. So here, M8 (0, {3}) = {1}
Table 1: Availabilities and purchases for the Illustrative Example
Observable data: Availabilities and transactions
Product
Period
5
6
1
2
3
4
7
8
9
10
1
2
3
4
5
Yes
Yes
Yes
Yes
No
Yes
Yes
Yes
Yes
Yes
No
No
No
Yes
Yes
Yes
No
No
No
Yes
No
No
Yes
No
Yes
No
Yes
Yes
No
Yes
No
No
No
No
Yes
No
No
Yes
No
No
Yes
Yes
No
Yes
No
No
Yes
Yes
No
Yes
Transactions
1
0
4
1
3
2
5
0
1
0
{1, 2}
∅
{2}
{1, 2}
{2}
{1, 2}
{2}
{1}
{1, 2}
∅
1
No arr.
2
1
2
1
2
No arr.
2
No arr.
Compatible customer types
Sets Mt (jt , St )
Unobservable data
Cust. types
If we had the chance to observe the no arrivals and the customer types when an arrival occurs,
we could easily compute MLEs from the input data. For this particular case, we would come
up with estimates λ̂ = 7/10, P̂(σ (1) ) = 3/7, and P̂(σ (2) ) = 4/7. However, the incompleteness
in the data creates a challenging estimation problem. Let P and P̄ be, respectively, the set of
periods with purchases and no purchases. Let xi = P(σ (i) ) denote the probability a customer is
of a type i. The incomplete data log-likelihood function for our problem is then given by:



X
X
log λ + log 
LI (x, λ) =
xi 
t∈P
i∈Mt (jt ,St )

+
X
t∈P̄,
Mt (0,St )6=∅
log λ

X
i∈Mt (0,St )
xi + (1 − λ) +
X
log(1 − λ).
(1)
t∈P̄,
Mt (0,St )=∅
The first term accounts for the likelihood of the observed transactions in the periods with
purchases (e.g., periods 1, 3-7, and 9 in Table 1). The second term accounts for the no-purchase
periods, where an arriving customer preferred not to buy (i.e., customer types in Mt (0, St ))
or no customer arrived at all (e.g., as in period 8 in Table 1)). The third term accounts for
the periods where none of the customer types would have picked a product from the available
assortment, and hence |Mt (0, St )| = 0. This last case indicates with certainty that no arrival
occurred (e.g., periods 2 and 10 in Table 1)).
10
The MLE estimation problem can be formulated as follows:
max LI (x, λ)
x,λ
N
X
s.t.:
xi = 1,
(2)
i=1
x ≥ 0,
0 ≤ λ ≤ 1.
This constrained, nonlinear optimization problem is hard to solve. In fact, one can show the
objective function is not even quasi-concave in general, so we cannot preclude the possibility
that there maybe local optima. While this is not entirely surprising, we formalize this property
in a proposition:
Proposition 1 The function LI (x, λ) in the optimization problem (2) is not quasiconcave in
general.
Proof. We exhibit an example where the objective function in (2) is not quasiconcave over
the feasible set. Consider a case where there are three customer types and, given the product
availability and transaction data, the objective function is:
LI (x, λ) = 3(log λ + log x1 ) + log λ + log(x2 + x3 ) + log λ + log x3
+2 log(λ(x2 + x3 ) + 1 − λ) + 100 log(λ(x1 + x2 ) + 1 − λ) + 3 log(1 − λ).
Take the following two points (x, λ): The first one is (x1 , λ1 ) = (0.08, 0.15, 0.77, 0.1), with
LI (x1 , λ1 ) = −12.07; the second one is (x2 , λ2 ) = (0.07, 0.64, 0.29, 0.7), with LI (x2 , λ2 ) =
−16.747. If we take α = 0.4, we get an intermediate point (xα , λα ) = α(x1 , λ1 )+(1−α)(x2 , λ2 ) =
(0.074, 0.444, 0.482, 0.46), with LI (xα , λα ) = −17.316, which is smaller than the previous two.
Therefore, quasiconcavity is violated.
In order to simplify the solution of (2), we next utilize the EM method of Dempster et al. [16].
The EM method helps overcome two fundamental kinds of incompleteness in the likelihood
function (1): 1) The fact that it is impossible to distinguish a period with no arrival from a
period with an arrival but a no-purchase outcome; and 2) the fact that we do not know the
customer type that arrives in each period.
4
The EM solution approach
The building block for our simplified approach to solving the estimation problem (2) is to
consider the complete data log-likelihood function; that is, the likelihood function we would get
if we were able to directly observe whether an arrival occurred as well as the precise type of each
11
customer who did arrive. As shown below, under these assumptions, a reformulation of (1) turns
into a trivial optimization problem. The EM algorithm exploits this simplification by using the
complete data log-likelihood, but replacing the complete data with beliefs (expectations) about
their values based on the incomplete data.
4.1
The complete data log-likelihood function
For the periods with no observed transactions, let at = 1 if there is an arrival in the period, and
at = 0 otherwise. Note that at = 1 accounts for an arrival that either buys an outside option or
does not buy at all. Define xt = P(σt ) the probability that a type σt arrival occurs in period t.
The complete data log-likelihood function is
X
X
X
(1 − at ) log(1 − λ).
(3)
(at log λ + at log xt ) +
(log λ + log xt ) +
L(x, λ) =
t∈P
t∈P̄
t∈P̄
The first term in (3) accounts for the observed purchases; there is one term for every period in
which there is a sale. The second term accounts for customer arrivals that do not purchase at
all. The third term accounts for periods with no arrivals. Note that the function L is separable
in x and λ. Our objective is to maximize L(x, λ) subject to the constraints in (2).
Define the function F (λ) as
X
X
F (λ) = |P| log λ +
at log λ + |P̄| log(1 − λ) −
at log(1 − λ).
t∈P̄
t∈P̄
This function is globally concave in λ. Taking derivative and setting it equal to zero, we obtain
the unique maximizer
P
P
|P| + t∈P̄ at
|P| + t∈P̄ at
∗
λ =
=
,
T
|P| + |P̄|
which is simply the number of arrivals divided by the total number of periods. Clearly, λ∗
satisfies 0 ≤ λ∗ ≤ 1.
Define
N
X
H(x) =
mi log xi ,
i=1
where
mi =
X
I{σt = σ (i) } +
t∈P
X
at I{σt = σ (i) }.
(4)
t∈P̄
That is, mi counts the number of occurrences of a type σ (i) arrival, both in periods with and
without purchases. For mi > 0, i = 1, . . . , N , the problem of maximizing H(x) can be posed as
follows:
N
N
X
X
max
mi log xi , s.t.
xi = 1, xi ≥ 0, ∀i.
x
i=1
i=1
P
The KKT conditions give the unique maximizers x∗i = mi / N
k=1 mk , which is simply the number
of type i customers observed divided by the total number of customers observed.
12
4.2
The two main steps of the EM algorithm
The EM method works by starting with arbitrary initial estimates of the parameters θ̂ = (x̂, λ̂).
These estimates are then used to compute the conditional expected value of L: E[L(θ|θ̂)] (the
“E”, expectation, step). Effectively, this replaces the missing data (i.e., the indicators at for
the no-purchase time periods, and the number of occurrences of each preference list) by their
expected values conditioned on the current estimates. The resulting expected log-likelihood
function is then maximized to generate new estimates θ̂ (the “M”, maximization, step), and the
procedure is repeated until convergence. We next describe the two steps of each iteration.
4.2.1
The E-step
In the E-step, the unknown data values are the at for all t ∈ P̄. However, given current estimates
θ̂ = (x̂, λ̂), we can determine the expected values of these arrival indicators. First, we use Bayes’
theorem to update the probability mass function over the set of rankings:
(i)
P(σt |jt , St , θ̂) =
=
P(jt |σ (i) , St , θ̂)P(σ (i) |θ̂)
P(jt |St , θ̂)
I{i ∈ Mt (jt , St )}x̂i
P
,
h∈Mt (jt ,St ) x̂h
(5)
where P(jt |σ (i) , St , θ̂) in the numerator of (5) stands for the conditional probability of choosing
product jt ∈ N . In particular, the case jt = 0 implements the update for the no-purchase
probability.
Again using Bayes’ theorem, we get the update for ât . As it is implicit in (3), if t ∈ P, then
ât = 1. For t ∈ P̄,
ât = E[at |t ∈ P̄, θ̂] = P(at = 1|t ∈ P̄, θ̂)
=
=
P(t ∈ P̄|at = 1, θ̂)P(at = 1|θ̂)
P(t ∈ P̄|θ̂)
P(0|t, St , θ̂) λ̂
λ̂P(0|t, St , θ̂) + (1 − λ̂)
,
(6)
where
P(0|t, St , θ̂) =
X
x̂i
i∈Mt (0,St )
represents the conditional probability of no-purchasing. If Mt (0, St ) = ∅, then P(0|t, St , θ̂) = 0,
and therefore ât = 0. This is the case when all products are offered and no transaction occurs,
which reveals that no arrival occurs.
From (4), we get the estimates
m̂i = E[mi |θ̂]
X
X
=
P(σ (i) |jt , St , θ̂) +
ât P(σ (i) |0, St , θ̂).
t∈P
t∈P̄
13
(7)
In words, m̂i represents the conditional expected number of occurrences of the preference list σ (i) .
P
PT
Note that N
i=1 m̂i =
t=1 ât .
Substituting ât and m̂i into (3), we obtain the expected, complete data log-likelihood function




N
X
X
X
E[L(x, λ)|θ̂] =
m̂i log xi + |P| +
ât  log λ + |P̄| −
ât  log(1 − λ).
(8)
i=1
t∈P̄
t∈P̄
This is the function to be maximized in the current iteration.
4.2.2
The M-step
∗
In order to determine a maximizer θ̂ of (8), we use the results from equation (3). The function
E[L(x, λ)|θ̂] is globally concave in (x, λ), separable in both variables, and has unique, closedform maximizers:
P
|P| + t∈P̄ ât
m̂i
∗
, and x̂∗i = PN
, i = 1, . . . , N.
(9)
λ̂ =
T
k=1 m̂k
The simplicity of this maximization step is the most appealing feature of the EM algorithm.
4.3
Pseudocode
We next summarize the entire EM algorithm using pseudocode.
EM algorithm for estimating the rank-based choice model
[Input data]: Number of customer types N , availability information represented by sets St and transaction
data jt , t = 1, . . . , T .
[Initialization]: Set xi := 1/N , and λ = 0.5. Based on the availability information and transaction data,
build the sets Mt (jt , St ) := {i : σ (i) (jt ) < σ (i) (k), ∀k ∈ St , k 6= jt }. Set at := 0, t = 1, . . . , T .
Repeat
Set mi := 0, xit := 0, i = 1, . . . , N, t = 1, . . . , T .
[E-step]:
For t := 1, . . . , T do
[Update probabilities of customer types]
For i ∈ Mt (jt , St ) do
P
Set xit := xi /( h∈Mt (jt ,St ) xh ).
Endfor
[Update estimates at ]
If jt > 0
Set at := 1,
Else (i.e., jt = 0)
If Mt (0, St ) = ∅
Set at := 0,
Else
P
P
Set at := λ i∈Mt (0,St ) xi /(λ i∈Mt (0,St ) xi + (1 − λ))
14
Endif
Endif
Endfor
[Compute estimates for mi ]
For i := 1 to N do
For t := 1 to T do
Set mi := mi + at xit
Endfor
Endfor
[M-step]:
For i := 1 to N do
PT
Set xi := mi / t=1 at .
Endfor
PT
Set λ := t=1 at /T .
Until Stopping criteria is met.
A few remarks on the implementation are in order. The initialization of λ and xi , i =
1, . . . , N , is arbitrary; we merely need starting values different from zero. In our numerical
experiments, we started from uniform values (i.e., λ = 0.5, xi = 1/N ). (See Section 6.2 for further
discussion.) The stopping criteria can be based on various measures of numerical convergence,
e.g., that the difference between all values xi and the value λ from two consecutive iterations
of the algorithm is less than a small constant or that the Euclidean norm of the difference
between two consecutive vectors (x, λ) is less than . Alternatively, we could set a maximum
number of iterations. In all of our experiments we observed very fast convergence, so it would
appear that the stopping criteria is not critical.
4.4
Convergence
The concavity of the complete data log-likelihood function guarantees that our procedure is
indeed an EM algorithm, as opposed to a so-called Generalized EM algorithm (GEM). In the case
of GEM, the M-step only requires that we generate an improved set of estimates over the current
ones (i.e., it requires to find improved estimates (x, λ) such that E[L(x, λ)|θ̂] ≥ E[L(x̂, λ̂)|θ̂]),
and the conditions for convergence are more stringent (e.g., see McLachlan and Krishnan [27,
Chapter 3] for further discussion.)
The following result argues that since our EM method satisfies a mild regularity condition,
the likelihood converges to a stationary value of the incomplete-data log-likelihood function (1):
Theorem 1 The conditional expected value E[L(x, λ)|θ̂] in (8) is continuous both in (x, λ) > 0
and (x̂, λ̂) > 0, and hence all limit points of any instance {(x̂(k) , λ̂(k) ), k = 1, 2, . . .} of the
EM algorithm are stationary points of the corresponding incomplete-data log-likelihood function LI (x, λ), and {LI (x̂(k) , λ̂(k) ), k = 1, 2, . . .} converges monotonically to a value L∗I =
LI (x∗ , λ∗ ), for some stationary point (x∗ , λ∗ ).
15
Proof. The result follows from the fact that m̂i in (7) and ât in (6) are continuous in x̂ and
λ̂ according to equation (5). Clearly, E[L(x, λ)|θ̂] is also continuous in (x, λ). Therefore, from
Theorem 2 in Wu [41] (see also McLachlan and Krishnan [27, Theorem 3.2]) the EM algorithm, given the unique maximizer found in the M-step, indeed generates an implied sequence
{(x̂(k) , λ̂(k) ), k = 1, 2, . . .}.
As pointed out by Wu [41, Section 2.2], the convergence of {LI (x̂(k) , λ̂(k) )} to LI (x∗ , λ∗ ), for
some stationary point (x∗ , λ∗ ), does not automatically imply the convergence of {(x̂(k) , λ̂(k) )}
to a point (x∗ , λ∗ ). Convergence of the EM estimator in the later sense usually requires more
stringent regularity conditions. Boyles [6] further investigates these requirements. In his Theorem 2, he states sufficient conditions under which the sequence of iterates (x̂(k) , λ̂(k) ) converges
to a compact, connected component of stationary points of the incomplete data log-likelihood
function with value L∗I . Most of the conditions are rather mild and are satisfied by our algorithm.
The stringent condition is the one that requires
(k+1)
(k) θ̂
− θ̂ → 0,
as k → ∞.
(10)
As Boyles [6] comments, it is difficult to verify (10) in general. But under this condition, the
(k)
sequence {θ̂ } will seek an isolated plateau of stationary points, and for k sufficiently large will
not leap over valleys to neighboring plateaux. See also Wu [41, Theorem 5].
Nevertheless, as a practical matter, the convergence of the sequence of points {(x̂(k) , λ̂(k) )}
can be checked numerically as part of the EM procedure. In our experiments reported in
Section 6, we consistently observed that the sequence of points visited by EM converges to a
limit point. Another caveat is the fact that the stationary point of LI (x, λ) is not guaranteed
to be a global maximum, but this feature is also shared by any standard nonlinear optimization
method working directly on the incomplete log-likelihood function.
5
A market discovery procedure
Our EM algorithm guarantees convergence to a stationary value of the incomplete data loglikelihood function for a predefined set of customer types σ = {σ (1) , . . . , σ (N ) }. To keep the
procedure tractable, this subset will typically have a small cardinality. But one might ask: is
there an additional customer type σ (N +1) that, when added to the set σ, increases the likelihood
of the observed data? We call this process of finding customer types that improve the likelihood
the market discovery problem, and below we provide a procedure to solve it.
5.1
Finding relevant customer types
P
For the analysis below, let yt = i∈Mt (jt ,St ) xi be the aggregated likelihood of the customer
types that would pick alternative jt in period t. We assume yt = 0 if Mt (0, St ) = ∅. Then,
16
rewrite the incomplete data log-likelihood function as a function of (y, λ). Define the matrix
A ∈ RT ×N , with elements ati = 1 if i ∈ Mt (jt , St ) (here, jt could also be zero), and ati = 0
otherwise. Consider the problem:
max LI (y, λ)
y ,λ
s.t.: 1 −
N
X
xi ≥ 0,
(11)
i=1
Ax − y ≥ 0,
1 − λ ≥ 0,
x, y, λ ≥ 0.
Take a stationary point (x∗ , λ∗ ) of function (1) that satisfies the constraints in (2). Note that the
P
point (x∗ , y ∗ , λ∗ ), where yt∗ = i∈Mt (jt ,St ) x∗i , is a feasible and stationary point of problem (11).
In particular, the first two inequalities are binding at (x∗ , y ∗ , λ∗ ).
Define the Lagrangian
φ(x, y, λ, σ, µ, γ) = LI (y, λ) + σ(1 −
N
X
xi ) + µT (Ax − y) + γ(1 − λ).
i=1
Let β ∗ , µ∗ , and γ ∗ be the dual variables corresponding to the first three constraints, respectively.
Their values are still unknown, but conditioned on them, define the restricted Lagrangian
G(x, y, λ) = φ(x, y, λ, β ∗ , µ∗ , γ ∗ ) = LI (y, λ) + β ∗ (1 −
N
X
xi ) + µ∗ T (Ax − y) + γ ∗ (1 − λ). (12)
i=1
The KKT necessary conditions for optimality of (11) are:
∇G(x, y, λ) = 0,
σ(1 −
N
X
(13)
xi ) = 0,
i=1
µt (AT
t x − yt ) = 0,
for all t,
γ(1 − λ) = 0,
σ, µ, γ ≥ 0,
where At is the t-th row of matrix A. These conditions are satisfied at the point (x∗ , y ∗ , λ∗ , β ∗ , µ∗ , γ ∗ ),
and can be used to find the specific values β ∗ , µ∗ and γ ∗ :
• For t ∈ P:
1
∂G(x, y, λ) = ∗ − µ∗t = 0,
∗
∗
∗
∂yt
yt
(x,y ,λ)=(x ,y ,λ )
17
and then µ∗t =
1
.
yt∗
• For t ∈ P̄, Mt (0, St ) 6= ∅:
λ∗
∂G(x, y, λ) =
− µ∗t = 0,
∂yt
λ∗ yt∗ + 1 − λ∗
(x,y ,λ)=(x∗ ,y ∗ ,λ∗ )
and then µ∗t =
λ∗
.
λ∗ yt∗ + 1 − λ∗
• For t ∈ P̄, Mt (0, St ) = ∅:
∂G(x, y, λ) = −µ∗t = 0,
∂yt
(x,y ,λ)=(x∗ ,y ∗ ,λ∗ )
and then µ∗t = 0.
• The value β ∗ can be computed from the derivative with respect to any xi :
∂G(x, y, λ)
= −β ∗ +
∂xi
X
µ∗t = 0, for all i.
(14)
t:i∈Mt (jt ,St )
• The value γ ∗ can be computed from the derivative with respect to λ:
∂G(x, y, λ) ∗ |P|
=
−γ
+ ∗+
∂λ
λ
(x,y ,λ)=(x∗ ,y ∗ ,λ∗ )
X
t∈P̄,
Mt (0,St )6=∅
yt∗ − 1
−
λ∗ yt∗ + 1 − λ∗
X
t∈P̄,
Mt (0,St )=∅
1
= 0.
1 − λ∗
Going back to our Lagrangian in (12), we have
G(x, y, λ) ≥ LI (y, λ),
for all (x, y, λ) feasible of (11),
while G(x∗ , y ∗ , λ∗ ) = LI (y ∗ , λ∗ ). Hence, if (x∗ , y ∗ , λ∗ ) is a maximizer of G(x, y, λ), it will also
be a maximizer of LI (y, λ).
Consider the gradient ∇G(x, y, λ) in (13), and focus again on the derivative (14). Suppose that we could find a new, augmented point (x, xN +1 ), with corresponding column for the
matrix A, A(N +1) , such that
∂G(x, y, λ)
= −β ∗ +
∂xN +1
X
µ∗t > 0.
(15)
t:σ (N +1) (jt )<σ (N +1) (i),
∀i∈St ,i6=jt
Then, customer type (N + 1) defines a direction of improvement for the reduced Lagrangian,
and hence becomes a candidate for increasing the value of the incomplete data log-likelihood
function. The condition (15) has an intuitive interpretation in terms of our problem: We are
seeking a customer type for which the marginal increase in the likelihood, LI (x, λ), from shifting
probability mass to the new type more than compensates for the marginal reduction, β ∗ , in
likelihood due to taking away that probability mass from the current set of types.
18
5.2
Complexity of the discovery problem
Determining such a customer type (N + 1) can be posed as an optimization problem: Given
sets St and coefficients µ∗t , 1 ≤ t ≤ T , we are interested in maximizing the second term in (15).
This problem is closely related to a single-machine scheduling problem with (n + 1) jobs of unit
length (recall that there are n products plus the no-purchase option) called the linear ordering
problem (e.g., see Martı́ and Reinelt [26]). An important difference between our formulation
and the standard linear ordering problem in the literature is that in our case, the cost structure
is not defined at the job-pair level (i.e., there is no explicit cost for having job i before job j),
but rather it is defined at the bundle level (i.e., there is a cost for having a job scheduled earlier
than a set of other jobs). In addition, here we are maximizing benefits rather than minimizing
cost.
The linear ordering problem is known to be NP-Hard in its general form, but given the
particular structure of the variation that we are considering here, its complexity is worth verifying
formally. The following proposition establishes that the NP-hard, maximum independent set
problem (see Garey and Johnson [18])) can be reduced to (15), so that our market discovery
problem is also NP-hard. Therefore, it is unlikely that there exists an efficient algorithm for
solving it.
Proposition 2 The optimization problem
max
T
X
µ∗t wt
t=1
s.t.:
wt = I{σ(jt ) < σ(i), ∀i ∈ St , i 6= jt }, t = 1, . . . , T,
(16)
is NP-Hard.
Proof. Given a connected graph G = (V, E), with nodes V = {1, 2, . . . , v}, v ≥ 2, and arcs
E ⊂ {(i, j) ∈ V × V, i < j}, a maximum independent set of G is a subset V 0 ⊂ V such that there
is no arc between two nodes in V 0 , and V 0 is of maximum cardinality.
Let I be an instance of maximum independent set. We will construct an instance J of
problem (16) corresponding to I. There will be n = v products and T = v binary decision
variables w1 , . . . , wv . The variable wi represents node i ∈ V . For each node i ∈ V , we consider
the set of adjacent nodes Γ(i) = {j : (i, j) ∈ E}. Then, for every node j ∈ V , we define the
S
“transaction” jt := j and the availability set St := Γ(j) {i, 0}. Finally, we set the coefficients
µ∗t := 1.
Note that in the optimum of J, we cannot have wi0 = wj0 = 1, for an arc (i0 , j0 ) ∈ E.
Arguing by contradiction: Take an arc (i0 , j0 ) ∈ E. If wi0 = 1, and since j0 ∈ Γ(i0 ), then
σ(i0 ) < σ(j0 ). Similarly, if wj0 = 1, and since i0 ∈ Γ(j0 ), then σ(j0 ) < σ(i0 ), which gives
19
an inconsistency in the order σ. Therefore, the output of the optimization problem (16) for
instance J is an independent set of maximum cardinality.
5.3
A MIP formulation to discover new types
Finding a new customer type that increases the likelihood of the observed data reduces to finding
a type that maximizes the sum in (15). Of course, a simple necessary condition for such a type
P
to exist is that Tt=1 µ∗t > β ∗ . If this condition is satisfied, then the problem of finding a new
customer type may be formulated as a MIP.
Define binary linear ordering variables xji , which are equal to 1 if product j goes before i
in the sequence (i.e., preference list), and equal to zero otherwise. In addition, define binary
variables wt , which are equal to 1 if the dual variable u∗t must be added to the sum in (15), and
equal to zero otherwise. Overall, there are O(n2 + T ) binary variables. The MIP is as follows:
max
T
X
wt u∗t
t=1
s.t.:
xji + xij
= 1, ∀j, i, 0 ≤ j < i ≤ n,
xji + xil + xlj ≤ 2, ∀j, i, l, j 6= i 6= l, 0 ≤ j, i, l ≤ n
n
X
xj0 ≥ 1,
(17)
j=1
wt ≤ xjt ,i , ∀i ∈ St , jt 6= i, and ∀t,
xji , wt ∈ {0, 1},
0 ≤ j < i ≤ n, 1 ≤ t ≤ T.
The first set of equalities ensure that either product j goes before product i in the preference list, or product i goes before product j. The second constraints represent transitivity
constraints that ensure a linear ordering between three products. The third constraint precludes
the no purchase alternative from being the most preferred option. The fourth set of constraints
guarantees that in period t, product jt must be preferred over all other available products in St
for the proposed type to make a contribution to the objective function in period t. Note that
wt = 1 only if xjt ,i = 1, for all i ∈ St , i 6= jt .
If the optimal objective function value is bigger than σ ∗ , then we have found a customer type
that will increase the value LI (x, λ). The position of product j in the new ranking, σ (N +1) (j),
can be determined from:
X
σ (N +1) (j) =
xij + 1.
i∈N ,i6=j
5.4
Implementation
The discovery mechanism starts from an initial set σ = {σ (1) , . . . , σ (N ) } of customer types. This
initial set can be built upon an informed guess of the market composition. For instance, we can
20
assume an independent demand model, where we have a customer type σ (jt ) = (jt ) for each of
the transactions jt we observe in the market, leading to N ≤ n singleton types.
Then, we apply our EM algorithm, and get initial proportions xi , 1 ≤ i ≤ N . In our
experience, in order to ensure numerical stability of the procedure, we run a filtering procedure
that eliminates the customer types with very low values of xi (i.e., those less relevant in the
market). In our examples, we set this tolerance value at 1e-4. The procedure also involves
P 0
0
readjusting the remaining proportions xi so that N
i=1 xi = 1, where N ≤ N . Next, we apply
our market discovery proposal using the MIP formulation (17), followed by the filtering procedure
to discard irrelevant customer types. This sequence of discovery-filter steps is repeated until a
convergence criteria (e.g., the difference of the log-likelihood values of two consecutive iterations)
or a maximum number of new types is reached.
6
Numerical examples
We next present the results of some tests of our EM algorithm on both simulated and real-world
data sets. In all the examples, we set a stopping criteria based on the difference between the
vectors (x̂, λ̂) produced by two consecutive iterations of the EM method, halting the procedure
as soon as the absolute value of all the elements of the difference vector was smaller than 1e-4.
We also set a maximum number of iterations at 1e7. In all the experiments below we verified
the numerical convergence of the of the EM iterates to a limit point, though such convergence
in theory is not guaranteed.
The algorithm was implemented using the MATLAB2 procedural language, in which the
method detailed in Section 4.3 is simple to code.
6.1
Comparison with respect to direct MLE
Our first experiment is on simulated data, in which we know the underlying set of types that
generated the data. We also assume complete information in the sense that the modeler knows
the description of the customer types existing in the market, though he does not know the
proportions of each type nor the overall volume of demand. The objective of the experiments
is to assess the performance of our EM method relative to the direct maximization of the
incomplete data log-likelihood function (labeled Direct Max in what follows). For the latter, we
used the built-in MATLAB function “fmincon”, which finds a constrained minimum of a function
of several variables, in our case subject to linear constraints and 0-1 bounds for the decision
variables (x, λ). We set sequential quadratic programming (SQP) as the algorithm to be used
by the MATLAB function. SQP closely mimics Newton’s method for constrained optimization
just as is done for unconstrained optimization. At each major iteration, an approximation is
2
MATLAB is a trademark of The MathWorks, Inc. We used version 7.10.0.499 (R2010a) for Microsoft Windows 7 on a CPU with Intel Core i7 processor and 4Gb of RAM.
21
made of the Hessian of the Lagrangian function using a quasi-Newton updating method. This
is then used to generate a quadratic programming subproblem whose solution is used to form a
search direction for a line search procedure. We set both the iteration and function evaluation
limits at 1,000, and the tolerance (i.e., the difference between two consecutive function values)
at 1e-3. The starting point for both EM and Direct Max was the same: xi = 1/N , λ = 0.5.
We fixed n = 10 and the arrival rate at λ ∈ {0.2, 0.8}. We generated between 100 and 500 instances of transaction data for different combinations of length of selling season T ∈ {1000, 2000, 4000, 8000},
and number of customer types N ∈ {30, 50, 100, 130, 150, 200}. The types themselves were generated by sampling random permutations of the n products along with the no-purchase alternative.
For any instance, each product was available randomly and independently in each period with
probability 0.7. Then, based on the true customer types, we apply our EM and Direct Max to
maximize the incomplete data log-likelihood function.
Table 2 and Table 3 summarize the results, fixing the values provided by Direct Max as
the baseline. We report the difference between the log-likelihood values and root mean squared
errors (RMSEs)3 of our EM algorithm with respect to Direct Max; desirable outcomes are
positive values in the log-likelihood difference, and negative values in the RMSE difference. The
results show a slight dominance of EM on the final log-likelihood values and on the RMSEs.
However, the differences are small in all cases (note that zero is included in 16 of the 48 reported
95% CIs for the mean RMSE difference). The main substantial improvement provided by our
EM method is the computational time: EM is generally 10 times faster than Direct Max, and
this factor may scale up to 20 in some cases. If one thinks of a major carrier in an airline
revenue management setting estimating hundreds of thousands of O-D pairs on a daily or even
more frequent basis, such differences in computing times are indeed significant.
6.2
The effect of the starting point on EM
Given the non-quasiconcavity of the incomplete data log-likelihood function identified in Proposition 1, convergence of the EM method could well be affected by the starting point. So we
next study the effect that different starting points have on the performance of the EM method.4
To this end, we consider again the setting in Section 6.1, but here we generate a single random
instance for each combination (T, N ). Then, for each such instance, we solved the problem starting from 50 randomly generated starting points. We summarize the results for the log-likelihood
and RMSE values obtained in Table 4 for the low demand case (λ = 0.2) and in Table 5 for the
high demand case (λ = 0.8).
We observe that in general, there is little variability with respect to the level set of the log3
The RMSE measures the difference between the total observed and the expected number of purchases throughout the horizon, given the varying availability of products.
4
Recall that, as we discussed in Section 4.4, the local convergence property of the EM method is also shared
by any standard steepest ascent nonlinear optimization algorithm.
22
Table 2: Comparative results with respect to Direct Max (n = 10, λ = 0.2). Results are 95% CI for the mean
values. Times are in seconds.
T
N
Log value diff.
Center Width
RMSE diff.
Center Width
Direct Max time
Center Width
EM time
Center Width
1000
30
50
100
130
150
200
0.00%
0.01%
0.01%
0.01%
0.01%
0.01%
0.00%
0.00%
0.00%
0.00%
0.00%
0.00%
-0.40%
-0.93%
-0.15%
-0.81%
-0.76%
-0.90%
0.24%
0.33%
0.32%
0.36%
0.40%
0.37%
0.92
1.70
4.09
5.92
6.93
10.66
0.02
0.03
0.07
0.11
0.11
0.18
0.14
0.22
0.44
0.58
0.67
0.92
0.01
0.00
0.01
0.02
0.02
0.03
2000
30
50
100
130
150
200
0.00%
0.01%
0.01%
0.02%
0.02%
0.02%
0.00%
0.00%
0.00%
0.00%
0.00%
0.00%
-0.34%
-0.53%
-0.05%
-0.67%
-0.76%
-0.81%
0.38%
0.41%
0.20%
0.42%
0.45%
0.48%
1.96
3.73
8.89
12.20
14.66
21.89
0.03
0.05
0.13
0.18
0.20
0.30
0.26
0.45
0.92
1.17
1.41
1.98
0.01
0.02
0.03
0.04
0.04
0.08
4000
30
50
100
130
150
200
0.01%
0.01%
0.02%
0.02%
0.02%
0.03%
0.00%
0.00%
0.00%
0.00%
0.00%
0.00%
-0.26%
-0.42%
-0.14%
-0.55%
-0.74%
-0.61%
0.44%
0.48%
0.56%
0.60%
0.54%
0.62%
4.13
8.07
19.79
27.49
32.46
47.44
0.06
0.10
0.25
0.26
0.34
0.46
0.63
1.03
2.05
3.05
3.46
4.85
0.03
0.00
0.06
0.10
0.10
0.12
8000
30
50
100
130
150
200
0.00%
0.01%
0.02%
0.03%
0.03%
0.03%
0.00%
0.00%
0.00%
0.00%
0.00%
0.00%
-0.06%
-1.03%
-1.03%
-0.92%
-0.77%
-0.08%
0.70%
0.59%
0.30%
0.82%
0.76%
0.71%
9.07
17.60
43.96
57.93
71.76
99.38
0.09
0.20
0.40
0.67
0.72
1.05
1.29
2.19
4.34
6.45
7.45
8.85
0.07
0.07
0.14
0.19
0.24
0.23
likelihood function reached by the algorithm despite starting from different points. However,
there is slightly more variability with respect to the RMSE, specially for the high demand case.
Therefore, the limit points of the EM algorithm are different, but the log-likelihood values are
very close. The main take-away from this set of computations is that it may be worth trying
different starting points on the estimation when the density of transactions is high (i.e., the
number of periods with observed no-purchases is small).
6.3
The effect of incomplete market information
Here, we assess the value of market information, in the sense of knowing (or now knowing) the
set of customer types that make up the market. To this end, we generate random instances of
transactions similarly to what was described in Section 6.1, but with the probability of a product
being available in each period equal to 0.5. The key distinction is that in these computations,
the modeler does not know the description of all types; that is, the estimation is performed
23
Table 3: Comparative results with respect to Direct Max (n = 10, λ = 0.8). Results are 95% CI for the mean
values. Times are in seconds.
T
N
Log value diff.
Center Width
RMSE diff.
Center Width
Direct Max time
Center Width
EM time
Center Width
1000
30
50
100
130
150
200
0.01%
0.02%
0.03%
0.04%
0.04%
0.05%
0.00%
0.00%
0.00%
0.00%
0.00%
0.00%
-0.28%
-0.57%
-0.77%
-0.45%
-1.37%
-1.47%
0.20%
0.24%
0.29%
0.66%
0.70%
0.74%
1.27
2.68
7.10
10.30
13.10
20.04
0.01
0.02
0.04
0.13
0.18
0.24
0.15
0.24
0.48
0.61
0.74
0.98
0.00
0.00
0.01
0.02
0.02
0.03
2000
30
50
100
130
150
200
0.01%
0.02%
0.03%
0.04%
0.05%
0.06%
0.00%
0.00%
0.00%
0.00%
0.00%
0.00%
-0.10%
0.16%
-0.74%
-1.07%
-1.03%
-2.20%
0.26%
0.32%
0.36%
0.80%
0.79%
0.78%
2.47
5.27
14.84
20.75
25.60
40.75
0.01
0.03
0.08
0.21
0.22
0.50
0.27
0.46
0.98
1.26
1.53
2.17
0.01
0.01
0.01
0.03
0.04
0.06
4000
30
50
100
130
150
200
0.04%
0.04%
0.05%
0.05%
0.05%
0.06%
0.00%
0.00%
0.00%
0.00%
0.00%
0.00%
-0.45%
-1.37%
-1.47%
-1.11%
-1.61%
-2.02%
0.30%
0.31%
0.33%
1.26%
1.32%
1.33%
10.30
13.10
20.04
39.66
46.21
74.03
0.06
0.08
0.11
0.36
0.43
0.71
0.61
0.74
0.98
2.82
3.46
4.42
0.01
0.01
0.01
0.08
0.10
0.11
8000
30
50
100
130
150
200
0.04%
0.05%
0.06%
0.05%
0.06%
0.07%
0.00%
0.00%
0.00%
0.00%
0.00%
0.00%
-1.07%
-1.03%
-2.20%
1.49%
2.25%
2.27%
0.36%
0.35%
0.35%
2.06%
2.46%
2.06%
20.75
25.60
40.75
87.79
108.93
173.07
0.09
0.10
0.22
0.78
0.96
1.50
1.26
1.53
2.17
5.38
6.51
8.00
0.01
0.02
0.03
0.20
0.25
0.31
with a set of types that differs from the set of types that generated the data. The aim is to
understand how miss-specification of the set of customer types affects the performance of the
estimation.
We consider two cases. In the first one, the modeler assumes an underlying independent
demand model. This is represented by preference lists of size one. For n = 10, there are ten
customer types, each a singleton of the form σ (k) = (k), k = 1, . . . , 10. In the second case, the
modeler assumes an independent demand model (i.e., the ten previous types) and 10 additional
customer types, which correspond to the ten most probable types among the randomly generated
N types. The rational is that the modeler may not be aware of the fine granularity of the market
composition, but is perhaps aware of the most common customer types in the market.
Table 6 summarizes the results for a low demand intensity (λ = 0.2) and a high demand
intensity (λ = 0.8) scenarios. In this experiment, we conducted an out-of-sample test. The value
of T reported in Table 6 corresponds to the in-sample, training phase where we estimate the
24
Table 4: Performance of EM under different starting points (n = 10, λ = 0.2).
T
N
St. Dev.
Min
RMSE
Max Mean
1000
30
50
100
130
150
200
-836.29
-838.58
-881.82
-849.13
-771.51
-891.27
-836.25
-838.48
-881.77
-848.98
-771.44
-891.02
-836.26
-838.52
-881.80
-849.03
-771.47
-891.11
0.01
0.02
0.01
0.03
0.02
0.06
1.25
1.18
1.14
2.00
1.23
2.11
1.27
1.23
1.23
2.12
1.41
2.18
1.26
1.20
1.19
2.06
1.33
2.14
0.00
0.01
0.02
0.03
0.03
0.02
2000
30
50
100
130
150
200
-1655.11
-1679.00
-1710.47
-1755.93
-1870.90
-1703.58
-1654.98
-1678.86
-1710.13
-1755.39
-1870.52
-1703.19
-1655.01
-1678.90
-1710.24
-1755.52
-1870.69
-1703.38
0.03
0.03
0.08
0.09
0.10
0.11
3.15
2.10
1.41
1.39
2.24
1.52
3.19
2.18
1.48
1.47
2.37
1.67
3.18
2.14
1.44
1.44
2.32
1.58
0.01
0.02
0.02
0.02
0.02
0.03
4000
30
50
100
130
150
200
-3413.08
-3550.82
-3497.67
-3495.93
-3361.43
-3398.68
-3412.87
-3550.28
-3496.89
-3494.95
-3360.62
-3397.85
-3412.94
-3550.45
-3497.14
-3495.32
-3360.97
-3398.26
0.05
0.11
0.19
0.26
0.20
0.17
1.71
3.51
2.27
1.64
1.89
2.17
1.94
3.79
2.54
1.81
2.03
2.42
1.79
3.68
2.39
1.72
1.96
2.29
0.05
0.04
0.04
0.03
0.03
0.06
8000
30
50
100
130
150
200
-6778.86
-6900.93
-6895.88
-6974.29
-7054.32
-7109.21
-6778.63
-6899.96
-6894.53
-6972.09
-7052.29
-7106.18
-6778.73
-6900.37
-6894.96
-6972.81
-7053.22
-7107.10
0.04
0.24
0.33
0.46
0.49
0.56
6.07
2.88
2.56
3.30
2.37
1.97
6.23
3.27
2.85
3.65
2.62
2.41
6.13
3.06
2.68
3.54
2.50
2.19
0.04
0.08
0.06
0.06
0.06
0.10
Min
Log value
Max
Mean
St. Dev.
parameters. Then, we apply those parameters over 100 randomly generated instances from the
true underlying demand model, over 2,000 additional, out-of-sample periods. The baseline for
the RMSE is the one obtained under the true market description. We report 95% CIs for the
mean RMSE differences from using 20 types (labeled “choice under incomplete information”) and
from using 10 singleton types (labeled “independent demand”). For both cases, the estimates
were computed using EM.
We observe substantially different behavior for these two demand scenarios. Both the choice
model under incomplete information and the independent demand model perform equally well
relative to the true underlying demand model for the low intensity case. The choice-based
estimation is off by around 50%, while the independent demand is off by 52%. So in this case,
augmenting the independent demand model with additional customer types provides minimal
benefit.
In contrast, for the high intensity case the difference between the two demand models is
dramatic. The RMSE is on average 430% higher than the true underlying demand model for
25
Table 5: Performance of EM under different starting points (n = 10, λ = 0.8).
T
N
St. Dev.
Min
RMSE
Max Mean
1000
30
50
100
130
150
200
-1884.50
-1997.04
-1972.00
-1975.68
-1974.82
-1980.70
-1884.13
-1996.62
-1971.04
-1975.02
-1974.01
-1980.10
-1884.19
-1996.81
-1971.35
-1975.27
-1974.40
-1980.42
0.05
0.11
0.17
0.15
0.19
0.13
2.84
2.34
1.97
0.94
1.26
0.80
3.09
2.49
2.32
1.15
1.36
0.94
2.99
2.40
2.22
1.07
1.31
0.87
0.03
0.04
0.06
0.04
0.02
0.03
2000
30
50
100
130
150
200
-3831.58
-3932.34
-3939.98
-3969.46
-3984.72
-4009.07
-3831.18
-3930.84
-3938.27
-3967.58
-3982.70
-4006.75
-3831.40
-3931.30
-3938.97
-3968.35
-3983.59
-4007.73
0.09
0.24
0.45
0.42
0.50
0.51
2.41
2.83
1.48
2.70
1.66
1.75
2.62
3.17
1.76
3.08
1.96
2.14
2.51
3.05
1.65
2.91
1.85
1.96
0.04
0.06
0.06
0.08
0.08
0.07
4000
30
50
100
130
150
200
-7884.72
-7855.22
-8015.32
-8046.40
-8025.75
-7973.06
-7880.98
-7853.69
-8011.10
-8041.32
-8020.91
-7966.80
-7881.30
-7854.16
-8012.61
-8043.12
-8022.95
-7969.40
0.54
0.30
0.93
1.05
1.08
1.57
4.83
5.57
4.80
3.30
3.55
2.30
5.63
6.33
5.62
3.78
4.17
3.03
5.37
5.92
5.00
3.56
3.91
2.76
0.11
0.15
0.15
0.11
0.12
0.12
8000
30
50
100
130
150
200
-15038.08
-15700.30
-16171.93
-16119.54
-15891.41
-16126.69
-15037.11
-15694.79
-16163.55
-16111.52
-15881.56
-16113.81
-15037.60
-15696.06
-16167.05
-16115.07
-15886.66
-16119.09
0.28
1.02
2.04
2.01
2.16
2.80
5.46
3.41
2.25
1.60
2.56
1.88
6.10
4.77
3.41
2.54
3.66
3.44
5.84
4.03
2.71
2.15
3.20
2.55
0.14
0.26
0.25
0.22
0.24
0.27
Min
Log value
Max
Mean
St. Dev.
the choice-based model, and on average 1427% higher for the independent demand model.
In summary, for both intensity cases, the choice under incomplete information works better
than the independent demand assumption, but the difference is minimal for the low demand case,
where few transactions are observed throughout the horizon, and hence knowing the preference
order of the customer is less relevant. For the high intensity demand case, many transactions
are observed and therefore understanding the market composition is critical.
6.4
The effect of discovering new types
In this section, we test our market discovery mechanism proposed in Section 5. The procedure
is initialized with the observed transactions assuming an independent demand model, where we
add a customer type σ (jt ) = (jt ) for each of the transactions jt we observe during the horizon.
Using this set of independent demand types as a starting point, we then generated new types
using the MIP-based procedure described in Section 5. The MIP for the discovery mechanism
was implemented using the MATLAB interface from IBM ILOG CPLEX v12.3.
26
Table 6: RMSE comparison between choice under incomplete information and the independent demand assumption. The baseline is the RMSE obtained under the true underlying choice model. Results are 95% CIs for the
mean difference.
T
N
λ = 0.2
Choice under incomp. inf. Independent demand
Center
Width
Center
Width
λ = 0.8
Choice under incomp. inf. Independent demand
Center
Width
Center
Width
1000
30
50
100
130
150
200
48.55%
49.18%
51.07%
48.29%
51.54%
51.42%
9.97%
9.73%
10.93%
10.51%
9.42%
10.87%
48.99%
51.16%
52.70%
49.60%
51.82%
52.55%
10.00%
10.34%
11.36%
10.49%
9.52%
11.38%
183.06%
351.59%
494.88%
532.44%
536.07%
566.19%
23.71%
30.36%
39.95%
38.07%
49.08%
49.29%
1386.13%
1451.24%
1441.92%
1472.40%
1429.45%
1480.30%
91.20%
81.16%
79.29%
72.94%
87.92%
86.71%
2000
30
50
100
130
150
200
45.98%
57.98%
45.56%
44.37%
48.54%
57.45%
8.80%
9.05%
8.50%
11.83%
9.21%
11.62%
50.52%
62.27%
46.99%
45.56%
49.89%
58.24%
10.08%
9.71%
9.11%
11.89%
9.46%
11.85%
164.33%
312.61%
480.03%
561.12%
492.07%
568.76%
17.90%
30.18%
48.66%
57.32%
40.71%
39.68%
1320.58%
1386.60%
1481.61%
1483.49%
1388.06%
1490.92%
77.47%
73.44%
91.22%
92.93%
74.19%
78.70%
4000
30
50
100
130
150
200
48.88%
53.38%
55.07%
60.26%
53.56%
43.43%
12.49%
10.41%
12.84%
12.69%
10.52%
9.03%
51.35%
56.24%
56.27%
62.92%
53.68%
43.66%
11.22%
11.63%
13.42%
13.75%
10.61%
8.92%
185.93%
312.20%
464.25%
456.78%
538.68%
554.84%
18.88%
29.37%
36.53%
32.98%
46.69%
48.00%
1455.95%
1430.55%
1409.72%
1370.59%
1497.11%
1452.82%
76.90%
73.07%
76.61%
69.26%
89.16%
81.01%
8000
30
50
100
130
150
200
49.16%
41.37%
51.21%
58.75%
52.58%
49.73%
12.10%
10.12%
11.60%
12.57%
10.70%
10.31%
48.77%
47.93%
53.24%
59.80%
55.31%
52.05%
12.07%
10.94%
12.42%
12.26%
11.54%
11.07%
170.84%
292.61%
475.71%
526.16%
541.85%
564.84%
20.84%
30.75%
41.94%
50.92%
41.34%
47.59%
1299.80%
1362.90%
1398.92%
1460.88%
1462.92%
1449.15%
74.77%
71.43%
76.16%
90.99%
89.47%
78.42%
In Figure 1 we illustrate an example of the evolution of the procedure on five random instances
of data. The initial set of customer types are built using the independent demand assumption
based on the observed transactions as described, and the performance of this initial set is reflected
in the first data point of each series (zero value on the horizontal axis). The graphs shows the
value of the incomplete data log-likelihood function (left) and the RMSEs (right) as a function
of the number of types discovered by our mechanism. In this case, we stopped after seven new
types were added to the original set, although for one of the instances the mechanism could only
find two new types. We observe that the beneficial effect of new types tend to be decreasing in
the number of types considered, although not in a monotonic way.
Table 7 shows results of the performance of our market discovery mechanism under randomly
generated instances of data. For these experiments, the probability of a product being available
27
Log−likelihood Value for a Market Discovery Example
RMSE for a Market Discovery Example
60
−1150
−1200
50
−1300
40
−1350
RMSE
Log−likelihood value
−1250
−1400
30
−1450
20
−1500
−1550
10
−1600
−1650
0
1
2
3
4
5
6
0
7
Number of new customer types
0
1
2
3
4
5
6
7
Number of new customer types
Figure 1: An example based on synthetic data generated for n = 10 products, N = 5 customer types, T = 1000
periods, and λ = 0.7 arrivals per period. Each product is available in each period with probability 0.5. The initial
customer types are defined by the independent demand assumption built upon the observed transactions.
in each period is 0.5. The baselines are the true underlying demand model to generate the data
(30 instances for each experiment), and the independent demand model based on the observed
transactions. Desirable outcomes are positive values of log-value differences, and negative values
of RMSE differences, meaning that our market discovery mechanism was able to increase the
log-likelihood values and reduce the RMSEs.
Table 7: Effect of the application of the market discovery mechanism (n = 10, T = 1000). Comparative results
with respect to the true underlying demand model and the independent demand model. Results are 95% CIs for
the mean difference.
λ
N
Log-value difference
True demand model Independent demand model
Center
Width
Center
Width
RMSE difference
True demand model Independent demand model
Center
Width
Center
Width
0.2
5
10
15
11.42%
5.83%
12.74%
10.75%
6.37%
10.59%
17.21%
8.78%
15.02%
10.07%
6.18%
10.32%
-61.08%
-65.41%
-60.17%
9.80%
5.65%
12.04%
38.43%
41.33%
37.37%
78.19%
23.72%
33.79%
0.5
5
10
15
4.08%
24.17%
44.35%
6.48%
15.23%
17.71%
14.04%
29.13%
46.52%
5.89%
14.24%
17.02%
-58.32%
-72.45%
-83.43%
8.59%
8.53%
6.15%
-78.80%
-86.04%
-88.65%
5.33%
3.56%
4.17%
0.8
5
10
15
7.13%
10.63%
23.85%
9.04%
10.84%
15.29%
25.76%
23.05%
32.54%
7.43%
9.41%
13.56%
-47.23%
-53.78%
-60.55%
9.22%
9.09%
10.30%
-94.07%
-94.69%
-95.97%
1.10%
0.97%
0.93%
Note that our procedure is able to match (or even beat) the performance of the true underlying demand model both in terms of log-likelihood and RMSE differences. Indeed, when
looking at the RMSE difference, our mechanism consistently beats the true underlying demand
model. When compared to the independent demand model, our mechanism dominates from
a log-likelihood value point of view. Note, however, that our tests are within sample, so the
improved performance relative to the true model is likely due to some degree of over-fitting.
But when looking at the RMSE difference, we see that for λ = 0.2 (i.e., for a low demand
28
volume), the independent demand model dominates. This is consistent with our observation in
Section 6.3, in the sense that there is not much benefit considering a more sophisticated model
of demand under a low volume demand scenario. However, under moderate or high volume demands, our market discovery mechanism exhibits very good performance, providing significant
gains in terms of the quality of the estimates.
While our market discovery mechanism performs well, it is a computationally intensive procedure, since it involves solving a MIP whose difficulty is increasing in the underlying market
complexity (i.e., number of base types N and demand intensity λ). For instance, in our implementation, the average computational time to discover one additional type increased from
2.3 seconds for the case (N = 5, λ = 0.2) to 10 seconds for the case (N = 15, λ = 0.8).
6.5
Examples based on real data
In order to illustrate the performance of our method on real-world data, in this section we
present results applying it to two data sets that we used in our previous paper Vulcano et
al. [40, Section 5.2] (one for an airline market and one for a retail market), and to another hotel
data set publicly available in Bodea et al. [4].
6.5.1
Airline and retail data
The data we obtained for testing the single-class multinomial logit (MNL) model of our previous
paper was collected for one-day long periods, and therefore consisted of multiple transactions
per period. Our approach here was to transform these data into a form compatible with our
model, in which we have at most one transaction per period. To do so, we used the market share
estimate provided by the data owners to subdivide time periods with multiple observations as
follows: if in a given day we observe two sales of product 1 and three sales of product 4, for
a total of five transactions, and if the market share estimate is 0.5, then we split the period
into (2 + 3)/0.5 = 10 sub-periods, two of them with a sale of product 1, three of them with
a sale of product 4, and the remaining five periods with no transactions. When a full period
in the original data does not register any transaction, it is replicated just once. The original
availability of the products on a given day is replicated for each of the 10 periods. Note that this
ensures that the arrival intensity is approximately constant across different periods, which is
consistent with the underlying arrival process of our model, as opposed to the non-homogeneous
Bernoulli process in our prior work [40].
For each of the examples below, we tested the performance of our market discovery mechanism starting from the independent demand assumption, comparing it to the single-class MNL
that we tested in [40]. For the market discovery mechanism, we set a limit of a maximum of
50 types to discover. Due to the somewhat ad-hoc data transformation process described above,
the comparative results should be taken with some caution. Nevertheless, the results give a
29
sense of the estimation properties of our non-parametric, rank-based model relative to a more
standard parametric MNL model.
Airline market example
The first example is based on data from a leading commercial airline serving a sample O-D
market with two daily flights. The original data set covered 77 observation days, with 7 products
(fare-class combinations) from one flight and 8 from the other one, for a total of 15 products.
As in our previous paper, we computed two sets of estimates for the demand under different
assumptions: In the joint, multi-flight case we assumed customers chose between both flights in
the day, so the consideration set consisted of all 15 products with a market share of 50%; in the
independent-flight case, we assumed customers were interested in only one of the two flights,
implying there were two disjoint consideration sets, with a market share of 25% per flight.
Table 8 shows the results. Besides checking the value of LI (x, λ) for each pair (x̂, λ̂) under
the independent demand assumption and our market discovery mechanism (the log-likelihood
function for the MNL is different), we aggregated the observed bookings and the predicted
bookings across all the periods and computed RMSEs.
Table 8: Estimation results for the Airline Market Example
Consideration set Measure
Estimation method
Indep.
Market disc. MNL
Flight 1
LI (v, λ)
RMSE
-652.99
1.41
-564.30
5.57
2.18
Flight 2
LI (v, λ)
RMSE
-734.49
4.43
-605.21
4.21
2.00
Joint
LI (v, λ)
RMSE
-891.15
6.37
-717.85
4.17
6.31
In terms of the log-likelihood value, the general choice model clearly dominates the independent demand model, confirming the improvement provided by our market discovery mechanism.
But in terms of RMSE, we observe that the general choice model is dominated by either the independent demand model or the MNL for the single flight cases. This is perhaps due to the fact
that the data in these cases are too sparse; with only 25% of market share and with T = 737 and
T = 829 periods, respectively, we have only around 200 transactions in total, and this does not
seem to be enough to calibrate a choice model well. This is also consistent with our observation
in Table 7 for the simulated data, where the independent demand dominates in the low-arrival
intensity case.
As Table 8 shows, the likelihood and RMSE performance of our demand model clearly
improves in the joint flight case. In this case, the the number of products in the consideration
set and the number of observed transactions is larger, since T = 525 and the market share
is 50%, giving approximately 260 transactions. Besides an increase in the number of transactions,
30
the improvement in performance is likely due to the fact that our rank-based model is better
at representing customer preferences in this joint-flight case compared to the restrictive MNL
model. Indeed, a nested logit model would likely be a more appropriate parametric model for
this case, in which a customer first chooses a flight and then chooses a particular fare class
within that flight.
It is interesting to observe the final composition of customer types provided by our market
discovery mechanism. In all cases, we started from the singletons of the independent demand
assumption. In general, after adding about 5 to 10 types, we obtain most of the improvement in
log-likelihood values. Yet the final composition of types consisted of either zero or one singletons
for each of the three cases (recall that there is a filtering process that drops the types with very
low probabilities). The hardest case was the joint flight estimation, where the market ended
with 37 customer types, and the procedure stopped because of the maximum number of types
limit. Finding a customer type for this case took about 5 seconds.
Retail market example
This example illustrates our EM method and market discovery mechanism applied to sales data
from a retail chain. We consider sales observed during eight weeks over a sample selling season.
We assume a choice set defined by 6 substitutable products within the same small subcategory
of SKUs. The market share of this retail location is estimated to be 48%. Applying a similar
transformation to the previous example, we get a data set with at most one transaction per
period, resulting in a selling horizon of length T = 245. Here, we also tested the performance of
our market discovery mechanism against the independent demand model and the MNL model.
Our market discovery increased the log-likelihood value of the independent demand model
from −286.79 to −241.34, and the RMSE was reduced from 3.81 to 1.61 after getting a final
composition of 12 customer types with only one singleton. This RMSE beats the 1.86 corresponding to the MNL model reported in [40]. Again, the higher density of data seems to be a
critical factor for the success of our market discovery mechanism.
6.5.2
Hotel data
This example is based on the publicly available data used by Bodea et al. [4]. The booking
records correspond to transient customers (predominately business travelers) with check-in dates
between March 12, 2007, and April 15, 2007 in one of five continental U.S. hotels. For every
hotel, a minimum booking horizon of four weeks for each check-in date was considered. Rate and
room type availability information present at the time of booking was recorded for reservations
made via the hotel or customer relationship officers (CROs), the hotels web sites, and off-line
travel agencies. Due to the memory limitations of our MATLAB-linked CPLEX optimizer, we
focus in particular on the properties labeled Hotel 4 and Hotel 5, with a market share of 20%
each.
We define a product as a room type (e.g., king non-smoking, queen smoking, etc), and
31
a period as a (booking date, check-in date)-pair. Since periods with no purchases were not
recorded in the original data set, we use the market share information to augment the data
with no purchases, similarly to what we did with the previous examples. Specifically, for each
booking record we created four no-purchase records with the same availability of products as
the one for the booking record. Table 9 summarizes further details and the estimation results of
our EM method and market discovery mechanism relative to the independent demand model.
Table 9: Estimation results for the Hotel Example
Feature
Number
Number
Number
Number
of
of
of
of
Hotel 4
products n
original bookings
bookings after preprocessing
periods T
Initial number of singletons under Indep. Demand
Indep. Demand RMSE
Indep. Demand Log-value
Final number of types under Market Discovery
Market Discovery RMSE
Market Discovery Log-value
Hotel 5
9
373
288
1440
8
310
245
1225
9
1.34
-977.48
8
1.40
-1092.46
21, with 1 singleton
1.08
-948.83
22, with 2 singletons
0.89
-1062.05
We note that for both hotels, our market discovery mechanism clearly outperforms the independent demand model, both in terms of log-likelihood and RMSE. The RMSE improvements
here occur even under low market share (i.e., a low value of λ). Intuitively, the difference relative
to the previous data sets is that here products are more widely available (on average, 75% of
the time), and therefore the booking behavior is better explained by a richer choice model. One
can observe this by looking at the final number of customer types discovered by our mechanism
and by the fact that most of the original singletons of the independent demand assumption are
removed from the final set of types.
7
Conclusions
Our demand model combines a Bernoulli process model of arrivals with a non-parametric, rankbased choice model over multiple periods and allows for different availability of products over
time. This specification is quite general and is essentially compatible with any utility maximizing
model of preferences. The method can be used to untruncate demand and correct for spill and
recapture effects across an entire set of substitutable products. More generally, it serves as a
useful and general model of customer demand in a potentially wide range of retail settings.
Given a set of potential customer types (rankings), our EM method provides a very efficient
algorithm for calculating maximum likelihood estimates for the probability of each customer
type and for the arrival rate. The only required inputs are observed historical sales, product
32
availability data, and the given set of customer types. Through an extensive numerical study,
we showed that our method is effective in terms of reliably maximizing the likelihood function
and is an order of magnitude faster computationally than direct maximization of the incomplete
log-likelihood function.
Importantly, we also provide a market discovery mechanism that automatically augments the
initial set of potential customer type based on a column-generation-like approach that involves
solving a novel MIP to discover new types that improve the likelihood. We show that our market
discovery mechanism can significantly improve the quality of the estimates obtained if we start
from singletons from the independent demand assumption as the initial set of customer types.
Sample testing on real-world data sets shows that the market discovery mechanism is effective
relative to parametric models, especially in cases with larger, complex choice sets. The ability
to automate the specification of the set of model types in this way we feel further enhances the
practical appeal of the method.
References
[1] S.E. Andersson. Passenger choice analysis for seat capacity control: A pilot project in Scandinavian Airlines. International Transactions in Operational Research, 5:471–486, 1998.
[2] R. Anupindi, M. Dada, and S. Gupta. Estimation of consumer demand with stock-out based
substitution: An application to vending machine products. Marketing Science, 17:406–423,
1998.
[3] M. Ben-Akiva and S. Lerman. Discrete Choice Analysis: Theory and Applications to Travel
Demand. The MIT Press, Cambridge, MA, sixth edition, 1994.
[4] T.D. Bodea, M.E. Ferguson, and L.A. Garrow. Choice-based revenue management: Data
from a major hotel chain. M&SOM, 11(2):356–361, 2009.
[5] S. Borle, P. Boatwright, J. Kadane, J. Nunes, and S. Galit. The effect of product assortment
changes on customer retention. Marketing Science, 24:616–622, 2005.
[6] R. Boyles. On the convergence of the EM algorithm. Journal of the Royal Statistical Society
(Series B), 45:47–50, 1983.
[7] H. Bruno and N. Vilcassim. Structural demand estimation with varying product availability.
Marketing Science, 27:1126–1131, 2008.
[8] R. Bucklin and S. Gupta. Brand choice, purchase incidence, and segmentation: An integrated modeling approach. Journal of Marketing Research, 29:201–215, 1992.
33
[9] K. Campo, E. Gijsbrechts, and P. Nisol. The impact of retailer stockouts on whether, how
much, and what to buy. International Journal of Research in Marketing, 20:273–286, 2003.
[10] J. Chaneton and G. Vulcano. Computing bid-prices for revenue management under customer choice behavior. M&SOM, 13(4):452–470, 2011.
[11] L. Chen and T. Homem de Mello. Mathematical programming models for revenue management under customer choice. European Journal of Operational Research, 203:294–305,
2010.
[12] P. Chintagunta. Investigating purchase incidence, brand choice and purchase quantity
decisions on households. Marketing Science, 12:184–208, 1993.
[13] P. Chintagunta and J-P Dubé. Estimating a SKU-level brand choice model that combines
household panel data and store data. Journal of Marketing Research, 42:368–379, 2005.
[14] C. Conlon and J. Mortimer. Demand estimation under incomplete product availability.
Working paper, Department of Economics, Harvard University, Cambridge, MA, 2010.
[15] W. Cooper, T. Homem de Mello, and A. J. Kleywegt. Models of the spiral-down effect in
revenue management. Operations Research, 54(5):968–987, 2006.
[16] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the
EM algorithm. Journal of the Royal Statistical Society B, 39:1–38, 1977.
[17] V. Farias, S. Jagabathula, and D. Shah. A non-parametric approach to modeling choice
with limited data. Working paper, MIT Sloan School, Cambridge, MA, 2010.
[18] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of
NP-Completeness. W. H. Freeman and Company, San Francisco, CA, 1979.
[19] T. Gruen, D. Corsten, and S. Bharadwaj. Retail out-of-stocks: A worldwide examination
of causes, rates, and consumer responses. Grocery Manufacturers of America, 2002.
[20] A. Haensel and G. Koole. Estimating unconstrained demand rate functions using customer
choice sets. Journal of Revenue and Pricing Management, 9:1–17, 2010.
[21] A. Haensel, G. Koole, and J. Erdman. Estimating unconstrained customer choice set demand: A case study on airline reservation data. Working paper, VU University Amsterdam,
The Netherlands, 2011.
[22] S. Ja, B. V. Rao, and S. Chandler. Passenger recapture estimation in airline revenue
management. In AGIFORS 41st Annual Symposium, Sydney, Australia, August 2001.
Presentation.
34
[23] K. Kalyanam, S. Borle, and P. Boatwright. Deconstructing each item’s category contribution. Marketing Science, 26:327–341, 2007.
[24] G. Kök and M. Fisher. Demand estimation and assortment optimization under substitution:
Methodology and application. Operations Research, 55:1001–1021, 2007.
[25] S. Mahajan and G.J. van Ryzin. Stocking retail assortments under dynamic consumer
substitution. Operations Research, 49:334–351, 2001.
[26] R. Martı́ and G. Reinelt. The Linear Ordering Problem: Exact and Heuristic Methods in
Combinatorial Optimization. Springer, first edition, 2011.
[27] G. McLachlan and T. Krishnan. The EM Algorithm and Extensions. John Wiley & Sons,
New York, NY, 1996.
[28] A. Musalem, M. Olivares, E. Bradlow, C. Terwiesch, and D. Corsten. Structural estimation
of the effect of out-of-stocks. Management Science, 56:1180–1197, 2010.
[29] J. Newman, M. Ferguson, and L. Garrow. Estimation of choice-based models using sales
data from a single firm. In INFORMS Revenue Management and Pricing Section Conference, New York, NY, June 2011. Presentation.
[30] C. Queenan, M. Ferguson, J. Higbie, and R. Kapoor. A comparison of unconstraining
methods to improve revenue management systems. Production and Operations Management, 16:729–746, 2007.
[31] R. Ratliff, B. Rao, C. Narayan, and K. Yellepeddi. A multi-flight recapture heuristic for
estimating unconstrained demand from airline bookings. Journal of Revenue and Pricing
Management, 7:153–171, 2008.
[32] P. Rusmevichientong, B. van Roy, and P. Glynn. A nonparametric approach to multiproduct
pricing. Operations Research, 54(1):82–98, 2006.
[33] C. Stefanescu. Multivariate customer demand: Modeling and estimation from censored
sales. Working paper, European School of Management and Technology, Berlin, Germany,
2009.
[34] J. Swait and T. Erdem. The effects of temporal consistency of sales promotions and availability on consumer choice behavior. Journal of Marketing Research, 39:304–320, 2002.
[35] K. T. Talluri and G. J. van Ryzin. Revenue management under a general discrete choice
model of consumer behavior. Management Science, 50:15–33, 2004.
35
[36] K.T. Talluri. A finite-population revenue management model and a risk-ratio procedure
for the joint estimation of population size parameters. Working paper, Department of
Economics and Business, Universitat Pompeu Fabra, Barcelona, Spain, 2009.
[37] K. Train. Discrete choice methods with simulation. Cambridge University Press, New York,
NY, 2003.
[38] G. van Ryzin and G. Vulcano. Computing virtual nesting controls for network revenue
management under customer choice behavior. M&SOM, 10(3):448–467, 2008.
[39] G. Vulcano, G. van Ryzin, and W. Chaar. Choice-based revenue management: An empirical
study of estimation and optimization. M&SOM, 12(3):371–392, 2010.
[40] G. Vulcano, G. van Ryzin, and R. Ratliff. Estimating primary demand for substitutable
products from sales transaction data. Leonard N. Stern School of Business, New York
University, New York, NY. Forthcoming in Operations Research, 2008.
[41] C.F. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics,
11:95–103, 1983.
[42] D. Zhang and W. Cooper. Revenue management for parallel flights with customer-choice
behavior. Operations Research, 53(3):415–431, 2006.
36