Fruit Demand: Buying Continuous Quantities of a Few
Potentially Complementary Goods Among Many
Arthur Lewbel and Lars Nesheim
Boston College and University College London
March 2013
Abstract
We propose a demand model that allows consumers to choose a small number of different
goods from a large menu of available goods, and consume these goods in continuous quantities. The model allows the chosen goods to be substitutes or complements. All errors in the
model are random utility preference parameters. Non-negativity constraints that consumers
consider when utility maximizing produce corners that make different consumers purchase
different numbers of types of goods. We show semiparametric identification of the model
where the joint distribution of the random utility terms is not parameterized. We apply the
model to the demand for fruit.
We propose a demand model to analyse consumer choice of a small basket of goods from
a large menu of available goods. The chosen goods may be substitutes or complements.
Demand elasticities depend on choices at both the intensive and extensive margins as well
as on the items in the basket. Because all errors in the model are random utility preference
parameters, different consumers choose completely different baskets of goods and demand
elasticities vary significantly across consumers. We show semiparametric identification of the
model (allowing for a nonparametric distribution of the random utility terms). We apply the
model to the demand for fruit and vegetables and estimate demand elasticities in this market.
JEL codes:
Keywords:
1
Introduction
We propose a demand model that has some features of discrete demands like multinomial choice, and
other features of traditional continuous demand systems. As a motivating example, consider consumers’
demand for fresh fruit. There are dozens or hundreds of different types of fruit consumers can choose
among. Consumers typically choose three or four types of fruit to purchase, and buy them in varying
quantities.
The most popular method of dealing with similar demand problems, as exemplified by the BLP model,
would be to discretize purchases and treat each unit purchased as a separate multinomial choice decision. However, the assumptions that underlie this methodology are likely to be seriously violated in fruit
demand.
One issue is continuity of fruit demand. Purchases are often by weight, and nonuniformity of the size
of each fruit and the number chosen make it possible to purchase fruit in close to continuous quantities.
For this issue, some extensions to multinomial choice models exist that combine a discrete choice for a
product with a continuous choice for the quantity of that product. See, e.g., Hendel (1999) or Dube (2004).
A far more serious limitation of multinomial choice models is that they generally rule out complements, while patterns of fruit consumption indicate that some fruits are strong complements for others
(e.g., different types of berries are frequently purchased together and consumed jointly, and varying fruits
are complementary inputs to home production of dishes like fruit salad). It would be possible to allow for
complements in a discrete choice framework by considering varying combinations of fruit as if they were
distinct goods (e.g., treating an apple, a banana, and the combination of both as three separate possible
choices). However, the number of possible combinations of just two or three fruits out of over a hundred
available fruits makes this approach completely impractical. In our empirical application each consumer
chooses up to three options from a set of 80 fruits and vegetables, so there are 85,400 distinct combinations
to be considered. The combinations purchased vary dramatically across consumers. The problem becomes
even less tractable when continuous quantities are considered, or even nonbinary
quantities, such as treating two apples and one banana as yet another choice the consumer could make.
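The figure of 85,400 is the number of non-empty baskets containing at most three of the 80 goods, $\binom{80}{1} + \binom{80}{2} + \binom{80}{3} = 80 + 3{,}160 + 82{,}160$. A short Python check, using a hypothetical helper `n_baskets` of our own; it counts sets of goods only, before any quantities are attached:

```python
from math import comb

def n_baskets(n_goods: int, max_items: int) -> int:
    """Count non-empty baskets containing at most `max_items` distinct goods."""
    return sum(comb(n_goods, k) for k in range(1, max_items + 1))

print(n_baskets(80, 3))   # 85400 = 80 + 3,160 + 82,160
print(n_baskets(101, 3))  # 171801 with the 101 categories used in the empirical section
```

Treating each basket as a separate discrete alternative would therefore require tens of thousands of alternatives even before quantities enter.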
Even when fruits are not complements, they are still generally purchased in bundles, making the assumption of independence across purchases that underlies the multinomial logit model very unlikely to
hold, even approximately.
Traditional continuous demand models were designed to handle joint purchases of bundles in continuous quantities, but such models generally assume each consumer buys most or all goods in nonzero
amounts. Methods exist for dealing with small numbers of zeros in such models, but in our case the
overwhelming majority of available goods are purchased in zero amounts by each consumer, i.e., each
consumer selects a few (typically two to four) types of fruits to buy from the universe of one to two
hundred types. With traditional demand systems, large numbers of zeros are usually dealt with by aggregating up to a few broad categories of goods (in our case, e.g., just two or three types of
fruit). However, in many applications, researchers and practitioners (e.g., marketers, sellers, and nutrition policy makers)
are interested in the determinants of demand for each type of fruit, not just broad aggregates.
These issues are important for analysing aggregate demand elasticities. First, the aggregate demand
elasticities will be determined by changes in demand at both the intensive and the extensive margin. In
response to an increase in the price of apples, some people buy fewer apples. Some people switch from
buying apples to pears. Second, the individual elasticities vary significantly across households because
different households choose to buy different combinations. The substitution patterns for someone who buys
apples, bananas and courgettes are different from those of someone who buys only apples and bananas, or of
someone who buys apples, pears, and lettuce.
Our approach is to incorporate a Gorman (1980) and Lancaster (1966) type characteristics model into
a continuous quadratic demand system, with unobserved heterogeneity in the demand for characteristics,
and allowing for many corner solutions in the demand for characteristics.
2
Literature
As summarized by, e.g., Blundell and Meghir (1987), the literature on continuous demand considers three
main causes for observing zero purchases of some goods. One reason is lexicographic preferences, that is
an individual might prefer to consume any amount of other goods, no matter how small, to a given good.
More simply, this occurs when an individual will never choose to buy the good in question, regardless
of income or relative price. An example would be the zero cigarette consumption of a nonsmoker. A
second reason is infrequency of purchase. A good that is durable or storable may be consumed regularly,
but purchases of it may not be observed in a given time period of observation. A third reason is corners.
This occurs when, while maximizing utility, the constraint that quantities are nonnegative is binding. This
means that at a given level of prices and total expenditures a consumer chooses to purchase a zero quantity
of a good, where in some other price and total expenditure regimes (such as one where the price of the
good in question is lower), the consumer purchases a positive amount.
In our fruit demand example, infrequency of purchase can be largely ruled out over time spans longer
than a few days, because fresh fruit is not durable and cannot be stored for very long.
Lexicographic preferences are typically modeled analogously to Heckman (1979) type sample selection models, where a binary choice equation models the decision of whether to consume the good or not,
and then ordinary demand systems are estimated either including or excluding the good in question.
Systems of equations like these can be estimated parametrically using Shonkwiler and Yen (1999), or,
specifically for demand systems, Yen and Lin (2006). A recent example of demand system estimation of
this type (still with a small number of goods) is the semiparametric estimator of Sam and Zheng (2010).
Models like these that have equations for determining zero versus non-zero consumption that are separate
from equations determining desired quantities require utility functions that are fundamentally different for
non-consumers and consumers of a good. These types of models are generally most appropriate for goods
that a significant fraction of the population would never consume, like tobacco or alcohol.
In our model we focus on corners, since it is likely that very few types of fruit are such that one would
never purchase under any circumstance. Our model allows for substantial heterogeneity of preferences,
and so can roughly accommodate lexicographic preferences: for consumers who would never purchase a
good, the preference parameters that govern positive purchases of that good can be set to extreme values
at which a positive purchase never occurs.
An extreme version of models based on corners are brand choice models, where the constraint that
consumers buy exactly one brand is imposed either a priori or by the structure of the utility function. For
example, Hendel (1999) proposes a model in which firms maximize a profit function by choosing a single
brand (of computer) along with deciding how many units to buy (firms that are observed to buy multiple
brands are divided into separate tasks, and each task is treated as if it was an individual firm choosing one
brand). Similarly Dube (2004) proposes a model where the purchase for each "consumption occasion" is
the decision to purchase a single brand, but in a continuous quantity. However, a drawback of all these
discrete choice based models is that they rule out the possibility of different goods being complements.
None would, e.g., allow for the possibility of making a fruit salad. In contrast, our model is based
directly on continuous joint demand for multiple goods, and so allows for goods to be complements, and
more generally places no separability restrictions on the demands for different goods.
Corners in continuous demand models are generally modeled as censored regressions, such as tobit
models. The early continuous demand system literature that considered corners formally focused on cases
where either a single good, or a very small number of goods, may have zeros. Examples include Wales
and Woodland (1983) and Lee and Pitt (1986). Applications of continuous demand systems with many
goods and censoring work as follows. Let $p$ and $y$ be a price vector and total expenditures, respectively.
Utility maximization without nonnegativity constraints is first used to derive models of the form $q_j^* = f_j(p, y) + e_j$
for each good $j$, where $q_j^*$ is a latent quantity and $e_j$ is an error term. Each observed quantity
$q_j$ is then assumed to be given by $q_j = \max\{0, q_j^*\}$. Examples of such models include Golan, Perloff, and
Shen (2001) and Meyerhoefer, Ranney, and Sahn (2005).
These censored demand models have two flaws. First, either the utility-derived demand functions are
$f_j(p, y)$ and the errors $e_j$ are arbitrarily appended, or errors are incorporated as random utility parameters,
yielding demand equations of the form $q_j^* = f_j(p, y, e) + e_j$, and each function $f_j(p, y, e)$ is approximated
by some function $f_j(p, y)$. The most common example of this latter method is based on Deaton
and Muellbauer’s (1980) Almost Ideal Demand System (AID), where the vector $e$ appears in the demand
functions $f_j(p, y, e)$ only inside a general price index, which is replaced by an approximate Stone price
index. Second, and more seriously, these models are not fully consistent with utility maximization, because
the nonnegativity constraints are not incorporated into the consumer’s utility maximization. In
these models, the consumer first chooses possibly negative quantities for some goods to maximize utility,
and then actually purchases zero quantities of these goods. These problems apply to almost all demand
systems with many goods that allow for censoring, whether the censoring is driven by the errors $e$ or by separate selection
equations (except for the brand choice models that forbid complementarities, which solve this problem by
imposing extreme forms of separability).
In contrast, the model we propose overcomes all of these problems. It has each consumer take all the
nonnegativity constraints directly into account when maximizing their utility functions, it fully incorporates error terms into the model directly as preference heterogeneity parameters, and it allows for arbitrary
patterns of substitutability or complementarity among the goods.
3
The Model
Let $q_{ji}$ be the quantity of good $j$ that is purchased by consumer $i$, and let $q_i \in \mathbb{R}_+^J$ be the bundle of
goods purchased by consumer $i$, so $q_i$ is the $J$ vector of elements $q_{ji}$. Suppose that the utility consumer
$i$ gets from purchasing this bundle of goods is a function of $K$ unobserved, latent attributes. Let $B_{kj}$ be
the quantity of attribute $k$ that a consumer derives from buying a unit of good $j$, and let $B$ be the $K \times J$
matrix of elements $B_{kj}$. Then the quantity bundle of the $K$ attributes that a consumer derives utility from
is the vector $Bq_i$. Assume $K < J$, $\operatorname{rank}(B) = K$, and $B^\top B \succeq 0$.
This is essentially the Gorman-Lancaster linear household technologies model. We assume consumers
have quasilinear quadratic utility over attributes. Where our model departs from Gorman is that we assume
that maximized utility can have many corners (i.e., points where indifference curves intersect with axes
in attribute space), and we assume that location shifts in the utility for each attribute vary randomly across
consumers. Also, most past implementations of Gorman’s model assumed $K = J$, whereas we will have
$K$ much smaller than $J$.
The Gorman-Lancaster model with $K$ much smaller than $J$ accounts for most of the zeros in observed
demands, since it implies that any one consumer will not need to purchase positive quantities of more
than $K$ different goods. Preference variation across consumers will lead different consumers to choose
different goods. However, the Gorman model is still inadequate, because it implies that all consumers will
(with probability one) buy the same number of goods, $K$. This is where corners come in.
Analogous to a Tobit model, in our model the value of a latent index (i.e., each element of $Bq_i$,
corresponding to a quantity of each attribute) plus an error term determines whether a given attribute is
desired sufficiently (relative to its cost) to purchase in nonzero amounts. These error terms are location
shifts in the utility for each attribute. The presence of these corners is what causes different consumers to
choose varying numbers of different goods to purchase in nonzero amounts, from zero to $K$.
Other models exist that deal with zeros in a similar way, such as the censored Almost Ideal Demand
System discussed earlier, but these models have the drawbacks described in the previous section.
We assume that each individual $i$ maximizes the quasilinear quadratic utility function
$$q_{0i} - 0.5\,(e_i - Bq_i)^\top (e_i - Bq_i) \quad \text{such that} \quad q_{0i} + p^\top q_i = y_i \text{ and } q_i \ge 0,$$
where $y_i \in \mathbb{R}$ is total expenditures, $q_{0i} \in \mathbb{R}$ is a numeraire good, $p \in \mathbb{R}_+^J$ is a price vector, and $e_i \in \mathbb{R}^K$,
which is randomly distributed in the population, is a vector of preference parameters corresponding to
satiation levels or bliss points for each attribute $k$. Our model can be straightforwardly generalized to
remove the quasilinearity assumption, and thereby introduce income effects, by writing the utility function
as $U\big(q_{0i}, -0.5\,(e_i - Bq_i)^\top (e_i - Bq_i)\big)$ for some monotonically increasing utility function $U$. But we put
aside this complication for now, since the application we have in mind, fruit demand, is generally a small
component of one’s overall budget.
This utility function nests both standard continuous demand systems and standard discrete choice
models. Continuous quadratic direct utility functions correspond to this model with $K = J$, and the model
becomes equivalent to discrete choice models like multinomial probit by taking $e$ to be normal, restricting
$B$ to ensure the optimal $q$ has at most one nonzero element, and adding the constraint that $q \le 1$. However,
we do not impose these restrictions.
Substituting out the budget constraint, the consumer chooses $q_i$ to maximize
$$y_i - p^\top q_i - 0.5\,(e_i - Bq_i)^\top (e_i - Bq_i) \quad \text{such that} \quad q_i \ge 0. \qquad (1)$$
Our baseline model assumes all consumers have the same B and the vector e is continuously distributed
and has support on a set in $\mathbb{R}^K$ that has positive volume. All consumers in a single market face the same
vector of prices p but consumers in different markets may face different prices. A market is defined by a
time period, a geographic location, and perhaps a store.
4
First order conditions
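To ease notation, drop the $i$ subscript. The Lagrangian for each consumer’s maximization problem is
$$\mathcal{L}(q, \lambda) = y - p^\top q - 0.5\,(Bq - e)^\top (Bq - e) + \lambda^\top q,$$
where $\lambda$ is a vector of Lagrange multipliers. The first order conditions are
$$0 = -p - B^\top (Bq - e) + \lambda, \qquad 0 = \lambda^\top q, \quad \lambda \ge 0, \quad q \ge 0, \qquad (2)$$
and the second order conditions are
$$-B^\top B \preceq 0.$$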
Due to quasilinearity, the value of y does not affect the maximizing choice of q. This model implicitly
assumes either that the numeraire can be consumed in negative quantities, or that $y \ge p^\top q$ for any
optimizing value of q. Note that this latter condition will hold automatically as long as y is large enough
to purchase a bundle q that attains the satiation level Bq D e (though consumers in that situation may still
not choose to buy that bundle, if the utility value of holding more of the numeraire is greater).
Each consumer can maximize utility by buying nonzero amounts of at most $K$ goods. Given prices
and $B$, the first order conditions define a linear partition of $\mathbb{R}^K$ with at most $R = \binom{J}{K}$ elements. Let
$E_r$ be an element of the partition. All consumers with $e \in E_r$ choose a quantity $q_r$ with the same nonzero components. For each consumer, calculating their optimal quantity bundle $q$ entails solving a convex
quadratic program. Even though finding an optimum requires searching over as many as $\binom{J}{K}$ possible sets of
non-zero elements, because the problem is convex there exist algorithms for obtaining a solution in polynomial time (interior
point and related methods).
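To make the corner structure concrete, here is a small Python sketch that solves problem (1) for one consumer. Rather than an interior-point method, it swaps in a brute-force active-set search: it tries candidate sets of purchased goods in increasing size (so the smallest optimal basket wins, in the spirit of Assumption A1), solves the first order conditions (2) on that set, and accepts the first candidate whose multipliers $\lambda = p + B^\top(Bq - e)$ are all nonnegative. Because the problem is convex, any such point is a global optimum. The function name `optimal_bundle`, the tolerance, and the numerical example are our own; the search is combinatorial in $J$, so it is only meant for toy examples, and the constant $y$ is dropped because it does not affect the maximizer.

```python
import itertools
import numpy as np

def optimal_bundle(B, e, p, tol=1e-10):
    """Maximize -p'q - 0.5*(e - Bq)'(e - Bq) subject to q >= 0 (hypothetical helper).

    Tries supports of size 0..K in increasing order and returns the first q that
    satisfies the Kuhn-Tucker conditions (2). Cost grows combinatorially in J.
    """
    K, J = B.shape
    for size in range(K + 1):
        for support in itertools.combinations(range(J), size):
            s = list(support)
            q = np.zeros(J)
            if s:
                Bs = B[:, s]
                G = Bs.T @ Bs
                if np.linalg.matrix_rank(G) < size:
                    continue  # attribute columns of these goods are linearly dependent
                q[s] = np.linalg.solve(G, Bs.T @ e - p[s])
                if np.any(q[s] <= tol):
                    continue  # goods in the support must be bought in positive amounts
            lam = p + B.T @ (B @ q - e)  # multipliers from (2)
            if np.all(lam >= -tol):
                return q
    raise ValueError("no Kuhn-Tucker point found")

B = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])  # hypothetical K = 2, J = 3 example
e = np.array([2.0, 1.0])
print(optimal_bundle(B, e, np.array([0.5, 0.5, 2.0])))  # approximately [1.5, 0.5, 0.0]
print(optimal_bundle(B, e, np.array([0.5, 0.5, 0.2])))  # approximately [0.2, 0.0, 1.3]
```

In this example, cutting the price of the third good moves the consumer to a different two-good basket, an extensive-margin response, while the basket size stays at $K = 2$.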
To prove identification, it will be useful to characterize solutions that have the maximum number $K$
of nonzero elements. To do so, let $q = (q^1, q^2)$ be a vector for which $q^1 > 0$ and $q^2 = 0$, such that
$\dim q^1 = K$. Let $p_1$ and $p_2$ be the corresponding price subvectors and $B_1$ and $B_2$ the corresponding
submatrices of $B$, so that
$$B = \begin{bmatrix} B_1 & B_2 \end{bmatrix}.$$
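Then $q$ is optimal for all $e$ satisfying
$$-p_1 - B_1^\top (B_1 q^1 - e) = 0 \qquad (3)$$
$$-p_2 - B_2^\top (B_1 q^1 - e) \le 0 \qquad (4)$$
$$q^1 \ge 0 \qquad (5)$$
The first equation says that $-p_1 = B_1^\top B_1 q^1 - B_1^\top e$, which, if $B_1$ is nonsingular, means
$$-(B_1^\top)^{-1} p_1 = B_1 q^1 - e.$$
We can substitute this into inequality (4) to obtain
$$-p_2 + B_2^\top (B_1^\top)^{-1} p_1 \le 0. \qquad (6)$$
It is then optimal to choose bundle $q$ if
$$p_2 - B_2^\top (B_1^\top)^{-1} p_1 \ge 0 \qquad (7)$$
and
$$q^1 = (B_1^\top B_1)^{-1}(B_1^\top e - p_1) \ge 0. \qquad (8)$$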
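Conditions (7) and (8) can be checked directly for a candidate $K$-good basket. A minimal sketch, with a hypothetical helper `full_bundle_conditions` and a made-up example; note that when $B_1$ is square and nonsingular, condition (7) involves prices only, while $e$ enters only through (8).

```python
import numpy as np

def full_bundle_conditions(B, e, p, idx):
    """Check conditions (7) and (8) for the K-good basket listed in idx.

    Returns q1 = (B1'B1)^{-1}(B1'e - p1) and True when both
    p2 - B2'(B1')^{-1} p1 >= 0 and q1 >= 0 hold.
    """
    J = B.shape[1]
    idx = list(idx)
    rest = [j for j in range(J) if j not in idx]
    B1, B2 = B[:, idx], B[:, rest]
    p1, p2 = p[idx], p[rest]
    q1 = np.linalg.solve(B1.T @ B1, B1.T @ e - p1)    # condition (8)
    slack = p2 - B2.T @ np.linalg.solve(B1.T, p1)      # condition (7): prices only
    return q1, bool(np.all(slack >= 0) and np.all(q1 >= 0))

B = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])    # hypothetical K = 2, J = 3 example
e = np.array([2.0, 1.0])
print(full_bundle_conditions(B, e, np.array([0.5, 0.5, 2.0]), [0, 1]))  # q1 = [1.5, 0.5], True
print(full_bundle_conditions(B, e, np.array([0.5, 0.5, 0.8]), [0, 1]))  # (7) fails: False
```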
5
Identification
In this section we discuss conditions sufficient to ensure that $B$ is point identified. We also discuss identification of the distribution of $e$. Suppose that, for some $e$, $B^\top e \le 0$. Then it follows immediately from the
first order conditions (2) that $q = 0$ regardless of what value $p$ takes on, since at $q = 0$ the multipliers $\lambda = p - B^\top e$ are nonnegative for any $p \ge 0$. It therefore follows
that nothing can be identified regarding the distribution of $e$ for $e \in \{e \mid B^\top e \le 0\}$, other than the
probability of lying in this set. So the best that can be established (which we provide below) is point
identification of the distribution of $e$ over all values $e$ that are not in this set.
ASSUMPTION A1: Consumers buy the minimum number of different goods necessary to maximize
utility given by equation (1). Assume that p is continuously distributed on the positive orthant with a
density that is strictly positive almost everywhere on the positive orthant. Assume that the distribution of
q given p is known.
Buying the minimum number of goods is essentially a tie breaker for knife edge situations where utility
can be maximized in more than one way. Given the assumed continuity of prices, these knife edges occur
with probability zero. The distribution of q in a population facing prices p is in principle observable, so
Assumption A1 essentially says that, for proving identification, this distribution is assumed to be known
for any value of p.
ASSUMPTION A2: The $K \times J$ matrix $B$ has rank $K$. For every column $B_j$ of $B$, there exists a $K \times (K-1)$ matrix $B_{-j}$
consisting of $K-1$ columns of $B$ such that $\tilde B_j = \begin{bmatrix} B_j & B_{-j} \end{bmatrix}$ is nonsingular.
Without loss of generality, B has the scale and rotation normalizations described in Appendix A.
Assumption A2 will ensure that for every good $j$, there exists a set of $K$ goods including good $j$
such that some consumers choose to buy a bundle consisting of those $K$ goods. Identification of
the $j$’th column of $B$ is then assured using constructions like that of the previous section based on those
consumers, with nonsingularity of $\tilde B_j$ in Assumption A2 taking the place of nonsingularity of $B_1$ in the
previous section.
For any $K \times K$ matrix $A$ such that $A^\top A = I$, our utility function is observationally equivalent to a
utility function that replaces $B$ and $e$ with $AB$ and $Ae$, because $(Ae - ABq)^\top (Ae - ABq) = (e - Bq)^\top A^\top A (e - Bq) = (e - Bq)^\top (e - Bq)$. More generally, $B$ can only be identified up to a set
of scale and rotation normalizations. These are the normalizations that are described in detail in Appendix
A.
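A quick numerical check of this observational equivalence, using a random orthogonal $A$ from a QR decomposition; all matrices here are randomly generated for illustration. Since prices enter utility only through $p^\top q$, an unchanged satiation term means unchanged demands, and $B^\top B$, the object identified below, is unchanged as well.

```python
import numpy as np

def satiation_term(B, e, q):
    """The quadratic term 0.5 * (e - Bq)'(e - Bq) in the utility function."""
    r = e - B @ q
    return 0.5 * r @ r

rng = np.random.default_rng(0)
K, J = 3, 6
B = rng.normal(size=(K, J))     # hypothetical attribute matrix
e = rng.normal(size=K)          # hypothetical preference draw
q = rng.uniform(size=J)         # arbitrary nonnegative bundle

A, _ = np.linalg.qr(rng.normal(size=(K, K)))   # random orthogonal A, so A'A = I

print(np.isclose(satiation_term(B, e, q), satiation_term(A @ B, A @ e, q)))  # True
print(np.allclose((A @ B).T @ (A @ B), B.T @ B))                              # True
```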
ASSUMPTION A3: Let $f_e$ denote the density function of $e$. The density $f_e$ is strictly positive on the
set $E = \{e \mid B^\top e \not\le 0\}$. $e$ is distributed independently of $p$.
As discussed above, for $e \notin E$, it is not possible to learn anything about $f_e(e)$ other than the total
probability of not lying in the set $E$, so we focus on identification of $f_e(e)$ inside the set $E$.
Let $\mathcal{B}$ be the set of unique combinations of $K$ different goods chosen from the $J$ available goods, and
let $R = \binom{J}{K}$ be the total number of elements of $\mathcal{B}$. Let $r \in \{1, \ldots, R\}$ index each possible element of $\mathcal{B}$. Let
$i_r = (i_{1r}, \ldots, i_{Kr})$ be an element of $\mathcal{B}$ and let $q_r$ be a vector of quantities satisfying $q_r(i) = 0$ if $i \notin i_r$. We
call $q_r$ a $K$ dimensional basket or bundle corresponding to the list $i_r$. So for a given basket $q_r$, $i_r$ indexes
the nonzero elements of $q_r$. Let $p_r$ be the $K$ vector of prices of the goods in $q_r$, and let $p_{-r}$ be the $(J - K)$
vector of prices of all the other goods. Let $B_r = B(\cdot, i_r)$ be the submatrix of $B$ corresponding to these
nonzero elements. Let $\tilde{\mathcal{R}}$ denote the smallest set of bundles such that $B_r$ is nonsingular for all $r \in \tilde{\mathcal{R}}$
and every column $B_j$ is a column in $B_r$ for some $r$. The set $\tilde{\mathcal{R}}$ has at least $J/K$ elements and no more than $J - K + 1$
elements. By Assumption A2, for every good $j$ the column $B_j$ lies in some nonsingular $B_r$.
LEMMA 1: For every $r \in \tilde{\mathcal{R}}$, there is a set $\mathcal{A} \subset P \times Y$ and a set $Q_r = \{q_r \in \mathbb{R}^J : q_r(i) = 0 \text{ if } i \notin i_r\}$
such that $\Pr(Q_r \mid \mathcal{A}) > 0$.
PROOF: Consider $q_r \in Q_r$. It is optimal to choose $q_r$ when inequalities (7) and (8) are satisfied for
$q = q_r$, that is, when $p_{-r} - B_{-r}^\top (B_r^\top)^{-1} p_r \ge 0$ and $q_r = (B_r^\top B_r)^{-1}(B_r^\top e - p_r) \ge 0$. Assumptions A2
and A3 ensure that this event has positive probability.
Given Lemma 1, we can now establish identification of $B$. For each good $j$, there is a subset $\mathcal{B}_r$ of
$K$ goods as described above that includes good $j$. For this set of goods let $p_r$ be sufficiently low, and let
$p_{-r}$ be sufficiently high, to yield a positive probability of observing bundles $q_r$ in which $q_r(i) > 0$ for all
$i \in \mathcal{B}_r$. Then $q_r > 0$ for all $(p', y)$ with $p' = (p_r', p_{-r}')$, where $p_r' \le p_r$ and $p_{-r}' \ge p_{-r}$.
Let $B_r$ be the $K \times K$ submatrix consisting of the columns of $B$ corresponding to the set $\mathcal{B}_r$ of
these $K$ goods, and let $p_r$ and $q_r$ denote $K$ vectors of prices and quantities of those $K$ goods. By the
first order conditions, a consumer buying $q_r$ has $B_r^\top B_r q_r = B_r^\top e - p_r$. By Assumption A2, $B_r^\top B_r$ is
nonsingular. The demand functions for these $K$ goods for the consumers in this region are therefore
$q_r = (B_r^\top B_r)^{-1}(B_r^\top e - p_r)$. Since the distribution of $e$ does not depend on $p_r$, the derivative with
respect to prices $p_r$ of the conditional mean (or any conditional quantile) of $q_r$ conditioning on $p$ (which
can be calculated at any point that is not on the boundary of the region) is $-(B_r^\top B_r)^{-1}$, which identifies
$B_r^\top B_r$.
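A minimal sketch of the derivative argument for a single consumer whose optimal basket is the $K$ goods in $r$: because $q_r$ is linear in $p_r$, a finite-difference Jacobian equals $-(B_r^\top B_r)^{-1}$ up to rounding, and inverting it recovers $B_r^\top B_r$. The block $B_r$, the preferences $e$ and the prices are randomly generated stand-ins, and `bundle_demand` is our own helper; with real data the Jacobian would come from the conditional-mean (or quantile) regression described above rather than from a known demand function.

```python
import numpy as np

def bundle_demand(Br, e, pr):
    """Demand for the K goods of bundle r when all K are bought: (Br'Br)^{-1}(Br'e - pr)."""
    return np.linalg.solve(Br.T @ Br, Br.T @ e - pr)

rng = np.random.default_rng(1)
K = 3
Br = rng.normal(size=(K, K)) + 3.0 * np.eye(K)   # hypothetical nonsingular K x K block of B
e = rng.normal(size=K)
pr = np.full(K, 0.1)

# Finite-difference Jacobian of demand with respect to the own prices p_r
h = 1e-3
base = bundle_demand(Br, e, pr)
jac = np.column_stack([(bundle_demand(Br, e, pr + h * np.eye(K)[k]) - base) / h
                       for k in range(K)])
A_r = -np.linalg.inv(jac)                 # Jacobian is -(Br'Br)^{-1}, so this recovers Br'Br
print(np.allclose(A_r, Br.T @ Br, atol=1e-6))   # True
```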
By Assumption A2, each good $j$ appears in some bundle $r$ for which the above derivation can be
performed and $B_r^\top B_r$ can be identified, so all of the columns of $B$ are recoverable up to normalizations
from the collection of estimates of $B_r^\top B_r$. At most $J - K + 1$ such bundles $r$ would be required (the first bundle covers $K$ goods and each additional bundle must add at least one new good, so that each
good $j$ appears in at least one such bundle), and as few as $J/K$ such bundles might be needed.
For each $r$, we identify
$$A_r = B_r^\top B_r.$$
In addition, these matrices share common elements. So, we can pick one bundle $r$ and define
$$A_r = D_r C_r C_r^\top D_r,$$
where $D_r$ is a positive diagonal matrix and $C_r$ is the (lower-triangular) Cholesky factor of a correlation matrix. We
can then define
$$B_r = C_r^\top D_r.$$
This provides the rotation and scale normalizations up to which $B$ will be identified. Given $B_r = C_r^\top D_r$,
the remaining columns of $B$ are identified by sequentially dropping the last column of $B_r$ and replacing
it with each remaining column of $B$. The elements of column $j$ for $j \notin \mathcal{B}_r$ satisfy $\sum_i B_j(i)^2 = d_j^2$ for
some $d_j > 0$.
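The normalization can be sketched in a few lines: take $D_r$ from the square roots of the diagonal of $A_r$, factor the implied correlation matrix by Cholesky, and set $B_r = C_r^\top D_r$. With NumPy's lower-triangular Cholesky factor this yields an upper triangular $B_r$ whose column $j$ has norm $d_j$, matching the parameterization in Appendix A. The function name and the test matrix are hypothetical.

```python
import numpy as np

def normalize_bundle_block(A_r):
    """Return B_r = C_r' D_r with A_r = D_r C_r C_r' D_r.

    D_r = diag(sqrt(diag(A_r))) holds the scales d_j, and C_r is the
    lower-triangular Cholesky factor of the correlation matrix D_r^{-1} A_r D_r^{-1}.
    """
    d = np.sqrt(np.diag(A_r))
    corr = A_r / np.outer(d, d)
    C = np.linalg.cholesky(corr)        # corr = C @ C.T, C lower triangular
    return C.T @ np.diag(d)             # upper triangular, column j has norm d_j

# Hypothetical identified A_r (any symmetric positive definite matrix will do).
rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3)) + 2.0 * np.eye(3)
A_r = M.T @ M
B_r = normalize_bundle_block(A_r)
print(np.allclose(B_r.T @ B_r, A_r), np.allclose(B_r, np.triu(B_r)))  # True True
```

Any other $B$ with $B^\top B = A_r$ equals $Q B_r$ for some orthogonal $Q$, which is exactly the rotation that the normalization pins down.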
Having now shown identification of $B$, consider the distribution of $e$. Given $B_r$ for all possible bundles
$r$, we can observe $B_r^\top e = B_r^\top B_r q_r + p_r$ for all observable $(q_r, p_r)$ pairs. Since $q_r$ and $p_r$ are nonnegative,
we can uncover observations of $B_r^\top e$, and hence of $e$, for all $e \in E$, thereby identifying $f_e(e)$ for all $e \in E$.
Note that if e is finitely parameterized, then in general those parameters will be identified given nonparametric identification of f e .e/ over the region E. In our empirical application we will take this approach to identification, by assuming e is normally distributed.
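Given $B_r$, the inversion from an observed $K$-good basket to $e$ is a single linear solve. Below is a hedged sketch that also evaluates the corresponding log-likelihood contribution under the normality assumption just mentioned, using the change-of-variables form of equation (12) in Appendix B, with $\ln\lvert\det B_r\rvert$ as the Jacobian term. The names `recover_e`, `loglik_full_bundle`, `mu` and `Sigma` are our own, the numbers are made up, and the contribution applies only to baskets that also satisfy condition (7).

```python
import numpy as np

def recover_e(Br, qr, pr):
    """Invert the first order conditions Br'e = Br'Br qr + pr for e (Br is K x K, nonsingular)."""
    return np.linalg.solve(Br.T, Br.T @ Br @ qr + pr)

def loglik_full_bundle(Br, qr, pr, mu, Sigma):
    """ln f_e(e) + ln|det Br| for e ~ N(mu, Sigma), where e is recovered from (qr, pr)."""
    e = recover_e(Br, qr, pr)
    diff = e - mu
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    log_fe = -0.5 * (len(e) * np.log(2.0 * np.pi) + logdet_sigma + quad)
    _, logabsdet_b = np.linalg.slogdet(Br)     # Jacobian of the map qr -> e is Br
    return log_fe + logabsdet_b

# Round trip with hypothetical values: simulate demand from e0, then recover e0.
Br = np.array([[2.0, 0.5], [0.0, 1.0]])
pr = np.array([0.1, 0.2])
e0 = np.array([1.0, 1.0])
qr = np.linalg.solve(Br.T @ Br, Br.T @ e0 - pr)   # about [0.269, 0.825], both positive
print(np.allclose(recover_e(Br, qr, pr), e0))      # True
```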
6
Empirical Application
We use data from the Kantar (formerly TNS) World Panel for the UK for calendar year 2008 on all
purchases of food brought into the home by 16,637 households. Households record purchases of all items
bought using handheld scanners and record prices from till receipts. The data contain a large set of product
attributes (at the barcode level) as well as household characteristics.1 We use data on all purchases of fruits
and vegetables. There are 101 categories of fruits and vegetables including for example apricots, bananas,
lettuce, apples, courgettes. Table 1 shows the top ten most frequently purchased categories. The top three
categories are bananas (8.55% of purchases), tomatoes (7.79%) and dessert apples (5.97%). Table 2 shows
how frequently households purchased baskets containing different numbers of items. 70% of the time,
households purchased more than one item, and 90% of the time they purchased 7 or fewer items. On
nearly 10% of shopping trips, households purchased more than 7 items. A discrete choice model cannot
capture this type of variation.
Table 3 shows the most frequently purchased 2-item combinations. 26% of these combinations include
purchase of a banana. However, the second good purchased varies significantly. The table only shows
30% of the 2-item combinations purchased. Many other 2-item combinations with smaller market shares
are also purchased. None has a large share but together they account for a large share of 2-item baskets.
Our model can account for this variety of choices.
Table 4 shows, for selected categories and conditional on purchase of an item in that category, how frequently each number of items was purchased. In all cases, 6 or more items are purchased more than 41%
of the time. There is no obvious pattern.
1 See Leicester and Oldfield (2009) for further information on the data.
For every purchase occasion, we observe the expenditure and price of all items purchased. However,
for items not purchased the price is not observed. We impute prices using a hedonic regression. For each
category we estimate a hedonic price model
$$\ln p_{it} = x_{it}\beta + h(t) + \varepsilon_{it},$$
where $\ln p_{it}$ is the log price of item $i$ in period $t$, $x_{it}$ is a vector of characteristics of item $i$ in period $t$ with coefficient vector $\beta$, and $h(t)$
is a 6th order polynomial in time. Time is measured as the day within the year. Characteristics included are
country of origin, branded, organic, tiering, fascia, and packaging. Figures 1-3 show the price estimates
for apricots, bananas and cherries. Each figure shows a scatter plot of log price and the predicted log
price. For apricots and cherries, prices rise in the spring and the autumn, periods when fresh
apricots and cherries are scarcer and more costly. In contrast, the price of bananas is relatively flat.
The pictures also make clear that at a single point in time there is a great deal of variability in price. This
variation is primarily due to quality variation (tiering and fascia) and to promotions.
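A minimal sketch of this kind of hedonic imputation, fitted by ordinary least squares on synthetic data. The two 0/1 characteristics stand in for the categorical attributes listed above (which in practice would enter as sets of dummies), and rescaling the day of the year to $t = \text{day}/365$ is our own choice to keep the sixth-order polynomial numerically well conditioned; none of the numbers come from the Kantar data.

```python
import numpy as np

def hedonic_design(X, day, degree=6):
    """Stack item characteristics X with 1, t, ..., t**degree, where t = day / 365."""
    t = np.asarray(day, dtype=float) / 365.0   # rescaled so high powers stay well conditioned
    poly = np.vander(t, degree + 1, increasing=True)
    return np.hstack([X, poly])

def fit_hedonic(log_price, X, day, degree=6):
    """OLS estimates of beta and of the polynomial coefficients of h(t)."""
    Z = hedonic_design(X, day, degree)
    coef, *_ = np.linalg.lstsq(Z, log_price, rcond=None)
    k = X.shape[1]
    return coef[:k], coef[k:]

def impute_log_price(beta, gamma, X, day):
    """Predicted log price, e.g. for items that were not purchased and so have no observed price."""
    return hedonic_design(X, day, len(gamma) - 1) @ np.concatenate([beta, gamma])

# Synthetic data only: two hypothetical 0/1 characteristics (say, organic and branded).
rng = np.random.default_rng(0)
n = 2000
X = rng.integers(0, 2, size=(n, 2)).astype(float)
day = rng.integers(1, 366, size=n)
t = day / 365.0
log_price = (0.3 * X[:, 0] + 0.1 * X[:, 1]
             + 0.5 + 0.3 * t - 0.4 * t**2 + rng.normal(0.0, 0.05, size=n))

beta_hat, gamma_hat = fit_hedonic(log_price, X, day)
print(beta_hat)  # close to the true values [0.3, 0.1]
```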
Using these data for 101 categories, we estimated the parameters of the model. Preliminary results are
displayed in Figures 4-5. These figures show, for categories 1-10 and 11-20, histograms of the estimated
elasticities as well as the market shares for each of these 20 categories. Two points are clear. First, for each
category, the elasticities vary significantly across consumers. Much of this variation is due to variation across
consumers in the number of items purchased: the elasticity is very different for someone who chooses to
buy 2 items than for someone who chooses to buy 6 items. Second, the elasticities vary significantly
across categories.
These results are preliminary at this stage. We are currently doing more work to ensure that the results
are robust and to compute standard errors.
A
Appendix A: Parameterization of B
As discussed regarding Assumption A2, the matrix $B$ is identified up to an arbitrary set of rotation
and scale normalizations. These normalizations can be imposed as follows. Define the partition $B = \begin{bmatrix} B_1 & B_2 \end{bmatrix}$,
where $B_1$ is $K \times K$ and $B_2$ is $K \times (J - K)$. Parameterize $B_1$ and $B_2$ as follows.
$$B_1 = \begin{bmatrix}
d_1 & d_2 c_{21} & d_3 c_{31} & \cdots & d_K c_{K1} \\
0 & d_2 c_{22} & d_3 c_{32} & \cdots & d_K c_{K2} \\
0 & 0 & d_3 c_{33} & \cdots & d_K c_{K3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & d_K c_{KK}
\end{bmatrix}, \qquad d_j > 0,$$
where for all $j \le K$, $\sum_{r=1}^{j} c_{jr}^2 = 1$, and for all $r > j$, $c_{jr} = 0$. Thus, $B_1$ is an upper triangular
matrix. For each $j$, the parameter $d_j$ captures the productivity of product $j$ and the parameters $\{c_{jr}\}_{r=1}^{j}$
capture complementarities.
For column $j$ of $B_2$ we parameterize
$$B_2(j) = \begin{bmatrix} d_j c_{j1} \\ d_j c_{j2} \\ \vdots \\ d_j c_{jK} \end{bmatrix}$$
with $d_j > 0$ and $\sum_{r=1}^{K} c_{jr}^2 = 1$.
The matrix $B$ has $J + \frac{K(K-1)}{2} + (J - K)(K - 1)$ parameters. There is one parameter $d_j$ for each
column $j$, plus $\frac{K(K-1)}{2}$ free parameters $\{c_{jr}\}_{j,r}$ in $B_1$ and $(J - K)(K - 1)$ free parameters $\{c_{jr}\}_{j,r}$ in $B_2$.
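The count can be cross-checked: $B$ has $KJ$ entries, and the orthogonal transformations $A$ with $A^\top A = I$ form a group of dimension $K(K-1)/2$, so $KJ - K(K-1)/2$ parameters should remain, which agrees with the formula above. A short Python check; the helper names and the values of $J$ and $K$ are illustrative only.

```python
def n_free_params(J: int, K: int) -> int:
    """J scales d_j, K(K-1)/2 free c's in B1, and (J-K)(K-1) free c's in B2."""
    return J + K * (K - 1) // 2 + (J - K) * (K - 1)

def n_params_modulo_rotation(J: int, K: int) -> int:
    """K*J raw entries of B minus the K(K-1)/2 dimensions of the orthogonal group O(K)."""
    return K * J - K * (K - 1) // 2

print(n_free_params(101, 3))  # 300
assert all(n_free_params(J, K) == n_params_modulo_rotation(J, K)
           for J in range(2, 60) for K in range(1, J))
```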
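We can store the $c_{jr}$ parameters in a matrix as follows:
$$C = \begin{bmatrix}
1 & c_{21} & \cdots & c_{K1} \\
0 & \sqrt{1 - c_{21}^2} & \cdots & c_{K2} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sqrt{1 - \sum_{j=1}^{K-1} c_{Kj}^2}
\end{bmatrix}$$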
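For $j \le K$, the elements of column $j$ of $B$ can be written using hyper-spherical coordinates:
$$\begin{aligned}
B(1, j) &= d_j \cos\theta_1 \\
B(2, j) &= d_j \sin\theta_1 \cos\theta_2 \\
B(3, j) &= d_j \sin\theta_1 \sin\theta_2 \cos\theta_3 \\
&\;\;\vdots \\
B(j-1, j) &= d_j \sin\theta_1 \cdots \sin\theta_{j-2} \cos\theta_{j-1} \\
B(j, j) &= d_j \sin\theta_1 \cdots \sin\theta_{j-2} \sin\theta_{j-1}
\end{aligned} \qquad (9)$$
with $d_j > 0$, $\theta_k \in [0, \pi]$ for $k < j - 1$ and $\theta_{j-1} \in [0, 2\pi)$. See http://en.wikipedia.org/wiki/Hypersphere#Hyperspher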
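The map in (9) is straightforward to code. The sketch below (our own helper `column_from_angles`) builds a column of length $K$ whose first $m$ entries follow the hyper-spherical formulas and whose remaining entries are zero; by construction its Euclidean norm is $d_j$, which is what enforces $\sum_r c_{jr}^2 = 1$. The same function can produce a full column of $B_2$ by passing $K - 1$ angles.

```python
import numpy as np

def column_from_angles(d, thetas, K):
    """Length-K column with norm d built from angles theta_1..theta_{m-1}, as in (9).

    With m = len(thetas) + 1, entries 1..m follow the hyper-spherical map and
    entries m+1..K are zero (m = j for column j of B1, m = K for a column of B2).
    """
    m = len(thetas) + 1
    col = np.zeros(K)
    sin_prod = 1.0                  # running product sin(theta_1) ... sin(theta_{k-1})
    for k, theta in enumerate(thetas):
        col[k] = d * sin_prod * np.cos(theta)
        sin_prod *= np.sin(theta)
    col[m - 1] = d * sin_prod
    return col

col = column_from_angles(2.0, [0.4, 1.1], K=4)   # column 3 of a hypothetical B1 with K = 4
print(col, np.linalg.norm(col))                   # last entry is 0; norm equals d = 2.0 up to rounding
```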
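Consider column $j$ of $B_2$ and define $\tilde B_1$ by replacing column $K$ of $B_1$ with column $j$ of $B_2$. Then
define
$$\tilde A = \begin{bmatrix} B_{11} & B_{21} \end{bmatrix}^\top \begin{bmatrix} B_{11} & B_{21} \end{bmatrix}, \qquad (10)$$
where $B_{11}$ denotes the first $K - 1$ columns of $B_1$ and $B_{21}$ denotes column $j$ of $B_2$, so that $\tilde A = \tilde B_1^\top \tilde B_1$.
$\tilde A$ is identified from people who purchase items $1$ through $K - 1$ and item $K + j$. The first $K - 1$ columns
of $\tilde B_1$ are constrained to equal those in $B_1$. The final column is constrained by (10). Therefore, we can
parameterize column $j$ of $B_2$ as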
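$$\begin{aligned}
B_2(1, j) &= d_j \cos\theta_1 \\
B_2(2, j) &= d_j \sin\theta_1 \cos\theta_2 \\
B_2(3, j) &= d_j \sin\theta_1 \sin\theta_2 \cos\theta_3 \\
&\;\;\vdots \\
B_2(K-1, j) &= d_j \sin\theta_1 \cdots \sin\theta_{K-2} \cos\theta_{K-1} \\
B_2(K, j) &= d_j \sin\theta_1 \cdots \sin\theta_{K-2} \sin\theta_{K-1}.
\end{aligned} \qquad (11)$$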
Appendix B: Estimation details
The data consist of independent observations of $(y_i, p_i, q_i)$ for each household $i$, with $p_i \in \mathbb{R}_+^J$ and $q_i \in \mathbb{R}_+^J$.
We drop the $i$ subscript for ease of exposition. We write down the likelihood function for each of three
cases. Case 1 is when a consumer purchases exactly $K$ goods. In this case, the mapping from data to
random utility errors is one-to-one. Case 2 is when a consumer chooses fewer than $K$ items but more
than zero. In this case, many values of the random utility errors are consistent with the observed choice,
and the likelihood function is the integral over a region of the random utility error space. Case 3
is when a consumer chooses to purchase nothing. This case is similar to Case 2, but the integral must
be computed over a different region.
B.1 Case 1: Choice of K goods
Suppose the goods are sorted so that $q = (q_1, 0)$, and let $p = (p_1, p_2)$ be the corresponding partition of the price vector. That is, the first $K$ elements of $q$ are positive and the remaining $J - K$ elements are 0. Then inverse demand is
$$e = (B_1^{T})^{-1}\left(p_1 + B_1^{T}B_1 q_1\right) = (B_1^{T})^{-1}p_1 + B_1 q_1,$$
with
$$p_2 \ge B_2^{T}(B_1^{T})^{-1}p_1,$$
and the log-likelihood is
$$\ln f_q(q_1) = \ln f_e\left((B_1^{T})^{-1}p_1 + B_1 q_1\right) + \ln\lvert\det(B_1)\rvert \qquad (12)$$
where $f_q$ is the density of $q_1$, $f_e$ is the density of $e$, and $B_1$ is the submatrix of $B$ corresponding to the items in the vector $q_1$. Note that parameter values must satisfy the constraints $p_2 \ge B_2^{T}(B_1^{T})^{-1}p_1$.
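A sketch of the Case 1 log-likelihood (12). The model leaves the distribution of $e$ unrestricted; purely for illustration, this code assumes $e \sim N(0, I_K)$, and the function name and example values are hypothetical. It returns $-\infty$ when the corner conditions on $p_2$ fail and uses `numpy.linalg.slogdet` for the Jacobian term.

```python
import numpy as np

def case1_loglik(q1, p1, p2, B1, B2):
    """Log-likelihood (12) for a consumer who buys exactly K goods.

    For illustration only, e is assumed to be N(0, I_K).  Returns -inf if the
    corner conditions p2 >= B2^T (B1^T)^{-1} p1 are violated."""
    shadow = np.linalg.solve(B1.T, p1)          # (B1^T)^{-1} p1
    if np.any(p2 < B2.T @ shadow):
        return -np.inf
    e = shadow + B1 @ q1                        # inverse demand
    log_fe = -0.5 * (e @ e) - 0.5 * e.size * np.log(2.0 * np.pi)
    _, logabsdet = np.linalg.slogdet(B1)        # Jacobian term ln|det(B1)|
    return log_fe + logabsdet

# Hypothetical example: K = 2 purchased goods, J = 3 goods in total
B1 = np.array([[1.0, 0.3], [0.0, 0.8]])
B2 = np.array([[0.2], [0.1]])
q1, p1, p2 = np.array([0.5, 0.4]), np.array([1.0, 0.6]), np.array([2.0])
print(case1_loglik(q1, p1, p2, B1, B2))
print(case1_loglik(q1, p1, np.array([0.1]), B1, B2))   # -inf: corner condition violated
```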
B.2 Case 2: Choice of fewer than K goods
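Suppose a household chooses $q = (q_1, 0)$ with $q_1 > 0$ and $\dim(q_1) = d_1 < K$. In this case, for each $q_1$ there are multiple vectors $e$ that solve (??). In fact, there is a linear space of dimension $K - d_1$. In other words, for each $(q_1, e_2) \in \mathbb{R}^{d_1} \times \mathbb{R}^{K-d_1}$, there is a unique $e_1$ defined by
$$\begin{aligned}
e_1 &= (B_{11}^{T})^{-1}p_1 + \left(B_{11} + (B_{11}^{T})^{-1}B_{12}^{T}B_{12}\right)q_1 - (B_{11}^{T})^{-1}B_{12}^{T}e_2 \\
&= G_0 + G_1 q_1 + G_2 e_2
\end{aligned} \qquad (13)$$
where
$$\begin{aligned}
G_0 &= (B_{11}^{T})^{-1}p_1 \\
G_1 &= B_{11} + (B_{11}^{T})^{-1}B_{12}^{T}B_{12} \\
G_2 &= -(B_{11}^{T})^{-1}B_{12}^{T}.
\end{aligned} \qquad (14)$$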
Here we have assumed that $B_{11}$ is invertible. Since $B_1$ has rank $d_1$ by assumption, we can always find a partition with an invertible $B_{11}$.
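The algebra in (13) and (14) can be checked numerically. Reading $B_{11}$ as the first $d_1$ rows of the $K \times d_1$ block $B_1$ and $B_{12}$ as the remaining rows (an assumption about the partition), the vector $e = (G_0 + G_1 q_1 + G_2 e_2,\, e_2)$ should satisfy $B_1^{T}e = p_1 + B_1^{T}B_1 q_1$, the Case 1 inverse-demand relation with $B_1$ now $K \times d_1$, for any value of $e_2$. The function name and numbers are hypothetical.

```python
import numpy as np

def case2_G(B1, p1, d1):
    """G0, G1, G2 from (14).  B1 is the K x d1 block of B for the purchased
    goods; B11 (its first d1 rows) is assumed invertible."""
    B11, B12 = B1[:d1, :], B1[d1:, :]
    B11T_inv = np.linalg.inv(B11.T)
    G0 = B11T_inv @ p1
    G1 = B11 + B11T_inv @ B12.T @ B12
    G2 = -B11T_inv @ B12.T
    return G0, G1, G2

# Hypothetical example with K = 3 and d1 = 2 purchased goods
B1 = np.array([[1.0, 0.4], [0.2, 0.9], [0.5, -0.3]])
p1 = np.array([1.0, 0.7])
q1 = np.array([0.4, 0.9])
e2 = np.array([0.3])                    # any value of the unobserved errors
G0, G1, G2 = case2_G(B1, p1, 2)
e = np.concatenate([G0 + G1 @ q1 + G2 @ e2, e2])
print(np.allclose(B1.T @ e, p1 + B1.T @ B1 @ q1))   # True
```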
Consider the partially observed random vector $(q_1, e_2)$: $q_1$ is observed but $e_2$ is not. The expressions above imply that the density of $(q_1, e_2)$ is
$$f_{q_1 e_2}(q_1, e_2) = f_e\left(G_0 + G_1 q_1 + G_2 e_2,\, e_2\right)\lvert\det(G_1)\rvert$$
where $(G_0, G_1, G_2)$ are defined in (14).
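We observe $q_1$ if inequality (??) is satisfied. Partitioning $B_2$ ($K \times (J - d_1)$) as
$$B_2 = \begin{pmatrix} B_{21} \\ B_{22} \end{pmatrix},$$
where $B_{21}$ is $d_1 \times (J - d_1)$ and $B_{22}$ is $(K - d_1) \times (J - d_1)$, this inequality is
$$\left(B_{22}^{T} - B_{21}^{T}(B_{11}^{T})^{-1}B_{12}^{T}\right)e_2 \le p_2 - B_{21}^{T}(B_{11}^{T})^{-1}p_1 + \left(B_{22}^{T} - B_{21}^{T}(B_{11}^{T})^{-1}B_{12}^{T}\right)B_{12}q_1. \qquad (15)$$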
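A numerical check of (15) under the same hypothetical setup. It assumes the inequality being rewritten is the non-purchase condition $B_2^{T}(e - B_1 q_1) \le p_2$, which reduces to the Case 1 constraint when $e - B_1 q_1 = (B_1^{T})^{-1}p_1$. The slack in that condition should equal the slack in (15) for any $e_2$; the coefficient matrix and right-hand side are named `M1` and `M2` to match the text, and the function name and values are hypothetical.

```python
import numpy as np

def case2_inequality(B1, B2, p1, p2, q1, d1):
    """Return (M1, M2) with inequality (15) written as M1 @ e2 <= M2.

    B1 is K x d1 (purchased goods) and B2 is K x (J - d1) (other goods);
    B11 = B1[:d1] is assumed invertible."""
    B11, B12 = B1[:d1, :], B1[d1:, :]
    B21, B22 = B2[:d1, :], B2[d1:, :]
    H = B21.T @ np.linalg.inv(B11.T)            # B21^T (B11^T)^{-1}
    M1 = B22.T - H @ B12.T
    M2 = p2 - H @ p1 + M1 @ B12 @ q1
    return M1, M2

# Hypothetical example with K = 3, d1 = 2, J = 4
B1 = np.array([[1.0, 0.4], [0.2, 0.9], [0.5, -0.3]])
B2 = np.array([[0.3, -0.1], [0.6, 0.2], [0.1, 0.8]])
p1, p2 = np.array([1.0, 0.7]), np.array([1.5, 0.9])
q1, e2 = np.array([0.4, 0.9]), np.array([0.3])
d1 = 2

# e1 from (13): B11^T e1 = p1 + B1^T B1 q1 - B12^T e2
e1 = np.linalg.solve(B1[:d1].T, p1 + B1.T @ B1 @ q1 - B1[d1:].T @ e2)
e = np.concatenate([e1, e2])

M1, M2 = case2_inequality(B1, B2, p1, p2, q1, d1)
slack_direct = p2 - B2.T @ (e - B1 @ q1)     # slack in B2^T (e - B1 q1) <= p2
slack_15 = M2 - M1 @ e2                      # slack in (15)
print(np.allclose(slack_direct, slack_15))   # True
```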
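Rewrite (15) as
$$M_1 e_2 \le M_2 \qquad (16)$$
where
$$M_1 = B_{22}^{T} - B_{21}^{T}(B_{11}^{T})^{-1}B_{12}^{T}$$
is a $(J - d_1) \times (K - d_1)$ matrix and
$$M_2 = p_2 - B_{21}^{T}(B_{11}^{T})^{-1}p_1 + \left(B_{22}^{T} - B_{21}^{T}(B_{11}^{T})^{-1}B_{12}^{T}\right)B_{12}q_1$$
is $(J - d_1) \times 1$. Then the likelihood is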
$$f_{q_1}(q_1) = \int f_{q_1 e_2}(q_1, e_2)\,\mathbf{1}(M_1 e_2 \le M_2)\,de_2. \qquad (17)$$
Let $d_2 = K - d_1$ and suppose the density of $e$ has the form
$$f_e(e_1, e_2) = \tilde{f}_e(e_1, e_2)\,\frac{e^{-0.5 e_2^{T} e_2}}{(2\pi)^{d_2/2}}.$$
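The matrix $M_1$ has the decomposition
$$M_1 = RQ$$
(obtainable from the QR decomposition of $M_1^{T}$), where $R$ is $(J - d_1) \times d_2$ lower triangular and $Q$ is $d_2 \times d_2$ orthogonal. Then, writing $D \equiv M_2$ and using the change of variable $e_2 = Q^{-1}x$, the integral can be written as
$$\begin{aligned}
f_{q_1}(q_1) &= \int_{RQe_2 \le D} \tilde{f}_{q_1 e_2}(q_1, e_2)\,\frac{e^{-0.5 e_2^{T} e_2}}{(2\pi)^{d_2/2}}\,de_2 && (18)\\
&= \int_{Rx \le D} \tilde{f}_{q_1 e_2}\left(q_1, Q^{-1}x\right)\frac{e^{-0.5 x^{T} x}}{(2\pi)^{d_2/2}}\,dx && (19)
\end{aligned}$$
since $Q$ is an orthogonal matrix (that is, $Q^{T}Q = I$ and $\lvert\det(Q)\rvert = 1$). The matrix $R$ is lower triangular; therefore, row $i$ of $R$ has at most $i$ nonzero elements.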
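One way to obtain $M_1 = RQ$ with $R$ lower triangular is to factor the transpose, $M_1^{T} = Q_0 R_0$, and set $R = R_0^{T}$, $Q = Q_0^{T}$. The sketch uses `numpy.linalg.qr` with `mode="complete"` so that $Q$ is square whatever the shape of $M_1$; the matrix values are hypothetical. It also checks that the change of variable preserves $e_2^{T}e_2$ and maps $M_1 e_2$ to $Rx$.

```python
import numpy as np

def lower_rq(M1):
    """Factor M1 = R @ Q with R lower triangular (same shape as M1) and Q
    square orthogonal, via the QR factorization of M1^T:
    M1^T = Q0 R0  implies  M1 = R0^T Q0^T."""
    Q0, R0 = np.linalg.qr(M1.T, mode="complete")
    return R0.T, Q0.T

M1 = np.array([[0.5, -1.0], [2.0, 0.3], [-0.4, 0.8]])   # hypothetical: J - d1 = 3, d2 = 2
R, Q = lower_rq(M1)
e2 = np.array([0.3, -1.1])
x = Q @ e2                               # inverse of the change of variable e2 = Q^{-1} x
print(np.allclose(R @ Q, M1),            # M1 = R Q
      np.allclose(np.triu(R, 1), 0.0),   # R is lower triangular
      np.isclose(x @ x, e2 @ e2),        # e2' e2 = x' x
      np.allclose(R @ x, M1 @ e2))       # M1 e2 <= D  is the same as  R x <= D
```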
Start from $x_{d_2}$. Let $J^{+}_{d_2}$ be the set of rows of $R$ that have positive elements in column $d_2$ and $J^{-}_{d_2}$ the set with negative elements, and let $D_j$ denote the $j$th element of $D$. Then for all $j \in J^{+}_{d_2}$,
$$x_{d_2} \le \frac{D_j - \sum_{i<d_2} R(j,i)\,x_i}{R(j,d_2)},$$
and for all $j \in J^{-}_{d_2}$,
$$x_{d_2} \ge \frac{D_j - \sum_{i<d_2} R(j,i)\,x_i}{R(j,d_2)}.$$
So the bounds on $x_{d_2}$ are $x_{d_2} \in \left[x^{L}_{d_2}, x^{H}_{d_2}\right]$, where
$$x^{L}_{d_2} = \max\left(-\infty,\ \max_{j \in J^{-}_{d_2}} \frac{D_j - \sum_{i<d_2} R(j,i)\,x_i}{R(j,d_2)}\right)$$
and
$$x^{H}_{d_2} = \min\left(\infty,\ \min_{j \in J^{+}_{d_2}} \frac{D_j - \sum_{i<d_2} R(j,i)\,x_i}{R(j,d_2)}\right).$$
We repeat the calculation for $j = d_2 - 1$ down to 1. Then the integral is
$$f_{q_1}(q_1) = \int_{x^{L}_1}^{x^{H}_1} \cdots \int_{x^{L}_{d_2}}^{x^{H}_{d_2}} \tilde{f}_{q_1 e_2}\left(q_1, Q^{-1}x\right)\frac{e^{-0.5 x^{T} x}}{(2\pi)^{d_2/2}}\,dx. \qquad (20)$$
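The bound formulas translate directly into code. The sketch below handles any coordinate $k$ by using the rows of $Rx \le D$ whose last nonzero entry is in column $k$: for $k = d_2$ these are exactly the rows with nonzero $R(j, d_2)$, and applying the same rule for $k < d_2$ is our reading of "repeat the calculation". Indices are 0-based; the function name and example values are hypothetical.

```python
import numpy as np

def coordinate_bounds(R, D, x_prev, k, tol=1e-12):
    """Bounds [x_k^L, x_k^H] implied by the rows of R x <= D whose last nonzero
    entry is in column k (0-based), given x_prev = (x_0, ..., x_{k-1}).
    Rows with R[j, k] > 0 give upper bounds; rows with R[j, k] < 0 give lower bounds."""
    lo, hi = -np.inf, np.inf
    for j in range(R.shape[0]):
        row = R[j]
        if abs(row[k]) <= tol or np.any(np.abs(row[k + 1:]) > tol):
            continue                      # this row does not end at column k
        bound = (D[j] - row[:k] @ x_prev) / row[k]
        if row[k] > 0:
            hi = min(hi, bound)
        else:
            lo = max(lo, bound)
    return lo, hi

R = np.array([[2.0, 0.0], [1.0, 1.0], [0.5, -2.0]])   # hypothetical, d2 = 2
D = np.array([1.0, 2.0, 3.0])
print(coordinate_bounds(R, D, np.empty(0), 0))        # bounds on the first coordinate
print(coordinate_bounds(R, D, np.array([0.2]), 1))    # bounds on the second, given the first = 0.2
```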
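Next, for all $j \le d_2$, define $u_j = \Phi(x_j)$, where $\Phi$ is the standard normal CDF, so that $x_j(u) = \Phi^{-1}(u_j)$. Then, making the change of variables, the integral is equivalent to
$$f_{q_1}(q_1) = \int_{u^{L}_1}^{u^{H}_1} \cdots \int_{u^{L}_{d_2}}^{u^{H}_{d_2}} \tilde{f}_{q_1 e_2}\left(q_1, Q^{-1}x(u)\right) du \qquad (21)$$
where $u^{L}_j = \Phi(x^{L}_j)$ and $u^{H}_j = \Phi(x^{H}_j)$.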
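The change of variables $u_j = \Phi(x_j)$ absorbs the normal density into $du_j$, because $d\Phi(x) = \phi(x)\,dx$. A one-dimensional check using only the Python standard library (`statistics.NormalDist`): a midpoint rule in $u$ reproduces the closed form $\int_a^b x\,\phi(x)\,dx = \phi(a) - \phi(b)$. The integrand, interval, and function name are arbitrary illustrative choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # Phi is STD_NORMAL.cdf, phi is STD_NORMAL.pdf

def integrate_in_u(g, x_lo, x_hi, n=20_000):
    """Approximate the integral of g(x) * phi(x) over [x_lo, x_hi] by the
    midpoint rule in u = Phi(x); the factor phi(x) dx becomes du."""
    u_lo, u_hi = STD_NORMAL.cdf(x_lo), STD_NORMAL.cdf(x_hi)
    h = (u_hi - u_lo) / n
    return h * sum(g(STD_NORMAL.inv_cdf(u_lo + (i + 0.5) * h)) for i in range(n))

approx = integrate_in_u(lambda x: x, -0.5, 1.2)
exact = STD_NORMAL.pdf(-0.5) - STD_NORMAL.pdf(1.2)   # closed form for g(x) = x
print(approx, exact)
```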
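Finally, for all $j \le d_2$, making the change of variable $u_j = u^{L}_j + \frac{u^{H}_j - u^{L}_j}{2}(1 + v_j)$, this is equivalent to
$$f_{q_1}(q_1) = \int_{-1}^{1} \cdots \int_{-1}^{1} \prod_{j=1}^{d_2}\left(\frac{u^{H}_j - u^{L}_j}{2}\right) \tilde{f}_{q_1 e_2}\left(q_1, Q^{-1}x(v)\right) dv. \qquad (22)$$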
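Form (22) puts every coordinate on $[-1, 1]$, which suits nested Gauss-Legendre quadrature. The sketch below is one possible implementation, not necessarily the authors' procedure: for each coordinate it computes the bounds from the rows of $Rx \le D$ ending at that coordinate, maps Gauss-Legendre nodes $v$ to $u$ and then to $x = \Phi^{-1}(u)$, and multiplies by the Jacobian $(u^H_j - u^L_j)/2$. The argument `h` stands in for $\tilde{f}_{q_1 e_2}(q_1, Q^{-1}x)$; the helper names, node count, and test region are hypothetical.

```python
import numpy as np
from statistics import NormalDist

PHI = NormalDist()   # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}

def row_bounds(R, D, x_prev, k, tol=1e-12):
    """Bounds on x_k from the rows of R x <= D whose last nonzero entry is column k."""
    lo, hi = -np.inf, np.inf
    for row, Dj in zip(R, D):
        if abs(row[k]) <= tol or np.any(np.abs(row[k + 1:]) > tol):
            continue
        bound = (Dj - row[:k] @ x_prev) / row[k]
        if row[k] > 0:
            hi = min(hi, bound)
        else:
            lo = max(lo, bound)
    return lo, hi

def nested_gauss_legendre(h, R, D, n_nodes=20):
    """Approximate the integral of h(x) * N(x; 0, I) over {x : R x <= D} in the
    form (22), with an n_nodes-point Gauss-Legendre rule in each v_k on [-1, 1]."""
    d2 = R.shape[1]
    v, w = np.polynomial.legendre.leggauss(n_nodes)

    def recurse(k, x_prev):
        if k == d2:
            return h(x_prev)
        lo, hi = row_bounds(R, D, x_prev, k)
        u_lo, u_hi = PHI.cdf(lo), PHI.cdf(hi)
        if u_hi <= u_lo:
            return 0.0                      # this slice of the region is empty
        half = 0.5 * (u_hi - u_lo)          # Jacobian (u^H - u^L) / 2
        total = 0.0
        for vk, wk in zip(v, w):
            x_k = PHI.inv_cdf(u_lo + half * (1.0 + vk))
            total += wk * half * recurse(k + 1, np.append(x_prev, x_k))
        return total

    return recurse(0, np.empty(0))

# Hypothetical box region: -0.5 <= x_1 <= 1 and -2 <= x_2 <= 0.3
R = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
D = np.array([1.0, 0.5, 0.3, 2.0])
approx = nested_gauss_legendre(lambda x: 1.0, R, D)
exact = (PHI.cdf(1.0) - PHI.cdf(-0.5)) * (PHI.cdf(0.3) - PHI.cdf(-2.0))
print(approx, exact)
```

The cost grows as `n_nodes ** d2`, so a tensor-product rule like this is practical only when $d_2$ is small.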
B.3 Case 3: Choice of 0 goods
References
Blundell, R. and C. Meghir (1987), "Bivariate alternatives to the Tobit model," Journal of Econometrics, 34, 179-200.
Golan, A., J. M. Perloff, and E. Z. Shen (2001), "Estimating a Demand System with Nonnegativity Constraints: Mexican Meat Demand," Review of Economics and Statistics, 83, 541-550.
Gorman, W. M. (1980), "A Possible Procedure for Analysing Quality Differentials in the Egg Market," Review of Economic Studies, 47, 843-856.
Heckman, J. (1979), "Sample Selection Bias as a Specification Error," Econometrica, 47, 153-161.
Hendel, I. (1999), "Estimating multiple-discrete choice models: An application to computerization returns," Review of Economic Studies, 66, 423-446.
Lancaster, K. (1966), "A New Approach to Consumer Theory," Journal of Political Economy, 74, 132-157.
Lee, L. and M. M. Pitt (1986), "Microeconometric Demand Systems with Binding Nonnegativity Constraints: The Dual Approach," Econometrica, 54, 1237-1242.
Meyerhoefer, C. D., C. K. Ranney, and D. E. Sahn (2005), "Consistent Estimation of Censored Demand Systems Using Panel Data," American Journal of Agricultural Economics, 87, 660-672.
Sam, A. G. and Y. Zheng (2010), "Semiparametric Estimation of Consumer Demand Systems with Micro Data," American Journal of Agricultural Economics, 92, 246-257.
Shonkwiler, J. S. and S. T. Yen (1999), "Two-Step Estimation of a Censored System of Equations," American Journal of Agricultural Economics, 81, 972-982.
Wales, T. J. and A. D. Woodland (1983), "Estimation of Consumer Demand Systems with Binding Non-negativity Constraints," Journal of Econometrics, 21, 263-285.
Yen, S. T. and B. Lin (2006), "A Sample Selection Approach to Censored Demand Systems," American Journal of Agricultural Economics, 88, 742-749.
Yen, S. T., B. Lin, and D. M. Smallwood (2003), "Quasi- and Simulated-likelihood Approaches to Censored Demand Systems: Food Consumption by Food Stamp Recipients in the United States," American Journal of Agricultural Economics, 85, 458-478.
D Tables

Table 1: Most frequently purchased categories
Type                Freq.      Percent   Cum.
Banana              347,843    8.55      8.55
Tomato              317,190    7.79      16.34
Dessert Apples      242,778    5.97      22.31
Mushroom            224,796    5.52      27.83
Cucumber            198,320    4.87      32.70
Carrots             165,550    4.07      36.77
Old Potatoes        154,139    3.79      40.56
Lettuce             152,148    3.74      44.30
Berries+Currants    134,735    3.31      47.61
New Potatoes        130,505    3.21      50.81
Table 2: Number of items purchased
No. of items   Freq.      Percent   Cum.
1              351,723    30.31     30.31
2              225,232    19.41     49.72
3              157,413    13.56     63.28
4              113,592    9.79      73.07
5              83,837     7.22      80.29
6              62,932     5.42      85.72
7              46,986     4.05      89.77
8              34,722     2.99      92.76
9              25,425     2.19      94.95
10             18,206     1.57      96.52
11             12,878     1.11      97.63
12             9,006      0.78      98.40
Table 3: Most frequent 2-item combinations
Combination                   Freq.     Percent   Cum.
(banana,dessert apples)       32,886    4.07      4.07
(banana,berries)              32,884    4.07      8.14
(banana,broccoli)             27,399    3.39      12.53
(baking potato,banana)        26,915    3.33      15.86
(banana,carrots)              24,414    3.02      18.88
(banana,cucumber)             20,692    2.56      21.44
(banana,beans)                16,694    2.06      23.50
(banana,cabbage)              11,717    1.45      24.95
(cucumber,lettuce)            11,226    1.39      26.34
(banana,easy peelers)         10,435    1.29      27.63
(broccoli,carrots)            9,975     1.23      28.86
(cucumber,dessert apples)     8,929     1.10      29.96
Table 4: Number of items purchased conditional on purchase of a vegetable type
                        Number of items purchased
Type          Row       1         2         3         4         5         6+
Apricot       Freq.     129       183       240       249       220       1,272
              Percent   5.63      7.98      10.47     10.86     9.59      55.47
Artichokes    Freq.     17        12        11        12        13        93
              Percent   10.76     7.59      6.96      7.59      8.23      58.86
Asparagus     Freq.     755       1,159     1,296     1,215     1,146     6,400
              Percent   6.31      9.68      10.83     10.15     9.57      53.46
Aubergines    Freq.     245       361       490       475       482       3,693
              Percent   4.26      6.28      8.53      8.27      8.39      64.27
Avocado       Freq.     1,258     1,687     1,727     1,746     1,657     9,235
              Percent   7.27      9.75      9.98      10.09     9.57      53.35
Baking Pot.   Freq.     9,488     9,290     8,853     7,984     7,180     30,558
              Percent   12.93     12.66     12.07     10.88     9.79      41.66
Bananas       Freq.     46,839    49,738    45,726    40,383    34,945    130,212
              Percent   13.47     14.30     13.15     11.61     10.05     37.43
E Figures

Figure 1: Log price of apricots [plot; axis labels: group(date), LogPrice, LogP1]
Figure 2: Log price of bananas [plot; axis labels: group(date), LogPrice, LogP1]
Figure 3: Log price of cherries [plot; axis labels: group(date), LogPrice, LogP1]

Figure 4: Elasticities: Categories 1-10 [panels titled:]
Type 1, Mean = -0.006, share = 0.008
Type 2, Mean = -0.052, share = 0.008
Type 3, Mean = -0.0023, share = 0.008
Type 4, Mean = -2.4, share = 0.026
Type 5, Mean = -0.011, share = 0.084
Type 6, Mean = -0.6, share = 0.284
Type 7, Mean = -4.7, share = 0.004
Type 8, Mean = -0.7, share = 0.064
Type 9, Mean = -6.2, share = 0.018
Type 10, Mean = -2.1, share = 0.122

Figure 5: Elasticities: categories 11-20 [panels titled:]
Type 11, Mean = -2.9, share = 0.118
Type 12, Mean = -2.8, share = 0.038
Type 13, Mean = -0.14, share = 0.06
Type 14, Mean = -3.5, share = 0.156
Type 15, Mean = -15, share = 0.062
Type 16, Mean = -0.37, share = 0.068
Type 17, Mean = -0.94, share = 0.012
Type 18, Mean = -0.59, share = 0.004
Type 19, Mean = -0.0011, share = 0.006
Type 20, Mean = -4.3, share = 0.006