Fruit Demand: Buying Continuous Quantities of a Few Potentially Complementary Goods Among Many

Arthur Lewbel and Lars Nesheim
Boston College and University College London
March 2013

Abstract

We propose a demand model that allows consumers to choose a small number of different goods from a large menu of available goods, and consume these goods in continuous quantities. The model allows the chosen goods to be substitutes or complements. All errors in the model are random utility preference parameters. Non-negativity constraints that consumers consider when utility maximizing produce corners that make different consumers purchase different numbers of types of goods. We show semiparametric identification of the model where the joint distribution of the random utility terms is not parameterized. We apply the model to the demand for fruit.

We propose a demand model to analyse consumer choice of a small basket of goods from a large menu of available goods. The chosen goods may be substitutes or complements. Demand elasticities depend on choices at both the intensive and extensive margins as well as on the items in the basket. Because all errors in the model are random utility preference parameters, different consumers choose completely different baskets of goods and demand elasticities vary significantly across consumers. We show semiparametric identification of the model (allowing for a nonparametric distribution of the random utility terms). We apply the model to the demand for fruit and vegetables and estimate demand elasticities in this market.

JEP codes:
Keywords:

1 Introduction

We propose a demand model that has some features of discrete demands like multinomial choice, and other features of traditional continuous demand systems. As a motivating example, consider consumers' demand for fresh fruit. There are dozens or hundreds of different types of fruit consumers can choose among.
Consumers typically choose three or four types of fruit to purchase, and buy them in varying quantities. The most popular method of dealing with similar demand problems, as exemplified by the BLP model, would be to discretize purchases and treat each unit purchased as a separate multinomial choice decision. However, the assumptions that underlie this methodology are likely to be seriously violated in fruit demand. One issue is continuity of fruit demand. Purchases are often by weight, and nonuniformity of the size of each fruit and the number chosen make it possible to purchase fruit in close to continuous quantities. For this issue, some extensions to multinomial choice models exist that combine a discrete choice for a product with a continuous choice for the quantity of that product. See, e.g., Hendel (1999) or Dube (2004). A far more serious limitation of multinomial choice models is that they generally rule out complements, while patterns of fruit consumption indicate that some fruits are strong complements for others (e.g., different types of berries are frequently purchased together and consumed jointly, and varying fruits are complementary inputs to home production of dishes like fruit salad). It would be possible to allow for complements in a discrete choice framework by considering varying combinations of fruit as if they were distinct goods (e.g., treating an apple, a banana, and the combination of both as three separate possible choices). However, the number of possible combinations of just two or three fruits out of over a hundred available fruits makes this approach completely impractical. In our empirical application each consumer chooses up to three options from a set of 80 fruits and vegetables. There are 85,400 distinct combinations to be considered. The combinations purchased vary dramatically across consumers. 
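The count of 85,400 quoted above can be reproduced directly. The short sketch below assumes that "up to three options" means baskets of one, two, or three distinct goods out of 80; the variable names are illustrative only.

```python
from math import comb

n_goods = 80      # size of the choice set quoted in the text
max_items = 3     # each consumer chooses up to three options

# Number of baskets containing exactly 1, 2, or 3 distinct goods.
counts = {k: comb(n_goods, k) for k in range(1, max_items + 1)}
total = sum(counts.values())

print(counts)  # {1: 80, 2: 3160, 3: 82160}
print(total)   # 85400
```

Almost all of the total comes from the three-good baskets, which is why treating each combination as a separate discrete alternative quickly becomes unmanageable.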
The problem becomes even harder to implement when continuous quantities are considered, or even nonbinary quantities such as treating two apples and one banana as yet another choice the consumer could make. Even when fruits are not complements, they are still generally purchased in bundles, making the assumption of independence across purchases that underlies the multinomial logit model very unlikely to hold, even approximately.

Traditional continuous demand models were designed to handle joint purchases of bundles in continuous quantities, but such models generally assume each consumer buys most or all goods in nonzero amounts. Methods exist for dealing with small numbers of zeros in such models, but in our case the overwhelming majority of available goods are purchased in zero amounts by each consumer, i.e., each consumer selects a few (typically two to four) types of fruits to buy from the universe of one to two hundred types. Using traditional demand systems, large numbers of zeros are usually dealt with by aggregating up to a few different broad categories of goods (in our case, just two or three types of fruit). However, for many applications we (including, e.g., marketers, sellers, and nutrition policy makers) will be interested in the determinants of demand for each type of fruit, not just broad aggregates.

These issues are important for analysing aggregate demand elasticities. First, the aggregate demand elasticities will be determined by changes in demand at both the intensive and the extensive margin. In response to an increase in the price of apples, some people buy fewer apples. Some people switch from buying apples to pears. Second, the individual elasticities vary significantly across households because different households choose to buy different combinations.
The substitution patterns for someone who buys apples, bananas and courgettes are different from those of someone who buys only apples and bananas, or of someone who buys apples, pears, and lettuce.

Our approach is to incorporate a Gorman (1980) and Lancaster (1966) type characteristics model into a continuous quadratic demand system, with unobserved heterogeneity in the demand for characteristics, and allowing for many corner solutions in the demand for characteristics.

2 Literature

As summarized by, e.g., Blundell and Meghir (1987), the literature on continuous demand considers three main causes for observing zero purchases of some goods. One reason is lexicographic preferences, that is, an individual might prefer to consume any amount of other goods, no matter how small, to a given good. More simply, this occurs when an individual will never choose to buy the good in question, regardless of income or relative price. An example would be the zero cigarette consumption of a nonsmoker. A second reason is infrequency of purchase. A good that is durable or storable may be consumed regularly, but purchases of it may not be observed in a given time period of observation. A third reason is corners. This occurs when, while maximizing utility, the constraint that quantities are nonnegative is binding. This means that at a given level of prices and total expenditures a consumer chooses to purchase a zero quantity of a good, whereas in some other price and total expenditure regimes (such as one where the price of the good in question is lower), the consumer purchases a positive amount.

In our fruit demand example, infrequency of purchase can be largely ruled out over time spans longer than a few days, because fresh fruit is not durable and cannot be stored for very long.
Lexicographic preferences are typically modeled analogously to Heckman (1979) type sample selection models, where a binary choice equation models the decision of whether to consume the good or not, and then ordinary demand systems are estimated either including or excluding the good in question. Systems of equations like these can be estimated parametrically using Shonkwiler and Yen (1999), or, specifically for demand systems, Yen and Lin (2006). A recent example of demand system estimation of this type (still with a small number of goods) is the semiparametric estimator of Sam and Zheng (2010). Models like these, which have equations for determining zero versus non-zero consumption that are separate from the equations determining desired quantities, require utility functions that are fundamentally different for non-consumers and consumers of a good. These types of models are generally most appropriate for goods that a significant fraction of the population would never consume, like tobacco or alcohol.

In our model we focus on corners, since it is likely that there are very few types of fruit that a consumer would never purchase under any circumstances. Our model allows for substantial heterogeneity of preferences, and so can roughly accommodate lexicographic preferences by having the parameters that would result in positive purchases of a good take extreme values for consumers who would never purchase the good.

An extreme version of models based on corners is brand choice models, where the constraint that consumers buy exactly one brand is imposed either a priori or by the structure of the utility function. For example, Hendel (1999) proposes a model in which firms maximize a profit function by choosing a single brand (of computer) along with deciding how many units to buy (firms that are observed to buy multiple brands are divided into separate tasks, and each task is treated as if it were an individual firm choosing one brand).
Similarly, Dube (2004) proposes a model where the purchase for each "consumption occasion" is the decision to purchase a single brand, but in a continuous quantity. However, a drawback of all these discrete choice based models is that they rule out the possibility of different goods being complements. None would, e.g., allow for the possibility of making a fruit salad. In contrast, our model is based directly on continuous joint demand for multiple goods, and so allows for goods to be complements, and more generally places no separability restrictions on the demands for different goods.

Corners in continuous demand models are generally modeled as censored regressions, such as tobit models. The early continuous demand system literature that considered corners formally focused on cases where either a single good, or a very small number of goods, may have zeros. Examples include Wales and Woodland (1983) and Lee and Pitt (1986).

Applications of continuous demand systems with many goods and censoring work as follows. Let $p$ and $y$ be a price vector and total expenditures, respectively. Utility maximization without nonnegativity constraints is first used to derive models of the form $q_j^* = f_j(p, y) + e_j$ for each good $j$, where $q_j^*$ is a latent quantity and $e_j$ is an error term. Each observed quantity $q_j$ is then assumed to be given by $q_j = \max\{0, q_j^*\}$. Examples of such models include Golan, Perloff, and Shen (2001) and Meyerhoefer, Ranney, and Sahn (2005).

These censored demand models have two flaws. First, either the utility derived demand functions are $f_j(p, y)$, and the errors $e_j$ are arbitrarily appended, or errors are incorporated as random utility parameters yielding demand equations of the form $q_j^* = f_j(p, y, e) + e_j$, and each function $f_j(p, y, e)$ is approximated by some function $f_j(p, y)$.
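The censoring step described above is simple to state in code. The sketch below is only an illustration: the linear form used for $f_j$ and all numerical values are hypothetical, and total expenditure $y$ is omitted for brevity.

```python
def latent_demand(prices, intercepts, slopes, errors):
    """Unconstrained demands q*_j = f_j(p) + e_j, with a hypothetical linear f_j."""
    return [a - b * p + e
            for p, a, b, e in zip(prices, intercepts, slopes, errors)]

def censor(latent):
    """Observed demands q_j = max{0, q*_j}."""
    return [max(0.0, q) for q in latent]

prices = [1.0, 2.0, 4.0]
intercepts = [3.0, 3.0, 3.0]
slopes = [1.0, 1.0, 1.0]
errors = [0.5, -0.5, 0.25]

latent = latent_demand(prices, intercepts, slopes, errors)
observed = censor(latent)
print(latent)    # [2.5, 0.5, -0.75]
print(observed)  # [2.5, 0.5, 0.0]
```

The third good has a negative latent quantity that is simply replaced by zero after the fact, with no adjustment to the other two demands.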
The commonest example of this latter method is based on Deaton and Muellbauer's (1980) Almost Ideal Demand System (AID), where the vector $e$ appears in the demand functions $f_j(p, y, e)$ only inside a general price index, which is replaced by an approximate Stone price index. Second, and more seriously, these models are not fully consistent with utility maximization, because the nonnegativity constraints are not incorporated into the consumer's utility maximization. In these models, the consumer first chooses possibly negative quantities for some goods to maximize utility, and then actually purchases zero quantities of these goods. These problems apply to almost all demand systems with many goods that allow for censoring, whether the censoring is based on $e$ or on separate selection equations (except for the brand choice models that forbid complementarities, which solve this problem by imposing extreme forms of separability).

In contrast, the model we propose overcomes all of these problems. It has each consumer take all the nonnegativity constraints directly into account when maximizing their utility functions, it fully incorporates error terms into the model directly as preference heterogeneity parameters, and it allows for arbitrary patterns of substitutability or complementarity among the goods.

3 The Model

Let $q_{ij}$ be the quantity of good $j$ that is purchased by consumer $i$, and let $q_i \in \mathbb{R}_+^J$ be the bundle of goods purchased by consumer $i$, so $q_i$ is the $J$ vector of elements $q_{ij}$. Suppose that the utility consumer $i$ gets from purchasing this bundle of goods is a function of $K$ unobserved, latent attributes. Let $B_{kj}$ be the quantity of attribute $k$ that a consumer derives from buying a unit of good $j$, and let $B$ be the $K \times J$ matrix of elements $B_{kj}$. Then the quantity bundle of the $K$ attributes that a consumer derives utility from is the vector $Bq_i$. Assume $K < J$, $\operatorname{rank}(B) = K$ and $B^T B \ge 0$. This is essentially the Gorman-Lancaster linear household technologies model.
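To fix ideas, the following sketch computes the attribute bundle $Bq$ for a small hypothetical technology matrix with $K = 2$ attributes and $J = 3$ goods, and checks the rank condition. The matrix entries and the helper names are assumptions made for illustration, not values from the application.

```python
from itertools import combinations

# Hypothetical technology matrix B (K = 2 attributes, J = 3 goods).
# Column j holds the attribute content of one unit of good j.
B = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.8]]

def attributes(B, q):
    """Attribute bundle Bq delivered by the quantity vector q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in B]

def has_rank_two(B):
    """For K = 2, rank(B) = K iff some pair of columns has a nonzero determinant."""
    pairs = combinations(range(len(B[0])), 2)
    return any(abs(B[0][i] * B[1][j] - B[0][j] * B[1][i]) > 1e-12
               for i, j in pairs)

q = [2.0, 0.0, 1.0]       # buys goods 1 and 3, none of good 2
z = attributes(B, q)       # attribute quantities, roughly 2.6 and 0.8
print(z, has_rank_two(B))
```

Because $K < J$, many different quantity vectors deliver the same attribute bundle, which is why prices determine which goods are actually used to obtain the attributes.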
We assume consumers have quasilinear quadratic utility over attributes. Where our model departs from Gorman is that we assume that maximized utility can have many corners (i.e., points where indifference curves intersect with axes in attribute space), and we assume that location shifts in the utility for each attribute vary randomly across consumers. Also, most past implementations of Gorman's model assumed $K = J$, whereas we will have $K$ much smaller than $J$.

The Gorman-Lancaster model with $K$ much smaller than $J$ accounts for most of the zeros in observed demands, since it implies that any one consumer will not need to purchase positive quantities of more than $K$ different goods. Preference variation across consumers will make different consumers choose different goods. However, the Gorman model is still inadequate, because it implies that all consumers will (with probability one) buy the same number of goods, $K$. This is the role of corners. Analogous to a Tobit model, in our model the value of a latent index (i.e., each element of $Bq_i$, corresponding to a quantity of each attribute) plus an error term determines whether a given attribute is desired sufficiently (relative to its cost) to purchase in nonzero amounts. These error terms are location shifts in the utility for each attribute. The presence of these corners is what causes different consumers to choose varying numbers of different goods to purchase in nonzero amounts, from zero to $K$. Other models exist that deal with zeros in a similar way, such as the censored Almost Ideal Demand System discussed earlier, but these models have the drawbacks described in the previous section.
We assume that each individual $i$ maximizes the quasilinear quadratic utility function

\[ q_{0i} - 0.5\,(e_i - Bq_i)^T (e_i - Bq_i) \quad \text{such that } y_i \ge p^T q_i \text{ and } q_i \ge 0, \]

where $y_i \in \mathbb{R}$ is total expenditures, $q_{0i} \in \mathbb{R}$ is a numeraire good, $p \in \mathbb{R}_+^J$ is a price vector, and $e_i \in \mathbb{R}^K$, which is randomly distributed in the population, is a vector of preference parameters corresponding to satiation levels or bliss points for each attribute $k$. Our model can be straightforwardly generalized to remove the quasilinearity assumption and thereby introduce income effects, by writing the utility function as $U\left(q_{0i}, -0.5\,(e_i - Bq_i)^T (e_i - Bq_i)\right)$ for some monotonically increasing utility function $U$. But we put aside this complication for now since the application we have in mind, fruit demand, is generally a small component of one's overall budget.

This utility function nests both standard continuous demand systems and standard discrete choice models. Continuous quadratic direct utility functions correspond to this model with $K = J$, and the model becomes equivalent to discrete choice models like multinomial probit by taking $e$ to be normal, restricting $B$ to ensure the optimal $q$ has at most one nonzero element, and adding the constraint that $q \le 1$. However, we do not impose these restrictions.

Substituting out the budget constraint (so that $q_{0i} = y_i - p^T q_i$), the consumer chooses $q_i$ to maximize

\[ y_i - p^T q_i - 0.5\,(e_i - Bq_i)^T (e_i - Bq_i) \quad \text{such that } q_i \ge 0. \tag{1} \]

Our baseline model assumes all consumers have the same $B$ and the vector $e$ is continuously distributed and has support on a set in $\mathbb{R}^K$ that has positive volume. All consumers in a single market face the same vector of prices $p$ but consumers in different markets may face different prices. A market is defined by a time period, a geographic location, and perhaps a store.

4 First order conditions

To ease notation, drop the $i$ subscript. The Lagrangian for each consumer's maximization problem is

\[ L(q, \lambda) = y - p^T q - 0.5\,(Bq - e)^T (Bq - e) + \lambda^T q, \]

where $\lambda$ is a vector of Lagrange multipliers.
The first order conditions are

\[ 0 = -p - B^T (Bq - e) + \lambda, \qquad 0 = \lambda^T q, \qquad \lambda \ge 0, \qquad q \ge 0, \tag{2} \]

and the second order conditions are $B^T B \ge 0$. Due to quasilinearity, the value of $y$ does not affect the maximizing choice of $q$. This model implicitly assumes either that the numeraire can be consumed in negative quantities, or that $y \ge p^T q$ for any optimizing value of $q$. Note that this latter condition will hold automatically as long as $y$ is large enough to purchase a bundle $q$ that attains the satiation level $Bq = e$ (though consumers in that situation may still not choose to buy that bundle, if the utility value of holding more of the numeraire is greater).

Each consumer can maximize utility by buying nonzero amounts of at most $K$ goods. Given prices and $B$, the first order conditions define a linear partition of $\mathbb{R}^K$ with at most $R = \binom{J}{K}$ elements. Let $E_r$ be an element of the partition. All consumers with $e \in E_r$ choose a quantity $q_r$ with the same nonzero components.

For each consumer, calculating their optimal quantity bundle $q$ entails solving a convex quadratic program. Even though finding an optimum requires determining which of the $J$ elements of the bundle are nonzero (at most $K$ of them), because the problem is convex, there exist algorithms for obtaining a solution in polynomial time (interior point and related methods).

To prove identification, it will be useful to characterize solutions that have the maximum number $K$ of nonzero elements.
To do so, let $q = (q_1, q_2)$ be a vector for which $q_1 \ge 0$ and $q_2 = 0$ such that $\dim q_1 = K$. Let $p_1$ and $p_2$ be the corresponding price subvectors and $B_1$ and $B_2$ the corresponding submatrices of $B$ so that

\[ B = [\,B_1 \;\; B_2\,]. \]

Then $q$ is optimal for all $e$ satisfying

\[ -p_1 - B_1^T (B_1 q_1 - e) = 0 \tag{3} \]
\[ -p_2 - B_2^T (B_1 q_1 - e) \le 0 \tag{4} \]
\[ q_1 \ge 0 \tag{5} \]

The first equation says that $-p_1 = B_1^T B_1 q_1 - B_1^T e$, which if $B_1$ is nonsingular means $-(B_1^T)^{-1} p_1 = B_1 q_1 - e$. We can substitute this into inequality (4) to obtain

\[ -p_2 + B_2^T (B_1^T)^{-1} p_1 \le 0. \tag{6} \]

It is then optimal to choose bundle $q$ if

\[ p_2 - B_2^T (B_1^T)^{-1} p_1 \ge 0 \tag{7} \]
\[ q_1 = (B_1^T B_1)^{-1} (B_1^T e - p_1) \ge 0 \tag{8} \]

5 Identification

In this section we discuss conditions sufficient to ensure that $B$ is point identified. We also discuss identification of the distribution of $e$.

Suppose that, for some $e$, $B^T e \le 0$. Then it follows immediately from the first order conditions (2) that $q = 0$ regardless of what value $p$ takes on. It therefore follows immediately that nothing can be identified regarding the distribution of $e$ for all $e \in \{e \mid B^T e \le 0\}$, other than the probability of lying in this set. So the best that can be established (which we provide below) is point identification of the distribution of $e$ over all values $e$ that are not in this set.

ASSUMPTION A1: Consumers buy the minimum number of different goods necessary to maximize utility given by equation (1). Assume that $p$ is continuously distributed on the positive orthant with a density that is strictly positive almost everywhere on the positive orthant. Assume that the distribution of $q$ given $p$ is known.

Buying the minimum number of goods is essentially a tie breaker for knife edge situations where utility can be maximized in more than one way. Given the assumed continuity of prices, these knife edges occur with probability zero. The distribution of $q$ in a population facing prices $p$ is in principle observable, so Assumption A1 essentially says that, for proving identification, this distribution is assumed to be known for any value of $p$.
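The first order conditions (2), the corner characterization in (7) and (8), and the observation that $B^T e \le 0$ implies $q = 0$ can all be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it uses a plain projected-gradient iteration rather than the interior point methods mentioned in Section 4, and the matrix $B$, the prices, and the three preference vectors are hypothetical.

```python
def multipliers(B, e, p, q):
    """Lagrange multipliers lambda = p + B'(Bq - e) implied by condition (2)."""
    K, J = len(B), len(B[0])
    resid = [sum(B[k][j] * q[j] for j in range(J)) - e[k] for k in range(K)]
    return [p[j] + sum(B[k][j] * resid[k] for k in range(K)) for j in range(J)]

def solve_consumer(B, e, p, iters=2000):
    """Minimize p'q + 0.5*||e - Bq||^2 over q >= 0 by projected gradient."""
    K, J = len(B), len(B[0])
    # trace(B'B) bounds the largest eigenvalue of B'B, so 1/trace is a safe step.
    step = 1.0 / sum(B[k][j] ** 2 for k in range(K) for j in range(J))
    q = [0.0] * J
    for _ in range(iters):
        grad = multipliers(B, e, p, q)   # gradient of the minimized objective
        q = [max(0.0, q[j] - step * grad[j]) for j in range(J)]
    return q

# Hypothetical example with K = 2 attributes and J = 3 goods.
B = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.8]]
p = [0.5, 0.5, 2.0]

for e in ([2.0, 1.0], [2.0, 0.2], [-1.0, -0.5]):
    q = solve_consumer(B, e, p)
    lam = multipliers(B, e, p, q)
    n_bought = sum(x > 1e-9 for x in q)
    print(e, [round(x, 6) for x in q], [round(x, 6) for x in lam], n_bought)
```

With these numbers the first consumer buys goods 1 and 2, in the quantities given by (8) with $B_1$ equal to the identity, the second buys only good 1, and the third, for whom $B^T e \le 0$, buys nothing. All three face the same prices and the same $B$, so the differences come entirely from the random preference vector $e$.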
ASSUMPTION A2: The $K \times J$ matrix $B$ has rank $K$. For every column $B_j$ of $B$, there exists a $K \times (K-1)$ matrix $B_{-j}$ consisting of $K-1$ columns of $B$ such that $\tilde{B}_j = [\,B_j \;\; B_{-j}\,]$ is nonsingular. Without loss of generality, $B$ has the scale and rotation normalizations described in Appendix A.

Assumption A2 will ensure that for every good $j$, there exists a set of $K$ goods including good $j$ such that some consumers choose to buy a bundle consisting of those $K$ goods. Then identification of the $j$'th column of $B$ will be assured using constructions like that of the previous section based on those consumers, with nonsingularity of $\tilde{B}_j$ in Assumption A2 taking the place of nonsingularity of $B_1$ in the previous section.

For any $K \times K$ matrix $A$ such that $A^T A = I$, our utility function is observationally equivalent to a utility function that replaces $B$ and $e$ with $AB$ and $Ae$. More generally, $B$ can only be identified up to a set of scale and rotation normalizations. These are the normalizations that are described in detail in Appendix A.

ASSUMPTION A3: Let $f_e$ denote the density function of $e$. The density $f_e$ is strictly positive on the set $E = \{e \mid B^T e \nleq 0\}$, that is, the set of $e$ for which at least one element of $B^T e$ is strictly positive. $e$ is distributed independently of $p$.

As discussed above, for $e \notin E$, it is not possible to learn anything about $f_e(e)$ other than the total probability of not lying in the set $E$, so we focus on identification of $f_e(e)$ inside the set $E$.

Let $\mathcal{B}$ be the set of unique combinations of $K$ different goods chosen from the $J$ available goods, and let $R = \binom{J}{K}$ be the total number of elements of $\mathcal{B}$. Let $r \in \{1, \ldots, R\}$ index each possible element of $\mathcal{B}$. Let $i_r = (i_{1r}, \ldots, i_{Kr})$ be an element of $\mathcal{B}$ and let $q_r$ be a vector of quantities satisfying $q_r(i) = 0$ if $i \notin i_r$. We call $q_r$ a $K$ dimensional basket or bundle corresponding to the list $i_r$. So for a given basket $q_r$, $i_r$ indexes the nonzero elements of $q_r$. Let $p_r$ be the $K$ vector of prices of the goods $q_r$, and let $p_{-r}$ be the $(J-K)$ vector of prices of all the other goods.
Let $B_r = B(\cdot, i_r)$ be the submatrix of $B$ corresponding to these nonzero elements. Let $\tilde{R} \subseteq \mathcal{B}$ denote the smallest set of bundles such that $B_r$ is nonsingular for all $r \in \tilde{R}$ and every column $B_j$ is a column in $B_r$ for some $r$. The set $\tilde{R}$ has at least $J/K$ elements and no more than $J - K + 1$ elements. By Assumption A2, for every good $j$ the column $B_j$ lies in some nonsingular $B_r$.

LEMMA 1: For every $r \in \tilde{R}$, there is a set $A \subseteq \mathcal{P} \times \mathcal{Y}$ and a set $Q_r = \{q_r \in \mathbb{R}^J \text{ with } q_r(i) = 0 \text{ if } i \notin i_r\}$ such that $\Pr(Q_r \mid A) > 0$.

PROOF: Consider $q_r \in Q_r$. It is optimal to choose $q_r$ when inequalities (7) and (8) are satisfied for $q = q_r$. That is, when $p_{-r} - B_{-r}^T (B_r^T)^{-1} p_r \ge 0$ and $q_r = (B_r^T B_r)^{-1} (B_r^T e - p_r) \ge 0$. Assumptions A2 and A3 ensure that this event has positive probability.

Given Lemma 1, we can now establish identification of $B$. For each good $j$, there is a subset $\mathcal{B}_r$ of $K$ goods as described above that includes good $j$. For this set of goods let $p_r$ be sufficiently low, and let $p_{-r}$ be sufficiently high, to yield a positive probability of observing bundles $q_r$ in which $q_r(i) > 0$ for all $i \in \mathcal{B}_r$. Then $q_r(p', y) > 0$ for all $p' = (p'_r, p'_{-r})$ where $p'_r \le p_r$ and $p'_{-r} \ge p_{-r}$.

Let $B_r$ be the $K \times K$ submatrix consisting of the columns of $B$ corresponding to the set $\mathcal{B}_r$ of these $K$ goods, and let $p_r$ and $q_r$ denote $K$ vectors of prices and quantities of those $K$ goods. By the first order conditions, a consumer buying $q_r$ has $B_r^T B_r q_r = B_r^T e - p_r$. By Assumption A2, $B_r^T B_r$ is nonsingular. The demand functions for these $K$ goods for the consumers in this region are therefore $q_r = (B_r^T B_r)^{-1} (B_r^T e - p_r)$. Since the distribution of $e$ does not depend on $p_r$, the derivative with respect to prices $p_r$ of the conditional mean (or any conditional quantile) of $q_r$ conditioning on $p$ (which can be calculated at any point that is not on the boundary of the region) is $-(B_r^T B_r)^{-1}$, which identifies $B_r^T B_r$.
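The identification step above can be illustrated numerically. The sketch below simulates consumers who all buy the same two goods in positive amounts, differentiates mean demand with respect to prices by finite differences, and inverts the result to recover $B_r^T B_r$. It is a minimal sketch under stated assumptions: $K = 2$, the matrix `Br_true`, the uniform distribution of $e$, and the price point are hypothetical and are chosen so that every simulated consumer is in the interior of the region.

```python
import random

def inv2(A):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def demand(Br, e, p):
    """Interior demand q_r = (Br'Br)^{-1} (Br'e - p_r) for a 2-good bundle."""
    BtB = [[sum(Br[k][i] * Br[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    rhs = [sum(Br[k][i] * e[k] for k in range(2)) - p[i] for i in range(2)]
    Ainv = inv2(BtB)
    return [Ainv[i][0] * rhs[0] + Ainv[i][1] * rhs[1] for i in range(2)]

def mean_demand(Br, draws, p):
    qs = [demand(Br, e, p) for e in draws]
    assert min(min(q) for q in qs) > 0      # everyone is in the interior region
    return [sum(q[i] for q in qs) / len(qs) for i in range(2)]

random.seed(0)
Br_true = [[1.0, 0.5], [0.0, 1.0]]          # hypothetical K x K submatrix
draws = [(random.uniform(3, 4), random.uniform(1, 2)) for _ in range(500)]
p0, h = [0.5, 0.5], 0.01

base = mean_demand(Br_true, draws, p0)
jac = [[0.0, 0.0], [0.0, 0.0]]              # d E[q_r | p] / d p_r
for j in range(2):
    p1 = list(p0)
    p1[j] += h
    shifted = mean_demand(Br_true, draws, p1)
    for i in range(2):
        jac[i][j] = (shifted[i] - base[i]) / h

# The Jacobian equals -(Br'Br)^{-1}, so Br'Br is the inverse of its negative.
A_hat = inv2([[-jac[0][0], -jac[0][1]], [-jac[1][0], -jac[1][1]]])
print(A_hat)
```

Here `A_hat` should be close to $B_r^T B_r$, which for `Br_true` has rows (1, 0.5) and (0.5, 1.25). Only the product $B_r^T B_r$ is recovered at this stage, which is why the rotation and scale normalizations of Appendix A are needed to pin down $B_r$ itself.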
By Assumption A2, each good $j$ appears in some bundle $r$ for which the above derivation can be performed and $B_r^T B_r$ can be identified, so all of the columns of $B$ are recoverable up to normalizations from the collection of estimates of $B_r^T B_r$. At most $J - K + 1$ such bundles $r$ would be required (so that each good $j$ appears in at least one such bundle) and as few as $J/K$ such bundles might be needed.

For each $r$, we identify $A_r = B_r^T B_r$. In addition, these matrices share common elements. So, we can pick one bundle $r$ and define $A_r = D_r C_r C_r^T D_r$ where $D_r$ is a positive diagonal matrix and $C_r$ is the Cholesky decomposition of a correlation matrix. We can then define $B_r = C_r^T D_r$. This provides the rotation and scale normalizations up to which $B$ will be identified. Given $B_r = C_r^T D_r$, the remaining columns of $B$ are identified by sequentially dropping the last column of $B_r$ and replacing it with each remaining column of $B$. The elements of column $j$ for $j \notin \mathcal{B}_r$ satisfy $\sum_i B_j(i)^2 = d_j^2$ for some $d_j > 0$.

Having now shown identification of $B$, consider the distribution of $e$. Given $B_r$ for all possible bundles $r$, we can observe $B_r^T e = B_r^T B_r q_r + p_r$ for all observable $(q_r, p_r)$ pairs. Since $q_r$ and $p_r$ are nonnegative, we can uncover observations of $B_r^T e$ and hence of $e$ for all $e \in E$, thereby identifying $f_e(e)$ for all $e \in E$.

Note that if $e$ is finitely parameterized, then in general those parameters will be identified given nonparametric identification of $f_e(e)$ over the region $E$. In our empirical application we will take this approach to identification, by assuming $e$ is normally distributed.

6 Empirical Application

We use data from the Kantar (formerly TNS) World Panel for the UK for calendar year 2008 on all purchases of food brought into the home by 16,637 households. Households record purchases of all items bought using handheld scanners and record prices from till receipts.
The data contain a large set of product attributes (at the barcode level) as well as household characteristics (see Leicester and Oldfield (2009) for further information on the data). We use data on all purchases of fruits and vegetables. There are 101 categories of fruits and vegetables including, for example, apricots, bananas, lettuce, apples, and courgettes. Table 1 shows the top ten most frequently purchased categories. The top three categories are bananas (8.55% of purchases), tomatoes (7.79%) and dessert apples (5.97%).

Table 2 shows how frequently households purchased baskets containing different numbers of items. 70% of the time, households purchased more than one item and 90% of the time they purchased 7 or fewer items. On nearly 10% of shopping trips, households purchased more than 7 items. A discrete choice model cannot capture this type of variation.

Table 3 shows the most frequently purchased 2-item combinations. 26% of these combinations include purchase of a banana. However, the second good purchased varies significantly. The table only shows 30% of the 2-item combinations purchased. Many other 2-item combinations with smaller market shares are also purchased. None has a large share but together they account for a large share of 2-item baskets. Our model can account for this variety of choices.

Table 4 shows, for selected categories and conditional on purchase of an item in that category, how frequently each number of items was purchased. In all cases, 6 or more items are purchased more than 41% of the time. There is no obvious pattern.

For every purchase occasion, we observe the expenditure and price of all items purchased. However, for items not purchased the price is not observed. We impute prices using a hedonic regression. For each category we estimate a hedonic price model

\[ \ln p_{it} = x_{it}\beta + h(t) + \varepsilon_{it} \]

where $\ln p_{it}$ is the log price of item $i$ in period $t$, $x_{it}$ is a vector of characteristics of item $i$ in period $t$, and $h(t)$ is a 6th order polynomial in time.
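The imputation step can be sketched on synthetic data. The example below is a simplified illustration and not the specification estimated here: it assumes two hypothetical dummy characteristics (organic and branded), uses a quadratic in rescaled time in place of the 6th order polynomial, and fits the regression by ordinary least squares on noise-free simulated log prices.

```python
import math

def ols(X, y):
    """Least-squares coefficients via the normal equations and Gauss-Jordan elimination."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    M = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

# Synthetic, noise-free log prices for one hypothetical category.
true_beta = [0.2, 0.3, 0.1, -0.5, 0.4]   # const, organic, branded, t, t^2
rows, log_prices = [], []
for i, day in enumerate(range(1, 366, 7)):
    t = day / 365.0                       # day within the year, rescaled to (0, 1]
    x = [1.0, float(i % 2), float((i // 2) % 2), t, t * t]
    rows.append(x)
    log_prices.append(sum(b * v for b, v in zip(true_beta, x)))

beta_hat = ols(rows, log_prices)

def impute_price(x, beta):
    """Predicted price exp(x'beta) for an item whose price was not observed."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))

print([round(b, 6) for b in beta_hat])
print(impute_price([1.0, 1.0, 0.0, 0.5, 0.25], beta_hat))
```

Because the simulated data contain no noise, the fitted coefficients should reproduce `true_beta`. The last line imputes a price for an organic, unbranded item at mid-year, which is the role the fitted model plays for items a household did not purchase.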
Time is measured as the day within the year. Characteristics included are country of origin, branded, organic, tiering, fascia, and packaging. Figures 1-3 show the price estimates for apricots, bananas and cherries. Each figure shows a scatter plot of log price and the predicted log price. For apricots and cherries, prices rise in the spring and the autumn. These are periods when fresh apricots and cherries are more costly and more scarce. In contrast, the price of bananas is relatively flat. The pictures also make clear that at a single point in time there is a great deal of variability in price. This variation is primarily due to quality variation (tiering and fascia) and due to promotions.

Using these data for 101 categories, we estimated the parameters of the model. Preliminary results are displayed in Figures 4-5. These figures show, for categories 1-10 and 11-20, histograms of the estimated elasticities as well as the market shares for each of these 20 categories. Two points are clear. First, for each category, the distribution of elasticities varies significantly. Much of this variation is due to variation across consumers in the number of items purchased. The elasticity is very different for someone who chooses to buy 2 items than it is for someone who chooses to buy 6 items. Second, the elasticities vary significantly across categories.

These results are preliminary at this stage. We are currently doing more work to ensure that the results are robust and to compute standard errors.

A Appendix A: Parameterization of B

As discussed regarding Assumption A2, the matrix $B$ is identified up to an arbitrary set of rotation and scale normalizations. These normalizations can be imposed as follows. Define the partition $B = [\,B_1 \;\; B_2\,]$ where $B_1$ is $K \times K$ and $B_2$ is $K \times (J - K)$. Parameterize $B_1$ and $B_2$ as follows.
\[ B_1 = \begin{bmatrix} d_1 & d_2 c_{21} & d_3 c_{31} & \cdots & d_K c_{K1} \\ 0 & d_2 c_{22} & d_3 c_{32} & \cdots & d_K c_{K2} \\ 0 & 0 & d_3 c_{33} & \cdots & d_K c_{K3} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & d_K c_{KK} \end{bmatrix} \]

where for all $j \le K$, $d_j > 0$, $\sum_{r=1}^{j} c_{jr}^2 = 1$, and for all $r > j$, $c_{jr} = 0$. Thus, $B_1$ is an upper triangular matrix. For each $j$, the parameter $d_j$ captures the productivity of product $j$ and the parameters $\{c_{jr}\}_{r=1}^{j}$ capture complementarities.

For column $j$ of $B_2$ we parameterize

\[ B_2(j) = \begin{bmatrix} d_j c_{j1} \\ d_j c_{j2} \\ \vdots \\ d_j c_{jK} \end{bmatrix} \]

with $d_j > 0$ and $\sum_{r=1}^{K} c_{jr}^2 = 1$.

The matrix $B$ has $J + \frac{K(K-1)}{2} + (J-K)(K-1)$ parameters. There is one parameter $d_j$ for each column $j$, plus $\frac{K(K-1)}{2}$ free parameters $\{c_{jr}\}_{j,r}$ in $B_1$ and $(J-K)(K-1)$ free parameters $\{c_{jr}\}_{j,r}$ in $B_2$.

We can store the $c_{jr}$ parameters in a matrix as follows

\[ C = \begin{bmatrix} 1 & c_{21} & \cdots & c_{K1} \\ 0 & \sqrt{1 - c_{21}^2} & \cdots & c_{K2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{1 - \sum_{j=1}^{K-1} c_{Kj}^2} \end{bmatrix} \]

For $j \le K$, the elements of column $j$ of $B$ can be written using hyper-spherical coordinates

\[ \begin{aligned} B(1, j) &= d_j \cos\phi_1 \\ B(2, j) &= d_j \sin\phi_1 \cos\phi_2 \\ B(3, j) &= d_j \sin\phi_1 \sin\phi_2 \cos\phi_3 \\ &\;\;\vdots \\ B(j-1, j) &= d_j \sin\phi_1 \cdots \sin\phi_{j-2} \cos\phi_{j-1} \\ B(j, j) &= d_j \sin\phi_1 \cdots \sin\phi_{j-2} \sin\phi_{j-1} \end{aligned} \tag{9} \]

with $d_j > 0$, $\phi_k \in [0, \pi]$ for $k < j - 1$ and $\phi_{j-1} \in [0, 2\pi)$.

See http://en.wikipedia.org/wiki/Hypersphere#Hyperspher

Consider column $j$ of $B_2$ and define $\tilde{B}_1$ by replacing column $K$ of $B_1$ with column $j$ of $B_2$. Then define

\[ \tilde{A} = [\,B_{11} \;\; B_{21}\,]^T [\,B_{11} \;\; B_{21}\,]. \tag{10} \]

$\tilde{A}$ is identified from people who purchase items 1 through $K - 1$ and item $K + j$. The first $K - 1$ columns of $\tilde{B}_1$ are constrained to equal those in $B_1$. The final column is constrained by (10). Therefore, we can parameterize column $j$ of $B_2$ as

\[ \begin{aligned} B_2(1, j) &= d_j \cos\phi_1 \\ B_2(2, j) &= d_j \sin\phi_1 \cos\phi_2 \\ B_2(3, j) &= d_j \sin\phi_1 \sin\phi_2 \cos\phi_3 \\ &\;\;\vdots \\ B_2(K-1, j) &= d_j \sin\phi_1 \cdots \sin\phi_{K-2} \cos\phi_{K-1} \\ B_2(K, j) &= d_j \sin\phi_1 \cdots \sin\phi_{K-2} \sin\phi_{K-1}. \end{aligned} \tag{11} \]

B Appendix B: Estimation details

The data consist of independent observations of $(y_i, p_i, q_i)$ for each household $i$, with $p_i \in \mathbb{R}_+^J$ and $q_i \in \mathbb{R}_+^J$. We drop the $i$ subscript for ease of exposition.

We write down the likelihood function for each of three cases. Case 1 is the case in which a consumer purchases exactly $K$ goods. In this case, the mapping from data to random utility errors is one-to-one. Case 2 is the case in which a consumer chooses fewer than $K$ items but more than zero. In this case, many values of the random utility errors are consistent with the observed choice, and the likelihood function is the integral over a region of the random utility error space. Case 3 is the case in which a consumer chooses to purchase nothing. This case is similar to Case 2 but the integral must be computed over a different region.

B.1 Case 1: choice of K goods

Suppose the goods are sorted so that $q = (q_1, 0)$. Let $p = (p_1, p_2)$ be the corresponding vector of prices. That is, the first $K$ elements are non-negative and the remaining $J - K$ elements are 0.
Then inverse demand is

$$
\begin{aligned}
e &= B_1^{-T}\left(p_1 + B_1^{T} B_1 q_1\right) \\
  &= B_1^{-T} p_1 + B_1 q_1, \\
p_2 &\ge B_2^{T} B_1^{-T} p_1,
\end{aligned}
$$

and the log-likelihood is

$$
\ln f_q(q_1) = \ln f_e\left(B_1^{-T} p_1 + B_1 q_1\right) + \ln\left(\det(B_1)\right) \tag{12}
$$

where $f_q$ is the density of $q_1$, $f_e$ is the density of $e$ and $B_1$ is the submatrix of $B$ corresponding to the items in the vector $q_1$. Note that parameter values must satisfy the constraints that $p_2 \ge B_2^{T} B_1^{-T} p_1$.

B.2 Case 2: Choice of fewer than K goods

Suppose a household chooses $q = (q_1, 0)$ with $q_1 > 0$ and $\dim(q_1) = d_1 < K$. In this case, for each $q_1$ there are multiple vectors $e$ that solve (??). In fact, there is a linear space of dimension $K - d_1$. In other words, for each $(q_1, e_2) \in \mathbb{R}^{d_1} \times \mathbb{R}^{K - d_1}$, there is a unique $e_1$ defined by

$$
\begin{aligned}
e_1 &= B_{11}^{-T} p_1 + \left(B_{11} + B_{11}^{-T} B_{12}^{T} B_{12}\right) q_1 - B_{11}^{-T} B_{12}^{T} e_2 \\
    &= G_0 + G_1 q_1 + G_2 e_2
\end{aligned}
\tag{13}
$$

where

$$
\begin{aligned}
G_0 &= B_{11}^{-T} p_1 \\
G_1 &= B_{11} + B_{11}^{-T} B_{12}^{T} B_{12} \\
G_2 &= -B_{11}^{-T} B_{12}^{T}.
\end{aligned}
\tag{14}
$$

Here we have assumed that $B_{11}$ is invertible. Since $B_1$ has rank $d_1$ by assumption, we can always find a partition with an invertible $B_{11}$.

Consider the partially observed random vector $(q_1, e_2)$: $q_1$ is observed but $e_2$ is not. The expressions above imply that the density of $(q_1, e_2)$ is

$$
f_{q_1 e_2}(q_1, e_2) = f_e\left(G_0 + G_1 q_1 + G_2 e_2,\; e_2\right) \det(G_1)
$$

where $(G_0, G_1, G_2)$ are defined in (14). We observe $q_1$ if inequality (??) is satisfied.
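As an illustration only, the mappings in (12), (13) and (14) can be sketched numerically. The Python sketch below is not the authors' estimation code: the function names `case1_loglik`, `case2_e1` and `std_normal_logpdf`, the use of NumPy, and the standard normal log-density used for $f_e$ are all assumptions made for the example. The Jacobian term is computed as $\ln|\det B_1|$ so that the logarithm is defined for either sign of the determinant.

```python
import numpy as np


def case1_loglik(B1, p1, q1, log_f_e):
    """Sketch of the Case 1 log-likelihood contribution in (12).

    B1      : (K, K) submatrix of B for the K purchased goods
    p1, q1  : (K,) prices and quantities of the purchased goods
    log_f_e : callable giving the log-density of e (hypothetical choice)
    """
    # Inverse demand: e = B1^{-T} p1 + B1 q1
    e = np.linalg.solve(B1.T, p1) + B1 @ q1
    # slogdet returns (sign, log|det|); only the log-magnitude is used here
    _, logabsdet = np.linalg.slogdet(B1)
    return log_f_e(e) + logabsdet


def case2_e1(B11, B12, p1, q1, e2):
    """Sketch of (13)-(14): recover e1 from (q1, e2) when d1 < K goods are bought.

    B11 : (d1, d1) invertible block, B12 : (K - d1, d1) block
    Returns e1 and the tuple (G0, G1, G2).
    """
    G0 = np.linalg.solve(B11.T, p1)                  # B11^{-T} p1
    G1 = B11 + np.linalg.solve(B11.T, B12.T @ B12)   # B11 + B11^{-T} B12^T B12
    G2 = -np.linalg.solve(B11.T, B12.T)              # -B11^{-T} B12^T
    return G0 + G1 @ q1 + G2 @ e2, (G0, G1, G2)


def std_normal_logpdf(e):
    """Log-density of independent standard normals (an assumed form for f_e)."""
    return -0.5 * (e @ e) - 0.5 * e.size * np.log(2.0 * np.pi)
```

Calling `case1_loglik(B1, p1, q1, std_normal_logpdf)` returns one household's contribution under these assumptions. Using `np.linalg.solve` with `B11.T` avoids forming the inverse transpose explicitly.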
Partitioning $B_2$, which is of size $K \times (J - d_1)$, as

$$
B_2 = \begin{bmatrix} B_{21} \\ B_{22} \end{bmatrix}
$$

where $B_{21}$ is of size $d_1 \times (J - d_1)$ and $B_{22}$ is of size $(K - d_1) \times (J - d_1)$, this inequality is

$$
\left(B_{22}^{T} - B_{21}^{T} B_{11}^{-T} B_{12}^{T}\right) e_2 \le p_2 - B_{21}^{T} B_{11}^{-T} p_1 + \left(B_{22}^{T} B_{12} - B_{21}^{T} B_{11}^{-T} B_{12}^{T} B_{12}\right) q_1. \tag{15}
$$

Rewrite (15) as

$$
M_1 e_2 \le M_2 \tag{16}
$$

where

$$
M_1 = B_{22}^{T} - B_{21}^{T} B_{11}^{-T} B_{12}^{T}
$$

is a $(J - d_1) \times (K - d_1)$ matrix and

$$
M_2 = p_2 - B_{21}^{T} B_{11}^{-T} p_1 + \left(B_{22}^{T} B_{12} - B_{21}^{T} B_{11}^{-T} B_{12}^{T} B_{12}\right) q_1
$$

is $(J - d_1) \times 1$. Then the likelihood is

$$
f_{q_1}(q_1) = \int f_{q_1 e_2}(q_1, e_2)\, 1\left(M_1 e_2 \le M_2\right) de_2. \tag{17}
$$

Let $d_2 = K - d_1$ and suppose the density of $e$ has the form

$$
f_e(e_1, e_2) = \tilde{f}_e(e_1, e_2)\, \frac{e^{-0.5\, e_2^{T} e_2}}{(2\pi)^{d_2/2}}.
$$

The matrix $M_1$ has the QR decomposition $M_1 = RQ$ where $R$ is $(J - d_1) \times d_2$ lower triangular and $Q$ is $d_2 \times d_2$ orthogonal. Then, using the change of variable $e_2 = Q^{-1} x$ and writing $D = M_2$ for the right-hand side of (16), the integral can be written as

$$
\begin{aligned}
f_{q_1}(q_1) &= \int_{RQe_2 \le D} \tilde{f}_{q_1 e_2}(q_1, e_2)\, \frac{e^{-0.5\, e_2^{T} e_2}}{(2\pi)^{d_2/2}}\, de_2 && (18) \\
&= \int_{Rx \le D} \tilde{f}_{q_1 e_2}\left(q_1, Q^{-1} x\right) \frac{e^{-0.5\, x^{T} x}}{(2\pi)^{d_2/2}}\, dx && (19)
\end{aligned}
$$

since $Q$ is an orthogonal matrix. (That is, $Q^{T} Q = I$ and $\det(Q) = 1$.)

The matrix $R$ is lower triangular. Therefore, row $i$ has at most $i$ nonzero elements. Start from $x_{d_2}$. Let $J_{d_2}^{+}$ be the set of rows of $R$ that have positive elements in column $d_2$ and $J_{d_2}^{-}$ the set with negative elements. Then for all $j \in J_{d_2}^{+}$,

$$
x_{d_2} \le \frac{D_j - \sum_{i < d_2} R(j, i)\, x_i}{R(j, d_2)}
$$

and for all $j \in J_{d_2}^{-}$,

$$
x_{d_2} \ge \frac{D_j - \sum_{i < d_2} R(j, i)\, x_i}{R(j, d_2)}.
$$

So, the bounds on $x_{d_2}$ are $x_{d_2} \in \left[x_{d_2}^{L}, x_{d_2}^{H}\right]$ where

$$
x_{d_2}^{L} = \max\left(-\infty,\; \max_{j \in J_{d_2}^{-}} \frac{D_j - \sum_{i < d_2} R(j, i)\, x_i}{R(j, d_2)}\right)
$$

and

$$
x_{d_2}^{H} = \min\left(\infty,\; \min_{j \in J_{d_2}^{+}} \frac{D_j - \sum_{i < d_2} R(j, i)\, x_i}{R(j, d_2)}\right).
$$

We repeat the calculation for $j = d_2 - 1$ through 1. Then the integral is

$$
f_{q_1}(q_1) = \int_{x_1^{L}}^{x_1^{H}} \cdots \int_{x_{d_2}^{L}}^{x_{d_2}^{H}} \tilde{f}_{q_1 e_2}\left(q_1, Q^{-1} x\right) \frac{e^{-0.5\, x^{T} x}}{(2\pi)^{d_2/2}}\, dx. \tag{20}
$$

Next, for all $j \le d_2$ define $u_j = \Phi(x_j)$, where $\Phi$ is the standard normal distribution function. Then, making the change of variables, the integral is equivalent to

$$
f_{q_1}(q_1) = \int_{u_1^{L}}^{u_1^{H}} \cdots \int_{u_{d_2}^{L}}^{u_{d_2}^{H}} \tilde{f}_{q_1 e_2}\left(q_1, Q^{-1} x(u)\right) du \tag{21}
$$

where

$$
u_j^{L} = \Phi\left(x_j^{L}\right), \qquad u_j^{H} = \Phi\left(x_j^{H}\right).
$$

Finally, for all $j \le d_2$, making the change of variable $u_j = u_j^{L} + \frac{\left(u_j^{H} - u_j^{L}\right)(1 + v_j)}{2}$, this is equivalent to

$$
f_{q_1}(q_1) = \int_{-1}^{1} \cdots \int_{-1}^{1} \left(\prod_{j=1}^{d_2} \frac{u_j^{H} - u_j^{L}}{2}\right) \tilde{f}_{q_1 e_2}\left(q_1, Q^{-1} x(v)\right) dv. \tag{22}
$$

B.3 Case 3: Choice of 0 goods

References

Blundell, R., and C. Meghir (1987), "Bivariate alternatives to the Tobit model," Journal of Econometrics, 34, 179-200.
Golan, A., J. M. Perloff, and E. Z. Shen (2001), "Estimating a Demand System with Nonnegativity Constraints: Mexican Meat Demand," Review of Economics and Statistics, 83, 541-550.
Gorman, W. M. (1980), "A Possible Procedure for Analysing Quality Differentials in the Egg Market," The Review of Economic Studies, 47, 843-856.
Heckman, J. (1979), "Sample Selection as a Specification Error," Econometrica, 47, 153-61.
Hendel, I. (1999), "Estimating multiple-discrete choice models: An application to computerization returns," Review of Economic Studies, 66, 423-446.
Lancaster, K. (1966), "A New Approach to Consumer Theory," Journal of Political Economy, 74, 132-157.
Lee, L., and M. M. Pitt (1986), "Microeconometric Demand Systems with Binding Non-negativity Constraints: The Dual Approach," Econometrica, 54, 1237-42.
Meyerhoefer, C. D., C. K. Ranney, and D. E. Sahn (2005), "Consistent Estimation of Censored Demand Systems Using Panel Data," American Journal of Agricultural Economics, 87, 660-672.
Sam, A. G., and Y. Zheng (2010), "Semiparametric Estimation of Consumer Demand Systems with Micro Data," American Journal of Agricultural Economics, 92, 246-257.
Shonkwiler, J. S., and S. T.
Yen (1999), "Two-Step Estimation of a Censored System of Equations," American Journal of Agricultural Economics, 81, 972-82.
Wales, T. J., and A. D. Woodland (1983), "Estimation of Consumer Demand Systems with Binding Non-negativity Constraints," Journal of Econometrics, 21, 263-85.
Yen, S. T., and B. Lin (2006), "A Sample Selection Approach to Censored Demand Systems," American Journal of Agricultural Economics, 88, 742-49.
Yen, S. T., B. Lin, and D. M. Smallwood (2003), "Quasi- and Simulated-likelihood Approaches to Censored Demand Systems: Food Consumption by Food Stamp Recipients in the United States," American Journal of Agricultural Economics, 85, 458-78.

D Tables

Table 1: Most frequently purchased categories

Type                 Freq.      Percent    Cum.
Banana               347,843    8.55       8.55
Tomato               317,190    7.79       16.34
Dessert Apples       242,778    5.97       22.31
Mushroom             224,796    5.52       27.83
Cucumber             198,320    4.87       32.70
Carrots              165,550    4.07       36.77
Old Potatoes         154,139    3.79       40.56
Lettuce              152,148    3.74       44.30
Berries+Currants     134,735    3.31       47.61
New Potatoes         130,505    3.21       50.81

Table 2: Number of items purchased

No. of items    Freq.      Percent    Cum.
1               351,723    30.31      30.31
2               225,232    19.41      49.72
3               157,413    13.56      63.28
4               113,592    9.79       73.07
5               83,837     7.22       80.29
6               62,932     5.42       85.72
7               46,986     4.05       89.77
8               34,722     2.99       92.76
9               25,425     2.19       94.95
10              18,206     1.57       96.52
11              12,878     1.11       97.63
12              9,006      0.78       98.40

Table 3: Most frequent 2-item combinations

Combination                  Freq.     Percent    Cum.
(banana,dessert apples)      32,886    4.07       4.07
(banana,berries)             32,884    4.07       8.14
(banana,broccoli)            27,399    3.39       12.53
(baking potato,banana)       26,915    3.33       15.86
(banana,carrots)             24,414    3.02       18.88
(banana,cucumber)            20,692    2.56       21.44
(banana,beans)               16,694    2.06       23.50
(banana,cabbage)             11,717    1.45       24.95
(cucumber,lettuce)           11,226    1.39       26.34
(banana,easy peelers)        10,435    1.29       27.63
(broccoli,carrots)           9,975     1.23       28.86
(cucumber,dessert apples)    8,929     1.10       29.96

Table 4: Number of items purchased conditional on purchase of a vegetable type

Type           Statistic    1         2         3         4         5         6+
Apricot        Freq.        129       183       240       249       220       1,272
               Percent      5.63      7.98      10.47     10.86     9.59      55.47
Artichokes     Freq.        17        12        11        12        13        93
               Percent      10.76     7.59      6.96      7.59      8.23      58.86
Asparagus      Freq.        755       1,159     1,296     1,215     1,146     6,400
               Percent      6.31      9.68      10.83     10.15     9.57      53.46
Aubergines     Freq.        245       361       490       475       482       3,693
               Percent      4.26      6.28      8.53      8.27      8.39      64.27
Avocado        Freq.        1,258     1,687     1,727     1,746     1,657     9,235
               Percent      7.27      9.75      9.98      10.09     9.57      53.35
Baking Pot.    Freq.        9,488     9,290     8,853     7,984     7,180     30,558
               Percent      12.93     12.66     12.07     10.88     9.79      41.66
Bananas        Freq.        46,839    49,738    45,726    40,383    34,945    130,212
               Percent      13.47     14.30     13.15     11.61     10.05     37.43

E Figures

Results: Categories 1-10

Figure 1: Log price of apricots. [Plot of LogPrice and LogP1 against group(date).]

Figure 2: Log price of bananas. [Plot of LogPrice and LogP1 against group(date).]

Figure 3: Log price of cherries. [Plot of LogPrice and LogP1 against group(date).]

Figure 4: Elasticities: Categories 1-10. [Ten panels with the following titles.]
Type 1, Mean = -0.006, share = 0.008
Type 2, Mean = -0.052, share = 0.008
Type 3, Mean = -0.0023, share = 0.008
Type 4, Mean = -2.4, share = 0.026
Type 5, Mean = -0.011, share = 0.084
Type 6, Mean = -0.6, share = 0.284
Type 7, Mean = -4.7, share = 0.004
Type 8, Mean = -0.7, share = 0.064
Type 9, Mean = -6.2, share = 0.018
Type 10, Mean = -2.1, share = 0.122

Figure 5: Elasticities: categories 11-20. [Ten panels with the following titles.]
Type 11, Mean = -2.9, share = 0.118
Type 12, Mean = -2.8, share = 0.038
Type 13, Mean = -0.14, share = 0.06
Type 14, Mean = -3.5, share = 0.156
Type 15, Mean = -15, share = 0.062
Type 16, Mean = -0.37, share = 0.068
Type 17, Mean = -0.94, share = 0.012
Type 18, Mean = -0.59, share = 0.004
Type 19, Mean = -0.0011, share = 0.006
Type 20, Mean = -4.3, share = 0.006