Markup Estimates from Discrete Choice Models:
An Assessment Using Auxiliary Information∗
Adamos Adamou†
Sofronis Clerides‡
March 2010
Abstract
Discrete choice models (DCMs) have been widely used in recent years to estimate
demand for differentiated products. These models yield estimates of marginal cost
and markups that are very useful for policy analysis. Due to lack of data, researchers
are rarely able to evaluate the accuracy of their estimates by comparing them to
actual data or alternative estimates. In this paper we exploit the availability of auxiliary information and an idiosyncrasy of the Cyprus tax system to obtain estimates
of markups for automobiles using assumptions that are completely orthogonal to
those used in DCMs. A comparison of the two sets of markups shows them to be
reasonably similar, which bodes well for DCMs.
Keywords: discrete choice models, markup estimation.
JEL Classification: L1.
∗ We thank the Cyprus Road Transport Department for providing vehicle registration data.
† University of Cyprus; [email protected].
‡ University of Cyprus and CEPR; [email protected].
1 Introduction
Discrete choice models (DCMs) have been widely used in recent years to estimate demand
for differentiated products (Berry, 1994; Berry, Levinsohn, and Pakes, 1995). When coupled with an assumption on firm behavior (typically Bertrand-Nash pricing), DCMs can
produce estimates of marginal cost and therefore also of markups (defined as the absolute
difference between price and marginal cost). These estimates have been used to address
several questions that interest industrial organization economists, such as the impact of
mergers and the measurement of market power. Actual markups are very rarely observed
in practice because they are private information that firms are not generally willing to
divulge. Moreover, firms’ notion of a markup may not necessarily coincide with the corresponding notion in economics because firms do not typically think in terms of marginal
cost. As a result, it is very rare that economists are able to compare their estimates of
marginal costs or markups with their “true” counterparts. Being able to do so would be
very useful as it would be a good way of assessing the performance of our models.
In this paper we take advantage of the availability of some auxiliary information in the
Cyprus automobile market in order to obtain estimates of markups that are completely
independent of those obtained from differentiated product models. We refer to these
markups as model-free or calculated markups, as they are computed from simple algebraic
relationships and are not the outcome of econometric estimation. We caution that our
alternative markup estimates are not hard data; we do need to make assumptions in order
to compute them. Our estimates are model-free but not assumption-free. The usefulness
of the exercise lies in the fact that these assumptions are very different from those made
in the standard differentiated product model, hence the calculated markups can serve
as a useful benchmark for comparison.
The idiosyncrasy of our data stems from the tax system. Automobiles in Cyprus are
heavily taxed with a variety of different instruments. The most important ones are a
consumption tax that is a percentage of the vehicle’s import price and a flat per unit
tax. Some groups and individuals that meet certain criteria can be exempt from paying
taxes on automobile purchases. For a period of several years we are able to observe two
prices for each model: a price with taxes and a price without them. Thus for each model
we have two expressions linking marginal cost and prices but we have three unknowns:
marginal cost, the markup for taxed vehicles and the markup for tax-free vehicles. By
making an assumption on the relationship between the two markups we can obtain the
desired estimates. A simple assumption one can make is that the two markups are the
same. We explore this and several other possibilities. The fact that the consumption tax
is imposed on the import price, which is essentially marginal cost, is the key feature of
the data that we are able to leverage.
The fact that our alternative markup estimates are based on specific assumptions
means that what we do is not a proper test of DCMs. Our interpretation of our exercise
is the following. We use two different sets of assumptions to generate two estimates of
markups. If the two estimates are similar (as measured by some metric that remains to be
determined), then this would be fairly strong evidence in favor of both sets of assumptions
because it is unlikely that we would obtain similar estimates if one or both assumptions
are incorrect. Conversely, if the two markups differ, this would indicate that at least one
of the assumptions is wrong but it would not tell us which one.
In this paper, demand is estimated using our entire dataset with limited characteristics
and a reduced-period subset which allows us to include additional characteristics but at
the expense of having fewer observations. Using the entire dataset, we find that the
markups obtained from DCMs appear to be higher than the model-free markups.
However, this pattern is not observed in the subset. Furthermore, we
examine the impact of product characteristics on the markups obtained from the DCMs and the
accounting model via an appropriate statistical test. For the whole-period dataset, we fail
to reject the hypothesis that the coefficients are the same, whereas for the subset only the
coefficient of one specific product characteristic is found to differ. We conclude that
the comparison of the two sets of markups shows them to be reasonably similar, which
bodes well for DCMs.
2 Data
Information on car sales was obtained from the Cyprus Road Transport Department,
which keeps track of vehicle registrations. Prices of new automobiles for the period 1989-2002
were obtained from a local car magazine. The magazine also reports various vehicle
characteristics (such as length, width, cylinders, etc.) starting in 1995; only engine
capacity was reported prior to that. We were unable to locate all past issues, so data are
missing for some months, mostly in the earlier years. The number of models listed per
month ranges from 25 to 57. Estimating demand for the entire 1989-2002 period means
that the only characteristic we can use is engine capacity (we also use country dummies
to control for quality). As a test of possible biases, we also estimate demand for the
1995-2002 period, which allows us to include more characteristics but at the expense of
having fewer observations.

Table 1: Price and tax information for selected car models

Model (eng.        Year   Price with   Price without   Ad valorem   Unit tax   VAT rate
size, liters)             tax (CP)     tax (CP)        taxes (%)    (CP)       (%)
Toyota Corolla     1989    7150         3425           124.5         75        0
(1.299)            1990    6800         3250           124.5         75        0
                   1991    6700         3400           118.0         75        0
                   1992    7625         3900           114.8         75        5
                   1993    7850         4100           111.6         75        5
                   1994    8350         4200           108.3         75        8
                   1995    9300         5100           105.1         75        8
Honda Civic        1989    8300         3950           129.5         75        0
(1.499)            1990    8100         4275           129.5         75        0
                   1991    8313         4350           123.0         75        0
                   1992    9800         5450           119.8         75        5
                   1993   10846         5825           116.6         75        5
                   1994   12750         6200           113.3         75        8
                   1995   12750         6200           110.1         75        8
Peugeot 405        1990   18500         9200           124.5        875        0
(1.899)            1991   18600         9500           119.1        875        0
                   1992   17500         9200           116.5        875        5
                   1993   15500         8500           113.8        875        5
                   1994   15800         7700           111.0        875        8
                   1995   17200         8900           108.4        875        8
Several taxes and levies were imposed on automobiles during the period under examination. There were three different types of taxes on the vehicle’s import (cif) price. The
total ad valorem tax rate depended on the size of the vehicle (in terms of engine capacity)
and its country of origin. During the period covered by the study ad valorem tax rates
ranged from 80%-130% for sedans and from 40%-60% for sport utility vehicles (SUVs).
A unit (per vehicle) tax as a function of engine capacity was also levied. Finally, a value-added
tax was introduced in 1995. All taxes were payable upon registration. In Table 1
we present prices, tax rates and levies for three selected automobile models. Prices
are the nominal final prices in Cyprus Pounds (CP) for taxed and untaxed cars. The unit
tax is also in Cyprus Pounds. The ratio of price with tax to price without tax for
these models varies from 1.80 to 2.10. The unit tax is greater for the Peugeot 405 because
it belongs to a higher engine capacity category. The ad valorem tax rate was more than
100% for all three models. Table 2 summarizes our main variables. Tax-free prices
are available for about 78% of our sample.

Table 2: Data summary

Variable                      # of obs   Mean     Std. dev.   Min     Median   Max
Engine Capacity (in liters)   616        1.6      0.38        0.8     1.6      3.5
Price with tax (CP)           616        14091    7572        4700    11725    48000
Price without tax (CP)        483        7452     3515        2450    6550     19500
Ad valorem taxes (%)          616        105.2    13.1        58.6    103.6    163
Unit Tax (CP)                 616        381.4    500.4       0       275      6575
VAT rate (%)                  616        7.3      3.5         0       8        13
Sales (units)                 616        158.8    247.5       1       61       1815
Income (CP)                   616        10680    8196        0       9133     132914
3 Computation of markups from auxiliary data
There are no car manufacturers in Cyprus; all cars are sold by importers. The marginal
cost of vehicle model $j$ for an importer is its import (cif) price, $P_j^I$. The vehicle is subject
to an ad valorem tax at a rate $\tau_j$ and a unit tax $T_j$, bringing the total marginal cost
to $(1+\tau_j)P_j^I + T_j$. Both $\tau$ and $T$ are indexed by $j$ as they are functions of model
characteristics. The importer adds his profit margin $M_j^{WT}$ (the letters stand for ‘margin
with tax’) and then value-added tax is applied to the total at a rate $v$. The relationship
between the import price of a vehicle and the final price with all taxes applied is therefore
given by:

$$P_j^{WT} = \left[(1+\tau_j)P_j^I + T_j + M_j^{WT}\right](1+v) \tag{1}$$

If the importer sells a tax-free car, he is refunded all the taxes paid to the government.
Consequently, the consumer faces a final price independent of any taxes. In this case the
expression linking the import price and final price is simply

$$P_j^{NT} = P_j^I + M_j^{NT} \tag{2}$$

The markup $M_j^{NT}$ imposed on a tax-free sale may differ from the markup $M_j^{WT}$ imposed
on a tax-inclusive sale. Equations (1) and (2) have three unknowns: the two markups
and the import price. In order to proceed with the calculation of markups, we need an
assumption about the relationship between $M_j^{WT}$ and $M_j^{NT}$. Despite the need for an
assumption, equations (1) and (2) do give us some leverage, which comes from the fact
that the tax $\tau_j$ is levied on the import price, which is marginal cost, rather than on the
seller’s price, as is the case with VAT or sales taxes.
In order to illustrate exactly what it is we leverage, consider a seller selling a good
whose marginal cost is $C$. The government can observe $C$. It wants to impose a tax $\tau$
which can be levied either on the marginal cost or on the seller’s final price. If the tax
is levied on the seller’s price, the relationship between cost and final price is given by
$P = (C + M)(1+\tau)$. If the tax is levied on marginal cost, the corresponding relationship
is $P = C(1+\tau) + M$ ($M$ may be different in the two cases). Consider what happens
in either case when the tax rate is changed from $\tau^0$ to $\tau^1$. In the case where the tax is
levied on marginal cost, the difference between the new price and old price is
$P^1 - P^0 = (\tau^1 - \tau^0)C + M^1 - M^0$. Thus we can identify marginal cost if we observe $P^1$ and $P^0$ and
make an assumption on the change in markups. If the tax is levied on seller price, then
the change in price is $P^1 - P^0 = (\tau^1 - \tau^0)C + M^1(1+\tau^1) - M^0(1+\tau^0)$. An assumption on
the change in markups is not sufficient to identify marginal cost. For example, assuming
$M^1 = M^0$ would leave us with $P^1 - P^0 = (\tau^1 - \tau^0)(C + M)$, which still has two unknowns.
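The identification argument above can be checked numerically. The following is a minimal sketch with made-up numbers (the cost, markup and tax rates are hypothetical, not taken from the data): when the tax is levied on marginal cost and the markup is held fixed, dividing the price change by the tax change returns $C$; when the tax is levied on the seller's price, the same calculation returns only the sum $C + M$.

```python
def price_tax_on_cost(c, m, tau):
    """Final price when the tax is levied on marginal cost: P = C(1 + tau) + M."""
    return c * (1 + tau) + m


def price_tax_on_seller_price(c, m, tau):
    """Final price when the tax is levied on the seller's price: P = (C + M)(1 + tau)."""
    return (c + m) * (1 + tau)


# Hypothetical values chosen only for illustration.
C, M = 10.0, 2.0          # marginal cost and markup (held fixed, i.e. M1 = M0)
TAU0, TAU1 = 0.5, 0.8     # tax rate before and after the change

# Tax on marginal cost: the price change divided by the tax change recovers C.
dp_cost = price_tax_on_cost(C, M, TAU1) - price_tax_on_cost(C, M, TAU0)
recovered_c = dp_cost / (TAU1 - TAU0)

# Tax on seller's price: the same ratio recovers only C + M, not C itself.
dp_price = price_tax_on_seller_price(C, M, TAU1) - price_tax_on_seller_price(C, M, TAU0)
recovered_sum = dp_price / (TAU1 - TAU0)

print(f"tax on cost          -> recovered C     = {recovered_c:.4f}")
print(f"tax on seller price  -> recovered C + M = {recovered_sum:.4f}")
```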
As in the example above, the information we have allows us to identify markups if we
are willing to make an assumption on the relationship between markups for taxed versus
tax-free cars. A reasonable first approximation is to assume that the importer charges
equal markups: $M_j^{WT} = M_j^{NT} \equiv M_j^{EQ}$.¹ From equations (1) and (2) we obtain:

$$M_j^{EQ} = \frac{1}{\tau_j}\left[(1+\tau_j)P_j^{NT} - \frac{P_j^{WT}}{1+v} + T_j\right] \tag{3}$$

Equation (3) is the expression we use for the calculation of markups under the equal
markups assumption. Note that, for the reasons explained above, the markup is not
defined in this case if $\tau_j = 0$.
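As an illustration of equation (3), the sketch below applies it to the 1989 Toyota Corolla row of Table 1. It assumes that the table's ad valorem rate maps directly onto $\tau_j$ (124.5% becomes 1.245) and uses nominal prices, whereas the comparisons later in the paper use deflated values, so the number is indicative only. The function name and argument names are ours.

```python
def equal_markup(p_wt, p_nt, tau, unit_tax, vat):
    """Markup under the equal-markups assumption, equation (3).

    p_wt: price with tax, p_nt: price without tax,
    tau: ad valorem rate as a fraction, unit_tax: per-unit tax, vat: VAT rate.
    """
    if tau == 0:
        raise ValueError("equation (3) is not defined when tau = 0")
    return ((1 + tau) * p_nt - p_wt / (1 + vat) + unit_tax) / tau


# Toyota Corolla, 1989 (Table 1): nominal prices in Cyprus pounds, no VAT yet.
P_WT, P_NT, TAU, UNIT_TAX, VAT = 7150.0, 3425.0, 1.245, 75.0, 0.0

m_eq = equal_markup(P_WT, P_NT, TAU, UNIT_TAX, VAT)
import_price = P_NT - m_eq  # from equation (2) with M^NT = M^EQ

# Consistency check: plugging both values back into equation (1) returns P_WT.
implied_p_wt = ((1 + TAU) * import_price + UNIT_TAX + m_eq) * (1 + VAT)

print(f"equal markup: {m_eq:.2f} CP, implied import price: {import_price:.2f} CP")
```

Under these assumptions the markup comes out at roughly 493 CP on a tax-free price of 3425 CP.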
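Another way to see this is to consider the difference between the pre-VAT price with
tax and the price without tax:

$$\frac{P_j^{WT}}{1+v} - P_j^{NT} = \tau_j P_j^I + T_j + \left(M_j^{WT} - M_j^{NT}\right) \tag{4}$$

Solving for the import price gives:

$$P_j^I = \frac{1}{\tau_j}\left[\frac{P_j^{WT}}{1+v} - P_j^{NT} - T_j - \left(M_j^{WT} - M_j^{NT}\right)\right] \tag{5}$$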
A second approximation is to assume that the importer charges equal percentage
markups: $\frac{M_j^{WT}}{(1+\tau_j)P_j^I + T_j} = \frac{M_j^{NT}}{P_j^I}$. Using this assumption and equations (1) and (2) we obtain:

$$\frac{(1+\tau_j)P_j^I + T_j}{P_j^I} = \frac{P_j^{WT}}{(1+v)P_j^{NT}} \tag{6}$$

This leads to the following expression for the markups on taxed cars:

$$M_j^{WT} = \frac{P_j^{WT}\left[\frac{P_j^{WT}}{1+v} - T_j\right] - (1+\tau_j)P_j^{WT}P_j^{NT}}{P_j^{WT} - (1+\tau_j)(1+v)P_j^{NT}} \tag{7}$$

Equation (7) is the expression we use for the calculation of markups under the equal
percentage markup assumption. Note that if $P_j^{WT} - (1+\tau_j)(1+v)P_j^{NT} = 0$, we cannot
solve for markups. This can only happen if $T_j = 0$ (see equation (6)).
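Equations (6) and (7) can be sketched in the same way. The example again uses the 1989 Toyota Corolla row of Table 1 with nominal prices and the table's ad valorem rate read as $\tau_j$ (both are our assumptions for illustration). For this row the implied import price is negative, which is in line with the paper's later remark that import prices are negative for most of the sample under this assumption.

```python
def equal_pct_markup(p_wt, p_nt, tau, unit_tax, vat):
    """Import price and markups under the equal-percentage-markup assumption.

    Returns (import_price, m_wt, m_nt), using equation (6) for the import
    price, equation (7) for the taxed-car markup and equation (2) for the
    tax-free markup.
    """
    denom = p_wt - (1 + tau) * (1 + vat) * p_nt
    if denom == 0:
        raise ValueError("cannot solve for markups: denominator of (7) is zero")
    import_price = unit_tax * (1 + vat) * p_nt / denom           # from eq. (6)
    m_wt = (p_wt * (p_wt / (1 + vat) - unit_tax)
            - (1 + tau) * p_wt * p_nt) / denom                   # eq. (7)
    m_nt = p_nt - import_price                                   # eq. (2)
    return import_price, m_wt, m_nt


# Toyota Corolla, 1989 (Table 1): nominal prices in Cyprus pounds, no VAT yet.
P_WT, P_NT, TAU, UNIT_TAX, VAT = 7150.0, 3425.0, 1.245, 75.0, 0.0

import_price, m_wt, m_nt = equal_pct_markup(P_WT, P_NT, TAU, UNIT_TAX, VAT)
taxed_cost = (1 + TAU) * import_price + UNIT_TAX

print(f"import price: {import_price:.2f} CP")
print(f"markup on taxed car: {m_wt:.2f} CP, on tax-free car: {m_nt:.2f} CP")
print(f"percentage markups: {m_wt / taxed_cost:.4f} vs {m_nt / import_price:.4f}")
```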
¹ We discuss possible justifications for this assumption later on in this section.
Another way to see this is to consider the difference between the pre-VAT price with
tax and price without tax given by equation (4). It is easy to solve for the markup
difference using the equal percentage markup assumption. The term $M_j^{WT} - M_j^{NT}$ is
equal to $\left[\tau_j + \frac{T_j}{P_j^I}\right]M_j^{NT}$. Consequently equation (4) becomes:

$$\frac{P_j^{WT}}{1+v} - P_j^{NT} = \tau_j P_j^I + T_j + \left[\tau_j + \frac{T_j}{P_j^I}\right]M_j^{NT} \tag{8}$$

If only a tax $\tau$ is imposed on the import price, the difference between the pre-VAT price
with tax and price without tax is equal to $\tau(P^I + M^{NT}) = \tau P^{NT}$. However, if a unit tax $T$
is levied instead, this difference becomes $T\left(1 + \frac{M^{NT}}{P^I}\right) = T\frac{P^{NT}}{P^I}$. From that we
can infer $P^I$ and therefore $M^{WT}$.
Discussion of the assumptions
Poterba (1996) provides empirical evidence on price responses to changes in state and
local sales taxes (unit taxes) by exploiting variation in tax policy across cities and across
time. Using postwar (1947-1977) and prewar (1925-1939) price data on clothing and
personal care items (homogeneous products), he uses a conjectural variation model to
relax the monopoly assumption in order to test the hypothesis that taxes are fully shifted
to consumers. For the estimation, he uses the seemingly unrelated regressions (SUR)
technique to allow for correlations of the error terms for different cities in a given period.
He finds that consumer prices adjust one-for-one with tax changes in the postwar period,
with undershifting in the prewar period for clothing only. He concludes that his paper
“presents evidence that broadly supports the view that retail sales taxes are fully forward
shifted, raising consumer prices by the amount of the tax increase” (Poterba, 1996).
Besley and Rosen (1999) assembled a panel of quarterly data for 12 commodities and
155 cities over the period 1982-90 and employed an approach similar to that of Poterba (1996) to
test the same hypothesis. They find that for the majority
of commodities, taxes are overshifted to consumers (a tax increase of one dollar per unit
increases the price by more than one dollar). Taxes on the other commodities
are found to be fully shifted to consumers (the after-tax price increases by exactly the
amount of the tax).
Recall that the price of a tax-free automobile model $j$ reflects its import price and the
importer’s markup. When taxes are introduced, its retail price will be equal to the old
price plus the amount of taxes plus the change in the importer’s markup due to the
introduction of taxes. If taxes are fully shifted to consumers, then the importer’s markup
remains unchanged. Poterba (1996) and Besley and Rosen (1999) provide empirical evidence
that taxes can be fully shifted to consumers, which supports our first
assumption. The result of Besley and Rosen (1999) that taxes are overshifted to consumers
supports our second assumption, because under the equal percentage markup
assumption the level markups are higher for taxed automobiles ($M_j^{WT} > M_j^{NT}$).
Of course, both papers examine the impact of unit taxes on prices of non-durable goods,
but at least they provide evidence that our assumptions generally hold in a retail pricing
framework.
Furthermore, we talked to two retailers about their pricing policies regarding tax-free
vehicle sales. The first one, the VW-Audi retailer, informed us that they set higher
markups for tax-free automobiles. On the other hand, the Mercedes retailer replied that
higher markups for untaxed products cannot be set, as many customers may be informed
by the relevant government office about the exact amount of tax that is waived. Certainly,
if a customer notices unreasonably high markups, he will prefer to buy a product
of similar quality from another retailer. The second reply clearly supports our
first assumption. As very few consumers have the right to buy a tax-free car, a retailer
who wants to increase his sales can do so more easily by decreasing his markups on taxed
products. Besides, the government’s policy of revealing the waived tax may deter retailers from setting
higher markups for tax-free products. If all retailers took the second
view, they would set equal markups for taxed and untaxed products. However, there is
at least one retailer who does not agree with this story.
4 Estimation of markups using DCMs
This section describes how estimates of markups can be obtained using the simple logit
(SL), nested logit (NL) and random coefficients (RC) models. Actual estimates are reported along with calculated markups in section 5.
4.1 Markups from DCMs
Consider a market with $J$ differentiated products. Let $P_j$ denote the price of product $j$.
Similarly, $x_j$ denotes a $K$-dimensional vector of observed product characteristics of $j$ and
$\xi_j$ denotes an unobserved product characteristic of $j$. In every period, individual $i$ makes
a choice among the $J$ products available and choice 0, the option of no purchase.
In the simple logit model the utility consumer $i$ obtains from buying brand $j$ (time
subscripts are omitted for brevity) is given by the following equation:

$$u_{ij} = x_j\beta - \alpha P_j + \xi_j + \varepsilon_{ij}. \tag{9}$$

The term $\varepsilon_{ij}$ is an idiosyncratic shock with mean zero.
In the nested logit model the corresponding utility is:

$$u_{ij} = x_j\beta - \alpha P_j + \xi_j + \zeta_{ig}(\sigma) + (1-\sigma)\varepsilon_{ij}. \tag{10}$$

The term $\zeta_{ig}(\sigma)$ is a group-specific random coefficient that allows goods that belong to
the same group $g$ to contribute a common component of utility to the individual $i$. The
parameter $\sigma$ measures the extent to which products within the same group are substitutes
for each other.
In the random coefficients model utility is given by:

$$u_{ij} = x_j\beta_i - \alpha_i P_j + \xi_j + \varepsilon_{ij}. \tag{11}$$

The term $\alpha_i$ can be modeled as $\alpha + \tilde{\alpha}H_i$ and the term $\beta_i$ as $\beta + \tilde{\beta}H_i$,
where $\tilde{\alpha}$ and $\tilde{\beta}$ are the parameters governing variation across consumers and $H_i$ are consumer
characteristics, which can be demographics, random draws or a combination
of demographics and random draws.
Berry (1994) shows that when firms are assumed to set prices to maximize
their profits, and assuming the existence of a pure-strategy interior equilibrium, prices satisfy

$$\frac{P_j}{1+v} = C_j - \frac{S_j}{(1+v)(dS_j/dP_j)}, \tag{12}$$

where $v$ is the VAT rate. In our case $C_j = (1+\tau_j)P_j^I + T_j$. We define the markup as

$$M_j \equiv \frac{P_j}{1+v} - C_j = -\frac{S_j}{(1+v)(dS_j/dP_j)}. \tag{13}$$

For each of the three models this is:

Simple logit markup:

$$M_j^{SL} = \frac{1}{\alpha(1+v)(1-S_j)} \tag{14}$$

Nested logit markup:

$$M_j^{NL} = \frac{1-\sigma}{\alpha(1+v)\left[1 - \sigma S_{j|g} - (1-\sigma)S_j\right]} \tag{15}$$

Random coefficients markup:

$$M_j^{RC} = \frac{S_j}{(1+v)\int \alpha_i S_{ij}(1-S_{ij})\,dP_H^*(H)} \tag{16}$$

where $P_H^*(H)$ denotes the population distribution function of consumer characteristics and $S_{ij}$ is the probability of consumer $i$ purchasing product $j$.
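The three markup formulas (14)-(16) translate directly into code. The sketch below is illustrative only: the function names are ours, the integral in (16) is replaced by an average over simulated consumers (an assumption about how one might approximate it, not a description of the estimation routine used in the paper), and the inputs in the usage lines are made-up values apart from the simple logit price coefficient of 0.246 reported in Table 3.

```python
import numpy as np


def simple_logit_markup(alpha, share, vat):
    """Equation (14): 1 / (alpha (1 + v) (1 - S_j))."""
    return 1.0 / (alpha * (1 + vat) * (1 - share))


def nested_logit_markup(alpha, sigma, share, within_share, vat):
    """Equation (15): (1 - sigma) / (alpha (1 + v) [1 - sigma S_j|g - (1 - sigma) S_j])."""
    bracket = 1 - sigma * within_share - (1 - sigma) * share
    return (1 - sigma) / (alpha * (1 + vat) * bracket)


def random_coeff_markup(alpha_i, s_ij, vat):
    """Equation (16), with the integral approximated by a mean over consumers.

    alpha_i: price coefficients of the simulated consumers, shape (n,)
    s_ij:    each consumer's probability of choosing product j, shape (n,)
    """
    alpha_i = np.asarray(alpha_i, dtype=float)
    s_ij = np.asarray(s_ij, dtype=float)
    share = s_ij.mean()                                  # S_j
    slope = np.mean(alpha_i * s_ij * (1 - s_ij))         # approximates the integral
    return share / ((1 + vat) * slope)


# Hypothetical product: market share 1%, within-group share 5%, VAT 8%.
ALPHA, SHARE, WITHIN, VAT = 0.246, 0.01, 0.05, 0.08

m_sl = simple_logit_markup(ALPHA, SHARE, VAT)
m_nl_sigma0 = nested_logit_markup(ALPHA, 0.0, SHARE, WITHIN, VAT)
m_rc_homog = random_coeff_markup(np.full(4, ALPHA), np.full(4, SHARE), VAT)

print(f"simple logit markup: {m_sl:.3f} (in the units of 1/alpha)")
```

Two limiting cases serve as a sanity check: with $\sigma = 0$ the nested logit markup collapses to the simple logit markup, and so does the random coefficients markup when all consumers share the same $\alpha_i$ and choice probability.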
4.2 Estimation details
The demand equation for the logit model that links market shares to prices and car
characteristics is given below:

$$\ln(s_{jt}) - \ln(s_{0t}) = x_{jt}\beta - \alpha P_{jt} + \xi_{jt} \tag{17}$$

To estimate the demand function above, we must control for any correlation between
prices and the error term. The error term represents product characteristics that are
observed by consumers but not by the econometrician. This correlation is likely to be
positive because higher quality could lead suppliers to set higher prices. To control for
the endogeneity of price, we need to find variables that are correlated with price but are
independent of unobserved product characteristics. The instruments proposed by Berry,
Levinsohn, and Pakes (1995) and taxes can be used as candidate instruments. Among
these, we chose to use the sum of engine capacity of other products sold by the same firm
squared as an instrument for prices. The unit tax, unit tax squared and constant tax also
proved to be good instruments for prices. The choice of instruments was guided by the
appropriate tests for instrument relevance and overidentification.
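The estimation step described above can be sketched as follows: build the dependent variable of equation (17) from market shares, then estimate the linear equation by two-stage least squares. Everything here is an assumption made for illustration: the data are synthetic, the instrument set (a unit tax and its square) is only loosely modeled on the one described in the text, and the unobserved quality term is constructed to be exactly orthogonal to the instruments so that the result is deterministic. It is not the authors' estimation code.

```python
import numpy as np


def logit_dependent_variable(shares):
    """ln(s_j) - ln(s_0) for one market, where s_0 = 1 - sum(s_j) is the outside share."""
    shares = np.asarray(shares, dtype=float)
    outside = 1.0 - shares.sum()
    return np.log(shares) - np.log(outside)


def two_stage_least_squares(y, X, Z):
    """2SLS: project the regressors X on the instruments Z, then regress y on the fit."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]     # first stage
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]      # second stage


# --- Synthetic illustration (all numbers hypothetical) ---
rng = np.random.default_rng(0)
n = 200
engine = rng.normal(1.6, 0.4, size=n)        # exogenous characteristic
unit_tax = rng.normal(0.4, 0.5, size=n)      # excluded instrument
u = rng.normal(size=n)                       # raw unobserved quality shock

Z = np.column_stack([np.ones(n), engine, unit_tax, unit_tax ** 2])
# Unobserved quality, made orthogonal to the instruments by construction.
xi = u - Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
# Price rises with unobserved quality, so it is endogenous.
price = 5.0 + 4.0 * engine + 3.0 * unit_tax + 2.0 * u

true_beta = np.array([-7.0, 2.7, -0.25])     # constant, engine capacity, price
X = np.column_stack([np.ones(n), engine, price])
delta = X @ true_beta + xi                   # plays the role of ln(s_j) - ln(s_0)

beta_2sls = two_stage_least_squares(delta, X, Z)
beta_ols = np.linalg.lstsq(X, delta, rcond=None)[0]

print("2SLS:", np.round(beta_2sls, 3))
print("OLS: ", np.round(beta_ols, 3))
```

In this constructed example OLS pushes the price coefficient upward (toward zero or beyond), which is the direction of bias the text describes, while 2SLS recovers the coefficients used to generate the data.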
The demand equation for the nested logit model links market shares to prices, car
characteristics and within-group share in the following way:

$$\ln(s_{jt}) - \ln(s_{0t}) = x_{jt}\beta - \alpha P_{jt} + \sigma\ln(s_{j|g}) + \xi_{jt} \tag{18}$$

For this model, the per-unit tax and its square are used as instruments for prices. The log
of the within-group share is also endogenous in this model, so we use as an instrument the sum of
engine capacity of all the other products in the group. An important choice for the nested
logit model is the categorization of products into groups. Common practice in models of
automobile demand is to split the models on the basis of engine size. This nesting worked
for us also. We created three size categories (small, midsize, large) and a fourth group for
sport utility vehicles.²
The random coefficients model is estimated using Nevo’s algorithm. The set of instruments
for prices we use in this model is the same as the set of instruments we use
for the simple logit model. The appropriate tests show that all the instruments we
use for the three DCMs are correlated with prices (and within-group shares in the case
of the nested logit), and the overidentification test shows that they are uncorrelated with the
error term.
In the random coefficients model we had to choose which interactions of product and
consumer characteristics to include. We chose to include an interaction of prices with
income (a demographic variable) and an interaction of engine capacity with draws from a
multivariate normal distribution. We also tried another combination: an interaction of
prices and engine capacity with random draws, without using demographics as consumer
characteristics. Income was demeaned across market-years and across consumers. As
Nevo (2000) points out, if consumer characteristics changed during the computation, the
non-linear search would be unlikely to converge. So we drew these characteristics once at the
beginning.

² Clerides (2008) uses a different sigma for SUVs and non-SUVs. We did the same and found a sigma
of 0.73 for non-SUVs and a sigma of 0.42 for SUVs. In this case, the null hypothesis that the model
is under-identified is rejected, and the Hansen J statistic, which is a test of the null hypothesis that the
instruments are valid, shows a p-value of 0.98, so the null cannot be rejected. However, the sigma for SUVs
is statistically insignificant as we have only 23 observations of SUVs. That is why we constrained sigma
to be the same for all the groups.
4.3 Results
Table 3 reports demand parameter estimates for the three discrete-choice models. The
simple logit model yields an alpha of 0.246, and the coefficient on engine
capacity is positive. Both are statistically significant, as expected. The country dummies and
the constant are also statistically significant. French, German and Swedish products seem
to offer better quality than the omitted Japanese cars, whereas Czech, English,
Italian, Korean, Russian and Spanish products tend to decrease consumers’ mean
utility. The grouping in the nested logit is relevant, with a sigma greater than zero (about
0.53). The signs of the remaining coefficients are the same as in the simple logit model, but their
absolute values are smaller, as expected given the presence of within-group shares in this
model. All the coefficients are statistically significant except for the coefficient on Italian
products. For both random coefficients models all the coefficients, except for the interaction
of engine capacity with random draws, are statistically significant. The coefficients on
the country dummies and the constant are very close to those of the simple logit. The same
holds for the coefficient on engine capacity, since the coefficient on its interaction with
random draws is very small relative to the coefficient on engine capacity. The random
coefficients model (1) predicts that above-average-income consumers tend to be less
price-sensitive. The richest consumer has an alpha of 0.1195 and the poorest consumer an alpha of
1.2191. The engine capacity coefficient ranges from 3.109 to 3.141. The random coefficients
model (2) yields alphas ranging from 0.1046 to 0.5346. The standard
deviation for engine capacity (the absolute value of the coefficient) is 0.131 (statistically
insignificant), which implies that the coefficient ranges from 2.619 to 3.450. The implied mean
own-price elasticity is -3.89 for the simple logit model, -5.25 for the nested logit model, -4.28
for random coefficients model (1) and -4.15 for random coefficients model (2). These
elasticities are not far from what is usually found in the literature. The
overidentification test has the null hypothesis that the instruments are valid, which cannot
be rejected.
Table 3: Demand estimates from DCMs
(standard errors in parentheses)

Variables                        SL                  NL                  RC-1                RC-2
Price                            -0.246∗∗ (0.020)    -0.166∗∗ (0.027)    -0.380∗∗ (0.121)    -0.318∗∗ (0.051)
Within-share                                         0.527∗∗ (0.162)
Czech Rep.                       -3.956∗∗ (0.350)    -1.832∗∗ (0.686)    -3.995∗∗ (0.361)    -4.030∗∗ (0.364)
England                          -1.224∗∗ (0.233)    -0.539∗ (0.242)     -1.228∗∗ (0.238)    -1.230∗∗ (0.238)
France                           0.532∗∗ (0.197)     0.265∗ (0.130)      0.587∗∗ (0.207)     0.565∗∗ (0.206)
Germany                          1.893∗∗ (0.191)     1.301∗∗ (0.215)     2.009∗∗ (0.198)     1.990∗∗ (0.188)
Italy                            -0.663∗∗ (0.220)    -0.230 (0.171)      -0.672∗∗ (0.213)    -0.672∗∗ (0.214)
Korea                            -2.087∗∗ (0.270)    -0.944∗ (0.378)     -2.182∗∗ (0.274)    -2.160∗∗ (0.277)
Russia                           -2.504∗∗ (0.488)    -1.322∗∗ (0.488)    -2.776∗∗ (0.553)    -2.688∗∗ (0.505)
Spain                            -1.321∗∗ (0.350)    -0.619∗ (0.281)     -1.358∗∗ (0.339)    -1.360∗∗ (0.353)
Sweden                           1.788∗∗ (0.401)     1.398∗∗ (0.260)     1.837∗∗ (0.571)     1.807∗∗ (0.423)
Engine capacity (liters)         2.768∗∗ (0.256)     1.199∗ (0.482)      3.125∗∗ (0.375)     3.013∗∗ (0.845)
Constant                         -7.254∗∗ (0.346)    -4.867∗∗ (0.723)    -6.613∗∗ (1.141)    -6.898∗∗ (0.947)
Price*Income (non-parametric)                                            0.123† (0.066)
Price*V (parametric)                                                                         0.056† (0.032)
Engine capacity*V (parametric)                                           0.004 (8.602)       -0.131 (1.760)
Sargan-Hansen test               3.23 chisq(3)       1.42 chisq(1)       2.01 chisq(3)       2.09 chisq(3)
                                 p-value: 0.36       p-value: 0.23       p-value: 0.57       p-value: 0.55
5 Markup comparison
In order to compare the markups, we first deflated the markups obtained from the accounting
model with the same deflator we used to deflate prices for the DCM markup estimation.
Second, as we have only 481 retail prices for tax-free cars, we compare the markups
only for those observations.
Table 4: Markup statistics

Stats      SL      NL      RC-1    RC-2    Equal      Equal
                                           markups    % markups
Min        3.599   2.518   2.732   2.940   -2.032     -2,081
5%         3.603   2.587   2.852   3.069   0.707      6.754
25%        3.766   2.642   3.019   3.197   1.421      9.408
50%        3.767   2.723   3.240   3.368   2.269      13.827
75%        3.877   2.87    3.574   3.600   3.201      24.312
95%        4.077   3.315   4.223   4.192   4.977      52.188
Max        4.118   6.052   6.383   7.045   8.895      490.3
Mean       3.816   2.830   3.344   3.462   2.453      15.72
Std dev.   0.139   0.427   0.441   0.380   1.485      108.5
Table 4 provides basic statistics on the markups that we estimate and
calculate. The markups estimated by the simple logit model have the lowest standard deviation,
as expected. The minimum value is 3.599 and the maximum value is 4.118 (recall that
the units are thousands of Cyprus pounds). The low dispersion is due to the fact that only
the shares differentiate the markups among different products. For the DCMs that allow
for heterogeneity among consumers, nested logit markups
can take values lower than 2.732, which is the lower bound for random coefficients
markups, whereas random coefficients markups can take values higher than
6.052, which is the upper bound for nested logit markups. Calculated markups under the
equal percentage markups assumption are extremely high, and this happens because under
this assumption the import prices are negative for 96.3% of our sample. One can see this
by solving for import prices in equation (6). The calculated markups under the equal
markups assumption include some negative markups (only for 5 observations) that the
markups estimated from discrete choice models cannot produce. They also include
some relatively high markups compared to the DCMs, from 7.045 to 8.895. Additionally, they
are on average lower than the markups obtained from DCMs. In Table 5 we provide
the average markups per brand. The markups obtained from DCMs are much higher for
Alfa Romeo, Fiat, Hyundai, Kia, Lada, Rover, Suzuki and Toyota. The RC-1 model is used
for all the graphs presented below.
Table 5: Markups by brand

Manufacturer   # of obs   NL      RC-1    Equal
                                          markups
Alfa Romeo     13         2.722   3.201   1.533
Audi           12         2.667   3.609   2.294
BMW            34         2.751   4.001   3.194
Chrysler       4          2.597   3.387   3.780
Citroen        17         2.671   3.158   2.803
Fiat           16         2.719   2.955   1.304
Ford           26         2.730   3.119   2.315
Honda          42         2.875   3.588   2.686
Hyundai        18         2.672   3.064   1.503
Kia            7          2.635   2.849   1.303
Lada           7          2.809   2.930   0.795
Mazda          15         2.873   3.144   2.339
Mercedes       14         2.771   4.011   3.179
Mitsubishi     31         3.564   3.378   2.666
Nissan         25         2.884   3.201   2.946
Opel           36         2.792   3.216   2.309
Peugeot        42         2.883   3.230   2.781
Renault        4          2.726   3.086   2.910
Rover          6          2.763   4.175   1.523
Saab           7          2.697   4.144   2.094
Seat           19         2.698   3.150   1.983
Skoda          4          2.553   2.981   2.215
Subaru         13         2.627   3.625   5.847
Suzuki         17         2.777   2.905   1.120
Toyota         29         2.846   3.256   1.603
Volkswagen     22         2.846   3.311   2.821
Volvo          1          2.642   3.921   4.137
Figure 1 reports the histogram of the markups scaled to density units. The bin
width is 0.341, so the simple logit markups are concentrated in only 2 bins. For the nested
logit markups, the first bin covers 75% of the observations, whereas for the random
coefficients markups it covers 32% of the observations. For calculated markups, there are more
bins that represent a relatively high percentage of observations.

[Figure 1: Distribution of markups. Panels: Nested Logit, Random Coefficients, Simple Logit, Calculated.]
Figure 2 graphs the distribution of calculated markups, nested logit markups and
random coefficients markups by engine size. The relation between engine capacity and
markups appears to be positive, but this would be easier to see if we took into account the
other characteristics that capture the quality of the products. This relationship
seems to be stronger for markups obtained from the random coefficients model and
the accounting model. We also observe that the markups obtained from the accounting
model are lower than the markups obtained from DCMs for tiny cars (engine capacity below
1 liter). In addition, for SUVs (all the observations with engine size
between 2.499 and 2.835 liters) the markups obtained from the accounting model and the nested logit seem
to be higher than the ones obtained from random coefficients. This may happen
because we have a class for SUVs in the nested logit model.

[Figure 2: Markups and Engine Capacity. Panels plot markups against engine capacity in liters; series shown: calculated, random_coefficients, nested_logit.]
Next we proceed with an analysis based on the ratio of markups to marginal cost. These ratios are obtained using the Lerner index, so they are equal to $M_j^*/C_j = (P_j - C_j)/C_j$, where $M_j^*$ are the markups of product $j$ obtained by DCM $*$. It is easy to see through equation (13) that $C_j$ is equal to $P_j/(1+v) - M_j$. Note that the term $P_j/(1+v)$ is the price set by firms and that the markups already incorporate the term $(1+v)$. For the markups obtained by our accounting model, the same calculation can be used, as the marginal cost is $(1+\tau_j)P_j^I + T_j$ (see equation (1)).
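The calculation just described can be sketched in a few lines. This is a minimal illustration, assuming that `v` is the rate appearing in the term (1+v); the function names and the numbers are hypothetical and are not taken from the paper's data.

```python
def marginal_cost(price, markup, v):
    """C_j = P_j / (1 + v) - M_j: price set by the firm minus the level markup."""
    return price / (1.0 + v) - markup


def markup_to_cost_ratio(price, markup, v):
    """M_j / C_j, the ratio reported in the markups to marginal cost table."""
    return markup / marginal_cost(price, markup, v)


# Hypothetical values in thousands of CP (illustrative assumption only).
price, markup, v = 11.5, 2.0, 0.15
cost = marginal_cost(price, markup, v)
ratio = markup_to_cost_ratio(price, markup, v)
print(round(cost, 3), round(ratio, 3))
```

A product with a final price of 11.5, a level markup of 2 and a rate of 15% would thus have an implied marginal cost of about 8 and a ratio of about 0.25.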
The markups to marginal cost ratios are provided in table 6. The simple logit model predicts a ratio with a high standard deviation: the level markups obtained by this model vary only from 3.599 to 4.118, so the ratio moves almost entirely with marginal cost. The nested logit predicts a lower ratio than the random coefficients models because the absolute values of the own-price elasticities are higher for this model. The ratio predicted by the accounting model is much lower than the DCM ratios because the level markups it yields are lower than the markups obtained by DCMs.
Table 6: Markups to marginal cost ratio
Stats             SL       NL       RC-1     RC-2     Equal Markups
Minimum           0,069    0,065    0,112    0,125    -0,064
Percentile 5%     0,144    0,099    0,158    0,155    0,041
Percentile 25%    0,236    0,168    0,222    0,222    0,128
Percentile 50%    0,417    0,285    0,344    0,348    0,207
Percentile 75%    0,706    0,444    0,483    0,536    0,301
Percentile 95%    1,214    0,664    0,713    0,818    0,575
Maximum           2,202    0,956    1,001    1,264    1,185
Mean              0,523    0,32     0,373    0,405    0,239
Std. Dev.         0,366    0,183    0,182    0,222    0,165
Figure 3 graphs the distribution of the markups to marginal cost ratio by engine size for the calculated markups, the nested logit markups and the random coefficients markups. For DCMs, the relation between engine capacity and the markups ratio is negative. The nested logit seems to predict a lower markups ratio for tiny cars and a higher markups ratio for SUVs than the random coefficients model. The random coefficients markups ratio seems to have a stronger negative relationship with engine capacity than the nested logit ratio. If we ignore tiny cars and SUVs, the accounting model also predicts a strong negative relationship between the markups ratio and engine capacity. The accounting model markups ratio seems to be very low for tiny cars compared to the DCM percentage markups. Finally, both the accounting model and the nested logit predict a higher markups ratio for SUVs. These results resemble the results we get from figure 2.
Next, we form a test to get a clear picture of the markups comparisons.
[Figure 3: Markups to Marginal Cost Ratio and Engine Capacity. Panels: Random Coefficients, Nested Logit, Accounting model.]
Our next task is to test whether the impact of engine capacity on markups is the same for each pair of approaches. To identify the relationship between markups and engine capacity, we assume that it can be obtained from a hedonic regression of markups on product characteristics, similar to the hedonic regression of price on product characteristics. Note that we use level markups and not percentage markups. Finally, the analysis above suggests that we should also include dummies for tiny cars and SUVs.
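Define
$$mu_j = \sum_{p=1}^{3} \left(\phi_p x_{pj} + \lambda_p D_{nl} x_{pj} + \rho_p D_{rc} x_{pj}\right) + \sum_{i=4}^{k} \left(\phi_i x_{ij} + \lambda_i D_{nl} x_{ij} + \rho_i D_{rc} x_{ij}\right). \qquad (19)$$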
where $mu_j$ is a $3N \times 1$ vector which stacks the calculated markups, the markups estimated by nested logit and the markups estimated by random coefficients. $x_{1j}$ is a $3N \times 1$ vector of engine capacity in liters, $x_{2j}$ is a $3N \times 1$ vector which contains the SUV dummy, $x_{3j}$ is a $3N \times 1$ vector which contains the tiny cars dummy, and the other $k-3$ vectors of product characteristics include country dummies, year dummies and a constant. $D_{nl}$ is a dummy which takes the value one if the markups are estimated by nested logit and $D_{rc}$ is a dummy which takes the value one if the markups are estimated by random coefficients. $\phi_1$ is the impact of engine capacity on calculated markups, $\phi_1 + \lambda_1$ is the impact of engine capacity on markups estimated by nested logit and $\phi_1 + \rho_1$ is the impact of engine capacity on markups estimated by random coefficients. $\phi_2$ is the impact of SUVs on calculated markups and $\phi_3$ is the impact of tiny automobiles on calculated markups. Therefore we can test whether $\lambda_p = 0$, $\rho_p = 0$ and $\lambda_p = \rho_p$.
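The stacking behind this regression can be sketched as follows. This is a minimal illustration under strong assumptions: the data are synthetic and noise-free, only engine capacity and a constant are used as regressors, and the helper name `stack_with_dummies` is hypothetical. It shows only how the interaction terms recover the differences in slopes across the three sets of markups, not the F-tests themselves.

```python
import numpy as np


def stack_with_dummies(x):
    """Build the 3N x 3k design [x, D_nl * x, D_rc * x] for the stacked markups."""
    zeros = np.zeros_like(x)
    calc = np.hstack([x, zeros, zeros])  # calculated markups: D_nl = D_rc = 0
    nl = np.hstack([x, x, zeros])        # nested logit rows: D_nl = 1
    rc = np.hstack([x, zeros, x])        # random coefficients rows: D_rc = 1
    return np.vstack([calc, nl, rc])


rng = np.random.default_rng(0)
n = 50
engine = rng.uniform(1.0, 3.0, size=n)             # synthetic engine capacity
x = np.column_stack([engine, np.ones(n)])          # engine capacity and a constant

# Synthetic markups with known slopes and intercepts (illustrative values).
mu_calc = x @ np.array([1.3, 0.5])
mu_nl = x @ np.array([0.1, 2.0])
mu_rc = x @ np.array([1.3, 1.0])
mu = np.concatenate([mu_calc, mu_nl, mu_rc])       # the 3N x 1 stacked vector

coef, *_ = np.linalg.lstsq(stack_with_dummies(x), mu, rcond=None)
phi, lam, rho = coef[:2], coef[2:4], coef[4:]
print(phi, lam, rho)
```

In this synthetic setup the first element of `lam` equals the nested logit slope minus the calculated slope, and the first element of `rho` is zero because the random coefficients and calculated slopes were set equal.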
Table 7 reports the coefficients obtained by the hedonic regression and also provides the results of the tests. Controlling for year and country dummies, a one-liter increase in engine capacity leads to an increase of 1311 CP in the markups obtained by the accounting model and an increase of 84 CP in the markups estimated by the nested logit model. $\rho_1$ is statistically insignificant: the F-test of $\rho_1 = 0$ has a p-value of 0.970, so we fail to reject the hypothesis that $\rho_1$ equals zero. Additionally, our accounting model predicts higher markups for SUVs, whereas the random coefficients model predicts lower markups for SUVs. The F-test of $\lambda_2 = 0$ has a p-value of 0.830, so we fail to reject the hypothesis that $\lambda_2$ equals zero. Finally, our accounting model predicts lower markups for tiny cars, whereas the random coefficients model predicts higher markups for tiny cars. The F-test of $\lambda_3 = 0$ has a p-value of 0.202, so we fail to reject the hypothesis that $\lambda_3$ equals zero. We conclude that 1) engine capacity has the same impact on calculated markups and on markups estimated by random coefficients and 2) the impacts of SUVs and tiny automobiles on calculated markups are the same as their respective impacts on nested logit markups. Additionally, instead of using markups obtained by random coefficients model (1), we examine markups obtained by random coefficients model (2). $\phi_k + \rho_k^*$ is the impact of characteristic $k$ on markups estimated by random coefficients model (2). The results are the same.
Table 7: Markups testing
Coefficients    k=1                     k=2                     k=3
φk              1,311∗∗ (0,283)         1,783∗∗ (0,508)         -0,366† (0,203)
λk              -1,227∗∗ (0,329)        -0,132 (0,645)          0,285 (0,222)
ρk              -0,011 (0,289)          -2,739∗∗ (0,514)        0,734∗∗ (0,207)
ρ∗k             -0,251 (0,303)          -2,632∗∗ (0,522)        0,734∗∗ (0,212)
F-tests         k=1                     k=2                     k=3
λk = 0          13,91 (p-val: 0,000)    0,04 (p-val: 0,838)     1,64 (p-val: 0,200)
ρk = 0          0,00 (p-val: 0,970)     28,42 (p-val: 0,000)    12,59 (p-val: 0,000)
λk = ρk         46,39 (p-val: 0,000)    41,66 (p-val: 0,000)    20,81 (p-val: 0,000)
ρ∗k = 0         0,69 (p-val: 0,407)     25,38 (p-val: 0,000)    12,00 (p-val: 0,001)
λk = ρ∗k        23,65 (p-val: 0,000)    36,3 (p-val: 0,000)     17,05 (p-val: 0,000)
6
Analysis with additional characteristics
As a test of possible biases, we also estimate demand for the 1995-2002 period, which allows us to include more characteristics at the expense of having fewer observations. The instruments we use for the logit and random coefficients models are the per unit tax, the constant tax, the import duty and the sum of engine capacity of other products sold by the same firm. Import duty proves to be a valid instrument for this subset. For the nested logit, the per unit tax, the constant tax, the import duty and the sum of engine capacity of all the other products in the group are used as instruments. The results are presented in table 8.
For the random coefficients models, both interactions of consumer and product characteristics are insignificant. This may be due to the low number of observations. We increase the number of consumer draws from 2000 to 5000, but the interactions remain insignificant. For model (2), alpha varies from 0.048 to 1.604, whereas for model (1) the coefficients of both price and its interaction with income are insignificant. This may be because there is little variation in income over the years we examine. For model (1), alpha varies from -0.081 to 3.799, and the negative values imply that very rich consumers prefer higher prices. That is why we exclude this model from the analysis later. The two additional attributes, frame and cylinders, have positive and statistically significant coefficients. The implied mean own-price elasticity is -5.14 for the simple logit model, -6.76 for the nested logit model, -5.87 for random coefficients model (1) and -5.76 for random coefficients model (2). As the price elasticities (in absolute value) are higher than before, we expect lower markups for this subset.
In table 9 we again see that the logit model predicts the markups with the lowest standard deviation. In contrast to the results of table 4, the random coefficients models predict some markups that lie below the lower bound of the nested logit markups. The random coefficients model (1) predicts some huge markups because of the negative alphas of some rich consumers. The markups obtained by the accounting model are no longer very low compared to the DCM markups, as they were in table 4.
In table 10 we repeat the test presented in table 7. $x_{4j}$ is a $3N \times 1$ vector of frame and $x_{5j}$ is a $3N \times 1$ vector of cylinders. For engine capacity and tiny cars, we obtain similar results as before. For frame and SUVs, all the coefficients are insignificantly different from zero. For cylinders, however, we find different impacts on the 3 sets of markups. Both discrete choice models predict higher markups for cars with more cylinders compared to the markups obtained by the accounting model.
Table 8: Demand estimates (# of obs: 387)
Variables                        SL                   NL                   RC-1                 RC-2
Price                            -0,328∗∗ (0,036)     -0,229∗∗ (0,037)     -1,196 (0,837)       -0,780∗∗ (0,040)
Within-share                                          0,492∗∗ (0,105)
Czech Rep.                       -3,874∗∗ (0,370)     -1,969∗∗ (0,475)     -3,947∗∗ (0,400)     -4,143∗∗ (0,615)
England                          -0,979∗∗ (0,316)     -0,459∗ (0,203)      -0,899∗∗ (0,324)     -0,960∗∗ (0,362)
France                           1,042∗∗ (0,245)      0,737∗∗ (0,144)      1,279∗∗ (0,297)      1,230∗∗ (0,272)
Germany                          2,698∗∗ (0,231)      1,796∗∗ (0,260)      2,823∗∗ (0,242)      2,781∗∗ (0,244)
Italy                            -0,679∗ (0,341)      -0,171 (0,207)       -0,539 (0,335)       -0,535† (0,325)
Korea                            -1,941∗∗ (0,302)     -0,931∗∗ (0,289)     -2,162∗∗ (0,340)     -2,104∗∗ (0,311)
Russia                           -4,570∗∗ (0,456)     -2,629∗∗ (0,568)     -5,970∗∗ (1,014)     -5,684∗∗ (0,594)
Spain                            -0,434 (0,458)       -0,254 (0,302)       -0,256 (0,454)       -0,305 (0,444)
Sweden                           2,402∗∗ (0,537)      1,270∗∗ (0,469)      2,427∗∗ (0,421)      2,081∗∗ (0,422)
Engine Capacity (in liters)      2,759∗∗ (0,332)      1,174∗ (0,464)       3,283∗∗ (0,402)      3,210∗∗ (0,291)
Frame                            0,574∗∗ (0,153)      0,495∗∗ (0,102)      0,899∗∗ (0,269)      0,860∗∗ (0,165)
Cylinders                        1,874∗∗ (0,414)      1,305∗∗ (0,315)      0,692 (1,122)        0,896∗∗ (0,343)
Constant                         -18,778∗∗ (2,075)    -13,254∗∗ (2,027)    -11,729† (6,595)     -13,989∗∗ (1,878)
Price*Income (non-parametric)                                              0,568 (0,506)
Price*V (parametric)                                                                            0,195 (0,558)
Sargan-Hansen test               4,64 chisq(3)        1,49 chisq(2)        1,65 chisq(3)        1,36 chisq(3)
                                 p-value: 0,20        p-value: 0,48        p-value: 0,65        p-value: 0,72
Table 9: Markup statistics (# of obs: 273)
Stats             SL       NL       RC-1      RC-2      Equal markups
Minimum           2,700    1,966    1,339     1,529     -2,032
Percentile 5%     2,701    1,983    1,524     1,633     0,787
Percentile 25%    2,774    2,044    1,740     1,790     1,557
Percentile 50%    2,825    2,071    2,164     2,168     2,346
Percentile 75%    2,826    2,131    2,711     2,841     3,308
Percentile 95%    2,831    2,364    4,908     4,744     5,011
Maximum           2,843    2,946    52,660    10,819    8,497
Mean              2,798    2,108    2,799     3,462     2,585
Std. Dev.         0,041    0,124    3,736     0,380     1,547
Table 10: Markups testing
       k=1                 k=2               k=3                k=4               k=5
φk     2,356∗∗ (0,721)     -0,291 (0,955)    -0,589† (0,341)    -0,204 (0,204)    -1,194∗∗ (0,429)
λk     -2,412∗∗ (0,722)    0,742 (0,959)     0,555 (0,343)      0,214 (0,205)     1,172∗∗ (0,429)
ρ∗k    -0,338 (0,756)      -1,286 (0,979)    1,39∗∗ (0,380)     0,328 (0,226)     2,586∗∗ (0,543)
7
Conclusion
Discrete choice models (DCMs) have been widely used in recent years to estimate demand
for differentiated products. These models yield estimates of marginal cost and markups
that have been used to address several questions that interest industrial organization
economists, such as the impact of mergers and the measurement of market power. Due
to lack of data, researchers are rarely able to evaluate the accuracy of their estimates
by comparing them to actual data or alternative estimates. The aim of this paper is to
compare markups implied by DCMs to markups obtained by a different approach. The fact that we observe prices and sales of the same car model sold both with and without taxes allows us to use a simple accounting model that relates import prices to final prices for taxed and tax-free cars. This in turn allows us to calculate markups under some simple non-equilibrium assumptions and compare them with markups obtained by discrete-choice models.
Our results are derived from our entire dataset and from a reduced-period subset. The subset has the disadvantage of reducing our markups sample by about 43%, but it enriches our demand analysis since additional demand attributes can be included. Using the entire dataset, we find that the markups obtained by DCMs appear to be higher than the model-free markups, whereas using the subset the two sets of markups look quite similar. Additionally, the impact of product characteristics on markups obtained by DCMs and by the accounting model is examined through a formal statistical test. For the whole-period dataset, we fail to reject the null hypothesis that the coefficients are the same, whereas for the subset only the coefficient of cylinders is found to be different. We conclude that the comparison of the two sets of markups shows them to be reasonably similar, which bodes well for DCMs.
References
Berry, S. T., 1994, “Estimating Discrete Choice Models of Product Differentiation,”
RAND Journal of Economics, 25, 242–262.
Berry, S. T., J. Levinsohn, and A. Pakes, 1995, “Automobile Prices in Market Equilibrium,” Econometrica, 63, 841–890.
Besley, T., and H. Rosen, 1999, “Sales Taxes and Prices: An Empirical Analysis,” National
Tax Journal, 52, 157–178.
Clerides, S., 2008, “Gains from Trade in Used Goods: Evidence from Automobiles,”
Journal of International Economics, 76, 322–336.
Mielke, P., 1976, “Simple Iterative Procedures for Two-Parameter Gamma Distribution
Maximum Likelihood Estimates,” Journal of Applied Meteorology, 15, 181–183.
Nevo, A., 2000, “A Practitioner’s Guide to Estimation of Random-Coefficients Logit
Models of Demand,” Journal of Economics and Management Strategy, 9, 513–548.
Poterba, J., 1996, “Retail Price Reactions to Changes in State and Local Sales Taxes,”
National Tax Journal, 49, 165–176.
Thom, H., 1958, “A Note on the Gamma Distribution,” Monthly Weather Review, 86,
117–122.
A
Description of DCMs
A.1
The simple logit
The utility consumer $i$ obtains from buying brand $j$ in year/market $t$ is given by the following equation:
$$u_{ijt} = x_{jt}\beta - \alpha P_{jt} + \xi_{jt} + \varepsilon_{ijt}$$
where $P_{jt}$ is the observed price of product $j$ in year $t$, $x_{jt}$ is a $k$-dimensional vector of observed characteristics of product $j$ in year $t$, $\xi_{jt}$ is an unobserved characteristic of product $j$ in year $t$ and $\varepsilon_{ijt}$ is an idiosyncratic shock with mean zero. The SL model leads to the following demand equation:
$$\ln(s_{jt}) - \ln(s_{0t}) = x_{jt}\beta - \alpha P_{jt} + \xi_{jt}$$
where $s_{jt}$ is the market share of product $j$ in period $t$ and $s_{0t}$ is the share of the outside good.
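The demand equation above can be checked numerically with a short sketch. It assumes the standard logit share formula (type I extreme value shocks and an outside good with mean utility normalized to zero), which is not spelled out in this appendix; the function names and the values of the mean utilities are illustrative.

```python
import math


def logit_shares(delta):
    """Shares implied by mean utilities delta_j = x_j*beta - alpha*P_j + xi_j."""
    denom = 1.0 + sum(math.exp(d) for d in delta)  # the 1.0 is the outside good
    return [math.exp(d) / denom for d in delta]


def invert_shares(shares):
    """Left-hand side of the SL demand equation: ln(s_j) - ln(s_0)."""
    s0 = 1.0 - sum(shares)
    return [math.log(s) - math.log(s0) for s in shares]


delta = [-1.0, -2.0, 0.5]          # hypothetical mean utilities
shares = logit_shares(delta)
recovered = invert_shares(shares)  # should reproduce delta
print(recovered)
```

The round trip shows why the log share difference can be used as the dependent variable of a linear regression on characteristics and price.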
A.2
The nested logit
The utility consumer $i$ obtains from buying brand $j$ in year/market $t$ is given by the following equation:
$$u_{ijt} = x_{jt}\beta - \alpha P_{jt} + \xi_{jt} + \zeta_{igt}(\sigma) + (1 - \sigma)\varepsilon_{ijt}$$
where $\zeta_{igt}(\sigma)$ is a group-specific random coefficient that allows goods that belong to the same group $g$ in year $t$ to contribute a common component of utility to individual $i$. The parameter $\sigma$ measures the extent to which products within the same group are substitutes for each other. The NL model leads to the following demand equation:
$$\ln(s_{jt}) - \ln(s_{0t}) = x_{jt}\beta - \alpha P_{jt} + \sigma \ln(s_{j/g}) + \xi_{jt}$$
where $s_{j/g}$ is the market share of product $j$ within its group $g$ in period $t$.
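The role of the within-group share term can be illustrated with a small numerical check. The sketch assumes the standard nested logit share formulas (as in Berry, 1994), which are not written out in this appendix; the grouping, the value of sigma and the mean utilities are hypothetical.

```python
import math


def nested_logit_shares(delta, groups, sigma):
    """Return market shares s_j and within-group shares s_{j/g}."""
    scaled = [math.exp(d / (1.0 - sigma)) for d in delta]
    group_sum = {}
    for g, e in zip(groups, scaled):
        group_sum[g] = group_sum.get(g, 0.0) + e
    # The outside good forms its own group and contributes 1 to the denominator.
    denom = 1.0 + sum(v ** (1.0 - sigma) for v in group_sum.values())
    within = [e / group_sum[g] for g, e in zip(groups, scaled)]
    shares = [w * group_sum[g] ** (1.0 - sigma) / denom
              for g, w in zip(groups, within)]
    return shares, within


def mean_utilities(shares, within, sigma):
    """delta_j = ln(s_j) - ln(s_0) - sigma * ln(s_{j/g})."""
    s0 = 1.0 - sum(shares)
    return [math.log(s) - math.log(s0) - sigma * math.log(w)
            for s, w in zip(shares, within)]


delta = [-1.0, -0.5, -2.0, -1.5]            # hypothetical mean utilities
groups = ["suv", "suv", "small", "small"]   # hypothetical nests
shares, within = nested_logit_shares(delta, groups, sigma=0.4)
recovered = mean_utilities(shares, within, sigma=0.4)
print(recovered)
```

Subtracting sigma times the log within-group share from the log share difference returns the mean utilities, which is the rearranged form of the NL demand equation.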
A.3
The Random Coefficients
The utility consumer $i$ obtains from buying brand $j$ in year/market $t$ is given by the following equation:
$$u_{ijt} = x_{jt}\beta_i - \alpha_i P_{jt} + \xi_{jt} + \varepsilon_{ijt}$$
where $\alpha_i$ is modeled as $\alpha + \Pi_\alpha D_i + \Sigma_\alpha V_{i\alpha}$ and the term $\beta_i$ is formed as $\beta + \Pi_\beta D_i + \Sigma_\beta V_{i\beta}$. $D_i$ is a $d \times 1$ vector of demographic variables and $V_i$ captures additional unobserved characteristics that might be important in the decision of which car to buy. $\Pi$ is a $(K+1) \times d$ matrix of coefficients and $\Sigma$ is a $(K+1) \times (K+1)$ matrix of parameters. The $V$'s are drawn from a multivariate normal distribution, and the $D$'s are drawn from nonparametric distributions known from data sources. The RC model does not lead to an analytic solution for the demand equation. Define $\delta_{jt} = x_{jt}\beta - \alpha P_{jt} + \xi_{jt}$. The following contraction mapping is used for a numerical solution:
$$F(\delta_{jt}) = \delta_{jt} + \ln(s_{jt}) - \ln(s_{jt}(\delta_{jt})) \qquad (20)$$
where $s_{jt}(\delta_{jt}) = \int s_{ijt}\, dP_D^*(D)\, dP_V^*(V)$ and $s_{ijt}$ is given by
$$s_{ijt} = \frac{\exp\left(\delta_{jt} + x_{jt}(\Pi_\beta D_i + \Sigma_\beta V_{i\beta}) - (\Pi_\alpha D_i + \Sigma_\alpha V_{i\alpha}) P_{jt}\right)}{1 + \sum_{k=1}^{J} \exp\left(\delta_{kt} + x_{kt}(\Pi_\beta D_i + \Sigma_\beta V_{i\beta}) - (\Pi_\alpha D_i + \Sigma_\alpha V_{i\alpha}) P_{kt}\right)}.$$
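The contraction mapping in equation (20) can be sketched as a fixed-point iteration. This is a simplified illustration, assuming that the integral is approximated by an average over simulated consumers and that only the price coefficient is random (no demographics); the prices, the value used for $\Sigma_\alpha$ and the function names are hypothetical.

```python
import numpy as np


def predicted_shares(delta, mu):
    """Average the individual shares s_ijt over simulated consumers.

    delta: (J,) mean utilities; mu: (R, J) consumer-specific deviations.
    """
    expu = np.exp(delta[None, :] + mu)
    s_ij = expu / (1.0 + expu.sum(axis=1, keepdims=True))
    return s_ij.mean(axis=0)


def solve_delta(shares, mu, tol=1e-12, max_iter=1000):
    """Iterate delta <- delta + ln(s) - ln(s(delta)) until it stops changing."""
    delta = np.log(shares) - np.log(1.0 - shares.sum())  # simple logit start
    for _ in range(max_iter):
        new_delta = delta + np.log(shares) - np.log(predicted_shares(delta, mu))
        if np.max(np.abs(new_delta - delta)) < tol:
            return new_delta
        delta = new_delta
    raise RuntimeError("contraction mapping did not converge")


rng = np.random.default_rng(1)
prices = np.array([8.0, 10.0, 14.0])   # hypothetical prices
v = rng.standard_normal(500)           # simulated draws of V_i
sigma_alpha = 0.05                     # hypothetical value of Sigma_alpha
mu = -(sigma_alpha * v)[:, None] * prices[None, :]

true_delta = np.array([-1.0, -2.0, -1.5])
observed = predicted_shares(true_delta, mu)   # shares generated by the model
recovered = solve_delta(observed, mu)
print(recovered)
```

Because the observed shares are generated from known mean utilities, the iteration should return those same mean utilities, which is a convenient check on any implementation of the mapping.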
B
Income Distribution
Household income data are difficult to find in Cyprus for all the years of our sample. However, we obtained incomes from surveys conducted by the Cyprus Statistical Service for the years 1991, 1996 and 2003. There are 2,703 households for 1991, 2,636 households for 1996 and 2,967 households for 2003. The households differ across the survey years. In order to obtain incomes for all the years, we test whether incomes in each of the three years follow a specific two-parameter distribution. We find that the gamma distribution fits the data, and we estimate its alpha (shape parameter) and beta (scale parameter) for the 3 years of the survey using maximum likelihood. Income is used as a consumer demographic for the random coefficients model.
The probability density function of the gamma distribution can be expressed in terms of the gamma function, parameterized by a shape parameter $\alpha$ and a scale parameter $\beta$. Both $\alpha$ and $\beta$ are positive. The probability density function of a gamma-distributed random variable $x$ is:
$$f(x; \alpha, \beta) = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)} \quad \text{for } x > 0 \text{ and } \alpha, \beta > 0$$
The log-likelihood function for $N$ i.i.d. observations $(x_1, ..., x_N)$ is
$$l(\alpha, \beta) = (\alpha - 1)\sum_{i=1}^{N} \ln(x_i) - \sum_{i=1}^{N} \frac{x_i}{\beta} - N\alpha\ln(\beta) - N\ln(\Gamma(\alpha))$$
Taking the derivative with respect to $\beta$ and setting it equal to zero yields the maximum likelihood estimator of the scale parameter $\beta$:
$$\hat{\beta} = \frac{1}{\alpha N}\sum_{i=1}^{N} x_i,$$
so $\hat{\beta}$ can be estimated by $\bar{x}/\hat{\alpha}$, where $\bar{x}$ is the mean of income and $\hat{\alpha}$ is the maximum likelihood estimator of the shape parameter $\alpha$ that we derive below.
Substituting the maximum likelihood estimator of the scale parameter into the log-likelihood function, then taking the derivative with respect to $\alpha$ and setting it equal to zero yields:
$$\ln(\alpha) - \frac{\tilde{\Gamma}(\alpha)}{\Gamma(\alpha)} = \ln\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) - \frac{1}{N}\sum_{i=1}^{N} \ln(x_i),$$
where $\tilde{\Gamma}(\alpha)$ denotes the derivative of the gamma function and the right-hand side of this relationship (let's call it $r$) is the natural logarithm of mean income minus the mean of the natural logarithm of income. As there is no closed-form solution for $\alpha$, a numerical solution must be found using, for example, Newton's method. An initial value of $\alpha$ can be found using, for example, the method of moments.$^3$
In this paper, a Stata module created by Cox and Jenkins is used for the estimation of $\alpha$ and $\beta$. As an initial value they use the one proposed by Thom (1958), which is
$$\alpha_0 = \frac{1 + \sqrt{1 + 4r/3}}{4r}.$$
For the numerical solution for $\alpha$, instead of Newton's method they use an approximation of $\tilde{\Gamma}(\alpha)/\Gamma(\alpha)$ proposed by Mielke (1976).
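The estimation steps can be sketched in pure Python. Note that this sketch is not the Stata module mentioned above: it starts from Thom's initial value but then solves the first-order condition with Newton's method rather than Mielke's approximation, and it computes $\tilde{\Gamma}(\alpha)/\Gamma(\alpha)$ and its derivative from standard asymptotic series. The sample is simulated, not the survey data, and all function names are hypothetical.

```python
import math
import random


def digamma(x):
    """Gamma'(x)/Gamma(x) via the recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))


def trigamma(x):
    """Derivative of digamma, computed the same way."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + 1.0 / x + 0.5 * inv2
            + (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0)) / x ** 3)


def fit_gamma(xs, tol=1e-12, max_iter=100):
    """Return (alpha_hat, beta_hat, r) for positive observations xs."""
    n = len(xs)
    mean = sum(xs) / n
    r = math.log(mean) - sum(math.log(x) for x in xs) / n
    alpha = (1.0 + math.sqrt(1.0 + 4.0 * r / 3.0)) / (4.0 * r)  # Thom (1958)
    for _ in range(max_iter):
        g = math.log(alpha) - digamma(alpha) - r     # first-order condition
        step = g / (1.0 / alpha - trigamma(alpha))   # Newton step
        alpha -= step
        if abs(step) < tol:
            break
    return alpha, mean / alpha, r                    # beta_hat = mean / alpha_hat


random.seed(0)
# Simulated incomes using the 1991 shape and scale as illustrative inputs.
sample = [random.gammavariate(1.81, 5524.7) for _ in range(20000)]
alpha_hat, beta_hat, r = fit_gamma(sample)
print(alpha_hat, beta_hat)
```

With a sample of this size the estimates should land close to the shape and scale used to simulate the data, and the product of the two estimates equals the sample mean by construction.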
Figure 4 reports the empirical distributions and the gamma distributions using $\hat{\alpha}$ and $\hat{\beta}$ for each of the three years of the survey. Using those parameters, we calculate alphas and betas for the years 1989 to 2002 using GDP per capita as a weight. For example, if we assume that the parameters change by the same amount each year, then for the years 1991 to 1996 alpha increases by 0.024 and beta decreases by 55.04 every year. By using GDP per capita as a weight, we relax the assumption that the parameters change by the same amount each year. For example, GDP per capita was 12157 CP in 1991 and 15992 CP in 1996. Under the assumption of the same increase each year, it should increase by 767 CP in 1992. As the actual GDP per capita for 1992 was 13474 CP (an increase of 1317 CP), the increase was higher than the one predicted by the same-amount assumption, so using GDP per capita as a weight we calculate an alpha of 1.836 and a beta of 5467.3 for 1992 instead of 1.834 and 5469.7. We do this for all years and then draw incomes from a gamma distribution using the specific parameters calculated for each year. Income is used as a demographic for the random coefficients model.
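The arithmetic of this example can be retraced as follows. The equal-increment figures follow directly from the text. The exact form of the GDP weight is not stated, so the weight used here (actual 1992 GDP per capita divided by the level implied by equal increments) is only an assumption, chosen because it reproduces the reported beta of 5467.3; with the rounded survey parameters it gives an alpha of about 1.835 rather than the reported 1.836.

```python
def equal_step(start, end, years):
    """Annual change if a quantity moves by the same amount every year."""
    return (end - start) / years


# Values quoted in the text for 1991 and 1996.
alpha_91, alpha_96 = 1.81, 1.93
beta_91, beta_96 = 5524.7, 5249.5
gdp_91, gdp_92, gdp_96 = 12157.0, 13474.0, 15992.0

d_alpha = equal_step(alpha_91, alpha_96, 5)   # about 0.024 per year
d_beta = equal_step(beta_91, beta_96, 5)      # about -55.04 per year
d_gdp = equal_step(gdp_91, gdp_96, 5)         # 767 CP per year

# Same-amount assumption for 1992.
alpha_92_equal = alpha_91 + d_alpha
beta_92_equal = beta_91 + d_beta

# ASSUMED weighting rule: scale the equal step by actual over predicted GDP.
weight = gdp_92 / (gdp_91 + d_gdp)
alpha_92_weighted = alpha_91 + weight * d_alpha
beta_92_weighted = beta_91 + weight * d_beta

print(round(alpha_92_equal, 3), round(beta_92_equal, 1))
print(round(alpha_92_weighted, 3), round(beta_92_weighted, 1))
```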
3 The first moment of a random variable with this probability density function is $E(x) = \alpha\beta$ and the second moment is $E(x^2) = \beta^2\alpha(\alpha + 1)$. Solving the two equations for $\alpha$ and $\beta$ we get $\alpha = \bar{x}^2/Var(x)$ and $\beta = Var(x)/\bar{x}$, so $\bar{x}^2/Var(x)$ can be used as an initial value of $\alpha$.
[Figure 4: Income Distribution. Panels: year 1991 (alpha 1.81, beta 5524.7); year 1996 (alpha 1.93, beta 5249.5); year 2003 (alpha 2.01, beta 5884.8). Horizontal axis from 0 to 150000; vertical axis from 0 to 8.0e−05.]