2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation
Interval Estimation of the Herfindahl-Hirschman Index Under Incomplete Market
Information
Marta Flamini
Faculty of Engineering
Università Telematica Internazionale UNINETTUNO
Roma, Italy
m.fl[email protected]
Maurizio Naldi
Dpt. of Computer Science and Civil Engineering
Università di Roma Tor Vergata
Roma, Italy
Email: [email protected]
978-1-4799-4923-6/14 $31.00 © 2014 IEEE
DOI 10.1109/UKSim.2014.66

Abstract—An interval estimate is provided for the Herfindahl-Hirschman
Index (HHI) when the knowledge about the market is incomplete, and we
know just the largest market shares. Two rigorous bounds are provided
for the HHI, without any further assumptions. Though the interval gets
wider as the sum of the known market shares gets smaller, the estimate
proves to be quite tight even when the fraction of the market that we do
not know in detail is as high as 30%. This robustness is shown through
three examples, considering respectively a set of real data and two sets
of synthetic data, with the company sizes (a proxy for market shares)
following respectively a Zipf law and a Pareto distribution.

Keywords-Herfindahl-Hirschman Index; HHI; Zipf law; Pareto distribution;
Market structure;

I. INTRODUCTION

The analysis of the structure of a market is a relevant issue. It means
understanding how the market is divided among the companies, and whether
it is characterized by significant competition or is instead dominated
by a few companies or even a single company. The analysis of the degree
of concentration (or competition, its counterpart) is a key driving
force for the industrial policy of many administrations. This analysis
is often conducted by considering the Herfindahl-Hirschman Index (HHI)
as a parameter measuring the degree of concentration [1].

The use of the HHI encompasses many industrial sectors and serves many
application purposes. Just to name a few examples, it has been
considered in [2] to measure the degree of competition in the DSL
(Digital Subscriber Line) telecommunication access market, in [3] as an
indicator of the harmony of accounting measurement practices (i.e.,
whether companies use the same accounting practice), in [4] to
investigate bank pricing, and in [5] to measure concentration in banking
systems.

The computation of the HHI requires, however, precise knowledge of the
market shares of all the companies, so that 100% of the market is
accounted for. It is often the case that we do not know all of them. For
example, the market may include a large number of small manufacturers,
for which we do not know precisely the market shares, which may be tiny
by themselves but add up to a large fraction of the market. In those
cases, we may nevertheless wish to obtain an estimate of the HHI.

A point estimate for the HHI in the presence of incomplete market data
has been provided in [6]. Being based on a combinatorial analysis to
assess the probability that a customer chooses one of the manufacturers,
that estimate assumed, however, that the market share was represented by
an integer number. A further attempt has been made by Nauenberg et al.
[7], adopting the Bradford distribution to fit the incomplete data and
estimate the unknown market shares.

An upper and a lower bound for the HHI have been proposed in the context
of market power acquisition by Kanagala et al. [8]. However, those
bounds relied on quite specific assumptions, which may not be met in
most cases. The lower bound was computed under the hypothesis that the
dominant market participant had 20% of market share and the rest of the
total market share was uniformly distributed among $N - 1$ participants.
The upper bound adopted instead the hypothesis that the market was made
of 5 companies, each having a 20% share.

Here we propose a new approach to estimate the HHI when we do not know
all the market shares. Instead of providing a point estimate, we aim at
an interval estimate. On the basis of the only assumption that we know
the largest market shares, without any further assumption on the
probability model that generates the market distribution, we derive both
a lower and an upper bound for the HHI, which serve as the two ends of
our interval estimate. Unlike [8], these bounds do not rely on any
specific or arbitrary assumption, and are therefore tight bounds, valid
under any market condition.

We apply these bounds to three cases, considering respectively a set of
real data and two sets of synthetic data, where we assume that the sales
follow respectively a Zipf law and a Pareto probability distribution. We
show that the interval estimate is quite tight even when the unknown
market shares add up to a significant fraction of the overall market.

The paper is organized as follows. In Section II, we define
the HHI and provide examples of its computation. We derive the bounds on
the HHI in Section III and apply them in Section IV.

II. THE HERFINDAHL-HIRSCHMAN INDEX

The Herfindahl-Hirschman Index is a well-established tool to assess the
degree of competition in a market (or, conversely, the closeness to a
monopolistic market structure). In this section, we provide its
definition, the relation between its values and the market structure,
and some examples of actual values in several contexts.

If we consider a market where $N$ companies operate, and indicate the
market share of the $i$-th company by $s_i$ ($i = 1, 2, \ldots, N$ and
$0 < s_i \le 1$), the Herfindahl-Hirschman Index is

\[ \mathrm{HHI} = \sum_{i=1}^{N} s_i^2, \tag{1} \]

where the following obvious constraint holds

\[ \sum_{i=1}^{N} s_i = 1. \tag{2} \]

The HHI is a well-established indicator of the degree of competition in
the market, with low values indicating a high degree of competition and,
conversely, higher values indicating closeness to a monopoly.

Though in the definition (1) employing market shares it is a normalized
index, its range is not the whole [0,1] interval. In fact, we can see
that the maximum value is 1 (when the market is made of a single
company, which has a market share $s_1 = 1$), but the minimum value,
which holds when the market is equally shared by all the companies
(perfect competition), is

\[ \min \mathrm{HHI} = \sum_{i=1}^{N} \left( \frac{1}{N} \right)^2 = N \left( \frac{1}{N} \right)^2 = \frac{1}{N}, \tag{3} \]

so that the HHI varies within the range

\[ \frac{1}{N} \le \mathrm{HHI} \le 1. \tag{4} \]

Though an exact mapping of HHI values onto concentration degree is not
possible, given the qualitative nature of the concentration concept, the
classification of Table I is well established.

Table I: HHI values and degree of concentration

HHI            Concentration degree
< 0.01         negligible concentration
0.01 ÷ 0.15    absence of concentration phenomena
0.15 ÷ 0.25    moderate concentration
> 0.25         strong concentration

We now provide some examples of real-world evaluations of the HHI. We
consider first the market structure of mobile telephony services. The
data for 2011 in Italy, reported by the Italian Telecommunications
Regulatory Authority AGCOM in its Annual report, are shown in Table II
[9]. On the basis of these data, we can easily compute the HHI, which is
0.30785.

Table II: Market shares of mobile network operators in Italy (2011)

Operator          Market share [%]
Vodafone          37.1
Telecom Italia    35.4
Wind              19.7
H3G               7.8
Total             100

According to the typical classification reported above, this market
therefore exhibits a strong degree of concentration.

III. BOUNDS ON HHI

In Section II, we have provided the definition of the HHI and described
its application in several contexts. As stated in the Introduction, it
is often the case that one does not have access to detailed information
concerning all the companies' market shares. Though the precise HHI
value cannot be computed in those cases, it is desirable to have at
least an estimate of the true value. In this section, we derive upper
and lower bounds for the HHI, which can serve as an interval estimate of
the actual HHI value.

In many cases, we know just the market shares of the most important
market players. This may happen, for example, if the market includes
many small manufacturers (or service providers, if we are considering a
market for services), or the complete data are simply not available.

Suppose that we know just the market shares of the $M$ largest companies
in the market. Without loss of generality, we label the market shares in
non-increasing order, so that
$s_1 \ge s_2 \ge \cdots \ge s_M \ge s_{M+1} \ge \cdots \ge s_N$. Then we
know $s_1, s_2, \ldots, s_M$, while we don't know
$s_{M+1}, s_{M+2}, \ldots, s_N$. Since all the market shares are
positive numbers, the true HHI is larger than the partial sum of the
squares of known market shares:

\[ \mathrm{HHI} = \sum_{i=1}^{N} s_i^2 > \sum_{i=1}^{M} s_i^2. \tag{5} \]
That partial sum is therefore itself a (rough) lower bound
for the HHI. A tighter lower bound as well as an upper bound
can be found by introducing the residual sum of squares
\[ \Delta = \sum_{i=M+1}^{N} s_i^2 > 0, \tag{6} \]

so that

\[ \mathrm{HHI} = \sum_{i=1}^{M} s_i^2 + \Delta. \tag{7} \]
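As a quick numerical illustration of Equations (1) and (5)-(7), the following Python sketch recomputes the HHI for the Italian mobile market of Table II and splits it into the partial sum over the known shares and the residual sum of squares Δ. The function name `hhi` and the choice M = 2 are ours, introduced only for illustration; they are not part of the original analysis.

```python
def hhi(shares):
    """Herfindahl-Hirschman Index: sum of squared market shares, Equation (1)."""
    return sum(s * s for s in shares)

# Market shares from Table II (Italy, 2011), as fractions, largest first.
shares = [0.371, 0.354, 0.197, 0.078]
assert abs(sum(shares) - 1.0) < 1e-9   # constraint (2)

M = 2                         # illustrative assumption: only the two largest shares are known
partial = hhi(shares[:M])     # partial sum of squares, the rough lower bound (5)
delta = hhi(shares[M:])       # residual sum of squares, Equation (6)
total = hhi(shares)           # full HHI, equal to partial + delta by Equation (7)

print(f"HHI         = {total:.5f}")
print(f"partial sum = {partial:.5f}")
print(f"Delta       = {delta:.5f}")
```

The full index reproduces the value 0.30785 quoted for Table II, and the gap between the partial sum and the full index is exactly the quantity Δ that the bounds derived next try to bracket.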
We now have to identify lower and upper bounds for $\Delta$. Since we
know that the lowest value for the HHI takes place when all the
companies have the same market share, as in Equation (3), this is true
for the residual sum of squares as well, so that the minimum value of
$\Delta$ takes place when
$s_i = \frac{1 - \sum_{i=1}^{M} s_i}{N - M}$, $i = M+1, M+2, \ldots, N$:

\[ \min \Delta = (N - M) \left( \frac{1 - \sum_{i=1}^{M} s_i}{N - M} \right)^2 = \frac{\left( 1 - \sum_{i=1}^{M} s_i \right)^2}{N - M}, \tag{8} \]

which provides the following lower bound inequality for the HHI

\[ \mathrm{HHI} > \sum_{i=1}^{M} s_i^2 + \frac{\left( 1 - \sum_{i=1}^{M} s_i \right)^2}{N - M}. \tag{9} \]

By a similar reasoning, we can obtain an upper bound. In fact, we know
that the highest values of the HHI are obtained when the market is
concentrated in as few hands as possible. Likewise, the higher values
for $\Delta$ are obtained when the residual market shares are
concentrated in as few companies as possible. However, we cannot always
consider all the residual market share $1 - \sum_{i=1}^{M} s_i$ to be
concentrated in a single company (namely the $(M+1)$-th), since by the
ordering imposed we have $s_{M+1} \le s_M$. By introducing the residual
market share

\[ R = 1 - \sum_{i=1}^{M} s_i, \tag{10} \]

we have two cases, depending on the direction of the inequality
$R \lessgtr s_M$.

If $R \le s_M$, we can assign all the residual market share to the
$(M+1)$-th company, since the constraint $s_{M+1} = R \le s_M$ holds
true. Though this means that the remaining companies (i.e., those
labelled $M+2, M+3, \ldots, N$) have a zero market share, so that the
market is actually made of $M+1$ companies rather than $N$, we can
always approximate this situation with a full market of $N$ companies,
assuming that the smaller $N - M - 1$ have a market share $\to 0$. The
resulting maximum value for $\Delta$ is

\[ \max \Delta = \sum_{i=M+1}^{N} s_i^2 = (s_{M+1})^2 = \left( 1 - \sum_{i=1}^{M} s_i \right)^2, \tag{11} \]

and the resulting upper bound for the HHI is provided by the following
inequality

\[ \mathrm{HHI} < \sum_{i=1}^{M} s_i^2 + \left( 1 - \sum_{i=1}^{M} s_i \right)^2. \tag{12} \]

On the other hand, if $R > s_M$, we can assign the following residual
market shares, so that $Q = \lfloor R / s_M \rfloor$ companies have a
market share equal to $s_M$:

\[ s_i = \begin{cases} s_M & i = M+1, \ldots, M+Q \\ R - s_M Q & i = M+Q+1 \\ 0 & i = M+Q+2, \ldots, N \end{cases} \tag{13} \]

With this $\Delta$-maximizing assignment, we have

\[ \max \Delta = \sum_{i=M+1}^{M+Q} s_i^2 + \left( 1 - \sum_{i=1}^{M} s_i - \sum_{i=M+1}^{M+Q} s_i \right)^2 = s_M^2 Q + \left( 1 - \sum_{i=1}^{M} s_i - s_M Q \right)^2, \tag{14} \]

and the resulting upper bound for the HHI

\[ \mathrm{HHI} < \sum_{i=1}^{M} s_i^2 + s_M^2 Q + \left( 1 - \sum_{i=1}^{M} s_i - s_M Q \right)^2. \tag{15} \]

Depending on the direction of the inequality $R \lessgtr s_M$, we
therefore end up with two different ranges for the HHI, with the first
one being valid when $R \le s_M$ and the second one when $R > s_M$. The
set of bounds is reported in Table III.

Table III: Bounds on HHI

Type                  Bound
Lower                 $\sum_{i=1}^{M} s_i^2 + \frac{(1 - \sum_{i=1}^{M} s_i)^2}{N - M}$
Upper ($R \le s_M$)   $\sum_{i=1}^{M} s_i^2 + (1 - \sum_{i=1}^{M} s_i)^2$
Upper ($R > s_M$)     $\sum_{i=1}^{M} s_i^2 + s_M^2 Q + (1 - \sum_{i=1}^{M} s_i - s_M Q)^2$

IV. TIGHTNESS OF BOUNDS

In Section III, we have derived both the lower and the upper bound on
the HHI, which allows us to obtain an interval estimate for the HHI. In
order to be of use, the interval associated with such an estimate should
be as narrow as possible. In this section, we evaluate the tightness of
those bounds in three situations. We first consider real data, and then
synthetic data obtained from two models typically found for the
distribution of company sizes: Zipf and Pareto distributions.

A. Real data

We consider the market of mobile phones. According to the data reported
in Table IV, provided by Gartner [10], the market is made of 10 major
players, but a large slice, making up roughly 1/3 of the whole sales, is
fragmented among a host of minor manufacturers. We cannot compute the
exact HHI value in this case. We resort therefore to the bounds obtained
in the previous section.
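The bounds summarized in Table III can be sketched in a few lines of Python. The function below is a minimal illustration under our own naming (`hhi_bounds`, `n_total` are not from the original text); it falls back to the rough lower bound (5) when the total number N of companies is not supplied. As a sanity check, it is applied to the Italian market of Table II, where the full HHI is known, pretending that only the one or two largest shares are available.

```python
import math

def hhi_bounds(known, n_total=None):
    """Return (lower, upper) bounds on the HHI given the M largest shares.

    known   -- the M largest market shares, as fractions
    n_total -- total number N of companies, if known (enables bound (9))
    """
    s = sorted(known, reverse=True)
    m = len(s)
    partial = sum(x * x for x in s)      # sum of squares of the known shares
    r = 1.0 - sum(s)                     # residual market share R, Equation (10)
    s_m = s[-1]                          # smallest known share s_M

    # Lower bound: residual spread evenly over the N - M unknown firms (9),
    # or just the partial sum (5) when N is not available.
    if n_total is not None and n_total > m:
        lower = partial + r * r / (n_total - m)
    else:
        lower = partial

    # Upper bound: residual concentrated in as few firms as the ordering allows.
    if r <= s_m:
        upper = partial + r * r                      # bound (12)
    else:
        q = math.floor(r / s_m)                      # Q firms of size s_M
        rest = r - q * s_m                           # share of firm M+Q+1
        upper = partial + q * s_m ** 2 + rest ** 2   # bound (15)
    return lower, upper

italy = [0.371, 0.354, 0.197, 0.078]     # Table II, as fractions
true_hhi = sum(x * x for x in italy)
results = {m: hhi_bounds(italy[:m], n_total=len(italy)) for m in (1, 2)}
for m, (lo, hi) in results.items():
    print(f"M={m}: {lo:.4f} <= HHI <= {hi:.4f} (true value {true_hhi:.4f})")
```

With M = 1 the residual exceeds the known share, so the second upper-bound case is exercised; with M = 2 the first case applies. In both runs the true value falls inside the interval. When N is unknown, as in the handset market considered next, calling `hhi_bounds(known)` without `n_total` returns the partial sum as the lower end.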
Table IV: Sales of mobile phones in 2012

Brand       Share [%]
Samsung     22.0
Nokia       19.1
Apple       7.5
ZTE         3.9
LG          3.3
Huawei      2.7
TCL         2.1
RIM         2.0
Motorola    1.9
HTC         1.8
Others      33.7

Since we know $M = 10$, but we don't know the total number $N$ of
manufacturers, we cannot use the lower bound (9), but rather the less
tight (5). With the shares of Table IV, we obtain

\[ \min \mathrm{HHI} = \sum_{i=1}^{10} s_i^2 = 0.0954. \tag{16} \]

As to the upper bound, the lowest known market share is
$s_{10} = 0.018$, and the residual market share is $R = 0.337$, so that
$R > s_{10}$, and we fall in the second upper bound case of Table III.
Since $Q = \lfloor R / s_{10} \rfloor = 18$, the upper bound is

\[ \max \mathrm{HHI} = \sum_{i=1}^{10} s_i^2 + 18 s_{10}^2 + \left( 1 - \sum_{i=1}^{10} s_i - 18 s_{10} \right)^2 = 0.0954 + 0.00583 + 0.000169 = 0.1014. \tag{17} \]

The interval estimate for the HHI is then

\[ 0.0954 < \mathrm{HHI} < 0.1014, \tag{18} \]

which, according to Table I, places that market in the range
characterized by an absence of concentration phenomena.

B. The Zipf model

The generalized Zipf law states that, if we rank a collection of $N$
subjects in non-increasing order according to their size, the product of
a power of the rank and of the size of each subject is constant
throughout the collection, so that the size decreases with the rank
according to the equation

\[ s_i \propto \frac{1}{i^{\alpha}}. \tag{19} \]

That law, initially observed for $\alpha = 1$ by Zipf in a linguistic
context [11], has been found to represent well a number of other
phenomena (see [12] for an explanation of its properties and [13], [14]
as two examples of its application in other contexts). In particular, it
has been found to describe well the distribution of firms' sizes [15],
[16], [17], [18] and is therefore a good candidate to describe the
distribution of market shares. In [19], the relationship between the HHI
and the Zipf law has been investigated, and the HHI has been expressed
as a function of the truncated Riemann Zeta function
$\zeta_N(\alpha) = \sum_{i=1}^{N} i^{-\alpha}$, which can be immediately
seen as tightly connected with the generalized Zipf law (19):

\[ \mathrm{HHI} = \frac{\zeta_N(2\alpha)}{\zeta_N^2(\alpha)}. \tag{20} \]

Here we consider that the market shares follow the generalized form of
the Zipf law, assuming however that we know just the $M$ largest market
shares, and see what the resulting bounds are for the HHI.

In [15] the value of the Zipf parameter $\alpha$ has been found to be
about 1 for the U.S. market. The wider range 0.7 ÷ 1.2 has been observed
in [17] for Japanese companies, while an even wider range 0.44 ÷ 1.4 has
been estimated for the slope of the power law for 20 countries in
America, Asia and Europe [16].

[Figure 1. HHI bounds for the Zipf law (α = 0.5): HHI max and HHI min versus the number of known market shares.]

Here, we report in Figure 1, Figure 2, and Figure 3, respectively, the
results for three values of the Zipf parameter that encompass these
ranges, i.e., $\alpha = 0.5, 1, 1.5$, for a market with $N = 20$
players, when the number of known market shares ranges from 6 to 12. As
expected, the uncertainty cone shrinks as the number of known market
shares increases. The uncertainty range is however quite small even at
its maximum, when we know just 6 market shares out of 20 (less than
1/3), since the interval width with respect to the interval center is
just ±6.1% when $\alpha = 0.5$, reduces to ±3.2% when $\alpha = 1$, and
becomes a negligible ±0.58% when $\alpha = 1.5$. The bounds can then be
considered quite tight. It is to be noted that such tight bounds are
achieved even in the presence of high residual values for the unknown
market shares. In Table V we report the residual $R$ for the cases
considered here. In one case ($M = 6$ and $\alpha = 0.5$), the fraction
of unknown market shares is even larger than that of known ones.
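The Zipf setting is easy to reproduce numerically. The sketch below, written with helper names of our own choosing (`zipf_shares`, `truncated_zeta`), builds the N = 20 market shares implied by Equation (19), checks that their HHI agrees with the closed form (20), and computes the residual market share R left unknown when only the M largest shares are observed.

```python
def zipf_shares(n, alpha):
    """Market shares proportional to i**(-alpha), i = 1..n, as in Equation (19)."""
    weights = [i ** -alpha for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def truncated_zeta(n, a):
    """Truncated Riemann Zeta function: sum of i**(-a) for i = 1..n."""
    return sum(i ** -a for i in range(1, n + 1))

N = 20
residual = {}                                # (alpha, M) -> residual share R
for alpha in (0.5, 1.0, 1.5):
    s = zipf_shares(N, alpha)
    exact = sum(x * x for x in s)            # HHI by definition (1)
    closed = truncated_zeta(N, 2 * alpha) / truncated_zeta(N, alpha) ** 2   # Equation (20)
    assert abs(exact - closed) < 1e-12
    for M in (6, 9, 12):
        residual[(alpha, M)] = 1.0 - sum(s[:M])
        print(f"alpha={alpha}, M={M}: HHI={exact:.4f}, R={100 * residual[(alpha, M)]:.1f}%")
```

The residuals computed this way can be compared directly with the entries of Table V; for instance, the case M = 6 and α = 0.5 leaves more than half of the market unobserved.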
[Figure 2. HHI bounds for the Zipf law (α = 1): HHI max and HHI min versus the number of known market shares.]

[Figure 3. HHI bounds for the Zipf law (α = 1.5): HHI max and HHI min versus the number of known market shares.]

C. The Pareto distribution

Another model, closely related to the Zipf law, to describe the
distribution of firm sizes is the Pareto distribution [20]. While the
Zipf law provides deterministic values for ranked quantities, the Pareto
distribution offers a probability model where no ranking is involved.
However, the two models, which are both power laws, bear quite a direct
relationship, which is self-evident by plotting the Pareto complementary
cumulative distribution function [21].

In the Pareto distribution, the probability that a company has a size
lower than $x$ is (with $\alpha > 2$ to have at least the first two
moments finite [22])

\[ F(x) = 1 - \left( \frac{k}{x} \right)^{\alpha}. \tag{21} \]

The empirical pdf of the resulting HHI is shown in Figure 4, where its
skewness and the tail we expect in a power law model are visible.

[Figure 4. Empirical distribution of HHI under the Pareto distribution: empirical pdf versus HHI.]

When we come to the estimates of HHI, in this case we do not find a
deterministic relationship as in the Zipf law, but rather a scatterplot.
For a sample of 1000 simulation instances (with $k = 1$ and
$\alpha = 3$) and a market made of 20 players, we find the belts
reported in Figure 5 and Figure 6, for $M = 6$ and $M = 12$
respectively. The scatterplot is obtained by computing the upper and
lower bound for HHI for each simulated instance (which exhibits a random
value of $R$). The belts obtained in such a way are plotted versus the
overall market share owned by the $M$ largest companies (i.e., $1 - R$).
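A simulation of this kind can be sketched as follows. The code draws company sizes by inverting the Pareto distribution function of Equation (21), normalizes them into market shares, and records the HHI together with the share held by the M largest companies. The function names, the fixed seed, and the use of inverse-transform sampling are our own assumptions for illustration; the original text does not specify how the instances were generated.

```python
import random

def pareto_sample(k, alpha, rng):
    """Draw one company size by inverting F(x) = 1 - (k/x)**alpha, Equation (21)."""
    u = rng.random()                          # uniform in [0, 1)
    return k * (1.0 - u) ** (-1.0 / alpha)    # always >= k

def simulate(n_firms=20, m_known=6, k=1.0, alpha=3.0, runs=1000, seed=0):
    """Return a list of (known share 1 - R, exact HHI) pairs, one per instance."""
    rng = random.Random(seed)                 # fixed seed (our choice) for reproducibility
    points = []
    for _ in range(runs):
        sizes = sorted((pareto_sample(k, alpha, rng) for _ in range(n_firms)),
                       reverse=True)
        total = sum(sizes)
        shares = [x / total for x in sizes]   # sizes used as a proxy for market shares
        hhi = sum(s * s for s in shares)
        known = sum(shares[:m_known])         # market share of the M largest, 1 - R
        points.append((known, hhi))
    return points

points = simulate()
hhis = [h for _, h in points]
print(f"HHI over {len(hhis)} instances: "
      f"min={min(hhis):.4f}, mean={sum(hhis) / len(hhis):.4f}, max={max(hhis):.4f}")
```

Each pair returned by `simulate` corresponds to one point on the horizontal axis of the scatterplots; feeding the first `m_known` shares of every instance to a bounds routine would give the two belts.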
M     α = 0.5    α = 1    α = 1.5
6     52.1       31.9     15.8
9     38.1       21.4     9.5
12    26.1       13.7     5.7
Table V: Residual market share R [%]

V. CONCLUSION

When we do not know the whole distribution of the market, the
Herfindahl-Hirschman Index cannot be computed exactly. We have derived
both a lower and an upper bound on the HHI. These bounds do not require
further assumptions and can be used to obtain an interval estimate of
the HHI. We have demonstrated their application in three cases,
considering respectively a set of real data and two sets of synthetic
data, obtained by assuming that the company sales (a proxy for the
market shares) follow respectively a Zipf law and a Pareto distribution.
In all cases, we have shown that the estimates provide quite a tight
interval even
when the unknown market shares account for a significant fraction of the
overall market.

[Figure 5. HHI bounds for the Pareto distribution (M = 6): HHI versus known market share.]

[Figure 6. HHI bounds for the Pareto distribution (M = 12): HHI versus known market share.]

REFERENCES

[1] S. A. Rhoades, “The Herfindahl-Hirschman index,” Federal Reserve Bulletin, no. Mar, pp. 188–189, 1993.

[2] J. Bouckaert, T. Van Dijk, and F. Verboven, “Access regulation, competition, and broadband penetration: An international study,” Telecommunications Policy, vol. 34, no. 11, pp. 661–671, 2010.

[3] R. H. Taplin, “Harmony, statistical inference with the Herfindahl H index and C index,” Abacus, vol. 39, no. 1, pp. 82–94, 2003.

[4] T. H. Hannan, “Market share inequality, the number of competitors, and the HHI: An examination of bank pricing,” Review of Industrial Organization, vol. 12, no. 1, pp. 23–35, 1997.

[5] C. Alegria and K. Schaeck, “On measuring concentration in banking systems,” Finance Research Letters, vol. 5, no. 1, pp. 59–67, 2008.

[6] E. Nauenberg, K. Basu, and H. Chand, “Hirschman-Herfindahl index determination under incomplete information,” Applied Economics Letters, vol. 4, no. 10, pp. 639–642, 1997.

[7] E. Nauenberg, M. Alkhamisi, and Y. Andrijuk, “Simulation of a Hirschman-Herfindahl index without complete market share information,” Health Economics, vol. 13, no. 1, pp. 87–94, 2004.

[8] A. Kanagala, M. Sahni, S. Sharma, B. Gou, and J. Yu, “A probabilistic approach of Hirschman-Herfindahl index (HHI) to determine possibility of market power acquisition,” in Power Systems Conference and Exposition, 2004. IEEE PES, Oct 2004, pp. 1277–1282 vol.3.

[9] AGCOM, “Annual report,” 2012, available at www.agcom.it/Default.aspx?message=contenuto&DCId=5.

[10] Gartner, “Gartner says worldwide mobile phone sales declined 1.7 percent in 2012,” Press release, February 2013, available at www.gartner.com/newsroom/id/2335616.

[11] G. K. Zipf, Human behavior and the principle of least effort. Addison-Wesley, 1949.

[12] W. J. Reed, “The Pareto, Zipf and other power laws,” Economics Letters, vol. 74, no. 1, pp. 15–19, 2001.

[13] X. Gabaix, “Zipf’s law for cities: an explanation,” The Quarterly Journal of Economics, vol. 114, no. 3, pp. 739–767, 1999.

[14] M. Naldi and C. Salaris, “Rank-size distribution of teletraffic and customers over a wide area network,” European Transactions on Telecommunications, vol. 17, no. 4, pp. 415–421, 2006.

[15] R. L. Axtell, “Zipf distribution of US firm sizes,” Science, vol. 293, no. 5536, pp. 1818–1820, 2001.

[16] J. Ramsden and G. Kiss-Haypal, “Company size distribution in different countries,” Physica A: Statistical Mechanics and its Applications, vol. 277, no. 1-2, pp. 220–227, 2000.

[17] K. Okuyama, M. Takayasu, and H. Takayasu, “Zipf’s law in income distribution of companies,” Physica A: Statistical Mechanics and its Applications, vol. 269, no. 1, pp. 125–131, 1999.

[18] P. L. Conti, L. D. Giovanni, and M. Naldi, “Blind maximum likelihood estimation of traffic matrices under long-range dependent traffic,” Computer Networks, vol. 54, no. 15, pp. 2626–2639, 2010.

[19] M. Naldi, “Concentration indices and Zipf’s law,” Economics Letters, vol. 78, no. 3, pp. 329–334, 2003.

[20] J. Growiec, F. Pammolli, M. Riccaboni, and H. E. Stanley, “On the size distribution of business firms,” Economics Letters, vol. 98, no. 2, pp. 207–212, 2008.

[21] L. A. Adamic, “Zipf, power-laws, and Pareto - a ranking tutorial,” Available at www.parc.xerox.com/istl/groups/iea/papers/ranking/ranking.html, 2000.

[22] C. Walck, “Handbook on statistical distributions for experimentalists,” 2007.