
SCHWARTZ CENTER FOR
ECONOMIC POLICY ANALYSIS
THE NEW SCHOOL
WORKING PAPER 2014-1
A Bayesian Latent Variable Mixture Model
for Filtering Firm Profit Rates
Gregor Semieniuk and
Ellis Scharfenaker
Schwartz Center for Economic Policy Analysis
Department of Economics
The New School for Social Research
6 East 16th Street, New York, NY 10003
www.economicpolicyresearch.org
FEBRUARY
2014
Suggested Citation: Semieniuk, Gregor and Scharfenaker, Ellis (2014) “A Bayesian Latent Variable Mixture Model for Filtering Firm Profit Rates.” Schwartz Center for Economic Policy Analysis and Department of Economics, The New School for Social Research, Working Paper Series.
* This research benefited from discussions with Duncan Foley and Sid Dalal, as well as from suggestions by Mishael Milaković, Amos Golan, Anwar Shaikh, Sid Chib, Fernando Rugitsky, participants in Sid Dalal’s Columbia course “Bayesian Statistics”, at the ASSRU seminar in Trento, at the New School Economics Dissertation Workshop, and at the meeting of the International Society for Bayesian Analysis at Duke University. We are grateful to Columbia University Library for access to the COMPUSTAT dataset. Gregor gratefully acknowledges financial support from INET.
† Both Department of Economics, New School for Social Research, New York City, USA. Email addresses: [email protected] (Ellis) and [email protected] (Gregor).
Abstract: By using Bayesian Markov chain Monte Carlo methods we select the proper
subset of competitive firms and find striking evidence for Laplace shaped firm profit rate
distributions. Our approach enables us to extract more information from data than
previous research. We filter US firm-level data into signal and noise distributions by
Gibbs-sampling from a latent variable mixture distribution, extracting a sharply peaked,
negatively skewed Laplace-type profit rate distribution. A Bayesian change point
analysis yields the subset of large firms with symmetric and stationary Laplace
distributed profit rates, adding to the evidence for statistical equilibrium at the economy-wide and sectoral levels.
Keywords: Firm competition, Laplace distribution, Gibbs sampler, Profit rate,
Statistical equilibrium
JEL codes: C15, D20, E10, L11
1 Firm Competition and Bayesian Methods
We propose Bayesian computational methods, in particular Markov chain Monte Carlo simulations, for selecting the subset of firms for the analysis of firm profitability under competition. This enables us to extract more information from firm-level data than the exogenously selected subsets used in previous research by Alfarano et al. (2012) on profit rates and, e.g., Stanley et al. (1996); Bottazzi et al. (2001); Malevergne et al. (2013) on firm growth rates. We obtain striking results of Laplace-shaped profit rate distributions by applying our selection method to US firms in the COMPUSTAT dataset from 1950 to 2012. This confirms the findings by Alfarano and Milaković (2008) and Alfarano et al. (2012) for a much larger economy-wide dataset. Moreover, we extend the sectoral results of Bottazzi and Secchi (2003a) for growth rates to profit rates.
The formation of a general rate of profit, around which profit rates gravitate, takes place through competition and capital mobility. This point was stressed by classical political economists beginning with Adam Smith, who theorized that through competition capital would instinctively move from one area to another in its endless search for higher rates of profit. From this instinctive movement, a tendency toward the equalization of profit rates across all industries would emerge. Excluded from
this process of equalization would be industries that are shielded from competition
by legislative measures and monopolies or public enterprises that strive for other
objectives than profitability.
Recently, Alfarano et al. (2012) in this journal, and also Alfarano and Milaković (2008), examine firm-level data and find that private firms in all sectors but financial services participate in the competitive processes as described by Adam Smith, which they call “classical competition.” Interested in determining the shape of the distribution of profit rates, Alfarano et al. (2012) fit a generalized error distribution, also known as the Subbotin distribution, to a sample of firm-level data. They find that throughout their sample the Subbotin shape parameter is close to one for a subset of long-lived firms, implying profit rates are Laplace distributed.
The Laplace distribution is already a common result in research on firm growth
rates. Stanley et al. (1996); Amaral et al. (1997); Bottazzi et al. (2001); Bottazzi and
Secchi (2003a,b) find Laplace distributed, or nearly Laplace distributed (Buldyrev
et al., 2007) firm growth rates in particular industries. These studies are part of a
larger literature that examines distributions of various firm characteristics (Farmer
and Lux, 2008; Lux, 2009). A preview of our results in Figure 1 shows that for our
data we find strong evidence for Laplace distributed profit rates by sector as well as
for the total economy.
One problem in many such analyses is selecting the proper subset of the data.
Some papers examine specific sectors, e.g. the pharmaceutical sector in Bottazzi
et al. (2001) or use preselected datasets such as the Fortune 2000 (Alfarano and
Milaković, 2008). For more general datasets, subsets are commonly selected using
an exogenously determined lower size bound or minimum age. For example, in their
Figure 1: Profit rate histograms on semi-log plot with maximum likelihood fit in the
class of Laplace distributions. Data is pooled over the years 1961-2012, with filtering by
Bayesian mixture and change point models having endogenously removed noise and entry
and exit dynamics, as described in sections 3 and 4 of this paper. Results look similar for
plots of single years.
[Figure 1: two semi-log panels plotting p[r] (vertical, log scale) against r (horizontal, roughly −2 to 2), each showing sectoral profit rate histograms with Laplace fits. Legend, left panel: Agriculture, Construction, Manufacturing, Mining; right panel: Services, Trade, Transport, All Industries.]
seminal paper on the statistical theory of firm growth rates, Simon and Bonini (1958)
disregard firms with a capital stock below a lower bound. Subsequently, Hymer and
Pashigian (1962) investigate only the thousand largest U.S. firms, and Amaral et al.
(1997) and Malevergne et al. (2013) subset their data using various lower bounds on
firm revenue. These exogenous cut-offs appear to be necessary to rid the dataset of
noise as well as of new or dying firms, which distort the distribution of firms engaged
in the competitive process. In their analysis of profit rates, Alfarano et al. (2012)
consider only firms that are alive at least 27 years or for the entire period spanned
by the dataset. While their approach is a good first approximation, they necessarily
discard valuable information.
In this paper we argue that instead of a more or less arbitrary cut-off, a Bayesian
approach enables us to extract more information from the data. As a natural starting point, our prior is based on previous research that takes the distribution of profit rates to be Laplace. This we take as one of two components of a mixture
distribution into which all firms are mapped, regardless of whether they partake
in competition or are disruptive noise. In order to decide for each firm whether
it is in the Laplace distribution, which we designate as the signal distribution, or
in the confounding noise, we assign a latent variable to each firm observation that equals one if the firm belongs to the signal distribution and zero if it does not. We then
compute each latent variable’s posterior probability of taking on the values one or
zero, and thus the firm’s probability of being part of the competitive process. If the
probability of its being in the signal distribution is not sufficiently high, we discard it.
For our dataset, after discarding the noise, we retain nearly 97% of all observations
regardless of age or size. Unlike previous studies, this allows us to analyze the profit rate distribution throughout the entire competitive process, preserving significantly more information. As a second step, we apply a Bayesian change point analysis to filter out small firms subject to entry and exit dynamics, again using the Laplace distribution as our prior, to demonstrate an alternative way of obtaining a subset
of stationary Laplace distributions.1 Both steps are made possible by using Markov
chain Monte Carlo (MCMC) methods.
1 After this more restrictive selection we discard 45% of our original data, compared to Alfarano et al. (2012), who discard 79% of their data.
The Gibbs sampler, a widely-used MCMC method, is an extremely useful tool in
this type of economic analysis. It was introduced in this journal by Waggoner and Zha (2003) in the context of structural vector autoregressions, but to the best
of our knowledge has never been applied to the problem of firm distributions. In
general, MCMC methods are computational procedures for estimating complicated,
multi-parameter densities in statistical problems making them particularly useful
for Bayesian analysis. They were first used by Metropolis et al. (1953), but increased computing capacity has led to a proliferation of these techniques since the
1990s. MCMC methods reproduce densities by sampling a large number of values
from the full conditional marginal densities of a joint, high-dimensional, stationary
distribution whose analytical form may be difficult to find. Since the parameter
value sampled at t depends on the one at t − 1, the resulting sequence of samples is
a Markov chain.
In section two, we will introduce our Bayesian approach to the profit rate filtering
problem and the Gibbs sampler for a mixture distribution using latent variables.
Section three will show the results from applying it to the COMPUSTAT
database. Section four will use additional Gibbs sampling to analyze the existence
of a Laplace distribution in a subset of firms and relate the result to statistical
equilibrium arguments in the literature. Section five will summarize the results and
point to further research opportunities.
2 A mixture model for profit rates

2.1 A Bayesian approach
Using prior knowledge and a model for the likelihood, we would like to infer each firm’s posterior probability of being in the signal distribution. From previous research, our prior for the shape of the likelihood of firm profit rates is a two-component mixture distribution of a Laplace signal and an arbitrary noise distribution. We specify the noise as a Gaussian distribution with a large variance.2 The density function for the Laplace/Gaussian mixture is
f(y | q, µ1, σ1, µ2, σ2) = q L(y | µ1, σ1) + (1 − q) N(y | µ2, σ2)   (1)
L denotes a Laplace distribution and N a Gaussian distribution, with µ1 and µ2 being location parameters and σ1 and σ2 being scale parameters; the parameter q weights the two distributions. The mixture distribution can be rewritten with latent variables. For any observation i of y, with i = 1, ..., n, write (1) as
f(yi | z, µ1, σ1, µ2, σ2) = [L(yi | µ1, σ1)]^zi [N(yi | µ2, σ2)]^(1−zi)   (2)
where z is Bernoulli distributed, zi ∼ Bern(q).
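For concreteness, a minimal R sketch of the density in (1) and of the signal weight it implies for a single observation (the parameterization of the Laplace, exp(−|y − µ|/σ)/(2σ), and the treatment of σ2 as the Gaussian variance are assumptions consistent with the derivations below; names are illustrative):

# Laplace density with location mu and scale sigma: exp(-|y - mu| / sigma) / (2 * sigma)
dlaplace <- function(y, mu, sigma) {
  exp(-abs(y - mu) / sigma) / (2 * sigma)
}

# Two-component mixture density of equation (1); sigma2 is treated as the Gaussian variance
dmixture <- function(y, q, mu1, sigma1, mu2, sigma2) {
  q * dlaplace(y, mu1, sigma1) + (1 - q) * dnorm(y, mean = mu2, sd = sqrt(sigma2))
}

For a given parameter draw, q * dlaplace(y, mu1, sigma1) / dmixture(y, q, mu1, sigma1, mu2, sigma2) is the probability that an observation y comes from the signal component; this is exactly the Bernoulli success probability that drives the latent variable zi introduced above.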
We assume that the priors of the parameters are distributed independently of each other, and that all parameters have non-informative prior distributions, with those of σ1, σ2 restricted to the interval (0, ∞) and that of q to (0, 1). We use uniform priors for µ1 and σ1 because the Laplace distribution does not belong to the exponential family of distributions and thus has no computationally convenient conjugate priors. Having little preconception about the prior distribution of the noise, we use the Jeffreys prior for µ2 and σ2. Finally, we use a uniform prior for q.

2 Sensitivity tests using uniform and Laplace noise distributions made no qualitative difference for the sorting process.
Since (2) has a parameter, zi , for each firm observation, the n latent variables,
z1 , ...zn , allow us to distinguish in which part of the distribution individual firms
lie. This requires that we know the posterior distribution of each zi . Since the joint
density for (2) has n + 5 dimensions, it is difficult to derive analytically. We will use
the Gibbs sampler to sample from the full conditional posteriors of each parameter
density. If 95% of all sample values from the Gibbs sampler of some zi are equal to
one, the ‘null hypothesis’ that a firm is noise is refuted and the observation is kept
in the signal distribution. This way, we avoid arbitrary cut-offs but let the mixture
distribution determine endogenously whether a firm belongs to the competitive signal
distribution or not. It remains to derive the full conditional posterior distributions
for the Gibbs sampler.
2.2 The Gibbs sampler
The Gibbs sampler is one of a set of Markov chain Monte Carlo (MCMC) methods
that, for collections of random variables with an analytically intractable joint density,
finds the marginal densities without using the joint density. Instead, only the full
conditional posterior distributions of all parameters are required.3

3 See Gelfand and Smith (1990) and Casella and George (1992) for the concept. MCMC methods were originally used by Metropolis et al. (1953) and developed by Hastings (1970). The Gibbs sampler in particular was named and used by Geman and Geman (1984). Its first application to an econometric problem was, to the best of our knowledge, in Koop (1992). Bayesian econometrics goes back much further, at least to the now classical textbook of Zellner (1971). Bayesian Monte Carlo methods (but not set up as Markov chains) with “importance sampling” were introduced to econometrics by Kloek and van Dijk (1978) and are reviewed in Geweke (1989), just before the takeover by MCMC methods.

Using (2) and our priors we obtain all full conditional posterior densities. They are:
q ∼ Beta( Σi zi + 1, n − Σi zi + 1 )   (3)

zi ∼ Bern( q L(yi | µ1, σ1) / [ q L(yi | µ1, σ1) + (1 − q) N(yi | µ2, σ2) ] ), for i = 1, ..., n   (4)

µ1 ∼ c ∏_{i=1}^{n} L(µ1 | yi, σ1)^zi   (5)

σ1 ∼ IG( 1 + Σi zi, Σi zi |yi − µ1| )   (6)

µ2 ∼ N( Σi (1 − zi) yi / (n − Σi zi), σ2 / (n − Σi zi) )   (7)

σ2 ∼ IG( (n − Σi zi)/2, (1/2) Σi (1 − zi)(yi − µ2)² )   (8)
The detailed derivations of these densities and the Gibbs sampler pseudocode are given in Appendix B.
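As a small illustration, the conjugate-form conditionals (3) and (6) translate directly into standard R draws once the latent indicators z and the data y are given; this is a sketch with names of our choosing, using the fact that an inverse-gamma draw is the reciprocal of a gamma draw:

# Draw q from its Beta full conditional (3), given the 0/1 latent indicators z
draw_q <- function(z) {
  rbeta(1, sum(z) + 1, length(z) - sum(z) + 1)
}

# Draw sigma1 from its inverse-gamma full conditional (6):
# if X ~ Gamma(shape, rate) then 1/X ~ Inverse-Gamma(shape, rate)
draw_sigma1 <- function(z, y, mu1) {
  1 / rgamma(1, shape = 1 + sum(z), rate = sum(z * abs(y - mu1)))
}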
3 Empirical Results

3.1 The data
Our data are a set of firm profit rates from COMPUSTAT, Standard and Poor’s (2013), spanning the years 1950-2012. We calculate the profit rate by dividing the difference of net sales and operating costs, which equals operating income before depreciation, by total assets.4

(Net Sales − Operating Cost) / Total Assets = Operating Income before Depreciation / Total Assets = r
Our selection of variables is the same as that of Alfarano et al. (2012). Government and financial services, real estate and insurance have been excluded, because the former does not partake in competition and the latter adhere to different accounting conventions for revenue calculation, which make this part of the industries incomparable. The summary statistics of the remaining 290,931 observations, or on average 4,618 observations per year, are in Table 1.
Figure 2 reveals enormous outliers in the box plot of the distribution for each year.
While obscured in the plot, the summary statistics show that the interquartile range is “well-behaved” and that the median is at the historically credible level of 10%.
A mixture distribution with latent variables removes these outliers by identifying
them with appropriate indicators. As the profit rate has the unit 1/time, it is
insensitive to changes in prices except if the price index in the numerator develops
substantially differently from that in the denominator. Therefore we sometimes
display data pooled across years, even while we do all of our analysis and sampling
year by year to avoid muddling the results with cyclical influences such as business cycles.
4 Details for our data are in Appendix A.
Table 1: Summary statistics, raw profit rates

      Min     1st Qu.   Median   Mean   3rd Qu.   Max
r     -9860   0.02      0.10     6.67   0.17      153,600.00
Figure 2: The uncensored year-by-year box plots of the raw profit rates. [Figure: box plots of r by year, 1950-2010, with outliers reaching up to about 150,000.]

3.2 Results from the Gibbs Sampler
We run the Gibbs sampler separately for each year to avoid artifacts from pooling
heterogeneous years.5 For each year, the Gibbs sampler yields 500 samples of the
latent variable Bernoulli posterior distribution of each firm, zi,t , where t is the time
subscript. Discarding any observation for which zi,t = 1 in less than 95% of the iterations,
we use the remaining sample for our subsequent analysis.
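A minimal sketch of this retention rule in R, assuming z_draws is an iterations-by-observations matrix of sampled indicators for one year (the matrix and function names are illustrative):

# z_draws: one row per Gibbs iteration, one column per firm-year observation
# An observation is kept only if z = 1 in at least 95% of the retained iterations
keep_signal <- function(z_draws, burnin = 50, threshold = 0.95) {
  kept <- z_draws[-seq_len(burnin), , drop = FALSE]  # drop burn-in iterations
  colMeans(kept) >= threshold                        # TRUE = observation stays in the signal set
}
# signal_rates <- y[keep_signal(z_draws)]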
5 We compute all sampling algorithms with the software R (R Core Team, 2013).

Figure 3: Box plots for the Gibbs sampled q, µ1, σ1, 1950-2012. [Figure: three panels of year-by-year box plots, one each for q, µ1, and σ1.]

Figure 3 shows box plots of the posterior distributions of q, µ1, and σ1 for every year after discarding the first fifty “burn-in” samples to allow the Markov chain to converge to the stationary distribution. All distributions are sharply peaked,
confirming convergence of the distribution.6
The first thing that stands out in both Figure 4 and Figure 3 is a structural break in the profit rates and the Gibbs-sampled parameters around 1960. If the results are
divided into sectors - an analysis we cannot pursue further here - the most obvious
departure from later years’ distributions is in the transport sector. This might be
due to competition-stifling legislation. Moreover, the number of annual data points
before 1960 is less than a thousand, but rises rapidly to more than 5000 in 1974,
reaching peaks of over 10,000 in the 1990s. Therefore, section 4 of this paper will
focus on the period after 1960.
Secondly, after 1960 the Gibbs parameters display trends. q, the parameter that indicates the share of firms in the signal distribution, hovers at almost one and gradually drops to around 95%. This implies that less than 5% of the data are noise according to our algorithm. µ1, the location parameter of the Laplace signal distribution, fluctuates around a falling trend, its median falling from 17% in 1965 to a low of 7% in 2001, preliminary evidence for a falling rate of profit. The σ1 series, showing the scale parameter of the Laplace signal, follows a rising trend that peaks in 2001.

6 See Appendix C for trace plots of the sampled signal and mixture parameters.

Table 2: Summary statistics, raw (r) and signal (r*) profit rates

       Min        1st Qu.    Median     Mean       3rd Qu.    Max
r      -9860      0.02       0.10       6.67       0.17       153,600.00
r*     -1.65300   0.04265    0.11630    0.07506    0.17810    2.42200
3.3 Analyzing the Signal Distribution
As Table 2 shows, the very dispersed distribution has been reduced to a well-behaved
one without having made any arbitrary cut-offs as regards firm size or length of
survival. We believe this captures the entire gamut of competition. The comparative
summary statistics, moreover, show that the maximum and minimum values have
been substantially shrunk towards the median, while the interquartile range has
remained virtually unaffected. This reflects the setup of the mixture distribution, which keeps the signal and discards the highly dispersed noise, as illustrated in Figure 4.
The semi-log tent-plot of the pooled histogram of the signal distribution in Figure 5 shows an extremely peaked mode and the tent shape characteristic of the Laplace distribution.7 The strong clustering around the mean profit rate, which we refer to as the general rate of profit, shows minimal fluctuation over time (see Figure 7 below for single-year plots). The second striking feature in Figure 5 is a left skew in the distribution. This skew is likely due to entry and exit dynamics, where uncompetitive firms realize very low profit rates and either die or are absorbed by more competitive firms, or new firms realize initially little income.

7 It is important to realize that this result is independent of the choice of the Laplace distribution as the signal. It is the data that have this tent shape; it is not the sampling distribution that “forms” this shape, since the error part of the mixture takes out none of the data around the mode. This shape confirms the results of the previous literature, and our prior conjecture of Laplace distributed data was well chosen.

Figure 4: The uncensored year-by-year box plots of the filtered profit rates. [Figure: box plots of r by year, 1950-2012, ranging from about −2 to 2.]

Figure 5: The uncensored histogram of the filtered profit rates. The plots look similar for observations of single years. [Figure: semi-log histogram, p[r] against r over roughly −1 to 2.]
Thus, unlike previous authors, the latent variable Gibbs sampler procedure yields
a distribution without having to make additional assumptions about minimum size
or age. In particular we keep more than 96% of our data8 despite our restrictive
selection condition of more than 95% of z samples being one. In the next section,
we use Bayesian methods to explore removing the skew in the distribution in order
to locate statistical regularities as in Alfarano et al. (2012).
4 Change Point Analysis

4.1 Filtering for Larger Firms
Figure 5 shows that a tent shape is clearly perceptible in a semi-log plot of the signal
distribution; however, some asymmetry remains. Alfarano et al. (2012) conjecture
that this skew is due to entry and exit dynamics of young firms, and that the competitive process of long-lived, more established firms is characterized by a symmetric Laplace distribution. We extend this conjecture to firm size and, continuing with our Bayesian approach, we investigate this hypothesis for the present, noiseless data. Using a simple Bayesian change point analysis on the first moment of the distribution, conditional on capital stock instead of age, we determine which subset of our data is distributed symmetrically about the mode. The change point in the mean thus indicates the change from an asymmetric to a symmetric distribution.
The use of the first moment in our change point analysis is motivated by the idea
that smaller firms are subject to entry and exit dynamics, resulting in their average profit rates being significantly different from the general rate of profit, while larger firms’ profit rates will tend to be clustered around it. We partition firms by their size and use the conditional mean profit rate to determine from which size onwards firm profit rates converge to the general rate of profit.

8 The exact percentage varies slightly depending on which pseudorandom number seed is set for the simulation.
This is achieved by computing the posterior probability of a change point between
each two adjacent partition blocks. We partition the firm capital stock space into
vigintiles (twenty evenly sized quantiles) and, after estimating the posterior probability of a change point in the mean between each pair of adjacent vigintiles for each year separately, we discard all firms in the vigintiles below those with high posterior probabilities of a change point. As an example, Figure 6 illustrates that for 1995 firms below the eleventh vigintile are discarded. By using vigintiles of the capital stock for each year instead of a dollar value, we avoid the problem of adjusting for inflation. We set up the change point analysis with MCMC methods as in Barry and Hartigan (1993). Using the Erdman and Emerson (2007) package that implements Barry and Hartigan’s change point algorithm in R, we run a Gibbs sampler for 500 iterations, excluding the first 50 burn-in samples.
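A sketch of this step in R using the bcp package of Erdman and Emerson (2007); the vigintile construction and the names y (filtered profit rates for one year) and k (matching capital stocks) are illustrative, and the bcp arguments and result components are used as we understand that package's interface:

library(bcp)

# Mean profit rate within each vigintile (twenty evenly sized capital-stock quantiles)
vigintile <- cut(k, breaks = quantile(k, probs = seq(0, 1, by = 0.05)),
                 include.lowest = TRUE, labels = FALSE)
vigintile_means <- tapply(y, vigintile, mean)

# Barry and Hartigan (1993) change point analysis on the sequence of conditional means
fit <- bcp(as.vector(vigintile_means), burnin = 50, mcmc = 500)

# Posterior probabilities of a change point after each vigintile; firms in vigintiles
# up to the last one with a high probability are discarded
fit$posterior.prob

Plotting fit$posterior.mean and fit$posterior.prob against the vigintile index corresponds to the two panels shown in Figure 6.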
This analysis provides a lower bound for firm size and selects a subset of large,
well-established firms. The distribution of this part of the competitive process is
strikingly symmetric and Laplace-shaped in every year, as Figure 7 shows. Moreover,
the symmetry is present in every industry, and all distributions are peaked at almost
the same value, as was already shown in Figure 1 in the introduction.
This further process of filtration is not very restrictive, as, on average, the change point analysis discards only 45% of the initial, raw dataset. That is, the subset of the competitive process characterized by the Laplace distribution, which filters out the entry and exit dynamics of small firms, still comprises 55% of the original, entire dataset. Only 38% of these remaining firms are alive 27 years or longer.9

Figure 6: Example of the output of the change point analysis for the mean profit rate conditional on capital stock for 1995. The upper panel shows how the conditional mean profit rate converges to a steady value and the lower panel shows significant posterior probabilities of a change point until the 10th vigintile. [Figure: two panels against vigintile location 0-20; upper panel, posterior means of the conditional mean profit rate; lower panel, posterior probabilities of a change point.]
4.2 Statistical Equilibrium of Large Firms
While the change point analysis is only a rough one and could be refined to tease out more precisely which firms belong to the symmetric distribution, it highlights the striking symmetry and stationarity of this subset’s distribution. Comparing its regular features to the strongly varying statistics of the entire competitive process in Figure 8 emphasizes this important regularity.
9 By way of comparison, where we lose only 45% of the data, Alfarano et al. (2012) discard 79% of their observations with their method of selecting firms that are alive the entire span of their dataset, 27 years, a number which Mishael Milaković kindly shared with us. Had we instead filtered out firms that are alive less than 27 years, we would have kept only 35.5% of the original sample. Moreover, had we focused on firms surviving the entire period, 63 years, we would have kept only 2.2% of the dataset.
Figure 7: Histogram of profit rates of large firms remaining after the change point analysis for a representative subset of years after 1960. [Figure: twelve semi-log panels, p[r|year] against r, for the years 1965, 1969, 1973, 1977, 1981, 1985, 1989, 1993, 1997, 2001, 2005, and 2009.]
The stationarity of the distribution can be interpreted as a statistical equilibrium, and its non-Gaussian shape entails that there are other constraints than the
mean and variance acting on the shape of the distribution. Other authors have explained such phenomena in economic data based on maximum entropy reasoning,
e.g. Farjoun and Machover (1983); Stutzer (1983); Foley (1994); Dragulescu and
Yakovenko (2000); Milaković (2003); Alfarano and Milaković (2008); Alfarano et al.
(2012); Chakrabarti et al. (2013). It is attractive to explain the empirical result
by analogy to statistical physics: the distribution of firm profit rates achieves its maximum entropy distribution, that is, we see the distribution of profit rates in or very near to the macro state (which determines the shape of the distribution) that can be achieved by the largest number of combinations of micro states subject to constraints.10 The shape of the macro state is determined by constraints acting on the ensemble of firms. The Laplace distribution is the maximum entropy distribution given a constraint of a constant mean of the absolute values of profit rates.11

Figure 8: The year-by-year box plots of the profit rates of large firms remaining after the change point analysis (left panel) and, for comparison, the box plots of the entire competitive process, reproduced from Figure 4 (right panel). The regularity of the distribution of large firms emerges after 1960. [Figure: two panels of box plots of r by year, 1950-2012, ranging from about −2 to 2.]
10 An alternative, information-theoretic foundation for this reasoning, on which some of the cited literature in fact draws, is in Jaynes (2003).
11 Mathematically, the Laplace distribution is the density that maximizes the entropy H of the probability distribution p(y) subject to the normalizing and first-moment-of-absolute-values constraints, that is

max_p H[p(y)]   s.t.   ∫_{−∞}^{∞} p(y) dy = 1   and   ∫_{−∞}^{∞} |y| p(y) dy = c,

where the Lagrange multiplier of the moment of absolute values is equal to the entropy maximizing Laplace scale parameter.
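For completeness, a short sketch of the maximization stated in footnote 11, using standard maximum entropy algebra. Form the Lagrangian

L[p] = − ∫ p(y) ln p(y) dy + λ0 ( ∫ p(y) dy − 1 ) + λ1 ( ∫ |y| p(y) dy − c ).

Pointwise stationarity in p(y) gives −ln p(y) − 1 + λ0 + λ1 |y| = 0, so p(y) ∝ exp(λ1 |y|). Normalizability requires λ1 < 0; writing λ1 = −1/σ and normalizing yields the Laplace density p(y) = exp(−|y|/σ)/(2σ) with location zero, and the absolute-moment constraint then pins down the scale as σ = c.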
A stationary Laplace distribution over more than five decades invites an entropic
viewpoint and conveys important information about the strong competitive forces
active in the economy on both an intra- and inter-industry level. The challenge is
to motivate the constraint more thoroughly with economic theory. Numerically, the
Laplace constraint implies that for any profit rate that moves away from the mean,
another has to move towards it. Classical competition theory, focusing on how the opposing forces of competition and individual firms’ innovative efforts result in a distribution, appears a good starting point for more theoretical work.
5 Conclusions
This paper has contributed striking evidence for Laplace distributed profit rates. We
have shown that by using a Bayesian mixture model with latent variables and MCMC
methods it is possible to filter out confounding noise and extract a signal distribution
of firm profit rates. This endogenous selection process results in an effective filter
with little information lost, allowing us to investigate the distributional form of profit
rates in the U.S. economy that captures the competitive process. We find that a Laplace distribution with a negative skew characterizes the economy-wide distribution of profit rates. Using a Bayesian change point analysis we find that large firms are characterized by a stationary symmetric Laplace distribution. We also find this to be the case at the sectoral level, suggesting similar inter- and intra-industry
dynamics. Our analysis both confirms Alfarano et al. (2012), who similarly find that
for a sample of long-lived firms the Laplace distribution is consistent with their data,
and extends the finding of sectoral Laplace distributed growth rates in Bottazzi and
Secchi (2003a) to profit rates.
The Bayesian mixture and change point models have the advantage of being very
versatile and are by no means restricted to firm profit rate analysis. In particular,
mixtures can be applied to any distribution of objects to be partitioned, where it is
necessary to know for each observation into which subset of the partition it belongs,
while Bayesian change point analysis can be applied to any regime change problem where the threshold value is not immediately obvious. Further methodological
research can be directed towards a mixture distribution with more than two components, where a third one might capture the small firms affected by entry and exit
dynamics.
On the theoretical level, interpreting the large firm distribution as a statistical
equilibrium in terms of maximum entropy appears attractive, but needs further theoretical investigation. In addition, the empirical transition dynamics that give rise
to the distribution need consideration, which would also offer insight into the inter-industry dynamics, and whether the profit rate gravitation around a general rate of profit is observed.
References
Alfarano, S., Milaković, M., 2008. Does classical competition explain the statistical
features of firm growth? Economics Letters 101, 272–274.
Alfarano, S., Milaković, M., Irle, A., Kauschke, J., 2012. A statistical equilibrium
model of competitive firms. Journal of Economic Dynamics and Control 36, 136–
149.
Amaral, L.A.N., Buldyrev, S.V., Havlin, S., Salinger, M.A., Stanley, H.E., Stanley, R., 1997. Scaling behavior in economics: The problem of quantifying company growth. Physica A, 124.

Barry, D., Hartigan, J., 1993. A Bayesian analysis for change point problems. Journal of the American Statistical Association 88, 309–319.
Bottazzi, G., Dosi, G., Lippi, M., Pammolli, F., Riccaboni, M., 2001. Innovation
and corporate growth in the evolution of the drug industry. International Journal
of Industrial Organization 19, 1161–1187.
Bottazzi, G., Secchi, A., 2003a. Common properties and sectoral specificities in the
dynamics of u.s. manufacturing companies. Review of Industrial Organization 23,
217–232.
Bottazzi, G., Secchi, A., 2003b. Why are distributions of firm growth rates tent-shaped? Economics Letters 80, 415–420.
Buldyrev, S.V., Growiec, J., Pammolli, F., Riccaboni, M., Stanley, H.E., 2007. The
growth of business firms: Facts and theory. Journal of the European Economic
Association 5, 574–584.
Casella, G., George, E.I., 1992. Explaining the Gibbs sampler. The American Statistician 46, 167–174.
Chakrabarti, B., Chakraborti, A., Chakravarty, S., Chatterjee, A., 2013. Econophysics of Income and Wealth Distributions. Cambridge University Press.
Dragulescu, A., Yakovenko, V., 2000. Statistical mechanics of money. The European
Physical Journal B - Condensed Matter and Complex Systems 17, 723–729.
Erdman, C., Emerson, J., 2007. bcp: An R package for performing a Bayesian analysis of change point problems. Journal of Statistical Software 23(3), 1–13.
Farjoun, F., Machover, M., 1983. Laws of Chaos: A Probabilistic Approach to
Political Economy. Verso.
Farmer, J.D., Lux, T., 2008. Introduction to special issue on applications of statistical
physics in economics and finance. Journal of Economic Dynamics and Control 32,
1 – 6. doi:http://dx.doi.org/10.1016/j.jedc.2007.01.019.
Foley, D.K., 1994. A statistical equilibrium theory of markets. Journal of Economic
Theory 62, 321–345.
Gelfand, A.E., Smith, A.F.M., 1990. Sampling-based approaches to calculating
marginal densities. Journal of the American Statistical Association 85, 398–409.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2003. Bayesian Data Analysis.
Chapman Hall/CRC.
Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741. doi:http://doi.ieeecomputersociety.org/10.1109/TPAMI.1984.4767596.
Geweke, J., 1989. Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57, 1317–1339.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
Hymer, S., Pashigian, P., 1962. Firm size and rate of growth. Journal of Political
Economy 70, 556–569.
Jaynes, E.T., 2003. Probability Theory: The Logic of Science. Cambridge University
Press.
Kloek, T., van Dijk, H.K., 1978. Bayesian estimates of equation system parameters: an application of integration by Monte Carlo. Econometrica 46, 1317–39.

Koop, G., 1992. Aggregate shocks and macroeconomic fluctuations: a Bayesian approach. Journal of Applied Econometrics 57, 395–411.
Kotz, S., Kozubowski, T., Podgórski, K., 2001. The Laplace Distribution and Generalizations: A Revisit with New Applications to Communications, Economics,
Engineering, and Finance. Birkhauser Verlag GmbH.
Kozumi, H., Kobayashi, G., 2011. Gibbs sampling methods for Bayesian quantile
regression. Journal of Statistical Computation and Simulation 81, 1565–1578.
doi:10.1080/00949655.2010.496117.
Lux, T., 2009. Applications of statistical physics in finance and economics, in: Rosser Jr., J.B. (Ed.), Handbook of Research on Complexity. Edward Elgar Publishing,
Inc., pp. 213–258.
Malevergne, Y., Saichev, A., Sornette, D., 2013. Zipf’s law and maximum sustainable
growth. Journal of Economic Dynamics and Control 37, 1195–1212.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953.
Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21, 1087–1092.
Milaković, M., 2003. Maximum Entropy Power Laws: An Application to the Tail of
Wealth Distributions. LEM Papers Series 2003/01. Laboratory of Economics and
Management (LEM), Sant’Anna School of Advanced Studies, Pisa, Italy.
R Core Team, 2013. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org/.
Simon, H.A., Bonini, C.P., 1958. The size distribution of business firms. American
Economic Review 48, 607–617.
Standard, Poor’s, 2013. COMPUSTAT.
Stanley, M., Amaral, L., Buldyrev, S., Havlin, S., Leschhorn, H., Maass, P., Salinger,
M., Stanley, H., 1996. Scaling behavior in the growth of companies. Nature 379,
804–806.
Stutzer, M.J., 1983. Toward a statistical macrodynamics: an alternative means of
incorporating micro foundations. Technical Report 242. Federal Reserve Bank of
Minneapolis. Research Dept.
Waggoner, D.F., Zha, T., 2003. A Gibbs sampler for structural vector autoregressions.
Journal of Economic Dynamics and Control 28, 349–366.
Zellner, A., 1971. An introduction to Bayesian inference in econometrics. J. Wiley.
6 Appendix A: Data Sources
Data is gathered from the COMPUSTAT Annual North American Fundamentals database. We extract yearly observations of the variables AT = total assets, REVT = total revenue, XOPR = operating cost, SIC = Standard Industrial Classification code, FYR = year, CONM = company name, from 1950 through 2012. At the time of download, data for 2012 was incomplete. Subtracting XOPR from REVT and dividing by AT gives a rate of income per capital stock, and thus a profit rate.
(REVT − XOPR) / AT = OIBDP / AT = r
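A minimal R sketch of this calculation, assuming the extract has been read into a data frame called compustat (an illustrative name) with the column names listed above (REVT, XOPR, AT):

# Profit rate r = (REVT - XOPR) / AT = OIBDP / AT, computed per firm-year observation
compustat$r <- (compustat$REVT - compustat$XOPR) / compustat$AT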
Our raw data set consists of 430,209 observations. We subtract completely missing values as well as government and finance, insurance and real estate - government because it is not engaged in a search for profit maximization but pursues other objectives, and finance, insurance and real estate because their accounting methods differ and part of their income is not recorded in REVT, leading to profit rates of almost zero - and impute the remaining missing values at random. Our completed-case dataset contains 290,931 observations, or roughly 4,600 observations per year. The actual yearly number is much smaller in earlier years and above 10,000 in the 1990s.
7 Appendix B: Mixture Gibbs Sampler Derivation

This appendix derives the Gibbs sampler for a latent variable augmented Laplace-Gaussian mixture distribution. Consider
f(y | q, µ1, σ1, µ2, σ2) = q L(y | µ1, σ1) + (1 − q) N(y | µ2, σ2)   (B.1)
where L denotes a Laplace distribution and N a Normal or Gaussian distribution. We assume that the parameters are all independent of each other a priori and have non-informative prior distributions: uniform for the Laplace parameters and the mixture weight q, and the Jeffreys prior for the Gaussian parameters, as in section 2.1. Introducing latent variables, for every observation i of y, with i = 1, ..., n, this can be written as
f(yi | z, µ1, σ1, µ2, σ2) = [L(yi | µ1, σ1)]^zi [N(yi | µ2, σ2)]^(1−zi)   (B.2)

where zi ∼ Bern(q).
We derive full conditional posteriors for each variable.
7.1 Full conditional posterior densities
We will derive the conditional posterior densities using a few rules of conditional and joint probabilities. They are

p(A|B) p(B) = p(A, B)  ⇔  p(A|B) = p(A, B) / p(B)   (B.3)

∫ p(A, B) dA = p(B)  ⇒  p(A|B) = p(A, B) / ∫ p(A, B) dA   (B.4)
We want to find the density of a parameter conditional on priors and likelihood.
Hence, our generic p(A, B) is the joint hyperprior-prior-likelihood density. The integral integrates out the parameter whose conditional density we are interested in.
Everywhere except for the Laplace, nice properties of the distributions will enable us to avoid the brute-force integral. However, the Laplace distribution is not of the exponential family (Kotz et al., 2001) and will require explicit integration.
7.2 The joint density, Laplace/Normal
With uniform priors for the Laplace distribution and the mixture parameter q, and the Jeffreys prior for the Gaussian distribution, our joint density can be written as follows. Let Z = {z1, ..., zn} and Y = {y1, ..., yn}.
p(q, µ1, σ1, µ2, σ2, Z, Y) = p(q) p(µ1) p(σ1) p(µ2) p(σ2) ∏_{i=1}^{n} p(zi | q) ∏_{i=1}^{n} p(yi | zi)   (B.5)

∝ (1/σ2) ∏_{i=1}^{n} q^zi (1 − q)^(1−zi) ∏_{i=1}^{n} [L(yi | µ1, σ1)]^zi [N(yi | µ2, σ2)]^(1−zi)   (B.6)

where p(q), p(µ1) and p(σ1) are the uniform priors and the factor 1/σ2 is the Jeffreys prior for the Gaussian parameters. In what follows we will omit priors if they are uniform.
7.3 The full conditional posterior densities, Laplace/Normal

7.3.1 z posterior
The posterior for each zi is Bernoulli because zi can only take on two values. Let Z−i = {z1, ..., zi−1, zi+1, ..., zn} and abbreviate the conditioning set {q, µ1, σ1, µ2, σ2, Z−i} as Ω. Then

p(zi = 1 | q, µ1, σ1, µ2, σ2, Z−i, Y)   (B.7)
= p(zi = 1 | Ω, yi)   (B.8)
= p(zi = 1 | Ω, yi) / [ p(zi = 1 | Ω, yi) + p(zi = 0 | Ω, yi) ]   (B.9)
= q L(yi | µ1, σ1) / [ q L(yi | µ1, σ1) + (1 − q) N(yi | µ2, σ2) ]   (B.10)

Thus zi ∼ Bern( q L(yi | µ1, σ1) / [ q L(yi | µ1, σ1) + (1 − q) N(yi | µ2, σ2) ] ).

7.3.2 q posterior

p(q | µ1, σ1, µ2, σ2, Z, Y) ∝ ∏_{i=1}^{n} q^zi (1 − q)^(1−zi)   (B.11)

This is the kernel of a beta distribution, hence q ∼ Beta( Σi zi + 1, n − Σi zi + 1 ).
7.3.3 µ1 posterior

Since the Laplace is a non-exponential family distribution, one cannot directly derive a closed-form conditional posterior distribution.12 The conditional posterior density for µ1 is

p(µ1 | q, σ1, µ2, σ2, Z, Y)   (B.12)
= p(µ1 | σ1, Z, Y)   (B.13)
= ∏_{i=1}^{n} L(yi | µ1, σ1)^zi / ∫_{−∞}^{∞} ∏_{i=1}^{n} L(yi | µ1, σ1)^zi dµ1   (B.14)
= exp[ Σi zi (−|yi − µ1| / σ1) ] / ∫_{−∞}^{∞} exp[ Σi zi (−|yi − µ1| / σ1) ] dµ1   (B.15)
= c exp[ Σi zi (−|yi − µ1| / σ1) ]   (B.16)

where c denotes the reciprocal of the normalizing integral in the denominator of (B.15). Since |yi − µ1| = |µ1 − yi|, we have a product of Laplace distributions and the posterior density is

p(µ1 | σ1, Z, Y) ∼ c ∏_{i=1}^{n} L(µ1 | yi, σ1)^zi   (B.17)
12 Kozumi and Kobayashi (2011) use a mixture representation of the Laplace distribution from Kotz et al. (2001) to find a closed-form solution for the posterior L(0, σ). However, our location parameter is not equal to zero and therefore we have to resort to numerical integration at every iteration of the Gibbs sampler.
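One simple way to carry out this numerical step is a discretized inverse-CDF draw on a grid; the sketch below is illustrative and not necessarily the exact implementation used for the results in the main text:

# Draw mu1 from the unnormalized full conditional (B.17) on a grid around the signal observations
draw_mu1_grid <- function(y, z, sigma1, grid_size = 2000) {
  center <- median(y[z == 1])
  grid   <- seq(center - 5 * sigma1, center + 5 * sigma1, length.out = grid_size)
  # log of the unnormalized density sum_i z_i * (-|y_i - mu1| / sigma1) at each grid point
  log_dens <- sapply(grid, function(m) sum(z * (-abs(y - m) / sigma1)))
  weights  <- exp(log_dens - max(log_dens))       # stabilize before normalizing
  sample(grid, size = 1, prob = weights / sum(weights))
}
# The grid window and resolution are pragmatic choices; they should be fine
# relative to the spread of the conditional posterior of mu1.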
7.3.4 σ1 posterior

p(σ1 | µ1, Z, Y) ∝ ∏_{i=1}^{n} L(yi | µ1, σ1)^zi   (B.18)

∝ ∏_{i=1}^{n} σ1^{−zi} exp[ zi (−|yi − µ1| / σ1) ]   (B.19)

= σ1^{−Σi zi} exp[ Σi zi (−|yi − µ1| / σ1) ]   (B.20)

This is an inverse gamma kernel. Therefore

p(σ1 | µ1, Z, Y) ∼ IG( 1 + Σi zi, Σi zi |yi − µ1| )   (B.21)

7.3.5 µ2 posterior
The derivation under the Jeffreys prior is given implicitly in many textbooks, e.g. Gelman et al. (2003, p. 46), where the Gaussian conjugate prior is derived. The result is a normal distribution for µ2 and an inverse gamma distribution for σ2.

p(µ2 | q, µ1, σ1, σ2, Z, Y) ∝ ∏_{i=1}^{n} [N(yi | µ2, σ2)]^(1−zi)   (B.22)

∝ exp[ −(1/2) ((n − Σi zi)/σ2) ( Σi (1 − zi) yi / (n − Σi zi) − µ2 )² ]   (B.23)

Hence p(µ2 | q, µ1, σ1, σ2, Z, Y) ∼ N( Σi (1 − zi) yi / (n − Σi zi), σ2 / (n − Σi zi) ).
7.3.6 σ2 posterior

p(σ2 | q, µ1, σ1, µ2, Z, Y) ∝ (1/σ2) ∏_{i=1}^{n} [N(yi | µ2, σ2)]^(1−zi)   (B.24)

∝ σ2^{−((n − Σi zi)/2 + 1)} exp[ −(1/(2σ2)) Σi (1 − zi)(yi − µ2)² ]   (B.25)

Hence p(σ2 | q, µ1, σ1, µ2, Z, Y) ∼ IG( (n − Σi zi)/2, (1/2) Σi (1 − zi)(yi − µ2)² ).
This completes the derivation of the Gibbs sampler. It generates a value for each zi from its full conditional posterior. After a large number of Monte Carlo runs, the fraction of draws with zi = 1 can be taken as the probability that the underlying firm observation belongs to the Laplace (the signal) distribution. The pseudocode is presented below.13
7.4 The Gibbs sampler pseudocode, Laplace/Normal

Select initial values for qj, µ1,j, σ1,j, µ2,j, σ2,j, Zj, with j = 0.

for (j in 0 through J):

1. qj+1 ∼ Beta( Σi zi,j + 1, n − Σi zi,j + 1 )

2. zi,j+1 ∼ Bern( q L(yi | µ1, σ1) / [ q L(yi | µ1, σ1) + (1 − q) N(yi | µ2, σ2) ] ), for i = 1, ..., n

3. µ1,j+1 ∼ c1 ∏_{i=1}^{n} L(µ1,j+1 | yi, σ1,j)^zi,j+1

4. σ1,j+1 ∼ IG( 1 + Σi zi,j+1, Σi zi,j+1 |yi − µ1,j+1| )

5. µ2,j+1 ∼ N( Σi (1 − zi,j+1) yi / (n − Σi zi,j+1), σ2,j / (n − Σi zi,j+1) )

6. σ2,j+1 ∼ IG( (n − Σi zi,j+1)/2, (1/2) Σi (1 − zi,j+1)(yi − µ2,j+1)² )

13 This can be easily transformed into e.g. a Laplace/Laplace Gibbs sampler by repeating steps 3 and 4 for the noise distribution instead of carrying out 5 and 6, and replacing N in step 2 by L.
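The pseudocode translates into a compact R implementation along the following lines. This is a sketch under the conventions used above (sigma2 treated as the Gaussian variance, the mu1 step done by a grid draw as in section 7.3.3, and the full conditionals visited in a slightly different but equally valid scan order); function and variable names are illustrative:

# Gibbs sampler for the latent variable Laplace/Gaussian mixture of (B.2)
# y: vector of profit rates for one year; iters: total iterations including burn-in
gibbs_mixture <- function(y, iters = 500, grid_size = 2000) {
  n <- length(y)
  draws <- list(q = numeric(iters), mu1 = numeric(iters), sigma1 = numeric(iters),
                mu2 = numeric(iters), sigma2 = numeric(iters),
                z = matrix(0L, iters, n))
  # crude starting values
  q <- 0.9; mu1 <- median(y); sigma1 <- mean(abs(y - mu1))
  mu2 <- mean(y); sigma2 <- var(y)
  dlaplace <- function(x, m, s) exp(-abs(x - m) / s) / (2 * s)

  for (j in seq_len(iters)) {
    # latent indicators from their Bernoulli full conditionals (step 2)
    p_sig   <- q * dlaplace(y, mu1, sigma1)
    p_noise <- (1 - q) * dnorm(y, mu2, sqrt(sigma2))
    z <- rbinom(n, 1, p_sig / (p_sig + p_noise))

    # mixture weight from its Beta full conditional (step 1)
    q <- rbeta(1, sum(z) + 1, n - sum(z) + 1)

    # Laplace location by a discretized draw from its non-standard full conditional (step 3)
    grid <- seq(mu1 - 5 * sigma1, mu1 + 5 * sigma1, length.out = grid_size)
    lw   <- sapply(grid, function(m) sum(z * (-abs(y - m) / sigma1)))
    w    <- exp(lw - max(lw))
    mu1  <- sample(grid, 1, prob = w / sum(w))

    # Laplace scale from its inverse-gamma full conditional (step 4)
    sigma1 <- 1 / rgamma(1, shape = 1 + sum(z), rate = sum(z * abs(y - mu1)))

    # Gaussian noise parameters from their normal / inverse-gamma conditionals (steps 5 and 6)
    m <- n - sum(z)
    if (m > 0) {  # update the noise component only if some observations are assigned to it
      mu2    <- rnorm(1, sum((1 - z) * y) / m, sqrt(sigma2 / m))
      sigma2 <- 1 / rgamma(1, shape = m / 2, rate = sum((1 - z) * (y - mu2)^2) / 2)
    }

    draws$q[j] <- q; draws$mu1[j] <- mu1; draws$sigma1[j] <- sigma1
    draws$mu2[j] <- mu2; draws$sigma2[j] <- sigma2; draws$z[j, ] <- z
  }
  draws
}

Running this year by year and applying the 95% retention rule from section 3.2 to draws$z, after dropping the burn-in iterations, reproduces the filtering step described in the main text.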
8 Appendix C: Convergence of Gibbs Sampler
The convergence of the marginal distributions for each parameter in the signal distribution from the mixture model is shown in the trace plot in Figure C.1 for selected
years. Trace plots show sampled values against the simulation index and give a
good visual diagnostic for the convergence of the Markov chain to its stationary
distribution. It is clear that after 10 iterations the chain converges to a stationary distribution, fluctuating narrowly around the mode of the posterior distribution.
While the mode for each parameter is non-stationary over time, each year's sampled distribution clearly is stationary once the initial burn-in of 50 iterations is discarded.
Figure C.1: Trace plots for the Gibbs sampled q, µ1, σ1 for the years 1950, 1960, ..., 2010. [Figure: three panels (q, µ1, σ1) of trace plots against iteration 0-500, one line per decade year from 1950 to 2010.]