A Model of Firm Experimentation Under Demand
Uncertainty with an Application to Multi-Destination
Exporters
Zhanar Akhmetova, Cristina Mitaritonna
June 2012
Abstract
We propose a model of experimentation by firms in the presence of demand uncertainty in a new market. The firm can postpone paying the sunk cost of full-scale entry
into the market, and learn more about the demand for its product by accessing a few
consumers at some cost. This is the first formal model of test marketing where the
firm chooses optimally the experimentation intensity (number of consumers accessed),
as well as exit/entry policy. When applied to the trade literature, the model can help
explain the observed patterns of the new exporter dynamics: increasing average export
volumes over time, and falling exit rates. We structurally estimate the model using
French firm-level export data, by extending it to a setting where a given geographical
region is viewed as a target market for the firm, and individual destinations within the
region as consumers. Based on the estimation results, we carry out simulation exercises, where we study the e↵ect of lower import tari↵s and higher information precision
in a target market on the behavior of French exporters.
1
Introduction
Recently available firm-level export data has provided crucial insights into the dynamic
behaviour of individual exporters. In particular, we observe that new exporters in a given
market exhibit patterns di↵erent from those of old exporters, which cannot be explained by
standard models. Instead, dynamic models that are tailored to the problems that exporters
face in new markets are required. The main feature of these models is the presence of
demand uncertainty, a significant sunk cost of entry, and the possibility of learning from
exporting activity, so that new exporters start out with little information and acquire it
as they export. In this section, we showcase the empirical patterns that have attracted
attention in recent papers, using French firm-level data over 1995-2005. We briefly describe
the model that we propose to explain these findings. This model o↵ers unique features, such
as a testing technology that allows firms to learn more about demand before investing in the
sunk cost of entry, and active learning by firms that recognize the information value of their
own exports, and optimally choose the duration and intensity of experimentation. We also
present anecdotal evidence of such test marketing in domestic and foreign markets.
The first pattern found in the literature is the small initial export volumes (in quantities)
of new exporters, and their later expansion over time. This has been documented by Eaton
et al. (2007) for Colombia, and Eaton et al. (2008) for France, who find that many new
exporters export very small quantities and are likely to drop out of the foreign market in
the following few years. This is demonstrated in Figure 1 for French exporters, where on the
horizontal axis we measure the date since first exports of an individual 8-digit-code product
to an individual destination (where we treat as new exporters all firms who did not export
in 1995, the first year in our dataset, but did so later), and on the vertical axis we measure
the (averaged over all firms exporting at that date) quantity of the 8-digit-code product,
normalized by the average of the firm’s quantity exported over the period. The general
upward trend in these normalized quantities reveals an expansion over time in firm exports,
as the firm exporting age increases. We focus on the ratio of quantity exported to the average
over all years of the quantity exported by the firm of the given 8-digit product in the given
country, to make sure that this expansion is not a result of the survival of larger exporters
over time. Instead, on average each firm expands relative to its own initial size. This can
be explained by the uncertainty that these firms face - they are not sure what the demand
for their product is in the foreign market, and start out small. As they learn more about
their demand, they either expand (if they find that their demand is higher than expected),
or exit the market (if they find that their demand is too low to justify the exporting cost).
This also implies that the exit rates will tend to fall over time, as the worst exporters are
weeded out early on, and only the best ones, those that are less likely to exit, remain. We
document this pattern in Figure 2.
Another dimension along which we can examine these dynamics, is the number of destinations that firms export to. Exporting to one destination can provide information about
demand in other, similar, destinations. This interaction between exporting to di↵erent destinations has been studied in several papers (Albornoz et al. (2009), Nguyen (2008)). We can
divide the world into several groups of countries which are ‘close’ - geographically, culturally,
economically - and see if we can detect patterns of expansion of exports within regions. For
example, consider Figure 3, where we follow the average number of destinations belonging
to the same group (the first fifteen members of the European Union, excluding France, that
is, Austria, Belgium, Germany, Denmark, Spain, Finland, the UK, Greece, Ireland, Italy,
Luxembourg, Netherlands, Portugal, and Sweden) where French firms exported beauty or
make-up preparations (4-digit code 3304) between 1995 and 2005. As in previous graphs,
on the horizontal axis we plot the date since first exports to this region (where we now
treat as new exporters firms that did not export the particular 4-digit-code product to any
destination within the region in 1995, but did so later), and on the vertical axis we plot the
number of export destinations within this region, averaged over all firms exporting at that
date. The dots show the average over all firms exporting at that date, while the solid line
tracks the average over all exporters that survived for at least 7 years in the market. This
2
2
Average normalized export quantity
1.8
1.6
1.4
1.2
1
0.8
0.6
0.4
0.2
0
1
2
3
4
5
6
7
8
9
10
Date since first exports
Figure 1: Ratio of quantity (8-digit product) to average quantity of the firm over the period,
averaged over all new exporters, consumer non-durable goods
0.5
Exit rate
0.4
0.3
0.2
0.1
0
1
2
3
4
5
6
7
8
9
10
Date since first exports
Figure 2: Exit Rate (8-digit product), of all new exporters, consumer non-durable goods
3
way, we make sure that the expansion in the average is not purely due to the selection e↵ect
- less productive and smaller firms exiting in the first few years.
4.5
Average number of destinations
4
3.5
3
2.5
2
1.5
Conditional
1
Unconditional
0.5
0
1
2
3
4
5
6
7
8
9
10
Date since first exports
Figure 3: Example of average size evolution, unconditional and conditional on survival for at
least 7 years. Exports of beauty or make-up preparations by French firms to the first fifteen
members of the EU, excluding France.
In Figure 4, we study the exports of footwear with outer soles of rubber, plastics, leather
or composition leather and uppers of leather (4-digit code 6403). This time, we consider the
average number of export destinations within the region of Eastern Europe (broadly defined,
and including countries such as Albania, Bosnia and Herzegovina, Bulgaria, Czech Republic,
Estonia, Croatia, Hungary, Lithuania, Latvia, Montenegro, Macedonia, Poland, Romania,
Serbia, Slovenia, and Slovakia). In both graphs, the average grows over time, and reaches a
stable level by date 8 or 9. This pattern can serve as evidence of learning by new exporters in
these regions, who on average experiment for 8-9 years, before expanding to a larger scale - 4
countries and 3 countries in the EU (first fifteen members) and Eastern Europe, respectively.
In Figure 5, we contrast the average exit rates of old and new exporters, calculated
separately for each 4-digit-code and region pair. An old exporter here is a firm that exported
the 4-digit-code product to at least one destination within the region in 1995. An exit is
a switch in the number of exporting destinations (of a given 4-digit-code product) from a
positive number to zero, followed by zeros until 2005. The scatter diagram shows that for
most of the 726 product and region pairs under consideration, the exit rate of old exporters
is lower than that of new exporters - the scatter points lie above the dotted 45-degree line.
Taking learning into account can reconcile the di↵erent exit rates, since new exporters exit
not only due to exogenous factors, as old exporters do, but also due to negative demand
signals, that tell them staying in the market is not worthwhile. In Figure 6, we compare the
4
Average number of destinations
3.5
Conditional
3
Unconditional
2.5
2
1.5
1
0.5
0
1
2
3
4
5
6
7
8
9
10
Date since first exports
Figure 4: Example of average size evolution, unconditional and conditional on survival for
at least 7 years. Exports of footwear by French firms to Eastern Europe.
1
0.9
0.8
New exporters
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Old exporters
Figure 5: Exit rates of new and old exporters over 726 pairs of regions and 4-digit-code
products
5
7
6
New exporters
5
4
3
2
1
0
0
1
2
3
4
5
6
7
Old exporters
Figure 6: Average export sizes (number of destinations) of new and old exporters over 726
pairs of regions and 4-digit-code products
average number of export destinations of new and old exporters within each product and
region pair, where we first calculate the average number of destinations over the period 19952005 for each firm, and then take the mean of these averages, separately, over old and new
exporters in each pair. We see that on average the old exporters export to more destinations
than new exporters in most product-region pairs, as the scatter points lie below the dotted
45-degree line. This suggests that new exporters start small, and expand to larger sizes as
they age, i.e. become old exporters.
The empirical evidence is consistent with the hypothesis that firms may start small in
a foreign market when they are unsure of their profitability there, in order to collect more
information about the market and to either expand or quit the market later, depending on
their observations. Moreover, Figures 3 and 4 suggest that there may be an optimal scale
that firms switch to, once they finish learning and decide to stay in the market. We propose
a theory of firm behavior that assumes demand uncertainty in a foreign market, where the
firm is uncertain about the mean of a demand shift parameter that a↵ects its profitability.
There is a large sunk entry cost one needs to incur to access the entire market. While a firm
may be unwilling to export to this market under such conditions, we introduce an additional
assumption. We allow the firm to postpone its full-scale entry, and instead to access a few
consumers in the foreign market at a certain variable cost, called testing cost. The sales to
the individual consumers serve as noisy signals about the mean demand parameter, and the
firm uses this information to update its beliefs about demand. Based on these beliefs, the
firm decides whether it should incur the large sunk entry cost to access the entire market
(full-scale entry, or simply entry in what follows), quit the market, or keep experimenting,
and if so, how many consumers to access. When deciding on the number of consumers, the
6
firm recognises the information value of an additional consumer, and weighs the marginal
benefit of the consumer (current profits plus information value) against the marginal testing
cost.The firm can keep experimenting as long and as intensively, as it finds optimal.
Essentially, one is assuming two-tiered costs of accessing the foreign market: the variable
testing costs to access a few random consumers, and the sunk costs of entry to access the
entire market. One can think of the sunk cost as the cost of establishing a distribution
and marketing network in the foreign market, and signing long term shipping contracts,
and of the testing costs as the costs of accessing a few consumers by temporarily hiring
marketing agencies in the foreign country and locating temporary shipping services. Clearly,
we need to assume that the discounted cost of accessing the entire market using the testing
technology forever is higher than the sunk cost of entry, so that the cheapest way of accessing
the entire market over a long horizon is investing in the sunk entry cost, rather than using
the testing technology indefinitely. This assumption is plausible, since temporary, shortterm arrangements with distributors, shipping and marketing agencies are likely to be more
expensive on average than long-term arrangements or in-house services dedicated to the
exporting activity. This is supported by Klompmaker et al. (1976), who mention various
direct and indirect costs of test marketing, and in particular, ‘higher media expenses because
of low volumes, ..., and higher trade allowances to obtain distribution’.
That this kind of experimentation takes place when a firm creates a new product and
aims to sell it in the domestic market, has been documented in the marketing literature,
as test marketing. As defined in the Handbook of Marketing Research (Ferber, 1974), ‘test
marketing is a controlled experiment, done in a limited but carefully selected part of the
marketplace, whose aim is to predict the sales or profit consequences, either in absolute
or in relative terms, of one or more proposed marketing actions. It is essentially the use
of the marketplace as a laboratory and of a direct sales measurement which di↵erentiates
this test from other types of market research’. While many papers in the marketing field
point out the importance of test marketing, there are few papers that model this activity.
Hitsch (2005) builds a model of test marketing, where the firm launching a new product
and uncertain about its demand, has to choose an optimal exit and advertising policy. In
that model, advertising is positively related to expected demand (through expected higher
profits), but the level of advertising does not a↵ect the precision or speed of learning. So
the firm does not have to choose an optimal experimentation intensity, while in our model
the number of consumers accessed in the experimentation phase is a decision variable that
a↵ects the precision of obtained information.
Numerous cases of test marketing have been documented. For example, Heineken test
marketed Heineken Premium Light in Phoenix, Dallas, Providence, and Tampa in 2005,
before launching it nationally in 2006. Procter and Gamble test marketed Align, a probiotic
dietary supplement, in Cincinnati, Dallas and St.Louis for two years (2007-2009) before
rolling out the product. In 2009, Coca Cola was reported to test market a sweetened fizzy
milk beverage called Vio. About 200 retailers in New York city were reported to be carrying
the drink. Columbus, Ohio, is called the test market capital of the US, as its inhabitants
are believed to form a representative cross-section of the US population. There is no doubt
7
that the same kind of demand uncertainty exists when a firm starts exporting to a new
foreign market. Even if a product has been produced and sold in the domestic and other
foreign markets previously, the firm may be uncertain about the demand for this product
in a new destination. The difficulty of expanding into new export markets is acknowledged
by government agencies, such as the US Department of Commerce, who o↵er extensive
assistance to businesses wishing to export. According to a source in the US Department of
Commerce, US companies often start exporting in the maquiladora area in Mexico, and later
expand to the rest of Mexico, if successful. As another example, Austria apparently serves
as a test market for cell phone companies looking to export to German-speaking countries.
In 2008, Apple introduced iPhone 3G in Austria as one of the few test markets in the world.
India is another country where phone companies test new phones first before exporting to
other destinations. These stories tell us that the notion of a representative consumer can be
interpreted in many di↵erent ways. A consumer can be viewed as an individual, a household,
a retail store, a city, a state/province, or a country. We will use this observation to apply
our model to a setting where the firm views a geographical region as a target market, and
countries within the region as individual consumers.
The model provides a new way of looking at the dynamics of new exporters. It points
out that there may be a state of the firm in-between full-market access and non-exporting,
where the firm can learn about demand before making a final decision. The duration and
intensity of this learning stage is moreover determined endogenously - by the firm and market
characteristics, and is a random variable, a↵ected by the draws of demand signals that the
firm obtains. Similarly, the total entry cost - which here would be the sum of total testing
costs and the (one-time) sunk cost of entry, is endogenous and random.
There is a burgeoning literature on the issues of new exporters, market penetration
and learning under demand uncertainty. A model where the decision to export is not a
binary decision is Arkolakis (2006). The firm may access a few consumers rather than the
entire market, through the marketing technology, so that the fixed cost of entry is no longer
‘fixed’, but endogenously determined. However, learning and, in particular, dynamics in
sales produced through learning, are not the main focus of the paper.
Recently, several papers have focused on the role of uncertainty and learning in export
markets. In one of the earliest contributions, Rauch and Watson (2003) introduce a model
where a firm has to decide whether to contract a supplier in a new destination, under
uncertainty about the quality of its services. The firm places a small order initially and in
the next period either expands (signs a contract with the supplier) or quits the destination,
depending on the quality it observes. Though this is not a model of export behaviour per
se, the idea that a firm might want to start small in a new market is easily extendable to
the export setting. This model does not allow the firm to optimally choose the duration and
intensity of the testing phase. In Eaton et al. (2008), there is uncertainty about the foreign
demand, and firms search for buyers and collect information on their sales. If they collect
encouraging information, they expand (search for more buyers), or shrink, until finally they
exit. This is a model of passive learning, in the sense that the firm does not take into
account the e↵ect its sales have on its learning when making exporting decisions. Moreover,
8
the sunk cost of entry does not play an important part here. Another paper that deals with
the learning behavior of exporters is Albornoz et al. (2009). The firm can export to two
destinations, in each of which it faces a sunk cost of entry and uncertainty in demand. Once
the firm exports to a market, it learns its demand there immediately and with complete
precision. Moreover, due to correlation in demand across destinations, and di↵erences in
the sunk entry costs, optimal sequential entry strategy is generated. This paper provides
support to our model, and, in particular, to the empirical application of the model that we
present here. Nguyen (2008) also introduces demand uncertainty, both in the domestic and
foreign market, and correlation between the demand parameters in the two markets. The
firm can learn about demand in a destination with complete precision once it sells there
for at least one period. There are no sunk costs of entry into either market, but there are
per period fixed costs of selling, so the firm will exit a market, if the demand there is too
low to cover that cost. Exporting to any market provides information about demand in
both markets, and therefore also produces sequential export patterns. Finally, a paper that
provides interesting evidence of learning by exporters is Freund and Pierola (2008). In their
model, the firm is uncertain about the per period cost of exporting, which it can learn once it
exports. There is a fixed cost of entry that the firm has to pay once it decides to stay in the
market, but it can first export a small quantity (a fixed fraction of the potential total sales)
for a smaller fixed cost (cost of trial), to learn the per period cost and make the ultimate
decision to stay or not. This is a one-period learning model with a predetermined trial sales
size, which does not depend on the features of uncertainty, such as the distribution of the
uncertain parameter, or on the characteristics of the firm or the market.
To summarize, we introduce a few crucial features that jointly distinguish the model from
the ones mentioned above: a sunk cost of entry that can be postponed due to the availability
of a testing technology, imperfect and costly information obtained from export sales, active
learning by firms under demand uncertainty - the recognition of the information value of
exporting, and an endogenous choice by the firm of optimal duration and intensity of the
experimentation phase.
The rest of the paper proceeds as follows. We present the model and its solution in Section
2. In Section 3, we discuss the comparative statics predictions, and the export participation
condition. In Section 4, we introduce the empirical application of the model - we divide
the world into several regions, and assume that the firm learns about the demand for its
product in any given region by accessing individual destinations within the region. That is,
each destination is treated as a consumer, and the region as the target market. We carry
out a structural estimation of the model for several pairs of products and regions, as well as
various simulation exercises, such as the response of exporters to trade liberalization in the
entire region, or a single country within the region, and an improvement in the precision of
the demand shocks. Section 5 concludes.
9
2
The Model
There are two countries, one Home, another Foreign. There are M consumers in the Foreign
k
country, where M 2 Z + . For any foreign consumer k, utility from consuming quantities qjt
is given by
Utk
Z
1
k
= [ ((eµj ) ✏ (qjt
)
j
✏ 1
✏
✏
dj] ✏ 1 ,
where j denotes varieties, and variety j is produced by a firm j. µj = µ̄ with probability
p0 , and µ with probability 1 p0 . µj is the same across all consumers for given j.Once µj is
drawn for a given variety j, it is fixed over time. Hence,
hjt ✏
]
Pt
= eµj yt (hjt ) ✏ (Pt )✏ 1 ,
k
qjt
= Qkt eµj [
where hjt is the price of variety j, Pt is the ideal price index for the di↵erentiated good,
Z
1
Pt = [ eµj (hjt )1 ✏ dj] 1 ✏ ,
j
yt is the total income of consumer k, assumed equal across consumers, and Qkt is the total
consumption of the di↵erentiated good (so that Qkt Pt = yt , i.e. Qkt = Qt = Pytt , 8k).
Labor is the only factor of production, with the constant marginal cost of producing any
variety of the di↵erentiated good given by the ratio of wages, wt , and productivity, t . Each
firm produces one variety.
Hence, the price of any variety is given by
✏ wt
hjt =
,
✏ 1 jt
where jt is the productivity of firm j.
We study the steady-state partial equilibrium. Operational profits per period of the firm,
earned from sales to an individual consumer, can be calculated as
k
⇡jt
=
k
hjt qjt
= (
✏
✏
wt
k
qjt
jt
k
1)qjt
wt
1
jt
1 w t µj
=
e yt (hjt ) ✏ Pt✏ 1
✏ 1 jt
✏
1
jt
[
] ✏ [ ]✏ 1 Pt✏ 1
= e µj y t
✏ 1 ✏ 1
wt
✏
1
j
[
] ✏ [ ]✏ 1 P ✏ 1 ,
= e µj y
✏ 1 ✏ 1
w
10
where we substitute all variables with their steady state values.
The expected discounted lifetime profits from selling to consumer k, with respect to the
information at time t, are given by:
E[⇡jk |pt ]
= E[
Z
= AG
e
1
rt µj
✏ 1
j
e y
Z
Z
✏
[
✏
1 ✏
e
rt
µj
E[e |pt ]dt
e
rt
[pt eµ̄ + (1
= AG
✏ 1
j
= AG
✏ 11
[pt eµ̄
j
r
1
] ✏[
j ✏ 1
w
]
P ✏ 1 dt|pt ]
pt )eµ ]dt
pt )eµ ],
+ (1
where r is the discount rate, AG ⌘ y ✏ 1 1 [ ✏ ✏ 1 ] ✏ [ Pw ]✏ 1 , a composite of the aggregate
demand variables, and pt is the belief on the part of the firm that µj = µ̄, given all information
up to time t. In particular, at time 0,
✏ 11
[p0 eµ̄
j
E[⇡jk |p0 ] = AG
r
p0 )eµ ].
+ (1
The firm has to choose the optimal number of consumers to target subject to the following
cost structure: there are two distribution and marketing technologies available. The first
technology has zero or negligible sunk costs, but the cost of selling to n consumers is convex
in n: c(n) is continuous, strictly increasing and convex. The second technology has a linear
in n cost function, c̃(n) = f n, but to use this technology, the firm has to pay the fixed
cost F , which could reflect expenditures on building own distribution and retail centres, and
upfront costs of long-term contracts with shipping and marketing agencies. Thus, the firm
may find it optimal to eventually pay the sunk cost F and utilise the linear technology, if it
plans to stay in the market long-term. To derive the optimal behaviour of the firm, consider
first the learning of the firm.
The firm does not observe the precise values of quantities sold, i.e. it does not observe
the precise value of µ. Instead, it observes the following for an individual consumer indexed
by k:
k
Xjt
⌘
Z t
0
k
qjs
ln(
y(hj ) ✏ P ✏
1
)ds +
k
x Wjt
=
Z t
0
µj ds +
k
x Wjt ,
so that
k
dXjt
= µj dt +
k
x dWjt ,
where Wjtk is a Wiener process, and the firm can update its beliefs about µj from these
observations. More precisely, denote by pjt the (subjective) probability at time t that firm
j’s variety has high demand, µj = µ̄. As is shown in the Appendix, upon sampling njt
11
k
consumers and observing njt values of dXjt
, and the sample average dXjt ⌘
firm updates its beliefs according to:
dpjt = pjt (1
pjt )
µ̄
µp
Pnjt
k
dXjt
,
njt
k=1
the
njt dW̃jt ,
x
where
µ̄ µ
x
⌘ ⇣ is the signal-to-noise ratio, and
dW̃jt ⌘
⌘
p
p
njt
x
njt
[
Pnjt
k=1
k
dXjt
njt
[dXjt
(pjt µ̄ + (1
(pjt µ̄ + (1
pjt )µ)dt]
pjt )µ)dt],
x
which is an observation-adapted Wiener innovation process, that is, it follows a standard
Wiener process relative to the information at time t. How these assumptions can be intuitively translated into what kind of discrete-time data the firm (and the researcher) observe,
is explained in the Appendix.
As can be seen from the equation above, when the firm observes a sample average dXjt
higher than the expected value at time t, pjt µ̄ + (1 pjt )µ, it updates its beliefs upwards
(dpjt > 0), and downwards otherwise. The firm weighs this signal (the di↵erence between
observed average and expected average) by the signal-to-noise ratio, ⇣, and by the sample
size njt . The higher the signal-to-noise ratio, and the higher the sample size, the more weight
the firm puts on the new signal. So one can see that the firm would like to sample as many
observations as possible, so as to learn faster, subject to the cost constraints.
2.1
Solution of the problem of Stage 2
Consider first the optimal behaviour of the firm once it pays the sunk cost F and accesses
the linear technology with the cost function c̃(n) = f n. Denote this stage as stage 2, and the
stage before paying the sunk cost F as stage 1. In this stage, the firm may still be learning
about demand and updating its beliefs pjt . Hence, the value function of the firm j is given
by:
V (pjt ) = sup E[
<njs >
Z 1
0
( f njs + njs AG
✏ 1
µ̄
j [pjs e
+ (1
pjs )eµ ])e
rs
ds|pjt ],
µ̄ µ p
subject to dpjt = pjt (1 pjt ) x njt dW̃jt , where dW̃jt is as above. Note that we do
not take into account the possibility that the firm may choose to re-enter the market in the
future, since in the model the aggregate variables and firm productivity are constant over
time, and once the firm quits is does not get any new signals, and hence pj also does not
change. So once the firm quits, it stays out of the market.
The Hamilton-Jacobi-Bellman equation is as follows (we omit the subscripts below):
12
rv(p) = max0nM [n( f + AG
✏ 1
µ̄
j [pe
1
p)eµ ]) + n (p(1
2
+ (1
p)
µ̄
µ
)2 v 00 (p)].
x
It is clear that the stage-2-value function is linear in n, so that the optimal size in stage
2 is n⇤ = M , as long as the value function is positive. To find the value function v(p) for
this case, plug n = M into the HJB equation:
rv(p) = M ( f + AG
✏ 1
µ̄
j [pe
1
p)eµ ]) + M (p(1
2
+ (1
p)
µ̄
µ
)2 v 00 (p).
x
We need to solve this second-order non-linear ODE, subject to the value matching and
smooth pasting conditions:
v(p) = 0,
v 0 (p) = 0,
and derive the threshold value p, below which the firm will quit the market. For any
beliefs pjt above this value the firm will sell to the maximum number of consumers, M .
Notice that the value function with v 00 (p) = 0 satisfies the ODE above, which gives us:
v(p) =
M
( f + AG
r
✏ 1
µ̄
j [pe
+ (1
p)eµ ]).
Intuitively, the only value of learning to the firm comes from the e↵ect of pjt on the
decision to quit the market, where quitting yields a payo↵ of 0. If we introduced exogenous
shocks to the aggregate demand variables or productivity, the firm would possibly want to
re-enter the market after quitting, so there would be value to learning resulting from this
potential re-entry (and having to pay F again) in the future. However, for simplicity here
we do not allow exogenous shocks, and therefore re-entry. Hence, the value function in
the second stage is simply the sum of all discounted profits net of fixed costs of exporting.
Denote this value function by Ṽ (p). We will use this value function in the next section. The
threshold value of pjt , below which the firm would quit the market (sell to 0 consumers) is
obtained by setting the v(p) above to 0, and is given by:
f
pj = max{
2.2
AG
eµ
✏ 1
j
eµ̄
eµ
, 0}.
Solution of the problem of Stage 1
Now consider stage 1, when the firm employs the convex costs technology. The firm gains
profits from selling to n consumers, and has to pay the cost c(n), but also gains information
value from updating its beliefs and possibly investing in the linear technology in the future
as a result. The function c(n) is twice di↵erentiable on (0, M ), and increasing and strictly
13
convex on [0, M ). We will assume throughout that c(0) > 0, which is a necessary condition
for all the results of Moscarini and Smith (2001), which we apply extensively here. That is,
there is a fixed cost of maintaining the ability to experiment (think of this for example as
the fixed periodic cost of the contracts with marketing agencies and warehouses).
One way in which our model is di↵erent from that of Moscarini and Smith is that the
testing phase in our case actually represents productive activity by the firm, which produces
the product and ships it to the sample of consumers, even if not to the entire market. Hence,
the firm earns profits in the experimentation phase, and these a↵ect the optimal sample size
n.
The value function of firm j takes the form
V (pjt ) =
sup E[e
rT
T,<njs >
+
Z T
0
K(pT )
( c(njs ) + njs AG
✏ 1
µ̄
j [pjs e
pjs )eµ ])e
+ (1
rs
ds|pjt ],
where K(p) = max[Ṽ (p), 0] is the payo↵ to making the terminal decision, that is, deciding
between quitting and investing in the linear technology, Ṽ (p) is the value function derived
for stage 2 above, T is the stopping time, and pT is the value of the belief variable at time
T . Then the Hamilton-Jacobi-Bellman equation (HJB) becomes
rv(p) = max0nM [ c(n) + nAG ✏j 1 [peµ̄ + (1
µ̄ µ 2 00
1
+ (p(1 p)
) nv (p)].
2
x
p)eµ ]]
The FOC for the HJB equation give us n(p) = z(rv(p)), where z ⌘ g 1 , g(n) = nc0 (n)
c(n), and z is strictly increasing. The problem can be transformed into a two-point free
boundary value problem
00
v (p) =
c0 (z(rv(p)))
AG
1
(p(1
2
✏ 1
µ̄
j [pe +
µ̄ µ 2
p)
x
p)eµ ]
(1
)
plus the value matching conditions:
v(p̄) = Ṽ (p̄)
F =
M
( f + AG
r
✏ 1
µ̄
j [p̄e
+ (1
p̄)eµ ])
v(p) = 0,
and smooth pasting conditions:
v 0 (p̄) = Ṽ 0 (p̄) =
M
AG( j )✏ 1 (eµ̄
r
14
eµ ),
F,
v 0 (p) = 0.
The solution is graphically depicted in Figure 1.6.
Since the value of experimentation v(p) should be convex, we require
c0 (z(rv(p))) > AG
✏ 1
µ̄
j [pe
p)eµ ],
+ (1
for all p.
It is clear that if
c0 (z(0)) > AG
then, since v(p)
✏ 1 µ̄
j e
AG
✏ 1
µ̄
j [pe
+ (1
p)eµ ],
0 and c00 > 0,
c0 (z(rv(p))) > AG
✏ 1
µ̄
j [pe
+ (1
p)eµ ],
for all p. Moreover, this assumption (along with the assumption we stated earlier, c(0) >
0) is sufficient for us to be able to directly apply all the proofs of existence and uniqueness
of a solution in Moscarini and Smith (2001). We discuss this point in greater detail in the
Appendix.
Intuitively, marginal cost of experimentation is the marginal testing cost net of the
marginal profits earned from the sample sales:
M C = c0 (n)
AG
✏ 1
µ̄
j [pe
+ (1
p)eµ ]
and the marginal benefit of experimentation is the contribution of the sample observations
to the udpating of beliefs and to the value function as a result:
1
M B = (p(1
2
p)
µ̄
µ
)2 v 00 > 0,
x
so for an optimal n, where MC and MB are equated, MC should be positive. This is
already assumed.
3
3.1
Comparative Statics Predictions and Export Participation Condition
Comparative Statics Predictions
Here we only state the propositions and provide the proofs in the Appendix. These are
analogous to the comparative statics predictions of Moscarini and Smith (2001).
15
40
35
Value function
Terminal payoff
Value function
30
Upper Bar P
25
20
15
Lower Bar P
10
5
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Beliefs P
Figure 7: Solution to the problem of Stage 1
The solid line shows the ultimate payo↵ function, ⇡ ⌘ max[0, Ṽ (p̄) F ], and the dashed
curve shows the value function. The value Lower Bar P⌘ p, where the value function is
tangent to the zero-line (payo↵ from quitting), determines the cuto↵ for quitting, and the
value Upper Bar P⌘ p̄, where the value function is tangent to the profit line after entry,
determines the cuto↵ for entering the market (paying the sunk cost F )
16
Proposition 1. As any of y,P , j increases or w decreases, the firm will experiment more
intensively, for any given value of beliefs, and the thresholds for quitting and for entering the
market at full scale decrease, if the optimal experimentation intensity is sufficiently bounded
away from the full scale M .
Proposition 2. As F rises, so that the final payo↵ to entry falls, the firm will experiment
less intensively, for any given value of beliefs, and the thresholds for quitting and for entering
the market at full scale will increase.
Proposition 3. As x falls, so that the signal-to-noise ratio increases, the firm will experiment more intensively, for any given value of beliefs, and the thresholds for quitting and for
entering the market at full scale shift out.
Proposition 4. As the testing cost function c(n) grows more convex and initially (for n close
to 0) weakly higher and steeper (i.e. c(n) is replaced by ĉ(n), where ĉ(0) = c(0), ĉ0 (0) c0 (0),
and ĉ00 (n) c00 (n) for all n), the value function v(p) rises for all p, so that optimal n rises
for all p, and the thresholds shift out.
3.2
Export participation condition
We consider only a partial equilibrium in this model, taking the domestic side of the economy, as well as the aggregate expenditures and the aggregate price index in the foreign
economy, as given. The firms that start exporting are assumed to have been producing
and selling domestically for some time, and therefore they know their productivity. What
matters when they make the decision to export, is the prevailing common belief about the
distribution of µ in the foreign market, which is assumed to be the objective probability
p0 = P rob(µ = µ̄). To find the cuto↵ for exporting, we need to know how productivity
a↵ects the exporting/experimentation behavior. As was stated in Proposition 1, as rises,
both the threshold for switching to stage 2, p̄, and the threshold for quitting the market,
p, fall. The firm will be willing to switch to the linear technology (that requires paying the
sunk cost F ) for lower values of p, and will be willing to keep experimenting for lower values
of p than before. The optimal experimentation intensity n(p) will shift up, so that the firm
will target more consumers for any given p. This is shown in Figure 8.
Thus, there is a monotonic ranking of cuto↵ thresholds p̄ and p over the productivity
range. Given the common belief p0 , the lowest productivity exporter will have p = p0 . Denote
this productivity level as . All firms with productivity below this cuto↵ will have p higher
than p0 , that is it will not be worthwhile for them to start exporting, even as experimenters.
Thus, we have
p( ) = p0 .
Notice that this export participation condition is very di↵erent from the one we usually
work with, where the expected lifetime discounted export profits of the lowest productivity
17
100
90
Low Phi
Low Phi
High Phi
High Phi
80
Value function
70
60
50
40
30
20
10
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Beliefs P
Figure 8: Comparing the solutions to the problem for two levels of productivity
The thin solid line shows the ultimate payo↵ function, ⇡ ⌘ max[0, Ṽ (p̄) F ], and the thin
dashed line shows the value function. The thick solid line shows the new ultimate payo↵
function, for higher productivity, and the thick dashed line shows the new value function.
The value function shifts up, and the thresholds fall - the thresholds for entering is lower,
and the threshold for quitting is lower for the higher productivity firm.
18
exporter are just high enough to cover the sunk entry cost F (e.g. Melitz (2003)). Here we
cannot apply this rule, since the value of exporting is no longer given by expected lifetime
profits for the lowest productivity exporter, but by the total value of exporting - which
also includes the information value of export sales. Therefore, we must find the firm whose
expected value is exactly zero at the initial belief p0 , which would be the firm whose p is p0 .
Similarly, it is not possible to tell whether a firm will forgo experimenting or not by simply
comparing its expected lifetime discounted profits with the sunk entry cost F . In fact, the
firm that satisfies the equality between its expected lifetime discounted export profits with
the sunk entry cost F ( i.e. , such that Mr ( f + AG ✏ 1 [p0 eµ̄ + (1 p0 )eµ ]) = F ) will
choose to experiment first, since its p̄ will be above p0 (and its p will be below p0 ). This
can be seen in Figure 9. Even though its expected profits from entry are greater than or
equal to F , it chooses to experiment first because of the information value it gains. As
productivity increases, one of the two things can happen. Either the profit line for second
stage (discounted expected lifetime profits from full-scale exporting) will cross the origin,
and all firms with productivity above this level of productivity, denote it as 1 (where
M
( f + AG ✏1 1 eµ ) = F ), will choose to enter right away, or for some < 1 , the threshold
r
for entering p̄ will be just equal to p0 , and all firms with productivity at or above this level,
call it 2 , will enter right away. This is depicted in Figure 9. So, we get
M
( f + AG
r
✏ 1 µ
1 e )
= F,
p̄( 2 ) = p0 ,
and the productivity of the highest productivity experimenter is given by
¯ = min{ 1 ,
2 }.
We show the two firms - the lowest productivity exporter, and the highest productivity
experimenter in Figure 10.
So, we can conclude that in the presence of a testing technology, the cuto↵s for experimenting and entering right away compare as follows with the cuto↵ for exporting in a
standard Melitz-type model:
< ˜ < ¯, where is the cuto↵ for exporting (by experimenting first), and ¯ is the cuto↵ for entering right away in the experimentation model, and
˜ is the cuto↵ for exporting in the standard framework (obtained by equating the stage-2
discounted lifetime profits with the sunk cost F ).
4
Structural Estimation
The model laid out in previous sections is concerned with the optimal behavior of a
firm facing demand uncertainty in a new market, and choosing its exit and full-scale entry
policy, as well as experimentation intensity - the number of consumers that it accesses to
19
800
P0
700
Value function
600
500
400
300
200
100
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Beliefs P
Figure 9: The illustration of the search for the thresholds for exporting (experimenting) and
entering right away
The thin solid line (lowest on the graph) shows the terminal payo↵ function for the exporter
with , such that Mr ( f + AG ✏ 1 [p0 eµ̄ + (1 p0 )eµ ]) = F , and the thin dashed curve shows
the value function for this exporter. It is clear that the value function at p0 of this exporter
is greater than 0, p̄ > p0 and p < p0 . The thick solid line shows the terminal payo↵ function
for the exporter with productivity 2 , s.t. p̄( 2 ) = p0 , and the thick dashed curve shows the
value function for this exporter. The dashed-dotted line (the highest on the graph) shows the
terminal payo↵ function for the exporter with productivity 1 , s.t. Mr ( f + AG ✏1 1 eµ ) = F .
The productivity of the highest productivity experimenter is then given by ¯ = min{ 1 , 2 }.
20
200
180
160
Value function
140
120
100
80
P0
60
40
20
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Beliefs P
Figure 10: The cuto↵s for exporting (experimenting) and entering right away
The thin dashed curve shows the value function for the lowest productivity exporter, and the
thin solid line shows the terminal payo↵ function for this exporter. Since the threshold for
quitting p for this firm is exactly at p0 , this firm is just indi↵erent between exporting (through
experimenting first) and not exporting. The thick dashed curve shows the value function for
the highest productivity experimenter, and the thick solid line - the terminal payo↵ function
for this firm. Since its threshold for entering the market is exactly p0 , it is indi↵erent between
experimenting and entering at the initial belief p0 . All firms with productivity in-between
these two levels of will experiment, and all firms with productivity above the higher level
of will enter right away.
21
learn about demand. The consumer in the model can be interpreted in many ways. We
can think of each retail store as a consumer, so that sales in each store provide information
about demand in the entire chain of retail stores. We can consider each city within a country
as a consumer, and the entire country as the ultimate market. One way to interpret our
model is viewing each country within a certain region as a consumer, and the entire region
as the market to be tested. Suppose we divide the world into regions in such a way that the
structure of demand uncertainty is the same in all countries within the same region. That is,
the unknown demand parameter is common across these countries, and the observation error
has the same distribution in all the countries. In that case, the behavior of a firm that faces
a high sunk cost of entering the region, and can learn about the unknown demand parameter
through its sales in individual countries, can be predicted by our model in a straightforward
manner.
In what follows, we continue with this interpretation of the model - we treat individual
destinations within a given region as ‘consumers’, and the entire region as the target market.
We focus on 4-digit-code products, rather than 8-digit-code products. This allows us to avoid
the issue of the interaction in terms of learning across individual 8-digit-code products within
the same 4-digit-code category: sales of one 8-digit-code product could provide information
about demand for another 8-digit-code product in the same market. Instead, we aggregate
sales of all 8-digit-code products within the same 4-digit-code category, and focus on the
interaction in terms of learning across destinations.
We will proceed as follows. We will describe the data and present the 17 regions we
formed. We will then lay out the estimation procedure. Next, we will provide some general
regression results that support the model, and showcase the structural estimation results
for several pairs of 4-digit-code products and regions. We will also carry out various policy
experiments, such as a uniform tari↵ reduction in a region, a temporary tari↵ reduction in a
single destination, and an increase in the signal-to-noise ratio in the region, and study how
these a↵ect exports to the region.
5
Description of the Data
The data on export flows comes from the database collected by the French Customs. It
provides records of French firms’ exports (both quantities in tons and export values in euros)
aggregated by firm, year, product (identified by an 8-digit code, NC8, which is equivalent to
the 6-digit classification of ComTrade in the first 6 digits, with the two last digits appended
by the French Customs) and destination country, over the 1995-2005 period. To obtain
information on the characteristics of firms, we use a second data source, the French Annual
Business Surveys (Enqutes Annuelles d’Entreprises) for the manufacturing sector over the
same period, provided by the French ministry of Industry. These contain information on firms
that have more than 20 employees, and the variables listed are the address, the identification
number of the firm (siren), sales, production, number of employees, and wages. Therefore,
we are able to obtain measures of productivity for the firms with employment above 20, even
though these measures are not product-specific (and so a single measure is assumed to apply
22
to all the products of a multi-product firm).
We consider only consumer non-durable products, and aggregate the quantities exported
and total value of exports (in euros) of all 8-digit products within a given 4-digit-code
category to obtain 4-digit-level quantities and values exported. Unit values of 4-digit-code
products are then obtained as total values exported divided by quantities exported.
We form 17 geographical regions out of all world destinations. We do so because it is
much more plausible that demand structure is the same across destinations within a certain
region in the world, than across all countries in the world. The categorization we make in
this paper is very crude and was based for the most part on geographical proximity. In
Tables 6-7 in the Appendix, we show the 17 regions and their constituent countries.
We define a pair as a combination of a region and a 4-digit-code product. The estimation
is then carried out for each pair individually. This allows us to estimate separately the
features of export behaviour and demand in each 4-digit-code category and region. We focus
on new exporters, which are defined as firms that start exporting to the given pair in or after
1996, since the sample we have runs over 1995-2005. That is, all firms that export in 1995
are considered ‘old exporters’. We calculate the number of destinations within the given
region that a firm exported to in any given year to measure the experimentation intensity
(or consumer sample size), which is a central variable of the model. To make sure that we
do not consider insignificant export destinations, which French firms either never export to
or export negligible quantities, we only count those destinations which appear at least once
in the dataset, and out of these, we eliminate those for whom the total export volume by
French firms over 1995-2005 is less than 5 per cent of the entire export volume by French
firms to the region over 1995-2005 (in the given 4-digit-code category).
6
Estimation Procedure
The estimation of the model relies on fitting the time series of export size (number of destinations), denoted by n, over all new exporters between 1996-2005, and the time series of
firm beliefs, denoted by p, which are estimated within the model. Given the profitability
of a firm, the cost parameters, the parameters of demand - µ̄, µ, x , and the firm’s beliefs,
the model predicts the optimal decision rule (experimentation versus entry or exit), and the
optimal export size (n(p) in the experimentation phase, and full-scale export size M in the
post-entry stage) of a new exporter. We can therefore calculate the likelihood of the data for
any values of parameters relying on the di↵erence between the predicted and the observed
time series of n for each firm and year.
Since there are a lot of unknowns in the model, we employ Bayesian techniques, which are
very suitable for such situations. The estimation proceeds in two steps, producing estimates
of two sets of parameters - ⇥1 and ⇥2 . First, we estimate the demand structure - parameters
µ̄, µ, x , p0 , which we jointly denote as ⇥1 - and beliefs p, based on the time series of quantities
of exports to individual destinations. Second, given the estimates of ⇥1 ⌘ {µ̄, µ, x , p0 }, and
beliefs p, we look for cost parameters (coefficients of c(n) and F ) and full scale M , jointly
forming a set ⇥2 , that provide the best fit between the optimal n⇤ and the observed n. For
23
that, we need a measure of firm profitability. We employ a measure of profitability that relies
directly on the estimated productivity of firms (TFP), which we obtain from the domestic
balance sheets of French firms.
Step 1: Estimating the demand parameters ⇥1 ⌘ {µ̄, µ,
beliefs p.
x , p0 }
and
k
Fix a pair of a 4-digit-code product and region. First, we estimate the demand signals ↵jt
for each destination k, firm j and time t within this pair. This can be done as follows: from
the model, the observed value of log-quantity sold in a destination k at time t by firm j is
k
ln qjt
= ln ytk
k
✏ ln prjt
+ (✏
k
1) ln Ptk + ↵jt
.
where j denotes firms, t denotes years, and k denotes destinations within the region.
k
in the above equation do not have a zero mean, and, moreover,
By assumption, the ↵jt
have a mean that varies from one firm to another (each firm has a µj that is either µ̄ or µ).
Therefore, what we can do first is run the above regression with firm-fixed e↵ects (to account
for the firm-specific means in residuals), and obtain estimates of ✏. As a proxy for prices
we use the unit values (export values in euros divided by the export quantities). Since the
unit values may be correlated with the demand shocks, we instrument for these using firmproductivities. We calculated TFP a la Ackerberg, Caves and Frazer, to address the issue
of endogeneity of production inputs in the production function and to avoid the collinearity
issue arising in Olley-Pakes or Levinsohn-Petrin. For more details on the estimation of TFP,
please, see the Appendix.
Thus, the regression used to estimate the elasticities is:
k
ln qjt
=
k
✏ ln uvjt
+
KT
X
gtk Dtk +
1
J
X
j
Dj + ekjt ,
1
k
are the unit values and are instrumented for with firm’s TFP, and Dtk are
where uvjt
destination-time fixed e↵ects. The destination-time fixed e↵ects are meant to capture the
variation in the aggregate variables y and P across k and time t. The Dj are the firm-fixed
e↵ects.
We summarize the estimates of ✏ over all pairs that we obtained as a result of these
regressions in the Appendix. We must note that for some of these pairs, we obtained values
of ✏ that are smaller than 1. We then do not consider these particular pairs in further
analysis, since the assumption that ✏ > 1 is crucial in the model. We have 726 pairs that we
can work with, as a result.
Given the estimates of ✏, we calculate the following for each firm j, destination k and
time t:
k
k
k
vjt
⌘ ln qjt
+ ✏ ln uvjt
,
which by assumption contains the aggregate factors, common to all firms, and the firmk
. Now, we would like to filter out the
destination specific, time-varying demand shocks ↵jt
24
k
aggregate factors, common to all firms and destinations, and obtain demand signals ↵jt
.
k
Since ↵jt have firm-specific means, we cannot treat these as a usual error term, and just
k
run a regression of vjt
on time-fixed e↵ects to control for aggregate demand. Moreover,
since there are a lot of unknowns involved - we do not know the values of µ̄, µ, x , p0
(the objective distribution of µ in the population), and what µ each firm has drawn, we
apply Gibbs sampling. We assume that there are two (unobserved) aggregate factors, each
of which follows an AR(1) process, and employ Gibbs sampling (with Kalman filtering) to
deduce these aggregate factors, the demand shocks, µ̄, µ and x , and µj for each j.
We have
k
↵jt
= µj +
k
k
x ⌘jt , ⌘jt
⇠ N (0, 1)iid,
µj = µ̄ with probability p0 , and µ with probability 1 p0 ,
the transition equations for the aggregate factors, denoted by A1 and A2 ,
At1 = ⇢1 A(t
1)1
+ CA1 + ⌫t1 ,
At2 = ⇢2 A(t
1)2
+ CA2 + ⌫t2 ,
⇢1 2 ( 1, 1), ⇢2 2 ( 1, 1), CA1 2 R, CA2 2 R are constants, and ⌫t1 and ⌫t2 are normal
error terms with means zero and covariance matrix ⌦. The observation equation is
k
k
vjt
= D1 At1 + D2 At2 + ↵jt
.
k
It can be seen from the equations above that the unobserved ↵jt
and At1 , At2 can be
identified up to a constant, and therefore we normalize µ = 0. Denote ⇥1 ⌘ {µ̄, x , p0 }, the
main parameters characterizing demand uncertainty. The details of the estimation of ⇥1 are
reported in the Appendix. We also summarize the posterior distributions of µ̄, x and p0 ,
µ̄ µ
as well as the signal-to-noise ratio ⌘ x , in the graphs in the Appendix.
Once we have generated draws of the values of µ̄, x , p0 , as well as the aggregate factors
k
At1 , At2 , and demand shocks ↵jt
, from their respective posteriors, we can calculate the beliefs
of any given firm j at any time t, using the discrete time Bayes updating rule:
P robjt (µj = µ̄) ⌘ pjt =
( ↵¯pjtx µ̄ )pj(t
njt
( ↵¯pjtx µ̄ )pj(t
njt
1)
+ (
1)
↵
¯ jt µ
p x
njt
)(1
pj(t
1) )
,
k
over destinations k, and
where is the standard normal density, ↵
¯ jt is the average of ↵jt
njt is the sample size at time t, which in this application is the number of destinations within
a given pair that the firm exports to.
25
Intermediate step: Measure of firm heterogeneity (and profitability)
We treat aggregate factors, such as aggregate demand and wages, and firm productivity as
stationary, with constant variance, and assume that the firm uses their expected values to
make the optimal export/experimentation decision. Hence, we need to measure the expected
value of the profits of a firm with productivity j to predict its behaviour.
The profits of a firm j from exporting to a destination k are given by
1
k
⇡jt
=
✏
wt
1
= eµj ytk
jt
k
eµj ytk (prjt
) ✏ [Ptk ]✏
1
✏
✏
[
1 ✏
1
] ✏[
jt ✏ 1
wt
]
1
[Ptk ]✏ 1 ,
and in logs,
k
= ln
ln ⇡jt
1
✏
+ ln ytk (✏
✏ 1
✏ 1
⌘ C + (✏ 1) ln jt + Akt + µj ,
✏ ln
1) ln wt + (✏
1) ln
jt
+ (✏
1) ln Ptk + µj
where C collects all the constant terms, and Akt collects all the aggregate terms (ytk , wt , Ptk ).
As mentioned before, we assume that the aggregate variables are stationary. Moreover, we
set E(Akt ) = Ā, V ar(Akt ) = A2 , for all k, t.
We assume that ln jt follows an AR(1) process for every firm j:
ln
jt
= C j + ⇢ ln
j(t 1)
+ ejt ,
where ⇢ 2 ( 1, 1), ejt ⇠ N (0, ), and C j is a firm fixed e↵ect. This is consistent with
the assumption that log-TFP follows a first-order Markov process made when estimating
TFP (in the Appendix). Then ln ⇡j is normally distributed, and given
+ Akt + µj ]
k
jt ) + E(At ) + µj
E(ln ⇡jt ) = E[C + (✏ 1) ln
= C + (✏ 1)E(ln
= C + (✏
1)
jt
Cj
+ Ā + µj ,
(1 ⇢ )
and
V ar(ln ⇡jt ) = (✏
1)2 V ar(ln
jt )
+ V ar(Akt )
2
= (✏
2
1)
(1
26
(⇢
)2 )
+
2
A,
we obtain
E(⇡j ) = exp[E(ln ⇡j ) +
V ar(ln ⇡j )
]
2
(✏
Cj
+ Ā + µj +
1)
(1 ⇢ )
= exp[C + (✏
Cj
]exp[(✏
1)
(1 ⇢ )
µj
= e exp[(✏
2
1)2 (1
(⇢ )2 )
2
2
1)2
2(1
+
(⇢ )2 )
]exp[
2
A
2
2
A
]
]C̃,
where C̃ ⌘ exp[C + Ā]. We take the time series of for all firms, evaluate ⇢ , , C j ,
and calculate the corresponding E( jt ) for each firm j. To measure heterogeneity between
Cj
firms, it suffices to focus on exp[ (1 ⇢ ) ], which we will call as smoothed average productivity
of firm j and denote by ˜j .
We show in the Appendix that normalizing the profits by any constant results in the
estimates of cost variables that are scaled by the same constant. Therefore, we can scale the
profits ⇡j :
⇡
ˆj ⌘
E(⇡j )
exp[(✏
1)2 2(1
2
(⇢ )2 )
]exp[
where md is the median of exp[(✏
⇡
˜j ⌘
exp[(✏ 1) (1
md
C
2
A
2
= e µj
]mdC̃
1) (1
Cj
]
⇢ )
exp[(✏
1) (1
Cj
]
⇢ )
md
⌘ e µj ⇡
˜j ,
over all exporters in the given pair, and
j
⇢ )
]
is the measure of heterogeneity we will focus on in the estimation.
Step 2: Estimating the cost parameters and full scale M , given
estimates of µ̄, µ, x , p0 and beliefs p.
The cost parameters that we are interested in are the sunk entry cost F and the coefficients
of the testing cost function c(n):
c(n) = g0 + g1 n + g2 n2 + g3 n3 + g4 n4 + g5 n5 .
From now on, denote the set of cost parameters {g0 , g1 , g2 , g3 , g4 , g5 } by G.
Given the parameters of the model, one can solve the second-order free boundary value
problem of the firm outlined in the theoretical part, where we now treat the equation for the
value function as a discrete time equation. That is, we input annual values for all variables
in the equation, and solve for the (annual) values of the value function, and corresponding
annual export size. When we put the expected value of per-period profitability of a firm, ⇡
˜j ,
into the BVP, we get
27
v 00 (p) =
c0 (z(rv(p)))
⇡
˜j [peµ̄ + (1
1
(p(1
2
p)
where n(p) = z(rv(p)), z ⌘ g 1 , g(n) = nc0 (n)
plus the value matching condition:
v(p̄) = Ṽ (p̄)
p)eµ ]
µ̄ µ 2
)
,
x
c(n),
F, v(p) = 0.
and smooth pasting condition:
v 0 (p̄) = Ṽ 0 (p̄), v 0 (p) = 0.
Ṽ (p) is the value function in the second stage (post-entry),
Ṽ (p) =
M
( f +⇡
˜j [peµ̄ + (1
r
p)eµ ]),
and it is simply lifetime discounted profits from selling to the entire market (of size M )
using the linear exporting technology.
There are three states that a new exporter can be in: experimentation (pre-entry), fullscale export (post-entry), and non-exporting (post-exit). In the model, once the firm exits the
market, it does not re-enter, since we assume a stationary equilibrium with static aggregate
variables and firm productivity. In the data, we assume that a firm that did not export
a given 4-digit-code product to any destination within the region in some year and in all
years after that, up to 2005, is in the non-exporting state. However, when we observe a
firm not exporting to any destination in some year, followed by exports to at least one
destination within the given region in any year afterwards, the non-exporting year is treated
as just a temporary zero in the time series of the number of destinations of this firm. That
is, we simply set njt = 0 for that year for that firm. We denote the states as follows:
experimentation phase as St = 0, full-scale export phase as St = 1 and non-exporting phase
as St = 2.
Relying on the BVP, and given the profitability of a firm, ⇡
˜j , the cost parameters, the
full scale M and the beliefs of the firm, one can calculate the thresholds p̄ and p, the state
and optimal export size for this firm in every year. In particular, we can predict which firms
are experimenters - these are firms that have profitability below the cuto↵ value for entering
the market right away, or equivalently, as outlined in the model,
M
(˜
⇡j e µ
r
f ) < F,
and
p̄(˜
⇡j ) > p0 .
28
We also require that the export participation condition hold:
p(˜
⇡j ) <= p0 .
Given that a firm chooses to experiment first, we can predict its state and export size at
every t = 1, ..., T , where T = 11, the total number of years in the sample: with probability
, the firm exits in any given period, but otherwise,
t = 1 =) Sjt = 0,
t > 1, Sj(t
1)
n⇤jt = z(rv(pjt )|˜
⇡j );
n⇤jt = z(rv(pjt )|˜
⇡j );
= 0, pjt 2 (p̄j , pj ) =) Sjt = 0,
t > 1, Sj(t
1)
= 0, pjt
t > 1, Sj(t
t > 1, Sj(t
1)
p̄j , =) Sjt = 1,
= 1 =) Sjt = 1,
n⇤jt = M ;
= 0, pjt pj =) Sjt = 2,
1)
t > 1, Sj(t
1)
= 2 =) Sjt = 2,
n⇤jt = M ;
n⇤jt = 0.
n⇤jt = 0.
Given that a firm chooses to enter at full scale right away, we can also predict its behaviour: with probability it exits in any given period, and if it does not, it exports to M
destinations in the region. Thus, the model gives us predictions with respect to the optimal
export size of firms, as measured by the number of export destinations within the region,
and optimal exit behaviour.
We assume that the total discount rate r is equal to the sum of the exogenous death rate,
, and a pure discount rate, say, dr. We fix the value of dr to 0.05. We choose the value
of 0.05, since that is the average long-term interest rate in France over 1995-2005, and it is
also the value of discount rate suggested by the European Commission in the 2008 issue of
Guide to Cost Benefit Analysis of Investment Projects (2008). Thus, we need to estimate
only, rather than both and r.
Recall that in the theoretical part, we assumed a linear exporting technology for firms
in the second stage (post-entry), more precisely, c̃(n) = f n. The equation that allows us to
estimate this parameter in the model is the boundary condition
v(p̄) = Ṽ (p̄)
Replace F with F 0 ⌘ F +
F =
M
f,
r
M
( f +⇡
˜j [peµ̄ + (1
r
and f with f 0 = 0:
29
p)eµ ])
F.
F0 =
v(p̄) = Ṽ (p̄)
v(p̄) = Ṽ (p̄)
F
M
( f0 + ⇡
˜j [peµ̄ + (1
r
M
M
f=
⇡
˜j [peµ̄ + (1
r
r
p)eµ ])
p)eµ ]
F
F 0,
M
f.
r
Notice that the last equation is identical to the original boundary condition, and that
F and f appear only in this boundary condition, and nowhere else in the model. Thus, F
and f are not separately identifiable: for any F and f , the same kind of behaviour will be
produced by F 0 ⌘ F + Mr f and f 0 = 0. Paying f in every period in each of M destinations,
with r as the discount rate, is equivalent to paying Mr f upfront upon entry into second stage
(in addition to the cost F ). Therefore, we normalize f = 0 and focus on estimating F .
There are some additional parameters, that we introduce to accommodate the data better.
The model predicts the optimal export size (number of destinations) for each firm, given its
profitability in the market, and the testing costs, entry cost F and full scale M . We still need
to allow for some error in the observed number of export destinations, since it is impossible
to match perfectly the predicted n⇤jt and the observed njt for all firms and years:
njt = n⇤jt + ejt ,
2
where ejt is a mean zero normal error term, with variance N
, which we will also estimate.
We view the error term as the discrepancy caused by managerial error, delays and disruptions
in international delivery, and considerations a↵ecting the choice of the number of destinations
that are outside the model. Of course, the goal of estimation is to minimize this discrepancy.
Additionally, the model predicts that in the experimentation stage, the firm quits as soon
as its beliefs pjt fall below some level pj . We cannot ensure that this holds true for all j, t,
and therefore introduce p˜j :
p˜j = pj
epd ,
where pj is the threshold for quitting as predicted by the model, and p˜j is the actual
threshold for quitting, which di↵ers from that predicted by some error epd . The variable
epd has an exponential distribution with mean mp > 0. So we allow for the actual quitting
threshold for any firm to be lower than the one predicted by the model.
To summarise, we need to estimate the following parameters: ⇥2 ⌘ {G, F, M, , N , mp }.
We do this as follows:
• Fix values of M , , r ⌘ + dr, N , and mp . Given these values, carry out the
Metropolis-Hastings step (which essentially lets you pick the values with the highest
likelihood), to produce new estimates for the cost parameters G and F .
• For given values of G and F , update the values of M , , r, N and mp , that is, generate
draws from their new posteriors, conditional on the values of G and F .
30
• Iterate on the previous two steps until convergence.
In the Appendix, we explain in greater detail how we set the initial values of M , , N ,
and mp , how we calculate the likelihood of the data, given the parameter values, and how
we carry out the two steps above.1
7
Results
General overview
In the Appendix, we show the histograms for estimated elasticities of substitution ✏, and
demand uncertainty parameters µ̄, x , ⌘ µ̄x (recall that we normalized µ = 0). Below, we
show a scatter graph, where it is clear that elasticity of substitution ✏ is positively correlated
with µ̄. That is, the more di↵erentiated product-region pairs (as measured by ✏) appear to
have a correspondingly larger gap between lowest and highest demand (µ̄ µ).
As a first look at the relationships predicted by the model, we run two regressions for
each pair: first, a regression of the number of export destinations on estimated beliefs (controlling for aggregate demand, firm productivity and firm fixed e↵ects), and second, a logistic
regression of the exit of firms on beliefs (controlling for aggregate demand, firm productivity
and firm fixed e↵ects). In the first regression, we introduce firm fixed e↵ects to take into
account the di↵erent average export sizes of firms, and unobserved firm-level characteristics,
and track only the change in their export size in response to their beliefs. In the second
regression, we again introduce firm fixed e↵ects, to take into account unobserved firm-level
characteristics. Since the beliefs are not observed, but instead have been estimated, we do
the following: we generate thirty draws of the vector of beliefs from the posterior obtained in
the first step of the estimation (estimating demand parameters), for each firm in every year.
We then generate a new dataset, where the observed independent variables are matched with
the vectors of beliefs, and stacked together, so that we have 30 times the original number
of observations. That is, initially we have a regression equation for the export size ES as a
function of beliefs P in a given pair of region and 4-digit product
1
In Step 1, we have shown how to generate draws of ⇥1 ⌘ {µ̄, µ, x , p0 } conditional on
observed export quantities. However, the observed behavior of firms in terms of experimentation intensity (number of export destinations) also provides information for the posterior of
⇥1 . Therefore, one could incorporate this into updating the posterior of ⇥1 . We did consider
this e↵ect, and found that the posterior of ⇥1 , conditional on just quantities, is tight enough
that the information about the number of export destinations does not change it much. In
other words, picking di↵erent draws from the posterior of ⇥1 , conditional on just quantities,
does not result in very di↵erent time series for the beliefs of the firms. Therefore carrying
out, say, Metropolis-Hastings sampling to update the posterior of ⇥1 further, conditional
on the observed behavior of the firms in terms of the number of export destinations, will
only a↵ect the first-stage posteriors marginally. Hence, we locate the median values of the
posterior distributions of ⇥1 , fix these as the values of µ̄, x , p0 , calculate the beliefs of all
firms, based on these parameters, and proceed to estimate the cost parameters.
31
40
Scatter points
Values of upper bar mu
35
30
Least squares line
25
20
15
10
5
0
0
2
4
6
8
10
12
14
16
Values of elasticity of substitution
Figure 11: Correlation between ✏ and µ̄
ESjt =
0
+
1 Pjt
+
2
ln ADt +
3
ln
jt
+ f ej + ejt ,
where j denotes firms exporting the given 4-digit product to the given region, and t years, and where ADt denotes the aggregate demand indicator for the given pair (measured as
a weighted average over all countries within the region of imports from the entire world of the
given 4-digit product, explained in detail in the Appendix), jt denotes firm j’s productivity,
f ej are firm fixed e↵ects and ejt is the error term.
After we generate 30 draws of beliefs P for each firm within a given pair, we will have
ESjt =
0
+
s
1 Pjt
+
2
ln ADt +
3
ln
jt
+ f ej + esjt ,
where s denotes the draw of beliefs P . Note that the error term will have the form
esjt = ējt + ẽsjt ,
since the beliefs Pjts come from the same distribution for every j, t. Hence, we will
calculate clustered standard errors in these regressions, where each combination of firm and
year forms a cluster.
Similarly the equation for the logistic regression of exit on beliefs will become
log
odds(exit)jt =
0
+
s
1 Pjt
+
2
ln ADt +
3
ln
jt
+ f ej + esjt ,
where log odds(exit) is the log of the odds of exiting the market, j denotes firms, t years, s - the draw if beliefs P , f ej are firm fixed e↵ects, and esjt is the error term. In this
32
regression, we will also use clustered standard errors, where each combination of firm and
year forms a cluster.
To obtain more readily generalizable results, we carry out the regression separately for
each 4-digit-code product, thus pooling together all pairs of a given 4-digit-code product and
various regions. We introduce the firm-fixed e↵ects and calculate clustered standard errors,
with firm-year combinations as clusters, as explained above, where now every exporter to a
specific region of the given 4-digit-code product will be treated as a separate firm. That is,
we do not allow a producer exporting to two regions to have a common fixed e↵ect in both
regions, to better control for unobservables within regions.
In Table 1, we report representative regression results, where in the first column we show
the regression of the number of export destinations on beliefs for 4-digit code 3304 (Beauty or
make-up preparations and preparations for the care of the skin (other than medicaments),
including sunscreen or suntan preparations; manicure or pedicure preparations), and in
the second column - the logistic regression for the odds of quitting the market for 4-digit
code 6112 (Tracksuits, ski suits and swimwear, knitted or crocheted). In the regression of
the number of export destinations on beliefs, we observe a statistically significant positive
e↵ect of beliefs and aggregate demand on export size. In the regression of exit behavior on
beliefs, we observe a statistically significant negative e↵ect of beliefs, aggregate demand and
productivity on the probability of exiting the market.
Table 1: Representative regressions of the number of export destinations and exit behavior
on beliefs
Number of destinations Log-odds of exit
Beliefs
.399
-1.79
(5.65)***
(3.92)***
Log-aggregate-demand
.06
-.8
( 4.3)***
(3.15)***
Log-tfp
-0.03
-0.65
(1.07)
(2.3)**
Constant
-0.02
(2.75)***
Observations
309750
34832
Note: absolute values of t-statistics in parentheses;
*significant at 10%; **significant at 5%; *** significant at 1%.
In Tables 2 and 3, we summarize the obtained coefficients for beliefs (variable P ) for the
two types of regression over all 4-digit codes. From studying the z-values of the coefficient for
beliefs P in the export size regression, we see that in 45 percent of all 4-digit-code products
33
Table 2: Summary of the regression of number of export destinations on beliefs, over 146
4-digit-code-products.
25th percentile 55th percentile Mean
Beliefs coefficient
0.02
0.16
0.18
Z-value
0.16
1.28
1.22
Maximum
1.81
7.15
Note: the z-value is obtained as a ratio of the beliefs coefficient and its standard error
Table 3: Summary of the logistic regression of exit on beliefs, over 133 4-digit-code-products.
90th percentile 65th percentile Mean
Beliefs coefficient
0.04
-0.98
-1.78
Z-value
0.07
-1.28
-2.04
Minimum
-8.59
-13.65
Note: the z-value is obtained as a ratio of the beliefs coefficient and its standard error;
13 products omitted, as they contained too many firms to introduce firm fixed-e↵ects
Table 4: Regression of the number of export destinations on beliefs and aggregate demand,
introducing interaction terms with a dummy for new exporters
Number of destinations
Beliefs
.25
(12.61)***
Beliefs*New
.04
(1.65)*
Log-aggregate-demand
.018
( 3.68)***
Log-aggregate-demand*New
-.049
( 9.06)***
Log-tfp
0.003
(0.46)
Constant
0.012
(8.09)***
Observations
10561071
Note: absolute values of t-statistics in parentheses;
*significant at 10%; **significant at 5%; *** significant at 1%.
34
it has z-values at or above 1.28, which produces a p-value of around 10 percent for the onesided test of significance. Thus, at a significance level of 10 percent (or less) we can reject
the hypothesis that beliefs have a non-positive e↵ect on export size for around half of all
4-digit-code products. For at least 75 percent of all 4-digit-code products we obtain positive
coefficients for beliefs, if not necessarily statistically significant ones.
From studying the z-values of the coefficient for beliefs P in the logistic regression of
exit behavior, we see that in 65 percent of all 4-digit-code products its has z-values at
or below 1.28, which produces a p-value of around 10 percent for the one-sided test of
significance. Thus, at a significance level of 10 percent (or less) we can reject the hypothesis
that beliefs have a non-negative e↵ect on the probability of exiting for around two thirds
of all 4-digit-code products. For about 90 percent of all 4-digit-code products we obtain
negative coefficients on beliefs, if not necessarily statistically significant ones.
While the positive response of the export size to beliefs, and the negative e↵ect of beliefs
on the exit behavior are consistent with the model of experimentation, these results are
expected in any model of learning, such as Passive Learning, where the firm expands in size
in response to better beliefs simply because of higher expected profits, or quits the market
in response to a deterioration in beliefs. However, with Active Learning, a new exporter
will respond less strongly to aggregate demand, and more strongly to its beliefs, than an
old exporter, which has completed its experimentation phase. For example, when aggregate
demand falls, firms that are not learning, and only consider the profit value of sales (and
not their information value) will decrease their export size. Firms that are learning actively,
however, may not react in the same manner - lower profits per destination may not induce
them to decrease the number of export destinations, if the information value of exporting
to these destinations is high enough. That is, the marginal value of exporting to the last
destination in the current set of destinations may still be as high as the marginal cost, even
after the fall in current profits in this destination (this is possible for export size expressed in
integers, but not if it is viewed as a continuous variable). Thus, for the actively learning new
exporters the correlation between aggregate demand and export size will be weaker than for
the old exporters. Instead, new exporters will place more emphasis on their current beliefs
about demand.
To test these implications, we carry out the following regression over the pooled dataset
of all pairs of regions and 4-digit-code products:
k
ESjt
=
0
+
ks
1 Pjt
+
ks
2 Pjt
⇤ N ewjk +
3
ln ADtk +
4
ln ADtk ⇤ N ewjk +
5
ln
jt
+ f ekj + eks
jt ,
k
is the number of export destinations in market k (pair of region and 4-digitwhere ESjt
code product) accessed by firm j at time t, Pjtks is thes-th sample beliefs of firm j about
demand in market k at time t, ADtk is aggregate demand in market k at time t, N ewjk is a
dummy variable equal to 1 if firm j in market k is a new exporter, and 0 otherwise, jt is
the productivity of firm j at time t, f ekj is the firm-pair fixed e↵ect, and
k
ks
eks
jt = ējt + ẽjt ,
as explained above, so that we use clustered standard errors to calculate t-ratios, where
clusters are combinations of pair k, firm j and year t.
35
The results are presented in Table 4. We see that, indeed, the coefficient for the interaction term of beliefs and the new exporter dummy is statistically significant and positive,
which indicates a stronger e↵ect of beliefs on export size of new exporters, and the coefficient
for the interaction term of aggregate demand and the new exporter dummy is statistically
significant and negative, which indicates a weaker e↵ect of aggregate demand on export size
of new exporters. It is interesting to note that the coefficients for the two interaction terms
are almost equal in absolute terms. Of course, the marginal e↵ects of the interaction terms
will not be equal in absolute terms, since the variables aggregate demand and beliefs have
very di↵erent magnitudes.
Estimates for two pairs
We present the results of the structural estimation for two pairs: combinations of two regions (East Asia and Middle East) and the 4-digit code 4202 - ‘trunks, suitcases, vanity
cases, executive-cases, briefcases, school satchels, spectacle cases, binocular cases, camera
cases, musical instrument cases, gun cases, holsters and similar containers; travelling-bags,
insulated food or beverages bags, toilet bags, rucksacks, handbags, shopping-bags, wallets,
purses, map-cases, cigarette-cases, tobacco-pouches, tool bags, sports bags, bottle-cases, jewellery boxes, powder boxes, cutlery cases and similar containers, of leather or of composition
leather, of sheeting of plastics, of textile materials, of vulcanised fibre or of paperboard, or
wholly or mainly covered with such materials or with paper’.
In Figures 12 and 13, we show how the average number of destinations accessed within
each region evolves with the date since first exports in the region. On the same graphs, we
plot the evolution of the average number of export destinations (within the region) of old
exporters, where the horizontal axis now measures chronological time (year since 1995, 1995
being 1). We plot the averages for new and old exporters on the same graph, to make sure
that the upward trends in the average export size of new exporters are not a result of a
general upward trend in the region during that period. One can see that the behavior of old
exporters in East Asia and Middle East is much more stable than that of new exporters. As
an additional check, we plot the aggregate demand in each region in Figure 14. Aggregate
demand is measured again as the weighted average of quantity imported from the entire
world by the region (averaged over all destinations) of the 4-digit-code product 4202 - trunks,
suitcases, etc. We see that in both East Asia and Middle East the aggregate demand has
multiple ups and downs, and clearly does not exhibit steady upward movement over time.
In the table below we summarize the estimates of the main parameters (means of/draws
from their posterior distributions).
One can see that the estimated values of the signal-to-noise ratio are similar in East
Asia and in the Middle East. The signal-to-noise ratio a↵ects the speed of learning and
convergence of the average export size to the full scale, and we can see this in the simulations of the evolution over time of new exporters in Figures 15 and 16. To produce these
simulations, we do the following. We start with a fixed number of new exporters (20000),
and assign each one randomly into a profitability range, as well as demand range - high or
low (with probability of high demand given by the estimated p0 ). We then draw demand
36
Table 5: Estimates of the parameters for three regions, exports of Trunks, suitcases, etc.
Parameter
✏
md
p0
µ̄
x
M
F
F
p0 eµ̄ +(1 p0 )eµ
r
N
¯
pd
East Asia Middle East
1.14
1.64
1.8
14.71
0.0638
0.1183
2.82
2.64
1.61
1.44
1.75
1.84
2
3
46.9
59.77
23.42
23.52
0.19
0.191
0.24
0.241
0.94
2.06
0.0005
0.0057
Average number of destinations
2.5
2
1.5
1
Old exporters
New exporters
0.5
0
1
Estimated full scale
2
3
4
5
6
7
8
9
Date since first exports for New and year since 1995 for Old exporters
Figure 12: Average size evolution for East Asia, Trunks, suitcases, etc.
37
10
11
Average number of destinations
3.5
3
2.5
2
1.5
Old exporters
1
New exporters
0.5
Estimated full scale
0
1
2
3
4
5
6
7
8
9
10
Date since first exports for New and year since 1995 for Old exporters
Figure 13: Average size evolution for Middle East, Trunks, suitcases, etc.
5
4.5
x 10
East Asia
4
3.5
3
2.5
2
1
2
3
4
5
2
3
4
5
6
7
8
9
10
11
6
7
8
9
10
11
4
3.5
x 10
Middle East
3
2.5
2
1.5
1
0.5
0
1
Year since 1995
Figure 14: Aggregate demand evolution between 1995-2005 in three regions, Trunks, suitcases, etc.
38
Average number of destinations
2.5
2
1.5
1
Simulated
0.5
0
1
Observed
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Date since first exports
Figure 15: Simulated average size, over 5000 exporters, for East Asia, Trunks, suitcases, etc.
Average number of destinations
3
2.5
2
1.5
Observed
Simulated
1
0.5
2
4
6
8
10
12
14
Date since first exports
Figure 16: Simulated average size, over 5000 exporters, for Middle East, Trunks, suitcases,
etc.
39
12000
400
350
10000
Frequency
Frequency
300
8000
6000
4000
250
200
150
100
2000
0
50
0
2
4
6
8
10
0
12
Total discounted, and scaled by demand parameter, testing costs,
over all new exporters
0
5
10
15
20
25
Sum of total testing costs and sunk cost F, discounted
and scaled by demand parameter, over successful experimenters
15000
800
700
Frequency
Frequency
600
10000
5000
500
400
300
200
100
0
0
5
10
15
Duration of experimentation phase,
over all new exporters
0
0
2
4
6
8
10
12
14
Duration of experimentation phase,
over successful experimenters
Figure 17: Histograms of new exporters’ total discounted and scaled by E(eµ ) cost of experimentation and duration of experimentation phase, as well as total costs of entering the new
market (sum of discounted and scaled by E(eµ ) testing costs and sunk cost F ) and duration
of experimentation phase of successful experimenters. Exports of trunks, suicases, etc., by
French firms to the Middle East.
signals for each firm according to the estimated distribution of demand signals. We update
the beliefs of each exporter, and calculate their optimal response - whether they experiment
or enter or quit, and if they experiment, their optimal sample size n(p). In this way we
produce a smoothed average size evolution, without the sharp turns observed in our (much
smaller) dataset. We observe that the simulated average number of destinations in both
Middle East and East Asia approach the estimated full scale by date 15.
In Table 3, we show the estimated sunk cost estimates, as well as the scaled down
estimates (F/[p0 eµ̄ + (1 p0 )eµ ]). We scale the costs down by the demand portion that does
not appear in our measure of profitability, but still should be taken into account - since total
expected profits of a firm will be given by the product of our measure ⇡˜ and E(eµ ). Even
though the raw estimates F are quite di↵erent for the two pairs, when scaled by the demand
shift parameter, they are closer - 23.52 for Middle East and 23.42 for East Asia. Thus, we
estimate the sunk cost of entry F to be 24 times the expected (at initial beliefs) median
profit in Middle East and 23 times the expected (at initial beliefs) median profit in East
Asia.
We can also study the total cost of experimentation and the duration of experimentation
phase of new exporters. We sum the (discounted) experimentation costs over all periods for
each new exporter in Middle East, and plot the histogram of these in Figure 17. In the same
figure, we also plot the histogram of the duration of the experimentation phase, unconditional, as well as conditional on being a successful experimenter (switching to the full scale
40
at some point) and being an unsuccessful experimenter (quitting at some point). The mean
total discounted experimentation cost in Middle East is 5.22 (times the median profitability),
and 4.5 in East Asia. The mean duration of experimentation phase for successful experimenters is 2.52 in Middle East and 2.7 in East Asia, and for unsuccessful experimenters - 2
in Middle East and 2.2 in East Asia. We see that in our model both the optimal experimentation duration and the total cost of entering a new market (total experimentation cost plus
the sunk cost of entry for successful experimenters) are firm-specific and random.
Trade liberalization exercises
Uniform trade liberalization within an export region
Here we carry out the following exercise. Suppose the destination region cuts down tari↵s
for French exporters, by 20 percent. This tari↵ cut applies only to France, and assume that
France is a small exporter in this market. In that case, the aggregate variables (in particular,
the ideal price index) will not be a↵ected by the behavior of French exporters, and we can
focus only on the direct e↵ect of lower tari↵s on the profits of French exporters.
To simulate the response of French firms, we need to know how the thresholds for exporting will be a↵ected, and in particular, how many and which (in terms of productivity)
exporters will start exporting. First of all, we have the following expression for the profits earned by the firm j in the foreign market k (disregarding the time dimension and the
firm-specific demand portion eµj )
⇤
⇤
) q(prjk
)mcj
⇡jk = prj q(prjk
⇤
✏k
= (prjk ) Ak (prj mcj )
= (prj ⌧k ) ✏k Ak (prj mcj ),
where Ak incorporates all the aggregate demand variables in market k, mcj is the marginal
cost of production, prj is the price set by the firm (as a markup times the marginal cost),
⇤
and prjk
⌘ prj ⌧k , the price faced by the consumers in the foreign market, which is the price
prj multiplied by the tari↵ rate in that market, and hence consumption is a function of this
⇤
price - q(prjk
). Thus, as tari↵s fall by 20 percent, from say, 1.25⌧k to ⌧k , the profits of a firm
increase from (prj 1.25⌧k ) ✏k Ak (prj mcj ) to (prj ⌧k ) ✏k Ak (prj mcj ), that is they increase
by a factor of (1.25)✏k .
Recall that we used the normalized version of profits, ⇡
˜j ⌘
Cj
1) (1 ⇢ ) ]
exp[(✏ 1) (1
md
C
j
⇢ )
]
, where md
is the median of exp[(✏
over all exporters in the given pair, to characterize the
heterogeneity of exporters in a given sector. We locate the first percentile of this ⇡
˜j over
all exporters in a given pair (rather than the minimum, which helps us disregard outliers),
0
and call this the exporting threshold in this pair, ⇡
˜ex
. This value also implies an exporting
0
0
˜
threshold for the smoothed average productivity, ex = exp[ ln(˜⇡✏ex1md) ].
41
With trade liberalization in region k, all profits ⇡
˜j will go up by (1.25)✏k . Denote the
new profit variable by ⇡
˜j0 :
˜j (1.25)✏k .
⇡
˜j0 = ⇡
0
˜ex
will export now. That is, in
All firms with ⇡
˜j0 at or above the exporting threshold ⇡
1
terms of the old profit variable, all firms with ⇡
˜j at or above ⇡
˜ex , where
1
⌘
⇡
˜ex
0
⇡
˜ex
,
(1.25)✏k
1
0
will now export. Thus, firms with profitability ⇡˜j between ⇡
˜ex
and ⇡
˜ex
will be new
1
exporters as a result of the trade liberalization. ⇡
˜ex can also be used to deduce the new
exporting threshold for the smoothed average productivity,
0
˜1
ex
1
˜0
˜0
exp[ ln(˜⇡✏ex1md) ]
md)
ln(˜
⇡ex
ex
ex
]=
=
=
= exp[
✏ .
✏ 1
exp[ ✏ ✏ 1 ln(1.25)]
exp[ ✏ ✏ 1 ln(1.25)]
(1.25) ✏ 1
To know how many of these new exporters will be there, we need to turn to the observed
distribution of productivities in the sector concerned. In the dataset, each firm is assigned
an industrial sector (NAF 2), according to their main activity. The exporters in a particular
4-digit category (in our case, code 4202 - Trunks, suitcases, etc.) may come from di↵erent
sectors. We find that the exporters of this product category belong to 19 various sectors. Out
of these, three are dominant and comprise 20, 12 and 10 percent of all exporters, respectively,
- Clothing and furs, Chemical, pharmaceutical and synthetic products, and Leather products
and shoes. We form a single group out of all firms belonging to these three sectors.
First, we fit a Pareto distribution to the smoothed average productivities within this
combined group:
ln(1
CF ( ˜j )) = ↵(ln ˜m
ln ˜j ),
where ˜m is the minimum smoothed average productivity in the group, ↵ is the shape
parameter, and CF is the cumulative distribution. We obtain a shape parameter of 0.605,
which is less than 1, and implies an infinite mean of the distribution.
Hence, instead of relying on the Pareto estimates, we take a simpler approach. We count
how many firms within the combined group have smoothed average productivity between
the old exporting threshold, ˜0ex , and the new exporting threshold, ˜1ex . We find that the
numbers are 58 and 54, for East Asia and Middle East, respectively.
We simulate the behavior of these new exporters, relying on their post-liberalization
profits ⇡
˜ 0 , and the optimal experimentation behavior as predicted by the model. We find
that the values of ⇡
˜ 0 of all of these new exporters in all three pairs belong to the first decile
of profitability. We simulate their behavior 1000 times, and take an average of the resulting
time series (averaging over all 1000 samples, for each date since first exports), to obtain a
smoothed out pattern. It takes 18 years for new exporters in East Asia to reach the full scale
42
and 19 years for new exporters in the Middle East. These numbers are larger than what we
observed in the previous simulations for these regions, due to the fact that the new exporters
responding to a fall in import tari↵s are low productivity/low profitability producers, and
thus have lower experimentation intensities (number of export destinations), and higher full
entry thresholds on beliefs.
2.5
East Asia
2
1.5
1
0.5
0
1
2
3
4
5
6
7
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
8
9
10
11
12
13
14
15
16
17
3.5
Middle East
3
2.5
2
1.5
1
0.5
0
1
Date since trade liberalization
Figure 18: Response to a 20 % tari↵ cut in the entire region. Exports of Trunks, suitcases,
etc., by French firms to East Asia and Middle East.
Temporary trade liberalization in a single country within an export region
Suppose now that only one country within a region reduces its import tari↵ on French firms
in the given product by 80 percent, from ⌧ to 15 ⌧ . Moreover, assume that this a one-o↵
reduction, and the lower tari↵ rates apply only for one year. This requires that we assume
a hybrid model with discrete time in the stage 0 (the year of lower import tari↵s), and
continuous time afterwards (the stages 1 and 2 of the model we studied in the theoretical
part). We want to see whether there will be new exporters who will start exporting to the
(temporarily) liberalizing country, and whether this will a↵ect exports to other destinations
within the region. We ignore here the e↵ect this temporary tari↵ cut will have on the
exporters already present in the region.
We can view the problem of a new exporter to the region as follows. Since the tari↵ cut is
temporary and valid only for one period, the information value of exporting to any destination
in the region does not change. Moreover, the profit structure in all other destinations within
the region does not change, and that in the liberalizing destination reverts to the original
after one year. Hence, any firm that did not choose to export to the region before the
liberalization, will either continue not exporting there, or will export only to the liberalizing
43
12
Value function
10
8
6
Lower Bar P
4
P1
P0
2
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Beliefs P
Average number of destinations
Figure 19: Example of a firm with low productivity, whose initial beliefs p0 are too low to
induce it to export to the region, but for whom the subsequent change in beliefs to p1 > p0 ,
after exporting to a single destination with temporary trade liberalization, is sufficient to
start experimenting in the region.
3
2
1
0
1
2
3
4
5
2
3
4
5
6
7
8
9
10
11
12
13
14
15
6
7
8
9
10
11
12
13
14
15
Exit rate
1
0.5
0
1
Date since the temporary fall in tariffs
Figure 20: Response to an 80 % temporary tari↵ cut in a single destination within Middle
East, exports of Trunks, suitcases, etc.
44
country in the year when its tari↵s fall, to benefit from the temporary higher profits. This
would require that the profits from exporting to the liberalizing destination, net of exporting
costs, are positive:
E0 (˜
⇡k0 )
c(1) = E0 (˜
⇡k )5✏k
c(1) > 0,
where ⇡
˜k0 are the profits from exporting to the liberalizing destination k in the year when
its import tari↵s fall, E0 is the expected value given beliefs p0 , and c(1) is the testing cost of
accessing one destination, which is the exporting cost, since the firm does not invest in fullscale entry to export to destination k for one year. To locate all new exporters, we therefore
find all firms that do not export under the original setting, but for whom E0 (˜
⇡k0 ) c(1) > 0.
We count 62 and 59 such new exporters in East Asia and Middle East, respectively.
Of course, once a new exporter observes its sales in country k, and updates its beliefs
about demand for its product in the region, it may choose to continue exporting in the region.
We illustrate this situation in Figure 19, where we show the solution to the BVP we discussed
in the theoretical part for a firm whose productivity is lower than that of exporters already
present in the market. Thus, given the prevalent beliefs p0 , this firm chose not to export in
the original setting. However, having exported to one destination, and updated its beliefs
to some p1 > p0 , high enough to induce it to experiment, it decides to continue exporting in
the region even after the tari↵s in that destination went to their original level. Note that, ex
ante, given beliefs p0 , its intention is only to export to the liberalizing country during the year
when that country decreases import tari↵s, and not to export to the region afterwards. We
plot the response of the new exporters in Figure 20, where they all start with 1 destination
in year 1 since the fall in tari↵s, and later either quit or decide to export/experiment in the
region further.
We observe that, indeed, some of the exporters stay in the region after the single year of
exporting in the liberalizing country, which leads to their subsequent expansion. It takes 14
years for the average number of destinations of these exporters (conditional on survival) to
reach the full scale in Middle East. Of course, there are only few firms that indeed stay in
the region after the year of lower tari↵s, and few firms that also survive the subsequent years.
This can be seen in the graph of exit rate - only 20 percent of those firms that exported in
the lower tari↵s period stay afterwards, and out of these 17 percent exit in the second year
(here we take into account only the exit of firms due to learning, and disregard exogenous
death).
Response to the improvement in the signal-to-noise ratio
Another exercise we can carry out using these estimates, is the response of French producers
µ̄ µ
to an improvement in the precision of information - a rise in the signal-to-noise ratio ⌘ x ,
as the standard error x falls. This will shift up the value function for any given producer, and
shift out the thresholds for entering and quitting. As a result, the threshold on productivity
for exporting will decrease, and the threshold on productivity for entering the market at
full scale right away will increase. Thus, there will be more exporters, and out of these,
45
a larger share will choose to experiment first. The graph we will plot will pertain to the
average dynamics of new exporters - the French firms that decide to start exporting (through
experimenting first) in response to the higher value of exporting. Notice that, while no other
variables change - productivities of all firms and aggregate factors remain the same, there
will be an influx of new exporters simply due to the higher value of information generated
through exporting. We calculate the number of new exporters in the same manner as we did
for trade liberalization - find the new threshold on productivity for exporting, and evaluate
the number of firms in the combined group of producers of Clothing and furs, Chemical,
pharmaceutical and synthetic products, and Leather products and shoes, whose productivity
lies between the new threshold and the old threshold. We have around 70 such exporters for
the Middle East. The average size evolution of these is shown in Figure 21. It takes 7 years
(less than the 19 years we had in the trade liberalization exercise) for the new exporters to
expand to the full scale. This is shorter than the 19 years witnessed earlier due to the faster
learning allowed by the higher signal-to-noise ratio.
Average number of destinations
3.5
3
2.5
2
1.5
1
0.5
0
1
2
3
4
5
6
7
8
9
10
Date since increase in signal−to−noise ratio
Figure 21: Response to a 20 % increase in the signal-to-noise ratio. Exports of Trunks,
suitcases, etc., by French firms to the Middle East.
8
Conclusion
We present a model with demand uncertainty and sunk costs of entry. The firm is able to
learn more about the demand for its product before incurring the sunk cost of entry, by
using a costly testing technology that allows to sample individual sales observations with
some noise, and update its beliefs about the parameter of interest. Thus, the decision for
new exporters is no longer a binary choice between entry and non-entry, but an optimal
46
control and optimal stopping problem where the scale and duration of the initial testing
phase is optimally chosen, depending on the characteristics of the firm and the market.
This is a very rich framework that produces multiple interesting comparative statics
predictions, and is applicable to various economic settings. One such application is the
study of exporters accessing new markets. Investigation of numerous international datasets
has revealed robust patterns of expansion over time of export volumes and falling exit rates
with export age of new exporters. Moreover, in some cases we observe the new exporters
switch to some stable scale after a certain period of learning. Our model can help explain such
behavior. While several other models, in particular models of learning, have been proposed
in the trade literature to accommodate these patterns, our model introduces several features
that jointly di↵erentiate it from previous formulations. These are the presence of a sunk cost
of entry that can be postponed due to the availability of a testing technology; imperfect and
costly information obtained from export sales to individual consumers, such that a higher
number of consumers accessed provides more precise signals; active learning by firms under
demand uncertainty - the recognition of the information value of exporting and of the tradeo↵ between higher testing costs and more precise information obtained from sampling more
consumers; and an endogenous choice by the firm of optimal duration and intensity of the
experimentation phase.
We present several real-life examples of test marketing carried out in domestic and export
settings. These indicate that we can apply this model to the problem of a firm that wishes
to export to a certain geographical region, with uncertain demand and a high sunk entry
cost, and can learn more about demand there by accessing a few individual destinations
in the region first. This allows us to structurally estimate the model, using French firmdestination-product-level export data between 1995-2005. We estimate the beliefs of several
hundred pairs of regions and 4-digit-code products, and test the relationship between the
beliefs of new exporters, and their export size (number of export destinations within a given
region) and exit decisions. We show estimates of all parameters of the model for exports of
trunks, suitcases, etc., by French firms to the Middle East and East Asia. We show that
simulations based on these estimates can help us track the actual evolution of exports of
trunks, suitcases, etc., in these regions.
Several further simulation exercises are carried out. We consider the e↵ects of a uniform
permanent cut in tari↵s in a given region, of a temporary cut in a single country within a
region, and of a rise in the signal-to-noise ratio in a given region, on French exporters. We
show that the presence of active learning and interaction of learning across destinations leads
to an influx of exports to all countries within the region in all cases. In particular, in the
case of a temporary cut in a single destination the new information obtained in one country
allows the firm to update beliefs about the entire region and possibly start exporting to other
countries within the region, and in the case of the improvement in the signal precision, the
higher information value of exporting, with the actual profits held constant, induces some
firms to start experimenting (exporting) in the region.
We believe this model can be applied to multiple other settings, such as the introduction
by firms of new products in domestic markets, and we plan to carry out this application
47
using marketing data in the future.
48
9
References
Ackerberg D., K. Caves and G. Frazer (2006), Structural Identification of Production Functions, mimeo, R&R Econometrica
Albornoz F., H.F.C.Pardo, G.Corcos and E.Ornelas (2009), Sequential Exporting, CEP
Discussion Papers.
Arkolakis C. (2006), Market Penetration Costs and the New Consumers Margin in International Trade, NBER Working Papers.
Bernard A.B., J.B.Jensen, S.J.Redding and P.K.Schott (2009), The Margins of US Trade,
American Economic Review, Vol. 99, No. 2, 487-93.
Bolton P. and C. Harris (1999), Strategic Experimentation, Econometrica, Vol. 67, No.
2, 349-374
De Loecker J. (2010), Product Di↵erentiation, Multi-Product Firms and Estimating the
Impact of Trade Liberalization on Productivity, forthcoming, Econometrica
Eaton J., S. Kortum, and F. Kramarz (2008), An Anatomy of International Trade: Evidence from French Firms, NBER Working Papers.
Eaton J., M.Eslava, C.J.Krizan, M.Kugler and J.Tybout (2007), Export Dynamics in
Colombia: Firm-Level Evidence, Working Paper, BANCO DE LA REPBLICA.
Eaton J., M.Eslava, C.J.Krizan, M.Kugler and J.Tybout (2008), A Search and Learning
Model of Export Dynamics, Working Paper.
European Commission (2008), Guide to Cost Benefit Analysis of Investment Projects,
July.
Ferber R., (Ed., 1974), Handbook of Marketing Research, New York, McGraw-Hill book
company.
Freund C. and M.D.Pierola (2008), Export Entrepreneurs: Evidence from Peru, Policy
Research Working Paper Series, World Bank.
Hitsch G.J. (2005), An Empirical Model of Optimal Dynamic Product Launch and Exit
Under Demand Uncertainty, Marketing Science, Vol. 25, No.1, January-February 2006,
pp.25-50.
Klompmaker J.E., G. D.Hughes, and R. I. Haley (1976), Test Marketing in New Product
Development, Harvard Business Review, May.
Levinsohn J. and A.Petrin (2003), Estimating Production Functions Using Inputs to Control for Unobservables, Review of Economic Studies, 317-342
Melitz M. (2003), The Impact of Trade on Intra-Industry Reallocations and Aggregate
Industry Productivity, Econometrica, Vol. 71, No. 6., 1695-1725
Moscarini G. and L.Smith (2001), The Optimal Level of Experimentation, Econometrica,
Vol. 69 (Nov., 2001), No. 6, pp. 1629-1644.
Moscarini G. and L.Smith (1998), Wald Revisited: The Optimal Level of Experimentation,
Working Paper.
Nguyen D.X. (2008), Demand Uncertainty: Exporting Delays and Exporting Failures,
Discussion Paper, Dept. of Economics, University of Copenhagen.
Olley S. and A.Pakes (1996) The Dynamics of Productivity in the Telecommunications
Equipment Industry, Econometrica 64, 1263-1295
49
Rauch J.E. and J.Watson (2003), Starting Small in an Unfamiliar Environment, International Journal of Industrial Organization, Vol. 21(7), pp. 1021-1042.
Ruhl K.J. and J.L.Willis (2008), New Exporter Dynamics, Working Paper, NYU Stern
School of Business.
Segura-Cayuela R. and J.M.Vilarrubia (2008), Uncertainty and Entry into Export Markets, Banco de Espana Working Papers.
50
10
10.1
Appendix
Interpreting the assumptions about the demand signals of
the firms
The firm j does not observe the precise values of quantities sold, i.e. it does not observe
the precise value of µ. Instead, it receives an imperfect signal from an individual consumer
indexed by k:
k
Xjt
⌘
Z t
0
k
qjs
ln(
y(hj ) ✏ P ✏
1
)ds +
k
x Wjt
=
Z t
0
µj ds +
k
x Wjt ,
so that
k
dXjt
= µj dt +
k
x dWjt ,
where Wjtk is a Wiener process. We now assume that the firm (and the researcher) observe
the signals only at discrete time intervals, t 2 0, 1, 2, 3, 4, .... We can approximate the signal
process as
k
Xjt
⌘ ln(
and with
1
) t+
k
k
⌘ ↵jt
, and
Xjt
t = 1, and denoting
k
k
↵jt
= ln qjt
k
qjt
y(hj ) ✏ P ✏
ln y + ✏hj
x
Wjtk ,
k
Wjtk ⌘ ⌘jt
⇠ N (0, 1), we get
(✏
1) ln P +
k
x ⌘jt .
Now, our assumption will be that the way this happens is that the firm observes not the
k
true quantities (ln qjt
), but noise-contaminated ones:
k
k
ln˜qjt
⌘ ln qjt
+
k
x ⌘jt ,
so that
k
ln˜qjt
ln y + ✏hj
(✏
1) ln P = µj +
k
x ⌘jt
k
⌘ ↵jt
,
k
where ↵jt
⇠ N (µj , x2 ). Therefore, we can interpret the assumptions made about the
demand signals as saying that in discrete time the firm observes log quantities sold with
some normally distributed error.
10.2
Deriving the evolution of beliefs
Here we derive the evolution of beliefs of the firm pjt , given the stochastic process for the
firm’s observations. This proof follows closely the proof in Bolton and Harris (1999), Lemma
51
1. The firm j samples n observations, indexed by k, each of which follows the independent
process
k
dXjt
= µj dt +
k
x dWjt ,
where Wjtk is a Wiener process. Denoting by p(t) the probability that µj = µ̄, given all
information up to time t, as before, and applying Bayes’ rule,
p(t + dt) =
=
=
k n
P rob([dXjt
]k=1 , µj = µ̄)
k n
P rob([dXjt
]k=1 )
k n
P rob([dXjt
]k=1 |µj = µ̄)P rob(µj = µ̄)
k n
k n
P rob([dXjt ]k=1 |µj = µ̄)P rob(µj = µ̄) + P rob([dXjt
]k=1 |µj = µ)P rob(µj = µ)
p(t)F̃ (µ̄)
,
p(t)F̃ (µ̄) + (1 p(t))F̃ (µ)
where
k n
]k=1 |µj = µ)
F̃ (µ) ⌘ P rob([dXjt
= P rob(dWjtk =
n
Y
=
k=1
q
k
dXjt
µj dt
x
|µj = µ)
k
1 dXjt
(
exp[
2dt
(2⇡dt)
µdt
1
N
2
= (2⇡dt)
k
n
dXjt
1 X
exp[
(
2dt k=1
)2 ]
x
µdt
)2 ],
x
since dWjtk ⇠ N (0, dt).
Drop the j and t indices in what follows (with the exception of µj , which still denotes
the µ that firm j holds), to simplify notation. The change in beliefs is
dp = p(t + dt)
p(t) =
p(1
p)(F̃ (µ̄)
pF̃ (µ̄) + (1
F̃ (µ))
p)F̃ (µ)
,
and since
F̃ (µ) = (2⇡dt)
= (2⇡dt)
N
2
N
2
1
exp[
2dt
exp[
n
X
k=1
(
n
X
2
x k=1
((dXk )2
(dXk )2 dXk µ
+
2
2dt x2
x
52
2dXk µdt + µ2 (dt)2 )]
µ2 dt
)],
2 x2
we have
p(1
dp =
N
2
(2⇡dt)
p(1
=
N
2
p)(2⇡dt)
exp[
Pn
Pn
k=1 (
(dXk )2
)](F̂ (µ̄)
2dt x2
(dXk )2
)](pF̂ (µ̄)
2dt x2
k=1 (
p)(F̂ (µ̄)
pF̂ (µ̄) + (1
exp[
F̂ (µ))
p)F̂ (µ)
+ (1
F̂ (µ))
p)F̂ (µ))
,
where
F̂ (µ) ⌘ exp[
n
X
(
dXk µ
k=1
2
x
µ2 dt
)].
2 x2
Under the Maclaurin expansion of the exponent,
F̂ (µ) = 1 +
= 1+
n
X
k=1
n
X
k=1
n
X
(
n
µ2 dt
1 X
dXk µ
[
)
+
(
2
2 x2
2 k=1
x
dXk µ
2
x
µ2 dt 2
)] + ...
2 x2
n
dXk µ X
µ2 dt
2
2
x k=1 2 x
k=1
n
X
n
µ2 dt 1 X
dXk µ 2
[
+
]
2
2 k=1 x2
k=1 2 x
dXk µ
2
x
n
X
1
µ2 dt 2
+ [
] + ...
2 k=1 2 x2
= 1+
n
X
dXk µ
n
X
dXk µ
k=1
n
X
n
X
n
n X
dXk µ 2 X
µ2 dt 1 X
dXk µ dXi µ
+
[
]
+
2
2
2
2
2 k=1
x
x
x
k=1 2 x
k=1 i<k
2
x
dXk µ µ2 dt 1 2 µ2 dt 2
n 2 + n [ 2 ] + ...
2
2 x
2
2 x
x
k=1
= 1+
= 1+
k=1
n
X
k=1
n
X
n
µ2 dt X
µ2 dt
+
2
2
k=1 2 x
k=1 2 x
2
x
dXk µ
2
x
,
since
(dXk )2 = µ2 (dt)2
2µdtdWk
x
+
2
2
x (dWk )
=
2
x dt,
3
we ignore everywhere terms of order dt 2 and higher, and
dXk dXi = µ2 (dt)2 + µdt x dWk + µdt x dWi +
since dWk and dWi are independent.
We get
53
2
x dWk dWi
= 0,
dp =
where m̃(p) ⌘
Since
(1 + m̃(p)
n
X
pµ̄+(1 p)µ
x
dXk
)(1
p(1
1+
m̃(p)
n
X
dXk
Pn
dXk µ̄ µ
k=1 x
x
P
m̃(p) nk=1 dXxk
p)
.
x
k=1
p)
1+
⌘
Pn
µ̄ µ
k=1 dXk x2
pµ̄+(1 p)µ Pn
2
k=1 dXk
x
p(1
) = 1
(m̃(p))2
= 1
2
n
n X
X
1 X
2
(
(dX
)
+
dXk dXi )
k
2
x
k=1
,
x k=1
k=1 i<k
(m̃(p)) ndt,
and
n
X
dXk
n
X
(1 + m̃(p)
x
k=1
dXk
)(1
x
k=1
=
n
X
=
dXk
(m̃(p))2 n
1
x
k=1
n
X
dXk
dXk
)
x
k=1
(m̃(p))2 ndt)
(1
x
k=1
n
X
m̃(p)
n
X
(µ(dt)2 +
x dtdWk )
x k=1
=
n
X
dXk
,
x
k=1
we can write
Pn
k=1
1 + m̃(p)
dXk
x
Pn
k=1
dXk
=
x
=
Pn
k=1
n
X
dXk
x
(1 + m̃(p)
Pn
dXk
k=1
x
Pn
1 + m̃(p)
dXk
(1
m̃(p)
x
k=1
n
X
k=1
)(1
k=1
dXk
m̃(p)
dXk
x
),
x
and
dp = p(1
p)
µ̄
x
= p(1
p)
µ̄
n
µX
dXk
µ
x
k=1
n
X
(
k=1
(1
m̃(p)
x
dXk
x
54
n
X
dXk
k=1
n
X
k=1
m̃(p)dt)
x
)
Pn
k=1
dXk
x
)
= p(1
= p(1
p)
p)
µ̄
µ̄
µ n
x
x
µp
(
Pn
k=1
dXk
(pµ̄ + (1
n
p)µ)dt)
ndW̃ ,
x
where
p Pn
n k=1 dXk
(
dW̃ ⌘
n
x
(pµ̄ + (1
p)µ)dt) =
p
n
(dX
(pµ̄ + (1
p)µ)dt).
x
The sum of several independent Wiener processes is also a Wiener process. Since
dXk = µj dt +
x dWk ,
we can express
dW̃ ⌘ mW̃ dt +
W̃ dWt ,
and find the drift and variance in the following way.
First, the drift term is given simply by the sum of all the drift terms in dW̃ :
p n
p
n X µj dt
n
mW̃ =
(pµ̄ + (1 p)µ)dt) =
(
(µj (pµ̄ + (1 p)µ)dt),
x k=1 n
x
and the variance as
W̃
n nV ar(dWt )
n n x2 dt
=
= dt.
2
2
2
n2
x
x n
=
Thus,
dW̃t =
p
n
(µj
(pt µ̄ + (1
pt )µ))dt + dWt ,
x
and
E(dW̃t |It ) = 0,
W̃ has zero mean (conditional on information up to time t), i.e. W̃ is an observationadapted Wiener innovation process.
10.3
Comparing our problem with that of Moscarini and Smith
(2001)
The problem formulated in Moscarini and Smith (2001) concerns the choice between two
terminal actions, A and B, that have di↵erent payo↵s in two possible states of the world
(✓ = H, L): if p denotes the probability of high state ✓ = H,
55
⇡A (p) = p⇡AH + (1
p)⇡AL ,
⇡B (p) = p⇡BH + (1
p)⇡BL ,
where ⇡BH > ⇡AH , ⇡BH > ⇡BL , and ⇡BL < ⇡AL .
The cost of experimentation is c(n), twice di↵erentiable on (0, 1), increasing and strictly
convex on [0, 1). The main results in the paper require that either c(0) > 0 or ⇡A (p) > 0.
Since in our case ⇡A (p) ⌘ 0, assume that c(0) > 0. They also assume that limn!1 g(n) >
r max✓,a ⇡a✓ .
There is an observation process [x̄t ], which is a Brownian motion with constant uncertain
drift µ✓ in state ✓, µH = µL = µ > 0. Thus, x̄t follows
dx̄✓t = µ✓ dt + p dWt ,
nt
in state ✓, where Wt is a Wiener process, and nt is the level or intensity of experimentation.
Denoting by pt the probability that ✓ = H, given all information up to time t,
p t = p0 +
p
where dW̄s ⌘ ns (dx̄s
vation process, and ⇣ ⌘ µ
V (p0 ) = sup E[
T,(nt )
Z t
0
p
ps )⇣ ns dW̄s ,
ps (1
[ps µ
( µ)
+ (1 ps )( µ)]ds), the observation-adapted Wiener inno. Then the problem of the agent can be expressed as
Z T
0
c(n)e
rt
dt + e
rT
⇡(p0 +
Z T
0
pt (1
p
pt )⇣ nt dW̄t )|p0 ],
i.e. the agent has to set the stopping time T , and the experimentation schedule (nt ), t =
0, ..., T .
The Hamilton-Jacobi-Bellman equation for the control problem is
rv(p) = sup( c(n) + n
p2 (1
p)2 ⇣ 2
2
n 0
v 00 (p)),
plus the value matching condition:
v(p) = p⇡AH + (1
p)⇡AL ,
v(p̄) = p̄⇡BH + (1
p̄)⇡BL .
The ‘generalized Stefan’ ODE problem for the optimal stopping problem given the control
policy n(p) is
rv(p) =
c(n(p)) + n(p)
56
p2 (1
p)2 ⇣ 2
2
v 00 (p),
or substituting c0 (n) =
p2 (1 p)2 ⇣ 2 00
v (p)
2
(from the FOC),
v 00 (p) =
c0 (z(rv(p)))
p2 (1 p)2 ⇣ 2
2
,
where z ⌘ g 1 , and g(n) ⌘ nc0 (n) c(n),
plus the value-matching conditions
v(p) = p⇡AH + (1
p)⇡AL ,
v(p̄) = p̄⇡BH + (1
p̄)⇡BL ,
and the smooth pasting conditions
v 0 (p) = ⇡AH
v 0 (p̄) = ⇡BH
⇡AL ,
⇡BL .
Compare this problem with the one we have in our case:
00
v (p) =
c0 (z(rv(p)))
(D1 p + D2 (1
p2 (1
p))
p)2 ⇣ 2
2
where z ⌘ g
1
as before, ⇣ ⌘
µ̄ µ
x
,
D1 ⌘ AG
✏ 1 µ̄
j e ,
D2 ⌘ AG
✏ 1 µ
j e ,
p)˜
⇡AL ,
v(p̄) = p̄˜
⇡BH + (1
plus the value-matching conditions
v(p) = p˜
⇡AH + (1
and the smooth pasting conditions
v 0 (p) = ⇡
˜AH
v 0 (p̄) = ⇡
˜BH
⇡
˜AL ,
⇡
˜BL ,
where
⇡
˜BH ⌘
M
( f + AG( j )✏ 1 eµ̄ )
r
F,
⇡
˜BL ⌘
M
( f + AG( j )✏ 1 eµ )
r
F,
⇡
˜AH ⌘ 0,
⇡
˜AL ⌘ 0.
57
p̄)˜
⇡BL ,
Notice that the two problems are extremely similar, with one di↵erence being the additional linear term (D1 p + D2 (1 p)) appearing in the nominator of the di↵erential equation.
The presence of this term does not a↵ect the derivation of the main results in Moscarini
and Smith (2001), in particular those of the existence and uniqueness of a solution to the
problem of the firm. The only way in which it does a↵ect the problem, is that now we have
to make sure that the nominator c0 (z(rv(p))) (D1 p + D2 (1 p)) is positive for all p, since
we need v 00 (p) > 0 for all p 2 [p, p̄]. In the main text, we made the assumption
c0 (z(0)) > AG
✏ 1 µ̄
j e
AG
✏ 1
µ̄
j [pe
+ (1
p)eµ ],
+ (1
p)eµ ],
8p.
which implies that
c0 (z(rv(p))) > AG
✏ 1
µ̄
j [pe
This is a sufficient condition for v 00 (p) > 0. However, even some c(n) that do not satisfy
this condition may still allow for the existence and uniqueness of a solution, if c(n) is such
that
c0 (z(rv(p))) > AG
✏ 1
µ̄
j [pe
+ (1
p)eµ ],
8p 2 [p, p̄].
˜AL ⌘
Another condition that we have to impose is c(0) > 0, since in our problem ⇡
˜AH ⌘ 0, ⇡
0. Otherwise, as pointed out in Moscarini and Smith (2001), Appendix.B.4, boundary problems may arise: as p approaches p from above, n(p) goes to 0, and p may not be reached in
finite time with positive probability.
Given these two assumptions we impose (c(0) > 0, and c0 (z(rv(p))) > AG ✏j 1 [peµ̄ + (1
p)eµ ], 8p 2 [p, p̄]), all the main proofs of Moscarini and Smith (2001) apply directly to our
problem. The only results that need modification are the comparative statics results. Below
we first show how introducing the profits in the experimentation phase (the linear term in
the ODE nominator), a↵ects the value function and the optimal experimentation behavior,
and then re-derive the comparative statics results for our case.
10.3.1
The e↵ect of including profits in the experimentation phase
To study the e↵ect of including profits in the testing phase in the value function on the
behavior of the solution, modify the value function by introducing an indicator
in front
of the testing phase profits, which is assumed to be continuous, and range from 0 to 1. If
is 0, testing profits do not enter the value function, if it is 1, they do so fully.
V (pt ) = sup Et [e
T,<ns >
rT
⇡(pT ) +
Z T
0
( c(ns ) +
ns A(ps eµ̄ + (1
where A ⌘ y ✏ 1 1 [ ✏ ✏ 1 ] ✏ [ wj ]✏ 1 P ✏ 1 .
Take the derivative of the value function with respect to
V (pt ) = Et [
Z T⇤
0
n⇤s A(ps eµ̄ + (1
58
ps )eµ ))e
ps )eµ ))e
rs
ds],
, using the envelope theorem:
rs
ds] > 0,
where T ⇤ and n⇤s , s = t, ..., T ⇤ are the optimal values of the stopping time and sample
size, respectively. Hence, the value function increases for all pt (shifts up) when goes from
0 to 1. This implies, by Proposition 1 (Monotonicity in values) in Moscarini and Smith
(2001), that in the case with testing profits in the value function, compared with the case
without, n(pt ) = z(rv(pt )) goes up for all pt . Also, since the terminal payo↵ does not change
in any way, and v(p) is increasing and convex, and given the value matching and smooth
pasting conditions, the threshold p̄ goes up and the threshold p goes down. Hence, the firm
experiments more intensively and the thresholds for it to take a terminal decision expand,
when we allow the firm to gather the profits in the experimentation phase.
10.3.2
Comparative Statics Results
Proposition 1. As any of y,P , j increases or w decreases, the firm will experiment more
intensively, for any given value of beliefs, and the thresholds for quitting and for entering the
market at full scale decrease, if the optimal experimentation intensity is sufficiently bounded
away from the full scale M . Proof. Consider the comparative statics with respect to
A ⌘ y ✏ 1 1 [ ✏ ✏ 1 ] ✏ [ wj ]✏ 1 P ✏ 1 , which will e↵ectively give us the comparative statics with respect
to , P , w and y. The derivative of the value function (with set to 1) with respect to A,
using the envelope theorem, is
VA (pt |
= 1) = Et [
Z T⇤
0
n⇤s (ps eµ̄ + (1
ps )eµ )e
+P rob(pT ⇤ = p̄|pt , n⇤ , T ⇤ )Et [e
M
⇤ (p̄eµ̄ + (1 p̄)eµ )) > 0,
r
rs
ds]
rT ⇤
|pT = p̄, pt ] ⇤
since ⇡(pT ) ⌘ max[0, Mr (pT (eµ̄ A f ) + (1 pT )(eµ A f )) F ].
Hence, we already know that the value function increases for all pt as A rises, and since
n(pt ) is increasing in v(pt ), so does the experimentation intensity n for each p.
Let A1 > A0 , and denote the corresponding thresholds for quitting as p1 and p0 , and for
entering as p̄1 and p̄0 , respectively. The value function of the problem for A = A1 , evaluated
at p0 is, by linear approximation,
V (p0 |A1 ) ' V (p0 |A0 ) + VA (p0 |A0 )(A1
A0 ) > 0,
by value matching (V (p0 |A0 ) = 0), and VA (p) > 0. Hence, we get p1 < p0 , since we
require V (p1 |A1 ) = 0, and V (p) is increasing in p . That is, the threshold for quitting goes
down as A increases.
Next, we want to see what happens to the threshold for entering, p̄. Consider
Vp (p̄0 |A1 ) ' Vp (p̄0 |A0 ) + VpA (p̄0 |A0 )(A1 A0 )
A(eµ̄ eµ )
= M
+ VpA (p̄0 |A0 )(A1 A0 ).
r
59
µ
µ̄
If VpA (p̄0 |A0 ) > 0, then Vp (p̄0 |A1 ) > M A(e r e ) , and since the smooth pasting condition
µ
µ̄
has to be satisfied by the new solution (Vp (p̄1 |A1 ) = M A(e r e ) ), and by convexity of the
value function V (p), we will get p̄1 < p̄0 . To check if VpA (p̄0 |A0 ) > 0:
VpA (p) = VAp (p)
Z T⇤
dn⇤s
(ps eµ̄ + (1 ps )eµ ))e rs ds]
dp
0
t
⇤
⇤ dEt (T )
+Et [n⇤T ⇤ (pT ⇤ eµ̄ + (1 pT ⇤ )eµ )e rT ]
dpt
dP rob(pT = p̄|pt )
M
⇤
+
Et [e rT |pT = p̄] ⇤ (p̄eµ̄ + (1 p̄)eµ )
dpt
r
rT ⇤
|pT = p̄] M
dEt [e
⇤ (p̄eµ̄ + (1 p̄)eµ ).
+P rob(pT = p̄|pt )
dpt
r
= Et [
(n⇤s (eµ̄
eµ ) +
⇤
t (T )
For pt small enough and close to 0, dEdp
> 0. This follows from the behavior of the
t
expected remaining time until stopping, E[T |pt ], which according to Moscarini and Smith
(1998, working paper), obeys the boundary conditions ⌧ (p) = ⌧ (p̄) = 0, and the ODE
1=0+
p2 (1
p)2 ⇣ 2
2
n(p)⌧ 00 (p),
i.e. ⌧ 00 (p) < 0, and ⌧ (p) is hill-shaped. If pt is close to 1, so that
dEt (T ⇤ )
dpt
< 0, we still have
dP rob(pT = p̄|pt )
> 0,
dpt
and
dEt [e
rT ⇤
|pT = p̄]
> 0.
dpt
So if the last two terms in the expression for the cross-derivative dominate the second
⇤
⇤
t (T )
term (Et [n⇤T ⇤ (pT ⇤ eµ̄ + (1 pT ⇤ )eµ )e rT ] dEdp
) in absolute value, then the cross-derivative
t
is still positive. In particular, if the ultimate payo↵ from entering the market full-scale
⇤
M
(p̄eµ̄ + (1 p̄)eµ ) is large enough compared to Et [n⇤T ⇤ (pT ⇤ eµ̄ + (1 pT ⇤ )eµ )e rT ], then we
r
can obtain a positive cross-derivative. That is, if the optimal experimentation intensity is
sufficiently bounded above, so that n⇤T ⇤ is relatively small compared to the entire market
size M , then the threshold for entering the market p̄ falls as A increases. This is intuitive:
suppose the opposite holds, and the optimal testing intensity is large. Since A is a measure
of profits from an average consumer in the market, if the optimal testing sample size is large
and approaches the entire market size, then the firm will have little incentive to rush towards
a terminal decision to enter the market, and even less so as A rises, so that the profits in the
testing phase will dominate the pure testing costs (c(n)). Hence, as A rises, the threshold
60
for entering will in fact rise, rather than fall.
Proposition 2. As F rises, so that the final payo↵ to entry falls, the firm will experiment
less intensively, for any given value of beliefs, and the thresholds for quitting and for entering
the market at full scale will increase.
Proof.
The derivative of the value function with respect to F , using the envelope theorem, is
VF (pt ) =
P rob(pT ⇤ = p̄|pt , n⇤ , T ⇤ )Et [e
rT ⇤
|pT = p̄, pt ] < 0,
since ⇡(pT ) ⌘ max[0, Mr (pT (eµ̄ A f ) + (1 pT )(eµ A f )) F ].
Hence, the value function clearly shifts down as F increases. Since n(p) ⌘ z(rv(p)), and
z is increasing, we know that n(p) also shifts down. Consider now two values of F , F1 > F0 ,
and denote the corresponding thresholds for quitting as p1 and p0 , and for entering as p̄1 and
p̄0 , respectively. The value function of the problem for F = F1 , evaluated at p0 is, by linear
approximation,
V (p0 |F1 ) ' V (p0 |F0 ) + VF (p0 |F0 )(F1
F0 ) < 0,
by value matching (V (p0 |F0 ) = 0), and VF (p) < 0. Hence, we get p1 > p0 , since we
require V (p1 |F1 ) = 0, and V (p) is increasing in p . That is, the threshold for quitting goes
down as F increases.
Next, we want to see what happens to the threshold for entering, p̄. Consider
Vp (p̄0 |F1 ) ' Vp (p̄0 |F0 ) + VpF (p̄0 |F0 )(F1 F0 )
A(eµ̄ eµ )
= M
+ VF p (p̄0 |F0 )(F1 F0 )
r
A(eµ̄ eµ ) dP rob(pT ⇤ = p̄|pt , n⇤ , T ⇤ )
⇤
Et [e rT |pT = p̄, pt ]
= M
r
dpt
rT ⇤
|pT = p̄, pt ]
⇤
⇤ dEt [e
.
P rob(pT ⇤ = p̄|pt , n , T )
dpt
We have
dP rob(pT ⇤ = p̄|pt , n⇤ , T ⇤ )
> 0,
dpt
and
dEt [e
rT ⇤
|pT = p̄, pt ]
> 0.
dpt
Hence,
61
Vp (p̄0 |F1 ) < M
µ̄
eµ )
A(eµ̄
r
,
µ
and since we need Vp (p̄1 |F1 ) = M A(e r e ) , and V (p) is convex, we know that p̄1 > p̄0 , i.e.
the threshold for entering the market at full scale p̄ increases as F increases.
Proposition 3. As x falls, so that the signal-to-noise ratio increases, the firm will experiment more intensively, for any given value of beliefs, and the thresholds for quitting and for
entering the market at full scale shift out.
µ̄ µ
Proof. As x falls, ⇣ ⌘ x increases. Now apply Claim 13 in Moscarini and Smith
(1998, working paper), which (adjusted to our case, which can be done, since the proof does
not change as a result), states the following:
Parametrize models by 1 , 2 , where ⌘ [ci (.), ⇣i (.), ri ]. Let [vi , pi , p̄i ] denote the corre0
(D1 p+D2 (1 p))
, where D1 , D2 are as defined
sponding solutions, with i (p, vi (p)) ⌘ c (z(rv(p)))
p2 (1 p)2 ⇣ 2
2
above. If
v2
v1 =)
2 (p, v2 )
>
1 (p, v1 )
independently of p, then p1 < p2 , p̄1 > p̄2 , and v1 (p) > v2 (p) for all p 2 [p2 , p̄2 ].
This condition holds: if ⇣1 > ⇣2 , then v2
v1 implies 2 (p, v2 ) > 1 (p, v1 ), since the
denominator of 1 (p, v1 ) is larger and the nominator is smaller relative to those of 2 (p, v2 ).
Hence, the value function shifts up, and the thresholds for quitting and entering shift out,
as x increases. Since the function c(n), and thus z as well, is unchanged, n(p) ⌘ z(rv(p))
shifts up.
Proposition 4. As the testing cost function c(n) grows more convex and initially (for n close
to 0) weakly higher and steeper (i.e. c(n) is replaced by ĉ(n), where ĉ(0) = c(0), ĉ0 (0) c0 (0),
and ĉ00 (n) c00 (n) for all n), the value function v(p) rises for all p, so that optimal n rises
for all p, and the thresholds shift out.
Proof. Here we replicate the proof from Moscarini and Smith (2001, Proposition 5
(c)). Define ⇠(w) ⌘ c0 (z(w)), c(n) ⌘ ĉ(n) c(n), and similarly c0 , c00 , g, f , ⇠.
Convexity of c (since ĉ00 c00
0) and c(0) = 0 imply c0 (n) > c(n)
for all n > 0.
n
Equivalently, g(n) > 0, and thus z(w) < 0 for all w > 0, and f (0) 0 by continuity.
1
Since ⇠ 0 (w) = z(w)
, we have ⇠ 0 (w) > 0 for all w > 0 and ⇠ 0 (0)
0. If we prove that
⇠(0) 0, then ⇠(w) > 0 for all w > 0 and then (7) holds, yielding: v̂ v =) ˆ (p, v̂) =
ˆ
⇠(rv̂)
(D1 p+D2 (1 p))
p2 (1 p)2 ⇣ 2
2
⇠(rv) (D1 p+D2 (1 p))
p2 (1 p)2 ⇣ 2
2
=
(p, v) independently of p, and thus v > v̂.
Focus on ⇠(0) 0. A formal proof is long, but in (n, c)-space, draw the cost function
c(n) and the ray through the origin tangent to it. The unique tangency point v is where
vc0 (v) = c(v), i.e. g(v) = 0 or v = z(0); the ray has the same slope as the function at the
tangency point, namely c0 (z(0)) = ⇠(0). Then draw the cost function ĉ(n) and the tangent
62
ˆ
ray through the origin, with slope ⇠(0):
The ray tangent to ĉ(n) is steeper than that to c(n),
i.e. ⇠(0) > 0. Finally, n > n̂, as v > v̂ and f > fˆ from above.
10.4
Estimation Details
10.4.1
Showing that a shift in profits results in a proportional shift in estimated
costs
One important question is how a shift in the profits of the firms in the dataset a↵ects the
estimates of costs, the c(n) schedule, the sunk entry cost F and the fixed cost of exporting f .
That is, suppose we changed the units of account, and instead of units, expressed all profits
in thousands of euros, or expressed all profits in dollars instead of euros. It is easy to show
that this would shift all the costs proportionally - by the same proportionality factor.
To show this, we show that multiplying all profits and all the costs by the same factor
results in the multiplication of the value function by and no change in the thresholds
p̄, p, and the optimal testing schedule n(p). Given original costs c(n), F, f , and profits
⇡
˜j , j = 1, ..., J, multiply all these by :
ĉ(n) ⌘ c(n) , F̂ ⌘ F
˜j , j = 1, ..., J.
fˆ ⌘ f , ⇡
˜ˆj ⌘ ⇡
Recall that the solution to the original problem for a fixed firm j is given by n(p) =
z(rv(p)), where z ⌘ g 1 , g(n) = nc0 (n) c(n), and z is strictly increasing, and v(p) is the
solution of the two-point free boundary value problem
v 00 (p) =
c0 (z(rv(p)))
1
(p(1
2
⇡
˜j [peµ̄ + (1
p)
p)eµ ]
µ̄ µ 2
)
x
plus the value matching condition:
v(p̄) = Ṽ (p̄)
F, v(p) = 0.
and smooth pasting condition:
v 0 (p̄) = Ṽ 0 (p̄), v 0 (p) = 0.
Now, since ĉ(n) ⌘ c(n) , ĉ0 (n) = c0 (n) , and ĝ(n) = g(n) . Hence, ẑ(x) ⌘ z( x ),
n̂(p) = ẑ(rv̂(p)) = z( rv̂(p) ). Also, the value function in the second stage, Ṽˆ (p) is now given
by
M
Ṽˆ (p) =
( fˆ + ⇡
˜j [peµ̄ + (1
r
p)eµ ])
=
63
Ṽ (p).
Now by simple substitution we can show that
v̂(p) ⌘ v(p)
satisfies the new BVP:
v̂ 00 (p) =
ĉ0 (ẑ(rv̂(p)))
⇡
˜j [peµ̄ + (1
1
(p(1
2
p)
p)eµ ]
µ̄ µ 2
)
x
plus the value matching condition:
v̂(p̄) = Ṽˆ (p̄)
F̂ , v̂(p) = 0.
and smooth pasting condition:
0
v̂ 0 (p̄) = Ṽˆ (p̄), v̂ 0 (p) = 0.
Thus, proportionally shifting both the profits and the cost parameters results in the same
experimentation behavior, and therefore, will produce the same values of likelihood, given
data, as the original profits and cost parameters. That is, we only need to multiply by
all the values we generate originally as draws from the posterior of these cost parameters to
obtain the draws from the new posterior. This observation allows us to carry out estimation
for normalized profits, and later rescale to match the monetary values in the dataset.
10.4.2
Estimating demand parameters
We would like to evaluate the posteriors of the parameters ⇥1 ⌘ {µ̄, µ, x , p0 }, as well as
k
{At1 , At2 , t = 1, ..., T } and ⇢1 , ⇢2 , D1 , D2 , CA1 , CA2 , ⌦, conditional on {vjt
}. It can be seen
k
from the equations above that the unobserved ↵jt and At1 , At2 can be identified up to a
constant, and therefore we normalize µ = 0. Denote Hj = I(µj = µ̄), that is, Hj is 1 if firm
j holds a high µ, and 0 otherwise, and H̃ ⌘ {Hj , j = 1, ..., J}. H̃ will be estimated along
with the main unknown parameters.
We set the following priors:
• µ̄ ⇠ N (mh ,
•
1
x
h ),
⇠ Gamma(↵x ,
• p0 ⇠ Beta(↵p ,
x ),
p ).
• Impose flat priors on CA1 , CA2 .
• Impose flat priors, bounded by 0 and 1, on ⇢1 , ⇢2 .
• Impose flat priors on D1 , D2 .
64
• Impose a flat prior on the coefficients of the matrix ⌦, the covariance of the error terms
⌫t1 , ⌫t2 .
k
}, is proportional to the product of
The joint posterior of these parameters, given {vjt
their priors and their joint likelihood:
k
x , p0 , ⇢1 , ⇢2 , CA1 , CA2 , D1 , D2 , ⌦|vjt )1
f (µ̄,
1f (µ̄)f (
k
⇤f (vjt
|µ̄,
1
)f (p0 )f (⇢1 , ⇢2 )f (CA1 , CA2 )f (D1 , D2 )f (⌦)⇤
x
x , At1 , At2 , D1 , D2 , H̃)f (H̃|p0 )f (At1 , At2 |⇢1 , ⇢2 , CA1 , CA2 ).
˜1
Since the joint posterior is quite complicated, we apply Gibbs sampling. Denote by ⇥
par
˜1
˜ 1 without the parameter
the set {⇥1 , ⇢1 , ⇢2 , CA1 , CA2 , D1 , D2 , ⌦}, and by ⇥
the set ⇥
par.
The conditional distributions of the main parameters of interest are as follows:
˜ 1 µ̄ , {v k }, {At1 , At2 }, H̃) is a normal with mean
• f (µ̄|⇥
jt
mh
¯h h
x +Nh ↵
x +Nh h
and variance
P
j:H =1
x h
x +Nh
P P
k
h
,
↵
k jt
t
j
k
values of all high-µ firms and ↵
¯h ⌘
is
where Nh is the number of ↵jt
Nh
k
k
k
. Given vjt
, D1 , D2 and At1 , At2 , t = 1, .., T , ↵jt
is calculated as
the average of these ↵jt
k
vjt
D1 At1
D2 At2 .
P
k
˜ 1 x , v k , {At1 , At2 }, H̃) is Gamma with hyperparameters ↵x + Nx and x + j,t,k (ejt
• f ( x |⇥
jt
2
2
k
k
where ekjt = ↵jt
µ̄ if Hj = 1 and ekjt = ↵jt
µ if Hj = 0, ē is the average and Nx is
the number of these residuals.
1
˜ 1 p0 , v k , {At1 , At2 }, H̃) is Beta with hyperparameters ↵p + N and p + N
• f (p0 |⇥
jt
h̃
where Nh̃ is the number of high-µ firms and N is the total number of firms.
Nh̃ ,
˜ 1 , v k , {At1 , At2 }). To
• The conditional distribution of Hj can be written as P (Hj = 1|⇥
jt
calculate this, we apply the discrete analog of the Bayesian updating equation used in
the theoretical part:
P rob(µj = µ̄) ⌘ Pj =
( ↵¯pj x µ̄ )P0
nj
(
↵
¯ j µ̄
px
nj
)P0 + (
↵
¯j µ
px
nj
)(1
P0 )
,
k
where ↵
¯ j is the average over all t and k of ↵jt
for firm j, and nj is the number of these
draws. is the standard normal density.
65
ē)2
,
• To evaluate the posteriors of At1 , At2 , conditional on the rest of the parameters and
k
vjt
, we apply the Kalman smoother to the system
k
k
= D1 At1 + D2 At2 + ↵jt
.
vjt
k
= µ̄ +
↵jt
k
x ⌘jt ,
if
Hj = 1,
k
=µ+
↵jt
k
x ⌘jt ,
if
Hj = 0,
At1 = ⇢1 A(t
1)1
+ CA1 + ⌫t1 ,
At2 = ⇢2 A(t
1)2
+ CA2 + ⌫t2 ,
[⌫t1 , ⌫t2 ] ⇠ N ([0, 0], ⌦),
where D1 , D2 , µ̄, µ, {Hj , j = 1, ..., J}, ⇢1 , ⇢2 , CA1 , CA2 , ⌦ are fixed and known.
• Given At1 , At2 , regress At1 on A(t 1)1 to update values of CA1 , ⇢1 , and regress At2 on
A(t 1)2 to update values of CA2 , ⇢2 .
• Given At1 , At2 , and CA1 , CA2 , ⇢1 , ⇢2 , calculate ⌫t1 = At1
At2 ⇢2 A(t 1)2 CA2 , and update ⌦.
⇢1 A(t
1)1
CA1 , ⌫t2 =
k
k
• Given At1 , At2 , vjt
, µ̄, µ, H̃, regress vjt
on At1 , At2 and firm-specific constant µ, to
update the values of D1 , D2 .
We iterate on the above steps, until convergence.
10.4.3
Estimating TFP
To carry out TFP estimation we use only data on domestic sales of the firm, i.e. Rjt =
R̂jt Xjt , where R̂jt is total revenues of the firm, and Xjt is its export revenues. Thus,
we evaluate the firm’s TFP from domestic production only, which allows us to abstract
away from the additional complications of a demand system for al the markets of the firm
(domestic and foreign). We assume that the inputs used for domestic production only are
the same fraction of total inputs used as the fraction of domestic sales in total sales.
66
An extensive literature is devoted to the estimation of TFP. We explain how we deal
with some issues that have been brought up so far below. We borrow many steps from De
Loecker (2010). Begin with the pair of production and demand equations:
↵k !jt +ujt
Qjt = L↵jtl Mjt↵m Kjt
e
,
Qjt = Qst [
Pjt ✏ ⌘jt
] e ,
Pst
where L, M, K are inputs - labor, intermediate inputs and capital, respectively, !jt is
unobserved productivity shock, ujt is an error term, Pjt is firm j-s price, Qjt is firm j-s
quantity produced and sold, ⌘jt is unobserved demand shock, Qst and Pst are industry-wide
output and price index, respectively. j indexes firms, and t indexes time (years in our case).
We observe only total domestic revenues (and not physical output):
Rjt ⌘ Qjt Pjt = Qjt Qjt
1
✏
1
1
Qst ✏ Pst e⌘jt ✏ ,
so that upon dividing both sides by Pst and taking logs:
r̃jt =
✏
1
1
1
qjt + qst + ⌘jt ,
✏
✏
✏
and plugging in the equation for the production function:
r̃jt =
l ljt
+
m mjt
+
k kjt
+
s qst
+ !jt + ⌘jt + ujt ,
the small letters denote the logarithms of the capitalized variables (e.g. qjt ⌘ ln Qjt ),
and all the coefficients are reduced form parameters combining the production function and
demand parameters.
We next denote !
˜ jt ⌘ !jt + ⌘jt and re-write:
r̃jt =
l ljt
+
m mjt
+
k kjt
+
s qst
+!
˜ jt + ujt .
Here we follow Levinsohn and Melitz (2006) in treating both the unobserved productivity
and demand shocks jointly, so that we do not identify these two sources of firm profitability
independently. Since we later use the estimates of !
˜ to estimate demand shocks ↵ in the
foreign markets, we need to consider how this assumption a↵ects that part of estimation.
For !
˜ to be a good instrument for unit values, we need !
˜ to be correlated with the unit
values which it is : as shown in the main part of the paper, we assume that
Q⇤jt
=
⇤
⇤ Pjt ✏ ↵⇤jt
Qst [ ⇤ ] e ,
Pst
where stars denote foreign market variables. Assume for now that there are constant
returns to scale in this industry, so that average cost is equal to marginal cost. Denote by T
67
↵k
the combined input L↵jtl Mjt↵m Kjt
, and by cT the average cost of this input: cT ⌘
wL+rK+mM
↵
↵m ↵k ,
Ljtl Mjt
Kjt
which is constant for all output levels with constant returns to scale. Then the foreign price
is a product of the firm’s average cost of production and mark-up and possibly iceberg trade
cost ⌧ :
Pjt⇤ = ⌧ Pjt = ⌧
cT
,
e!jt
and since !
˜ jt = !jt + ⌘jt , Pjt⇤ and !
˜ are clearly correlated. We also assume that ⌘jt , the
domestic demand shocks, are not correlated with foreign demand shocks, which is necessary
to have no correlation between !
˜ and the error term in the regression
⇤
⇤
qjt
= qst
+ ✏p⇤st
⇤
✏p⇤jt + ↵jt
,
which is the equation (in di↵erent notation) we relied on in the main part of the paper
to estimate ✏ and later generate ↵ s.
Now returning to the task at hand - estimating TFP (combined with domestic demand
shocks, i.e. !
˜ ). Since our goal is only estimation of TFP, it suffices for us to control for
industry-wide output with industry-time fixed e↵ects. Therefore, we have
r̃jt =
l ljt
+
m mjt
+
k kjt
+!
˜ jt +
T
X
Dt Dst
+ ✏jt .
t=1
Next issue to deal with is the identification of the variable inputs’ coefficients. To do that,
we take note of the ACF (Ackerberg, Caves and Frazer) critique of the Levinsohn-Petrin and
Olley-Pakes estimation approach, and estimate all the input coefficients in the second stage.
We use value added in the first stage regression, so that we only estimate the coefficients on
labor and capital in the production function. We do use the data on intermediate inputs,
however, as the control for (unobserved) productivity. We use the equation for intermediate
inputs
mjt = mt (kjt , !
˜ jt ),
so that assuming monotonicity in the function mt (.), we can invert:
!
˜ jt =
t (mjt , kjt ).
Similarly, we assume that the optimal quantity of labor is chosen once current productivity is observed by the firm, so that
ljt = lt (kjt , !
˜ jt ),
and once we utilize the expression for !
˜ jt above,
ljt = lt (kjt ,
t (mjt , kjt )).
68
Inserting these into the expression for value added:
ṽajt =
l ljt +
k kjt + (mjt , kjt ) +
T
X
Dt Dst
+ ✏jt
t=1
=
l lt (kjt , (mjt , kjt )) +
k kjt + (mjt , kjt ) +
T
X
Dt Dst
+ ✏jt
t=1
=
t (mjt , kjt ) +
T
X
Dt Dst
+ ✏jt ,
t=1
where t (mjt , kjt ) is a polynomial in capital and intermediate inputs, one for each time
period (year in our case):
(mjt , kjt ) ⌘
T
X
q0t Dt +
t=1
+
+
T
X
t=1
T
X
T
X
q1kt Dt kjt +
t=1
T
X
q1mt Dt mjt +
t=1
2
q2kkt Dt kjt
+
T
X
t=1
t=1
q2mkt Dt kjt mjt
t=1
q2mmt Dt m2jt +
q3kmmt Dt kjt m2jt +
T
X
T
X
2
q3kkmt Dt kjt
mjt
t=1
T
X
T
X
3
q3kkkt Dt kjt
+
t=1
q3mmmt Dt m3jt .
t=1
Assuming that productivity follows a first-order Markov process:
!
˜ jt = E[˜
!jt |˜
!j(t
1) ]
+ ⌫jt ,
where ⌫jt is uncorrelated with kjt , and given values of l and k , one can estimate the
residual ⌫jt (unobserved innovation to productivity) non-parametrically from
!
˜ jt = ṽajt
l ljt
k kjt
T
X
Dt Dt ,
t=1
˜ j(t
!
˜ jt = z0 + z1 !
1)
2
+ z2 !
˜ j(t
1)
3
+ z3 !
˜ j(t
1)
+ ⌫jt .
Next, use the moments
E[⌫jt ( k , l )kjt ] = 0,
E[⌫jt ( k , l )lj(t
1) ]
= 0,
to identify the coefficients on capital and labor.
Finally, !
˜ jt is given by
!
˜ jt = ṽajt
l ljt
k kjt
T
X
t=1
69
Dt Dt .
10.4.4
Estimating ⇥2 ⌘ {M, G, F, ,
liefs p
N , mp },
given estimates of µ̄, µ,
x , p0
and be-
In the second step of the estimation, we treat the values of ⇥1 ⌘ {µ̄, µ, x , p0 } and beliefs
pjt , j = 1, ..., J, t = 1, ..., T as given and fixed. Here we describe how we calculate the
likelihood of the data, given arbitrary values of ⇥2 ⌘ {G, F, M, , N , mp }, and these fixed
values of ⇥1 , and how we use this likelihood to estimate ⇥2 .
• Fix the values of , N , mp , M . We fix the initial value of N to 1, of mp to 0.01, of to
an estimate of the exogenous death rate based on the subsample of old exporters alone,
and of M to the average number of export destinations over all old exporters. We do
so because the old exporters are assumed to have completed their experimentation
phase (if any), and therefore their average export size should be a good indicator of
the full scale in the market, and their average exit rate should be a good indicator of
the exogenous death rate. Starting the iteration from di↵erent values of , N , mp , M
does not a↵ect the results.
For fixed , N , mp , M , and r ⌘ + dr, dr = 0.05, we can calculate the likelihood of
the data for any given values of the coefficients of the cost function c(n) and the sunk
entry cost F as
L(njt |G, F,
N , mp ,
, M ) ⌘ P rob(njt |G, F, N , mp , , M )
Y
njt n⇤ (pjt ) Y njt
=
(
)
(
N
0
Ejt
Y
1
Xjt
Y
Q1jt
Y
(1
1
Rjt
Y
0
Rjt
(1
)
Y
Q0jt
1
Ejt
M
N
)⇤
(1
),
(0|mp ) ⇤
)(pjt
pj |mp )
Y
2
Rjt
where denotes the standard normal distribution, denotes the exponential distribution (for the discrepancy between the predicted pj and the actual p˜j ),
0
= [j, t|Sjt = 0], that is, all j, t, where the firm exports and is in the experi– Ejt
mentation stage,
1
– Ejt
= [j, t|Sjt = 1], that is, all j, t, where the firm exports and is in the full export
stage,
1
– Xjt
= [j, t|Sjt = 2, Sj(t
from State 1,
1)
1
– Rjt
= [j, t : Sjt = 1, Sj(t
market in State 1,
= 1], that is, all j, t, where the firm exits the market
1)
= 1], that is, all j, t, where the firm remains in the
70
– Q0jt = [j, t : Sjt = 2, Sj(t 1) = 0, pjt pj ], that is, all j, t, where the firm exits the
market from State 0, given its beliefs are below pj , which allows us to set p˜j = pj ,
to maximize the likelihood, so that epd = pj p˜j = 0,
– Q1jt = [j, t : Sjt = 2, Sj(t 1) = 0, pjt > pj ], that is, all j, t, where the firm exits the
market from State 0, given its beliefs are above pj , which implies that the exit
was caused by an exogenous death shock,
0
– Rjt
= [j, t : Sjt = 0, Sj(t 1) = 0, pjt pj ], that is, all j, t, where the firm remains
in the market in State 0, given its beliefs are below pj , the probability of which
happening is (1 )(p˜j pj |mp ), and we set p˜j = pjt , to maximize the likelihood,
2
– and Rjt
= [j, t : Sjt < 2, Sj(t 1) = 0, pjt
pj ], that is, all j, t, where the firm
remains in the market in State 0, given its beliefs are above pj , which happens
with probability (1
).
We set flat priors for G and F , and pick new values for these parameters using the
Metropolis-Hastings step, which essentially amounts to maximising the above likelihood in our case.
• Given the latest values of G and F , we can reevaluate the posteriors of ,
– The prior distribution of
eters ↵N , N , so that
f(
1
N
N
is an inverse gamma distribution with hyperparam1 ↵N 1
|↵N ,
N)
=
N
e
(↵N )
1
N N
,
↵N
N
and 0 otherwise. The posterior distribution of
hyperparameters
0
↵N
N , mp , M .
nbN
,
= ↵N +
2
0
N
=
N
N
+
if
N
> 0,
is also an inverse gamma with
PnbN
i=1
(ei
2
ē)2
,
where nbN is the number of observations on export size (number of destinations)
we have, ei are the deviations of the observed export size n from the predicted
export size n⇤ , and ē is the mean of these ei .
– The prior distribution of
that
f ( |↵ ,
)=
is a beta distribution with hyperparameters ↵ ,
(↵ + )
(↵ ) ( )
and 0 otherwise. The posterior of
eters
↵
1
(1
)
1
,
if
0<
, so
< 1,
is also a beta distribution with hyperparam-
71
nb
X
0
↵ =↵ +
0
xi ,
=
nb
X
+ nb
i=1
xi ,
i=1
where xi is an indicator variable equal to 1 if a firm exits the market due to an
exogenous shock, and nb is the number of observations where we can infer this
indicator variable’s value.
– The prior distribution of mp is an inverse gamma with hyperparameters ↵d ,
so that
f(
1
|↵d ,
mp
d)
=
1
mp d
( m1p )↵d 1 e
(↵d )
d,
,
↵d
d
if
mp > 0,
and 0 otherwise. The posterior of mp is also an inverse gamma distribution with
the hyperparameters
↵d0 = ↵d + nbp ,
0
d
d
=
1+
d
Pnbp
i=1 (pj
p˜j )
,
where nbp is the number of all observations where we can infer the value of epd =
0
pj p˜j , that is sets Q0jt and Rjt
described above.
– The prior distribution of M is a normal distribution with hyperparameters m,
so that
f (M |m,
m)
=
q
(
1
2
m⇡
)e
1
(M
2 m
m)2
m,
,
and the posterior of M , given N , and the observed export sizes in state 1 (fullscale export), {njt |Sjt = 1}, is a normal distribution with hyperparameters
m0 =
m
N
N
0
m
+ Nm n̄ m
,
+ Nm m
m N
=
N
+ Nm
,
m
where n̄ is the average over all {njt |Sjt = 1}, and Nm is the number of all such
observations.
• Iterate on the previous two steps until convergence.
72
Table 6: The 17 regions
Region
Caribbean islands
Central Africa
Eastern Asia
Eastern Africa
Eastern Europe
European Union,
the first 15 members,
excluding France
Latin America
Middle East
Northern Africa
Oceania
European countries,
other than previous
South Asia
South East Asia
South Africa
Former Soviet Rep.
excl. Baltic states
US and Canada
Western Africa
Members
Antigua&Barbuda, Anguilla, Aruba, Barbados, Bermuda, Bonaire,
St. Eustatius&Saba, Bahamas, Cuba, Curacao, Dominica, Dominican Rep., Haiti,
Jamaica, St. Kitts&Nevis, Virgin Isd. (U.S.), Cayman Isd., Saint Lucia,
St. Martin (Dutch), Turks&Caicos Isd., Trinidad&Tobago, Grenada,
St. Vincent&the Grenadines, Virgin Isd. (Brit)
Angola, The Democratic Rep. of Congo, Central African Rep.,
Congo, Cameroon, Gabon, Equatorial Guinea, Sao Tome&Principe, Chad
China, Hong Kong, Japan, Rep. of Korea, D.P. Rep. of Korea, Mongolia, Macao, Taiwan
Burundi, Djibouti, Eritrea, Ethiopia, Kenya, Comoros, Madagascar,
Mauritius, Malawi, Mozambique, Rwanda, Seychelles, Somalia,
Tanzania, Uganda, Zambia
Albania, Bosnia and Herzegovina, Bulgaria, Czech Rep.,
Estonia, Croatia, Hungary, Lithuania, Latvia, Montenegro, Macedonia, Poland,
Romania, Serbia, Slovenia, Slovakia
Austria,Belgium, Germany, Denmark, Spain,
Finland, UK, Greece, Ireland, Italy, Luxembourg,
Netherlands, Portugal, Sweden
Argentina, Bolivia, Brazil, Belize, Chile, Venezuela,Colombia,
Costa Rica, Ecuador, Guatemala, Guyana, Honduras, Mexico, Nicaragua,
Panama, Peru, Paraguay, Suriname, El Salvador, Uruguay
United Arab Emirates, Bahrain, Israel, Iraq, Iran, Jordan,
Kuwait, Lebanon, Oman, Palestinian Terr., Qatar, Saudi Arabia,
Syria, Turkey, Yemen
Algeria, Egypt, West. Sahara, Libya, Morocco, Sudan, Tunisia
American Samoa, Australia, Cocos (Keeling) Isd., Cook Isd.,
Christmas Isd., Fiji, Micronesia, Guam, Northern Mariana Isd., Nauru, Niue,
New Zealand, Papua New Guinea, Pitcairn, Palau, Solomon Isd., Tonga,
Tuvalu, Vanuatu, Samoa, Marshall Isd., Kiribati, Tokelau
Andorra, Switzerland, Cyprus, Gibraltar, Iceland, Liechtenstein,
Monaco, Malta, Norway, San Marino, Vatican City State
Afghanistan, Bangladesh, Bhutan, India, Sri Lanka, Maldives, Nepal, Pakistan
Brunei, Indonesia, Cambodia, Laos, Myanmar, Malaysia, Philippines, Singapore
Thailand, Timor-Leste, Vietnam
Botswana, Lesotho, Namibia, Swaziland, South Africa, Zimbabwe
Armenia, Azerbaijan, Belarus, Georgia, Kyrgyzstan, Kazakhstan, Moldova,
Russia,Tajikistan, Turkmenistan, Ukraine, Uzbekistan
Canada, Puerto Rico, US, US minor outlying islands
Burkina Faso, Benin, Cote D’Ivoire, Cape Verde, Ghana,
Gambia, Guinea, Guinea-Bissau, Liberia, Mali, Mauritania, Niger,
Nigeria, Sierra Leone, Senegal, Togo
73
© Copyright 2026 Paperzz