Lecture 9. Two Period Empirical Models:
A General Introduction, The Full Information
No Error Specification, and Entry Models.
Ariel Pakes
October 30, 2007
(Need Tables 1 and 4 from Seim; forthcoming)
General Introduction to Two Period Models.[1]
Two period models were introduced into I.O. in the context of
entry models, but they have since been used in several other contexts including models involving other investments, and models
of contracting. The models have a first period which establishes
the state variables that determine the nature of product market
competition, and then a second period in which that competition takes place. In the investment/entry models we establish
the number of firms, or the size of their capital stocks in the
first period, and in the second period we compete in prices or
quantities. In the buyer-seller contracting problem we establish
which sellers sell to which buyers in the first stage and in the
second stage the buyers remarket the goods to consumers.
[1] Parts of this lecture are taken from Pakes, Porter, Ho and Ishii (2007),
and other parts are from my Fisher Schultz Lecture (in process).
The solution concept used in all of these models is “subgame
perfection”, which in this context (finite horizon) simply means
we solve the game backward. I.e. we
• solve the game in period 2 for the Nash quantities (or
prices) that would result from every set (or at least a subset)
of the possible period 1 choices,
• assuming the period 2 quantity choices are (always) unique,
compute the resultant period 2 profits and net cash flow
(profits minus any fixed costs) conditional on different vectors
of decisions for the actors, and then
• find a Nash equilibrium in the first period decision variable.
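The backward solution can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the lecture: period 2 is symmetric Cournot with linear inverse demand p = a - bQ and constant marginal cost c, all firms face the same fixed cost, and the period 1 decision is a binary entry choice. The function names and parameter values are hypothetical.

```python
from itertools import product

def cournot_profit(n_active, a=10.0, b=1.0, c=2.0):
    """Period 2: per-firm symmetric Cournot profit with n_active firms."""
    if n_active == 0:
        return 0.0
    return ((a - c) / (n_active + 1)) ** 2 / b

def net_payoff(enters, n_active, fixed_cost):
    """Net cash flow: period 2 profit minus fixed cost, or zero if the firm stays out."""
    return cournot_profit(n_active) - fixed_cost if enters else 0.0

def pure_nash_entry(n_potential, fixed_cost):
    """Period 1: keep the entry profiles from which no firm gains by deviating."""
    equilibria = []
    for profile in product([0, 1], repeat=n_potential):
        n_active = sum(profile)
        stable = True
        for d_i in profile:
            # Number of active firms if this firm alone flips its decision.
            n_dev = n_active - 1 if d_i == 1 else n_active + 1
            current = net_payoff(d_i, n_active, fixed_cost)
            deviation = net_payoff(1 - d_i, n_dev, fixed_cost)
            if deviation > current:
                stable = False
                break
        if stable:
            equilibria.append(profile)
    return equilibria

# Illustrative numbers: two entrants cover the fixed cost, three do not.
eqs = pure_nash_entry(n_potential=4, fixed_cost=5.0)
```

With these illustrative numbers every profile that passes the check has exactly two entrants, so the procedure pins down the number of entrants but not their identities.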
Since we are going to use the two period model repeatedly
I am going to start out with a general introduction to it. Let
$\pi(\cdot)$ be the profit earned in the second period, $d_i$ and $d_{-i}$ be the
agent’s and its competitors’ choices, $D_i$ be the choice set, $J_i$ be
the agent’s information set, and $E$ be the expectation operator.
Then the Nash condition for the first period choice is written as
C1: $\sup_{d \in D_i} E[\pi(d, \mathbf{d}_{-i}, \mathbf{y}_i, \theta_0) \mid J_i] \le E[\pi(d_i = d(J_i), \mathbf{d}_{-i}, \mathbf{y}_i, \theta_0) \mid J_i]$,
where $y_i$ is any variable (other than the decision variables) which
affects the agent’s profits, and the expectation is with respect
to the joint distribution of $(\mathbf{d}_{-i}, \mathbf{y}_i)$ that summarizes the agent’s
beliefs on the likely realizations of those variables. Throughout,
variables that the decision maker views as random will be represented
by boldface letters while realizations of those random
variables will be represented by standard typeface.
Three points about C1 are worth noting.
• There are no restrictions on the choice set; for example
Di could be the space of all bilateral contracts (or of all
exclusive deals), and di may be at a boundary of that choice
set. Alternatively di could be a vector whose components
define the characteristics of a product (say the location and
size of a retail outlet).
• C1 is a necessary condition for a Nash equilibrium (indeed
it is necessary for the weaker notions of equilibrium). As a
result if we were willing to assume that the observed choices
satisfied one of these equilibrium conditions, C1 would be
satisfied regardless of the equilibrium selection mechanism.
• Finally, note that C1 is meant to be a rationality assumption
in the sense of Savage (1954); i.e. the agent’s choice is optimal
with respect to the agent’s beliefs. Most often we will
assume that these beliefs are correct, but the framework is
more general than that and we will discuss some extensions
below.
Note. To check the Nash conditions we must approximate counterfactual
profits, i.e. what profits would have been had the agent made a choice
which in fact it did not make. This, in turn, requires a model of how
the agent thinks that $d_{-i}$ and $y_i$ are likely to change in response to
a change in the agent’s decision. For example, consider the HMO/hospital
contracting problem. One component of $y_i$ in that problem is the premium
an HMO charges to its customers, which will typically depend on the
network of hospitals the members of the HMO can go to. So when an HMO
considered whether to reject a contract it in fact accepted, it knew
that if it rejected the contract equilibrium premiums would change. As
a result we will need a model for the agent’s perception of the likely
impact of that change on its expected profits. Similarly if there are
decisions that are made at different points in time, we will require
a model for what the agent making the initial choice thought was
likely to happen to subsequent choices had it altered the
choice it in fact made.
The model for how the agent thinks (yi , d−i ) are likely to
respond to changes in di is likely to depend on other variables,
say zi , which I will take as primitives in the sense that the agent
thinks they will not change if the agent changes its decision.
Condition 2 formalizes this assumption.
C2: $\mathbf{d}_{-i} = d_{-i}(d_i, \mathbf{z}_i)$, and $\mathbf{y}_i = y(\mathbf{z}_i, d_i, \mathbf{d}_{-i})$, and the
distribution of $\mathbf{z}_i$ conditional on $(J_i, d_i = d(J_i))$ does not depend
on $d_i$.
If the game is a simultaneous move game then $d_{-i}(d', z_i) = d_{-i}$
and there is no need for an explicit model of reactions by
competitors.
Note. The model’s theoretical restriction is that if we let
$$\Delta\pi(d_i, d', d_{-i}, z_i) \equiv \pi(d_i, d_{-i}, y_i) - \pi(d', d_{-i}(d', z_i), y(z_i, d', d_{-i})),$$
where $d'$ is any alternative choice in $D_i$, then C1 and C2 together
ensure that
$$E[\Delta\pi(d_i, d', \mathbf{d}_{-i}, \mathbf{z}_i) \mid J_i] \ge 0, \quad \forall\, d' \in D_i. \tag{1}$$
Equation (1) is the moment inequality delivered by the theory. To move from it to an estimation algorithm we need to
specify
• the relationship between the expectation operator underlying
the agents’ decisions (our E(·)) and the sample moments
that the data generating process provides, and
• the relationship between our constructed profit function
and our observable measures of the determinants of those
profits on the one hand, and the π(·, θ) and (zi , di , d−i ) that
appear in the theory on the other.
These two aspects of the problem differ with the different applications.
The Full Information, No Error, Models.
All the first generation two period models used the following
two assumptions. The relationship between the data generating
process and the agents’ expectations is that
FC3: $\forall d \in D_i$, $\pi(d, d_{-i}, z_i, \theta_0) = E[\pi(d, \mathbf{d}_{-i}, \mathbf{z}_i, \theta_0) \mid J_i]$.
I.e. it is assumed that all agents know both the decisions of their
competitors, and the realization of the exogenous variables that
will determine profits, when they make their own decision. FC3
rules out asymmetric and/or incomplete information, and as a
consequence, all mixed strategies (i.e. this approach implicitly
restricts $D_i$ to consist only of pure strategies).[2]
[2] As stated it also rules out the analysis of sequential games in which an agent who
moves initially believes that the decisions of an agent who moves thereafter depend on its
initial decision. However, at the cost of only notational complexity, we could allow for a
deterministic relationship between a component of $d_{-i}$ and $(d, z_i)$.
To complete the specification we need an assumption on the
relationship between the variables we measure and the variables
that enter the theoretical model. This approach assumes
FC4. $\pi(\cdot, \theta)$ is known. $z_i = (\nu^f_{2,i}, z^o_i)$. $(d_i, d_{-i}, z^o_i, z^o_{-i})$ are
observed, and
$$(\nu^f_{2,i}, \nu^f_{2,-i}) \mid z^o_i, z^o_{-i} \sim F(\cdot; \theta), \text{ for a known function } F(\cdot, \theta).$$
FC4 assumes there are no errors in our profit measure; that is,
were we to know $(d_i, d_{-i}, z_i, z_{-i})$ we could construct an exact
measure of profits for each $\theta$. However a (possibly vector valued)
component of the determinants of the profits (of the $z_i$)
is not observed by the econometrician (our $\nu^f_{2,i}$). Since FC3
assumes full information, both $\nu^f_{2,i}$ and $\nu^f_{2,-i}$ are assumed to be
known to all agents when they make their decisions, just not
to the econometrician. FC4 also assumes that there is no error
in the observed determinants of profits (in the $z^o_i$) and that the
econometrician knows the distribution of $(\nu^f_{2,i}, \nu^f_{2,-i})$ conditional
on $(z^o_i, z^o_{-i})$ up to a parameter vector to be estimated.
Substituting FC3 and FC4 into equation (1) we obtain
$$\forall d' \in D_i, \quad \Delta\pi(d_i, d', d_{-i}, z^o_i, \nu^f_{2,i}; \theta_0) \ge 0, \tag{2}$$
and
$$(\nu^f_{2,i}, \nu^f_{2,-i}) \mid z^o_i, z^o_{-i} \sim F(\cdot; \theta_0).$$
To ensure that the model assigns positive probability to the condition that
$$\forall d' \in D_i, \quad \Delta\pi(d_i, d', d_{-i}, z^o_i, \nu^f_{2,i}; \theta) \ge 0$$
for some $\theta$ and all $i$ (as is assumed by the model), we need
further conditions on $F(\cdot)$ and/or $\pi(\cdot)$. The additional restrictions
typically imposed are that the profit function is additively
separable in the unobserved determinants of profits, that is that
$$\forall d \in D_i, \quad \pi(d, d_{-i}, z^o_i, \nu^f_{2,i}) = \pi^{as}(d, d_{-i}, z^o_i, \theta_0) + \nu^f_{2,i,d}, \tag{3}$$
and that the distribution of $\nu^f_{2,i} \equiv \{\nu^f_{2,i,d}\}_{d \in D_i}$, conditional on $\nu^f_{2,-i}$,
has full support.
Note that the additive separability in equation (3) cannot
be obtained definitionally, by assuming $\nu_2$ is a residual from
a projection, at least if the right hand side contains a decision
variable (like the number of incumbents), for that would assume
the residual is orthogonal to the decision itself.
Two Period Entry Models.
The two period entry models were the first two period models
used in the literature. They both show what two-period models
can (and cannot) do in helping understand data, and they
raised conceptual issues that the later generations of models
worked hard at getting around. Today we have available both:
(i) richer two period models (they use more data and provide
more detailed relationships), and (ii) truly dynamic entry models.
With current data and methodology, each of these latter
two has its benefits and limitations, and we will get to both
of them in class.
The early entry models are the kind of entry models that
had been used extensively in the theoretical literature to develop intuition on just what can happen once we endogenize
entry. When they are used in the empirical literature they are
mostly used as a way of organizing data on the determinants of
cross-sectional differences in market structure, sometimes with
an emphasis on the impact of those differences on price-cost margins.
The only real rationale for this is in an environment that has
been stable for a very long time, and in which the only things that
change over time are idiosyncratic (incumbent and potential entrant
specific) entry costs and selloff values. In this setting we would
expect the observed cross-sectional distribution conditional on
covariates to converge to some constant “invariant” distribution
conditional on those covariates. The differences across markets
would be expected to depend on the size and other characteristics of the market. Moreover different theories of competition
suggest different effects of the number of competitors (think of
Cournot vs Bertrand), so in a best-case scenario we might learn
something about the nature of post-entry competition.
They are not structural models in the sense of the static models we used above; that is we generally do not think of using them
to estimate primitives and do counterfactuals (they do not let
past conditions determine current market structure or allow perceptions about future conditions to impact on current decisions).
Rather we think of them as a convenient way of summarizing
relationships in the data. On the other hand these models, when
applied and interpreted carefully, can be quite suggestive about
the likely strengths of various entry incentives.
There are at least three generations of two period entry models:
• models with identical firms,
• models with firm heterogeneity in fixed costs,
• models with heterogeneity in continuation values.
The latter group includes some models with asymmetric information,
so in using them we will weaken FC3 and FC4 above.
Each class of models brings us closer to what one might think
of as detailed structural models which use demand and supply
primitives as derived in the earlier part of the course.
Entry Models with Identical Firms.
These models are very much like the simple theory models of
entry (all entrants are the same, there are a lot of potential
entrants, ...). Thus
• a (possibly infinite) set of identical firms make a period 1
choice of whether to enter a market.
• in the second period they compete according to some oligopoly
game such as Cournot. The firms are identical, and so after
entry they split the market, with each of the N entering
firms receiving 1/N of total market demand.[3]
For example, the simplest model might posit Cournot post-entry
competition. The value function would be V(N), which
depends only on the number of firms (and not on the identity of
the firms) because of the assumed symmetry. The equilibrium
number of firms is then the largest N such that π(N) > 0. If
the profit function is concave in N then N* must satisfy
π(N*) > 0, π(N* + 1) < 0.
Mankiw and Whinston (RAND,86) provide a good introduction
to this class of models and consider their implications for “welfare”.
[3] When this class of models considers product differentiation, the demand function is
“symmetric” (e.g. logits with identical mean qualities) and so their market shares remain
equal.
Exercises.
• Consider Cournot second period competition with the
linear inverse demand function $p = a - bQ$ and the cost
function $C(q) = cq + F$. Derive the free-entry equilibrium
number of firms, $N^e$. Now consider the problem of a social
planner who can choose the number of firms, but cannot
set prices or quantities. Derive this social planner’s optimal
number of firms, $N^*$. Show that $N^e = (N^* + 1)^{3/2} - 1$. Note
that $N^e > N^*$, which is a general property of this class of
models.
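The relation stated in the exercise can be checked numerically before deriving it. The sketch below is not part of the original notes: it treats N as a continuous variable (ignoring integer constraints), uses illustrative parameter values, and finds the planner's choice by a crude grid search. All function names are hypothetical.

```python
import math

def cournot_outcome(n, a, b, c):
    """Symmetric Cournot with n firms: returns (per-firm output, price)."""
    q = (a - c) / (b * (n + 1))
    return q, a - b * n * q

def per_firm_profit(n, a, b, c, F):
    q, price = cournot_outcome(n, a, b, c)
    return (price - c) * q - F

def welfare(n, a, b, c, F):
    """Consumer surplus plus total profits when the planner chooses n only."""
    q, price = cournot_outcome(n, a, b, c)
    total_q = n * q
    consumer_surplus = 0.5 * b * total_q ** 2   # triangle under linear demand
    return consumer_surplus + n * ((price - c) * q - F)

def argmax_on_grid(f, lo, hi, step):
    """Crude grid search, adequate for a one-dimensional illustration."""
    n_points = int(round((hi - lo) / step)) + 1
    return max((lo + k * step for k in range(n_points)), key=f)

a, b, c, F = 10.0, 1.0, 2.0, 1.0               # illustrative values only
n_e = (a - c) / math.sqrt(b * F) - 1.0         # zero-profit condition solved for N
n_star = argmax_on_grid(lambda n: welfare(n, a, b, c, F), 0.0, 20.0, 0.001)
implied_n_e = (n_star + 1.0) ** 1.5 - 1.0      # right-hand side of the exercise
```

Comparing `n_e` with `implied_n_e` for a few parameter choices is a quick sanity check on the algebra; it does not replace the derivation asked for in the exercise.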
A stylized description of the data typically used would be that
observations are on a cross-section of markets and that in each
market we see the number of competitors N and a vector of
market specific profit-shifters, x, but no data on the individual
firms. The vector x would include (at the least) the size of the
market and also possibly other measures of demand together
with some measures of costs. The model would make profits a
function of the demand indicators, cost indicators (both variable
and sunk), the number of competitors, and a disturbance.
The profit shifters are generally taken as unrelated to the unobserved factors that determine value (the disturbances), but the
disturbance is known to the agents making the decisions (just
not to the econometrician), and so the model determines N as
a function of x and the realizations of a disturbance process.
Bresnahan and Reiss. In a series of papers Bresnahan and Reiss
(87, 90, 91) look at retail and professional firms. They chose
markets which are small (so there is a small number of firms)
and isolated (so as to minimize the problem of defining what
is a “market”). This and many of the studies that followed
focused on retail trade and services where location is an important
strategic variable. In these instances the definition of
markets, and how to analyze demand in real geographic space,
become important. The data are cross-sectional observations
on town characteristics (including population) and on the number
of establishments.
The value of entering market $i$ is
$$EV(N_i) = VC(N_i, M_i, x_i, \theta) - FC_i, \tag{4}$$
where $N_i$ is the observed number of firms in market $i$, $M_i$ is the
size of market $i$, $VC(\cdot)$ is the continuation value (so it could have
indicators of future profitability, like expected growth rates, as
well as indicators of current profits), $x_i$ is a vector of profit
shifters and $\theta$ is a vector of parameters to be estimated.
Fixed costs, $FC_i$, are the same for each firm in the market,
so the Nash equilibrium implies that the equilibrium (observed)
number of firms in a market satisfies
$$EV(N_i) > 0 > EV(N_i + 1), \tag{5}$$
or, plugging in the formula for value,
$$VC(N_i, M_i, x_i, \theta_1) > FC_i > VC(N_i + 1, M_i, x_i, \theta_1). \tag{6}$$
This is an equilibrium condition. Nothing is said about how we
get to this equilibrium (just that if there were more firms than $N_i$
one would leave, and if there were fewer a potential entrant would
enter).
Though the fixed costs are assumed to be identical for all
potential entrants in a given market, they may differ over markets,
and they are not observed by the econometrician. The
probability of observing $N_i$ firms therefore depends on the probability
that fixed costs in this market are larger than
$VC(N_i + 1, M_i, x_i, \theta_1)$ but less than $VC(N_i, M_i, x_i, \theta_1)$. If $FC$ has
distribution $\Phi(\cdot; \theta_2)$ then the likelihood of $N_i$ firms is
$$\Phi(VC(N_i, M_i, x_i, \theta_1); \theta_2) - \Phi(VC(N_i + 1, M_i, x_i, \theta_1); \theta_2), \tag{7}$$
and one can estimate $\theta$ by MLE. If the fixed costs are distributed
(log) normally, and we take the log of $VC(\cdot)$ instead of $VC(\cdot)$
itself, then this is the likelihood of an “ordered probit.”
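A minimal sketch of the likelihood in (7) for the log-normal case follows. The functional form for log VC (log market size plus a scalar shifter minus delta times log N), the unit variance of log fixed costs, and all names and numbers are illustrative assumptions, not the Bresnahan and Reiss specification.

```python
import math

def norm_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def log_vc(n, market_size, x, beta, delta):
    """Hypothetical log continuation value: log M + x*beta - delta*log N."""
    return math.log(market_size) + x * beta - delta * math.log(n)

def prob_n_firms(n, market_size, x, beta, delta, mu):
    """P(N = n) when log FC ~ Normal(mu, 1): log FC lies between
    log VC(n + 1) and log VC(n). For n = 0 the upper limit is unbounded."""
    upper = 1.0 if n == 0 else norm_cdf(log_vc(n, market_size, x, beta, delta) - mu)
    lower = norm_cdf(log_vc(n + 1, market_size, x, beta, delta) - mu)
    return upper - lower

def log_likelihood(data, beta, delta, mu):
    """data: iterable of (observed N, market size, profit shifter x)."""
    return sum(math.log(prob_n_firms(n, m, x, beta, delta, mu)) for n, m, x in data)

# Made-up observations, only to show the call pattern.
data = [(1, 2.0, 0.0), (2, 5.0, 0.0), (0, 0.5, 0.0)]
ll = log_likelihood(data, beta=0.0, delta=2.0, mu=0.0)
```

Because log VC is decreasing in N whenever delta is positive, the cell probabilities are nonnegative and telescope to one, which is what makes the ordered probit analogy work.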
Bresnahan and Reiss use the parameters to make inferences
about the “nature of competition”. In their “benchmark” case
V(N) decreases proportionately with N. For example,
$V(2) = \frac{1}{2} V(1)$. This is meant to approximate a homogeneous goods
market in which entry doesn’t drive prices down (you should be
able to show that this would be true in a “one-period of profits”
model if the demand curve has a constant slope). If prices were
driven down by entry, V(N) should decrease more rapidly (i.e.
$V(2) < \frac{1}{2} V(1)$); for then price minus marginal cost is lower, and
to make up a given fixed cost we would have to have a larger
market (remember the firms are symmetric, so ex post they all
make equal profits).
B & R define the “entry thresholds”, $M_N^*$, as the population
size that would induce N firms to enter. Here the implicit assumption
in a “one-period of profit” model needed to keep the
constancy of $M_N^*$ is that population enters by increasing the constant
in the demand function but not changing any other part
of the specification. Since the entry threshold would change
with the value of fixed costs in the market, they define $M_N^*$ to
be the entry threshold were fixed costs held at their mean value.
The “per-firm” threshold is $m_N = M_N^*/N$. In the benchmark case
(“homogeneous goods and no price competition”) the per-firm
thresholds would be constant in N (that is, $M_N^*$ increases linearly
in N since V(N) decreases proportionately). If prices decreased in
N, the per-firm thresholds would increase in N, so that
$$\frac{m_{N+1}}{m_N} > 1.$$
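The threshold logic can be illustrated with a short Python sketch. It assumes, purely for illustration, that per-firm variable profit equals market size times a per-capita profit that depends only on N, and that fixed costs are held at a common mean; the two profit schedules and all names are hypothetical.

```python
def entry_threshold(n, per_capita_profit, mean_fixed_cost):
    """Market size M at which each of n firms just covers the mean fixed cost,
    assuming per-firm variable profit is M * per_capita_profit(n)."""
    return mean_fixed_cost / per_capita_profit(n)

def per_firm_threshold_ratio(n, per_capita_profit, mean_fixed_cost=1.0):
    """m_{n+1} / m_n, where m_n is the entry threshold divided by n."""
    m_next = entry_threshold(n + 1, per_capita_profit, mean_fixed_cost) / (n + 1)
    m_curr = entry_threshold(n, per_capita_profit, mean_fixed_cost) / n
    return m_next / m_curr

def benchmark(n):
    """Benchmark case: per-firm profit falls in proportion to 1/N."""
    return 1.0 / n

def tougher(n):
    """Hypothetical case in which entry also lowers margins."""
    return 1.0 / n ** 1.5

ratios_benchmark = [per_firm_threshold_ratio(n, benchmark) for n in range(1, 5)]
ratios_tougher = [per_firm_threshold_ratio(n, tougher) for n in range(1, 5)]
```

In the benchmark schedule every ratio equals one, while in the second schedule every ratio exceeds one, which is the pattern the threshold ratios are meant to detect.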
The table gives the estimated ratios of per-firm thresholds.
B & R interpret the ratios greater than one as evidence of prices
declining in N . Under this interpretation, the prices seem to
decline a lot when moving from one to two doctors, tire dealers
or dentists. However, further increases in N do not seem to
increase competition much. Consistent with old jokes, plumbers’
prices never fall.
Per Firm Entry Thresholds from Bresnahan and Reiss, 1991, Table 5
Profession      m2/m1   m3/m2   m4/m3   m5/m4
Doctors         1.98    1.10    1.00    0.95
Dentists        1.78    0.79    0.97    0.94
Plumbers        1.06    1.00    1.02    0.96
Tire Dealers    1.81    1.28    1.04    1.03
There are a couple of reasonably obvious caveats to the analysis. First these professions and retail establishments seem likely
to offer differentiated products. For example, the personalities,
practice styles and even specialities of the doctors may differ,
while the tire dealers may offer different brands. In this case, a
threshold near one could result from off-setting effects of market
expansion and price competition. Also, different firms may have
different entry costs, with, for example, later entrants facing higher
costs; this would tend to make $m_N$ increase with
N, so that the increase in going from the first to the second firm
may simply be a result of fixed costs, and not of competition.
There is some intuition, however, behind the answers, and
none of our caveats explains why so many of the estimated
threshold ratios are close to exactly one. We note that B & R do
look at the observed prices of tire dealers in a simple regression
context over a broader range of market sizes. These prices do
seem to fall with the first few entrants and then level out. This
is consistent with the estimated entry thresholds. However, they
also note the interesting fact that the prices in these small towns
appear to level out at a much higher level than is observed at
large, big-city tire dealers.
Extensions to BR. It is possible to extend the simple framework
of the last section in a number of ways, and still maintain
the ordered probit estimation method. These include
• Let the size of the market, $M_i$, depend on some data and
parameters (on the distance to nearby population centers,
as well as own-town population, average income, ...).
• Introduce some heterogeneity in the mean of the fixed cost
distribution across the number of entrants (rising input costs
across entrants, or poorer locations for the same cost).
• Add price and/or quantity data.
Adding Price and Quantity Data. B & R had to rely entirely on
the observed entry data to make inferences about variable profits
V(N). Berry-Waldfogel (RAND 2001) consider entry into the
radio industry, where both price and quantity data are available.
They integrate these data into the B&R framework. Perhaps
more interesting is that this is one of the first attempts to analyze a
media market. This is a three equation model.
• Demand is modelled as a logit model with symmetric firms.
Each station produces some idiosyncratic benefit to listeners, so listening increases in N . Of course there is a “business stealing” effect of each entrant, as some of its listeners
are taken from others. Note that there is survey data available on how much each station is listened to.
• Stations sell listeners to advertisers at some price per listener-hour. Advertisers’ demand is downward sloping, with a
simple constant elasticity functional form. The demand
for advertising curve is the second equation of their model.
There is data on the price of an advertising hour. Given
the number of listener-minutes and the advertising curve
we get a revenue per radio station.
• The third equation uses this and a distribution of fixed
costs across markets to derive an entry equation similar to
B&R’s equation.
The data are a cross section of 135 metro areas, and they
use their results to look at the relationship between producer
surplus and entry. The theory here is largely as in Mankiw and
Whinston (1986). This paper tries to measure the “business
stealing” effects of new entry on the profits of existing firms
(note producer surplus here is in terms of advertising revenue).
The empirical results show a large degree of business stealing by
new stations, which in turn implies a large producer surplus loss
from free entry (on the order of 40% of industry revenue). So
there should be a large potential gain from cartelization, at least
if one could enforce agreements with all potential entrants, and
large benefits from mergers (at least if one is willing to abstract
from further dynamics like subsequent entry and investment
decisions).
Note, however, that the results on both producer and consumer surplus are suspect; they depend crucially on a demand
functional form that is hard to believe.
• For producer surplus consider a market populated by people
who speak different languages, and only listen to radio
stations that broadcast in their language. All the new radio
stations do is bring in another language group. There is no
“business stealing” effect at all. This is an extreme, but it
serves to make the point that to do this carefully we need
differentiation in continuation values.
• The consumer surplus here is the benefit to listeners, so you
might not want to discount it entirely, and if there are any
additional benefits from new stations, this model does not
capture them.
Note that the FCC has established new (and considerably
more liberal) regulations on ownership in the radio industry,
starting from 1996, and there is a current debate about liberalizing them further. It would be interesting to get a hold on
the likely impact of this; on likely mergers, on the nature of the
market for advertising, and on variety. Probably the hardest
part of the analysis is to figure out the effect of mergers on
variety, and then the effect of variety on consumer surplus, as
it would require a choice of entry into a space of characteristics
(language, genre, ...). Of course the fact that this is a dynamic
market also throws some doubt on the ability of two period models
to supply a reasonable framework for the analysis.
Firm Heterogeneity
The early literature considered two “types” of differences among
entrants, and considered them separately. In both cases (i)
computation is more difficult, and (ii) there are uniqueness issues.
I.e. when firms are different our models are usually not
rich enough to determine exactly which firms enter.
• Fixed costs that varied across entrants.
• Continuation values which differ due to, say, product differentiation.
Variable continuation values complicate the story even
further as entry can “expand” the market. Indeed much of the
reason for empirical work in this area is to determine just what
impact entry into one location has on profitability on another.
This is just a variant of the question of how “substitutable” is
one product for another which appears repeatedly in the static
literature, but in some markets there is not much meaningful
price and quantity data available. The entry location (and possibly size) story, seems to be very important to understanding
the retail and service industries.
Heterogeneity in Fixed Costs.
Berry (1992) considers entry of airlines operating on city-pair
routes shortly after the deregulation of the U.S. airline industry.
He maintains the assumption that continuation values are the
same conditional on N, but lets the fixed component of costs
vary across firms. Assume that entry values are given by
$$EV(N, x_m, z_{fm}, \epsilon_{fm}, \theta) = VC(N, x_m, \theta) - FC(z_{fm}, \epsilon_{fm}, \theta), \tag{8}$$
where N is again the equilibrium number of firms, $x_m$ is a vector
of market-specific variables, $z_{fm}$ is a vector of firm specific
variables, $\epsilon_{fm}$ is a firm/market specific fixed cost, and $\theta$ is a vector
of parameters to be estimated. To parametrize the entry values
further, assume that
$$EV_{fm}(N) = x_m\beta + z_{fm}\alpha - \delta\ln(N) + \epsilon_{fm}, \tag{9}$$
with $\epsilon$ modeled as having a market specific component, which
allows for correlation in unobservables across firms within the
market,
$$\epsilon_{fm} = \rho u_{m0} + \sqrt{1 - \rho^2}\, u_{fm} \tag{10}$$
(note the constraint on the variances to set the variance of $\epsilon$ equal to one; similar to any discrete choice model).
We have the same basic assumptions as above; full information
Nash-like conditions with no errors (in either measurement
or decision making). This implies that the $\epsilon_{fm}$ are known to all
of the agents, just not to the econometrician. The probability of
an N firm equilibrium is no longer an ordered probit, as the
unobservables are now a vector of dimension equal to the number
of potential firms.
In a general class of discrete entry models, it is not clear that
an equilibrium exists or is unique. However, if the continuation
values (in contrast to the fixed costs) are symmetric (depend on
N but not $z_{fm}$) and there is a finite number of potential entrants,
then there is a simple, constructive proof of the existence of an
equilibrium which determines a unique N (though not a unique set
of identities for the entrants).
Proof of unique N (sketch): order firms in decreasing entry values,
and let entry occur until the last profitable entry. Call this last
firm $N^*$. This allocation of firms to “in” and “out” is an equilibrium.
We can’t have fewer firms in equilibrium, because firm $N^*$
would then enter. We can’t have more because the $N^* + 1$ firm won’t
be profitable in an $N^* + 1$ equilibrium. Note that the proof
is constructive as it allows one to find $N^*$ given observables,
unobservables and $\theta$. ♠
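The constructive step in the proof can be written out directly. The sketch below assumes the parametrization in equation (9), with the market-level term and each firm's own term passed in as plain numbers; the function name, argument names, and example values are hypothetical.

```python
import math

def equilibrium_n(common, firm_terms, delta):
    """Equilibrium number of entrants when firm f's entry value with N firms is
    common + firm_terms[f] - delta * ln(N), with delta >= 0.

    Firms are ranked by their own term; entry proceeds down the ranking
    until the next firm would not be profitable."""
    ranked = sorted(firm_terms, reverse=True)
    n_star = 0
    for n, term in enumerate(ranked, start=1):
        if common + term - delta * math.log(n) < 0:
            break          # the n-th best firm would lose money with n firms
        n_star = n
    return n_star

# Illustrative values: three potential entrants.
n_example = equilibrium_n(common=1.0, firm_terms=[-0.2, 0.5, -1.0], delta=1.0)
```

Stopping at the first unprofitable firm is valid here because both the ranked firm terms and the competition term fall as N rises, so no later firm can be profitable once one fails.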
While the number of firms is unique, the identities of the entering
firms are not. It can happen that in equilibrium either
firm A or firm B could enter, but not both. (This occurs when
both firms would make profits in an $N^* - 1$ equilibrium but neither
would in an $N^* + 1$ equilibrium.) Thus, the model does not
identify the probability of entry for any of the entering firms.
This is because there is not a unique map from observables,
unobservables, and parameters to actions. This implies that we lose
what information there might be in who enters, and just keep
the information on the number of firms that enter.
The likelihood of θ given N is hard to calculate because the
region of the space that leads to an N -firm equilibrium is hard
to describe. However, just as in aggregation, it is easy to use
simulation methods to solve the problem. Begin by taking S
draws on the underlying random variables u in equations (9)-(10).
For each guess of $\theta$, construct the equilibrium number of
firms, $\hat{N}(u_s, \theta)$, via the constructive method of the equilibrium
proof just given. At the true value of $\theta$, the difference between
the observed value of N and the value of N simulated from the
model is mean independent of all the determinants of N. This
lets us build a method of moments subroutine that looks for a
value of $\theta$ that makes the covariance of the difference between
the observed and simulated N’s and any functions of its determinants
equal to zero.
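A minimal sketch of the simulated moment follows, assuming the specification in equations (9)-(10) with scalar x and z, standard normal u, and draws held fixed across parameter guesses. It shows one moment only, not Berry's estimator; every name and number is hypothetical, and the "observed" N in the demo is generated from the same draws purely as a sanity check.

```python
import math
import random

def equilibrium_n(common, firm_terms, delta):
    """Constructive equilibrium: rank firms, enter until the next one loses money."""
    n_star = 0
    for n, term in enumerate(sorted(firm_terms, reverse=True), start=1):
        if common + term - delta * math.log(n) < 0:
            break
        n_star = n
    return n_star

def simulated_n(x_beta, z_alpha, delta, rho, draws):
    """Average equilibrium N over fixed draws (u_m0, [u_fm for each firm])."""
    total = 0
    for u_market, u_firms in draws:
        eps = [rho * u_market + math.sqrt(1.0 - rho ** 2) * u for u in u_firms]
        terms = [za + e for za, e in zip(z_alpha, eps)]
        total += equilibrium_n(x_beta, terms, delta)
    return total / len(draws)

def moment(markets, draws_by_market, beta, alpha, delta, rho):
    """Sample analogue of E[(N_observed - N_simulated) * instrument]."""
    total = 0.0
    for (n_obs, x, z, instrument), draws in zip(markets, draws_by_market):
        n_sim = simulated_n(x * beta, [alpha * zf for zf in z], delta, rho, draws)
        total += (n_obs - n_sim) * instrument
    return total / len(markets)

rng = random.Random(0)

def draw_set(n_firms, n_draws):
    return [(rng.gauss(0, 1), [rng.gauss(0, 1) for _ in range(n_firms)])
            for _ in range(n_draws)]

# Demo: build fake markets whose "observed" N comes from the same single draw.
true_theta = dict(beta=1.0, alpha=0.5, delta=1.0, rho=0.5)
markets, draws_by_market = [], []
for _ in range(50):
    x = rng.uniform(0.0, 2.0)
    z = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    draws = draw_set(n_firms=4, n_draws=1)
    n_obs = simulated_n(x * true_theta["beta"], [true_theta["alpha"] * zf for zf in z],
                        true_theta["delta"], true_theta["rho"], draws)
    markets.append((n_obs, x, z, x))   # the instrument is x itself
    draws_by_market.append(draws)

g_true = moment(markets, draws_by_market, **true_theta)
g_high_delta = moment(markets, draws_by_market, beta=1.0, alpha=0.5, delta=3.0, rho=0.5)
```

In an actual application the observed N comes from the data and the draws are independent of it; an outer routine would then search over the parameters to drive a vector of such moments toward zero.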
The empirical application is to the importance of airline hubs
in the early deregulated airline industry. The data are a cross-section
by city-pair (origin-destination) market. Interestingly in the reduced form analysis the most important determinant of whether
one airline served a city pair route was the number of other
routes it served from the same cities. This shows the importance of network effects, both in formulating demands and in
formulating costs for the airline market, and we are only now
getting some idea of how to analyze that.
Variable Continuation Values: Introductory Remarks.
The early ways of handling variable continuation values assumed
or derived a unique equilibrium, and we start with them. We come
back to ways which do not require uniqueness below.
I start with a paper by Seim (2005) who introduces asymmetric
information to get around the existence problem and (at least
partially) the multiple equilibria problem. She only has a limited
form of variable continuation values; differences in profitability
among locations but not among firms in a given location. Firm
specific differences are limited to differences in fixed costs. I then
go back to two papers that keep the same assumptions on heterogeneity,
but use the full information structure (they are more
in the nature of sequential models); papers by Mazzeo (2003),
and Toivanen and Waterson (1999). There is also some related
empirical work by Einav (2004) on a finite sequence of decisions
by different agents in a war of attrition framework (release dates
of movies).
A Note on Geographic Differentiation. Two big sources of product
differentiation in retail trade of all sorts are location and
size/variety. There are many issues which are related to entry
and geographic differentiation which have not been adequately
analyzed:
• Policy issues; comparing optimal to actual distributions of
both location and size of retail outlets, and figuring out
the implications of zoning laws on each.
• Modelling issues; geographic issues are nascent in many
(both demand and entry) studies since they are implicit
in the definition of a “market” (including DOJ studies for
mergers).
In this context we should note that there are several computerized
maps which give store locations and characteristics of
populations now available, and these make the study of retail
markets much easier (see, e.g., Davis, 2005). There are of
course other dimensions of product differentiation. Indeed the
question of which dimensions of product differentiation should
be included in the entry models is much like the question of
which characteristics should be included in characteristic-based
demand systems.
Seim; Variable Continuation Values and Geographic Differentiation in the Video Retail Market (RAND, 2007).
Her model determines both how many firms enter, say N̂ and a
probability distribution for their locations. I will focus on the
equilibrium probabilities of firms enterring at different locations
conditional on the number of firms that decide to enter (our
N̂ ), as her determination of N̂ is not particularly helpful. This
is then a model of location or product choice.
• N ∗ firms decide whether to enter a given market (indexed
by m),
• Those that decide to enter choose their optimal location
from a finite set of l = (1, . . . , L) locations.
• Each firm makes its choices based on the expected post entry
value (so we will have to specify what it knows when it
makes those expectations).
• Post entry value in location l for any firm is

v_l = \xi^m + x_{m,l}\beta + n_l^m\theta + \sum_{j \in d1_{l,j}} x_j^m \beta_1 + \sum_{j \in d2_{l,j}} x_j^m \beta_2 + \sum_{j \in d1_{l,j}} n_{1,j}^m \theta_1 + \sum_{j \in d2_{l,j}} n_{2,j}^m \theta_2

where
– d1_{l,j} is an indicator for locations j adjacent to location
l (one to three miles from it), and d2_{l,j} is an indicator
for locations somewhat farther away from l (three to
ten miles from it),
– n_l^m is the number of entering firms in the immediate
“tract”, n_{1,j}^m is the number of firms within the distance
defined by d1_{l,j}, etc.
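
The post entry value can be sketched numerically as follows. This is only an illustration, assuming a single tract characteristic x, a made-up three tract market, and made-up parameter values; the function names `band` and `post_entry_value` are hypothetical, and treating the distance bands as (1, 3] and (3, 10] miles is an assumption about how the band edges are handled.

```python
import numpy as np

def band(dist, lo, hi):
    """Indicator matrix: 1.0 where lo < dist[l, j] <= hi (miles), else 0.0."""
    return ((dist > lo) & (dist <= hi)).astype(float)

def post_entry_value(xi, x, n, dist, beta, theta, beta1, beta2, theta1, theta2):
    """v_l for every tract l, given tract covariates x and entrant counts n."""
    d1 = band(dist, 1.0, 3.0)    # "adjacent" tracts: one to three miles away
    d2 = band(dist, 3.0, 10.0)   # farther tracts: three to ten miles away
    return (xi + x * beta + n * theta
            + (d1 @ x) * beta1 + (d2 @ x) * beta2
            + (d1 @ n) * theta1 + (d2 @ n) * theta2)

# Hypothetical market: three tracts on a line at miles 0, 2 and 5.
pos = np.array([0.0, 2.0, 5.0])
dist = np.abs(pos[:, None] - pos[None, :])
x = np.array([1.0, 2.0, 3.0])   # one tract characteristic (e.g. population)
n = np.array([1.0, 0.0, 2.0])   # number of entrants in each tract
v = post_entry_value(0.5, x, n, dist, beta=1.0, theta=-1.0,
                     beta1=0.5, beta2=0.25, theta1=-0.5, theta2=-0.25)
print(v)
```

The matrix products `d1 @ x` and `d1 @ n` do the sums over neighbouring tracts in one step, so adding a finer distance partition would only mean adding more band matrices (and more parameters).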
A Note on Underlying Theory. This is the first
paper in this literature that introduces asymmetric information.
She replaces FC3 and FC4 above with the following assumption.
Assumption SC3 & SC4. ∀ l ∈ {1, . . . , L},

\pi(l, l_{-i}, z_i^o, \nu_i) = v_l^a + \nu_{i,l},

and if \nu_{2,i} \equiv \{\nu_{i,l}\}_{l \in D_i}, then the distribution of (ν_i, ν_{-i}) conditional on z_i is i.i.d. extreme value. Each agent knows its own
ν_i but does not know its competitors’ draws; it only knows the
distribution from which they are drawn.
Conceptual Problem. Note that in this problem the ν_i are variables that agent i bases its decisions on (thus these are not
errors in measurement). Since only agent i knows ν_i there is
asymmetric information, and after the choices of everyone are
made there is ex post regret; i.e. firms will have made choices
which ex post are not profitable. Though there is nothing a
priori wrong with firms making wrong decisions, in this model
the assumptions imply that firms earn negative profits, perhaps
large negative profits, and never leave the industry (in a truly
dynamic model firms would move towards correcting past mistakes). This is a problem with using the combination of a two
period model with asymmetric information.
Computation. Each firm
• forms its expected profits at every location and chooses the
location that maximizes them,
• to do so it has to form the expected number of firms at all
locations (these are expectations, rather than realizations,
because no firm knows ν_{-i}).
• Since all firms are symmetric in observables, the equilibrium
expectation of the fraction that chooses location l is
just the probability that the max is at l, which is given by
the usual logit probabilities; i.e.

p_l = \frac{\exp[E_{\nu(-i)} \pi_l]}{\sum_k \exp[E_{\nu(-i)} \pi_k]},

where E_{ν(-i)} is just notation for the fact that we have integrated out over the competitors’ unobservable firm specific draws
and the implied number of entrants.
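Now say that the probability that any one of the other entering firms
chooses location l is p_l, and that there are N̂ firms that
enter. Then

E_{\nu(-i)} \pi_l = \xi^m + x_{m,l}\beta + [p_l(\hat{N} - 1) + 1]\theta + \sum_{j \in d1_{l,j}} x_j^m \beta_1 + \sum_{j \in d2_{l,j}} x_j^m \beta_2 + \sum_{j \in d1_{l,j}} p_j[\hat{N} - 1]\theta_1 + \sum_{j \in d2_{l,j}} p_j[\hat{N} - 1]\theta_2 + \nu_{i,l},

where the +1 in the own tract term counts firm i itself.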
Now
• Substituting this into the definition of p_l gives us an
equation for the p_l in terms of themselves.
• We look for a set of p_l which, when substituted into the
probabilities, gives us back those very same probabilities;
i.e. a fixed point of the system of equations above.
• It can be computed by the method of successive approximations. Guess at some value of p, say p(1). Then calculate
the implied expected values at each location. Substitute these into
the equation for p to determine p(2), and iterate until we
stabilize at a given vector of probabilities.
• Note that in general the fixed point need not be unique;
though she claims to never have found more than one equilibrium. Also though standard fixed point theory ensures
existence of an equilibrium, there is no guarantee that this
algorithm will always converge.
• Given the uniqueness assumption and N̂ we can solve for
equilibrium conjectures for each different value of the parameters of the model. I.e. for each vector of parameters
we find the set of probabilities that make the l.h.s. equal to
the r.h.s.
• We can then fit these equilibrium conjectures to data on
the shares of the entrants in each tract. M.l.e. is easy here
since we can calculate (n_{l,m}/N̂_m) × ln p_{l,m} analytically.
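
The successive approximations and the likelihood step above can be sketched as follows. This is a minimal illustration, not Seim's code: the function names, the three tract market, and all parameter values are hypothetical; `base` is assumed to collect the exogenous terms (ξ, own x and neighbours' x); the expected own tract count is taken to be 1 + (N̂ − 1)p_l; and the private shock is left out of the expected profit because the logit formula integrates it out.

```python
import numpy as np

def expected_profit(p, base, d1, d2, n_hat, theta, theta1, theta2):
    """Expected post entry value at each tract, net of the private shock."""
    own = (p * (n_hat - 1) + 1.0) * theta        # firm itself plus expected rivals
    near = (d1 @ p) * (n_hat - 1) * theta1       # expected rivals 1 to 3 miles away
    far = (d2 @ p) * (n_hat - 1) * theta2        # expected rivals 3 to 10 miles away
    return base + own + near + far

def logit(v):
    e = np.exp(v - v.max())                      # subtract max for numerical stability
    return e / e.sum()

def solve_fixed_point(base, d1, d2, n_hat, theta, theta1, theta2,
                      tol=1e-12, max_iter=10_000):
    """Successive approximations p(k+1) = logit(E pi(p(k))), from uniform p(1)."""
    p = np.full(len(base), 1.0 / len(base))
    for _ in range(max_iter):
        p_new = logit(expected_profit(p, base, d1, d2, n_hat, theta, theta1, theta2))
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    raise RuntimeError("successive approximations did not converge")

def log_likelihood(counts, p):
    """Multinomial log likelihood of observed entrant counts, up to a constant."""
    return float(np.sum(counts * np.log(p)))

# Hypothetical market: three tracts on a line, the middle one adjacent to both ends.
base = np.array([0.2, 0.0, -0.1])
d1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
d2 = np.zeros((3, 3))
args = dict(n_hat=3, theta=-0.3, theta1=-0.1, theta2=0.0)
p = solve_fixed_point(base, d1, d2, **args)
counts = np.array([2, 0, 1])                     # observed entrants per tract
print(p, log_likelihood(counts, p))
```

In estimation the fixed point would be recomputed inside the likelihood for every trial parameter vector. The loop raises an error rather than returning a non-converged p, since, as noted above, convergence of this algorithm is not guaranteed in general.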
Look to Tables. She uses data on 151 mid sized markets
with on average 21 census tracts (her “locations”) per market.
Data on tract characteristics are obtained from “Advanced Geographic Solutions” (similar data are available decennially from
the census). Data on video store locations are derived from the
American Business Disc 1999 (which has store location, chain affiliation, and line of business for most retail trade; she does not
use chain, but comparing competition within and across chains
would be interesting from both a private and a social point of
view). Table 1 gives you some idea of the data. Table 4 provides
some estimates. The big issue here is location effects. Clearly
• distant locations’ population and income affect demand;
• distant locations’ number of competitors affects demand;
• the more distant the less the effect, but even locations 3-10
miles from the tract have a significant impact.
• She did not get much on distance more than ten miles.
• The figures give you an impression for the effect of distance.
Clearly you have a very different pattern of demand and
impact of competitors if you are at a corner.
What is missing and why? The partition of the distance measures is arbitrary. Of course she could have refined
it more to get an approximation to something more continuous,
but that refinement would have been at the cost of more parameters. Pushing this further one might think of deriving the
continuation values directly from a demand and cost equation. If individual utilities were derived from distance and other x’s, and
we knew the distribution of locations... this would connect the
β and θ to β(d) and θ(d) and the characteristics of the adjacent
locations. Why not go this route?
• Though it would have saved on parameters, it would have
made computation much more difficult (though maybe not
impossible; see Davis, forthcoming, for a demand system
estimated in this way). We would have needed to estimate
demand again for every value of the parameters,
and only then compute the fixed point.
• Were we to go to this “more structural” version, we might
want to add real error terms to reflect the fact that our
models are not perfect.
• There are no known firm specific differences in profitability
among locations. If there were there would be a more extreme
problem of multiple equilibria. All differences among
firms in a given location are fixed cost differences.
There is still a multiple equilibria problem here, but this last
comment makes one realize that to move to more realism we
are going to have to take the multiple equilibria problem more
seriously.
Entry Models with Full Information and Variable Continuation Values.
We now revert back to the full information assumptions (FC3
and FC4).
Mazzeo (2003, RAND) and Toivanen and Waterson (2004, RAND)
extend the simple B & R model to consider firm heterogeneity in continuation values, keeping fixed costs constant (at least
within a group with symmetric continuation values). Both these
papers assume a small number of entry locations which are fixed
exogenously. They then consider entry into all locations, producing a vector of “endogenous” variables. Again we work with
a reduced form continuation value.
The value function differs with the location, and at each location depends on the number of entrants at each other location.
Mazzeo works with motels (high, low, and medium quality) at
fairly isolated exits on interstate highways. Toivanen and Waterson deal with fast food outlets in Britain, and distinguish between McDonald’s and Burger King. Both these papers keep
the assumption that firms of a given type share a common fixed
cost. Continuation values for any firm choosing quality level
t = (1, . . . , T ) in market m are assumed to be

V_{tm} = x_m \beta_t + g(N_1, \ldots, N_T, \theta_t) + \epsilon_{tm},    (11)

where N̄ = N1 , . . . , NT is the vector of the number of firms
of each type. The parameters (specific to each type) are βt
on the market level variables and θt , which parameterizes the
effect of own-type and other-type competition. Both papers
approximate g with a set of dummy variables (since N̄ takes on
discrete values, so does g(·, θt )). Note that the unobservables
are constant across firms within quality type, so that the model
is symmetric conditional on type.
Unfortunately, even this simple type of heterogeneity in variable profits leads to both possible non-existence, and possible
non-uniqueness when equilibrium exists – at least in a simultaneous move game. So we have to put more structure on the
problem in order to derive the model’s theoretical implications.
One way of putting more structure on the problem is by assuming that choices are sequential. Different types of sequential
behavior would seem appropriate for the two different papers,
because
• in one market (motels) we would think that there are independent
actors who generate actions that result in a zero profit constraint being satisfied, and
• in the other market (fast food) we might think that there are two actors
(Burger King and McDonald’s) that each choose the number
of firms of their type that enter.
We start with the Burger King/McDonald’s problem. We might
assume that (say) Burger King moves first (since it was actually
operating in the country first). Say the ε’s are draws from a
known distribution. We proceed as follows.
• Fix the two ε’s, as well as the parameters, and let EV^j(b, m)
be the total entry value of Burger King (if j = b) or McDonald’s
(if j = m) when there are b Burger King outlets and
m McDonald’s outlets.
• Now solve the problem backwards. That is, assume Burger King already has b outlets and ask what McDonald’s would do. If Burger King has b outlets, McDonald’s would choose m(b)
to max_m EV^m(b, m). Given that, the first stage choice by
Burger King is to choose b to max_b EV^b(b, m(b)).
• This gives us (b, m) conditional on the ε vector and a value
for the parameter vector. If we do this a number of times,
and take the average of the resulting (m(θ), b(θ)) over the ε draws, the
expectation of that average at θ = θ_0 should be (m, b), and its variance
should be well approximated by the sampling variance in
the (m(θ), b(θ)) draws.
• Thus we can form a moment based on the covariance between the determinants of entry (the x’s) and the difference between the simulated and the observed (m, b),
and then search over parameter vectors until that moment is approximately zero.
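
The backward solution and the simulated moment can be sketched as follows. This is a rough illustration under strong assumptions: the functional form in `total_value`, the cap `MAX_OUTLETS`, the standard normal ε's, and every parameter value are hypothetical and are not taken from Toivanen and Waterson.

```python
import numpy as np

MAX_OUTLETS = 5  # hypothetical cap on the number of outlets per chain

def total_value(own, rival, x, eps, beta, d_own, d_riv):
    """Hypothetical total entry value of a chain with `own` outlets facing `rival`."""
    return own * (x * beta + eps - d_own * own - d_riv * rival)

def solve_market(x, eps_b, eps_m, theta):
    """Burger King moves first; McDonald's replies optimally to any b."""
    beta, d_own, d_riv = theta
    grid = range(MAX_OUTLETS + 1)

    def m_reply(b):
        return max(grid, key=lambda m: total_value(m, b, x, eps_m, beta, d_own, d_riv))

    b_star = max(grid, key=lambda b: total_value(b, m_reply(b), x, eps_b,
                                                 beta, d_own, d_riv))
    return b_star, m_reply(b_star)

def simulated_moment(theta, x, b_obs, m_obs, n_draws=200, seed=0):
    """Average of x times (observed minus simulated outlet counts) across markets."""
    rng = np.random.default_rng(seed)
    sim = np.zeros((len(x), 2))
    for _ in range(n_draws):
        eps = rng.standard_normal((len(x), 2))   # one (eps_b, eps_m) pair per market
        sim += np.array([solve_market(x[k], eps[k, 0], eps[k, 1], theta)
                         for k in range(len(x))], dtype=float)
    sim /= n_draws
    resid = np.column_stack([b_obs, m_obs]) - sim
    return (resid * x[:, None]).mean(axis=0)

# With both shocks set to zero the first mover takes two outlets, the follower one.
print(solve_market(1.0, 0.0, 0.0, (4.0, 1.0, 0.6)))

x = np.linspace(0.5, 1.5, 8)                     # hypothetical market covariate
g = simulated_moment((4.0, 1.0, 0.6), x, np.full(8, 2), np.full(8, 1), n_draws=20)
print(g)
```

An estimator would wrap `simulated_moment` in a search over θ, holding the seed fixed so that the same ε draws are used at every trial value.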
For the motel problem, assume that there are only two types,
L and H. Let EV^l(l, h) be the entry value of the next type l firm
when there are already l low type firms and h high type firms
active.
Mazzeo’s first equilibrium assumes that
• firms play sequentially and
• make irrevocable decisions about entry and product type
before the next firm plays.
Firms anticipate subsequent entry in the usual way. The Nash
conditions are that
• the last firm of each product type;
– finds entry profitable and
– prefers the chosen type to the alternatives.
• Additional entry in either product type is not profitable.
Therefore a Nash equilibrium is an ordered pair (l, h) for
which the following inequalities hold
• V_l(l − 1, h) > 0,
• V_l(l, h) < 0,
• V_l(l − 1, h) > V_h(l − 1, h),
• V_h(l, h − 1) > 0,
• V_h(l, h) < 0, and
• V_h(l, h − 1) > V_l(l, h − 1).
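
The six inequalities can be checked mechanically for any candidate pair. The sketch below assumes hypothetical linear payoff functions `V_l` and `V_h` (own type rivals hurt more than other type rivals) with made-up coefficients; only the inequality check itself follows the text.

```python
def V_l(l, h):
    """Hypothetical value of a low type firm facing l low and h high rivals."""
    return 2.5 - 1.0 * l - 0.4 * h

def V_h(l, h):
    """Hypothetical value of a high type firm facing l low and h high rivals."""
    return 3.0 - 0.4 * l - 1.0 * h

def is_equilibrium(l, h):
    """True if the ordered pair (l, h), with l, h >= 1, satisfies all six conditions."""
    return (V_l(l - 1, h) > 0                   # last low type firm finds entry profitable
            and V_l(l, h) < 0                   # one more low type firm would lose money
            and V_l(l - 1, h) > V_h(l - 1, h)   # last low type firm prefers low to high
            and V_h(l, h - 1) > 0               # last high type firm finds entry profitable
            and V_h(l, h) < 0                   # one more high type firm would lose money
            and V_h(l, h - 1) > V_l(l, h - 1))  # last high type firm prefers high to low

equilibria = [(l, h) for l in range(1, 6) for h in range(1, 6) if is_equilibrium(l, h)]
print(equilibria)
```

With these particular payoffs exactly one pair on the grid passes all six checks; with other payoffs the list could be empty or contain several pairs, which is why the equilibrium assumptions matter.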
The alternative equilibrium assumes that
• the firms sink their sunk costs without committing to whether
they will be low or high quality.
• In the second stage product types are selected among the
entrants either simultaneously or sequentially; but there is
a predetermined number of entrants to allocate.
The two equilibria can be different as the following example
(taken from Mazzeo) illustrates. The case in point is
1. V_h(l, h) > 0,
2. V_l(l − 1, h + 1) < 0,
3. V_l(l − 1, h) > V_h(l, h − 1) > 0.
First look at the first equilibrium. Then despite (3) the lth low
type firm does not enter. This is because if it did, it knows that
the (h + 1)th high type firm will enter by (1), and if that firm enters it is no
longer profitable for the lth low type firm to be in, from (2). So the outcome
is (l − 1, h + 1).
In the game with the sunk costs made before a commitment
is made to type, (3) ensures that there are l + h firms entering in
the first stage. Given that there are l + h firms, (3) also ensures
that in the second stage the last firm will choose type l, giving
us (l, h).
Mazzeo fits both equilibria and finds not much difference
in results. He finds strong returns to differentiation; all entry
by rivals drives down profits, but an incumbent’s profits fall by
much more when the rival is offering the same quality type as
the incumbent. Toivanen and Waterson find that McDonald’s is
much more likely to enter a location where there are only Burger
King outlets than where there are McDonald’s outlets. That is,
just as Seim does, and probably not surprisingly, the two sets of authors
find that their types of differentiation matter for entry decisions,
and hence likely for profits and consumer surplus calculations.
One lesson from this type of analysis is that we should be
careful about doing either producer or consumer surplus calculations
without allowing for sufficient product differentiation.
Recent Full Information, No Error, Models;
Tamer (2003), Ciliberto and Tamer (2006),
and Andrews, Berry and Jia (in process).
We now come back to the original assumptions, which recall give
us the model

\forall d \in D_i, \quad \pi(d, d_{-i}, z_i^o, \nu_{2,i}^f) = \pi^{as}(d, d_{-i}, z_i^o, \theta_0) + \nu_{2,i,d}^f,    (12)

and that the distribution of \nu_{2,i}^f \equiv \{\nu_{2,i,d}^f\}_{d \in D_i}, conditional on \nu_{2,-i}^f,
has full support.
Two further points which one should keep in mind; (i) the ν_{2,i}
are known to all the agents when they make their decisions, just
not to the econometrician (so there is no room for asymmetric
information, incomplete information, or either measurement or
decision error), (ii) the additive separability in equation (12) cannot
be obtained definitionally, by assuming ν_2 is a residual from
a projection, for then the residual would be orthogonal to the
decision itself.
Estimation. Though this model does specify a parametric
distribution for the (ν^f_{2,i}, ν^f_{2,-i}) conditional on the observables,
it is not detailed enough to deliver a likelihood. This is because
the conditions required by the model can be satisfied by multiple
tuples of (d_i, d_{-i}) for any value of θ (i.e., there can be multiple
equilibria). As a result there is not a one to one map between
observables, unobservables, and parameters on the one hand,
and outcomes for the decision variable on the other. However
we can check
• whether the conditions of the model are satisfied at the observed
(d_i, d_{-i}) for any (ν^f_{2,i}, ν^f_{2,-i}) and θ, and this, together
with F(·, θ), enables us to calculate the probability of those
conditions being satisfied at any θ. Since these are necessary conditions for the choice to be made, when θ = θ_0
the probability of satisfying them must be at least as large as the
probability of actually observing (d_i, d_{-i}).
• Similarly we can check whether (d_i, d_{-i}) are the only values of the decision variables to satisfy the necessary conditions
for any (ν^f_{2,i}, ν^f_{2,-i}) and θ, and this can be used to
provide a lower bound to the probability of actually observing (d_i, d_{-i}) given θ.
These are inequalities that not all values of θ will satisfy, and,
as a result, can be used as a basis for inference.
More formally define the probability that the model is satisfied
at the observed (d_i, d_{-i}) for a given θ to be

\overline{P}\{(d_i, d_{-i}) | \theta\} \equiv Pr\{(\nu_{2,i}^f, \nu_{2,-i}^f) : (d_i, d_{-i}) \text{ satisfy equation (2)} \mid z_i^o, z_{-i}^o, \theta\},

the analogous lower bound to be

\underline{P}\{(d_i, d_{-i}) | \theta\} \equiv Pr\{(\nu_{2,i}^f, \nu_{2,-i}^f) : \text{only } (d_i, d_{-i}) \text{ satisfy equation (2)} \mid z_i^o, z_{-i}^o, \theta\},

and the actual likelihood of (d_i, d_{-i}) for a given θ to be

P\{(d_i, d_{-i}) | \theta\} \equiv Pr\{(d_i, d_{-i}) \mid z_i^o, z_{-i}^o, \theta\}.

What we know is that, whatever the equilibrium selection mechanism, when θ = θ_0

\overline{P}\{(d_i, d_{-i}) | \theta_0\} \geq P\{(d_i, d_{-i}) | \theta_0\} \geq \underline{P}\{(d_i, d_{-i}) | \theta_0\}.
Estimating Equations. Let {·} be the indicator function
which takes the value one if the condition inside the brackets
is satisfied and zero elsewhere, h(·) be a function which only
takes on positive values, and E(·) provide expectations conditional on the actual process generating the data. The model’s
assumptions then imply that

\frac{1}{N} E\Big[ \sum_i \big( \overline{P}\{(d_i, d_{-i}) | \theta\} - \{d = d_i, d^{-i} = d_{-i}\} \big) h(z_i^o, z_{-i}^o) \Big]
  = \frac{1}{N} E\Big[ \sum_i \big( \overline{P}\{(d_i, d_{-i}) | \theta\} - E[\{d = d_i, d^{-i} = d_{-i}\} \mid z_i^o, z_{-i}^o] \big) h(z_i^o, z_{-i}^o) \Big]
  = \frac{1}{N} \sum_i \big( \overline{P}\{(d_i, d_{-i}) | \theta\} - P\{(d_i, d_{-i}) | \theta_0\} \big) h(z_i^o, z_{-i}^o) \geq 0,

and similarly

\frac{1}{N} \sum_i \big( E[\{d = d_i, d^{-i} = d_{-i}\} \mid z_i^o, z_{-i}^o] - \underline{P}\{(d_i, d_{-i}) | \theta_0\} \big) h(z_i^o, z_{-i}^o) \geq 0,

at θ = θ_0. Note that these are inequalities. So there may be a
set of values of the parameters that satisfy them. Thus even if
the model is correct and we have infinite data, the constraints of
the model will only identify a set. We come back to the econometric
issues that we need to handle with set valued estimators below.
Estimation Routine. The estimation routine constructs unbiased estimates of (\overline{P}(·|θ), \underline{P}(·|θ)), substitutes them for the true
values of the probability bounds in these moments, and then
accepts values of θ for which the moment inequalities are satisfied.
Since typically neither the upper nor the lower bound is an
analytic function of θ, we employ simulation techniques to obtain unbiased estimates of them. The simulation procedure
is straightforward, though often computationally burdensome.
Take pseudo random draws from a standardized version of F(·)
as defined in FC4, and for each random draw check the necessary
conditions for an equilibrium, i.e. the conditions in equation (2),
at the observed (d_i, d_{-i}). Estimate \overline{P}(d_i, d_{-i}|θ) by the fraction
of random draws that satisfy those conditions at that θ. Next
check if there is another value of (d, d_{-i}) ∈ D_i × D_{-i} that satisfies
the equilibrium conditions at that θ, and estimate \underline{P}(d_i, d_{-i}|θ)
by the fraction of the draws for which (d_i, d_{-i}) is the only such
value.
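
The simulation of the two bounds can be sketched in the simplest case, a two firm binary entry game. Everything specific here is an assumption made for illustration: the profit from entering is taken to be xβ − δ·(rival enters) + ν_i, staying out earns zero, the ν's are independent standard normals known to both firms, and the function names are hypothetical.

```python
import itertools
import numpy as np

def is_nash(d, x, nu, beta, delta):
    """Check the necessary conditions: each firm's choice is a best reply."""
    for i in (0, 1):
        enter_profit = x * beta - delta * d[1 - i] + nu[i]
        best = 1 if enter_profit > 0 else 0
        if d[i] != best:
            return False
    return True

def simulate_bounds(d_obs, x, beta, delta, n_draws=5000, seed=0):
    """Return (lower, upper) simulated bounds on the probability of d_obs."""
    rng = np.random.default_rng(seed)
    profiles = list(itertools.product((0, 1), repeat=2))
    upper = lower = 0
    for _ in range(n_draws):
        nu = rng.standard_normal(2)
        if is_nash(d_obs, x, nu, beta, delta):
            upper += 1                           # d_obs is an equilibrium at this draw
            others = [d for d in profiles
                      if d != d_obs and is_nash(d, x, nu, beta, delta)]
            lower += (len(others) == 0)          # ... and it is the only one
    return lower / n_draws, upper / n_draws

lo, up = simulate_bounds((1, 0), x=0.5, beta=1.0, delta=1.0)
lo2, up2 = simulate_bounds((1, 1), x=0.5, beta=1.0, delta=1.0)
print(lo, up, lo2, up2)
```

For the outcome (1, 0) the bounds differ, because for some draws (0, 1) is also an equilibrium; for (1, 1) they coincide, because in this game whenever both firms entering is an equilibrium it is the only one. The inner search over all other profiles is what makes the lower bound the more expensive of the two.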
If there are ns simulation draws for each i, and n(i) is the
cardinality of the set of choices appearing in d_{-i} in the profit
function (in equation 2), then to construct \overline{P}(·|θ) we need to
check ns × \prod_{i=1}^{n} n(i) equilibrium conditions for each θ evaluated
in the estimation routine, while to construct \underline{P}(·|θ) we need to
multiply this quantity by the cardinality of D_i × D_{-i}. This can
be computationally expensive.
The computational burden is likely to be particularly heavy when the functions determining d_{-i} = d_{-i}(d_i, z_i) and
y_i = y(z_i, d_i, d_{-i}) are difficult to calculate. This typically will
be the case when inequality estimators are used to characterize
the first stage decision in a two stage game, and the second stage
works with an explicit structural model of demand and supply.
In this case to obtain y_i = y(z_i, d_i, d_{-i}) (and/or d_{-i}(d_i, z_i)) from
a structural model we need to compute equilibrium conditions
for the second stage of that game, and this can be computationally prohibitive. Though I have not seen it done, one can
decrease the computational burden for \overline{P}(·|θ) by just checking a
fraction of the inequalities in equation (2) for each agent. However we need to examine them all for the lower bound \underline{P}(·|θ),
and without calculating both bounds we may well be left with
an unbounded identified set.4
4
One could, and probably should, use variance reduction techniques to estimate
(\overline{P}(·|θ), \underline{P}(·|θ)), but even one evaluation of all possibilities for \underline{P}(·|θ) at every θ can be prohibitive. Also it should be possible to obtain a more precise estimate of P(·) than simply
using the indicator function by using non-parametric procedures which take weighted averages of the indicator functions with weights increasing with the closeness of the observed
covariates to the covariates for observation i.
General Issues: Analysis of Entry Games.
• All reduced form value function models become harder and harder to estimate
as we increase the number of types, since the number of parameters we estimate increases at about the square of the
number of types. The obvious way around this is to estimate demand and cost functions directly on the underlying “characteristic” space, just as we did in static analysis,
and then use those profit functions directly in the empirical analysis. This is computationally more burdensome,
but becomes less so if we estimate many of the parameters
separately in a first stage.
• Once we allow for a lot of heterogeneity we are going to
have to face multiple equilibria problems. We have shown
that it is possible to allow for multiple equilibria. However
the methods we have to do this so far require very strong
assumptions on our measurement model; one might be especially worried about there being no room for error in the
model. We come back to this below.
• Lack of True Dynamics. Note that adding true dynamics would not only get
rid of some of the conceptual problems and put in the lags
we observe in the world, but would also allow you to bring
in the information in the sequential nature of most data.
That is, in many of these data sets we know exactly when
each firm entered, so we could condition that firm’s entry
decision on the firms existing in the market at the time it
enters. The problems here are a mix of computational and
conceptual, and we come back to them after we introduce
dynamic games.