
Semiparametric Estimation of Signaling Games
with Equilibrium Refinement
Kyoo il Kim
University of Minnesota
November 29, 2012
Abstract
This paper studies an econometric model of a signaling game with two players where
one player can have one of two types. In particular, we develop an estimation strategy that
identifies the payoff structure and the distribution of types from data on observed actions.
We achieve uniqueness of equilibrium using a refinement, which enables us to identify the
parameters of interest. In this game, the type distribution (i.e., the conditional probability of
being a particular type) is nonparametrically specified, and we propose to estimate the model
using a sieve conditional MLE. We establish the consistency and the asymptotic normality of the
structural parameter estimates.
Keywords: Semiparametric Estimation, Signaling Game, Infinite Dimensional Parameters,
Sieve Simultaneous Conditional MLE
JEL Classification: C13, C14, C35, C62, C73
1 Introduction
The econometric modeling of strategic interactions has been of significant interest over the last
decade. It has been one of the most active research topics in the field of empirical industrial organization, including studies on industry entry decisions (Bresnahan and Reiss (1990, 1991), Berry (1992),
Toivanen and Waterson (2000), Seim (2006), Ciliberto and Tamer (2009)) and others (Mazzeo
(2002), Sweeting (2009), Davis (2006)). Most econometric analyses of game theoretic models
have focused on simultaneous games with complete information (Bjorn and Vuong (1984, 1985),
Bresnahan and Reiss (1990, 1991), Tamer (2003), Bajari, Hong and Ryan (2010)) or with incomplete information (Brock and Durlauf (2001, 2003), Seim (2006), Sweeting (2009), Aradillas-Lopez
This paper is based on my doctoral dissertation. I am truly grateful to my advisor, Jinyong Hahn, for his advice
and encouragement throughout my doctoral study at UCLA. I would like to thank Joe Hotz, Moshe Buchinsky, Dan
Ackerberg, John Riley, Kathleen McGarry, Raffaella Giacomini, Patrik Guggenberger, and William R. Zame for their
useful comments and suggestions. I am also indebted to seminar participants at Brown, Rochester, SMU, UCL,
University of Washington, WU in St. Louis, University of Wisconsin-Madison, and University of Chicago. I would
also like to thank a co-editor, associate editor, and referees for correcting my errors and for invaluable suggestions,
which much improved this paper. All remaining errors are mine. Correspondence: Kyoo il Kim, Department of
Economics, University of Minnesota, 4-129 Hanson Hall, Minneapolis, MN 55455, Tel: +1 612 625 6793, E-mail:
[email protected].
(2010), Bajari, Hong, Krainer, and Nekipelov (2010)). More recently, there has also been significant development in dynamic games (Aguirregabiria and Mira (2007), Bajari, Benkard, and Levin (2007),
Berry, Ostrovsky, and Pakes (2007), Pesendorfer and Schmidt-Dengler (2008)).
In these games, we typically assume that types of players are singletons or that types are in all
players' information sets. In other words, players are symmetric in terms of the amount of information
they have on other players, whether the game is with or without complete information. However,
in many cases, one player may have more information than others and take advantage of this
information asymmetry. A used car market is one natural example of this sort, where sellers have
more information on the true qualities (types) of cars, and a labor market is another good example of
information asymmetry, where a job candidate knows her true productivity (type) while employers
do not.¹
Although the information asymmetry problem has been noted widely, both theoretically
and empirically, for more than thirty years, a formal econometric analysis of this asymmetry problem
appears in only a few studies.² For example, Ackerberg (2003) provides a dynamic learning model
of consumer behavior allowing for the signaling effect of advertising on experience goods. Brown
(2002) provides a structural estimation of prestige effects in IPO underwriting within a signaling
game context where a firm's owner has an incentive to signal the firm's type by choosing prestigious
underwriters. However, their approaches do not model signaling games per se.
Here we take a step toward structural econometric modeling for a class of signaling games where
an informed player moves first by taking a costly action (signaling), in the sense that she cannot be
sure about the outcome of the game. The informed player can get the outcome she desires when
the second player (the uninformed player) makes a correct assessment of the informed player's type.
Therefore, modeling the beliefs of players (i.e., the distribution of types) plays a key role in this game,
which makes it distinct from other econometric models of games.
One of the difficulties that have hindered econometric implementation of signaling games is the
issue of multiple equilibria, or even the absence of any equilibrium if we restrict the strategies of players to
a limited class (e.g., pure strategies), which results in the nonexistence of a well-defined likelihood
function over the entire set of observable outcomes. This paper shows that the nonexistence
problem can be resolved by allowing for a sort of mixed strategy (semi-separating in a Bayesian
extensive form game), and that the multiplicity problem can be resolved by adopting an equilibrium
refinement such as the intuitive criterion of Cho and Kreps (1987), which enables us to achieve
uniqueness of equilibrium.
Even when these problems are overcome, it is generally hard to set up a flexible econometric
model that is capable of dealing with a generic family of signaling games with a flexible information
structure (type distribution) and is also estimable. This paper takes one step toward econometric
modeling of a signaling game with two players where one player, the informed player, has one of two
types. In particular, we develop an estimation strategy that identifies the payoff structure and the
distribution of types from data on the actions observed by players.
To provide more empirical relevance to the basic model, we introduce non-strategic public
signals about the types, which the informed player cannot manipulate, or at least has no incentive
to manipulate. These signals are observed by all players and by econometricians.
1. A vast number of economic models have been developed to study these problems (e.g., Spence (1974)); Riley (2001) provides an extensive survey of these topics.
2. There are still quite a few studies that empirically test implications of particular signaling games or models of information asymmetry in general. However, there are very few studies that actually estimate the model primitives.
Under a separating equilibrium, the action of the informed player reveals the true type, and thus the uninformed party
has no incentive to use this additional information. However, under a pooling equilibrium, the
uninformed party will update her belief about types using these public signals following Bayes'
rule. Put differently, the probability of being a particular type depends on the player's observable
characteristics.
In our approach we specify the distribution of the non-strategic signals given types nonparametrically,
and thus estimate the game model (payoff structure and distribution of types) using a sieve conditional MLE (a conditional MLE version of Wong and Severini (1991) or Shen (1997)) where the
infinite dimensional parameters are approximated by sieves.
The organization of this paper is as follows. Section 2 describes the game model. In Section
3, we characterize the equilibria of the game and show how we can achieve uniqueness of equilibrium
using a refinement. In Section 4, we consider public information about the distribution of types.
In Section 5, we discuss identification and estimate the model using a sieve conditional MLE. The
consistency and the asymptotic normality of the structural parameter estimates are presented. We
then run a small-scale Monte Carlo experiment in Section 6. We conclude in Section 7. Technical
details and mathematical proofs are presented in the appendix.
2 Description of the Game
The signaling game in this section models a situation where each player observes the actions of both
players (Player 1 and Player 2), including her own, but one of the players faces uncertainty about payoffs
that are affected by the type of the other player. Without loss of generality, we will assume Player
1 is informed of her type but Player 2 does not know Player 1's type. What makes a signaling
game interesting is that Player 2 (the uninformed party or receiver) controls the action and Player 1
(the informed party or sender) controls the information. The receiver has an incentive to try to deduce
the sender's type from the sender's message (signal), and the sender may have an incentive to
mislead the receiver.
The solution concept we use here is a perfect Bayesian equilibrium (PBE) instead of a
sequential equilibrium (SE). The former is simpler than the latter. We do so without loss of
generality since the game of interest here is a finite Bayesian extensive game with observable actions.
Every sequential equilibrium of the extensive game associated with a finite Bayesian extensive game
with observable actions is equivalent to a perfect Bayesian equilibrium of the Bayesian extensive
game, in the sense that they induce the same behavior and beliefs.³ The formal definition of a PBE
can be found in Osborne and Rubinstein (1994, p. 233).
Figure 1 illustrates the structure of the game we study, named Game G. We design this game
so that it is simple but still contains all the essence of a signaling game. It is an econometric
model of the beer-quiche game (or bar fighting game) in Cho and Kreps (1987). In this game, we
have two players. Player 1 has one of two types, {strong, weak}, with the probability of being the
strong type equal to $p$, and knows her type. After observing her type, Player 1 moves first, sending
one of two messages {B, Q} to Player 2, where B stands for beer and Q stands for quiche. Here it
is publicly known that beer is associated with the strong type and quiche is associated with the weak type.
3. Note that the reverse statement is not true in general (see Osborne and Rubinstein (1994, pp. 234-235)). However, Fudenberg and Tirole (1991) note that for a finite Bayesian extensive game with two types or with two periods, every PBE is also equivalent to a sequential equilibrium. Thus, for the game of Figures 1 and 2, these two equilibrium concepts are equivalent.
Then, Player 2 chooses one action out of {F, NF} after observing the signal sent by
Player 1, where F stands for "fight" and NF for "not fight". Here Player 2 is more likely to choose
to fight (not to fight) if he/she thinks Player 1 is a weak type (strong type). After the play, payoffs
are realized according to the actions chosen by the two players. The payoffs of this game have the following
properties, which are generally shared in signaling games, as summarized in Figure 1.
- The payoffs of Player 2 (the uninformed party) given her action are determined by the type of Player 1 (the informed party), not by Player 1's action (signal).
- Given Player 2's action, each type of Player 1 obtains a bigger payoff by choosing the signal corresponding to Player 1's true type: "B" is the signal for strong and "Q" is the signal for weak by construction (i.e., the mimicking costs are positive: $\phi_{1s} > 0$, $\phi_{1w} > 0$).
- The strong type of Player 1 may have an incentive to signal its true type.
- The weak type of Player 1 may have an incentive to mislead Player 2.
Many interesting signaling games fit into this framework. Think of job-market signaling
in a labor market (e.g., Spence (1974)) and strategic advertising spending in a differentiated
products market as examples. In the job-market signaling case, Player 1 is a job applicant of either
high (strong type) or low (weak type) productivity. Player 2 is a potential employer who wants to
hire an applicant with high productivity. Note that high productivity types may have an incentive
to undertake costly education as signaling, e.g., obtaining an advanced degree. Here high types
have this incentive because it is more costly for low types to obtain an advanced degree (and this
fact is also known to Player 2, the employer). In the beer-quiche game framework, Player 1's undertaking
of higher education corresponds to choosing "Beer" and Player 2's hiring decision corresponds to
the action "NF".
In the strategic advertising case, Player 1 is a firm whose product quality is either high (strong
type) or low (weak type). Engaging in costly advertising corresponds to signaling the strong type,
that is, to choosing "Beer" in the beer-quiche game framework. Player 2 represents consumers who want to single
out the high quality product. Buying a product of the firm corresponds to the action "NF" in
the beer-quiche game. Here firms with high quality goods have this incentive for costly advertising
because it is more costly for firms with low quality goods to engage in excessive advertising (and
this fact is also known to Player 2, the consumer).
[FIGURE 1. Structure of the Game G. FIGURE 2. Parameterization of Payoffs. The game tree shows Player 1's types (strong with probability $p$, weak with probability $1 - p$), the signals B and Q, Player 2's responses NF and F, and payoff pairs including $\{u_{1s}, u_{2s}\}$, $\{u_{1s} - \phi_{1s}, u_{2s}\}$, $\{-\phi_{1s}, 0\}$, $\{u_{1w}, u_{2w}\}$, $\{u_{1w} - \phi_{1w}, u_{2w}\}$, $\{-\phi_{1w}, 0\}$, and $\{0, 0\}$.]
As summarized in Figure 1, the beer-quiche game is fully characterized by seven structural
objects:
- $u_{1s}$: difference of payoffs between the "NF" and "F" outcomes for the strong type of Player 1
- $u_{1w}$: difference of payoffs between the "NF" and "F" outcomes for the weak type of Player 1
- $\phi_{1s}$: mimicking cost of the strong type of Player 1 (cost of signaling falsely)
- $\phi_{1w}$: mimicking cost of the weak type of Player 1 (cost of signaling falsely)
- $u_{2s}$: difference of payoffs between choosing "NF" and "F" for Player 2 when Player 1 is strong
- $u_{2w}$: difference of payoffs between choosing "NF" and "F" for Player 2 when Player 1 is weak
- $p$: distribution of types
We further elaborate on the payoff structure. Because an equilibrium of a game is
characterized by payoff differences across players' actions, not by levels, we take payoff differences as
the structural objects above. In other words, we normalize the payoffs of "F" to zero (before
we account for the mimicking costs of Player 1). These normalizations and incentive parameters
have consequences for the payoffs displayed in Figure 1. Because of these normalizations, and because
in this signaling game the payoffs of each player are determined by Player 2's action and Player 1's
type, we only need to specify four payoff differences (2 players $\times$ 2 types) between "NF" and "F":
$u_{1s}$, $u_{1w}$, $u_{2s}$, and $u_{2w}$ in Figure 1 (this means 8 payoffs out of 16 are set to zero when there
is no mimicking cost for either type of Player 1). In the game we also introduce a mimicking cost
of Player 1 for each type, denoted by two parameters: $\phi_{1s}$ and $\phi_{1w}$. Then, because the mimicking
costs appear in each type of Player 1's payoffs (when one type mimics the other type) regardless
of Player 2's action, these costs appear in four places (two for the strong type and the other two
for the weak type) of Player 1's payoffs, including two places where Player 2
plays "F" (otherwise zero when there is no mimicking cost). Therefore in Figures 1 (and 2 below)
we have 6 zero payoffs out of 16 possible payoffs.
It is then of interest to identify these seven structural objects that characterize the game
from observed outcomes of actions. It is obvious that if they are fully nonparametrically specified,
we cannot identify all of them unless there are exclusion or parametric restrictions, since we have
only three independent conditional probabilities out of the four possible outcomes {B&NF, B&F,
Q&NF, Q&F} that can be used for identification.
To ensure identification, we adopt the following parametric specification of the
payoffs, as in Figure 2. Our detailed identification strategy is discussed in Section 5.2.
$$u_{1s} = X_1'\beta_1 - \varepsilon_1, \qquad u_{1w} = X_1'\beta_1 - \varepsilon_1$$
$$u_{2s} = X_2'\beta_2 - \varepsilon_2 + \eta_{2s}, \qquad u_{2w} = X_2'\beta_2 - \varepsilon_2 - \eta_{2w}$$
Here $X_1$ and $X_2$ are observable factors that affect Player 1's and Player 2's payoffs, and $\varepsilon_1$ and $\varepsilon_2$ are
unobservable to econometricians while Players 1 and 2 observe all of them. The constants $\eta_{2s}$ and $\eta_{2w}$
in Player 2's payoffs measure the degree of Player 2's incentive to single out a particular type of
Player 1 (i.e., the payoffs from judging Player 1's type correctly). For example, $\eta_{2s}$ is the extra
payoff to Player 2 from taking the "NF" action (believing Player 1 is the strong type) when indeed
Player 1 is the strong type.
In the game, the parameter $\phi_{1s} > 0$ measures the potential cost to the strong type of mimicking
the weak type given a fixed response of Player 2. Similarly, $\phi_{1w} > 0$ measures the potential cost to
the weak type of mimicking the strong type given a fixed response of Player 2.
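To fix ideas, here is a small illustration (ours, in Python; not part of the paper) that enumerates the payoff pairs of Figures 1 and 2 under the parameterization above; all names are illustrative.

```python
def payoffs(t, a1, a2, x1b1, x2b2, e1, e2, phi_s, phi_w, eta_s, eta_w):
    """Payoff pair (Player 1, Player 2) of Game G under Figure 2:
    u1 = X1'b1 - e1 for both types, u2s = X2'b2 - e2 + eta_s,
    u2w = X2'b2 - e2 - eta_w, with the "F" payoffs normalized to zero."""
    u1 = x1b1 - e1
    u2 = x2b2 - e2 + (eta_s if t == "strong" else -eta_w)
    # mimicking cost is paid whenever the signal does not match the type
    own_signal = "B" if t == "strong" else "Q"
    mimic = (phi_s if t == "strong" else phi_w) if a1 != own_signal else 0.0
    p1 = (u1 if a2 == "NF" else 0.0) - mimic
    p2 = u2 if a2 == "NF" else 0.0
    return p1, p2

# strong type mimics the weak type (sends Q) and Player 2 plays NF:
# returns (u1s - phi_1s, u2s), matching the corresponding node of Figure 1
print(payoffs("strong", "Q", "NF", 1.0, 0.5, 0.2, 0.1, 0.3, 0.4, 0.6, 0.5))
```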
Remark 1 Here we assume that $\phi_{1s}$, $\phi_{1w}$, $\eta_{2s}$, and $\eta_{2w}$ are constant, but we may extend the model
so that these parameters also depend on the characteristics of Player 1 and Player 2, respectively.
This, however, makes the identification of the model rely on more restrictive conditions such as
functional form or extensive exclusion restrictions.
Throughout the paper, we assume that $\eta_{2s}$ and $\eta_{2w}$ are nonnegative. Note that this assumption
is innocuous in the sense that "strong" and "weak" types are just labels unless we give some
structure to them. By imposing $\eta_{2s} \geq 0$, we mean that Player 2 is more likely to be better off by
singling out the strong type for the action "NF" and better off by singling out the weak type
for the action "F".
The game tree shows that if Player 1 is the strong type, and if Player 1 chooses "Q" and Player 2
chooses "NF", then Player 1 obtains $X_1'\beta_1 - \varepsilon_1 - \phi_{1s}$ and Player 2 obtains $X_2'\beta_2 - \varepsilon_2 + \eta_{2s}$, respectively.
If Player 1 is the strong type and if Player 1 chooses "Q" and Player 2 chooses "F", they earn
$-\phi_{1s}$ and $0$, respectively. Other payoffs can be read in a similar way. Note that each player's
payoffs are determined not only by her own action but also by the other player's action or type,
which captures the interaction between the two players.
We let $X_1 = (1, \widetilde{X}_1)$, $X_2 = (1, \widetilde{X}_2)$, $\widetilde{X} = (\widetilde{X}_1, \widetilde{X}_2) \in \mathbb{R}^{k_1} \times \mathbb{R}^{k_2}$, $X = (X_1, X_2)$, and let $\varepsilon = (\varepsilon_1, \varepsilon_2) \in \mathbb{R}^2$. We denote the vectors of parameters as $\beta_1 \in \mathbb{R}^{k_1+1}$, $\beta_2 \in \mathbb{R}^{k_2+1}$, $(\phi_{1s}, \phi_{1w}, \eta_{2s}, \eta_{2w}) \in (0, \infty)^2 \times [0, \infty)^2$, and $p \in (0, 1)$, and collect the parameters as $(\theta, p) = ((\beta_1, \beta_2, \phi_{1s}, \phi_{1w}, \eta_{2s}, \eta_{2w}), p)$.
Note that this stochastic payoff game differs from a deterministic payoff game in that, by
adding the unobserved heterogeneity $(\varepsilon_1, \varepsilon_2)$, we allow players with the same observed
characteristics $(X_1, X_2)$ to produce different outcomes of the game even when they adopt the same
equilibrium strategy. We also note that the symmetric payoff structure of the game G is somewhat
restrictive. For example, under this structure, the following two payoffs (to be precise, the payoff
differences between the outcomes of "NF" and "F" due to the normalization) are the same. One
is the payoff of the strong type of Player 1 when she chooses "B" and Player 2 plays "NF". The
other is the payoff of the weak type of Player 1 when she chooses "Q" and Player 2 chooses "NF".
This structure may be justified in some cases but can be restrictive in general. In a supplementary
appendix (also Kim (2006)),⁴ we relax this restriction and show that games with asymmetric
payoffs are very similar to symmetric ones in terms of equilibrium characterization.
4. The supplementary appendix can be found at www.econ.umn.edu/~kyookim.
Before characterizing the equilibria of the game, we first formalize the information structure of
the players and the distributional assumptions.
2.1 Information Structure
Note that in the signaling game we study, the true type of Player 1 is known only to Player 1 herself,
and Player 1 signals her type to Player 2, so Player 2 faces uncertainty about her payoffs, which depend
on Player 1's type. The information structure (IS) is as follows:
1. Player 1 knows her true type but Player 2 knows only the distribution of Player 1's types (i.e., $p$ is known to Player 2).
2. Realizations of $(X_1, \varepsilon_1)$ and $(X_2, \varepsilon_2)$ are perfectly observed by both Player 1 and Player 2.
3. $\varepsilon_1$ and $\varepsilon_2$ are pure shocks commonly observed by Player 1 and Player 2. They are jointly independent of any other variables in the game. They are also independent of the type of Player 1.
4. Players' actions and beliefs constitute a perfect Bayesian equilibrium (sequential equilibrium). Whenever there exist multiple equilibria, only one equilibrium is chosen out of these according to some equilibrium refinement.⁵ Players are assumed to play actions and hold beliefs according to this unique equilibrium.
5. Note that the generic uniqueness of the PBE is ensured by some refinement of the equilibrium concept.
2.2 Stochastic Assumptions
We impose the following distributional assumptions on the random variables of the game. We first
consider the simplest structure and generalize it later.

Assumption 2.1 (SA)
1. $\varepsilon_1$ and $\varepsilon_2$ are continuously distributed and statistically independent of $X$.
2. The cdf's of $\varepsilon_1$ and $\varepsilon_2$ are continuous and denoted by $G_1(\varepsilon_1)$ and $G_2(\varepsilon_2)$ with corresponding density functions $g_1(\varepsilon_1)$ and $g_2(\varepsilon_2)$, respectively. The density functions do not depend on the structural parameters $(\theta, p)$ nor on the type of Player 1.
3. Both $\widetilde{X}_1$ and $\widetilde{X}_2$ can be continuous or discrete random variables. Both $\widetilde{X}_1$ and $\widetilde{X}_2$ are independent of the type of Player 1. We denote the densities of $\widetilde{X}_1$ and $\widetilde{X}_2$ by $f_{\widetilde{X}_1}(\cdot)$ and $f_{\widetilde{X}_2}(\cdot)$, respectively. Neither $f_{\widetilde{X}_1}(\cdot)$ nor $f_{\widetilde{X}_2}(\cdot)$ depends on the structural parameters $(\theta, p)$.
By imposing Assumption SA.1, we ensure that Player 2's equilibrium beliefs are conditioned on variables observed by the econometrician, although this can be relaxed further.
We will strengthen these stochastic assumptions to ensure the validity of the econometric model
in later sections.
3 Refinement and Uniqueness of Equilibrium
In this section we characterize the equilibria of the game under PBE and then show that we can achieve
uniqueness of equilibrium using the refinement of Cho and Kreps (1987). For the game we study, it
turns out that we have multiple equilibria for some realizations of payoffs, which can be removed by
adopting a stronger equilibrium concept as an equilibrium selection. A possible disadvantage of this
approach noted in the literature is that there is no generally accepted or empirically testable
procedure to determine which equilibrium will be played among multiple equilibria, especially for
simultaneous move games. However, we also note that an equilibrium selection for a signaling game
is more acceptable than one for a simultaneous move game, at least theoretically, which may justify
our approach.
We introduce additional notation here. Let $u_i(t_1, A_1, A_2)$ denote the payoff of player $i \in \{1, 2\}$
when the true type of Player 1 is $t_1 \in \{t_s \equiv \text{strong}, t_w \equiv \text{weak}\}$, Player 1 chooses action (signal)
$A_1 \in \{B, Q\}$, and Player 2 chooses action $A_2 \in \{NF, F\}$. We also let $A_{1t_1}$ denote the action
taken by a particular type $t_1$ of Player 1. By $(A_1, A_1')$ we mean $(A_{1t_s}, A_{1t_w}) = (A_1, A_1')$ where
$A_1, A_1' \in \{B, Q\}$. Note $p = \Pr(t_1 = t_s)$ is the prior belief of Player 2 on the type of Player 1. This is
also the population distribution of types. We let $E_1[u_1(t_1, A_1, A_2)]$ be the expected payoff of Player
1 with type $t_1$ based on Player 1's information. $\mu_2(t_1 \mid A_1)$ denotes the posterior belief of Player
2 on the type of Player 1 after Player 2 observes the action (signal) $A_1$. We also let $Y_{2|A_1}$ be
an indicator function that takes the value 1 when Player 2 chooses the action "NF" after observing
the signal $A_1$. Similarly, $A_{2|A_1}$ denotes the action of Player 2 after observing $A_1$. Finally, we let
$(A_1, A_2 : q)$ denote the observed actions of Players 1 and 2 with probability $q$ in a separating equilibrium, and
we let $(A_1, A_2 : 1) = (A_1, A_2)$ in the pooling equilibrium case. Throughout this paper, we will use
this notation. In the supplementary appendix, we determine the regions of $(\varepsilon_1, \varepsilon_2) \in \mathbb{R}^2$ where each
PBE (pooling with Beer or Quiche, separating, semi-separating where only one of the two types plays a
mixed strategy) exists given a realization of $X$. Figure 3 summarizes the results.
FIGURE 3. Equilibria of the Game
Figure 3 shows the distinct regions of $(\varepsilon_1, \varepsilon_2)$ where each particular equilibrium is supported⁶ in
terms of observed actions, assuming $\phi_{1s} > 0$, $\phi_{1w} > 0$, $\eta_{2s} \geq 0$, and $\eta_{2w} \geq 0$ (but not $\eta_{2s} = \eta_{2w} = 0$). Note that Figure 3 is drawn for the case $\phi_{1w} > \phi_{1s}$ as an illustration, but this is not necessary
at all. As we can expect from the structure of the game, when $\varepsilon_2 > X_2'\beta_2 + \eta_{2s}$ (when
a relatively large value of $\varepsilon_2$ is realized), Player 2 is better off choosing F regardless of Player 1's
type or action, and will choose F. Thus, Player 1 will choose the signal corresponding to her type,
since given Player 2's action, Player 1 is better off choosing the signal that corresponds to her
type. Similarly, when $\varepsilon_2 < X_2'\beta_2 - \eta_{2w}$ (when a relatively small value of $\varepsilon_2$ is realized), Player 2 is
better off choosing NF no matter what, and thus in this region Player 1 is willing to reveal
her true type.
We can find similar patterns for Player 1 when the realized values of $\varepsilon_2$ are moderate. Due to
the game structure, when $\varepsilon_1$ is relatively large, Player 1 tends to prefer F while she prefers NF when $\varepsilon_1$ is small.
6. To define a PBE, we also need to specify the out-of-equilibrium beliefs that support a particular equilibrium. A specific out-of-equilibrium belief for each PBE of the game can be found in the supplementary appendix. Here we do not present such out-of-equilibrium beliefs to keep the discussion simple.
when "1 is small. However, Player 1 cannot choose the action of F or N F but she can only induce
Player 2 to choose F or N F by sending proper signals. As a consequence, in Figure 3 and 4 below
(see S 4 and S 5 ), we observe that when "1 is relatively large, Player 1 tends to choose Quiche to
induce Fighting while she tends to choose Beer when "1 is relatively small, to induce non-Fighting.
This well demonstrates the key feature of a signaling game that Player 2 (uninformed party or
receiver ) controls the action and Player 1 (informed party or sender ) controls the information.
In Figure 3, in the region $S^3 \equiv \{(\varepsilon_1, \varepsilon_2) \mid X_1'\beta_1 - \phi_{1w} < \varepsilon_1 < X_1'\beta_1 + \phi_{1s} \text{ and } X_2'\beta_2 - \eta_{2w} < \varepsilon_2 < X_2'\beta_2 + \eta_{2s}\}$, we have a separating equilibrium (i.e., the strong type chooses Beer
and the weak type chooses Quiche). This region gets larger as $\phi_{1s}$, $\phi_{1w}$, $\eta_{2s}$, and $\eta_{2w}$ become
larger. This implies that when the cost of mimicking and Player 2's incentive to single out
a particular type of Player 1 are higher, the region that supports this separating equilibrium
becomes larger. We note that in many empirical studies, researchers tend to focus on the separating
equilibrium, where Player 1 reveals her type and Player 2 chooses different actions according to
the different types of Player 1, by imposing relevant conditions or by simply asserting that a separating
equilibrium is more reasonable. However, Figure 3 illustrates that other kinds of equilibria can
arise depending on the realizations of $(\varepsilon_1, \varepsilon_2)$. For example, in the region $S^2 \equiv \{\varepsilon_1 < X_1'\beta_1 - \phi_{1w} \text{ and } X_2'\beta_2 + p(\eta_{2s} + \eta_{2w}) - \eta_{2w} < \varepsilon_2 < X_2'\beta_2 + \eta_{2s}\}$, we have a semi-separating equilibrium
where the strong type plays B and the weak type mixes between B and Q. Similarly, in the region
$S^6 \equiv \{\varepsilon_1 > X_1'\beta_1 + \phi_{1s} \text{ and } X_2'\beta_2 - \eta_{2w} < \varepsilon_2 < X_2'\beta_2 + p(\eta_{2s} + \eta_{2w}) - \eta_{2w}\}$, we have another
semi-separating equilibrium where the weak type plays Q and the strong type mixes between B
and Q.
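For illustration only (ours, not the paper's), the following sketch encodes the refined equilibrium regions of Figures 3 and 4 as reconstructed above; the boundaries and labels are those discussed in this section and below, and the function assumes the refinement has already selected the surviving equilibrium in the multiplicity regions.

```python
def refined_equilibrium(e1, e2, x1b1, x2b2, phi_s, phi_w, eta_s, eta_w, p):
    """Map a realization (e1, e2), given X, to the unique refined equilibrium.
    pooling_cut is where Player 2, holding the prior p, is indifferent
    between NF and F on the pooled path."""
    pooling_cut = x2b2 + p * (eta_s + eta_w) - eta_w
    if e2 > x2b2 + eta_s:                       # S1: Player 2 fights regardless
        return "separating (B,Q), Player 2 plays F"
    if e2 < x2b2 - eta_w:                       # Player 2 never fights
        return "separating (B,Q), Player 2 plays NF"
    if x1b1 - phi_w < e1 < x1b1 + phi_s:        # S3
        return "separating (B,Q): NF after B, F after Q"
    if e1 <= x1b1 - phi_w:
        if e2 > pooling_cut:                    # S2
            return "semi-separating: strong plays B, weak mixes"
        return "pooling (B,B), Player 2 plays NF"   # S5/A^b after refinement
    if e2 > pooling_cut:                        # S4/A^a after refinement
        return "pooling (Q,Q), Player 2 plays F"
    return "semi-separating: weak plays Q, strong mixes"  # S6
```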
The following two theorems state the existence of PBE for all realizations of $(\varepsilon_1, \varepsilon_2)$ given $X$
and provide conditions under which uniqueness of equilibrium is achieved (Theorem 3.2).

Theorem 3.1 (Existence of Equilibrium) Suppose Assumptions IS and SA hold. Suppose also that $\phi_{1s} > 0$, $\phi_{1w} > 0$, $\eta_{2s} \geq 0$, and $\eta_{2w} \geq 0$ (but not $\eta_{2s} = \eta_{2w} = 0$). Then, there exists a PBE for every realization of $(\varepsilon_1, \varepsilon_2)$ given $X$.

(Proof: See the supplementary appendix, in which we determine the regions of $(\varepsilon_1, \varepsilon_2) \in \mathbb{R}^2$ where
each possible PBE exists given $X$. These regions cover the whole of $\mathbb{R}^2$, so a PBE exists for all possible
realizations of $(\varepsilon_1, \varepsilon_2)$.)
We note that there are regions where multiple equilibria arise. However, using the refinement
of Cho and Kreps (1987), we can achieve uniqueness of equilibrium for each region of $(\varepsilon_1, \varepsilon_2)$ given
$X$. Figure 3 shows that there exist three regions where we have multiple equilibria. In the region
$A^a \equiv \{(\varepsilon_1, \varepsilon_2) \mid \varepsilon_1 > X_1'\beta_1 + \phi_{1w} \text{ and } X_2'\beta_2 + p(\eta_{2s} + \eta_{2w}) - \eta_{2w} < \varepsilon_2 < X_2'\beta_2 + \eta_{2s}\}$, we have
two possible equilibria. One is pooling $(B, B)$ with $A_{2|B} = F$ and the other is pooling $(Q, Q)$ with
$A_{2|Q} = F$.⁷ In the region $A^b \equiv \{\varepsilon_1 < X_1'\beta_1 - \phi_{1s} \text{ and } X_2'\beta_2 - \eta_{2w} < \varepsilon_2 < X_2'\beta_2 + p(\eta_{2s} + \eta_{2w}) - \eta_{2w}\}$,
we also have two possible equilibria. One is pooling $(Q, Q)$ with $A_{2|Q} = NF$ and the other is
pooling $(B, B)$ with $A_{2|B} = NF$.⁸
In Appendix A, using the intuitive criterion of Cho and Kreps (1987), we show that the pooling
$(B, B)$ does not survive the refinement in region $A^a$, and the pooling $(Q, Q)$ does not survive the
refinement in $A^b$. Figure 4 illustrates the uniqueness of equilibrium based on this refinement
result.
7. In this region we also have a separating equilibrium when $\phi_{1w} < \phi_{1s}$.
8. Similarly, in this region we also have a separating equilibrium when $\phi_{1w} < \phi_{1s}$.
The intuitive criterion is based on the concept of "equilibrium dominance," which tells us
that a certain type should not be expected to play a certain strategy. For example, in the region
$S^4 \equiv \{(\varepsilon_1, \varepsilon_2) \mid \varepsilon_1 > X_1'\beta_1 + \phi_{1s} \text{ and } X_2'\beta_2 + p(\eta_{2s} + \eta_{2w}) - \eta_{2w} < \varepsilon_2 < X_2'\beta_2 + \eta_{2s}\}$, Player 1
plays a pooling equilibrium strategy with $(B, B)$ or $(Q, Q)$. However, for the pooling equilibrium
with $(B, B)$ under $S^4$, it is not reasonable to believe that a deviation is played by the strong type,
because she is better off choosing B no matter what Player 2 chooses (see Appendix A). Thus,
Player 2 will think a deviation is played by the weak type for sure when she observes Q. Under
this refined belief, we can eliminate the pooling with $(B, B)$ in region $S^4$.
Remark 2 For the game of Figure I in Cho and Kreps (1987), we have two pooling PBE. In one
equilibrium, both types of Player 1 choose B, and Player 2 does not fight if she observes B while
she fights if she observes Q, with out-of-equilibrium belief $\mu_2(t_s \mid Q) \leq 0.5$. In the other equilibrium,
both types of Player 1 choose Q, and Player 2 chooses not to fight if she observes Q while she fights
if she observes B, with out-of-equilibrium belief $\mu_2(t_s \mid B) \leq 0.5$. We note that the example of Cho
and Kreps (1987) corresponds to the region $S^5 \equiv \{(\varepsilon_1, \varepsilon_2) \mid \varepsilon_1 < X_1'\beta_1 - \phi_{1w} \text{ and } X_2'\beta_2 - \eta_{2w} < \varepsilon_2 < X_2'\beta_2 + p(\eta_{2s} + \eta_{2w}) - \eta_{2w}\}$ in Figures 3 and 4.
FIGURE 4. Uniqueness of Equilibrium
We need to distinguish an equilibrium from its realized outcomes. For example, in the region
$S^1 \equiv \{(\varepsilon_1, \varepsilon_2) \mid \varepsilon_1 \in \mathbb{R} \text{ and } \varepsilon_2 > X_2'\beta_2 + \eta_{2s}\}$, we have one equilibrium where Player 1 plays the
separating strategy $(B, Q)$ and Player 2 chooses F. However, in terms of realized outcomes,
we have two possible outcomes in this region. With probability $p$ (when Player 1 is the strong
type), we will observe the B&F combination, but we can also observe the Q&F combination
with probability $1 - p$ (when Player 1 is the weak type). Likewise, even though we have only one
equilibrium (semi-separating, where the strong type plays B and the weak type mixes between B
and Q) in the region $S^2 \equiv \{(\varepsilon_1, \varepsilon_2) \mid \varepsilon_1 < X_1'\beta_1 - \phi_{1w} \text{ and } X_2'\beta_2 + p(\eta_{2s} + \eta_{2w}) - \eta_{2w} < \varepsilon_2 < X_2'\beta_2 + \eta_{2s}\}$,
we can observe all four possible outcomes, each with some probability.
Theorem 3.2 (Uniqueness of Equilibrium) Suppose Assumptions IS and SA hold and that
$\phi_{1s} > 0$, $\phi_{1w} > 0$, $\eta_{2s} \geq 0$, and $\eta_{2w} \geq 0$ (but not $\eta_{2s} = \eta_{2w} = 0$). Suppose each player plays the equilibrium that survives the refinement of Cho and
Kreps (1987) when there exist multiple equilibria. Then, there exists a unique equilibrium for each
realization of $(\varepsilon_1, \varepsilon_2)$ given $X$.
This uniqueness result helps us recover the game parameters from the observed actions of players,
because in this case we can generate model choice probabilities of players' actions given a distributional assumption on the unobservables $(\varepsilon_1, \varepsilon_2)$, and these should be one-to-one with the observed choice
probabilities. Note that one of the key difficulties in the econometric implementation of empirical games
has been the presence of multiple equilibria: with multiplicity there is a set of model choice
probabilities that can all be consistent with the observed ones, so the data do not tell which ones are
true.
3.1 Estimation: an Overview
Here we briefly discuss our estimation strategy based on this uniqueness result; we fully develop
it in Section 5. The uniqueness result enables us to obtain a well-defined likelihood function for
estimation after some necessary normalizations. Note that in the payoffs of Player 1 the constant
term $\beta_{1,0}$ in $\beta_1$ is not separately identified from $\phi_{1s}$ and $\phi_{1w}$, because in the payoffs of Player 1 we
can only identify $\beta_{1,0} + \phi_{1s}$ and $\beta_{1,0} - \phi_{1w}$ as constant terms, which is obvious from (e.g.) Figure 4.
We need to either normalize $\beta_{1,0} = 0$ or consider the symmetric case $\phi_1 \equiv \phi_{1s} = \phi_{1w}$. We take
the latter approach. For the same reason, the constant term $\beta_{2,0}$ in $\beta_2$ is not separately identified
from $\eta_{2s}$ and $\eta_{2w}$, so we let $\eta_2 \equiv \eta_{2s} = \eta_{2w}$ for Player 2's payoffs. Note, however, that these
normalizations are not required to obtain the uniqueness of equilibrium result in Theorem 3.2.
From the result of Theorem 3.2 and the necessary normalizations, we can define the conditional
probabilities for the four possible observed outcomes
$$P_{jl}(X; \theta, p) = \Pr(Y_1 = j, Y_2 = l \mid X) \quad \text{for } j, l \in \{0, 1\}$$
where we let $\theta = (\beta_1, \beta_2, \phi_1, \eta_2)$. The specific forms of these conditional probabilities are provided
in Appendix B.
Now, using these conditional probabilities, one can estimate the game parameters $\theta$ and $p$ using
the conditional ML method and $n$ observations of game outcomes:
$$(\hat{\theta}_{CML}, \hat{p}_{CML}) = \underset{\theta \in \Theta,\ p \in (0,1)}{\operatorname{argmax}} \ \frac{1}{n} \sum_{i=1}^n \log L(y_i \mid x_i; \theta, p) \qquad (1)$$
where
$$\log L(y_i \mid x_i; \theta, p) = y_{1i} y_{2i} \log P_{11}(x_i; \theta, p) + y_{1i}(1 - y_{2i}) \log P_{10}(x_i; \theta, p) + (1 - y_{1i}) y_{2i} \log P_{01}(x_i; \theta, p) + (1 - y_{1i})(1 - y_{2i}) \log P_{00}(x_i; \theta, p).$$
The estimator $(\hat{\theta}_{CML}, \hat{p}_{CML})$ will be consistent and asymptotically normal under standard regularity conditions (e.g., Newey and McFadden (1994)). Note that this conditional ML estimator is
identical to the ML estimator since the density function of $X$ does not depend on the parameters
$(\theta, p)$ by Assumption SA.3.
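For concreteness, a minimal sketch of this conditional MLE follows (Python with SciPy; the probability function P is a placeholder standing in for the Appendix B formulas, and all names here are illustrative rather than from the paper).

```python
import numpy as np
from scipy.optimize import minimize

def log_likelihood(params, y1, y2, x, P):
    """Sample conditional log-likelihood (1). P(x, theta, p) must return the
    four outcome probabilities P11, P10, P01, P00 for each observation."""
    theta, p = params[:-1], params[-1]
    p11, p10, p01, p00 = P(x, theta, p)
    ll = (y1 * y2 * np.log(p11) + y1 * (1 - y2) * np.log(p10)
          + (1 - y1) * y2 * np.log(p01) + (1 - y1) * (1 - y2) * np.log(p00))
    return ll.mean()

def fit(y1, y2, x, P, start):
    # maximize by minimizing the negative log-likelihood; keep p inside (0, 1)
    bounds = [(None, None)] * (len(start) - 1) + [(1e-4, 1 - 1e-4)]
    return minimize(lambda v: -log_likelihood(v, y1, y2, x, P),
                    start, bounds=bounds, method="L-BFGS-B")
```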
Remark 3 Kim (2006) also considers an alternative information structure where Player 2 has
some private information about her payoffs that is not observed by Player 1. In the game with IS,
both players observe $(X_1, \varepsilon_1)$ and $(X_2, \varepsilon_2)$, but under this alternative information structure,
Player 1 cannot observe $\varepsilon_2$ while Player 2 observes both $(X_1, \varepsilon_1)$ and $(X_2, \varepsilon_2)$. This makes Player 1
take an expectation over $\varepsilon_2$ when she derives her expected payoffs for each choice of signal
in this alternative game. Under this structure, Kim (2006) shows that equilibria exist for any
realization of $(\varepsilon_1, \varepsilon_2)$ and obtains uniqueness of equilibrium using Banks and Sobel (1987)'s
refinement (universal divinity), because Cho and Kreps (1987)'s intuitive criterion is not applicable
to the game with additional incomplete information.
4 Public Information about the Type
Here we consider the case where public signals about the type of Player 1 are available. We denote
such signals by $Z$, which are observable to all players of the game and to econometricians. Until
now, we have assumed that $X_1$ is independent of the type of Player 1. However, it is likely that
at least some observed characteristics of Player 1 will reveal information about her type. In the beer-quiche game story, characteristics of Player 1 such as muscle mass,
height, or age will tell how likely it is that Player 1 is the strong type. Thus, $Z$ can include all or a subset of
the variables in $X_1$.
For the public signals, we require that Player 1 cannot strategically choose the signals $Z$ when
the game is played (for example, in the beer-quiche game, Player 1 cannot build up her muscles while
the game is played), or at least that Player 1 has no incentive to do so. This means that in
the game G, only the action $A_1$ plays the role of the strategic signal. The public signals $Z$ have a
mixture distribution with mixing probability $p$. We denote the density of $Z$ by
$$f_Z(z) = p f_{(s)}(z) + (1 - p) f_{(w)}(z) \qquad (2)$$
where $f_{(s)}$ and $f_{(w)}$ denote the densities of $Z$ for the strong type and the weak type, respectively,
and $(f_{(s)}, f_{(w)}, p)$ are known to the players of the game. Player 2 has an incentive to use these
signals while playing the game. When players play a separating equilibrium, these additional
signals carry no additional information about Player 1's type, since Player 1's type is perfectly revealed
by her action. When the players play a pooling equilibrium, Player 2 will use these additional
signals to update her belief about Player 1's type using Bayes' rule. Therefore, under a separating
equilibrium we have
$$\mu_2(t_1 = t_s \mid A_1, Z) = \mu_2(t_1 = t_s \mid A_1)$$
and under a pooling equilibrium we have
$$\mu_2(t_1 = t_s \mid A_1, Z = z) = \frac{p f_{(s)}(z)}{p f_{(s)}(z) + (1 - p) f_{(w)}(z)} \qquad (3)$$
which is also the conditional probability of being the strong type given $Z = z$. We let $p(z)$ denote
this conditional probability, $\Pr(t_1 = t_s \mid Z = z) = \frac{p f_{(s)}(z)}{p f_{(s)}(z) + (1 - p) f_{(w)}(z)}$. Thus, we have $p(Z) = \mu_2(t_1 = t_s \mid A_1, Z)$ under a pooling equilibrium. Note that equation (3) and $p(z)$ reduce to the prior
$p$ when $f_{(s)}(z) = f_{(w)}(z)$ (no mixture). We augment the information structure and the stochastic
assumptions when the public signals are available as follows:
Assumption 4.1 (IS-A)
1. Assumption IS holds.
2. The public signals $Z$ about the types of Player 1 are known to both Player 1 and Player 2.
Assumption 4.2 (SA-A)
1. $\varepsilon_1$ and $\varepsilon_2$ are continuously distributed and statistically independent of $X$ and $Z$.
2. The cdf's of $\varepsilon_1$ and $\varepsilon_2$ are continuous and denoted by $G_1(\varepsilon_1)$ and $G_2(\varepsilon_2)$ with corresponding density functions $g_1(\varepsilon_1)$ and $g_2(\varepsilon_2)$, respectively, on their supports $\mathbb{R}$. The density functions do not depend on the model parameters $(\theta, p)$ nor on the type of Player 1.
3. Both $\widetilde{X}_1$ and $\widetilde{X}_2$ can be continuous or discrete random variables. Both $\widetilde{X}_{1 \setminus Z}$ and $\widetilde{X}_2$ are independent of the type of Player 1.⁹ We denote the densities of $\widetilde{X}_{1 \setminus Z}$ and $\widetilde{X}_2$ by $f_{\widetilde{X}_{1 \setminus Z}}(\cdot)$ and $f_{\widetilde{X}_2}(\cdot)$, respectively. Neither $f_{\widetilde{X}_{1 \setminus Z}}(\cdot)$ nor $f_{\widetilde{X}_2}(\cdot)$ depends on the structural parameters $(\theta, p)$.
4. $Z$ is a random vector with the mixture density $f_Z(z) = p f_{(s)}(z) + (1 - p) f_{(w)}(z)$. Neither $f_{(s)}(z)$ nor $f_{(w)}(z)$ depends on the structural parameters $(\theta, p)$.
In the game G under Assumptions IS-A and SA-A, it is not difficult to see that we obtain
exactly the same equilibria as under Assumptions IS and SA, except that $\mu_2(t_1 = t_s \mid A_1)$ is replaced
with $\mu_2(t_1 = t_s \mid A_1, Z = z)$. This means that whenever Player 2's belief is used in the characterization of equilibrium, we replace $p$ with $p(z)$. Note that we should distinguish the population
distribution of types ($\Pr(t_1 = t_s) = p$) from the posterior belief of Player 2 after observing the signal
$Z = z$ under a pooling equilibrium, which in equilibrium is also the conditional probability of being the
strong type given $Z = z$. Regardless of the public signals $Z$, the overall distribution
of types ($p$) is fixed by nature. We apply the refinement of Cho and Kreps (1987) to these games
with public signals $Z$ and obtain the same uniqueness of equilibrium as in Theorem 3.2.
For our estimation below, we will use a logistic specification for the posterior belief of Player 2
under a pooling equilibrium, noting that equation (3) can be rewritten as
$$p(z) = \frac{p f_{(s)}(z)}{p f_{(s)}(z) + (1 - p) f_{(w)}(z)} = \frac{(p/(1-p))\, f_{(s)}(z)/f_{(w)}(z)}{1 + (p/(1-p))\, f_{(s)}(z)/f_{(w)}(z)} \qquad (4)$$
and thus the belief depends only on $p$ and the ratio $f_{(s)}(z)/f_{(w)}(z)$. The relationship in (4) means
that for updating, we do not need to know $f_{(s)}$ and $f_{(w)}$ individually; only the ratio between
the two is required.
Therefore, letting $h_o(z) = \log \frac{f_{(s)}(z)}{f_{(w)}(z)}$, we can rewrite (3) as a logistic specification with
$L(\cdot) = \frac{\exp(\cdot)}{1 + \exp(\cdot)}$,
$$p(z) = \frac{\exp\left(\log(p/(1-p)) + h_o(z)\right)}{1 + \exp\left(\log(p/(1-p)) + h_o(z)\right)} = L(\log(p/(1-p)) + h_o(z)) = L(h(z)) \qquad (5)$$
recalling that the posterior belief of Player 2 under a pooling equilibrium equals the conditional
probability of being the strong type given $Z = z$. We let $h(z) = \log(p/(1-p)) + h_o(z)$, and this
becomes one of our parameters of interest since $p(z) = L(h(z))$.
9. With some abuse of notation, $\widetilde{X}_{1 \setminus Z}$ denotes the variables that are contained in $\widetilde{X}_1$ but are excluded from $Z$. We allow $\widetilde{X}_{1 \setminus Z}$ to be empty, and in this case assumptions on $\widetilde{X}_{1 \setminus Z}$ become irrelevant.
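As a quick numerical illustration of (3)-(5) (our arithmetic, not the paper's): suppose $p = 0.4$ and the likelihood ratio at some $z$ is $f_{(s)}(z)/f_{(w)}(z) = 3$. Then $h(z) = \log(0.4/0.6) + \log 3 = \log 2$, so $p(z) = L(\log 2) = 2/3$, matching the direct Bayes computation $0.4 \cdot 3 / (0.4 \cdot 3 + 0.6) = 2/3$.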
Remark 4 Alternatively, we can directly interpret $p(z)$ as the probability of being the strong type for
Player 1 with characteristics $Z = z$ and assume that this probability is common knowledge to
Players 1 and 2, as $p$ is. In other words, we can simply replace $p$ with $p(z)$, since $p(z)$ is the relevant
type distribution when Player 1 with $Z = z$ plays the game.

Remark 5 Suppose the probability of being the strong type, $p(z, \zeta)$, also depends on other characteristics of Player 1, denoted by $\zeta$, which are unobservable to Player 2. Then the posterior belief of
Player 2 will still be $p(z)$ such that $p(z) = E[p(z, \zeta) \mid z]$, while the probability of being the strong type will
be $p(z, \zeta)$. This makes it more difficult to characterize the choice probabilities implied
by the model and may require integration over $\zeta$.

Remark 6 We assume $Z$ consists of continuous variables. However, this is just for ease of notation and
discussion. A subset of $Z$, or all of $Z$, can be discrete. In that case part of $h(Z)$, or all of $h(Z)$, can be
parametric.
5 Identification and Estimation
In this section, we develop our identification and estimation strategies for the game G with public
signals. The information structure IS-A is maintained, but the stochastic assumptions are
strengthened to facilitate estimation. To preserve uniqueness of equilibrium, we also maintain the
assumption that each player plays the unique equilibrium that survives the refinement of Cho and
Kreps (1987) when there exist multiple equilibria.
We introduce additional notation. We let $d_l$ denote the dimension of a vector $l$. We let $Y_1$
denote the indicator function that takes the value one when Player 1 chooses "B", i.e., $Y_1 \equiv 1(A_1 = B)$. Similarly, we let $Y_2 \equiv 1(A_2 = NF)$. We also let $Y = (Y_1, Y_2)$. We let $C(\cdot)$, $C_1(\cdot)$,
$C_2(\cdot)$, and so on denote generic positive constants or functions. For a positive number $\kappa$, we let
$\underline{\kappa}$ denote the largest integer smaller than $\kappa$. Finally, we let upper case letters stand for random
variables and lower case letters stand for their realizations. We use the subscript "0" to denote
the true value of parameters. Throughout the paper, we assume that econometricians observe
realizations of $X_1 \cup Z$, $X_2$, $Y_1$, and $Y_2$ but do not observe $\varepsilon_1$ or $\varepsilon_2$. We let $\gamma = (\gamma_1, \gamma_2)'$ denote
the parameters that determine the distributions of $\varepsilon_1$ and $\varepsilon_2$, written $G_1(\varepsilon_1; \gamma_1)$ and $G_2(\varepsilon_2; \gamma_2)$. As
in discrete choice models, normalizations of $\gamma_1$ and $\gamma_2$ are necessary. Whenever it is necessary,
we will adopt such normalizations. To keep notation simple, we assume in what follows that all $\gamma$ are normalized
or known; it is not difficult to add the parameters $\gamma$ (after normalization)
to $\theta$ (the collection of parameters) if they are separately identifiable. As we have discussed, we also
normalize $\phi_1 \equiv \phi_{1s} = \phi_{1w}$ and $\eta_2 \equiv \eta_{2s} = \eta_{2w}$. Finally, we let $W \equiv X \cup Z \in \mathcal{W}$ and let
$\theta = (\beta_1', \beta_2', \phi_1, \eta_2)' \in \Theta \subset \mathbb{R}^k \times (0, \infty) \times [0, \infty)$ with $k = k_1 + k_2 + 2$, and $\alpha = (\theta, h) \in \mathcal{A} \equiv \Theta \times \mathcal{H}$. The parameter $\alpha$
is the parameter of interest that econometricians want to estimate.
5.1 Regularity Conditions
Now we impose stronger conditions than SA-A by adding a few "smoothness" conditions on $G_1(\cdot)$
and $G_2(\cdot)$. We let $\mathcal{E}_1$, $\mathcal{E}_2$, $\mathcal{X}_1$, $\mathcal{X}_2$, and $\mathcal{Z}$ denote the supports of $\varepsilon_1$, $\varepsilon_2$, $X_1$, $X_2$, and $Z$, respectively.
The conditions below are standard regularity conditions in the literature.

Assumption 5.1 (SA2)
1. $\varepsilon_1$ and $\varepsilon_2$ are continuously distributed, statistically independent of each other and of $W$.
2. The cdf's of $\varepsilon_1$ and $\varepsilon_2$ are continuous and denoted by $G_1(\varepsilon_1)$ and $G_2(\varepsilon_2)$ with corresponding density functions $g_1(\varepsilon_1)$ and $g_2(\varepsilon_2)$, respectively, on their supports $\mathbb{R}$. The density functions do not depend on the model parameter $\alpha$ or on the type of Player 1.
3. $G_1(\varepsilon_1)$ and $G_2(\varepsilon_2)$ are three times continuously differentiable with bounded derivatives everywhere on $\mathcal{E}_1 = \mathcal{E}_2 = \mathbb{R}$.
4. Both $\widetilde{X}_1$ and $\widetilde{X}_2$ can be continuous or discrete random variables. Both $\widetilde{X}_{1 \setminus Z}$ and $\widetilde{X}_2$ are independent of the type of Player 1. Neither $f_{\widetilde{X}_{1 \setminus Z}}(\cdot)$ nor $f_{\widetilde{X}_2}(\cdot)$ depends on the model parameter $\alpha$.
5. The support of $Z$ is compact. The density function $f_Z(z) = p f_{(s)}(z) + (1 - p) f_{(w)}(z)$ is bounded and bounded away from zero. Neither $f_{(s)}(z)$ nor $f_{(w)}(z)$ depends on the structural parameters $(\theta, p)$.

Remark 7 Here we assume that $\varepsilon_1$ and $\varepsilon_2$ are independent as a matter of convenience. One can
allow for correlation between the two. Note that the equilibria of the game and the uniqueness results
with refinements do not change whether $\varepsilon_1$ and $\varepsilon_2$ are correlated or not. However, Assumption
SA2.1 simplifies the form of the model probabilities we use for our estimation. If correlation is
allowed, the model probabilities will contain a joint density and CDF instead of marginal ones.
5.2 Identification
Now we let $L_{Y_1 Y_2}(W, \alpha)$ denote the conditional probabilities of observed outcomes such that
$$L_{y_1 y_2}(W, \alpha) \equiv \Pr(Y_1 = y_1, Y_2 = y_2 \mid W; \alpha) \quad \text{for } (y_1, y_2) \in \{(1,1), (1,0), (0,1), (0,0)\}$$
where $\Pr(Y_1 = y_1, Y_2 = y_2 \mid W; \alpha)$ takes a rather complicated form and is defined in Appendix B.
Now we let $\alpha_0 \equiv (\theta_0, h_0)$ denote the true value of $\alpha$. Later we show that the following condition is
sufficient for identification of $\alpha_0$ (see Lemma 5.1).

Assumption 5.2 (ID) Conditional on $W$, if $\alpha \neq \alpha_0$ for $\alpha, \alpha_0 \in \mathcal{A}$, then
$$\Pr\left(L_{y_1 y_2}(W, \alpha) \neq L_{y_1 y_2}(W, \alpha_0)\right) > 0$$
for $(y_1, y_2) \in \{(1,1), (1,0), (0,1), (0,0)\}$.
The condition says that for identification to hold, the model choice probabilities evaluated at different
values of the parameters $\alpha$ must be different for a non-negligible set of values of $W$. Even though Assumption
ID delivers a clear idea of the identification condition, it is a high-level condition and does not
provide a concrete idea of which parameter is identified under which conditions. Below we consider
more primitive sufficient conditions that can satisfy Assumption ID, relying on an identification at
infinity argument. However, we stress that the identification conditions below are only sufficient,
not necessary. Using an identification at infinity argument is not new for models with strategic
interactions in the literature (e.g., Tamer (2003), Berry and Tamer (2006), and Bajari, Hong, and
Ryan (2010)).
Now suppose the matrices of $n$ observations of $X_1$ and $X_2$ have full rank, respectively. Let $\widetilde{X}_{1,1}$
denote the first element of $\widetilde{X}_1$ and $\beta_{1,1}$ be the coefficient on $\widetilde{X}_{1,1}$. Without loss of generality, further
assume $\beta_{1,1} > 0$, where the distribution of $\widetilde{X}_{1,1}$ conditional on $(\widetilde{X}_{1,-1} \cup Z, \widetilde{X}_2)$ has an everywhere
positive Lebesgue density, where we define $\widetilde{X}_{1,-1} = (\widetilde{X}_{1,2}, \ldots, \widetilde{X}_{1,k_1})'$. Further assume $\widetilde{X}_{1,1}$ is not
included in $Z$.
Now we consider the model probabilities when $\widetilde{X}_{1,1}$ tends to infinity.
[I1] As $\widetilde{X}_{1,1} \to \infty$, $g_1(\cdot) \to 0$ and we obtain
$$\lim_{\widetilde{X}_{1,1} \to \infty} \Pr(Y_1 = 1, Y_2 = 0 \mid W) = p(Z)\left(1 - G_2(X_2'\beta_2 + \eta_2)\right) \qquad (6)$$
$$\lim_{\widetilde{X}_{1,1} \to \infty} \Pr(Y_1 = 0, Y_2 = 1 \mid W) = (1 - p(Z))\, G_2(X_2'\beta_2 - \eta_2). \qquad (7)$$
Note that in the above, the conditional choice probabilities (on the LHS) are known from the
data (i.e., unique with probability one). Identification is then about whether there is a unique
solution in $(p(Z), \beta_2, \eta_2)$ to the above equations. Therefore, we can identify $p(Z)$ (and thus
$h(Z) = L^{-1}(p(Z))$), $\widetilde{\beta}_2 = (\beta_{2,1}, \ldots, \beta_{2,k_2})$, and $\beta_{2,0} + \eta_2$ from (6), since $Z \cap X_2 = \emptyset$ (inherent
exclusion between Player 1 and Player 2), and similarly we can identify $p(Z)$, $\widetilde{\beta}_2$, and $\beta_{2,0} - \eta_2$
from (7). Then from $\beta_{2,0} + \eta_2$ and $\beta_{2,0} - \eta_2$ we also separately identify $\beta_{2,0}$ and $\eta_2$. To provide
more details, $p(Z)$ is identified by fixing $X_2$ and varying $Z$ in $\lim_{\widetilde{X}_{1,1} \to \infty} \Pr(Y_1 = 1, Y_2 = 0 \mid W)$ or
$\lim_{\widetilde{X}_{1,1} \to \infty} \Pr(Y_1 = 0, Y_2 = 1 \mid W)$. Once we identify $p(Z)$, we can identify the other parameters by
fixing $Z$ and varying $X_2$ in $\lim_{\widetilde{X}_{1,1} \to \infty} \Pr(Y_1 = 1, Y_2 = 0 \mid W)$ and $\lim_{\widetilde{X}_{1,1} \to \infty} \Pr(Y_1 = 0, Y_2 = 1 \mid W)$.
For example, given $p(Z)$, identification of $\widetilde{\beta}_2$ and $\beta_{2,0} + \eta_2$ is achieved because
$$\Pr\left[\lim_{\widetilde{X}_{1,1} \to \infty} \Pr\left(Y_1 = 1, Y_2 = 0 \mid W; \widetilde{\beta}_2, \beta_{2,0} + \eta_2\right) \neq \lim_{\widetilde{X}_{1,1} \to \infty} \Pr\left(Y_1 = 1, Y_2 = 0 \mid W; \widetilde{\beta}_2^*, (\beta_{2,0} + \eta_2)^*\right)\right] > 0$$
for any $\widetilde{\beta}_2^* \neq \widetilde{\beta}_2$ or $(\beta_{2,0} + \eta_2)^* \neq \beta_{2,0} + \eta_2$, since the matrix of $n$ observations of $X_2$ has full rank.
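To spell out the last step (a step we add for clarity), the two identified constants pin down the intercept and the incentive parameter by simple averaging:
$$\beta_{2,0} = \tfrac{1}{2}\left[(\beta_{2,0} + \eta_2) + (\beta_{2,0} - \eta_2)\right], \qquad \eta_2 = \tfrac{1}{2}\left[(\beta_{2,0} + \eta_2) - (\beta_{2,0} - \eta_2)\right].$$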
Once we identify $\beta_2$, $\eta_2$, and $p(\cdot)$, we can treat them as known for the identification of the other parameters. Next we show that we can also identify the parameters for Player 1 by combining outcomes so
that the problem becomes a single agent decision problem.
[I2] Combine the B&NF and B&F outcomes and obtain
$$\begin{aligned} \Pr(Y_1 = 1 \mid W) &= \Pr(Y_1 = 1, Y_2 = 1 \mid W; \alpha) + \Pr(Y_1 = 1, Y_2 = 0 \mid W; \alpha) \\ &= G_1(X_1'\beta_1 - \phi_1)\left(G_2(X_2'\beta_2 + 2p(Z)\eta_2 - \eta_2) - G_2(X_2'\beta_2 - \eta_2)\right) \\ &\quad + p(Z)\left(G_1(X_1'\beta_1 + \phi_1) - G_1(X_1'\beta_1 - \phi_1)\right)\left(G_2(X_2'\beta_2 + \eta_2) - G_2(X_2'\beta_2 - \eta_2)\right) \\ &\quad + p(Z)\, G_2(X_2'\beta_2 - \eta_2) + p(Z)\left(1 - G_2(X_2'\beta_2 + \eta_2)\right) \\ &\quad + \text{two integral terms over the mixing probabilities in the semi-separating regions} \end{aligned} \qquad (8)$$
where the model choice probabilities are from Appendix B. Since we can treat $\beta_2$, $\eta_2$, and $p(\cdot)$ as
known, we can rewrite (8) in the form
$$\Pr(Y_1 = 1 \mid W) = A(Z, X_2)\, G_1(X_1'\beta_1 + \phi_1) + B(Z, X_2)\, G_1(X_1'\beta_1 - \phi_1) + C(Z, X_2) \qquad (9)$$
where $A(Z, X_2)$, $B(Z, X_2)$, and $C(Z, X_2)$ are known functions of $p(Z)$ and the $G_2(\cdot)$ terms. Now we see
that the identification problem for $\beta_1$ and $\phi_1$ becomes a nonlinear estimation problem for (9), whose
identification conditions are standard in the literature (see, e.g., Newey and McFadden (1994)). Note
that $\widetilde{\beta}_1 = (\beta_{1,1}, \ldots, \beta_{1,k_1})$, $\beta_{1,0} + \phi_1$, and $\beta_{1,0} - \phi_1$ are identified if the vectors of $n$ observations of
$$\frac{\partial \Pr(Y_1 = 1 \mid W)}{\partial \widetilde{\beta}_1} = \left\{A(Z, X_2)\, g_1(X_1'\beta_1 + \phi_1) + B(Z, X_2)\, g_1(X_1'\beta_1 - \phi_1)\right\}\widetilde{X}_1 \qquad (10)$$
$$\frac{\partial \Pr(Y_1 = 1 \mid W)}{\partial(\beta_{1,0} + \phi_1)} = A(Z, X_2)\, g_1(X_1'\beta_1 + \phi_1) \qquad (11)$$
$$\frac{\partial \Pr(Y_1 = 1 \mid W)}{\partial(\beta_{1,0} - \phi_1)} = B(Z, X_2)\, g_1(X_1'\beta_1 - \phi_1) \qquad (12)$$
are linearly independent and the matrix of $n$ observations of $X_1$ has full column rank. Then
from $\beta_{1,0} + \phi_1$ and $\beta_{1,0} - \phi_1$ we also separately identify $\beta_{1,0}$ and $\phi_1$. Combining [I1] and [I2], we
conclude that all the parameters $(\beta_1, \beta_2, \phi_1, \eta_2, h(\cdot))$ are identified. We summarize this result in
the following theorem:
Assumption 5.3 (ID1) (i) $X_1$ contains at least one variable (say $\widetilde{X}_{1,1}$) that is excluded from
both $Z$ and $X_2$, whose distribution has full support on $\mathbb{R}$; (ii) the matrices of $n$ observations
of $X_1$ and $X_2$ have full column rank, respectively; (iii) $\beta_{1,1} > 0$, where the distribution of $\widetilde{X}_{1,1}$
conditional on $(\widetilde{X}_{1,-1} \cup Z, \widetilde{X}_2)$ has an everywhere positive Lebesgue density; (iv) the vectors of $n$
observations of (10)-(12) are linearly independent.

Theorem 5.1 Suppose the model choice probabilities are given by the form $\Pr(Y_1, Y_2 \mid W; \alpha)$ as in
Appendix B and suppose Assumption ID1 holds. Then, $\beta_1$, $\beta_2$, $\phi_1$, $\eta_2$, and $h(\cdot)$ (i.e., $p(\cdot)$) are
identified.
Remark 8 Here we note that $\beta_1$, $\beta_2$, $\phi_1$, and $\eta_2$ are identified up to the normalization of the distribution parameters, as usual in discrete choice models. For example, if $G_1(\cdot)$ and $G_2(\cdot)$ are the
CDFs of normal distributions, we normalize them to be the CDF of the standard normal distribution.

Remark 9 If it is of interest, one may also model $G_1(\cdot)$ and $G_2(\cdot)$ nonparametrically. Then, such
$G_1(\cdot)$ and $G_2(\cdot)$ can be identified from I1-I2 by restricting the structure of the payoffs, similarly to
Berry and Tamer (2006), following Matzkin (1992).

Remark 10 One may let $\phi_1$ and $\eta_2$ depend on the characteristics of Player 1 and Player 2, respectively
(e.g., some or all of $X_1$ and $X_2$). If we let $\phi_1 = \exp(X_1'\delta_1)$ and $\eta_2 = \exp(X_2'\delta_2)$, an investigation
of I1-I2 reveals that we can also identify $\delta_1$ and $\delta_2$ due to the functional form restrictions. If one
lets $\phi_1 = X_1'\delta_1$ and $\eta_2 = X_2'\delta_2$, an investigation of I1-I2 also reveals that we cannot identify $\delta_1$
and $\delta_2$ without suitable normalizations.

Remark 11 One can also consider asymmetric payoffs such that $\widetilde{\beta}_1$ equals $\widetilde{\beta}_s$ for the strong
type and $\widetilde{\beta}_w$ for the weak type. A careful investigation of I1-I2 reveals that $\widetilde{\beta}_s$ and $\widetilde{\beta}_w$ can also
be identified under Assumption ID1.
One may be concerned that the identification of $\beta_1$ and $\phi_1$ in I2 relies heavily on the nonlinearity of
$G_1(\cdot)$ and $G_2(\cdot)$. Here we illustrate that this is not the case. For this exercise only, assume that
$G_1(\cdot)$ and $G_2(\cdot)$ are CDFs of uniform distributions over $[0, 1]$. In this case one can show that
equation (9) becomes
$$\Pr(Y_1 = 1 \mid W) = A(Z) + B(Z)\, X_1'\beta_1 + C(Z)\, \phi_1 \qquad (13)$$
where $A(Z)$, $B(Z)$, and $C(Z)$ are known functions of $p(Z)$ and $\eta_2$. Then, identification of $\beta_1$ and
$\phi_1$ becomes a regression problem for (13). Therefore, unless $A(Z)$, $B(Z)X_1$, and $C(Z)$ have a linear
relationship (i.e., as long as the vectors of $n$ observations of them are linearly independent), identification
holds in this case too.
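A minimal simulation sketch of this uniform-distribution special case follows (our illustration; the functions a, b, c stand in for the known $A(Z)$, $B(Z)$, $C(Z)$, which in practice come from the identified $p(Z)$ and $\eta_2$).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.uniform(size=n)
x1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # X1 = (1, X1_tilde)
beta1, phi1 = np.array([0.3, 0.8]), 0.5                 # true values

# placeholder known functions of Z (stand-ins for A(Z), B(Z), C(Z))
a, b, c = 0.1 + 0.2 * z, 0.3 + 0.4 * z, 0.2 - 0.1 * z

# regression form (13): Pr(Y1 = 1 | W) = A(Z) + B(Z) X1'b1 + C(Z) phi1
pr = a + b * (x1 @ beta1) + c * phi1

# regress (pr - a) on B(Z)*X1 and C(Z): coefficients recover (beta1, phi1)
design = np.column_stack([b[:, None] * x1, c])
est, *_ = np.linalg.lstsq(design, pr - a, rcond=None)
print(est)  # approximately [0.3, 0.8, 0.5]
```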
5.3 Estimation
We estimate the parameters of interest using a conditional sieve ML approach. In this estimation,
we approximate the unknown function $h$ using sieves. For this purpose, we need to restrict the
space of functions $\mathcal{H}$ to which the true function $h_0$ belongs. We consider the Hölder space $\Lambda^{\gamma_1}(\mathcal{Z})$
with order $\gamma_1 > 0$ (see, e.g., Ai and Chen (2003) and Chen (2007)).¹⁰ The Hölder ball (with radius
$C_1$), $\Lambda^{\gamma_1}_{C_1}(\mathcal{Z})$, is defined accordingly as $\Lambda^{\gamma_1}_{C_1}(\mathcal{Z}) \equiv \{g \in \Lambda^{\gamma_1}(\mathcal{Z}) : \|g\|_{\Lambda^{\gamma_1}} \leq C_1 < \infty\}$, where $\|\cdot\|_{\Lambda^{\gamma_1}}$
denotes the Hölder norm.
In the literature, it is known that functions in $\Lambda^{\gamma_1}_{C_1}(\mathcal{Z})$ can be approximated well by various
sieves such as power series, Fourier series, splines, and wavelets. We let $\mathcal{H} = \Lambda^{\gamma_1}_{C_1}(\mathcal{Z})$. In particular,
we approximate $h(\cdot)$ by power series. According to Theorem 8, p. 90 of Lorentz (1986) (also see
Timan (1963)), if a function $f$ is $s$-times continuously differentiable, then there exist a $K$-vector
$\pi_K$ and a triangular array of polynomials $R^K(z)$ (in particular, $E[R^K(Z) R^K(Z)'] = I_K$) on the
compact set $\mathcal{Z}$ such that
$$\sup_{z \in \mathcal{Z}} |f(z) - R^K(z)'\pi_K| \leq C_1 K^{-s/d_z}. \qquad (14)$$
As an approximation to $\mathcal{H}$, we then consider a sieve space, $\mathcal{H}_n$, based on the tensor-product power
series:¹¹
$$\mathcal{H}_n = \{h(z) \mid h(z) = R^{K_n}(z)'\pi \text{ for all } \pi \text{ satisfying } \|h\|_{\Lambda^{\gamma_1}} \leq C_1\} \qquad (15)$$
such that $\mathcal{H}_n \subseteq \mathcal{H}_{n+1} \subseteq \ldots \subseteq \mathcal{H}$.
Then, from (14), we can find an approximating function $h_n(\cdot) \equiv R^{K_n}(\cdot)'\pi_{K_n} \in \mathcal{H}_n$ such that
$$\sup_{z \in \mathcal{Z}} |h_0(z) - R^{K_n}(z)'\pi_{K_n}| = O\left(K_n^{-\gamma_1/d_z}\right). \qquad (16)$$
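As an illustration of this sieve construction (ours, with numpy; the orthonormalization mirrors the normalization $E[R^K(Z)R^K(Z)'] = I_K$ only approximately, in the sample), the following sketch builds a polynomial sieve for a scalar $Z$ and fits an approximation to a given function by least squares.

```python
import numpy as np

def power_series_basis(z, K):
    """Polynomial sieve basis R^K(z) for scalar z, orthonormalized in-sample
    so that (1/n) * R'R is (approximately) the identity matrix."""
    raw = np.vander(z, K, increasing=True)      # columns 1, z, z^2, ..., z^(K-1)
    q, _ = np.linalg.qr(raw)                    # orthogonalize the columns
    return q * np.sqrt(len(z))                  # scale so that (1/n) R'R = I_K

# approximate a smooth h0 on [0, 1] with K_n terms
z = np.linspace(0.0, 1.0, 400)
h0 = np.log((0.4 * np.exp(z)) / (0.6 * np.exp(-z)))  # an example log-odds function
R = power_series_basis(z, K=6)
pi_hat, *_ = np.linalg.lstsq(R, h0, rcond=None)
print(np.max(np.abs(h0 - R @ pi_hat)))          # sup-norm approximation error
```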
This leads us to the estimators $\tilde{\theta}_n$ and $\tilde{h}_n$ obtained by maximizing the following sample criterion
function:
$$(\tilde{\theta}_n, \tilde{h}_n) = \underset{(\theta, h) \in \Theta \times \mathcal{H}_n}{\operatorname{argmax}} \ \widehat{Q}(\theta, h) \equiv \frac{1}{n} \sum_{i=1}^n l(y_i \mid w_i; \theta, h) \qquad (17)$$
where $l(y_i \mid w_i; \theta, h)$ denotes the single-observation conditional log-likelihood function:
$$l(y_i \mid w_i; \theta, h) = y_{1i} y_{2i} \log L_{11}(w_i; \theta, h) + y_{1i}(1 - y_{2i}) \log L_{10}(w_i; \theta, h) + (1 - y_{1i}) y_{2i} \log L_{01}(w_i; \theta, h) + (1 - y_{1i})(1 - y_{2i}) \log L_{00}(w_i; \theta, h).$$
10. The Hölder space is a space of functions $g : \mathcal{Z} \to \mathbb{R}$ such that the first $\underline{\gamma}_1$ derivatives are bounded and the $\underline{\gamma}_1$-th derivatives are Hölder continuous with exponent $\gamma_1 - \underline{\gamma}_1 \in (0, 1]$, where $\underline{\gamma}_1$ is the largest integer smaller than $\gamma_1$. The Hölder space becomes a Banach space when endowed with the Hölder norm:
$$\|g\|_{\Lambda^{\gamma_1}} = \sup_z |g(z)| + \max_{a_1 + a_2 + \ldots + a_{d_z} = \underline{\gamma}_1} \sup_{z \neq z'} \frac{|\nabla^a g(z) - \nabla^a g(z')|}{(\|z - z'\|_E)^{\gamma_1 - \underline{\gamma}_1}} < \infty, \quad \text{where } \nabla^a g(z) \equiv \frac{\partial^{a_1 + a_2 + \ldots + a_{d_z}} g(z)}{\partial z_1^{a_1} \ldots \partial z_{d_z}^{a_{d_z}}}.$$
11. For a comprehensive discussion of tensor-product linear sieves, including power series, see Chen (2007).
We call $\tilde{\alpha}_n = (\tilde{\theta}_n, \tilde{h}_n)$ the exact sieve conditional ML estimator. However, because the complexity (in the sense
defined in Appendix C.3) of the sieve space $\mathcal{A}_n$ increases with $n$, and because the maximizer
of (17) is often obtained numerically, we do not require the maximization of $\widehat{Q}(\theta, h)$ over $\mathcal{A}_n$ to be
exact. An approximate estimator $\hat{\alpha}_n$ is enough for the asymptotic results we desire, given by
$$\widehat{Q}(\hat{\alpha}_n) \geq \sup_{\alpha \in \mathcal{A}_n} \widehat{Q}(\alpha) - O_p(\epsilon_n), \quad \text{with } \epsilon_n = o(1). \qquad (18)$$
We call $\hat{\alpha}_n$ in (18) the approximate sieve ML estimator. Note that we choose the order of $\epsilon_n$
such that it justifies the desired asymptotic results (e.g., consistency, convergence rates, and
asymptotic normality). Now also define the population criterion function
$$Q(\alpha) \equiv Q(\theta, h) = E[l(Y \mid W; \theta, h)]. \qquad (19)$$
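Putting (15) and (17) together, a sketch of the sieve conditional MLE might look as follows (ours, building on the basis sketch above; theta collects $(\beta_1, \beta_2, \phi_1, \eta_2)$ and the likelihood function ell is a placeholder for the Appendix B outcome probabilities evaluated at $h(z) = R^{K_n}(z)'\pi$).

```python
import numpy as np
from scipy.optimize import minimize

def sieve_cmle(y1, y2, w, z, ell, K_n, dim_theta):
    """Approximate sieve conditional MLE (17)-(18): maximize the sample
    log-likelihood jointly over theta and the sieve coefficients pi."""
    R = power_series_basis(z, K_n)              # from the earlier sketch

    def neg_loglik(v):
        theta, pi = v[:dim_theta], v[dim_theta:]
        h = R @ pi                              # h(z_i) = R^{K_n}(z_i)' pi
        return -np.mean(ell(y1, y2, w, theta, h))

    start = np.zeros(dim_theta + K_n)
    res = minimize(neg_loglik, start, method="BFGS")
    return res.x[:dim_theta], res.x[dim_theta:]  # (theta_hat, pi_hat)
```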
The following lemma shows that if Assumption ID holds, then $l(Y \mid W; \alpha)$ satisfies an information
inequality result, which is useful for proving the consistency of our proposed estimator $\hat{\alpha}_n$ in (18)
together with the uniform convergence of $\widehat{Q}(\alpha)$ to $Q(\alpha)$.

Lemma 5.1 (Identification) Suppose Assumptions IS-A and SA2 hold. Further suppose Assumption ID holds. Then, $Q(\alpha) < Q(\alpha_0)$ for all $\alpha \neq \alpha_0 \in \mathcal{A}$.
The proof can be found in the Appendix.
5.3.1 Consistency and Convergence Rates of the Sieve Conditional ML
In this section we show the consistency of our sieve conditional ML estimator and derive its convergence rates. The consistency of sieve MLE was derived in Wong and Severini (1991) and Geman
and Hwang (1982) for i.i.d. data. Some consistency results for sieve M-estimators can be found in
Gallant (1987) and Gallant and Nychka (1987). Chen (2007) presented a consistency result for
sieve extremum estimators allowing for non-compact infinite-dimensional A, which is an extension
of Theorem 2.1 in Newey and McFadden (1994) and of Lemma A1 in Newey and Powell (2003).
Below, using Theorem 3.1 in Chen (2007), we establish the consistency under a pseudo-metric ‖·‖_s
defined by

    ‖α_1 − α_2‖_s = ‖θ_1 − θ_2‖_E + ‖h_1 − h_2‖_∞

where ‖·‖_E is the Euclidean norm and ‖h‖_∞ = sup_{z∈Z} |h(z)|. We make additional assumptions for this
purpose.
Assumption 5.4 (SA3) (i) {Y_{1i}, Y_{2i}, W_i}_{i=1}^n are i.i.d.; (ii) α_0 = (θ_0, h_0) ∈ A ≡ Θ × H; (iii) Θ is
compact with nonempty interior and H = C^γ1_c(Z); (iv) K_n → ∞ and K_n/n → 0.
Lemma 5.2 (Consistency) Suppose Assumptions IS-A, SA2, ID, and SA3 hold and suppose
Condition 1 (Lipschitz conditions) in Appendix C holds. Then, ‖α̂_n − α_0‖_s = o_p(1).
The proof can be found in the Appendix. This consistency result is obtained by combining the
identification condition and the uniform convergence of the sample criterion function, as in the case
of parametric extremum estimation.
Next we consider the convergence rate of the sieve conditional ML estimator under a weaker
metric.¹² We present a convergence rate using Theorem 3.2 of Chen (2007), which is a version
of Chen and Shen (1996) for i.i.d. data. Now suppose that A = Θ × H is convex at α_0, so that
α_0 + τ(α − α_0) ∈ A for all small τ ∈ [0, 1] and all fixed α ∈ A. Suppose that the pathwise derivative
of l(·) in the direction [α − α_0] is well-defined for almost all (w, y) in the support of W × Y. We
denote the pathwise first derivative in the direction [α − α_0] evaluated at α_0 by

    dl(y_i|w_i; α_0)/dα [α − α_0] ≡ lim_{τ→0} [l(y_i|w_i; (1−τ)α_0 + τα) − l(y_i|w_i; α_0)] / τ
                                  = ∂l(y_i|w_i; α_0)/∂θ' (θ − θ_0) + dl(y_i|w_i; α_0)/dh [h − h_0].

Now we define the L_2(P_0)-norm, ‖α − α_0‖_2, based on the pathwise derivative of l(·) evaluated at
α_0, i.e.,

    ‖α − α_0‖_2^2 = E[ ( dl(Y_i|W_i; α_0)/dα [α − α_0] )^2 ].                (20)
This is the ML version of Ai and Chen (2003)'s L_2(P_0)-metric, which is natural since it is the
Fisher norm, i.e., the conditional ML version of the metric used by Wong and Severini (1991).
Proposition 5.1 Let α̂_n be the approximate sieve ML estimator defined in (18). Suppose Assumptions IS-A, SA2, ID, and SA3 hold. Suppose Conditions 1 and 2-3 (Lipschitz conditions) in Appendix C
hold. Then, we have

    ‖α̂_n − α_0‖_2 = O_p( max{ O(√(K_n/n)), O(K_n^{−γ1/d_z}) } ).

Moreover, with K_n = C n^{d_z/(2γ1+d_z)} and γ1/d_z > 1/2, we have ‖α̂_n − α_0‖_2 = O_p(n^{−γ1/(2γ1+d_z)}).
The proof of this proposition can be found in the Appendix.
Remark 12 In the estimation we need to choose an appropriate K_n, but the asymptotic results
give little guidance on how to choose K_n. In principle, one can utilize data-driven methods such as
cross-validation, AIC, or BIC to select such K_n. See Ichimura and Todd (2005) for related issues,
which are beyond the scope of this paper.
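As one concrete possibility along the lines of this remark, the sketch below selects K_n by AIC or BIC over a grid of candidate values. The helper fit_fn and the exact penalty forms are illustrative assumptions, not prescriptions from the paper.

    import numpy as np

    def select_K(candidate_Ks, fit_fn, n, criterion="BIC"):
        """Pick the sieve dimension K_n by an information criterion.

        `fit_fn(K)` is assumed to maximize the sample criterion (17) for a given
        K and return (average log likelihood, total number of free parameters).
        """
        best_K, best_val = None, np.inf
        for K in candidate_Ks:
            avg_ll, n_par = fit_fn(K)
            penalty = n_par * (np.log(n) if criterion == "BIC" else 2.0)
            val = -2.0 * n * avg_ll + penalty       # IC value; smaller is better
            if val < best_val:
                best_K, best_val = K, val
        return best_K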
5.3.2 Asymptotic Normality

In this section, we derive the √n-asymptotic normality of the structural parameter estimates θ̂_n. Here we
show that even though we estimate the finite-dimensional parameter θ and the infinite-dimensional
parameter h simultaneously, parametric-rate inference for the finite-dimensional parameter is
still feasible. The following discussion and notation are based on Theorem 4.3 of Chen (2007),
which is a simplified version of Shen (1997) and Chen and Shen (1998).
Suppose the functional of interest, f : A → R, is smooth in the sense that

    df(α_0)/dα [α − α_0] ≡ lim_{τ→0} [f(α_0 + τ(α − α_0)) − f(α_0)] / τ

is well-defined and

    sup_{α∈A: ‖α−α_0‖_2>0} |df(α_0)/dα [α − α_0]| / ‖α − α_0‖_2 < ∞.

Now let V̄ denote the closure of the linear span of A − {α_0} under the metric ‖· − α_0‖_2. Then
(V̄, ‖·‖_2) is a Hilbert space with inner product

    ⟨α_1, α_2⟩ = E[ ( dl(Y|W; α_0)/dα [α_1] ) ( dl(Y|W; α_0)/dα [α_2] ) ].

Then, by the Riesz representation theorem, there exists v* ∈ V̄ such that, for any α ∈ A,

    df(α_0)/dα [α − α_0] = ⟨v*, α − α_0⟩ if and only if sup_{0≠α−α_0∈V̄} |df(α_0)/dα [α − α_0]| / ‖α − α_0‖_2 < ∞.        (21)

¹² Shen and Wong (1994) and Birgé and Massart (1998) derived the rates for general sieve M-estimation. Van de
Geer (1993) and Wong and Shen (1995) presented the rates for sieve MLE for i.i.d. data.
In particular, we are interested in the linear functional f(α) = λ'θ for any fixed and nonzero λ ∈ R^{d_θ}
in our inference problem. Then, f(α) − f(α_0) = λ'(θ − θ_0) is a linear functional on V̄. To estimate f(α_0) at
a √n rate, f(·) has to be bounded (i.e., sup_{0≠α−α_0∈V̄} |f(α) − f(α_0)| / ‖α − α_0‖_2 < ∞) according
to Van der Vaart (1991) and Shen (1997).
Now let B denote the closure of the linear span of H − {h_0}. Define b*_j ∈ B for
each component θ_j of θ such that

    b*_j = argmin_{b_j∈B} E[ ( dl(Y|W; α_0)/dθ_j − dl(Y|W; α_0)/dh [b_j] )^2 ].        (22)

Now define b* = (b*_1, ..., b*_{d_θ}),

    dl(Y|W; α_0)/dh [b*] = ( dl(Y|W; α_0)/dh [b*_1], ..., dl(Y|W; α_0)/dh [b*_{d_θ}] ),

and D_{b*}(Y, W) ≡ ∂l(Y|W; α_0)/∂θ' − dl(Y|W; α_0)/dh [b*]. Here D_{b*}(Y, W) becomes the 1 × d_θ vector of efficient
scores, given by the mean-projection residuals of the scores of the finite-dimensional parameters
∂l(Y|W; α_0)/∂θ' on the scores associated with the nuisance parameter dl(Y|W; α_0)/dh [·]. For the inference we
need additional regularity conditions that ensure the boundedness of f(·), the existence of well-behaved efficient scores for the parametric components, and smoothness of each b*_j(z), which needs to
be well approximated in finite samples. We impose
Assumption 5.5 (SA4) (i) θ_0 ∈ int(Θ); (ii) E[D_{b*}(Y, W)'D_{b*}(Y, W)] is positive definite; (iii)
each element b*_j(Z) belongs to the Hölder space C^{m_j}_c(Z) with m_j > d_z/2.
Now note that we have

    df(α_0)/dα [α − α_0] = (θ − θ_0)'λ = f(α) − f(α_0),

and so df(α_0)/dα [α − α_0] = 0 whenever θ = θ_0. In addition, we can show that for f(α) = λ'θ with λ ∈ R^{d_θ}, λ ≠ 0,

    sup_{0≠α−α_0∈V̄} |f(α) − f(α_0)|^2 / ‖α − α_0‖_2^2 = λ' ( E[D_{b*}(Y, W)'D_{b*}(Y, W)] )^{-1} λ        (23)
which implies that f(α) = λ'θ is bounded (in the sense of sup_{0≠α−α_0∈V̄} |f(α) − f(α_0)| / ‖α − α_0‖_2 < ∞)
if and only if E[D_{b*}(Y, W)'D_{b*}(Y, W)] is positive definite, which is our Assumption SA4 (ii). For
this case, there exists v* ∈ V̄ such that

    f(α) − f(α_0) = λ'(θ − θ_0) = ⟨v*, α − α_0⟩ for all α ∈ A                (24)

by (23) and by the Riesz representation theorem (see (21)). We find that v* ≡ (v*_θ, v*_h) ∈ V̄ satisfies
(24) with v*_θ = ( E[D_{b*}(Y, W)'D_{b*}(Y, W)] )^{-1} λ and v*_h = −b* · v*_θ. Moreover, Assumption SA4
(iii) ensures that b* can be well approximated by sieves, so v* is also well approximated by sieves.
This plays an important role in the asymptotic normality result. The following theorem states that
we can achieve the √n-asymptotic normality for the structural parameters.
Theorem 5.2 Suppose Assumptions IS-A, SA2, ID, SA3, and SA4 hold and suppose Conditions 1 and 2-3 (Lipschitz conditions) in the Appendix hold. Then, we have

    √n (θ̂_n − θ_0) →_d N(0, Σ)

where Σ = { E[D_{b*}(Y, W)'D_{b*}(Y, W)] }^{-1}.
The proof of this theorem can be found in the Appendix.
5.3.3 A Consistent Covariance Estimator

To conduct statistical inference on the structural parameters based on Theorem 5.2, we need a consistent
estimator of Σ. For this we first need a consistent estimator of b*. Similarly to Ai and Chen
(2003), we can estimate b*_j by b̂_j, j = 1, ..., d_θ, such that

    b̂_j = argmin_{b_j∈H_n} Σ_{i=1}^n ( dl(y_i|w_i; α̂_n)/dθ_j − dl(y_i|w_i; α̂_n)/dh [b_j] )^2.        (25)

Note that for linear sieves, b̂_j is easily obtained by regressing the derivatives of l(·|·; α̂_n) with respect
to θ_j on the derivatives of l(·|·; α̂_n) with respect to h. Finally, we estimate Σ by

    Σ̂ = [ (1/n) Σ_{i=1}^n D_{b̂}(y_i, w_i; α̂_n)' D_{b̂}(y_i, w_i; α̂_n) ]^{-1}

where b̂ = (b̂_1, ..., b̂_{d_θ}) and D_b(y, w; α) = ∂l(y|w; α)/∂θ' − dl(y|w; α)/dh [b]. We show that Σ̂ is consistent
under suitable regularity conditions.
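The following sketch illustrates this regression interpretation for linear sieves: each θ-score is projected on the sieve-direction scores by least squares, and Σ̂ is formed from the outer product of the residuals (the estimated efficient scores). The array layout and the function name are our own illustrative assumptions.

    import numpy as np

    def sieve_covariance(score_theta, score_h_basis, n):
        """Covariance estimator built from (25) and the efficient scores.

        score_theta:   (n, d_theta) array of dl(y_i|w_i; alpha_hat)/d theta_j.
        score_h_basis: (n, K) array of dl(y_i|w_i; alpha_hat)/dh in the sieve
                       directions R_1, ..., R_K. For linear sieves, (25) reduces
                       to OLS of each theta-score on these columns.
        Returns Sigma_hat, the estimated asymptotic variance of sqrt(n)(theta_hat - theta_0).
        """
        coef, *_ = np.linalg.lstsq(score_h_basis, score_theta, rcond=None)
        D_hat = score_theta - score_h_basis @ coef     # efficient-score residuals
        return np.linalg.inv(D_hat.T @ D_hat / n)

    # standard errors: np.sqrt(np.diag(Sigma_hat) / n)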
Proposition 5.2 Suppose Assumptions IS-A, SA2, ID, SA3, and SA4 hold and suppose Conditions 1 and 2-3 (Lipschitz conditions) in the Appendix hold. Then, Σ̂ = Σ + o_p(1).
Note that for the consistency of our sieve ML estimator, K_n must tend to infinity
as the sample size grows to infinity; i.e., we need to use more flexible specifications for h_n(·) as n
grows. However, in practice one always conducts estimation and inference with a fixed K_n = K. Even
though the asymptotics differ under the two scenarios, the parametric model (fixed K)
and the semiparametric model (increasing K_n) as derived here, the computed standard errors
under the two scenarios can be numerically identical or equivalent (see Ackerberg, Chen, and
Hahn (2012)). Therefore, in practice we can ignore the semiparametric nature of our model and
carry out both estimation and inference (e.g., calculating standard errors) as if the parametric model
were the true model, as in (e.g.) Newey and McFadden (1994).
5.4 Estimation of the Mixing Ratio p

If it is of interest, we can distinguish between the population probability p of the strong type (i.e.,
the mixing ratio) and the posterior belief p(Z) of Player 2, which is also the conditional probability
of being the strong type conditional on Z. Identification of p becomes interesting if one wants to know
the overall distribution of the strong type in the whole population.¹³
We can identify p using the relationship obtained from f_Z(z) = p f_(s)(z) + (1 − p) f_(w)(z) and
(4) as

    E[p(Z)] = ∫ p(z) f_Z(z) dz = ∫ [ p f_(s)(z)/f_Z(z) ] f_Z(z) dz = p ∫ f_(s)(z) dz = p.        (26)
From this result, we can estimate p as

    p̂_n = (1/n) Σ_{i=1}^n L(ĥ_n(z_i)) = (1/n) Σ_{i=1}^n exp(ĥ_n(z_i)) / (1 + exp(ĥ_n(z_i)))        (27)
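In code, (27) is a one-liner once ĥ_n has been evaluated at the sample points; the sketch below assumes h_hat_values holds ĥ_n(z_i) for i = 1, ..., n.

    import numpy as np

    def estimate_p(h_hat_values):
        """Mixing-ratio estimator (27): average the logistic transform of h_hat(z_i)."""
        return np.mean(1.0 / (1.0 + np.exp(-np.asarray(h_hat_values))))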
where ĥ_n(·) is obtained from (18). Applying the mean value theorem with h̃_n that lies between ĥ_n
and h_0, we have

    p̂_n − p_0 = (1/n) Σ_{i=1}^n L(ĥ_n(Z_i)) − p_0
              = (1/n) Σ_{i=1}^n L(h̃_n(Z_i))(1 − L(h̃_n(Z_i)))(ĥ_n(Z_i) − h_0(Z_i)) + (1/n) Σ_{i=1}^n (L(h_0(Z_i)) − E[L(h_0(Z_i))])
              ≤ (1/4)‖ĥ_n − h_0‖_∞ + o_p(1)                                (28)

where the second equality is obtained by L'(·) = L(·)(1 − L(·)), and the last result holds since
L(·)(1 − L(·)) ≤ 1/4 and the second term on the RHS of (28) is o_p(1) by the LLN, noting |L(h_0(·))| < 1
and {Z_i}_{i=1}^n are i.i.d.
Therefore, p̂_n is consistent as long as ĥ_n is consistent. We can also derive the asymptotic
distribution of p̂_n following Chen, Linton, and van Keilegom (2003) because (26) can be written as
a moment condition

    m(p, h) = p − L(h),    E[m(p_0, h_0)] = 0

and because we have an initial estimator ĥ_n from the sieve conditional ML. Let M(h) = ∫_Z L(h) dF_Z
(where F_Z denotes the CDF of Z) and M_n(h) = (1/n) Σ_{i=1}^n L(h(Z_i)). We state the asymptotic normality
of p̂_n as follows.
Proposition 5.3 Suppose (i) ‖ĥ_n − h_0‖_∞ = o_p(n^{−1/4}) and (ii)

    √n [ ∫_Z L(h_0)(1 − L(h_0))(ĥ_n − h_0) dF_Z + M_n(h_0) − M(h_0) ] →_d N(0, V_p).        (29)

Then, √n (p̂_n − p_0) →_d N(0, V_p).
Note that the first term in the LHS of (29) appears because we use an initial
estimator ĥ_n; it disappears if ĥ_n = h_0. The proof of this Proposition can be found in the
Appendix. Note that condition (i) above, ‖ĥ_n − h_0‖_∞ = o_p(n^{−1/4}), is satisfied by combining
the result in Proposition 5.1 and the fact ‖h − h_0‖_∞ ≤ C ‖h − h_0‖_2^{2γ1/(2γ1+d_z)} (Lemma 2 in Chen and
Shen (1998)). In the Appendix, we also discuss how condition (ii) (and also condition (iv) in
Proposition 5.4 below) can be satisfied for the sieve conditional ML estimator under the Assumptions
in Theorem 5.2.

¹³ However, as discussed in Remark 4, if one interprets p(z) as the probability of being a strong type for Player 1
with her characteristics equal to Z = z, then the distinction between p(Z) and p is less interesting.
Finally, we note that even if an explicit form of V_p can be derived, a feasible estimator of V_p
may be difficult to calculate. Alternatively, we can use the ordinary nonparametric bootstrap. The
following proposition shows that we can approximate the distribution of √n (p̂_n − p_0) using the
bootstrap. We use "*" to denote the bootstrap counterpart of the original sample {Z_i}_{i=1}^n. We let
M*_n(h) = (1/n) Σ_{i=1}^n L(h(Z*_i)).
Proposition 5.4 Suppose (i) with P-probability tending to one, ĥ*_n ∈ H and ‖ĥ*_n − ĥ_n‖_∞ = o_{P*}(n^{−1/4});
(ii) ‖ĥ_n − h_0‖_∞ = o(n^{−1/4}) a.s.; (iii) for any positive sequence δ_n tending to zero,
sup_{‖h−h_0‖_∞≤δ_n} |M_n(h) − M(h) − M_n(h_0) + M(h_0)| = o(n^{−1/2}) a.s.;
(iv) √n [ ∫_Z L(ĥ_n)(1 − L(ĥ_n))(ĥ*_n − ĥ_n) dF_Z + M*_n(ĥ_n) − M_n(ĥ_n) ] = N(0, V_p) + o_{P*}(1).
Then, √n (p̂*_n − p̂_n) converges in distribution to N(0, V_p) in P-probability.
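A sketch of the ordinary nonparametric bootstrap for p̂_n follows. The helper refit_h, which re-runs the sieve conditional ML on each resampled data set and returns ĥ evaluated at that sample's Z values, is a stand-in for the full estimation routine and is assumed rather than specified; the data layout is likewise illustrative.

    import numpy as np

    def bootstrap_p(data, refit_h, B=200, seed=0):
        """Nonparametric bootstrap for sqrt(n)(p_hat* - p_hat), per Proposition 5.4.

        `data` is a tuple of arrays (y1, y2, X, Z) of common length n;
        `refit_h(data)` is assumed to re-estimate the sieve conditional ML on the
        given sample and return h_hat evaluated at that sample's Z values.
        """
        rng = np.random.default_rng(seed)
        n = len(data[-1])
        logistic = lambda h: 1.0 / (1.0 + np.exp(-h))
        p_hat = logistic(refit_h(data)).mean()
        draws = np.empty(B)
        for b in range(B):
            idx = rng.integers(0, n, size=n)                 # resample rows with replacement
            boot = tuple(a[idx] for a in data)
            draws[b] = logistic(refit_h(boot)).mean()        # p_hat* for this bootstrap draw
        return p_hat, np.sqrt(n) * (draws - p_hat)           # bootstrap distribution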
6 Monte Carlo Simulation
In this section we investigate the finite-sample performance of our proposed estimator. To check the
validity of our estimation method we conduct a small-scale Monte Carlo experiment and show that
the conditional ML estimator can recover the true parameters reasonably well. We first generate
payoff- and type-relevant variables and then generate equilibrium actions of the players using the game
structure (Figure 2) as described below.

For each given realization of the payoff-relevant variables (X_1, X_2) and type-relevant variables Z,
we generate 20 total equilibrium actions of Players 1 and 2, (Y_1, Y_2), such that the conditional means
E[Y_1 Y_2|X_1, X_2, Z], E[Y_1(1 − Y_2)|X_1, X_2, Z], E[(1 − Y_1)Y_2|X_1, X_2, Z], and E[(1 − Y_1)(1 − Y_2)|X_1, X_2, Z]
are matched with the model choice probabilities (at the true parameter values) we
derive in Appendix B. Here E[·|X_1, X_2, Z] denotes the average over simulation draws on the unobservables in the payoff functions, (ε_1, ε_2), which follow independent standard normal distributions.
In this Monte Carlo design we let X_{1,1}, X_{1,2}, X_{2,1}, X_{2,2}, Z_1, and Z_2 each be N(0, 1), and we
let X_1 = [X_{1,1}, X_{1,2}], X_2 = [X_{2,1}, X_{2,2}], and Z = [Z_1, Z_2]. In all designs below we let X_1
and X_2 be mutually independent. For the true payoff parameters we take β_1 = (1, 1)
and β_2 = (1, 1), and for the strategic interaction parameter values we take (Δ_{1s}, Δ_{1w}, Δ_{2s}, Δ_{2w}) =
(1, 1, 1, 1). Here we can estimate these interaction parameters separately for each type because
there is no intercept in the players' payoffs. Finally, we directly model the probability of Player 1 being a strong
type given her characteristics Z as p(Z) = exp(γ_0 + Zγ)/(1 + exp(γ_0 + Zγ)) with γ = (γ_1, γ_2)', and treat
this p(Z) as the model primitive. We do this because in our MC experiments we focus on whether the
conditional ML procedure can recover this function p(Z) well together with our game parameters.
We pick (γ_0, γ_1, γ_2) = (0.2, 1, 1) for the true values. We generate T = 200 and 500 observations of
(X_1, X_2, Z), respectively.
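The outcome draws can be generated from the four model choice probabilities by inverse-CDF sampling over the cells (1,1), (1,0), (0,1), (0,0); in the sketch below, model_probs stands for the Appendix B probabilities evaluated at the true parameters and is assumed rather than implemented, as is the function name.

    import numpy as np

    def simulate_actions(X1, X2, Z, theta, model_probs, rng):
        """Draw equilibrium actions (Y1, Y2) from the model outcome probabilities.

        `model_probs` returns (P11, P10, P01, P00) per observation, the
        conditional choice probabilities derived in Appendix B (treated as given).
        """
        P = np.column_stack(model_probs(X1, X2, Z, theta))      # (n, 4), rows sum to 1
        u = rng.random(len(P))
        cell = (u[:, None] > np.cumsum(P, axis=1)).sum(axis=1)  # cell index 0..3
        # cell order: (1,1), (1,0), (0,1), (0,0)
        y1 = np.isin(cell, [0, 1]).astype(int)
        y2 = np.isin(cell, [0, 2]).astype(int)
        return y1, y2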
In the first design [1] we let X_1, X_2, and Z ≠ X_1 be all mutually independent, and in the second
design [2] we let X_1 = Z, i.e., the same variables X_1 both affect the payoffs and inform about the
type of Player 1. In the third design [3] we assume there is no public signal, so p(Z) = p, and we
take p = 0.2. In the last design [4] we consider a semiparametric model where we pretend "not to
know" the functional form of p(Z). We consider the following functional form:
    P(Z) = exp(h(Z))/(1 + exp(h(Z))) with h(Z) = 0.2 Z + log(Z^2/2 + 1)/5

where Z ~ N(0, 1). In our sieve estimation we approximate the "unknown" h(Z) using a Hermite
polynomial approximation (e.g., Newey and Powell (2003)) as

    h(Z) ≈ π_0 Z + Σ_{k=1}^{K} π_k exp(−Z^2) Z^{k−1}
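The sieve terms used in Design [4] can be tabulated as follows; the function name is ours, and the column layout simply stacks the regressors of the display above, whose coefficients π are then estimated jointly with the game parameters by the conditional ML.

    import numpy as np

    def hermite_sieve(Z, K):
        """Regressors for h(Z) ~ pi_0*Z + sum_{k=1}^K pi_k * exp(-Z^2) * Z^(k-1).

        Returns the (n, K+1) matrix [Z, exp(-Z^2), exp(-Z^2)*Z, ..., exp(-Z^2)*Z^(K-1)].
        """
        Z = np.asarray(Z)
        cols = [Z] + [np.exp(-Z ** 2) * Z ** (k - 1) for k in range(1, K + 1)]
        return np.column_stack(cols)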
and we take K = 3 or 4 in our estimations. Tables 1 and 2 below summarize estimation results
based on these MC designs; we report the bias and the RMSE (root mean squared error),
averaged across 50 repetitions of each design. For the estimates of p(Z) we also average across the
realized values of Z to obtain the bias and the RMSE.
Estimation results in our Monte Carlo experiments show that the game parameters are reasonably
well recovered by the conditional ML estimation. The performance of the estimators in terms of
RMSE generally improves as the sample size grows. This is true whether or not there exist public
signals: Designs [1] and [2] vs. Design [3]. By comparing Design [1] and Design [2] we also find that the
estimators of Player 1's game parameters in Design [1] overall perform better than those in Design
[2], which suggests that the exclusion restriction does help identification because the variables that
shift payoffs and those that shift signals are different. However, this difference is not overwhelming, and in
Design [2], where Z = X_1, the game parameters are still well estimated.

In the semiparametric estimation of Design [4], where we do not know the true functional form
of p(Z), the estimators perform reasonably well, suggesting that our sieve approach can suitably approximate the true function. Comparing the semiparametric estimation results with K = 3 and K = 4,
we find the estimates are similar in both bias and RMSE, suggesting that the third-order term of the
approximation with K = 4 is less important. Although this evidence is limited in nature, overall these Monte
Carlo experiments suggest that the signaling game model in Section 2 can be identified by the
proposed conditional ML estimation method.
Table 1. Monte Carlo Experiment with T=200: Parametric Models

            Design [1]               Design [2]               Design [3]
            Public Signal Z ≠ X1     Public Signal Z = X1     No Public Signal
            Bias      RMSE           Bias      RMSE           Bias      RMSE
    β1,1    -.0161    .0461          .0022     .0426          .0058     .0466
    β1,2    -.0069    .0614          .0720     .0940          .0065     .0453
    β2,1    .0258     .0293          .0255     .0302          -.0037    .0202
    β2,2    -.0272    .0308          -.0223    .0265          .0042     .0170
    Δ1s     .0649     .0844          .0838     .0981          .0537     .1241
    Δ1w     .0408     .0727          .0242     .0594          .0599     .0724
    Δ2s     -.0098    .0215          -.0210    .0288          .0303     .0417
    Δ2w     -.0182    .0249          -.0278    .0348          -.0112    .0215
    γ0      .0023     .0140          -.0021    .0111          --        --
    γ1      .0143     .0176          .0115     .0176          --        --
    γ2      .0068     .0124          .0032     .0216          --        --
    p       --        --             --        --             -.0029    .0032
Table 2. Monte Carlo Experiment with T=500: Parametric and Semiparametric Models

            Design [1]        Design [2]        Design [3]        Design [4]: Unknown p(Z)
            Public Signal     Public Signal     No Public         K=3               K=4
            Z ≠ X1            Z = X1            Signal
            Bias     RMSE     Bias     RMSE     Bias     RMSE     Bias     RMSE     Bias      RMSE
    β1,1    -.0128   .0489    .0319    .0467    -.0252   .0490    -.0469   .0589    -.0403    .0505
    β1,2    .0183    .0441    .0193    .0551    .0256    .0449    .0507    .0635    .0469     .0581
    β2,1    .0294    .0330    .0316    .0344    .0332    .0348    .0402    .0412    .0409     .0426
    β2,2    -.0310   .0346    -.0316   .0337    -.0327   .0338    -.0410   .0416    -.0430    .0441
    Δ1s     .0582    .0650    .0794    .0884    -.0210   .0846    .0638    .0849    .0648     .0886
    Δ1w     .0328    .0591    .0643    .0810    .0320    .0551    -.0084   .0566    -.0053    .0320
    Δ2s     -.0029   .0206    -.0100   .0164    .0427    .0481    .0082    .0148    .0104     .0179
    Δ2w     -.0127   .0271    -.0051   .0248    .0251    .0289    .0008    .0140    .0022     .0175
    γ0      .0021    .0089    -.0002   .0095    --       --       --       --       --        --
    γ1      .0169    .0179    .0162    .0192    --       --       --       --       --        --
    γ2      .0170    .0188    .0080    .0119    --       --       --       --       --        --
    p       --       --       --       --       -.0027   .0031    --       --       --        --
    p(Z)    --       --       --       --       --       --       -.0118   .0232    -.0098    .0210
7 Conclusion
Though estimating game-theoretic models with strategic interactions is a growing literature, there
are very few studies on estimable signaling games. This paper develops an econometric model
of a signaling game with two players where one player, the informed party, has private information
about its type. In particular, we provide an estimation strategy that identifies the payoff structure
and the conditional probabilities of being a particular type from data on observed actions. Though
multiple equilibria arise in this game, we show that uniqueness of equilibrium given any
realization of payoffs can be achieved if players play PBE and choose, out of the multiple equilibria,
only the one that survives the refinement of Cho and Kreps (1987). This uniqueness result
enables us to derive well-defined conditional choice probabilities, and our proposed ML estimator is
built on them.

We also consider public signals about the type of a player. The uninformed player uses this
information to update her belief about types after observing an action of the informed player. Since
the distribution of these public signals is nonparametrically specified, we estimate the model using
a sieve conditional MLE where the infinite-dimensional parameters are approximated by sieves. We
obtain the consistency and the root-n asymptotic normality of the structural parameter estimates.

We note that in signaling games, multiple equilibria naturally arise due to the self-fulfilling
property. We have resolved this problem by refining the equilibrium as an equilibrium selection.
Even though an equilibrium selection in a signaling game is more acceptable than that in a simultaneous-move game, at least theoretically, it is still an open question whether such a refinement is
relevant in reality. Following the seminal work by Manski (1995) on bound analysis (see also Ciliberto and Tamer (2009), Andrews, Berry, and Jia (2004), Beresteanu, Molchanov, and Molinari (2011),
Holmes (2011), and Pakes, Porter, Ho, and Ishii (2006) for related approaches), as an attractive
alternative, we may develop a set estimation of our game model relaxing the equilibrium refinement
in future research.
Appendix

A Equilibrium Refinement and Uniqueness of Equilibrium (Proof of Theorem 3.2)
We show that in all regions where multiple PBE exist (see Figure 3) only one unique PBE survives
the Intuitive Criterion of Cho and Kreps (1987).

[1] On region Aa ≡ {(ε_1, ε_2) | ε_1 > X_1'β_1 + Δ_{1w} and X_2'β_2 + (Δ_{2s} + Δ_{2w})p(·) − Δ_{2w} < ε_2 < X_2'β_2 + Δ_{2s}},
we show that the pooling with (B, B) cannot survive the Intuitive Criterion of Cho
and Kreps (1987) while the pooling with (Q, Q) survives it.

Pooling with (A_{1ts}, A_{1tw}) = (B, B) and A_{2|B} = F:
Note u_1(t_s, B, F) > u_1(t_s, Q, NF) since X_1'β_1 − ε_1 − Δ_{1s} < 0 under Aa, and note u_1(t_s, B, F) >
u_1(t_s, Q, F) since Δ_{1s} > 0. This means that the strong type has no incentive to deviate in
any case. Thus, Player 2 will assign μ_2(t_1 = t_s|Q) = 0 and hence Player 2 will choose F after
observing the deviation play Q under region Aa. Now we need to check whether the weak type
is better off by deviating under this situation. Note u_1(t_w, B, F) = −Δ_{1w} < u_1(t_w, Q, F) = 0,
and thus the weak type will deviate for sure. Therefore, this equilibrium fails the Intuitive
Criterion.

Pooling with (A_{1ts}, A_{1tw}) = (Q, Q) and A_{2|Q} = F:
Note u_1(t_w, Q, F) > u_1(t_w, B, NF) since X_1'β_1 − ε_1 − Δ_{1w} < 0 under Aa, and note u_1(t_w, Q, F) >
u_1(t_w, B, F) since Δ_{1w} > 0. This means that the weak type has no incentive to deviate in
any case. Thus, Player 2 will assign μ_2(t_1 = t_s|B) = 1 and hence Player 2 will choose
NF after observing the deviation play B under region Aa. Now we need to check whether
the strong type is better off by deviating under this situation. Note u_1(t_s, Q, F) = −Δ_{1s} >
u_1(t_s, B, NF) = X_1'β_1 − ε_1 under S^4, and thus the strong type will not deviate. Therefore,
this equilibrium survives the Intuitive Criterion.
[2] On region Ab = {(ε_1, ε_2) | ε_1 < X_1'β_1 − Δ_{1s} and X_2'β_2 − Δ_{2w} < ε_2 < X_2'β_2 + (Δ_{2s} + Δ_{2w})p(·) − Δ_{2w}}, we
show that the pooling with (Q, Q) cannot survive the Intuitive Criterion of Cho and Kreps (1987)
while the pooling with (B, B) survives it.

Pooling with (A_{1ts}, A_{1tw}) = (Q, Q) and A_{2|Q} = NF:
Note u_1(t_w, Q, NF) > u_1(t_w, B, F) since X_1'β_1 − ε_1 + Δ_{1w} > 0 under Ab, and note u_1(t_w, Q, NF) >
u_1(t_w, B, NF) since Δ_{1w} > 0. This means that the weak type has no incentive to deviate in
any case. Thus, Player 2 will assign μ_2(t_1 = t_s|B) = 1 and hence Player 2 will choose NF
after observing the deviation play B under region Ab. Now we need to check whether the
strong type is better off by deviating under this situation. Note

    u_1(t_s, Q, NF) = X_1'β_1 − ε_1 − Δ_{1s} < u_1(t_s, B, NF) = X_1'β_1 − ε_1

since Δ_{1s} > 0. Thus, the strong type will deviate for sure. Therefore, this equilibrium fails
the Intuitive Criterion.

Pooling with (A_{1ts}, A_{1tw}) = (B, B) and A_{2|B} = NF:
Note u_1(t_s, B, NF) > u_1(t_s, Q, NF) since Δ_{1s} > 0, and note u_1(t_s, B, NF) > u_1(t_s, Q, F)
under Ab. This means that the strong type has no incentive to deviate in any case. Thus,
Player 2 will assign μ_2(t_1 = t_s|Q) = 0 and hence Player 2 will choose F after observing the
deviation play Q under region Ab. Now we need to check whether the weak type is better off
by deviating under this situation. Note

    u_1(t_w, B, NF) = X_1'β_1 − ε_1 − Δ_{1w} > u_1(t_w, Q, F) = 0

under region S^5. Thus, the weak type will not deviate. Therefore, this equilibrium survives
the Intuitive Criterion.
[3] The separating equilibrium (B, Q) with (A_{2|B} = NF, A_{2|Q} = F) under region S^3 cannot fail
the Intuitive Criterion because neither type of Player 1 wants to deviate regardless of Player 2's action.
B Conditional Probabilities of Observed Outcomes

In the Supplementary Appendix, we derive all possible PBE of the signaling game. Then, applying
the refinement of Cho and Kreps (1987), we achieve uniqueness of equilibrium. This enables us to
derive the conditional probabilities of observed outcomes given the parameterization of the signaling
game. Here we summarize the conditional probabilities in the game with IS-A and SA2. For the
game with IS and SA, we obtain the same forms of conditional probabilities but with p(Z) replaced
by p and W replaced by X.

For the conditional ML estimation using these model probabilities we need to normalize either
(β_{1,0} = 0, β_{2,0} = 0) or (Δ_{1s} = Δ_{1w}, Δ_{2s} = Δ_{2w}) below.
1. (Y_1 = 1, Y_2 = 1), i.e., (B, NF): It occurs under S^5 with probability one (pooling), under
S^3 ∪ S^7 with probability p(Z) (separating), and under S^2 ∪ S^6 (semi-separating). Combining the
measures of these regions, we obtain

    Pr(Y_1 = 1, Y_2 = 1 | W, α)
    = G_1(X_1'β_1 − Δ_{1w}) [ G_2(X_2'β_2 + p(Z)(Δ_{2s} + Δ_{2w}) − Δ_{2w}) − G_2(X_2'β_2 − Δ_{2w}) ]
    + p(·) [ G_1(X_1'β_1 + Δ_{1s}) − G_1(X_1'β_1 − Δ_{1w}) ] [ G_2(X_2'β_2 + Δ_{2s}) − G_2(X_2'β_2 − Δ_{2w}) ]
    + p(·) G_2(X_2'β_2 − Δ_{2w})
    + ∫_0^1 (p(·) + (1−p(·))τ) g_2( X_2'β_2 + [p(·)/(p(·)+(1−p(·))τ)](Δ_{2s}+Δ_{2w}) − Δ_{2w} ) · [p(·)(1−p(·))(Δ_{2s}+Δ_{2w}) / (p(·)+(1−p(·))τ)^2] dτ
        × ∫_0^1 ε̃_2 g_1(X_1'β_1 − Δ_{1w}/ε̃_2) (Δ_{1w}/ε̃_2^2) dε̃_2
    + ∫_0^1 (1−τ)p(·) g_2( X_2'β_2 + [(1−τ)p(·)/((1−τ)p(·)+(1−p(·)))](Δ_{2s}+Δ_{2w}) − Δ_{2w} ) · [p(·)(1−p(·))(Δ_{2s}+Δ_{2w}) / ((1−τ)p(·)+(1−p(·)))^2] dτ
        × ∫_0^1 ε̃_2 g_1(X_1'β_1 + Δ_{1s}/(1−ε̃_2)) (Δ_{1s}/(1−ε̃_2)^2) dε̃_2.
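Each semi-separating term above is a one-dimensional integral over a mixing probability on [0, 1] and can be evaluated by standard quadrature. The sketch below is generic: the callables weight, mean_shift, and jac, which encode the pieces of a particular term, are placeholders for the expressions displayed above, and the function name is ours.

    import numpy as np

    def semisep_term(weight, mean_shift, jac, g2, nodes=64):
        """Evaluate int_0^1 weight(t) * g2(mean_shift(t)) * jac(t) dt
        by Gauss-Legendre quadrature on [0, 1]."""
        x, w = np.polynomial.legendre.leggauss(nodes)
        t = 0.5 * (x + 1.0)                 # map [-1, 1] -> [0, 1]
        vals = weight(t) * g2(mean_shift(t)) * jac(t)
        return 0.5 * np.dot(w, vals)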
2. (Y_1 = 1, Y_2 = 0), i.e., (B, F): It occurs under S^1 with probability p(Z) (separating) and under
S^2 ∪ S^6 (semi-separating). We obtain

    Pr(Y_1 = 1, Y_2 = 0 | W, α)
    = p(·) (1 − G_2(X_2'β_2 + Δ_{2s}))
    + ∫_0^1 (p(·) + (1−p(·))τ) g_2( X_2'β_2 + [p(·)/(p(·)+(1−p(·))τ)](Δ_{2s}+Δ_{2w}) − Δ_{2w} ) · [p(·)(1−p(·))(Δ_{2s}+Δ_{2w}) / (p(·)+(1−p(·))τ)^2] dτ
        × ∫_0^1 (1−ε̃_2) g_1(X_1'β_1 − Δ_{1w}/ε̃_2) (Δ_{1w}/ε̃_2^2) dε̃_2
    + ∫_0^1 (1−τ)p(·) g_2( X_2'β_2 + [(1−τ)p(·)/((1−τ)p(·)+(1−p(·)))](Δ_{2s}+Δ_{2w}) − Δ_{2w} ) · [p(·)(1−p(·))(Δ_{2s}+Δ_{2w}) / ((1−τ)p(·)+(1−p(·)))^2] dτ
        × ∫_0^1 (1−ε̃_2) g_1(X_1'β_1 + Δ_{1s}/(1−ε̃_2)) (Δ_{1s}/(1−ε̃_2)^2) dε̃_2.
3. (Y_1 = 0, Y_2 = 1), i.e., (Q, NF): It occurs under S^7 with probability 1 − p(Z) (separating) and
under S^2 ∪ S^6 (semi-separating). We obtain

    Pr(Y_1 = 0, Y_2 = 1 | W, α)
    = (1 − p(·)) G_2(X_2'β_2 − Δ_{2w})
    + ∫_0^1 (1−p(·))(1−τ) g_2( X_2'β_2 + [p(·)/(p(·)+(1−p(·))τ)](Δ_{2s}+Δ_{2w}) − Δ_{2w} ) · [p(·)(1−p(·))(Δ_{2s}+Δ_{2w}) / (p(·)+(1−p(·))τ)^2] dτ
        × ∫_0^1 ε̃_2 g_1(X_1'β_1 − Δ_{1w}/ε̃_2) (Δ_{1w}/ε̃_2^2) dε̃_2
    + ∫_0^1 (1−p(·)) g_2( X_2'β_2 + [(1−τ)p(·)/((1−τ)p(·)+(1−p(·)))](Δ_{2s}+Δ_{2w}) − Δ_{2w} ) · [p(·)(1−p(·))(Δ_{2s}+Δ_{2w}) / ((1−τ)p(·)+(1−p(·)))^2] dτ
        × ∫_0^1 ε̃_2 g_1(X_1'β_1 + Δ_{1s}/(1−ε̃_2)) (Δ_{1s}/(1−ε̃_2)^2) dε̃_2.
4. (Y_1 = 0, Y_2 = 0), i.e., (Q, F): It occurs under S^4 with probability one (pooling), under S^1 ∪ S^3
with probability 1 − p(Z) (separating), and under S^2 ∪ S^6 (semi-separating). We obtain

    Pr(Y_1 = 0, Y_2 = 0 | W, α)
    = (1 − G_1(X_1'β_1 + Δ_{1s})) [ G_2(X_2'β_2 + Δ_{2s}) − G_2(X_2'β_2 + p(·)(Δ_{2s}+Δ_{2w}) − Δ_{2w}) ]
    + (1 − p(·)) (1 − G_2(X_2'β_2 + Δ_{2s}))
    + (1 − p(·)) [ G_1(X_1'β_1 + Δ_{1s}) − G_1(X_1'β_1 − Δ_{1w}) ] [ G_2(X_2'β_2 + Δ_{2s}) − G_2(X_2'β_2 − Δ_{2w}) ]
    + ∫_0^1 (1−p(·))(1−τ) g_2( X_2'β_2 + [p(·)/(p(·)+(1−p(·))τ)](Δ_{2s}+Δ_{2w}) − Δ_{2w} ) · [p(·)(1−p(·))(Δ_{2s}+Δ_{2w}) / (p(·)+(1−p(·))τ)^2] dτ
        × ∫_0^1 (1−ε̃_2) g_1(X_1'β_1 − Δ_{1w}/ε̃_2) (Δ_{1w}/ε̃_2^2) dε̃_2
    + ∫_0^1 (1−p(·)) g_2( X_2'β_2 + [(1−τ)p(·)/((1−τ)p(·)+(1−p(·)))](Δ_{2s}+Δ_{2w}) − Δ_{2w} ) · [p(·)(1−p(·))(Δ_{2s}+Δ_{2w}) / ((1−τ)p(·)+(1−p(·)))^2] dτ
        × ∫_0^1 (1−ε̃_2) g_1(X_1'β_1 + Δ_{1s}/(1−ε̃_2)) (Δ_{1s}/(1−ε̃_2)^2) dε̃_2.

C Mathematical Proofs for the Sieve Conditional ML

C.1 Identification (Proof of Lemma 5.1)
We let L(Y, W, α) = exp(l(Y|W; α)) and note

    L(Y, W, α)/L(Y, W, α_0) ∈ { Pr(Y = y|W; α)/Pr(Y = y|W; α_0) : y ∈ {(1,1), (1,0), (0,1), (0,0)} }

where Pr(Y|W; α) denotes the conditional probability of Y given W = X ∪ Z when the parameter
equals α. Applying Jensen's inequality, we have

    E[ ln{L(Y, W, α)/L(Y, W, α_0)} ] < ln{ E[L(Y, W, α)/L(Y, W, α_0)] }                (30)

noting that L(Y, W, α)/L(Y, W, α_0) is always positive and not constant whenever Pr(Y = y|W; α) ≠
Pr(Y = y|W; α_0) for some y, by Assumption ID. We also have

    Pr( L(Y, W, α)/L(Y, W, α_0) = Pr(Y = y|W; α)/Pr(Y = y|W; α_0) ) = Pr(Y = y|W; α_0),

for each y ∈ {(1,1), (1,0), (0,1), (0,0)}, under Assumptions IS-A and SA2. It follows that

    E[ L(Y, W, α)/L(Y, W, α_0) ] = ∫ { Σ_y [Pr(Y = y|W; α)/Pr(Y = y|W; α_0)] Pr(Y = y|W; α_0) } f_W(W) dW
                                 = ∫ { Σ_y Pr(Y = y|W; α) } f_W(W) dW = 1                (31)

where the last equality holds since Σ_y Pr(Y = y|W; α) = 1 for all α ∈ A. Therefore, combining
(30) and (31), we conclude that for all α ≠ α_0 ∈ A, 0 < E[ln L(Y, W, α_0)] − E[ln L(Y, W, α)]. This
implies Q(α_0) > Q(α) and thus proves the claim.
C.2 Consistency

To prove the consistency, we need additional assumptions. Recall that L_ij(W; α) denotes the
conditional probability of an observed outcome, L_ij(W; α) ≡ Pr(Y_1 = i, Y_2 = j|W; α) for
i, j = 0, 1.

Condition 1 (Lipschitz Condition)
(i) For α_1, α_2 ∈ A_n, there exist functions M_ij^(β1)(·), M_ij^(β2)(·), M_ij^(Δ1)(·), M_ij^(Δ2)(·), and M_ij^(h)(·) such that

    dL_ij(W; α̃)/dα [α_1 − α_2] = M_ij^(β1)(W; α̃) X_1'(β_1^(1) − β_1^(2)) + M_ij^(β2)(W; α̃) X_2'(β_2^(1) − β_2^(2))
                                + M_ij^(Δ1)(W; α̃)'(Δ_1^(1) − Δ_1^(2)) + M_ij^(Δ2)(W; α̃)'(Δ_2^(1) − Δ_2^(2)) + M_ij^(h)(W; α̃)(h_1 − h_2)

for some convex combination α̃ of α_1 and α_2 and for all i, j = 0, 1 (where the superscripts (1) and (2) denote the corresponding components of α_1 and α_2);
(ii) sup_{α∈A_n} |M_ij^(t)(W; α)| ≤ C_t(W) < ∞, for all t ∈ {β_1, β_2, Δ_1, Δ_2, h}, all i, j = 0, 1, and all n ≥ N.
This condition is not difficult to verify, as we discuss in Appendix E. We reproduce Theorem 3.1
in Chen (2007) for the sieve conditional ML estimator defined in (18).

Theorem C.1 (Theorem 3.1 in Chen (2007))
Suppose (C1) Q(α) is uniquely maximized on A at α_0 ∈ A, and Q(α_0) > −∞;
(C2) A_n ⊆ A_{n+1} ⊆ A for all n ≥ 1, and for any α ∈ A there exists π_n α ∈ A_n such that
‖α − π_n α‖_s → 0 as n → ∞;
(C3) The criterion function Q(α) is continuous in α ∈ A with respect to ‖·‖_s;
(C4) The sieve spaces A_n are compact under ‖·‖_s;
(C5) plim_{n→∞} sup_{α∈A_n} |Q̂_n(α) − Q(α)| = 0 holds.
Let α̂_n be the approximate sieve ML estimator defined by (18); then ‖α̂_n − α_0‖_s = o_p(1).
Below we verify conditions (C1)-(C5) and thus prove the consistency. Remarks (1)-(4) after
Theorem 3.1 in Chen (2007) apply here. Condition (C1) is satisfied by Lemma
5.1. Condition (C2) holds for the sieve space A_n = Θ × H_n with H_n defined in (15) (see Section 5.3.2 of Timan (1963)). Condition (C3) is satisfied since each L_ij, i, j = 0, 1, is (pointwise)
Lipschitz continuous by Condition 1. Condition (C4) holds for the sieve space (15). Now let
F_n = {l(y|w; θ, h) : (θ, h) ∈ A_n} denote the class of measurable functions indexed by (θ, h). Condition (C5) will be satisfied, for example, if F_n is P-Glivenko-Cantelli (see, e.g., van der Vaart and
Wellner (1996)). The following lemma establishes this uniform convergence result.
Lemma C.1 (Uniform convergence over sieves)
Suppose Assumptions ID and SA3 hold. Then, for Q̂_n(α) and Q(α) defined in (17) and (19),
respectively, and for A_n = Θ × H_n with H_n defined in (15), we have plim_{n→∞} sup_{α∈A_n} |Q̂_n(α) − Q(α)| = 0.

Proof. We prove this lemma by showing that all the conditions (i), (ii), and (iii) of Lemma A2
in Newey and Powell (2003) are satisfied. Condition (i) is satisfied for A_n = Θ × H_n with H_n
defined in (15) and for the metric ‖·‖_s. Condition (ii) is satisfied if E[|l(Y_i|W_i; θ, h(Z_i))|] < ∞
for all (θ, h) ∈ A_n, by the law of large numbers. This holds since L_ij(W; θ, h),
for all i, j = 0, 1, is uniformly bounded between 0 and 1 over A_n and because, under Assumption SA2,

    Pr( L_ij(W; θ, h) = 0 or L_ij(W; θ, h) = 1 for W ∈ W ) = 0 for all i, j = 0, 1 and all (θ, h) ∈ A_n,

where Pr(·) is a probability measure over W. Therefore, condition (ii) holds. Condition
(iii) is also satisfied since Q̂_n(α) is Lipschitz with respect to α by Condition 1. This completes the
proof.
Therefore, under Assumptions ID, SA2, and SA3 and Condition 1, all the conditions in Theorem C.1 are satisfied for α̂_n. This establishes the consistency of the sieve ML estimator
defined in (18).
C.3 Convergence Rate
In this section and the next section, we use the following notation. For any α_1, α_2, α_3 ∈ A, the
pathwise first derivatives are defined as

    dl(y|w; α_0)/dα [α_1 − α_2] ≡ dl(y|w; α_0)/dα [α_1 − α_0] − dl(y|w; α_0)/dα [α_2 − α_0]

and

    dl(y|w; α_3)/dα [α_1 − α_0] ≡ lim_{τ→0} [ l(y|w; α_3 + τ(α_1 − α_0)) − l(y|w; α_3) ] / τ.

The pathwise second derivatives are defined in a similar way such that

    d²l(y|w; α_3)/dα² [α_1 − α_0, α_2 − α_0] ≡ lim_{τ→0} [ dl(y|w; α_3 + τ(α_2 − α_0))/dα [α_1 − α_0] − dl(y|w; α_3)/dα [α_1 − α_0] ] / τ.

In particular, the pathwise second derivative in the direction [α − α_0] evaluated at α_0 is denoted
by d²l(y|w; α_0)/dα² [α − α_0, α − α_0] ≡ lim_{τ→0} d²l(y|w; α_0 + τ(α − α_0))/dτ². Note that we have (information matrix
equality)

    E[ dl(Y|W; α_0)/dα [α_1 − α_0] · dl(Y|W; α_0)/dα [α_2 − α_0] ] = −E[ d²l(Y|W; α_0)/dα² [α_1 − α_0, α_2 − α_0] ]        (32)

by Wong and Severini (1991).
Here we present a convergence rate of the sieve ML estimator defined in (18) under the metric
‖·‖_2. We use Theorem 3.2 of Chen (2007), which is a version of Chen and Shen (1996) for i.i.d. data.
Before stating Theorem 3.2 of Chen (2007) and proving Proposition 5.1, we introduce additional
notation.

Now let K(α_0, α) = n^{-1} Σ_{i=1}^n E[l(y_i|w_i; α_0) − l(y_i|w_i; α)] denote the Kullback-Leibler information (divergence measure) on n observations. Let ‖·‖_a be a metric on A which is equivalent to
K(·, ·)^{1/2}. The equivalence means there exist constants C_1 and C_2 > 0 such that C_1 K(α_0, α)^{1/2} ≤
‖α_0 − α‖_a ≤ C_2 K(α_0, α)^{1/2} for all α ∈ A. It is well known in the literature that the convergence
rate depends on two factors. One is how fast the sieve approximation error rate, ‖α_0 − π_n α_0‖_a,
goes to zero. The other is the complexity of the sieve space A_n. Let L_r(P_0), r ∈ [1, ∞), denote the space of real-valued random variables with finite r-th moments and ‖·‖_{L_r} denote the
L_r(P_0)-norm. Let F_n = {g(·; α) : α ∈ A_n} be a class of real-valued, L_r(P_0)-measurable functions
indexed by α ∈ A_n. We let N(δ, F_n, ‖·‖_{L_r}) denote the covering numbers without bracketing, which
is the minimal number N of δ-balls { f : ‖f − g_j‖_{L_r} ≤ δ }, ‖g_j‖_{L_r} < ∞, j = 1, ..., N, that cover
F_n. We often use the notion of L_r(P_0)-metric entropy, H(δ, F_n, ‖·‖_{L_r}) ≡ log N(δ, F_n, ‖·‖_{L_r}),
as a measure of the complexity of F_n since the covering numbers can grow very fast. The
second notion of complexity of the class F_n is the covering numbers with bracketing. Let C_{L_r}
be the completion of F_n under the norm ‖·‖_{L_r}. The L_r(P_0)-covering number with bracketing, denoted N_{[]}(δ, F_n, ‖·‖_{L_r}), is the minimal number N for which there exist δ-brackets
{ [l_j, u_j] : l_j, u_j ∈ C_{L_r}, max_{1≤j≤N} ‖l_j − u_j‖_{L_r} ≤ δ, ‖l_j‖_{L_r} < ∞, ‖u_j‖_{L_r} < ∞, j = 1, ..., N } to cover F_n (i.e.,
for each g ∈ F_n, there exists a j = j(g) ∈ {1, ..., N} such that l_j ≤ g ≤ u_j a.e.-P_0). Similarly we let
H_{[]}(δ, F_n, ‖·‖_{L_r}) ≡ log N_{[]}(δ, F_n, ‖·‖_{L_r}), which is called the L_r(P_0)-metric entropy with bracketing
of the class F_n. Pollard (1984), Andrews (1994), van der Vaart and Wellner (1996), and van de
Geer (2000) provide more detailed discussions of metric entropy. We also use the simplified notation
b_{1n} ≍ b_{2n} for two sequences of positive numbers b_{1n} and b_{2n} when the ratio b_{1n}/b_{2n} is
bounded below and above by some positive constants.
Now let F_n = { l(y|w; α) − l(y|w; α_0) : ‖α − α_0‖_a ≤ δ, α ∈ A_n } and, for some constant b > 0, let

    δ_n = inf{ δ ∈ (0, 1) : (√n δ²)^{-1} ∫_{bδ²}^{δ} √( H_{[]}(ε, F_n, ‖·‖_{L_2}) ) dε ≤ const }.        (33)

Note that δ_n depends not only on the smoothness of l(·|·; α) but also on the complexity
of the sieve A_n. Now we present Theorem 3.2 in Chen (2007) tailored to our estimator.
of the sieve An . Now we present Theorem 3.2 in Chen (2007) tailored to our estimator.
Theorem C.2 (Theorem 3.2 in Chen (2007))
Suppose that (CC1) The data fY1i ; Y2i ; Wi gni=1 are iid; (CC2) There is a C1 > 0 such that for all
V ar (l(Yi jWi ; ) l(Yi jWi ; 0 )) C1 2 ;
small > 0, sup 2An ;k
0 k2
(CC3) For any > 0, there exists a constant 2 (0; 2) such that
sup 2An ;k
jl(Yi jWi ; ) l(Yi jWi ; 0 )j
U (Wi ) with E jU (Wi )jt
C2 for some t
2.
0 k2
Let b n be the approximate sieve ML de…ned in (18).
Then, kb n
0 k2 = Op ( n ) with n = maxf n ; k n 0
0 k2 g.
Note that δ_n increases with the complexity of the sieve A_n, so it can be interpreted as a
standard-deviation term, while the deterministic approximation error ‖π_n α_0 − α_0‖_2 can be
interpreted as a bias term since it decreases with the complexity of the sieve A_n. Now
we prove Proposition 5.1 by showing that the conditions in Theorem C.2 hold under Assumptions ID,
SA2, and SA3. We impose the following conditions.
Condition 2 (i) sup_{α∈A_n: ‖α−α_0‖_s=o(1)} |M_ij^(t)(W; α)| ≤ C_{11}(W) < ∞, for all t ∈ {β_1, β_2, Δ_1, Δ_2, h} and all i, j = 0, 1;
(ii) inf_{α∈A_n: ‖α−α_0‖_s=o(1)} |M_ij^(t)(W; α)| ≥ C_{12}(W) > 0, for all t ∈ {β_1, β_2, Δ_1, Δ_2, h} and all i, j = 0, 1.

Condition 3 sup_{α∈A_n: ‖α−α_0‖_s=o(1)} |M_ij^(t)(W; α) − M_ij^(t)(W; α_0)| = C_2(W) ‖α − α_0‖_s with C_2(W) < ∞,
for all t ∈ {β_1, β_2, Δ_1, Δ_2, h} and all i, j = 0, 1.

These conditions are not difficult to verify, as we discuss in Appendix E. They are
sufficient to verify conditions (CC2) and (CC3), as we show below.
C.3.1 Proof of Proposition 5.1
We prove Proposition 5.1 by verifying the conditions in Theorem C.2. Condition (CC1) is directly
assumed in Assumption SA3 (i). Condition 1 implies that the pathwise derivative of l(·|·; α) is well-defined. Conditions 1 and 2 imply that l(·|·; α) satisfies a Lipschitz condition in α ∈ A. Condition
3 provides some regularity on the difference of the pathwise derivatives of l(·|·; α). We
first verify condition (CC2). Using the mean value theorem, we have l(y|w; α) − l(y|w; α_0) =
dl(y|w; α̃)/dα [α − α_0] where α̃ lies between α and α_0. It follows that

    E[ (l(Y|W; α) − l(Y|W; α_0))² ] = E[ ( dl(Y|W; α̃)/dα [α − α_0] )² ]

from which we conclude E[ (l(Y|W; α) − l(Y|W; α_0))² ] ≲ ‖α − α_0‖_2², since Conditions 1 and 2-3
imply that

    | dl(y|w; α̃)/dα [α − α_0] − dl(y|w; α_0)/dα [α − α_0] | = O(‖α − α_0‖_s²) with ‖α − α_0‖_s = o(1).

Thus, condition (CC2) holds immediately.
Now we verify condition (CC3). Note that if 0 < C ≤ a, b, then |ln a − ln b| ≤ |a − b|/C. Because L_ij(W; α), for all i, j = 0, 1, is bounded away from zero and one for all α ∈ {α ∈ A_n : ‖α − α_0‖_2 ≤ ε}
for all small ε > 0, we have

    |ln L_ij(W; α) − ln L_ij(W; α_0)| ≤ |L_ij(W; α) − L_ij(W; α_0)| / C for all i, j = 0, 1.

This fact and the fact that L_ij(W; α) satisfies the Lipschitz condition for all i, j = 0, 1 (Conditions
1 and 2) imply that

    |l(Y|W; α) − l(Y|W; α_0)| ≤ C(W) ‖α − α_0‖_s                        (34)

with E[C(W)²] < ∞. By Theorem 1 of Gabushin (1967) (for integer γ_1 > 0) or Lemma 2 in Chen
and Shen (1998) (for any positive γ_1 > 0), we have

    ‖α − α_0‖_s ≤ C_1 ‖α − α_0‖_2^{2γ_1/(2γ_1+d_z)}.                        (35)

From (34) and (35), we see that condition (CC3) is satisfied with s = (2γ_1/d_z)/(2γ_1/d_z + 1) and
U(W) = C_1 C(W). We have verified all the conditions in Theorem C.2.
The next step is to derive the convergence rate depending on the choice of sieves. For the sieve
A_n = Θ × H_n with H_n defined in (15), we have

    ‖π_n α_0 − α_0‖_s = O(K_n^{−γ_1/d_z})                        (36)

by Lorentz (1986). Now we need to calculate the δ_n that solves (33). Let C̄ = √(E[|U(W)|²]) and
u_h ≡ sup_{h∈H_n} ‖h‖_∞; then for all 0 < δ < 1, we have

    H_{[]}(δ, F_n, ‖·‖_{L_2}) ≤ log N(δ/C̄, H_n, ‖·‖_∞) ≤ const × K_n log(1 + 4u_h/δ)

by Lemma 2.5 in van de Geer (2000). Using this result, we obtain

    (√n δ_n²)^{-1} ∫_{bδ_n²}^{δ_n} √( H_{[]}(ε, F_n, ‖·‖_{L_2}) ) dε
        ≤ C (√n δ_n²)^{-1} √(K_n) ∫_{bδ_n²}^{δ_n} √( log(1 + 4u_h/ε) ) dε
        ≤ C (√n δ_n²)^{-1} √(K_n) δ_n ≤ const.                        (37)

From the last inequality of (37), we obtain δ_n ≍ √(K_n/n). We then complete the proof by combining
this result with (36). Finally, by letting δ_n ≍ ‖π_n α_0 − α_0‖_s, we obtain the optimal
rate with K_n ≍ n^{1/(2γ_1/d_z+1)} = n^{d_z/(2γ_1+d_z)}.
C.4 Asymptotic Normality

We derive the asymptotic normality of the structural parameter estimates using Theorem 4.3 in
Chen (2007). Below, let ε_n denote any sequence satisfying ε_n = o(n^{-1/2}) and let μ_n(g(Y, W)) =
n^{-1} Σ_{i=1}^n { g(y_i, w_i) − E[g(·)] } denote the empirical process indexed by the function g. The following
theorem shows that the plug-in sieve conditional ML estimator f(α̂_n) achieves the √n-asymptotic
normality.
Theorem C.3 (Theorem 4.3 in Chen (2007))
Suppose that
(AN1) (i) there is an ω > 0 such that |f(α) − f(α_0) − df(α_0)/dα [α − α_0]| = O(‖α − α_0‖_2^ω) uniformly
over α ∈ A_n with ‖α − α_0‖_2 = o(1), and ‖α̂_n − α_0‖_2^ω = o_p(n^{-1/2}); (ii) ‖df(α_0)/dα‖ < ∞; (iii) there is a
π_n v* ∈ A_n − {α_0} such that ‖π_n v* − v*‖_2 × ‖α̂_n − α_0‖_2 = o_p(n^{-1/2});
(AN2) sup_{α∈A_n: ‖α−α_0‖_2≤ε_n} μ_n( dl(Y|W; α)/dα [π_n v*] − dl(Y|W; α_0)/dα [π_n v*] ) = O_p(ε_n²);
(AN3) K(α_0, α̂_n) − K(α_0, α̂_n ± ε_n π_n v*) = ∓ε_n ⟨α̂_n − α_0, π_n v*⟩ + o_p(n^{-1});
(AN4) (i) μ_n( dl(Y|W; α_0)/dα [π_n v* − v*] ) = o_p(n^{-1/2}); (ii) E[ dl(Y|W; α_0)/dα [π_n v*] ] = o(n^{-1/2});
(AN5) n^{1/2} μ_n( dl(Y|W; α_0)/dα [v*] ) →_d N(0, σ_{v*}²) for i.i.d. data, with σ_{v*}² ≡ Var( dl(Y|W; α_0)/dα [v*] ) > 0.
Then, for the sieve ML estimator α̂_n given in (18), we have √n (f(α̂_n) − f(α_0)) →_d N(0, σ_{v*}²).
C.4.1 Proof of Theorem 5.2
Note that df(α_0)/dα [α − α_0] = (θ − θ_0)'λ, which implies that f(α) − f(α_0) − df(α_0)/dα [α − α_0] = 0;
hence condition (AN1) (i) holds with ω = 1. In addition, note

    sup_{0≠α−α_0∈V̄} |f(α) − f(α_0)|² / ‖α − α_0‖_2²
    = sup_{0≠(θ−θ_0, b)} [ (θ − θ_0)'λ ]² / E[ ( ( ∂l(Y|W; α_0)/∂θ' − dl(Y|W; α_0)/dh [b] ) (θ − θ_0) )² ]
    = λ' ( E[D_{b*}(Y, W)'D_{b*}(Y, W)] )^{-1} λ
which implies that f(α) = λ'θ is bounded if and only if E[D_{b*}(Y, W)'D_{b*}(Y, W)] is finite positive
definite, in which case we have v* ∈ V̄ such that

    f(α) − f(α_0) = λ'(θ − θ_0) = ⟨v*, α − α_0⟩ for all α ∈ A                (38)

by (23) and the Riesz representation theorem. v* ≡ (v*_θ, v*_h) ∈ V̄ satisfies (38) with

    v*_θ = ( E[D_{b*}(Y, W)'D_{b*}(Y, W)] )^{-1} λ and v*_h = −b* · v*_θ.
Thus, condition (AN1) (ii) is satisfied by Assumption SA4 (ii). Assumption SA4 (iii) implies
that we can find π_n v* ∈ A_n = Θ × H_n with H_n defined in (15) such that ‖π_n v* − v*‖_s = O(n^{-1/4}).
Combining this with the condition ‖α̂_n − α_0‖_2 = o_p(n^{-1/4}) supported by Proposition 5.1, we obtain
‖π_n v* − v*‖_2 × ‖α̂_n − α_0‖_2 = o_p(n^{-1/2}) with γ_1/d_z > 1/2. This satisfies condition (AN1) (iii).
Next, we verify condition (AN3) (we verify (AN2) later). Note that we have

    E[ dl(Y|W; α_0)/dα [α − α_0] ] = 0                        (39)

for any α − α_0 (it does not need to be in V̄) because (i) the directional derivative of l(Y|W; ·)
at α_0 is well-defined and (ii) the maximization is unconstrained (see Shen (1997)). This is the
zero expectation of the score function (as in parametric ML). We can show (39) as follows. Denote
L(Y, W, α) = exp(l(Y|W; α)). Then

    E[ dl(Y|W; α_0)/dα [α − α_0] ] = E[ ( dL(Y, W, α_0)/dα [α − α_0] ) / L(Y, W, α_0) ]
    = ∫ Σ_y ( dL(Y = y, W, α_0)/dα [α − α_0] / L(Y = y, W, α_0) ) L(Y = y, W, α_0) f_W dW
    = lim_{τ→0} (d/dτ) ∫ Σ_y L(Y = y, W, α_0 + τ(α − α_0)) f_W dW = lim_{τ→0} (d/dτ) ∫ f_W dW = 0

where the second equality holds by construction since L(Y = y, W, α_0) = Pr(Y = y|W; α_0) for all
y ∈ {(1,1), (1,0), (0,1), (0,0)}, the third equality holds by the interchangeability of integral and
derivative and by the definition of the directional derivative, the fourth equality holds since Σ_y L(Y =
y, W, α_0 + τ(α − α_0)) = 1 regardless of τ and α, and the last result holds since ∫ f_W(W) dW = 1.
From (39), it follows that

    (i) E[ dl(Y|W; α_0)/dα [v*] ] = 0 and (ii) E[ dl(Y|W; α_0)/dα [π_n v*] ] = 0.        (40)
We also need the following results to verify condition (AN3). For α_3 ∈ A_n such that ‖α_3 − α_0‖_2 ≤ ε_n
and for α_1, α_2 ∈ A_n − {α_0}, note that

    | E[ dl(Y|W; α_3)/dα [α_1] · dl(Y|W; α_3)/dα [α_2] ] − E[ dl(Y|W; α_0)/dα [α_1] · dl(Y|W; α_0)/dα [α_2] ] |
    ≤ | E[ ( dl(Y|W; α_3)/dα [α_1] − dl(Y|W; α_0)/dα [α_1] ) dl(Y|W; α_3)/dα [α_2] ] |
      + | E[ dl(Y|W; α_0)/dα [α_1] ( dl(Y|W; α_3)/dα [α_2] − dl(Y|W; α_0)/dα [α_2] ) ] |
    ≤ C_1 ‖α_3 − α_0‖_s ‖α_2‖_s ‖α_1‖_s ≤ C_1 ( ‖α_3 − α_0‖_2 ‖α_2‖_2 ‖α_1‖_2 )^{(2γ_1/d_z)/(2γ_1/d_z+1)}

where the first inequality uses the triangle inequality, the second inequality holds by Conditions
2-3, and the last result holds by Theorem 1 of Gabushin (1967) (when γ_1/d_z is an integer) or by
Lemma 2 in Chen and Shen (1998) (for any γ_1/d_z > 0). Thus, condition A4 (i) in Wong and
Severini (1991) holds with γ_1/d_z > 1. Note also that

    | E[ d²l(Y|W; α_3)/dα² [α_1, α_2] ] − E[ d²l(Y|W; α_0)/dα² [α_1, α_2] ] |
    = | E[ ( d²l(Y|W; α_3)/dα² − d²l(Y|W; α_0)/dα² ) [α_1, α_2] ] |
    ≤ C_1 ( ‖α_3 − α_0‖_2 ‖α_2‖_2 ‖α_1‖_2 )^{(2γ_1/d_z)/(2γ_1/d_z+1)}                (41)

and hence condition A4 (ii) in Wong and Severini (1991) holds with γ_1/d_z > 1.
Now consider

    K(α_0, α̂_n) − K(α_0, α̂_n ± ε_n π_n v*) = E[ l(Y|W; α̂_n ± ε_n π_n v*) − l(Y|W; α̂_n) ]
    = E[ dl(Y|W; α̂_n + τ̃_n(±ε_n π_n v*))/dα [±ε_n π_n v*] ]
    = E[ dl(Y|W; α_0)/dα [±ε_n π_n v*] ] + E[ d²l(Y|W; α_0)/dα² [α̂_n − α_0, ±ε_n π_n v*] ] + o(n^{-1})
    = ∓ε_n ⟨α̂_n − α_0, π_n v*⟩ + o(n^{-1})

where the first equality holds by the definition of K(·, ·), the second equality uses the mean value
theorem with τ̃_n = o(n^{-1/2}), the third equality is obtained using the Taylor expansion and (41) with
τ̃_n, ε_n = o(n^{-1/2}), the fourth equality uses E[ dl(Y|W; α_0)/dα [±ε_n π_n v*] ] = ±ε_n E[ dl(Y|W; α_0)/dα [π_n v*] ] = 0
by (40) (ii), and the last result follows from the fact that ⟨α_1, α_2⟩ = −E[ d²l(Y|W; α_0)/dα² [α_1, α_2] ]
from (32) (see Wong and Severini (1991)). Therefore, condition (AN3) holds since ε_n is arbitrary.
We turn to (AN4). Note that condition (AN4) (ii) immediately holds since E[ dl(Y|W; α_0)/dα [π_n v*] ] = 0
by (40) (ii). Define

    M^(h)(Y, W; α_0) = { Y_1 Y_2 M_11^(h)(W; α_0)/L_11 + Y_1(1 − Y_2) M_10^(h)(W; α_0)/L_10
                         + (1 − Y_1) Y_2 M_01^(h)(W; α_0)/L_01 + (1 − Y_1)(1 − Y_2) M_00^(h)(W; α_0)/L_00 }.        (42)

From this, it follows that

    (1/n) Σ_{i=1}^n dl(y_i|w_i; α_0)/dα [π_n v* − v*] = (1/n) Σ_{i=1}^n M^(h)(y_i, w_i; α_0) ( π_n v*_h(z_i) − v*_h(z_i) )

and thus (AN4) (i) holds by the Chebyshev inequality from 0 = E[ dl(Y|W; α_0)/dα [π_n v* − v*] ] (due to
(40) (i) and (ii)) and from ‖π_n v*_h − v*_h‖_∞ = o(n^{-1/4}) by Assumption SA4 (iii). Now note that

    dl(Y|W; α_0)/dα [v*] = ∂l(Y|W; α_0)/∂θ' v*_θ + dl(Y|W; α_0)/dh [v*_h]
    = ∂l(Y|W; α_0)/∂θ' ( E[D_{b*}(Y, W)'D_{b*}(Y, W)] )^{-1} λ − dl(Y|W; α_0)/dh [b*] ( E[D_{b*}(Y, W)'D_{b*}(Y, W)] )^{-1} λ
    = D_{b*}(Y, W) ( E[D_{b*}(Y, W)'D_{b*}(Y, W)] )^{-1} λ

where the second equality is obtained using (42) and the definitions of v*_θ and v*_h. This implies that

    E[ dl(Y|W; α_0)/dα [v*] ] = 0 and
    Var( dl(Y|W; α_0)/dα [v*] ) = λ' ( E[D_{b*}(Y, W)'D_{b*}(Y, W)] )^{-1} E[D_{b*}(Y, W)'D_{b*}(Y, W)] ( E[D_{b*}(Y, W)'D_{b*}(Y, W)] )^{-1} λ
                                = λ' ( E[D_{b*}(Y, W)'D_{b*}(Y, W)] )^{-1} λ < ∞,

so condition (AN5) is satisfied by the Lindeberg-Levy CLT, with the variance positive by Assumption SA4 (ii). Therefore, the conclusion

    √n (θ̂_n − θ_0) →_d N( 0, ( E[D_{b*}(Y, W)'D_{b*}(Y, W)] )^{-1} )

follows since λ is arbitrary with λ ≠ 0.
Now condition (AN2) remains to be shown. Note that condition (AN2) is implied by

    sup_{α∈A_n: ‖α−α_0‖_2≤ε_n} μ_n( dl(Y|W; α)/dα [π_n v*] − dl(Y|W; α_0)/dα [π_n v*] ) = o_p(n^{-1/2}).        (43)

Let F = { dl(y|w; α)/dα [π_n v*] : α ∈ A }. Condition 3 implies that dl(y|w; α)/dα [π_n v*] satisfies the Lipschitz
condition with respect to α and the metric ‖·‖_s. Thus, dl(y|w; α)/dα [π_n v*] satisfies condition (3.1)
of Theorem 3 in Chen, Linton, and van Keilegom (2003). Note also that Θ is compact and H is
a subset of a Hölder space. Thus, from the Lipschitz condition and Remark 3 (ii) of Chen,
Linton, and van Keilegom (2003), it follows that ∫_0^1 √( log N_{[]}(ε, F, ‖·‖_{L_2(P_0)}) ) dε < ∞ by the proof
of Theorem 3 in Chen, Linton, and van Keilegom (2003). Now note

    E[ ( dl(Y|W; α)/dα [π_n v*] − dl(Y|W; α_0)/dα [π_n v*] )² ]
    ≤ C E[ sup_{(w,y)∈W×Y: ‖α−α_0‖≤ε_n, α∈A_n} | dl(y|w; α)/dα [π_n v*] − dl(y|w; α_0)/dα [π_n v*] |² ] → 0

as ‖α − α_0‖_s → 0, where the last result holds by Condition 3. Therefore, applying Lemma 1 in
Chen, Linton, and van Keilegom (2003), we find that (43) is true by combining the above two results.
This completes the proof.
C.4.2 Proof of Proposition 5.2
Similarly to the proof of Theorem 5.1 in Ai and Chen (2003), we can prove Proposition 5.2. We note
that Σ_{i=1}^n ( dl(y_i|w_i; α̂_n)/dθ_j − dl(y_i|w_i; α̂_n)/dh [b_j] )² is globally convex in b_j, and hence the solution b̂_j of (25)
must be bounded: ‖b̂_j‖_s ≤ C. Thus, we only need to consider the subset { b ∈ B : ‖b‖_s ≤ C } in the
following discussion.

Note that uniformly over b_j ∈ H_n with ‖b_j‖_s ≤ C, we have

    (1/n) Σ_{i=1}^n D_{b_j}(y_i, w_i; α̂_n)² = (1/n) Σ_{i=1}^n D_{b_j}(y_i, w_i; α_0)² + o_p(1)        (44)

since D_{b_j}(y, w; α) = D_{b_j}(y, w; α_0) + o_p(1) uniformly over ‖α − α_0‖_s = o(1) by Condition 3 and
‖b_j‖_s ≤ C. Thus, it suffices to show that ‖b̂_j(·) − b*_j(·)‖_s = o_p(1), because it implies

    D_{b̂_j}(y, w; α_0) = D_{b*_j}(y, w; α_0) + o_p(1).                        (45)

Then, combining (44) and (45), we conclude (1/n) Σ_{i=1}^n D_{b̂_j}(y_i, w_i; α̂_n)² = (1/n) Σ_{i=1}^n D_{b*_j}(y_i, w_i; α_0)² +
o_p(1), from which the claim follows. Finally, we note that ‖b̂_j(·) − b*_j(·)‖_s = o_p(1) is satisfied by
Condition 3, Assumption SA4, ‖α̂_n − α_0‖_s = o_p(1), and Lemma A.1 in Newey and Powell (2003).
D Asymptotic Normality of Type Distribution
We can prove Proposition 5.3 by showing that all conditions in Theorem 2 of Chen, Linton, and van
Keilegom (2003) hold. Here, instead, we directly prove Proposition 5.3 since our setting is a simple case of
Chen, Linton, and van Keilegom (2003).
D.1 Proof of Proposition 5.3
Let M(h) denote ∫_Z L(h(z)) f_Z(z) dz and M_n(h) denote (1/n) Σ_{i=1}^n L(h(z_i)). We have

    √n (p̂_n − p_0) = (1/√n) Σ_{i=1}^n [ L(ĥ_n(z_i)) − L(h_0(z_i)) ] + (1/√n) Σ_{i=1}^n ( L(h_0(z_i)) − E[L(h_0(Z_i))] )
                   = √n ( M_n(ĥ_n) − M_n(h_0) ) + √n ( M_n(h_0) − M(h_0) ).        (46)
Now let Ψ(h) = M_n(h) − M(h) be a stochastic process indexed by h ∈ H. Then, we obtain the
following stochastic equicontinuity: for any positive sequence δ_n = o(1),

    sup_{‖h−h_0‖_∞≤δ_n} |Ψ(h) − Ψ(h_0)| = o_p(n^{-1/2})                (47)

by applying Lemma 1 of Chen, Linton, and van Keilegom (2003), after establishing that (a) {v(h) ≡
L(h) − M(h_0) : h ∈ H} is a Donsker class and that (b) E[(v(h_1) − v(h_2))²] → 0 as ‖h_1 − h_2‖_∞ → 0,
noting E[v(h_0)] = 0 by construction. Consider that {v(h) ≡ L(h) − M(h_0) : h ∈ H} is a subset
of C^γ1_c(Z) and C^γ1_c(Z) is a Donsker class by Theorem 2.5.6 of van der Vaart and Wellner (1996).
Thus, condition (a) is satisfied. Now note
    E[ (v(h_1) − v(h_2))² ] = E[ (L(h_1) − L(h_2))² ] ≤ E[ |L(h_1) − L(h_2)| ] sup_z |L(h_1) − L(h_2)|
    = E[ |L(h_1) − L(h_2)| ] sup_z L'(h̃(z)) ‖h_1 − h_2‖_∞ ≤ (1/4) E[ |L(h_1) − L(h_2)| ] ‖h_1 − h_2‖_∞

since L' = L(1 − L) ≤ 1/4, where the second equality is obtained by applying the mean value
theorem; thus, condition (b) is satisfied. Therefore, (47) holds. Now consider
    √n ( M_n(ĥ_n) − M_n(h_0) )
    = √n ( M(ĥ_n) − M(h_0) ) + √n ( M_n(ĥ_n) − M(ĥ_n) − M_n(h_0) + M(h_0) )        (48)
    = √n ( M(ĥ_n) − M(h_0) ) + o_p(1)

where the last result is obtained by (47) and ‖ĥ_n − h_0‖_∞ = o_p(1). Now applying the mean value
theorem, we have
    √n ( M(ĥ_n) − M(h_0) ) = √n ∫_Z L'(h̃_n(z)) ( ĥ_n(z) − h_0(z) ) f_Z(z) dz
    = √n ∫_Z L'(h_0(z)) ( ĥ_n(z) − h_0(z) ) f_Z(z) dz                        (49)
      + √n ∫_Z ( L'(h̃_n(z)) − L'(h_0(z)) ) ( ĥ_n(z) − h_0(z) ) f_Z(z) dz

where h̃_n lies between ĥ_n and h_0. Applying the mean value theorem again, we obtain (noting
L' = L(1 − L))

    |L'(h̃_n) − L'(h_0)| = |L(h̃_n)(1 − L(h̃_n)) − L(h_0)(1 − L(h_0))|
    = |(1 − L(h̃̃_n) − L(h_0)) L'(h̃̃_n)(h̃_n − h_0)| ≤ (1/4)|h̃_n − h_0|

where h̃̃_n lies between h̃_n and h_0. It follows that

    √n | ∫_Z ( L'(h̃_n(z)) − L'(h_0(z)) ) ( ĥ_n(z) − h_0(z) ) f_Z(z) dz | ≤ (1/4) √n ‖h̃_n − h_0‖_∞ ‖ĥ_n − h_0‖_∞ = o_p(1)        (50)

by condition (i). Thus, we find

    √n ( M(ĥ_n) − M(h_0) ) = √n ∫_Z L'(h_0(z)) ( ĥ_n(z) − h_0(z) ) f_Z(z) dz + o_p(1)        (51)

from (49) and (50). From (46), (48), and (51), the claim then follows by condition (ii).
D.2 Proof of Proposition 5.4
Let M*_n(h) = (1/n) Σ_{i=1}^n L(h(Z*_i)) and Ψ*(h) = M*_n(h) − M_n(h). Due to Giné and Zinn (1990), we first
note that

    sup_{‖h−h_0‖_∞≤δ_n} √n |Ψ*(h) − Ψ*(h_0)| = o_{P*}(1).                (52)

We write

    √n (p̂*_n − p̂_n) = √n ( M*_n(ĥ*_n) − M_n(ĥ_n) ) = √n ( M*_n(ĥ*_n) − M*_n(ĥ_n) ) + √n ( M*_n(ĥ_n) − M_n(ĥ_n) ).

Also note

    √n ( M*_n(ĥ*_n) − M*_n(ĥ_n) ) = √n ( Ψ*(ĥ*_n) − Ψ*(ĥ_n) ) + √n ( M_n(ĥ*_n) − M_n(ĥ_n) )
                                   = √n ( M_n(ĥ*_n) − M_n(ĥ_n) ) + o_{P*}(1)

where the last equality is obtained using (52) and conditions (i) and (ii). Now consider

    √n ( M_n(ĥ*_n) − M_n(ĥ_n) )
    = √n ( M(ĥ*_n) − M(ĥ_n) ) + √n ( M_n(ĥ*_n) − M(ĥ*_n) − M_n(h_0) + M(h_0) ) − √n ( M_n(ĥ_n) − M(ĥ_n) − M_n(h_0) + M(h_0) )
    = √n ( M(ĥ*_n) − M(ĥ_n) ) + o_{P*}(1)

by conditions (i), (ii), and (iii). Now, similarly to (51), we can show that

    √n ( M(ĥ*_n) − M(ĥ_n) ) = √n ∫_Z L'(ĥ_n) ( ĥ*_n − ĥ_n ) dF_Z + o_{P*}(1).

From the above results the claim then follows by condition (iv).
D.2.1 Stochastic Expansion of ĥ_n − h_0
To check condition (ii) in Proposition 5.3 (or condition (iv) of Proposition 5.4), we need to
derive the stochastic expansion of ĥ_n − h_0. Here we provide a sketch of such an expansion for the
sieve conditional ML estimator. Deriving the stochastic expansion of a nonparametric estimator
obtained from a highly nonlinear objective function is often difficult. To facilitate this task, we
define pseudo true values of θ and π such that

    (θ_{0K}, π_{0K}) = argmax_{θ∈Θ, h=R^K(·)'π∈H_n} Q_K(θ, h), Q_K(θ, h) ≡ E[l(Y|W; θ, h(Z))]        (53)

and let h_{0K}(·) = R^K(·)'π_{0K}. Similarly, we let ĥ_n(·) = R^{K_n}(·)'π̂_n. Then,

    ĥ_n(·) − h_{0K}(·) = R^K(·)'(π̂_n − π_{0K}) with K = K_n.                (54)

Define Γ(h) = ∫_Z L(h(z))(1 − L(h(z))) R^K(z) dF_Z(z) and suppose that

    (a) Γ(h_0)'(π̂_n − π_{0K}) = (1/n) Σ_{i=1}^n ψ(π_{0K}; Y_i, W_i) + o_p(n^{-1/2}),

where E[ψ(π_{0K}; Y_i, W_i)] = o(1) and E[‖ψ(π_{0K}; Y_i, W_i)‖_E²] < ∞, and suppose (b) ‖h_{0K} − h_0‖_∞ =
o(n^{-1/2}). Then, we have

    √n ∫_Z L(h_0)(1 − L(h_0)) ( ĥ_n − h_0 ) dF_Z
    = √n ∫_Z L(h_0)(1 − L(h_0)) ( ĥ_n − h_{0K} ) dF_Z + √n ∫_Z L(h_0)(1 − L(h_0)) ( h_{0K} − h_0 ) dF_Z
    = √n Γ(h_0)'(π̂_n − π_{0K}) + √n ∫_Z L(h_0)(1 − L(h_0)) ( h_{0K} − h_0 ) dF_Z
    = (1/√n) Σ_{i=1}^n ψ(π_{0K}; Y_i, W_i) + o_p(1)

where the second equality is obtained from (54) and the definition of Γ(h), and the third equality
is obtained from (a) and (b), noting L(·)(1 − L(·)) ≤ 1/4. From this, it follows that

    V_p = lim_{K→∞} E[ ( ψ(π_{0K}; Y_i, W_i) + φ(Z_i) ) ( ψ(π_{0K}; Y_i, W_i) + φ(Z_i) )' ]

where φ(Z_i) = L(h_0(Z_i)) − E[L(h_0(Z_i))]. Therefore, to verify that condition (ii) of Proposition 5.3
holds, it suffices to show (a) and (b). Condition (a) can be verified using the first-order
conditions of (17) and (53) since, with fixed K, the problem becomes a parametric one. For (b), we
can show ‖h_{0K} − h_0‖_∞ = O( ζ(K) K^{−γ_1/(2d_Z)} ) (with ζ(K) = sup_{z∈Z} ‖R^K(z)‖_E), similarly to Hirano,
Imbens, and Ridder (2003), who consider a logit series estimation.
E Smoothness of Conditional Probabilities
Here we note that for the conditional probabilities presented in Appendix B, the pathwise first
and second derivatives are well-defined. This result is useful to verify Conditions 1 and 2-3 for
the sieve conditional ML. It is easy to see that the pathwise derivatives are well-defined as long
as G_1(·) and G_2(·) are continuously differentiable, since the function h(·) appears only in p(Z) =
exp(h(z))/(1 + exp(h(z))) and we have dp(z)/dh [h_1 − h_2] = (1 − p(z))p(z)(h_1 − h_2). Therefore, for the
conditional probabilities given in Appendix B, we have

    dP_ij(Y|W; θ, p(z))/dh [h_1 − h_2] = M_ij^(h) (h_1 − h_2), for all i, j = 0, 1,
    d²P_ij(Y|W; θ, p(z))/dh² [h_1 − h_0, h_2 − h_0] = M_ij^(h)(h) (h_1 − h_0)(h_2 − h_0), for all i, j = 0, 1, and
    d ∂P_ij(Y|W; θ, p(z))/dh ∂t [h_1 − h_0] = M_ij^(h)(t) (h_1 − h_0), for all i, j = 0, 1 and for any element t of θ,

where M_ij^(h), M_ij^(h)(h), and M_ij^(h)(t) are some well-defined ordinary derivatives. We also note that
those derivatives and other derivatives with respect to the finite-dimensional parameters are uniformly
bounded by some constants when (i) G_1(·) and G_2(·) are continuously differentiable, (ii) the parameter space Θ is compact, (iii) W is compact, and (iv) 0 < p(Z) < 1. Therefore, the Lipschitz
conditions for the conditional probabilities and the Lipschitz conditions for the pathwise first derivatives of the conditional probabilities are well satisfied. This implies that the Lipschitz conditions
for the log likelihood and the Lipschitz conditions for the pathwise derivatives of the log likelihood
are also well-defined. Specific forms of the derivatives for each model can be provided upon request.
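The derivative formula above is easy to confirm numerically; the following check compares the difference quotient defining the pathwise derivative with the closed form p(1 − p)(h_1 − h_2) at arbitrary scalar values.

    import numpy as np

    # Numerical check of the pathwise derivative used above:
    # p(z) = exp(h(z))/(1+exp(h(z))) gives dp/dh[h1 - h2] = p(1-p)(h1 - h2).
    logistic = lambda h: 1.0 / (1.0 + np.exp(-h))
    h1, h2, tau = 0.7, -0.3, 1e-6
    num = (logistic(h2 + tau * (h1 - h2)) - logistic(h2)) / tau   # difference quotient
    ana = logistic(h2) * (1 - logistic(h2)) * (h1 - h2)           # closed form
    print(num, ana)  # agree to roughly 1e-6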
References
[1] Ackerberg D. A. (2003), “Advertising, Learning, and Consumer Choice in Experience Good
Markets: An Empirical Examination,” International Economic Review, 44-3, p.1007-1040.
[2] Ackerberg, D., X. Chen, and J. Hahn (2012), “A Practical Asymptotic Variance Estimator for
Two-Step Semiparametric Estimators,” Review of Economics and Statistics 94, 481-498.
[3] Aguirregabiria, V. and P. Mira (2007), “Sequential Estimation of Dynamic Discrete Games,” Econometrica, 75-1, 1-53.
[4] Ai, C. and X. Chen (2003), “E¢ cient Estimation of Models With Conditional Moment Restrictions Containing Unknown Functions,” Econometrica, v71, No.6, 1795-1843.
[5] Andrews, D.W.K. (1994), “Empirical Process Methods in Econometrics,”Handbook of Econometrics IV, edited by R.F. Engle and D.L. McFadden, 2247-2294.
[6] Andrews, D.W.K., S. Berry, and P. Jia (2004), “Con…dence Regions for Parameters in Discrete
Games with Multiple Equilibria, with an Application to Discount Chain Store Location,”
Working paper, Yale University.
[7] Aradillas-Lopez, A. (2010), “Semiparametric Estimation of a Simultaneous Game with Incomplete Information,” Journal of Econometrics, 157-2, 409-431.
[8] Bajari, P., C.L. Benkard, and J. Levin (2007), “Estimating Dynamic Models of Imperfect
Competition,” Econometrica, 75-5, 1331-1370.
[9] Bajari, P., H. Hong, J. Krainer, D. Nekipelov (2010), “Estimating Static Models of Strategic
Interactions,” Journal of Business and Economic Statistics, 28-4, 469-482.
[10] Bajari, P, H. Hong, S. Ryan (2010), “Identi…cation and Estimation of Discrete Games of
Complete Information,” Econometrica, 78-5, 1529-68.
[11] Banks, J.S. and J. Sobel (1987), “Equilibrium Selection in Signaling Games,” Econometrica,
55, 647-661.
[12] Beresteanu, A., I. Molchanov, and F. Molinari (2011), “Sharp Identi…cation Regions in Models
with Convex Moment Predictions,” Econometrica, 79-6, 1785-1821.
[13] Berry, S. (1992), “Estimation of a Model of Entry in the Airline Industry,” Econometrica, 60,
889-917.
[14] Berry, S. and E. Tamer (2006), “Identi…cation in Models of Oligopoly Entry,” Advances in
Economics and Econometrics, Blundell, Newey and Persson (eds), Vol2, Ninth World Congress,
46-85, Cambridge University Press, 2006
[15] Berry, S., M. Ovstrovsky, and A. Pakes (2007), “Simple Estimators for the Parameters of
Discrete Dynamic Games (With Entry/Exit Examples),” RAND Journal of Economics, 38-2,
373-399.
[16] Birgé, L. and P. Massart (1998), “Minimum Contrast Estimators on Sieves: Exponential
Bounds and Rates of Convergence,” Bernoulli, 4, 329-375.
[17] Bjorn, P. and Q. Vuong (1984), “Simultaneous Models for Dummy Endogenous Variables: A
Game Theoretic Formulation with an Application to Household Labor Force Participation,”
Working paper, California Institute of Technology.
[18] Bjorn, P. and Q. Vuong (1985), “Econometric Modeling of a Stackelberg Game With an Application to Household Labor Force Participation,” Working paper, California Institute of Technology.
[19] Bresnahan, T.F. and P.C. Reiss (1990), “Entry in Monopoly Markets,” Review of Economic Studies, 57, 531-553.
[20] Bresnahan, T.F. and P.C. Reiss (1991), “Empirical Models of Discrete Games,” Journal of Econometrics, 48, 57-81.
[21] Brock, W. and S. Durlauf (2001), “Interactions-based Models,” in: J. Heckman and E. Leamer, eds., Handbook of Econometrics, Vol. 5, 3297-3380.
[22] Brock, W. and S. Durlauf (2003), “Multinomial Choice with Social Interactions,” Working
paper, University of Wisconsin.
[23] Brown, J. F. (2002), “Prestige Effects in IPO Underwriting,” Ph.D. Dissertation, Department of Economics, UCLA.
[24] Chen, X. (2007), “Large Sample Sieve Estimation of Semi-Nonparametric Models,” in: J.J. Heckman and E.E. Leamer (eds.), Handbook of Econometrics, Vol. 6, Elsevier.
[25] Chen, X., O. Linton, and I. van Keilegom (2003), “Estimation of Semiparametric Models When
the Criterion Function is Not Smooth,” Econometrica, 71, 1591-1608.
[26] Chen, X. and X. Shen (1996), “Asymptotic Properties of Sieve Extremum Estimates for Weakly
Dependent Data with Applications,” manuscript, University of Chicago.
[27] Chen, X. and X. Shen (1998), “Sieve Extremum Estimates for Weakly Dependent Data,”
Econometrica, 66, 289-314.
[28] Cho, I.K. and D. Kreps (1987), “Signaling Games and Stable Equilibria,” Quarterly Journal
of Economics, 102, 179-221.
[29] Ciliberto, F. and E. Tamer (2009), “Market Structure and Multiple Equilibria in Airline Markets,” Econometrica, 77-6, 1791-1828.
[30] Davis, P. (2006), “Estimation of Quantity Games in the Presence of Indivisibilities and Heterogeneous Firms,” Journal of Econometrics, 134-1, 187-214.
[31] Fudenberg, D. and J. Tirole (1991), “Perfect Bayesian Equilibrium and Sequential Equilibrium,” Journal of Economic Theory, 53-2, 236-260.
[32] Gabushin, V.N. (1967), “Inequalities for Norms of Functions and their Derivatives in the Lp
Metric,” Matematicheskie Zametki, 1, 291-298.
[33] Gallant, A.R. (1987), “Identification and Consistency in Seminonparametric Regression,” in T.F. Bewley (ed.), Advances in Econometrics, Cambridge University Press, Vol. I, 145-170.
[34] Gallant, A.R. and D. Nychka (1987), “Semi-nonparametric Maximum Likelihood Estimation,”
Econometrica, 55, 363-390.
[35] Geman, S. and C. Hwang (1982), “Nonparametric Maximum Likelihood Estimation by the Method of Sieves,” The Annals of Statistics, 10, 401-414.
[36] Giné, E. and J. Zinn (1990), “Bootstrapping General Empirical Measures,” Annals of Probability, 18, 851-869.
[37] Hirano, K., G.W. Imbens, and G. Ridder (2003), “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score,” Econometrica, 71, 1161-1189.
[38] Holmes, T.J. (2011), “The Diffusion of Wal-Mart and Economies of Density,” Econometrica, 79-1, 253-302.
[39] Ichimura, H. and P. Todd (2007), “Implementing Nonparametric and Semiparametric Estimators,” in: J.J. Heckman and E.E. Leamer (eds.), Handbook of Econometrics, Vol. 6, Elsevier.
[40] Kim, K. (2006), “Semiparametric Estimation of Signaling Games,” Doctoral Dissertation,
UCLA.
[41] Lorentz, G. (1986), Approximation of Functions, New York: Chelsea Publishing Company.
[42] Mazzeo, M. (2002), “Product Choice and Oligopoly Market Structure,” RAND Journal of Economics, 33-2, 221-242.
[43] Manski, C. F. (1995), Identi…cation Problems in the Social Sciences, Harvard University Press.
[44] Matzkin, R.L. (1992), “Nonparametric and Distribution-Free Estimation of the Binary Choice
and the Threshold Crossing Models,” Econometrica, 60-2, 239-270.
[45] Newey, W.K. and D. McFadden (1994), “Large Sample Estimation and Hypothesis Testing,” Handbook of Econometrics IV, edited by R.F. Engle and D.L. McFadden, 2111-2245.
[46] Newey, W.K. and J.L. Powell (2003), “Instrumental Variable Estimation of Nonparametric
Models,” Econometrica, 71, 1565-1578.
[47] Osborne, M.J. and A. Rubinstein (1994), A Course in Game Theory, The MIT Press.
[48] Pakes, A., J. Porter, K. Ho, and J. Ishii (2006), “Moment Inequalities and Their Applications,” Working paper.
[49] Pesendorfer, M. and P. Schmidt-Dengler (2008), “Asymptotic Least Squares Estimators for
Dynamic Games,” Review of Economic Studies, 75-3, 901-928.
[50] Pollard, D. (1984), Convergence of Stochastic Processes. New York: Springer-Verlag.
[51] Riley, J.G. (2001), “Silver Signals: Twenty-Five Years of Screening and Signaling,” Journal of Economic Literature, 39-2, 432-478.
[52] Seim, K. (2006), “An Empirical Model of Firm Entry with Endogenous Product-type Choices,”
RAND Journal of Economics, 37-3, 619-640.
[53] Shen, X. (1997), “On Methods of Sieves and Penalization,” The Annals of Statistics, 25, 2555-2591.
[54] Shen, X. and W. Wong (1994), “Convergence Rate of Sieve Estimates,” The Annals of Statistics, 22-2, 580-615.
[55] Spence, A. M. (1974), Market Signaling, Cambridge, MA: Harvard University Press.
[56] Sweeting, A. (2009), “The Strategic Timing of Radio Commercials: An Empirical Analysis
Using Multiple Equilibria,” RAND Journal of Economics, 40-4, 710-742.
[57] Tamer, E.T. (2003), “Incomplete Simultaneous Discrete Response Model with Multiple Equilibria,” Review of Economic Studies, 70, 147-165.
[58] Timan, A.F. (1963), Theory of Approximation of Functions of a Real Variable, New York: Macmillan.
[59] Toivanen, O. and M. Waterson (2000), “Empirical Research on Discrete Choice Game Theory
Models of Entry: An Illustration,” European Economic Review, 44, 985-992.
[60] Van de Geer, S. (1993), “Hellinger-consistency of Certain Nonparametric Maximum Likelihood
Estimators,” The Annals of Statistics, 21, 14-44.
[61] Van de Geer, S. (2000), Empirical Processes in M-estimation, Cambridge University Press.
[62] Van der Vaart, A. (1991), “On Differentiable Functionals,” The Annals of Statistics, 19, 178-204.
[63] Van der Vaart, A. and J. Wellner (1996), Weak Convergence and Empirical Processes: with
Applications to Statistics, New York: Springer-Verlag.
[64] Wong, W.H. and T. Severini (1991), “On Maximum Likelihood Estimation in Infinite Dimensional Parameter Spaces,” The Annals of Statistics, 19, 603-632.
[65] Wong, W.H. and X. Shen (1995), “Probability Inequalities for Likelihood Ratios and Convergence Rates for Sieve MLE’s,” The Annals of Statistics, 23, 339-362.