Econometrica, Vol. 74, No. 1 (January, 2006), 235-267

LARGE SAMPLE PROPERTIES OF MATCHING ESTIMATORS
FOR AVERAGE TREATMENT EFFECTS

BY ALBERTO ABADIE AND GUIDO W. IMBENS¹
Matching estimators for average treatment effects are widely used in evaluation research despite the fact that their large sample properties have not been established in many cases. The absence of formal results in this area may be partly due to the fact that standard asymptotic expansions do not apply to matching estimators with a fixed number of matches because such estimators are highly nonsmooth functionals of the data. In this article we develop new methods for analyzing the large sample properties of matching estimators and establish a number of new results. We focus on matching with replacement with a fixed number of matches. First, we show that matching estimators are not N^{1/2}-consistent in general and describe conditions under which matching estimators do attain N^{1/2}-consistency. Second, we show that even in settings where matching estimators are N^{1/2}-consistent, simple matching estimators with a fixed number of matches do not attain the semiparametric efficiency bound. Third, we provide a consistent estimator for the large sample variance that does not require consistent nonparametric estimation of unknown functions. Software for implementing these methods is available in Matlab, Stata, and R.
KEYWORDS: Matching estimators, average treatment effects, unconfoundedness, selection on observables, potential outcomes.
1. INTRODUCTION

ESTIMATION OF AVERAGE TREATMENT EFFECTS is an important goal of much evaluation research, both in academic studies and in substantive evaluations of social programs. Often, analyses are based on the assumptions that (i) assignment to treatment is unconfounded or exogenous, that is, independent of potential outcomes conditional on observed pretreatment variables, and (ii) there is sufficient overlap in the distributions of the pretreatment variables. Methods for estimating average treatment effects in parametric settings under these assumptions have a long history (see, e.g., Cochran and Rubin (1973), Rubin (1977), Barnow, Cain, and Goldberger (1980), Rosenbaum and Rubin (1983), Heckman and Robb (1984), and Rosenbaum (1995)). Recently, a number of nonparametric implementations of this idea have been proposed. Hahn (1998) calculates the efficiency bound and proposes an asymptotically efficient estimator based on nonparametric series estimation. Heckman,
¹We wish to thank Donald Andrews, Joshua Angrist, Gary Chamberlain, Geert Dhaene, Jinyong Hahn, James Heckman, Keisuke Hirano, Hidehiko Ichimura, Whitney Newey, Jack Porter, James Powell, Geert Ridder, Paul Rosenbaum, Edward Vytlacil, a co-editor and two anonymous referees, and seminar participants at various universities for comments, and Don Rubin for many discussions on the topic of this article. Financial support for this research was generously provided through National Science Foundation Grants SES-0350645 (Abadie), SBR-9818644, and SES-0136789 (Imbens). Imbens also acknowledges financial support from the Giannini Foundation and the Agricultural Experimental Station at UC Berkeley.
Ichimura, and Todd (1998) focus on the average effect on the treated and consider estimators based on local linear kernel regression methods. Hirano, Imbens, and Ridder (2003) propose an estimator that weights the units by the inverse of their assignment probabilities and show that nonparametric series estimation of this conditional probability, labeled the propensity score by Rosenbaum and Rubin (1983), leads to an efficient estimator of average treatment effects.
Empirical researchers, however, often use simple matching procedures to estimate average treatment effects when assignment for treatment is believed to be unconfounded. Much like nearest neighbor estimators, these procedures match each treated unit to a fixed number of untreated units with similar values for the pretreatment variables. The average effect of the treatment is then estimated by averaging within-match differences in the outcome variable between the treated and the untreated units (see, e.g., Rosenbaum (1995), Dehejia and Wahba (1999)). Matching estimators have great intuitive appeal and are widely used in practice. However, their formal large sample properties have not been established. Part of the reason may be that matching estimators with a fixed number of matches are highly nonsmooth functionals of the distribution of the data, not amenable to standard asymptotic methods for smooth functionals. In this article we study the large sample properties of matching estimators of average treatment effects and establish a number of new results. Like most of the econometric literature, but in contrast with some of the statistics literature, we focus on matching with replacement.
Our results show that some of the formal large sample properties of matching estimators are not very attractive. First, we show that matching estimators include a conditional bias term whose stochastic order increases with the number of continuous matching variables. We show that the order of this conditional bias term may be greater than N^{-1/2}, where N is the sample size. As a result, matching estimators are not N^{1/2}-consistent in general. Second, even when the simple matching estimator is N^{1/2}-consistent, we show that it does not achieve the semiparametric efficiency bound as calculated by Hahn (1998). However, for the case when only a single continuous covariate is used to match, we show that the efficiency loss can be made arbitrarily close to zero by allowing a sufficiently large number of matches. Despite these poor formal properties, matching estimators do have some attractive features that may account for their popularity. In particular, matching estimators are extremely easy to implement and they do not require consistent nonparametric estimation of unknown functions. In this article we also propose a consistent estimator for the variance of matching estimators that does not require consistent nonparametric estimation of unknown functions. This result is particularly relevant because the standard bootstrap does not lead to valid confidence intervals for the
simple matching estimator studied in this article (Abadie and Imbens (2005)). Software for implementing these methods is available in Matlab, Stata, and R.²
2. NOTATION AND BASIC IDEAS

2.1. Notation

We are interested in estimating the average effect of a binary treatment on some outcome. For unit i, with i = 1, ..., N, following Rubin (1973), let Y_i(0) and Y_i(1) denote the two potential outcomes given the control treatment and given the active treatment, respectively. The variable W_i, with W_i ∈ {0, 1}, indicates the treatment received. For unit i, we observe W_i and the outcome for this treatment,

    Y_i = { Y_i(0)  if W_i = 0,
            Y_i(1)  if W_i = 1,

as well as a vector of pretreatment variables or covariates, denoted by X_i. Our main focus is on the population average treatment effect and its counterpart for the population of the treated:

    τ = E[Y_i(1) - Y_i(0)]   and   τ^t = E[Y_i(1) - Y_i(0) | W_i = 1].
See Rubin (1977), Heckman and Robb (1984), and Imbens (2004) for discussion of these estimands.

We assume that assignment to treatment is unconfounded (Rosenbaum and Rubin (1983)) and that the probability of assignment is bounded away from 0 and 1.
ASSUMPTION 1: Let X be a random vector of dimension k of continuous covariates distributed on R^k with compact and convex support X, with (a version of) the density bounded and bounded away from zero on its support.

ASSUMPTION 2: For almost every x ∈ X, where X is the support of X:
(i) (unconfoundedness) W is independent of (Y(0), Y(1)) conditional on X = x;
(ii) (overlap) η < Pr(W = 1 | X = x) < 1 - η for some η > 0.
The dimension of X, denoted by k, will be seen to play an important role in the properties of matching estimators. We assume that all covariates have

²Software for STATA and Matlab is available at http://emlab.berkeley.edu/users/imbens/estimators.shtml. Software for R is available at http://jsekhon.fas.harvard.edu/matching/Match.html. Abadie, Drukker, Herr, and Imbens (2004) discuss the implementation in STATA.
continuous distributions.³ Compactness and convexity of the support of the covariates are convenient regularity conditions. The combination of the two conditions in Assumption 2 is referred to as strong ignorability (Rosenbaum and Rubin (1983)). These conditions are strong and in many cases may not be satisfied.

Heckman, Ichimura, and Todd (1998) point out that for identification of the average treatment effect, τ, Assumption 2(i) can be weakened to mean independence (E[Y(w) | W, X] = E[Y(w) | X] for w = 0, 1). For simplicity, we assume full independence, although for most of the results, mean independence is sufficient. When the parameter of interest is the average effect for the treated, τ^t, Assumption 2(i) can be relaxed to require only that Y(0) is independent of W conditional on X. Also, when the parameter of interest is τ^t, Assumption 2(ii) can be relaxed so that the support of X for the treated (X_1) is a subset of the support of X for the untreated (X_0).
ASSUMPTION 2': For almost every x ∈ X:
(i) W is independent of Y(0) conditional on X = x;
(ii) Pr(W = 1 | X = x) < 1 - η for some η > 0.
Under Assumption 2(i), the average treatment effect for the subpopulation with X = x equals

(1)  τ(x) = E[Y(1) - Y(0) | X = x]
          = E[Y | W = 1, X = x] - E[Y | W = 0, X = x]

almost surely. Under Assumption 2(ii), the difference on the right-hand side of (1) is identified for almost all x in X. Therefore, the average effect of the treatment can be recovered by averaging E[Y | W = 1, X = x] - E[Y | W = 0, X = x] over the distribution of X:

    τ = E[τ(X)] = E[ E[Y | W = 1, X = x] - E[Y | W = 0, X = x] ].
Under Assumption 2'(i), the average treatment effect for the subpopulation with X = x and W = 1 is equal to

(2)  τ^t(x) = E[Y(1) - Y(0) | W = 1, X = x]
           = E[Y | W = 1, X = x] - E[Y | W = 0, X = x]

³Discrete covariates with a finite number of support points can be easily dealt with by analyzing estimation of average treatment effects within subsamples defined by their values. The number of such covariates does not affect the asymptotic properties of the estimators. In small samples, however, matches along discrete covariates may not be exact, so discrete covariates may create the same type of biases as continuous covariates.
almost surely. Under Assumption 2'(ii), the difference on the right-hand side of (2) is identified for almost all x in X_1. Therefore, the average effect of the treatment on the treated can be recovered by averaging E[Y | W = 1, X = x] - E[Y | W = 0, X = x] over the distribution of X conditional on W = 1:

    τ^t = E[τ^t(X) | W = 1] = E[ E[Y | W = 1, X = x] - E[Y | W = 0, X = x] | W = 1 ].
Next, we introduce some additional notation. For x ∈ X and w ∈ {0, 1}, let μ(x, w) = E[Y | X = x, W = w], μ_w(x) = E[Y(w) | X = x], σ²(x, w) = V(Y | X = x, W = w), σ²_w(x) = V(Y(w) | X = x), and ε_i = Y_i - μ_{W_i}(X_i). Under Assumption 2, μ(x, w) = μ_w(x) and σ²(x, w) = σ²_w(x). Let f_w(x) be the conditional density of X given W = w, and let e(x) = Pr(W = 1 | X = x) be the propensity score (Rosenbaum and Rubin (1983)). In part of our analysis, we adopt the following assumption.

ASSUMPTION 3: {(Y_i, W_i, X_i)}_{i=1}^N are independent draws from the distribution of (Y, W, X).
In some cases, however, treated and untreated are sampled separately and their proportions in the sample may not reflect their proportions in the population. Therefore, we relax Assumption 3 so that, conditional on W_i, sampling is random. As we will show later, relaxing Assumption 3 is particularly useful when the parameter of interest is the average treatment effect on the treated. The numbers of control and treated units are N_0 and N_1, respectively, with N = N_0 + N_1. We assume that N_0 is at least of the same order of magnitude as N_1.

ASSUMPTION 3': Conditional on W_i = w, the sample consists of independent draws from Y, X | W = w for w = 0, 1. For some r ≥ 1, N_1^r / N_0 → θ with 0 < θ < ∞.
In this article we focus on matching with replacement, allowing each unit to be used as a match more than once. For x ∈ X, let ||x|| = (x′x)^{1/2} be the standard Euclidean vector norm.⁴ Let j_m(i) be the index j ∈ {1, 2, ..., N} that solves W_j = 1 - W_i and

    Σ_{l: W_l = 1 - W_i} 1{ ||X_l - X_i|| ≤ ||X_j - X_i|| } = m,

⁴Alternative norms of the form ||x||_V = (x′Vx)^{1/2} for some positive definite symmetric matrix V are also covered by the results below, because ||x||_V = ((Px)′(Px))^{1/2} for P such that P′P = V.
where 1{·} is the indicator function, equal to 1 if the expression in brackets is true and 0 otherwise. In other words, j_m(i) is the index of the unit that is the mth closest to unit i in terms of the covariate values, among the units with the treatment opposite to that of unit i. In particular, j_1(i), which will sometimes be denoted by j(i), is the nearest match for unit i. For notational simplicity and because we consider only continuous covariates, we ignore the possibility of ties, which happen with probability 0. Let J_M(i) denote the set of indices for the first M matches for unit i: J_M(i) = {j_1(i), ..., j_M(i)}.⁵ Finally, let K_M(i) denote the number of times unit i is used as a match given that M matches per unit are used:

    K_M(i) = Σ_{l=1}^N 1{ i ∈ J_M(l) }.

The distribution of K_M(i) will play an important role in the variance of the estimators.

In many analyses of matching methods (e.g., Rosenbaum (1995)), matching is carried out without replacement, so that every unit is used as a match at most once and K_M(i) ≤ 1. In this article, however, we focus on matching with replacement, allowing each unit to be used as a match more than once. Matching with replacement produces matches of higher quality than matching without replacement by increasing the set of possible matches.⁶ In addition, matching with replacement has the advantage that it allows us to consider estimators that match all units, treated as well as controls, so that the estimand is identical to the population average treatment effect.
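The objects j_m(i) and K_M(i) are straightforward to compute. The following is a minimal sketch in Python/numpy; the function names match_indices and match_counts are ours and are not part of the authors' published Matlab, Stata, or R software:

```python
import numpy as np

def match_indices(X, W, M):
    """j_1(i), ..., j_M(i): indices of the M nearest units (Euclidean norm)
    with treatment opposite to that of unit i, matching with replacement."""
    W = np.asarray(W)
    X = np.asarray(X, dtype=float).reshape(len(W), -1)
    J = np.empty((len(W), M), dtype=int)
    for i in range(len(W)):
        opp = np.flatnonzero(W != W[i])        # units with the opposite treatment
        d = np.linalg.norm(X[opp] - X[i], axis=1)
        J[i] = opp[np.argsort(d)[:M]]          # ties ignored, as in the text
    return J

def match_counts(J, N):
    """K_M(i): number of times unit i appears in some matched set J_M(l)."""
    return np.bincount(J.ravel(), minlength=N)
```

With matching without replacement one would instead remove a unit from the pool once it is used; here each unit can serve as a match for several others, which is exactly what makes K_M(i) random.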
2.2. The Matching Estimator

The unit-level treatment effect is τ_i = Y_i(1) - Y_i(0). For the units in the sample, only one of the potential outcomes, Y_i(0) and Y_i(1), is observed and the other is unobserved or missing. The matching estimator imputes the missing potential outcomes as

    Ŷ_i(0) = { Y_i                          if W_i = 0,
               (1/M) Σ_{j ∈ J_M(i)} Y_j     if W_i = 1,

⁵For this definition to make sense, we assume that N_0 ≥ M and N_1 ≥ M. We maintain this assumption implicitly throughout.
⁶As we show below, inexact matches generate bias in matching estimators. Therefore, expanding the set of possible matches will tend to produce smaller biases.
and

    Ŷ_i(1) = { (1/M) Σ_{j ∈ J_M(i)} Y_j     if W_i = 0,
               Y_i                          if W_i = 1,

leading to the following estimator for the average treatment effect:

(3)  τ̂_M = (1/N) Σ_{i=1}^N ( Ŷ_i(1) - Ŷ_i(0) )
         = (1/N) Σ_{i=1}^N (2W_i - 1) ( 1 + K_M(i)/M ) Y_i.

This estimator can easily be modified to estimate the average treatment effect on the treated:

(4)  τ̂^t_M = (1/N_1) Σ_{i: W_i = 1} ( Y_i - Ŷ_i(0) )
           = (1/N_1) Σ_{i=1}^N ( W_i - (1 - W_i) K_M(i)/M ) Y_i.
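A compact implementation of (3) and (4), using the imputation form directly (a sketch under the same conventions as above; not the authors' released code):

```python
import numpy as np

def matching_estimators(Y, W, X, M=1):
    """Simple matching estimators: tau_hat_M (ATE) and tau_hat_M^t (ATT)."""
    W = np.asarray(W, dtype=int)
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(W), -1)
    N = len(W)
    tau_i = np.empty(N)                       # Yhat_i(1) - Yhat_i(0)
    for i in range(N):
        opp = np.flatnonzero(W != W[i])
        d = np.linalg.norm(X[opp] - X[i], axis=1)
        imput = Y[opp[np.argsort(d)[:M]]].mean()   # average of the M matches
        tau_i[i] = (Y[i] - imput) if W[i] == 1 else (imput - Y[i])
    return tau_i.mean(), tau_i[W == 1].mean()      # equations (3) and (4)
```

Averaging tau_i over all units reproduces the weighted form (1/N) Σ (2W_i - 1)(1 + K_M(i)/M) Y_i in (3), since each Y_i enters once directly and K_M(i)/M times as a match.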
It is useful to compare matching estimators to covariance-adjustment or regression imputation estimators. Let μ̂_w(X_i) be a consistent estimator of μ_w(X_i). Let

(5)  Ỹ_i(0) = { Y_i         if W_i = 0,        Ỹ_i(1) = { μ̂_1(X_i)   if W_i = 0,
               μ̂_0(X_i)    if W_i = 1,                   Y_i         if W_i = 1.

The regression imputation estimators of τ and τ^t are

(6)  τ̂_reg = (1/N) Σ_{i=1}^N ( Ỹ_i(1) - Ỹ_i(0) )   and   τ̂^t_reg = (1/N_1) Σ_{i: W_i = 1} ( Y_i - Ỹ_i(0) ).

In our discussion we classify as regression imputation estimators those for which μ̂_w(x) is a consistent estimator of μ_w(x). The estimators proposed by Hahn (1998) and some of those proposed by Heckman, Ichimura, and Todd (1998) fall into this category.⁷

If μ̂_w(X_i) is estimated using a nearest neighbor estimator with a fixed number of neighbors, then the regression imputation estimator is identical to the matching estimator with the same number of matches. The two estimators

⁷In a working paper version (Abadie and Imbens (2002)), we consider a bias-corrected version of the matching estimator that combines some of the features of matching and regression estimators.
differ in the way they change with the sample size. We classify as matching estimators those estimators that use a finite and fixed number of matches. Interpreting matching estimators in this way may provide some intuition for some of the subsequent results. In nonparametric regression methods one typically chooses smoothing parameters to balance bias and variance of the estimated regression function. For example, in kernel regression a smaller bandwidth leads to lower bias but higher variance. A nearest neighbor estimator with a single neighbor is at the extreme end of this. The bias is minimized within the class of nearest neighbor estimators, but the variance of μ̂_w(x) no longer vanishes with the sample size. Nevertheless, as we shall show, matching estimators of average treatment effects are consistent under weak regularity conditions. The variance of matching estimators, however, is still relatively high and, as a result, matching with a fixed number of matches does not lead to an efficient estimator.

The first goal of this article is to derive the properties of the simple matching estimator in large samples, that is, as N increases, for fixed M. The motivation for our fixed-M asymptotics is to provide an approximation to the sampling distribution of matching estimators with a small number of matches. Such matching estimators have been widely used in practice. The properties of interest include bias and variance. Of particular interest is the dependence of these results on the dimension of the covariates. A second goal is to provide methods for conducting inference through estimation of the large sample variance of the matching estimator.
3. LARGE SAMPLE PROPERTIES OF THE MATCHING ESTIMATOR

In this section we investigate the properties of the matching estimator, τ̂_M, defined in (3). We can decompose the difference between the matching estimator τ̂_M and the population average treatment effect τ as

(7)  τ̂_M - τ = ( τ̄(X) - τ ) + E_M + B_M,

where τ̄(X) is the average conditional treatment effect,

(8)  τ̄(X) = (1/N) Σ_{i=1}^N ( μ_1(X_i) - μ_0(X_i) ),

E_M is a weighted average of the residuals,

(9)  E_M = (1/N) Σ_{i=1}^N (2W_i - 1) ( 1 + K_M(i)/M ) ε_i,

and B_M is the conditional bias relative to τ̄(X),

(10)  B_M = (1/N) Σ_{i=1}^N (2W_i - 1) (1/M) Σ_{m=1}^M ( μ_{1-W_i}(X_i) - μ_{1-W_i}(X_{j_m(i)}) ).

The first two terms on the right-hand side of (7), (τ̄(X) - τ) and E_M, have zero mean. They will be shown to be of order N^{-1/2} and asymptotically normal. The first term depends only on the covariates, and its variance is V^{τ(X)}/N, where V^{τ(X)} = E[(τ(X) - τ)²] is the variance of the conditional average treatment effect τ(X). Conditional on X and W (the matrix and vector with ith row equal to X_i′ and W_i, respectively), the variance of τ̂_M is equal to the conditional variance of the second term, V(E_M | X, W). We will analyze this variance in Section 3.2. We will refer to the third term on the right-hand side of (7), B_M, as the conditional bias, and to E[B_M] as the (unconditional) bias. If matching is exact, X_i = X_{j_m(i)} for all i and the conditional bias is equal to zero. In general it differs from zero and its properties, in particular its stochastic order, will be analyzed in Section 3.1.
Similarly, we can decompose the estimator for the average effect for the treated, (4), as

(11)  τ̂^t_M - τ^t = ( τ̄^t(X) - τ^t ) + E^t_M + B^t_M,

where

    τ̄^t(X) = (1/N_1) Σ_{i: W_i = 1} ( μ_1(X_i) - μ_0(X_i) ),

    E^t_M = (1/N_1) Σ_{i=1}^N ( W_i - (1 - W_i) K_M(i)/M ) ε_i,

and

    B^t_M = (1/N_1) Σ_{i: W_i = 1} (1/M) Σ_{m=1}^M ( μ_0(X_i) - μ_0(X_{j_m(i)}) ).
3.1. Bias

Here we investigate the stochastic order of the conditional bias (10) and its counterpart for the average treatment effect for the treated. The conditional bias consists of sums of terms of the form μ_1(X_{j_m(i)}) - μ_1(X_i) or μ_0(X_i) - μ_0(X_{j_m(i)}). To investigate the nature of these terms, expand the difference μ_1(X_{j_m(i)}) - μ_1(X_i) around X_i:

    μ_1(X_{j_m(i)}) - μ_1(X_i) = (X_{j_m(i)} - X_i)′ (∂μ_1/∂x)(X_i)
        + (1/2) (X_{j_m(i)} - X_i)′ (∂²μ_1/∂x∂x′)(X_i) (X_{j_m(i)} - X_i)
        + O( ||X_{j_m(i)} - X_i||³ ).
To study the components of the bias, it is therefore useful to analyze the distribution of the k vector X_{j_m(i)} - X_i, which we term the matching discrepancy.

First, let us analyze the matching discrepancy at a general level. Fix the covariate value at X = z and suppose we have a random sample X_1, ..., X_N with density f(x) over a bounded support X. Now consider the closest match to z in the sample. Let j_1 = argmin_{j=1,...,N} ||X_j - z|| and let U_1 = X_{j_1} - z be the matching discrepancy. We are interested in the distribution of the k vector U_1. More generally, we are interested in the distribution of the mth closest matching discrepancy, U_m = X_{j_m} - z, where j_m is the mth closest match to z from the random sample of size N. The following lemma describes some key asymptotic properties of the matching discrepancy at interior points of the support of X.
LEMMA 1-Matching Discrepancy-Asymptotic Properties: Suppose that f is differentiable in a neighborhood of z. Let V_m = N^{1/k} U_m and let f_{V_m}(v) be the density of V_m. Then

    lim_{N→∞} f_{V_m}(v) = ( f(z) / (m-1)! ) [ ||v||^k (2π^{k/2} f(z)) / (k Γ(k/2)) ]^{m-1}
                           × exp( - ||v||^k (2π^{k/2} f(z)) / (k Γ(k/2)) ),

where Γ(y) = ∫_0^∞ e^{-t} t^{y-1} dt (for y > 0) is Euler's gamma function. Hence U_m = O_p(N^{-1/k}). Moreover, the first three moments of U_m are

    E[U_m] = ( Γ(m + 2/k) / ((m-1)! k) ) [ π^{k/2} / Γ(1 + k/2) ]^{-2/k}
             × (1/f(z))^{2/k} (1/f(z)) (∂f/∂x)(z) (1/N^{2/k}) + o(N^{-2/k}),

    E[U_m U_m′] = ( Γ(m + 2/k) / ((m-1)! k) ) [ π^{k/2} / Γ(1 + k/2) ]^{-2/k}
                  × (1/f(z))^{2/k} I_k (1/N^{2/k}) + o(N^{-2/k}),

where I_k is the identity matrix of size k, and E[ ||U_m||³ ] = O(N^{-3/k}).
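The N^{-1/k} rate in Lemma 1 is easy to see numerically: doubling the sample size should shrink the mean nearest-match discrepancy by a factor of roughly 2^{-1/k}. A small Monte Carlo sketch (the uniform design, the evaluation point, and the sample sizes are arbitrary choices for illustration):

```python
import numpy as np

def mean_discrepancy(N, k, reps=200, seed=0):
    """Monte Carlo mean of ||U_1||: distance from z = (0.5, ..., 0.5) to its
    nearest match among N uniform draws on the unit cube [0, 1]^k."""
    rng = np.random.default_rng(seed)
    z = np.full(k, 0.5)
    draws = np.empty(reps)
    for r in range(reps):
        X = rng.uniform(0.0, 1.0, (N, k))
        draws[r] = np.min(np.linalg.norm(X - z, axis=1))
    return draws.mean()

for k in (1, 2, 4):
    ratio = mean_discrepancy(2000, k) / mean_discrepancy(1000, k)
    # compare with the theoretical factor 2 ** (-1 / k): 0.50, 0.71, 0.84
    print(k, round(ratio, 2))
```

The deterioration of the rate with k is the source of the bias results that follow: for larger k the discrepancy, and with it the conditional bias, shrinks too slowly relative to N^{-1/2}.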
(All proofs are given in the Appendix.)

This lemma shows how the order of the matching discrepancy increases with the number of continuous covariates. The lemma also shows that the first term in the stochastic expansion of N^{1/k} U_m has a rotation invariant distribution with respect to the origin. The following lemma shows that for all points in the support, including the boundary points not covered by Lemma 1, the normalized moments of the matching discrepancies, U_m, are bounded.

LEMMA 2-Matching Discrepancy-Uniformly Bounded Moments: If Assumption 1 holds, then all the moments of N^{1/k} ||U_m|| are uniformly bounded in N and z ∈ X.

These results allow us to establish bounds on the stochastic order of the conditional bias.

THEOREM 1-Conditional Bias for the Average Treatment Effect: Under Assumptions 1, 2, and 3, (i) if μ_0(x) and μ_1(x) are Lipschitz on X, then B_M = O_p(N^{-1/k}), and (ii) the order of E[B_M] is not in general lower than N^{-2/k}.
Consider the implications of this theorem for the asymptotic properties of the simple matching estimator. First notice that, under regularity conditions, √N(τ̄(X) - τ) = O_p(1) with a normal limiting distribution, by a standard central limit theorem. Also, it will be shown later that, under regularity conditions, √N E_M = O_p(1), again with a normal limiting distribution. However, the result of the theorem implies that √N B_M is not O_p(1) in general. In particular, if k is large enough, the asymptotic distribution of √N(τ̂_M - τ) is dominated by the bias term and the simple matching estimator is not N^{1/2}-consistent. However, if only one of the covariates is continuously distributed, then k = 1 and B_M = O_p(N^{-1}), so √N(τ̂_M - τ) will be asymptotically normal.
The following result describes the properties of the matching estimator for the average effect on the treated.

THEOREM 2-Conditional Bias for the Average Treatment Effect on the Treated: Under Assumptions 1, 2', and 3',
(i) if μ_0(x) is Lipschitz on X_0, then B^t_M = O_p(N_1^{-r/k}), and
(ii) if X_1 is a compact subset of the interior of X_0, μ_0(x) has bounded third derivatives in the interior of X_0, and f_0(x) is differentiable in the interior of X_0 with bounded derivatives, then

    Bias^t ≡ E[B^t_M]
        = -( (1/M) Σ_{m=1}^M Γ(m + 2/k) / ((m-1)! k) ) θ^{2/k} [ π^{k/2} / Γ(1 + k/2) ]^{-2/k} (1/N_1^{2r/k})
          × ∫ (1/f_0(x))^{2/k} { (∂μ_0/∂x)′(x) (∂f_0/∂x)(x) / f_0(x)
            + (1/2) tr( (∂²μ_0/∂x∂x′)(x) ) } f_1(x) dx + o( N_1^{-2r/k} ).
This case is particularly relevant because often matching estimators have been used to estimate the average effect for the treated in settings in which a large number of controls are sampled separately. Typically in those cases the conditional bias term has been ignored in the asymptotic approximation to standard errors and confidence intervals. Theorem 2 shows that ignoring the conditional bias term in the first-order asymptotic approximation to the distribution of the simple matching estimator is justified if N_0 is of sufficiently high order relative to N_1 or, to be precise, if r > k/2. In that case it follows that B^t_M = o_p(N_1^{-1/2}) and the bias term will be dominated in the large sample distribution by the two other terms, τ̄^t(X) - τ^t and E^t_M, both of which are O_p(N_1^{-1/2}).

In part (ii) of Theorem 2, we show that a general expression for the bias, E[B^t_M], can be calculated if X_1 is compact and X_1 ⊂ int X_0 (so that the bias is not affected by the geometric characteristics of the boundary of X_0). Under these conditions, the bias of the matching estimator is at most of order N_1^{-2r/k}. This bias is further reduced when μ_0(x) is constant, or when μ_0(x) is linear and f_0(x) is constant, among other cases. Notice, however, that usual smoothness assumptions (existence of higher order derivatives) do not reduce the order of E[B^t_M].
3.2. Variance

In this section we investigate the variance of the matching estimator τ̂_M. We focus on the first two terms of the representation of the estimator in (7), that is, the term that represents the heterogeneity in the treatment effect, (8), and the term that represents the residuals, (9), ignoring for the moment the conditional bias term (10). Conditional on X and W, the matrix and vector with ith row equal to X_i′ and W_i, respectively, the number of times a unit is used as a match, K_M(i), is deterministic, and hence the variance of τ̂_M is

(12)  V(τ̂_M | X, W) = (1/N²) Σ_{i=1}^N ( 1 + K_M(i)/M )² σ²(X_i, W_i).

For τ̂^t_M we obtain

(13)  V(τ̂^t_M | X, W) = (1/N_1²) Σ_{i=1}^N ( W_i - (1 - W_i) K_M(i)/M )² σ²(X_i, W_i).

Let V^E = N V(τ̂_M | X, W) and V^{E,t} = N_1 V(τ̂^t_M | X, W) be the corresponding normalized variances. Ignoring the conditional bias term, B_M, the conditional expectation of τ̂_M is τ̄(X). The variance of this conditional mean is therefore V^{τ(X)}/N, where V^{τ(X)} = E[(τ(X) - τ)²]. Hence the marginal variance of τ̂_M, ignoring the conditional bias term, is V(τ̂_M) = (E[V^E] + V^{τ(X)})/N. For the estimator for the average effect on the treated, the marginal variance is, again ignoring the conditional bias term, V(τ̂^t_M) = (E[V^{E,t}] + V^{τ(X),t})/N_1, where V^{τ(X),t} = E[(τ^t(X) - τ^t)² | W = 1].
The following lemma shows that the expectation of the normalized variance is finite. The key is that K_M(i), the number of times that unit i is used as a match, is O_p(1) with finite moments.⁸

LEMMA 3-Finite Variance: (i) Suppose Assumptions 1-3 hold. Then K_M(i) = O_p(1) and E[K_M(i)^q] is bounded uniformly in N for any q > 0. (ii) If, in addition, σ²(x, w) are Lipschitz in X for w = 0, 1, then E[V^E + V^{τ(X)}] = O(1). (iii) Suppose Assumptions 1, 2', and 3' hold. Then (N_0/N_1) E[K_M(i)^q | W_i = 0] is uniformly bounded in N for any q > 0. (iv) If, in addition, σ²(x, w) are Lipschitz in X for w = 0, 1, then E[V^{E,t} + V^{τ(X),t}] = O(1).
3.3. Consistency and Asymptotic Normality

In this section we show that the matching estimator is consistent for the average treatment effect and, without the conditional bias term, is N^{1/2}-consistent and asymptotically normal. The next assumption contains a set of weak smoothness restrictions on the conditional distribution of Y given X. Notice that it does not require the existence of higher order derivatives.

ASSUMPTION 4: For w = 0, 1, (i) μ(x, w) and σ²(x, w) are Lipschitz in X, (ii) the fourth moments of the conditional distribution of Y given W = w and X = x exist and are bounded uniformly in x, and (iii) σ²(x, w) is bounded away from zero.

THEOREM 3-Consistency of the Matching Estimator:
(i) Suppose Assumptions 1-3 and 4(i) hold. Then τ̂_M - τ →p 0.
(ii) Suppose Assumptions 1, 2', 3', and 4(i) hold. Then τ̂^t_M - τ^t →p 0.

⁸Notice that, for 1 ≤ i ≤ N, the K_M(i) are exchangeable random variables and therefore have identical marginal distributions.
Notice that the consistency result holds regardless of the dimension of the covariates.

Next, we state the formal result for asymptotic normality. The first result gives an asymptotic normality result for the estimators τ̂_M and τ̂^t_M after subtracting the bias term.

THEOREM 4-Asymptotic Normality for the Matching Estimator:
(i) Suppose Assumptions 1-4 hold. Then

    ( V^E + V^{τ(X)} )^{-1/2} √N ( τ̂_M - B_M - τ ) →d N(0, 1).

(ii) Suppose Assumptions 1, 2', 3', and 4 hold. Then

    ( V^{E,t} + V^{τ(X),t} )^{-1/2} √N_1 ( τ̂^t_M - B^t_M - τ^t ) →d N(0, 1).

Although one generally does not know the conditional bias term, this result is useful for two reasons. First, in some cases the bias term can be ignored because it is of sufficiently low order (see Theorems 1 and 2). Second, as we show in Abadie and Imbens (2002), under some additional smoothness conditions, an estimate of the bias term based on nonparametric estimation of μ_0(x) and μ_1(x) can be used in the statement of Theorem 4 without changing the resulting asymptotic distribution.

In the scalar covariate case or when only the treated are matched and the size of the control group is of sufficient order of magnitude, there is no need to remove the bias.

COROLLARY 1-Asymptotic Normality for Matching Estimator-Vanishing Bias:
(i) Suppose Assumptions 1-4 hold and k = 1. Then

    ( V^E + V^{τ(X)} )^{-1/2} √N ( τ̂_M - τ ) →d N(0, 1).

(ii) Suppose Assumptions 1, 2', 3', and 4 hold, and r > k/2. Then

    ( V^{E,t} + V^{τ(X),t} )^{-1/2} √N_1 ( τ̂^t_M - τ^t ) →d N(0, 1).
3.4. Efficiency

The asymptotic efficiency of the estimators considered here depends on the limit of E[V^E], which in turn depends on the limiting distribution of K_M(i). It is difficult to work out the limiting distribution of this variable for the general case.⁹ Here we investigate the form of the variance for the special case with a scalar covariate (k = 1) and a general M.

THEOREM 5: Suppose k = 1. If Assumptions 1-4 hold, and f_0(x) and f_1(x) are continuous on int X, then

    N · V(τ̂_M) = E[ σ²_1(X)/e(X) + σ²_0(X)/(1 - e(X)) ] + V^{τ(X)}
        + (1/(2M)) E[ ( 1/e(X) - e(X) ) σ²_1(X)
        + ( 1/(1 - e(X)) - (1 - e(X)) ) σ²_0(X) ] + o(1).

Note that with k = 1 we can ignore the conditional bias term, B_M. The semiparametric efficiency bound for this problem is, as established by Hahn (1998),

    V^eff = E[ σ²_1(X)/e(X) + σ²_0(X)/(1 - e(X)) + ( τ(X) - τ )² ].

The limiting variance of the matching estimator is in general larger. Relative to the efficiency bound it can be written as

    ( lim_{N→∞} N · V(τ̂_M) - V^eff ) / V^eff ≤ 1/(2M).

The asymptotic efficiency loss disappears quickly if the number of matches is large enough, and the efficiency loss from using a few matches is very small. For example, the asymptotic variance with a single match is less than 50% higher than the asymptotic variance of the efficient estimator, and with five matches the asymptotic variance is less than 10% higher.
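The 1/(2M) bound on the proportional efficiency loss can be tabulated directly (pure arithmetic, no modeling assumptions):

```python
# Upper bound on the proportional excess of the matching estimator's
# asymptotic variance over the semiparametric efficiency bound.
for M in (1, 2, 5, 10):
    print(f"M = {M:2d}: excess variance at most {1 / (2 * M):.0%}")
```

This is the sense in which even a handful of matches recovers most of the efficiency of the bound.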
4. ESTIMATING THE VARIANCE

Corollary 1 uses the square roots of V^E + V^{τ(X)} and V^{E,t} + V^{τ(X),t}, respectively, as normalizing factors to obtain a limiting normal distribution for matching estimators. In this section, we show how to estimate these asymptotic variances.

⁹The key is the second moment of the volume of the "catchment area" A_M(i), defined as the subset of X such that each observation j with W_j = 1 - W_i and X_j ∈ A_M(i) is matched to i. In the single match case with M = 1, these catchment areas are studied in stochastic geometry, where they are known as Poisson-Voronoi tessellations (Okabe, Boots, Sugihara, and Nok Chiu (2000)). The variance of the volume of such objects under uniform f_0(x) and f_1(x), normalized by the mean volume, has been worked out analytically for the scalar case and numerically for the two- and three-dimensional cases.
4.1. Estimating the Conditional Variance

Estimating the conditional variance

$$V^{E}=\frac{1}{N}\sum_{i=1}^{N}\left(1+\frac{K_M(i)}{M}\right)^{2}\sigma^{2}(X_i,W_i)$$

is complicated by the fact that it involves the conditional outcome variances σ²(x, w). In principle, these conditional variances could be consistently estimated using nonparametric smoothing techniques. We propose, however, an estimator of the conditional variance of the simple matching estimator that does not require consistent nonparametric estimation of unknown functions. Our method uses a matching estimator for σ²(x, w): instead of the original matching of treated to control units, we now match treated units to treated units and control units to control units.
Let ℓ_m(i) be the mth closest unit to unit i among the units with the same value of the treatment. Then, for fixed J, we estimate the conditional variance as

(14)

$$\hat\sigma^{2}(X_i,W_i)=\frac{J}{J+1}\left(Y_i-\frac{1}{J}\sum_{m=1}^{J}Y_{\ell_m(i)}\right)^{2}.$$

Notice that if all matches are perfect, so that X_{ℓ_m(i)} = X_i for all m = 1, …, J, then E[σ̂²(X_i, W_i) | X_i = x, W_i = w] = σ²(x, w). In practice, if the covariates are continuous, it will not be possible to find perfect matches, so σ̂²(X_i, W_i) will be only asymptotically unbiased. In addition, because σ̂²(X_i, W_i) is an average of a fixed number (namely, J) of observations, this estimator is not consistent for σ²(X_i, W_i). However, the next theorem shows that the appropriate averages of the σ̂²(X_i, W_i) over the sample are consistent for V^E and V^{E,t}.
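A minimal sketch of the estimator in (14), assuming the data are held in plain Python lists (`X` as coordinate tuples, `Y` outcomes, `W` treatment indicators — illustrative names, not from the paper):

```python
import math

def sigma2_hat(X, Y, W, i, J):
    """Matching estimator (14) of sigma^2(X_i, W_i): average the outcomes of
    the J closest units with the same treatment as i, rescale by J/(J+1)."""
    same = [j for j in range(len(X)) if j != i and W[j] == W[i]]
    same.sort(key=lambda j: math.dist(X[j], X[i]))  # same-group neighbors by distance
    ybar = sum(Y[j] for j in same[:J]) / J
    return (J / (J + 1)) * (Y[i] - ybar) ** 2
```

With perfect matches this quantity has expectation exactly σ²(x, w), as noted above; with continuous covariates it is only asymptotically unbiased.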
THEOREM 6: Let σ̂²(X_i, W_i) be as in (14). Define

$$\hat V^{E}=\frac{1}{N}\sum_{i=1}^{N}\left(1+\frac{K_M(i)}{M}\right)^{2}\hat\sigma^{2}(X_i,W_i),$$

$$\hat V^{E,t}=\frac{1}{N_1}\sum_{i=1}^{N}\left(W_i-(1-W_i)\frac{K_M(i)}{M}\right)^{2}\hat\sigma^{2}(X_i,W_i).$$

If Assumptions 1–4 hold, then |V̂^E − V^E| = o_p(1). If Assumptions 1, 2′, 3′, and 4 hold, then |V̂^{E,t} − V^{E,t}| = o_p(1).
4.2. Estimating the Marginal Variance

Here we develop consistent estimators for V = V^E + V^{τ(X)} and V^t = V^{E,t} + V^{τ(X),t}. The proposed estimators are based on the same matching approach to estimating the conditional error variance σ²(x, w) as in Section 4.1. In addition, these estimators exploit the fact that

$$E\bigl[(Y_i(1)-Y_i(0)-\tau)^{2}\bigr]=V^{\tau(X)}+E\bigl[(\varepsilon_i(1)-\varepsilon_i(0))^{2}\bigr].$$

The average on the left-hand side can be estimated as $\sum_{i=1}^{N}(\hat Y_i(1)-\hat Y_i(0)-\hat\tau_M)^{2}/N$. To estimate the second term on the right-hand side, we use the fact that

$$\frac{1}{N}\sum_{i=1}^{N}E\left[\Bigl(\varepsilon_i-\frac{1}{M}\sum_{m=1}^{M}\varepsilon_{j_m(i)}\Bigr)^{2}\,\Big|\,\mathbf X,\mathbf W\right]=\frac{1}{N}\sum_{i=1}^{N}\Bigl(1+\frac{K_M(i)}{M^{2}}\Bigr)\sigma^{2}(X_i,W_i),$$

which can be estimated using the matching estimator for σ²(X_i, W_i). These two estimates can then be combined to estimate V^{τ(X)}, and this in turn can be combined with the previously defined estimator for V^E to obtain an estimator of V.
THEOREM 7: Let σ̂²(X_i, W_i) be as in (14). Define

$$\hat V=\frac{1}{N}\sum_{i=1}^{N}\bigl(\hat Y_i(1)-\hat Y_i(0)-\hat\tau_M\bigr)^{2}+\frac{1}{N}\sum_{i=1}^{N}\left[\Bigl(\frac{K_M(i)}{M}\Bigr)^{2}+\frac{2M-1}{M}\,\frac{K_M(i)}{M}\right]\hat\sigma^{2}(X_i,W_i)$$

and

$$\hat V^{t}=\frac{1}{N_1}\sum_{i:\,W_i=1}\bigl(Y_i-\hat Y_i(0)-\hat\tau^{t}_M\bigr)^{2}+\frac{1}{N_1}\sum_{i=1}^{N}(1-W_i)\,\frac{K_M(i)\bigl(K_M(i)-1\bigr)}{M^{2}}\,\hat\sigma^{2}(X_i,W_i).$$

If Assumptions 1–4 hold, then |V̂ − V| = o_p(1). If Assumptions 1, 2′, 3′, and 4 hold, then |V̂^t − V^t| = o_p(1).
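The way the two sample averages in Theorem 7 combine can be sketched as follows (an illustrative helper with hypothetical input names: imputed potential outcomes `Y1hat` and `Y0hat`, match counts `K`, and the σ̂² values from (14)):

```python
def V_hat(Y1hat, Y0hat, K, sigma2, M):
    """Estimator of V = V^E + V^{tau(X)} from Theorem 7: the variance of the
    matched differences plus a correction built from K_M(i) and sigma2_hat."""
    N = len(Y1hat)
    tau = sum(y1 - y0 for y1, y0 in zip(Y1hat, Y0hat)) / N
    first = sum((y1 - y0 - tau) ** 2 for y1, y0 in zip(Y1hat, Y0hat)) / N
    second = sum(((k / M) ** 2 + (2 * M - 1) / M * (k / M)) * s2
                 for k, s2 in zip(K, sigma2)) / N
    return first + second
```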
5. CONCLUSION
In this article we derive large sample properties of matching estimators of average treatment effects that are widely used in applied evaluation research. The formal large sample properties of matching estimators are somewhat surprising in light of this popularity. We show that matching estimators include a conditional bias term that may be of order larger than N^{−1/2}. Therefore, matching estimators are not N^{1/2}-consistent in general, and standard confidence intervals are not necessarily valid. We show, however, that when the set of matching variables contains at most one continuously distributed variable, the conditional bias term is o_p(N^{−1/2}), so that matching estimators are N^{1/2}-consistent in this case. We derive the asymptotic distribution of matching estimators for the cases in which the conditional bias can be ignored, and also show that matching estimators with a fixed number of matches do not reach the semiparametric efficiency bound. Finally, we propose an estimator of the asymptotic variance. This is particularly relevant because there is evidence that the bootstrap is not valid for matching estimators (Abadie and Imbens (2005)).
John F. Kennedy School of Government, Harvard University, 79 John F. Kennedy Street, Cambridge, MA 02138, U.S.A., and NBER; [email protected]; http://www.ksg.harvard.edu/fs/aabadie/

and

Dept. of Economics and Dept. of Agricultural and Resource Economics, University of California at Berkeley, 661 Evans Hall #3880, Berkeley, CA 94720-3880, U.S.A., and NBER; [email protected]; http://elsa.berkeley.edu/users/imbens/.
Manuscript received August, 2002; final revision received March, 2005.
APPENDIX
Before proving Lemma 1, we collect some results on integration using polar coordinates that will be useful; see, for example, Stroock (1994). Let S^k = {ω ∈ R^k : ‖ω‖ = 1} be the unit k-sphere and let λ_{S^k} be its surface measure. Then the area and volume of the unit k-sphere are

$$\int_{S^k}\lambda_{S^k}(d\omega)=\frac{2\pi^{k/2}}{\Gamma(k/2)}
\qquad\text{and}\qquad
\int_{0}^{1}r^{k-1}\left(\int_{S^k}\lambda_{S^k}(d\omega)\right)dr=\frac{2\pi^{k/2}}{k\,\Gamma(k/2)}=\frac{\pi^{k/2}}{\Gamma(1+k/2)},$$

respectively. In addition,

$$\int_{S^k}\omega\,\lambda_{S^k}(d\omega)=0
\qquad\text{and}\qquad
\int_{S^k}\omega\omega'\,\lambda_{S^k}(d\omega)=\frac{I_k}{k}\int_{S^k}\lambda_{S^k}(d\omega)=\frac{\pi^{k/2}}{\Gamma(1+k/2)}\,I_k,$$

where I_k is the k-dimensional identity matrix. For any nonnegative measurable function g(·) on R^k,

$$\int_{\mathbb R^k}g(x)\,dx=\int_{0}^{\infty}r^{k-1}\left(\int_{S^k}g(r\omega)\,\lambda_{S^k}(d\omega)\right)dr.$$
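The area and volume formulas above are easy to check numerically in low dimensions (a sanity check added here, not in the original):

```python
import math

def sphere_area(k):
    # surface area of the unit sphere in R^k: 2 * pi^(k/2) / Gamma(k/2)
    return 2.0 * math.pi ** (k / 2) / math.gamma(k / 2)

def ball_volume(k):
    # volume of the unit ball in R^k: pi^(k/2) / Gamma(1 + k/2)
    return math.pi ** (k / 2) / math.gamma(1 + k / 2)

print(sphere_area(2), ball_volume(2))  # circle: circumference 2*pi, area pi
print(sphere_area(3), ball_volume(3))  # sphere: area 4*pi, volume 4*pi/3
```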
We will also use the following result on Laplace approximation of integrals.

LEMMA A.1: Let a(r) and b(r) be two real functions, where a(r) is continuous in a neighborhood of zero and b(r) has a continuous first derivative in a neighborhood of zero. Suppose that b(0) = 0, that b(r) > 0 for r > 0, and that for every ε > 0 the infimum of b(r) over r ≥ ε is positive. Suppose also that there exist positive real numbers a₀, b₀, α, and β such that

$$\lim_{r\to 0}a(r)\,r^{1-\alpha}=a_0,\qquad
\lim_{r\to 0}b(r)\,r^{-\beta}=b_0,\qquad
\lim_{r\to 0}\frac{db}{dr}(r)\,r^{1-\beta}=b_0\,\beta.$$

Suppose also that ∫₀^∞ |a(r)| exp(−N b(r)) dr < ∞ for all sufficiently large N. Then, as N → ∞,

$$\int_{0}^{\infty}a(r)\exp\bigl(-N\,b(r)\bigr)\,dr=\Gamma\!\left(\frac{\alpha}{\beta}\right)\frac{a_0}{\beta\,b_0^{\alpha/\beta}}\,\frac{1}{N^{\alpha/\beta}}+o\!\left(\frac{1}{N^{\alpha/\beta}}\right).$$

The proof follows from Theorem 7.1 in Olver (1997, p. 81).
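Lemma A.1 can be checked numerically in a case where the integral is known in closed form: with a(r) = r^{α−1} and b(r) = r^β (so a₀ = b₀ = 1), the integral equals Γ(α/β)/(β N^{α/β}) exactly. The sketch below compares a simple trapezoidal quadrature with the leading term:

```python
import math

def lhs(alpha, beta, N, r_max=5.0, steps=200_000):
    # trapezoidal quadrature of a(r) * exp(-N * b(r)) with a(r) = r^(alpha-1)
    # and b(r) = r^beta; the integrand is negligible beyond r_max for large N
    h = r_max / steps
    total = 0.0
    for j in range(steps + 1):
        r = j * h
        val = r ** (alpha - 1) * math.exp(-N * r ** beta)
        total += val if 0 < j < steps else 0.5 * val
    return total * h

def rhs(alpha, beta, N, a0=1.0, b0=1.0):
    # leading term from Lemma A.1
    return math.gamma(alpha / beta) * a0 / (beta * b0 ** (alpha / beta) * N ** (alpha / beta))

print(lhs(3.0, 2.0, 50.0), rhs(3.0, 2.0, 50.0))  # the two agree closely
```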
PROOF OF LEMMA 1: First consider the conditional probability of unit i being the mth closest match to z, given X_i = x:

$$\Pr(j_m=i\,|\,X_i=x)=\binom{N-1}{m-1}\bigl(\Pr(\|X-z\|>\|x-z\|)\bigr)^{N-m}\bigl(\Pr(\|X-z\|\le\|x-z\|)\bigr)^{m-1}.$$

Because the marginal probability of unit i being the mth closest match to z is Pr(j_m = i) = 1/N and because the density of X_i is f(x), the distribution of X_i conditional on it being the mth closest match is

$$f_{X_i|j_m=i}(x)=N\,f(x)\Pr(j_m=i\,|\,X_i=x)=N\,f(x)\binom{N-1}{m-1}\bigl(1-\Pr(\|X-z\|\le\|x-z\|)\bigr)^{N-m}\bigl(\Pr(\|X-z\|\le\|x-z\|)\bigr)^{m-1},$$

and this is also the distribution of X_{j_m}. Now transform to the matching discrepancy U_m = X_{j_m} − z to get

(A.1)

$$f_{U_m}(u)=N\binom{N-1}{m-1}f(z+u)\bigl(1-\Pr(\|X-z\|\le\|u\|)\bigr)^{N-m}\bigl(\Pr(\|X-z\|\le\|u\|)\bigr)^{m-1}.$$
Transform to V_m = N^{1/k}U_m, with Jacobian N^{−1}, to obtain

$$f_{V_m}(v)=\binom{N-1}{m-1}f\!\left(z+\frac{v}{N^{1/k}}\right)\left(1-\Pr\!\left(\|X-z\|\le\frac{\|v\|}{N^{1/k}}\right)\right)^{N-m}\left(\Pr\!\left(\|X-z\|\le\frac{\|v\|}{N^{1/k}}\right)\right)^{m-1}.$$
Note that

$$\Pr\!\left(\|X-z\|\le\frac{\|v\|}{N^{1/k}}\right)=\int_{0}^{\|v\|N^{-1/k}}r^{k-1}\left(\int_{S^k}f(z+r\omega)\,\lambda_{S^k}(d\omega)\right)dr,$$

where, as before, S^k = {ω ∈ R^k : ‖ω‖ = 1} is the unit k-sphere and λ_{S^k} is its surface measure. The derivative of Pr(‖X − z‖ ≤ ‖v‖N^{−1/k}) with respect to N is

$$-\frac{\|v\|^{k}}{k}\,N^{-2}\int_{S^k}f\!\left(z+\frac{\|v\|}{N^{1/k}}\,\omega\right)\lambda_{S^k}(d\omega).$$

Therefore, by l'Hôpital's rule,

$$\lim_{N\to\infty}\frac{\Pr(\|X-z\|\le\|v\|N^{-1/k})}{1/N}=\frac{\|v\|^{k}}{k}\,f(z)\int_{S^k}\lambda_{S^k}(d\omega).$$

In addition, it is easy to check that, for fixed m,

$$\binom{N-1}{m-1}=\frac{N^{m-1}}{(m-1)!}\,(1+o(1)).$$
Therefore,

$$\lim_{N\to\infty}f_{V_m}(v)=\frac{f(z)}{(m-1)!}\left(f(z)\,\frac{\|v\|^{k}}{k}\int_{S^k}\lambda_{S^k}(d\omega)\right)^{m-1}\exp\!\left(-f(z)\,\frac{\|v\|^{k}}{k}\int_{S^k}\lambda_{S^k}(d\omega)\right).$$

The previous equation shows that the density of V_m converges pointwise to a nonnegative function that is rotation invariant with respect to the origin. As a result, the matching discrepancy U_m is O_p(N^{−1/k}), and the limiting distribution of N^{1/k}U_m is rotation invariant with respect to the origin. This finishes the proof of the first result.
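The O_p(N^{−1/k}) rate of the matching discrepancy is easy to see in a small simulation (an illustration under an assumed uniform design, not part of the proof): with k = 2, quadrupling the sample size should roughly halve the distance from a fixed point to its nearest neighbor.

```python
import random

def mean_nn_distance(N, reps=2000, seed=0):
    # average distance from z = (0.5, 0.5) to the nearest of N uniform draws
    # on the unit square (k = 2)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += min(((rng.random() - 0.5) ** 2 + (rng.random() - 0.5) ** 2) ** 0.5
                     for _ in range(N))
    return total / reps

ratio = mean_nn_distance(100) / mean_nn_distance(400)
print(ratio)  # close to (400/100)**(1/2) = 2, the N^(-1/k) scaling with k = 2
```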
Next, given f_{U_m}(u) in (A.1),

$$E[U_m]=N\binom{N-1}{m-1}A_m,$$

where

$$A_m=\int_{\mathbb R^k}u\,f(z+u)\bigl(1-\Pr(\|X-z\|\le\|u\|)\bigr)^{N-m}\bigl(\Pr(\|X-z\|\le\|u\|)\bigr)^{m-1}\,du.$$

Boundedness of X implies that A_m converges absolutely. It is easy to relax the bounded support condition here; we maintain it because it is used elsewhere in the article. Changing variables to polar coordinates gives

$$A_m=\int_{0}^{\infty}r^{k-1}\left(\int_{S^k}r\omega\,f(z+r\omega)\,\lambda_{S^k}(d\omega)\right)\bigl(1-\Pr(\|X-z\|\le r)\bigr)^{N-m}\bigl(\Pr(\|X-z\|\le r)\bigr)^{m-1}\,dr.$$
Then, rewriting the probability Pr(‖X − z‖ ≤ r) as

$$\int_{\mathbb R^k}f(x)1\{\|x-z\|\le r\}\,dx=\int_{\mathbb R^k}f(z+v)1\{\|v\|\le r\}\,dv=\int_{0}^{r}s^{k-1}\left(\int_{S^k}f(z+s\omega)\,\lambda_{S^k}(d\omega)\right)ds$$

and substituting this into the expression for A_m gives

$$A_m=\int_{0}^{\infty}e^{-N\,b(r)}a(r)\,dr,$$

where

$$b(r)=-\log\left(1-\int_{0}^{r}s^{k-1}\left(\int_{S^k}f(z+s\omega)\,\lambda_{S^k}(d\omega)\right)ds\right)$$

and

$$a(r)=r^{k}\left(\int_{S^k}\omega\,f(z+r\omega)\,\lambda_{S^k}(d\omega)\right)\frac{\left(\int_{0}^{r}s^{k-1}\bigl(\int_{S^k}f(z+s\omega)\,\lambda_{S^k}(d\omega)\bigr)ds\right)^{m-1}}{\left(1-\int_{0}^{r}s^{k-1}\bigl(\int_{S^k}f(z+s\omega)\,\lambda_{S^k}(d\omega)\bigr)ds\right)^{m}}.$$

That is, a(r) = r^k c(r) g(r)^{m−1}, where

$$c(r)=\frac{\int_{S^k}\omega\,f(z+r\omega)\,\lambda_{S^k}(d\omega)}{1-\int_{0}^{r}s^{k-1}\bigl(\int_{S^k}f(z+s\omega)\,\lambda_{S^k}(d\omega)\bigr)ds},
\qquad
g(r)=\frac{\int_{0}^{r}s^{k-1}\bigl(\int_{S^k}f(z+s\omega)\,\lambda_{S^k}(d\omega)\bigr)ds}{1-\int_{0}^{r}s^{k-1}\bigl(\int_{S^k}f(z+s\omega)\,\lambda_{S^k}(d\omega)\bigr)ds}.$$
First notice that b(r) is continuous in a neighborhood of zero and b(0) = 0. By Theorem 6.20 in Rudin (1976), s ↦ s^{k−1}∫_{S^k} f(z + sω) λ_{S^k}(dω) is continuous in s, and

$$\frac{db}{dr}(r)=\frac{r^{k-1}\int_{S^k}f(z+r\omega)\,\lambda_{S^k}(d\omega)}{1-\int_{0}^{r}s^{k-1}\bigl(\int_{S^k}f(z+s\omega)\,\lambda_{S^k}(d\omega)\bigr)ds},$$

which is continuous in r. Using l'Hôpital's rule,

$$\lim_{r\to 0}b(r)\,r^{-k}=\lim_{r\to 0}\frac{1}{k\,r^{k-1}}\,\frac{db}{dr}(r)=\frac{f(z)}{k}\int_{S^k}\lambda_{S^k}(d\omega).$$

Similarly, c(r) is continuous in a neighborhood of zero, c(0) = 0, and

$$\lim_{r\to 0}c(r)\,r^{-1}=\lim_{r\to 0}\frac{dc}{dr}(r)=\left(\int_{S^k}\omega\omega'\,\lambda_{S^k}(d\omega)\right)\frac{\partial f}{\partial x}(z).$$

Similarly, g(r) is continuous in a neighborhood of zero, g(0) = 0, and

$$\lim_{r\to 0}g(r)\,r^{-k}=\lim_{r\to 0}\frac{1}{k\,r^{k-1}}\,\frac{dg}{dr}(r)=\frac{f(z)}{k}\int_{S^k}\lambda_{S^k}(d\omega).$$

Therefore,

$$\lim_{r\to 0}g(r)^{m-1}r^{-(m-1)k}=\left(\lim_{r\to 0}g(r)\,r^{-k}\right)^{m-1}=\left(\frac{f(z)}{k}\int_{S^k}\lambda_{S^k}(d\omega)\right)^{m-1}.$$
Now, it is clear that

$$\lim_{r\to 0}a(r)\,r^{-(mk+1)}=\left(\lim_{r\to 0}g(r)^{m-1}r^{-(m-1)k}\right)\left(\lim_{r\to 0}c(r)\,r^{-1}\right)=\left(\frac{f(z)}{k}\int_{S^k}\lambda_{S^k}(d\omega)\right)^{m-1}\left(\int_{S^k}\omega\omega'\,\lambda_{S^k}(d\omega)\right)\frac{\partial f}{\partial x}(z).$$

Therefore, the conditions of Lemma A.1 hold for α = mk + 2, β = k,

$$a_0=\left(\frac{f(z)}{k}\int_{S^k}\lambda_{S^k}(d\omega)\right)^{m-1}\left(\int_{S^k}\omega\omega'\,\lambda_{S^k}(d\omega)\right)\frac{\partial f}{\partial x}(z)
\qquad\text{and}\qquad
b_0=\frac{f(z)}{k}\int_{S^k}\lambda_{S^k}(d\omega).$$
Applying Lemma A.1, we get

$$A_m=\Gamma\!\left(\frac{mk+2}{k}\right)\frac{a_0}{k\,b_0^{(mk+2)/k}}\,\frac{1}{N^{(mk+2)/k}}+o\!\left(\frac{1}{N^{(mk+2)/k}}\right)
=\Gamma\!\left(\frac{mk+2}{k}\right)\frac{1}{k}\left(\frac{\pi^{k/2}}{\Gamma(1+k/2)}\,f(z)\right)^{-2/k}\frac{1}{f(z)}\,\frac{\partial f}{\partial x}(z)\,\frac{1}{N^{(mk+2)/k}}+o\!\left(\frac{1}{N^{(mk+2)/k}}\right).$$

Therefore,

$$E[U_m]=N\binom{N-1}{m-1}A_m=\frac{\Gamma\!\left(\frac{mk+2}{k}\right)}{(m-1)!\,k}\left(\frac{\pi^{k/2}}{\Gamma(1+k/2)}\,f(z)\right)^{-2/k}\frac{1}{f(z)}\,\frac{\partial f}{\partial x}(z)\,\frac{1}{N^{2/k}}+o\!\left(\frac{1}{N^{2/k}}\right),$$

which finishes the proof of the second result of the lemma. The results for E[U_mU_m′] and E[‖U_m‖³] follow from similar arguments. Q.E.D.
The proof of Lemma 2 is available on the authors' webpages.
PROOF OF THEOREM 1(i): Let U_{m,i} = X_{j_m(i)} − X_i be the unit-level matching discrepancy. Define the unit-level conditional bias from the mth match as

$$B_{m,i}=W_i\bigl(\mu_0(X_i)-\mu_0(X_{j_m(i)})\bigr)-(1-W_i)\bigl(\mu_1(X_i)-\mu_1(X_{j_m(i)})\bigr)
=W_i\bigl(\mu_0(X_i)-\mu_0(X_i+U_{m,i})\bigr)-(1-W_i)\bigl(\mu_1(X_i)-\mu_1(X_i+U_{m,i})\bigr).$$

By the Lipschitz assumption on μ₀ and μ₁, we obtain |B_{m,i}| ≤ C₁‖U_{m,i}‖ for some positive constant C₁. The bias term is

$$B_M=\frac{1}{N}\sum_{i=1}^{N}\frac{1}{M}\sum_{m=1}^{M}B_{m,i}.$$

Using the Cauchy–Schwarz inequality and Lemma 2,

$$E\bigl[N^{2/k}(B_M)^{2}\bigr]\le C_1^{2}\,E\left[\frac{1}{NM}\sum_{i=1}^{N}\sum_{m=1}^{M}N^{2/k}\,E\bigl[\|U_{m,i}\|^{2}\,\big|\,W_1,\dots,W_N,X_i\bigr]\right]\le C_2\,E\left[\left(\frac{N}{N_0}\right)^{2/k}+\left(\frac{N}{N_1}\right)^{2/k}\right]$$

for some positive constant C₂. Using Chernoff's inequality, it can be seen that any moment of N/N₁ or N/N₀ is uniformly bounded in N (with N_w ≥ M for w = 0, 1). The result of the theorem now follows from Markov's inequality. This proves part (i) of the theorem. We defer the proof of Theorem 1(ii) until after the proof of Theorem 2(ii), because the former follows directly from the latter. Q.E.D.
LEMMA A.2: Let X be distributed with density f(x) on some compact set X ⊂ R^k of dimension k. Let Z be a compact set that is a subset of int X. Suppose that f(x) is bounded and bounded away from zero on X: 0 < f̲ ≤ f(x) ≤ f̄ < ∞ for all x ∈ X. Suppose also that f(x) is differentiable in the interior of X with bounded derivatives: sup_{x ∈ int X} ‖∂f(x)/∂x‖ < ∞. Then N^{2/k}‖E[U_m]‖ is bounded by a constant uniformly over z ∈ Z and N ≥ m.

The proof of Lemma A.2 is available on the authors' webpages.
PROOF OF THEOREM 2: The proof of the first part of Theorem 2 is very similar to the proof of Theorem 1(i) and therefore is omitted.

Consider the second part:

$$E[B_M^{t}]=E\left[\frac{1}{N_1}\sum_{i=1}^{N}W_i\,\frac{1}{M}\sum_{m=1}^{M}\bigl(\mu_0(X_i)-\mu_0(X_{j_m(i)})\bigr)\right]=-\frac{1}{M}\sum_{m=1}^{M}E\bigl[\mu_0(X_{j_m(i)})-\mu_0(X_i)\,\big|\,W_i=1\bigr].$$

Applying a second-order Taylor expansion, we obtain

$$\mu_0(X_{j_m(i)})-\mu_0(X_i)=\frac{\partial\mu_0}{\partial x'}(X_i)\,U_{m,i}+\frac{1}{2}\,\mathrm{tr}\!\left(\frac{\partial^{2}\mu_0}{\partial x\,\partial x'}(X_i)\,U_{m,i}U_{m,i}'\right)+O\bigl(\|U_{m,i}\|^{3}\bigr).$$

Therefore, because the trace is a linear operator,

$$E\bigl[\mu_0(X_{j_m(i)})-\mu_0(X_i)\,\big|\,X_i=z,\,W_i=1\bigr]=\frac{\partial\mu_0}{\partial x'}(z)\,E[U_{m,i}\,|\,X_i=z,\,W_i=1]+\frac{1}{2}\,\mathrm{tr}\!\left(\frac{\partial^{2}\mu_0}{\partial x\,\partial x'}(z)\,E[U_{m,i}U_{m,i}'\,|\,X_i=z,\,W_i=1]\right)+O\bigl(E[\|U_{m,i}\|^{3}\,|\,X_i=z,\,W_i=1]\bigr).$$

Lemma 2 implies that the norms of N₀^{2/k}E[U_{m,i}U_{m,i}′ | X_i = z, W_i = 1] and N₀^{3/k}E[‖U_{m,i}‖³ | X_i = z, W_i = 1] are uniformly bounded over z ∈ X₁ and N₀. Lemma A.2 implies the same result for N₀^{2/k}E[U_{m,i} | X_i = z, W_i = 1]. As a result, ‖N₀^{2/k}E[μ₀(X_{j_m(i)}) − μ₀(X_i) | X_i = z, W_i = 1]‖ is uniformly bounded over z ∈ X₁ and N₀. Applying Lebesgue's dominated convergence theorem along with Lemma 1, we obtain

$$N_0^{2/k}\,E\bigl[\mu_0(X_{j_m(i)})-\mu_0(X_i)\,\big|\,W_i=1\bigr]=\frac{\Gamma\!\left(\frac{mk+2}{k}\right)}{(m-1)!\,k}\int\left(\frac{\pi^{k/2}}{\Gamma(1+k/2)}\,f_0(x)\right)^{-2/k}\left[\frac{1}{f_0(x)}\,\frac{\partial\mu_0}{\partial x'}(x)\,\frac{\partial f_0}{\partial x}(x)+\frac{1}{2}\,\mathrm{tr}\!\left(\frac{\partial^{2}\mu_0}{\partial x\,\partial x'}(x)\right)\right]f_1(x)\,dx+o(1).$$

Now the result follows easily from the conditions of the theorem. Q.E.D.
PROOF OF THEOREM 1(ii): Consider the special case where μ₁(x) is flat over X and μ₀(x) is flat in a neighborhood of the boundary, B. Then matching the control units does not create bias. Matching the treated units creates a bias that is similar to the formula in Theorem 2(ii), but with θ = p/(1 − p) and the integral taken over X ∩ B^c. Q.E.D.
PROOF OF LEMMA 3: Define f̲ = inf_{x,w} f_w(x) and f̄ = sup_{x,w} f_w(x), with f̲ > 0 and f̄ finite. Let ū = sup_{x,y∈X} ‖x − y‖. Consider the ball B(x, u) with center x ∈ X and radius u, and let c(u) (with 0 < c(u) ≤ 1) be the infimum over x ∈ X of the proportion of the volume of B(x, u) represented by its intersection with X. Note that, because X is convex, this proportion is nonincreasing in u, so let c = c(ū); then c(u) ≥ c for u ≤ ū. The proof consists of three parts. First, we derive an exponential bound for the probability that the distance to a match, ‖X_{j_m(i)} − X_i‖, exceeds some value. Second, we use this to obtain an exponential bound on the volume of the catchment area A_M(i), defined as the subset of X such that each observation j with W_j = 1 − W_i and X_j ∈ A_M(i) is matched to i. Formally,

$$A_M(i)=\Bigl\{x:\sum_{j:\,W_j=W_i}1\{\|X_j-x\|\le\|X_i-x\|\}\le M\Bigr\}.$$

Thus, if W_j = 1 − W_i and X_j ∈ A_M(i), then i ∈ J_M(j). Third, we use the exponential bound on the volume of the catchment area to derive an exponential bound on the probability of a large K_M(i), which will be used to bound the moments of K_M(i).
For the first part we bound the probability of the distance to a match. Let x ∈ X and u ≤ N_{1−W_i}^{1/k} ū. Then

$$\Pr\bigl(\|X_j-X_i\|>u\,N_{1-W_i}^{-1/k}\,\big|\,W_1,\dots,W_N,\,W_j=1-W_i,\,X_i=x\bigr)
=1-\int_{0}^{uN_{1-W_i}^{-1/k}}r^{k-1}\left(\int_{S^k}f_{1-W_i}(x+r\omega)\,\lambda_{S^k}(d\omega)\right)dr
\le 1-c\,\underline f\,u^{k}\,N_{1-W_i}^{-1}\,\frac{\pi^{k/2}}{\Gamma(1+k/2)}.$$

Similarly,

$$\Pr\bigl(\|X_j-X_i\|\le u\,N_{1-W_i}^{-1/k}\,\big|\,W_1,\dots,W_N,\,W_j=1-W_i,\,X_i=x\bigr)\le\bar f\,u^{k}\,N_{1-W_i}^{-1}\,\frac{\pi^{k/2}}{\Gamma(1+k/2)}.$$
Notice also that

$$\Pr\bigl(\|X_j-X_i\|>u\,N_{1-W_i}^{-1/k}\,\big|\,W_1,\dots,W_N,\,X_i=x,\,j\in J_M(i)\bigr)
\le\sum_{m=0}^{M-1}\binom{N_{1-W_i}}{m}\Pr\bigl(\|X_j-X_i\|\le u\,N_{1-W_i}^{-1/k}\,\big|\,\cdot\bigr)^{m}\Pr\bigl(\|X_j-X_i\|>u\,N_{1-W_i}^{-1/k}\,\big|\,\cdot\bigr)^{N_{1-W_i}-m},$$

where the probabilities on the right-hand side are conditional on W₁, …, W_N, W_j = 1 − W_i, and X_i = x: the event on the left-hand side implies that fewer than M of the N_{1−W_i} units with treatment 1 − W_i fall within distance uN_{1−W_i}^{−1/k} of X_i. In addition,

$$\binom{N_{1-W_i}}{m}\Pr\bigl(\|X_j-X_i\|\le u\,N_{1-W_i}^{-1/k}\,\big|\,\cdot\bigr)^{m}\le\frac{1}{m!}\left(\bar f\,u^{k}\,\frac{\pi^{k/2}}{\Gamma(1+k/2)}\right)^{m}.$$
Therefore,

$$\Pr\bigl(\|X_j-X_i\|>u\,N_{1-W_i}^{-1/k}\,\big|\,W_1,\dots,W_N,\,X_i=x,\,j\in J_M(i)\bigr)
\le\sum_{m=0}^{M-1}\frac{1}{m!}\left(\bar f\,u^{k}\,\frac{\pi^{k/2}}{\Gamma(1+k/2)}\right)^{m}\left(1-c\,\underline f\,\frac{u^{k}}{N_{1-W_i}}\,\frac{\pi^{k/2}}{\Gamma(1+k/2)}\right)^{N_{1-W_i}-m}.$$

Then, for some constant C₁ > 0,

$$\Pr\bigl(\|X_j-X_i\|>u\,N_{1-W_i}^{-1/k}\,\big|\,W_1,\dots,W_N,\,X_i=x,\,j\in J_M(i)\bigr)
\le C_1\,M\,\max\{1,u^{k(M-1)}\}\exp\!\left(-\frac{N_{1-W_i}-M+1}{N_{1-W_i}}\,c\,\underline f\,u^{k}\,\frac{\pi^{k/2}}{\Gamma(1+k/2)}\right).$$

Notice that this bound also holds for u > N_{1−W_i}^{1/k} ū, because in that case the probability that ‖X_j − X_i‖ > uN_{1−W_i}^{−1/k} is zero.

Next, we consider for unit i the volume B_M(i) of the catchment area A_M(i), defined as B_M(i) = ∫_{A_M(i)} dx. Conditional on W₁, …, W_N, X_i = x, i ∈ J_M(j), and A_M(i), the distribution of X_j is proportional to f_{1−W_i}(x)1{x ∈ A_M(i)}. Notice that a ball with radius (b/2)^{1/k}(π^{k/2}/Γ(1 + k/2))^{−1/k} has volume b/2. Therefore, for X_i in A_M(i) and B_M(i) > b, we obtain

$$\Pr\!\left(\|X_j-X_i\|>\frac{(b/2)^{1/k}}{(\pi^{k/2}/\Gamma(1+k/2))^{1/k}}\,\Big|\,W_1,\dots,W_N,\,X_i=x,\,i\in J_M(j),\,A_M(i),\,B_M(i)>b\right)\ge\frac{\underline f}{2\bar f}.$$

The last inequality does not depend on A_M(i) (given B_M(i) > b). Therefore,

$$\Pr\!\left(\|X_j-X_i\|>\frac{(b/2)^{1/k}}{(\pi^{k/2}/\Gamma(1+k/2))^{1/k}}\,\Big|\,W_1,\dots,W_N,\,X_i=x,\,i\in J_M(j),\,B_M(i)>b\right)\ge\frac{\underline f}{2\bar f}.$$

As a result, if

(A.2)

$$\Pr\!\left(\|X_j-X_i\|>\frac{(b/2)^{1/k}}{(\pi^{k/2}/\Gamma(1+k/2))^{1/k}}\,\Big|\,W_1,\dots,W_N,\,X_i=x,\,i\in J_M(j)\right)\le\delta\,\frac{\underline f}{2\bar f},$$

then it must be the case that Pr(B_M(i) > b | W₁, …, W_N, X_i = x, i ∈ J_M(j)) ≤ δ. In fact, by the argument of the first part (applied with the roles of the two treatment groups reversed), inequality (A.2) has been established above for

$$b=\frac{2u^{k}}{N_{W_i}}\,\frac{\pi^{k/2}}{\Gamma(1+k/2)}$$

and

$$\delta=\frac{2\bar f}{\underline f}\,C_1\,M\,\max\{1,u^{k(M-1)}\}\exp\!\left(-\frac{N_{W_i}-M+1}{N_{W_i}}\,c\,\underline f\,u^{k}\,\frac{\pi^{k/2}}{\Gamma(1+k/2)}\right).$$

Let t = 2u^k π^{k/2}/Γ(1 + k/2). Then

$$\Pr\bigl(N_{W_i}B_M(i)>t\,\big|\,W_1,\dots,W_N,\,X_i=x,\,i\in J_M(j)\bigr)\le C_2\max\{1,C_3\,t^{M-1}\}\exp(-C_4\,t)$$

for some positive constants C₂, C₃, and C₄. This establishes a uniform exponential bound, so all the moments of N_{W_i}B_M(i) exist conditional on W₁, …, W_N, X_i = x, i ∈ J_M(j) (uniformly in N).
For the third part of the proof, consider the distribution of K_M(i), the number of times unit i is used as a match. Let P_M(i) be the probability that an observation with the opposite treatment is matched to observation i, conditional on A_M(i):

$$P_M(i)=\int_{A_M(i)}f_{1-W_i}(x)\,dx\le\bar f\,B_M(i).$$

Note that for n ≥ 0,

$$E\bigl[(N_{W_i}P_M(i))^{n}\,\big|\,X_i=x,\,W_1,\dots,W_N\bigr]\le E\bigl[(N_{W_i}P_M(i))^{n}\,\big|\,X_i=x,\,W_1,\dots,W_N,\,i\in J_M(j)\bigr]\le\bar f^{\,n}\,E\bigl[(N_{W_i}B_M(i))^{n}\,\big|\,X_i=x,\,W_1,\dots,W_N,\,i\in J_M(j)\bigr].$$

As a result, E[(N_{W_i}P_M(i))^n | X_i = x, W₁, …, W_N] is uniformly bounded. Conditional on P_M(i) and on X_i = x, W₁, …, W_N, the distribution of K_M(i) is binomial with parameters N_{1−W_i} and P_M(i). Therefore, conditional on P_M(i) and X_i = x, W₁, …, W_N, the qth moment of K_M(i) is

$$E\bigl[K_M^{q}(i)\,\big|\,P_M(i),\,X_i=x,\,W_1,\dots,W_N\bigr]=\sum_{n=0}^{q}S(q,n)\,\frac{N_{1-W_i}!}{(N_{1-W_i}-n)!}\,P_M(i)^{n}\le\sum_{n=0}^{q}S(q,n)\,\bigl(N_{1-W_i}P_M(i)\bigr)^{n},$$

where S(q, n) are Stirling numbers of the second kind and q ≥ 1 (see, e.g., Johnson, Kotz, and Kemp (1992)). Then, because S(q, 0) = 0 for q ≥ 1,

$$E\bigl[K_M^{q}(i)\,\big|\,X_i=x,\,W_1,\dots,W_N\bigr]\le C\sum_{n=1}^{q}S(q,n)\left(\frac{N_{1-W_i}}{N_{W_i}}\right)^{n}$$

for some positive constant C. Using Chernoff's bound for binomial tails, it can easily be seen that E[(N_{1−W_i}/N_{W_i})^n | X_i = x, W_i] = E[(N_{1−W_i}/N_{W_i})^n | W_i] is uniformly bounded in N for all n ≥ 1, so the result of the first part of the lemma follows. Because K_M(i)^q ≤ K_M(i) for 0 < q < 1, this proof applies also to the case with 0 < q < 1.
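The binomial moment formula used above, E[K^q] = Σ_n S(q, n)(N)_n p^n with (N)_n the falling factorial, can be verified directly (a numerical check added here, not part of the proof):

```python
import math

def stirling2(q, n):
    # Stirling numbers of the second kind via the standard recurrence
    if q == n:
        return 1
    if n == 0 or n > q:
        return 0
    return n * stirling2(q - 1, n) + stirling2(q - 1, n - 1)

def moment_direct(q, N, p):
    # E[K^q] for K ~ Binomial(N, p), summing k^q over the pmf
    return sum(k ** q * math.comb(N, k) * p ** k * (1 - p) ** (N - k)
               for k in range(N + 1))

def moment_stirling(q, N, p):
    # E[K^q] = sum_n S(q, n) * N! / (N - n)! * p^n
    return sum(stirling2(q, n) * math.perm(N, n) * p ** n for n in range(q + 1))

print(moment_direct(3, 10, 0.3), moment_stirling(3, 10, 0.3))  # both 46.74
```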
Next, consider part (ii) of Lemma 3. Because the variance σ²(x, w) is Lipschitz on a bounded set, it is bounded by some constant, σ̄² = sup_{w,x} σ²(x, w). As a result, E[(1 + K_M/M)²σ²(x, w)] is bounded by σ̄²E[(1 + K_M/M)²], which is uniformly bounded in N by the result in the first part of the lemma. Hence E[V^E] = O(1).

Next, consider part (iii) of Lemma 3. Using the same argument as for E[K_M^q(i)], we obtain

$$E\bigl[K_M^{qr}(i)\,\big|\,W_i=0\bigr]\le\sum_{n=1}^{qr}S(qr,n)\,E\bigl[(N_1P_M(i))^{n}\,\big|\,W_i=0\bigr],$$

which is uniformly bounded because r ≥ 1.
For part (iv), notice that

$$E\left[\frac{1}{N_1}\sum_{i=1}^{N}\Bigl(W_i-(1-W_i)\frac{K_M(i)}{M}\Bigr)^{2}\sigma^{2}(X_i,W_i)\right]\le\bar\sigma^{2}+\bar\sigma^{2}\,E\left[\frac{N_0}{N_1}\,E\Bigl[\Bigl(\frac{K_M(i)}{M}\Bigr)^{2}\,\Big|\,W_i=0\Bigr]\right].$$

Therefore, E[V^{E,t}] is uniformly bounded. Q.E.D.
PROOF OF THEOREM 3: We only prove the first part of the theorem; the second part follows the same argument. We can write τ̂_M − τ = (τ̄(X) − τ) + Ē_M + B_M. We consider each of the three terms separately. First, by Assumptions 1 and 4(i), μ_w(x) is bounded over x ∈ X and w = 0, 1. Hence μ₁(X_i) − μ₀(X_i) − τ has mean zero and finite variance, and therefore, by a standard law of large numbers, τ̄(X) − τ = o_p(1). Second, by Theorem 1, B_M = O_p(N^{−1/k}) = o_p(1). Finally, because E[ε_i² | X, W] ≤ σ̄² and E[ε_iε_j | X, W] = 0 (i ≠ j), we obtain

$$E\bigl[(\sqrt N\,\bar E_M)^{2}\bigr]=E\left[\frac{1}{N}\sum_{i=1}^{N}\Bigl(1+\frac{K_M(i)}{M}\Bigr)^{2}\sigma^{2}(X_i,W_i)\right]=O(1),$$

where the last equality comes from Lemma 3. By Markov's inequality, Ē_M = O_p(N^{−1/2}) = o_p(1). Q.E.D.
PROOF OF THEOREM 4: We only prove the first assertion in the theorem because the second follows the same argument. We can write √N(τ̂_M − B_M − τ) = √N(τ̄(X) − τ) + √N Ē_M. First, consider the contribution of √N(τ̄(X) − τ). By a standard central limit theorem,

(A.3)

$$\sqrt N\,\bigl(\bar\tau(X)-\tau\bigr)\ \xrightarrow{\,d\,}\ \mathcal N\bigl(0,\,V^{\tau(X)}\bigr).$$

Second, consider the contribution of √N Ē_M. Conditional on W and X, the unit-level terms E_{M,i} = (2W_i − 1)(1 + K_M(i)/M)ε_i are independent with zero means and nonidentical distributions. The conditional variance of E_{M,i} is (1 + K_M(i)/M)²σ²(X_i, W_i). We will use a Lindeberg–Feller central limit theorem for √N Ē_M/√V^E. For a given X, W, the Lindeberg–Feller condition requires that

(A.4)

$$\frac{1}{N\,V^{E}}\sum_{i=1}^{N}E\bigl[(E_{M,i})^{2}\,1\{|E_{M,i}|>\eta\sqrt{N\,V^{E}}\}\,\big|\,X,W\bigr]\to 0$$

for all η > 0. To prove that condition (A.4) holds, notice that by Hölder's and Markov's inequalities we have

$$E\bigl[(E_{M,i})^{2}1\{|E_{M,i}|>\eta\sqrt{NV^{E}}\}\,\big|\,X,W\bigr]
\le\bigl(E[(E_{M,i})^{4}\,|\,X,W]\bigr)^{1/2}\bigl(\Pr(|E_{M,i}|>\eta\sqrt{NV^{E}}\,|\,X,W)\bigr)^{1/2}
\le\bigl(E[(E_{M,i})^{4}\,|\,X,W]\bigr)^{1/2}\left(\frac{E[(E_{M,i})^{2}\,|\,X,W]}{\eta^{2}NV^{E}}\right)^{1/2}.$$

Let σ̄² = sup_{w,x} σ²(x, w) < ∞, σ̲² = inf_{w,x} σ²(x, w) > 0, and C = sup_{w,x} E[ε_i⁴ | X_i = x, W_i = w] < ∞. Notice that V^E ≥ σ̲². Therefore,

$$\frac{1}{NV^{E}}\sum_{i=1}^{N}E\bigl[(E_{M,i})^{2}1\{|E_{M,i}|>\eta\sqrt{NV^{E}}\}\,\big|\,X,W\bigr]
\le\frac{C^{1/2}\,\bar\sigma}{\eta\,\underline\sigma^{3}\,N^{1/2}}\left(\frac{1}{N}\sum_{i=1}^{N}\Bigl(1+\frac{K_M(i)}{M}\Bigr)^{4}\right).$$

Because E[(1 + K_M(i)/M)⁴] is uniformly bounded, by Markov's inequality the factor in parentheses is bounded in probability. Hence, the Lindeberg–Feller condition is satisfied for almost all X and W. As a result,

$$\frac{N^{1/2}\,\bar E_M}{\Bigl(\frac{1}{N}\sum_{i=1}^{N}\bigl(1+\frac{K_M(i)}{M}\bigr)^{2}\sigma^{2}(X_i,W_i)\Bigr)^{1/2}}\ \xrightarrow{\,d\,}\ \mathcal N(0,1).$$

Finally, √N Ē_M/√V^E and √N(τ̄(X) − τ) are asymptotically independent (the central limit theorem for √N Ē_M/√V^E holds conditional on X and W). Thus, the fact that both converge to standard normal distributions, boundedness of V^E and V^{τ(X)}, and boundedness away from zero of V^E imply that (V^E + V^{τ(X)})^{−1/2}N^{1/2}(τ̂_M − B_M − τ) converges to a standard normal distribution. Q.E.D.
The proofs of Theorems 5, 6, and 7 are available on the authors' webpages.
REFERENCES
ABADIE, A., D. DRUKKER, J. HERR, AND G. IMBENS (2004): "Implementing Matching Estimators for Average Treatment Effects in Stata," The Stata Journal, 4, 290-311.
ABADIE, A., AND G. IMBENS (2002): "Simple and Bias-Corrected Matching Estimators for Average Treatment Effects," Technical Working Paper T0283, NBER.
——— (2005): "On the Failure of the Bootstrap for Matching Estimators," Mimeo, Kennedy School of Government, Harvard University.
BARNOW, B. S., G. G. CAIN, AND A. S. GOLDBERGER (1980): "Issues in the Analysis of Selectivity Bias," in Evaluation Studies, Vol. 5, ed. by E. Stromsdorfer and G. Farkas. San Francisco: Sage, 43-59.
COCHRAN, W., AND D. RUBIN (1973): "Controlling Bias in Observational Studies: A Review," Sankhya, 35, 417-446.
DEHEJIA, R., AND S. WAHBA (1999): "Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs," Journal of the American Statistical Association, 94, 1053-1062.
HAHN, J. (1998): "On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects," Econometrica, 66, 315-331.
HECKMAN, J., H. ICHIMURA, AND P. TODD (1998): "Matching as an Econometric Evaluation Estimator," Review of Economic Studies, 65, 261-294.
HECKMAN, J., AND R. ROBB (1984): "Alternative Methods for Evaluating the Impact of Interventions," in Longitudinal Analysis of Labor Market Data, ed. by J. Heckman and B. Singer. Cambridge, U.K.: Cambridge University Press, 156-245.
HIRANO, K., G. IMBENS, AND G. RIDDER (2003): "Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score," Econometrica, 71, 1161-1189.
IMBENS, G. (2004): "Nonparametric Estimation of Average Treatment Effects under Exogeneity: A Survey," Review of Economics and Statistics, 86, 4-30.
JOHNSON, N., S. KOTZ, AND A. KEMP (1992): Univariate Discrete Distributions (Second Ed.). New York: Wiley.
OKABE, A., B. BOOTS, K. SUGIHARA, AND S. NOK CHIU (2000): Spatial Tessellations: Concepts and Applications of Voronoi Diagrams (Second Ed.). New York: Wiley.
OLVER, F. W. J. (1997): Asymptotics and Special Functions (Second Ed.). New York: Academic Press.
ROSENBAUM, P. (1995): Observational Studies. New York: Springer-Verlag.
ROSENBAUM, P., AND D. RUBIN (1983): "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41-55.
RUBIN, D. (1973): "Matching to Remove Bias in Observational Studies," Biometrics, 29, 159-183.
——— (1977): "Assignment to Treatment Group on the Basis of a Covariate," Journal of Educational Statistics, 2, 1-26.
RUDIN, W. (1976): Principles of Mathematical Analysis (Third Ed.). New York: McGraw-Hill.
STROOCK, D. W. (1994): A Concise Introduction to the Theory of Integration. Boston: Birkhäuser.