Estimating the causal effect of fertility on economic wellbeing: data

Empir Econ (2013) 44:355–385
DOI 10.1007/s00181-010-0356-9
Estimating the causal effect of fertility on economic
wellbeing: data requirements, identifying assumptions
and estimation methods
Bruno Arpino · Arnstein Aassve
Received: 21 April 2008 / Accepted: 20 November 2009 / Published online: 14 March 2010
© Springer-Verlag 2010
Abstract This article asks to what extent fertility has a causal effect on households’ economic wellbeing, an issue that has received considerable interest in development studies and policy analysis. However, only recently has this literature begun to give importance to adequate modelling for the estimation of causal effects. We discuss several strategies for causal inference, stressing that their validity must be judged on the assumptions we can plausibly formulate in a given application, which in turn depend on the richness of the available data. We contrast methods relying on the unconfoundedness assumption, which include regression and propensity score matching, with instrumental variable methods. The discussion is of general relevance, providing a set of guidelines for choosing an appropriate strategy of analysis, and it applies to both cross-sectional and panel data.
Keywords Fertility · Poverty · Causal inference · Unconfoundedness · Instrumental
variables · VLSMS
JEL Classification D19 · I32 · J13
B. Arpino (corresponding author) · A. Aassve
Department of Decision Sciences, DONDENA Centre for Research on Social Dynamics,
Bocconi University, Via Roentgen, 20136 Milan, Italy
e-mail: [email protected]
1 Introduction
There is a strong positive correlation between poverty and family size in most developing countries (Schoumaker and Tabutin 1999). Not much is known, however, about the extent to which fertility has a causal impact on households’ wellbeing, even though the issue is of critical importance for designing sound policies. This article considers different strategies for establishing the causal effect of fertility on households’ wellbeing. We take a quasi-experimental approach in which fertility is regarded as the treatment and the outcome is equivalised household consumption expenditure. We adopt the potential outcomes framework (Neyman 1923; Rubin 1974, 1978), using recorded childbearing events as the measure of fertility. Each household $i$ therefore has two potential outcomes: $Y_{i1}$ if it experiences a childbearing event between two points in time (treated) and $Y_{i0}$ otherwise (untreated or control). However, childbearing is, at least in part, a matter of individual choice, which gives rise to self-selection: households that choose to have more children (self-selecting into the treatment) may differ substantially from households that choose to have fewer children, irrespective of the treatment. Hence, if we observe that the first group of households has lower average per capita expenditure, we cannot necessarily attribute this to fertility, since the two groups are likely to differ in many other characteristics, such as education. A simple difference in average consumption (or income) between the two groups is therefore likely to give a biased estimate of the causal effect.
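To make the self-selection argument concrete, the following sketch computes the naive difference in means for a stylised population. The two household types, their shares, their probabilities of a childbearing event and their potential outcomes are hypothetical numbers chosen for illustration, not estimates from the survey data. Because the less educated type is both poorer and more likely to be treated, the naive contrast mixes the effect of fertility with the effect of education.

```python
# Hypothetical population: (share, P(childbearing | type), Y0, Y1).
# Both types have the same true effect of a childbearing event: Y1 - Y0 = -10.
TYPES = {
    "low_education": (0.5, 0.6, 100.0, 90.0),
    "high_education": (0.5, 0.2, 200.0, 190.0),
}


def naive_vs_att(types):
    """Return (naive difference in observed means, true ATT)."""
    # Population mass of treated and control units within each type
    w_treated = {k: share * p for k, (share, p, _, _) in types.items()}
    w_control = {k: share * (1 - p) for k, (share, p, _, _) in types.items()}
    n_treated = sum(w_treated.values())
    n_control = sum(w_control.values())

    # Treated units reveal Y1, control units reveal Y0
    mean_treated = sum(w_treated[k] * types[k][3] for k in types) / n_treated
    mean_control = sum(w_control[k] * types[k][2] for k in types) / n_control
    naive = mean_treated - mean_control

    # The ATT needs both potential outcomes, known here only because we simulate them
    att = sum(w_treated[k] * (types[k][3] - types[k][2]) for k in types) / n_treated
    return naive, att


naive, att = naive_vs_att(TYPES)
print(f"naive difference: {naive:.2f}, true ATT: {att:.2f}")
```

In this stylised case the naive difference is about five times larger in magnitude than the true effect, purely because of who selects into childbearing.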
We discuss different strategies for dealing with the self-selection problem, stressing that their validity must be judged on the assumptions we can formulate in a given application, which in turn depend on the richness of the available data. A key distinction is between situations where we can assume that selection depends only on characteristics observed by the researcher (selection on observables) and situations where one or more of the relevant characteristics are unobserved (selection on unobservables).
In the first case, we compare units with similar characteristics that differ only in their treatment status. For these units, the observed difference in the outcome can reasonably be attributed to the treatment. Propensity score matching (PSM) relies on the selection on observables assumption, which is referred to as the unconfoundedness assumption (UNC). Multiple regression also relies on this assumption, though the identifying assumption can be stated in a weaker form (see, e.g. Wooldridge 2002).
The empirical analyses use the Vietnam Living Standards Measurement Survey, a rich panel data set first collected in 1992/1993 with a follow-up in 1997/1998. Exploiting the longitudinal structure of the data, we develop our estimators in a pre–post treatment setting. This has several advantages. First, covariates are measured before exposure to the treatment, which makes it more likely that they are not affected by the treatment (e.g. Rosenbaum 1984; Imbens 2004). Second, the lagged value of the outcome variable, $Y_{t1}$, can be included in the set of matching covariates, all of which are measured at the first wave. This is important because the household’s living standard prior to treatment is relevant both for the probability of experiencing a childbearing event between the two waves and for the consumption expenditure level at the second wave, $Y_{t2}$. With information at two points in time, the dependent variable can be defined as the difference between the levels of the outcome after and before the treatment. In particular, we match units in the treatment group with units in the control group that have similar first-period values, and compare their changes in outcomes. An advantage of taking the difference between the pre- and post-treatment outcomes is that it helps remove residual imbalance in the average values of $Y_{t1}$ between the treated and control
group. Moreover, it is likely (in our application at least) that the variance is lower when the outcome is defined as a change rather than as a level. Hence, the resulting estimator is likely to be more efficient than one relying on levels. Importantly, specifying the outcome as a difference does not change the estimand: the interest remains in the effect of childbearing events between the two waves on the consumption expenditure level at the second wave. Our approach is useful in the sense that the general comparison of methods based on selection on observables with those based on selection on unobservables applies regardless of whether the application uses longitudinal or cross-sectional data.
The standard solution to selection on unobservables is an instrumental variable (IV) method, which relies on the availability of a good instrument: in our case, a variable that influences fertility but has no direct impact on consumption expenditure. However, even if such a variable is available, the estimator can be unsatisfactory. The reason is that, unless we are willing to impose very strong assumptions, IV estimates refer only to the unobserved sub-sample of the population that reacts to the chosen instrument, i.e. the compliers (Imbens and Angrist 1994; Angrist et al. 1996). The corresponding parameter is therefore the local average treatment effect (LATE), which, in the presence of heterogeneous treatment effects, may differ from the average treatment effect (ATE) and the average treatment effect for the treated (ATT), the parameters usually of interest. This matters for policy analysis: only if the instrument coincides with a variable of real policy relevance can we argue that the estimated LATE is directly useful for policy (Heckman 1997). Moreover, LATE estimates based on different instruments generally differ, because the corresponding sub-populations of compliers differ.
We implement the IV approach, demonstrating its benefits and drawbacks, using two very different instruments. The first instrument is the sex composition of existing children. This is a widely used instrument (see, e.g. Angrist and Evans 1998; Chun and Oh 2002; Gupta and Dubey 2003) and is based on the fact that parents in Vietnam tend to have a strong preference for boys, especially in the North (Haughton and Haughton 1995; Johansson 1996, 1998; Belanger 2002). Since son preference is widespread among Vietnamese households, we expect the proportion of compliers to be rather high. The second instrument is the availability of contraception at the community level. This is similar to other commonly used instruments related to the availability of services in the neighbourhood or their distance from the dwelling (examples include McClellan et al. (1994), who use proximity to cardiac care centres, and Card (1995), who uses college proximity). An interesting aspect of this instrument is that it corresponds to a potential policy variable that policy makers can act on both to reduce fertility and, through it, to affect poverty. However, areas without available contraceptives are few in Vietnam (Nguyen-Dinh 1997; Duy et al. 2001; Anh and Thang 2002), which means that it cannot be considered a general policy tool.
From a statistical point of view, a key difference between the two instruments is that the second one cannot be considered randomly assigned, because the availability of contraception is related to other characteristics of the community, which in turn may influence households’ wellbeing. As a consequence, detailed control for covariates
is required, which is usually accomplished by imposing functional form and additive
separability in the error term. However, these and other strong assumptions can be avoided by implementing a non-parametric approach, such as the one suggested by Frölich (2007).
Although valid instruments are available in our application, this is not always the case. When they are not, IV estimators cannot be used, and it becomes important to carry out a sensitivity analysis for estimators based on selection on observables. Such analyses are still uncommon in the applied literature, but they are a critical tool for assessing the credibility of the identifying assumption. The key idea is to evaluate how strong the associations between an unmeasured variable and the treatment and outcome variables must be in order to undermine the results of the analysis based on the UNC. If the results are highly sensitive, the validity of the identifying assumption becomes questionable. Among the approaches to sensitivity analysis proposed in the literature, we discuss and apply those suggested by Rosenbaum (1987b) and Ichino et al. (2008).
The article is organised as follows. Section 2 reviews the statistical issues, Sect. 3 provides background information about the application, Sect. 4 shows the results, and Sect. 5 concludes.
2 Causal inference in observational studies under the potential outcomes approach
The potential outcomes approach was introduced by Neyman (1923) and extended by Rubin (1974) to observational studies. We invoke the stable unit treatment value assumption (SUTVA) (Rubin 1980), which states that the potential outcomes of each unit are not affected by the treatments assigned to other units and that there are no hidden versions of the treatment. Potential outcomes are denoted by $Y_1$, the outcome that would result if the unit were exposed to the treatment, and $Y_0$, the outcome if it were exposed to the control (Rosenbaum and Rubin 1983a). Since each unit receives either the treatment or the control, only one of $Y_1$ and $Y_0$ is observed for each unit.
Assume that we have a random sample of $N$ units under study, $\{d_i, y_i, x_i\}_{i=1}^{N}$. $D$ is the treatment indicator, taking the value 1 for treated units and 0 for untreated units (the controls); $Y$ denotes the observed outcome and $X$ the set of covariates or confounders.1
The two causal parameters usually of interest are the ATE and the ATT, defined as:
$$\mathrm{ATE} = E(Y_1 - Y_0) \tag{1}$$
$$\mathrm{ATT} = E(Y_1 - Y_0 \mid D = 1) \tag{2}$$
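Definitions (1) and (2) can be illustrated with a small, entirely hypothetical population in which both potential outcomes are known for every unit, something that is never the case with real data. Because the treated units here happen to have larger effects, the ATE and the ATT differ.

```python
# Hypothetical units: (treatment indicator d, Y0, Y1).
# In real data only Y1 (if d == 1) or Y0 (if d == 0) would be observed.
UNITS = [
    (1, 100.0, 80.0),
    (1, 120.0, 100.0),
    (0, 150.0, 145.0),
    (0, 200.0, 195.0),
]


def ate(units):
    """ATE = E(Y1 - Y0): average unit-level effect over the whole population."""
    return sum(y1 - y0 for _, y0, y1 in units) / len(units)


def att(units):
    """ATT = E(Y1 - Y0 | D = 1): average effect over treated units only."""
    effects = [y1 - y0 for d, y0, y1 in units if d == 1]
    return sum(effects) / len(effects)


print(ate(UNITS), att(UNITS))  # -12.5 -20.0
```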
The ATE is the expected effect of the treatment on a unit drawn at random from the population, while the ATT is the expected effect of the treatment on a unit drawn at random from the population of treated units. Because it refers to those who actually receive the treatment, the ATT is often the parameter of greatest interest to policy makers (Heckman et al. 1997).
1 As a convention, capital letters usually denote random variables, whereas small letters indicate their realisations. For simplicity, population units are usually not indexed by unit indicators unless this is necessary for clarity.
2.1 Identifying assumptions
A critical distinction is between situations in which selection depends only on observed characteristics and those in which selection also depends on unobserved characteristics. The selection on observables assumption is also known as the UNC and is the fundamental identifying assumption for a wide range of empirical studies:2
Assumption A.1 (Unconfoundedness)
$$(Y_1, Y_0) \perp D \mid X$$
where $\perp$, in the notation introduced by Dawid (1979), indicates independence. Assumption A.1 implies that, after conditioning on the variables influencing both selection and the outcome, no dependence remains between the potential outcomes and the treatment. Regression and matching techniques, as well as stratification and weighting methods, all rely on this assumption. In regression analysis, it suffices to assume that the potential outcomes are mean independent of the treatment conditional on $X$ (see, e.g. Wooldridge 2002, p. 607). That is, we can replace assumption A.1 with the weaker conditions $E(Y_1 \mid D, X) = E(Y_1 \mid X)$ and $E(Y_0 \mid D, X) = E(Y_0 \mid X)$. The fundamental idea behind these methods is to compare treated units with control units that are similar in their characteristics. A further assumption, termed overlap, is also required.
Assumption A.2 (Overlap)
$$0 < P(D = 1 \mid X) < 1.$$
where $P(D = 1 \mid X)$ is the conditional probability of receiving the treatment given the covariates $X$. Assumption A.2 implies that the support of $X$ is the same in the treated and control groups (i.e. $\mathrm{Support}(X \mid D = 1) = \mathrm{Support}(X \mid D = 0)$), which guarantees that the ATE is well defined (Heckman et al. 1997). If the assumption does not hold, then for some values of the covariates there may be no comparable units.
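The logic of assumptions A.1 and A.2 can be sketched, for a single discrete covariate, as a stratification (exact matching) estimator of the ATT: within each cell of $X$, treated and control means are compared, and cells containing treated units but no controls are dropped because overlap fails there. The function name, data layout and numbers are illustrative assumptions, not the estimator used in Sect. 4.

```python
from collections import defaultdict


def stratified_att(data):
    """Stratification (exact matching) estimator of the ATT.

    data: iterable of (x, d, y) with a discrete covariate x, treatment d in {0, 1}
    and outcome y. Returns (att on the common support, cells dropped for lack of overlap).
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in data:
        cells[x][d].append(y)

    n_treated, weighted_sum, dropped = 0, 0.0, []
    for x, groups in cells.items():
        treated, controls = groups[1], groups[0]
        if not treated:
            continue  # no treated units: the cell does not enter the ATT
        if not controls:
            dropped.append(x)  # P(D = 1 | X = x) = 1 in the sample: no comparable controls
            continue
        diff = sum(treated) / len(treated) - sum(controls) / len(controls)
        weighted_sum += len(treated) * diff  # weight cells by their number of treated
        n_treated += len(treated)

    if n_treated == 0:
        raise ValueError("no treated units inside the common support")
    return weighted_sum / n_treated, dropped


sample = [
    (0, 1, 90.0), (0, 1, 92.0), (0, 0, 100.0), (0, 0, 102.0),
    (1, 1, 190.0), (1, 0, 200.0), (1, 0, 198.0), (1, 0, 202.0),
    (2, 1, 300.0),  # treated unit with no control in its cell
]
print(stratified_att(sample))  # (-10.0, [2])
```

Note that once a cell is dropped, the estimate refers to treated units on the common support only, which is a different population from all treated units.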
The most common approach to selection on unobservables is to exploit an IV, a variable assumed to affect selection into treatment but to have no direct influence on the outcome. Whether an IV method can be used depends, of course, on the availability of such a variable, and in practice instruments are often difficult to find. In that case, a sensitivity analysis becomes very useful, because it can assess how much a violation of the UNC would change the estimated causal effect. It is not, however, an alternative to the IV approach: it gauges the robustness of estimates obtained under the UNC rather than identifying the effect when the UNC fails.
2 The unconfoundedness assumption is sometimes referred to as the conditional independence or the
exogeneity assumption (Imbens 2004).
Like methods based on the UNC, IV methods impose a range of critical identifying assumptions. Consider a binary instrument $Z$. In randomised settings, the levels of the instrument can be seen as the assignment to treatment, which differs from the treatment actually taken because of non-compliance. Under SUTVA, we denote by $D_z$ and $Y_{z,d}$, respectively, the binary potential treatment indicator and the potential outcomes for unit $i$. The identifying assumptions for estimating causal effects with a single IV are clarified by Angrist et al. (1996, hereafter AIR). Apart from SUTVA and the randomisation of the instrument, the fundamental assumptions are:
Assumption B.1 (Exclusion Restriction)
$$Y_{z,d} = Y_d$$
Assumption B.2 (Nonzero Average Causal Effect of Z on D)
$$E[D_1 - D_0] \neq 0$$
Assumption B.3 (Monotonicity)
$$D_{i1} \geq D_{i0} \quad \text{for all } i = 1, \ldots, N.$$
The exclusion restriction means that the instrument $Z$ affects $Y$ only through $D$; it corresponds to the validity of the instrument. Assumption B.2 requires that the instrument changes the treatment status of at least some units; it corresponds to the hypothesis of nonzero correlation between the instrument and the endogenous variable (i.e. the relevance of the instrument).
The monotonicity assumption is critical when comparing the IV approach with methods based on the UNC. To see why, we characterise units by the way they react to the level of the instrument. A first group, the compliers, consists of units that are induced to take the treatment by the instrument: $D_{i1} - D_{i0} = 1$. Other units are not influenced by the instrument and are either always-takers, with $D_{i1} = D_{i0} = 1$ (they take the treatment whatever the level of the instrument), or never-takers, with $D_{i1} = D_{i0} = 0$ (they always take the control). Finally, we might encounter defiers, units that do the opposite of their assigned status: $D_{i1} - D_{i0} = -1$. The monotonicity assumption rules out defiers and is crucial for identification, since otherwise the treatment effect for those who shift from non-participation to participation when $Z$ shifts from 0 to 1 can be cancelled out by the treatment effect for those who shift from participation to non-participation (Imbens and Angrist 1994). Importantly, the monotonicity assumption, like the exclusion restriction and the UNC, is untestable, and its plausibility has to be evaluated in the context of the given application.
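The cancellation described above can be seen from the population Wald ratio (reduced form divided by first stage) with a randomised binary instrument and the exclusion restriction imposed. Always-takers and never-takers contribute nothing to either difference, compliers contribute their effect with a plus sign and defiers with a minus sign. The function name, shares and effects below are hypothetical.

```python
def wald_ratio(shares, effects):
    """Population Wald ratio (E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0]).

    shares, effects: dicts keyed by 'complier' and 'defier' giving each group's
    population share and average effect of D on Y. Always- and never-takers are
    omitted because, under the exclusion restriction, their D and Y do not move with Z.
    """
    # Compliers: D = Z, so switching Z from 0 to 1 moves them from Y0 to Y1.
    # Defiers:   D = 1 - Z, so the same switch moves them from Y1 to Y0.
    reduced_form = shares["complier"] * effects["complier"] - shares["defier"] * effects["defier"]
    first_stage = shares["complier"] - shares["defier"]
    if first_stage == 0:
        raise ValueError("zero first stage: assumption B.2 fails")
    return reduced_form / first_stage


effects = {"complier": -10.0, "defier": -20.0}  # every unit is harmed by the treatment
print(wald_ratio({"complier": 0.25, "defier": 0.0}, effects))    # -10.0
print(wald_ratio({"complier": 0.25, "defier": 0.125}, effects))  # 0.0
```

Without defiers the ratio equals the compliers' effect; with defiers it can be zero even though the treatment lowers the outcome for every unit.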
AIR demonstrate that under the aforementioned assumptions we can identify only the average causal effect for the sub-population of compliers, termed the LATE:
$$\mathrm{LATE} = E[Y_1 - Y_0 \mid D_1 - D_0 = 1]. \tag{3}$$
Critically important for empirical work is that, in the presence of heterogeneous treatment effects, the LATE in general differs from the ATE and the ATT, which tend to be the parameters of interest. This is because the LATE refers only to the sub-population of compliers, while the ATE and the ATT are defined, respectively, on the whole population and on the sub-population of treated units. Moreover, a serious drawback of the LATE is that the compliers cannot be identified individually from the data. Finally, the estimated LATE depends on the instrument used, because different instruments identify different sub-populations of compliers.
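This point can be illustrated with a hypothetical population of three groups that respond differently to two instruments, labelled Z_a and Z_b here (they stand for no particular variable in the application). Under randomisation, exclusion and monotonicity, each Wald ratio recovers the average effect among that instrument's compliers, so the two LATEs differ from each other and from the ATE. All names and numbers are illustrative.

```python
# Hypothetical groups: (population share, effect of D on Y,
#                       compliance type under Z_a, compliance type under Z_b).
GROUPS = [
    (0.25, -20.0, "complier", "never"),
    (0.25, -5.0, "always", "complier"),
    (0.50, -10.0, "never", "never"),
]


def ate(groups):
    """Average effect over the whole population."""
    return sum(share * effect for share, effect, _, _ in groups)


def late(groups, instrument):
    """Average effect among the compliers of the given instrument ('Z_a' or 'Z_b').

    With no defiers, the Wald ratio equals this quantity: only compliers move
    both D and Y when the instrument switches from 0 to 1.
    """
    col = {"Z_a": 2, "Z_b": 3}[instrument]
    compliers = [(g[0], g[1]) for g in groups if g[col] == "complier"]
    share = sum(s for s, _ in compliers)
    return sum(s * e for s, e in compliers) / share


print(ate(GROUPS), late(GROUPS, "Z_a"), late(GROUPS, "Z_b"))  # -11.25 -20.0 -5.0
```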
In some applications, however, the LATE is itself a parameter of policy interest. Suppose that the policy maker wants to know the (average) causal effect of D on Y when D is changed by manipulating Z. In this case, the interest lies in the (average) causal effect of D on Y for the units that react to the policy intervention on Z (the compliers). Even then, the policy maker cannot identify which units are compliers, but can only estimate the size of this group. The presumption in such cases is that the average causal effect for the sub-population of units whose behaviour was modified by assignment is likely to be informative about the sub-populations that will comply in the future.
2.2 Strategies for the estimation of causal effects
We discuss here three strategies for the estimation of causal effects. The first is based
on assumptions A.1 and A.2, and includes regression and PSM. The second strategy
consists of combining these methods with a sensitivity analysis. In essence, the sensitivity analysis assesses the robustness of estimates when we suspect failure of the
UNC assumption. The third method is the IV approach.
Rather than discussing the technical details of each estimator in depth, we present the general ideas and limitations of the different techniques. For a formal comparison of these methods, we refer to Blundell et al. (2005) and Imbens (2004).
2.2.1 Strategy 1: methods based on the UNC
In the standard multivariate regression model, we assume a linear relationship between the outcome and the independent variables and homogeneous treatment effects; in fact, in the simplest regression model the treatment variable is not interacted with the covariates and its coefficient is the same for all units. This model constrains the ATE to coincide with the ATT, so if treatment effects are heterogeneous we cannot estimate the two quantities separately.3 Moreover, if the true model is nonlinear, the OLS estimates of the treatment effects will in general be biased. In parametric regression, the overlap assumption is not required insofar as we can be sure that the model is correctly specified. Otherwise, the comparison of treated and control units outside the common support relies heavily on linear extrapolation. Of course, the standard model can be extended and made more flexible to overcome these limitations. For example, the common support problem can be circumvented by first estimating the common support and then running the regression on the units within it. Moreover, we can avoid assuming homogeneous treatment effects by including a complete set of interactions between each covariate in $X$ and the treatment indicator $D$. This gives rise to the so-called fully interacted linear model (FILM in the following; see Goodman and Sianesi 2005). Since in the FILM, unlike in a fully saturated model, covariates are not recoded into qualitative variables, this approach is still parametric with respect to the way continuous covariates enter the regression function and interact with the treatment. The linearity assumption can be avoided altogether by using a non-parametric method, such as a kernel estimator (see Härdle and Linton 1994), which lets the functional form relating the outcome to the independent variables be determined by the data themselves.
3 In general, ATE and ATT are expected to differ if the distributions of the covariates in the treated and control groups are different and the treatment interacts with (at least some of) the covariates.
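A minimal FILM sketch, assuming NumPy is available: the outcome is regressed by least squares on a constant, $D$, $X$ and all interactions $D \cdot X$, and each unit's implied effect $b + x'g$ is averaged over all units (ATE) or over treated units only (ATT). The function name and the noise-free example data are hypothetical, and no common-support trimming or standard errors are included.

```python
import numpy as np


def film_effects(y, d, X):
    """ATE and ATT from a fully interacted linear model.

    Model: E[Y | D, X] = a + b*D + X @ c + D * (X @ g), estimated by OLS.
    The implied effect for a unit with covariates x is b + x @ g.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single covariate as a column
    n, k = X.shape
    design = np.column_stack([np.ones(n), d, X, d[:, None] * X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    b, g = coef[1], coef[2 + k:]  # treatment main effect and interaction slopes
    unit_effects = b + X @ g
    return unit_effects.mean(), unit_effects[d == 1].mean()


# Noise-free example: effect = 1 + 3x, and treated units have larger x.
x = np.array([0, 1, 2, 3, 2, 3, 4, 5], dtype=float)
d = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
y = 10 + 2 * x + d * (1 + 3 * x)
ate_hat, att_hat = film_effects(y, d, x)
print(round(ate_hat, 6), round(att_hat, 6))  # 8.5 11.5
```

A regression without the interaction term would return a single coefficient for both quantities; the interactions are what allow the ATE and the ATT to differ.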
Non-parametric methods, however, have computational drawbacks when the set of covariates is large and many of them are multi-valued or, worse, continuous. This problem, known as the curse of dimensionality, is also relevant for matching methods. A popular way to overcome it is to implement the matching on the basis of a univariate propensity score (Rosenbaum and Rubin 1983a), defined as the conditional probability of receiving the treatment given pre-treatment characteristics: $e(X) \equiv \Pr\{D = 1 \mid X\} = E\{D \mid X\}$. When the propensity scores are balanced across the treatment and control groups, the distributions of all covariates $X$ are balanced in expectation across the two groups (the balancing property of the propensity score). Therefore, if unconfoundedness holds given $X$, it also holds given $e(X)$, and matching on the propensity score can replace matching on the full vector $X$. Once the propensity score is estimated, several matching methods are available. The most common are kernel (Gaussian and Epanechnikov), nearest neighbour, radius and stratification matching (for a discussion of these methods see Caliendo and Kopeinig 2005; Smith and Todd 2005; Becker and Ichino 2002).
Asymptotically, all PSM estimators should yield the same results (Smith 2000), while in small samples the choice of matching algorithm can be important, and a trade-off between bias and variance generally arises (Caliendo and Kopeinig 2005). As noted by Bryson et al. (2002), it is sensible to try a number of approaches. If they give similar results, the choice matters little; otherwise, further investigation is needed to reveal the source of the disparity. As explained in Sect. 4, we adopt this pragmatic approach and assess the sensitivity of the results to the matching method. Consistent with many previous studies (see, e.g. Smith and Todd 2005), the different estimators yield very similar results (in terms of both point estimates and standard errors).
The analysis in Sect. 4 is based on nearest neighbour matching, meaning that for each treated (control) unit the algorithm finds the control (treated) unit with the closest propensity score. We use the variant with replacement, which allows a control (treated) unit to be used more than once as a match for units in the treated (control) sample. Among the methods we tried (nearest neighbour without replacement, k-nearest neighbour, radius and kernel), this approach gives the best quality of matches, because only units with the closest propensity score are matched, but at the cost of higher variance (Caliendo and Kopeinig 2005).
Focusing on the estimation of the ATT, the treatment effect for a treated unit $i$ is estimated by comparing its observed outcome $y_{i1}$ with the outcome $y_{j0}$ of the matched unit $j$ in the untreated sample. The ATT estimator can be written as:
$$\widehat{\mathrm{ATT}} = \frac{1}{n_D} \sum_{i: d_i = 1} \left( y_{i1} - y_{m(i)0} \right), \tag{4}$$
where $n_D$ is the number of treated units that find a match in the untreated group and $m(i)$ indicates the matched control for treated unit $i$.
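Estimator (4) can be sketched as follows, assuming the propensity scores have already been estimated (e.g. by a logit or probit, not shown) and that NumPy is available. Each treated unit is matched to the control with the closest score, with replacement, so a control can serve several treated units. The function name is hypothetical, ties are broken by taking the first closest control, and only the ATT direction (treated matched to controls) is shown.

```python
import numpy as np


def nn_match_att(y, d, ps):
    """ATT by 1-nearest-neighbour propensity score matching with replacement.

    Returns the estimate in (4) and the index m(i) of the matched control
    for each treated unit, in the order the treated units appear.
    """
    y, d, ps = (np.asarray(a, dtype=float) for a in (y, d, ps))
    treated = np.flatnonzero(d == 1)
    controls = np.flatnonzero(d == 0)
    if treated.size == 0 or controls.size == 0:
        raise ValueError("need at least one treated and one control unit")
    # Distance |e(x_i) - e(x_j)| between every treated i (rows) and control j (columns)
    dist = np.abs(ps[treated][:, None] - ps[controls][None, :])
    matches = controls[dist.argmin(axis=1)]  # with replacement: controls may repeat
    return float(np.mean(y[treated] - y[matches])), matches


y = [10.0, 12.0, 20.0, 25.0, 30.0]
d = [1, 1, 0, 0, 0]
ps = [0.30, 0.62, 0.28, 0.60, 0.90]
att_hat, m = nn_match_att(y, d, ps)
print(att_hat, m.tolist())  # -11.5 [2, 3]
```

Because matching is with replacement, if both treated units had scores closest to the same control, that control would be used twice; this keeps match quality high at the cost of a larger variance, as noted above.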
Under assumptions A.1 and A.2, regression and matching techniques can be used with cross-sectional data to estimate the ATE and the ATT, in which case $Y$, $X$ and $D$ are all measured at the same time. Longitudinal data covering at least two time points offer some important practical advantages. First, one is in a better position to measure covariates before exposure to the treatment. As is well known, one should control only for covariates that are not affected by the treatment itself (e.g. Rosenbaum 1984), and measuring variables before the treatment makes this condition more likely to hold (Imbens 2004). To make this explicit, we denote the covariates by $X_{t1}$ and the outcome by $Y_{t2}$. The treatment indicator, $D$, measures childbearing events between $t1$ and $t2$, and the ATT estimator can be written as:
$$\widehat{\mathrm{ATT}} = \frac{1}{n_D} \sum_{i: d_i = 1} \left( y_{i1,t2} - y_{m(i)0,t2} \right)$$
A second advantage is that we can include in the matching set the outcome variable of interest measured before exposure to the treatment. In our application, where the outcome is consumption expenditure (see Sect. 4), we include this variable, $Y_{t1}$, measured at the first wave, in the conditioning set. It reflects the household’s living standard prior to treatment and is likely to be relevant both for the probability of experiencing a childbearing event between the two waves and for the consumption expenditure level at the second wave, $Y_{t2}$. The UNC assumption can then be written more explicitly as:
Assumption A.3 (Unconfoundedness)
$$(Y_{1t2}, Y_{0t2}) \perp D \mid X_{t1}, Y_{t1}$$
As noted by Athey and Imbens (2006), assumption A.3 implies that individuals in the treatment group should be matched with individuals in the control group with similar (identical, if matching could be exact) first-period outcomes and other pre-treatment characteristics, and that their second-period outcomes should then be compared. Exact matching is not feasible in practice, but matching on the propensity score guarantees that, on average, the covariates (including $Y_{t1}$ in our case) are balanced in the matched treated and control groups. Importantly, taking the difference between the pre- and post-treatment outcomes helps reduce any remaining imbalance in $Y_{t1}$. This approach is similar in spirit to the bias-correction proposal of Abadie and Imbens (2002) for reducing bias due to residual imbalance in covariates after matching. Defining the dependent variable as the difference between the levels of the outcome after and before the treatment implies that the ATT estimator can be written as:
$$\widehat{\mathrm{ATT}} = \frac{1}{n_D} \sum_{i: d_i = 1} \left[ \left( y_{iD,t2} - y_{iD,t1} \right) - \left( y_{m(i)U,t2} - y_{m(i)U,t1} \right) \right], \tag{5}$$
where the subscripts $D$ and $U$ make explicit that the first two outcomes in (5) are measured on treated units and the other two on untreated units. From formula (5) we can see that if matching is exact on $Y_{t1}$, so that $y_{iD,t1} = y_{m(i)U,t1}$, then the estimate obtained using the difference as the outcome (5) is exactly equal to that in (4). Even if matching is not exact, provided that PSM works well (i.e. we succeed in balancing $Y_{t1}$), the two estimators are expected to give similar results. It is worth noting that although estimator (5) uses the change rather than the level at time $t2$ as the dependent variable, the estimands of interest, namely the ATE and the ATT as defined in (1) and (2), are the same. For example, for the ATT we have $\mathrm{ATT} = E(Y_{1t2} - Y_{0t2} \mid D = 1) = E[(Y_{1t2} - Y_{t1}) - (Y_{0t2} - Y_{t1}) \mid D = 1]$. Another advantage of taking the difference is that we expect the resulting estimator to be more efficient. This is likely to be the case in our application (and, we suspect, in many other applications), since there will be more heterogeneity in the levels of consumption expenditure at time $t2$ than in consumption growth between the two waves. In other words, $Y_{t2}$ is likely to have a higher variance than $(Y_{t2} - Y_{t1})$, although this is not true in general: since $\mathrm{Var}(Y_{t2} - Y_{t1}) = \mathrm{Var}(Y_{t2}) + \mathrm{Var}(Y_{t1}) - 2\,\mathrm{Cov}(Y_{t2}, Y_{t1})$, the change has the smaller variance only when $\mathrm{Cov}(Y_{t2}, Y_{t1}) > \mathrm{Var}(Y_{t1})/2$, that is, when the outcome is sufficiently persistent across waves.
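The relation between (4) and (5) can be checked numerically. Given aligned arrays for the treated units and their matched controls (the numbers below are hypothetical, and the function name is an illustrative assumption), the level-based estimate minus the change-based estimate equals the average residual imbalance in $Y_{t1}$, so the two coincide when matching on $Y_{t1}$ is exact. NumPy is assumed to be available.

```python
import numpy as np


def att_levels_and_changes(y1_treated, y2_treated, y1_matched, y2_matched):
    """Level-based ATT (4), change-based ATT (5) and mean imbalance in Y_t1.

    Arguments are aligned: element i holds the wave-1 and wave-2 outcomes of
    treated unit i and of its matched control m(i).
    """
    y1_t, y2_t, y1_m, y2_m = (
        np.asarray(v, dtype=float) for v in (y1_treated, y2_treated, y1_matched, y2_matched)
    )
    att_levels = np.mean(y2_t - y2_m)
    att_changes = np.mean((y2_t - y1_t) - (y2_m - y1_m))
    imbalance = np.mean(y1_t - y1_m)  # residual imbalance in Y_t1 after matching
    return float(att_levels), float(att_changes), float(imbalance)


levels, changes, gap = att_levels_and_changes([100, 110], [95, 120], [102, 109], [105, 118])
print(levels, changes, gap)  # -4.0 -3.5 -0.5
# Identity: (4) - (5) equals the mean imbalance in Y_t1
assert abs((levels - changes) - gap) < 1e-12
```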
A related literature motivates taking the difference between the pre- and post-treatment levels of the outcome as a way to improve the robustness of the matching method by eliminating possible time-invariant unobservables (Heckman et al. 1997; Smith and Todd 2005; Aassve et al. 2007). The resulting estimator is similar to ours, apart from the fact that $Y_{t1}$ is not included in the set of matching covariates. This estimator is labelled matching-difference-in-difference (MDID) and relies on an identifying assumption that differs from A.3. For example, for the ATT the identifying assumption can be written as $(Y_{0t2} - Y_{0t1}) \perp D \mid X_{t1}$.4 As noted by Athey and Imbens (2006, p. 448), the two assumptions coincide under special conditions imposed on the unobserved components. Otherwise, the two identifying strategies, although similar, are different, and A.3 remains a selection on observables assumption. The choice depends on subject-matter knowledge and on what the researcher believes is the best identifying strategy for the application at hand. We use A.3 as a starting point and compare treated and control units with similar background characteristics $X$ and similar initial values of the outcome, instead of relying on the assumption of conditionally parallel trends in the outcome that underlies MDID. Because we maintain an unconfoundedness-type assumption, the discussion in this section also applies to cross-sectional studies. To deal with the possible presence of unobservables, we discuss methods for sensitivity analysis and IV methods.
2.2.2 Strategy 2: sensitivity analysis of methods based on the UNC
The UNC becomes implausible once one or more relevant confounders are unobserved. If an instrument is available, one can proceed with an IV estimator, which we discuss in the next sub-section. Several approaches have been proposed in the literature to deal with situations where instruments are not available and the plausibility of unconfoundedness is doubtful.
4 If only average treatment effects are to be identified, the assumption can be stated in the weaker form of mean independence instead of full independence (e.g. Heckman et al. 1997).
One approach is to carry out indirect tests of the UNC assumption, relying on the estimation of a ‘pseudo’ causal effect that is known to be zero (Imbens 2004). A first type of test estimates the causal effect of the treatment of interest on a variable that is known to be unaffected by it. Another type relies on the presence of multiple control groups (Rosenbaum 1987a; Heckman et al. 1997), which arise, for example, when eligibility rules are in place. The presence of ineligibility rules is also the basis for the bias-correction method proposed by Costa Dias et al. (2008).
An important alternative to indirect tests is sensitivity analysis. The fundamental idea of this approach is to relax unconfoundedness in order to assess how strong an unmeasured variable must be to undermine the implications of the matching analysis. If the results are highly sensitive, the validity of the identifying assumption becomes questionable and alternative estimation strategies must be considered. Different approaches to sensitivity analysis have been proposed in the literature. Rosenbaum and Rubin (1983b) and Imbens (2003) propose methods to assess the sensitivity of ATE estimates in parametric regression models. Here, we apply the approaches suggested by Rosenbaum (1987b) and Ichino et al. (2008; hereafter IMN), neither of which relies on a parametric model for the estimation of the treatment effects. The underlying hypothesis in all of these approaches is that assignment to treatment may be confounded given the set of observed covariates but is unconfounded given the observed covariates and an unobservable covariate U: $(Y_1, Y_0) \perp D \mid X, U$.
In Rosenbaum's approach, sensitivity is measured using only the relation between the unobserved covariate and treatment assignment. To describe the approach briefly, we link the probability that a unit receives the treatment, π, to the observed characteristics X and an unobserved covariate U through a logistic regression function:

$$\log\frac{\pi}{1-\pi} = \kappa(X) + \gamma U, \qquad 0 \le U \le 1.$$

Under these assumptions, Rosenbaum shows that the odds ratio between two units i and j with the same values of X can be bounded as follows:

$$\frac{1}{\Gamma} \le \frac{\pi_i/(1-\pi_i)}{\pi_j/(1-\pi_j)} \le \Gamma, \qquad \text{where } \Gamma = e^{\gamma}.$$
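The bound follows in one line from the logistic model above, because κ(X) cancels for two units with the same X. A short derivation, assuming γ ≥ 0 so that Γ ≥ 1, as in Rosenbaum's setup:

```latex
\frac{\pi_i/(1-\pi_i)}{\pi_j/(1-\pi_j)}
  = \frac{\exp\{\kappa(X_i) + \gamma U_i\}}{\exp\{\kappa(X_j) + \gamma U_j\}}
  = \exp\{\gamma (U_i - U_j)\} \quad (X_i = X_j),
\qquad
-1 \le U_i - U_j \le 1
\;\Longrightarrow\;
\frac{1}{\Gamma} = e^{-\gamma}
  \le \frac{\pi_i/(1-\pi_i)}{\pi_j/(1-\pi_j)}
  \le e^{\gamma} = \Gamma .
```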
If Γ = 1, unconfoundedness holds and no hidden bias exists. Increasing values of Γ imply an increasingly important role for unobservables in the selection into treatment. Rosenbaum suggests progressively increasing Γ in order to assess the degree of association required to overturn, or substantially change, the p-values of statistical tests of no treatment effect. If this happens only at high values of Γ, the results of the analysis based on the UNC are sensitive to the presence of an unobservable only if it is strongly associated with treatment
selection. The plausibility of such an unobservable has to be judged by the researcher, depending on the richness of the information included in the analysis.
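As an illustration only, the sketch below computes the kind of p-value bound this procedure produces: the upper bound on the one-sided p-value of Wilcoxon's signed-rank test for matched-pair differences under hidden bias of at most Γ, using the large-sample normal approximation described in Rosenbaum (2002). It assumes 1-to-1 matched pairs and no ties in the absolute differences; the function name is hypothetical, and we do not claim it reproduces the exact statistic behind any table reported later. The test is for positive differences, so for a negative effect one would pass control-minus-treated differences.

```python
import math


def rosenbaum_upper_p(diffs, gamma):
    """Upper bound on the one-sided p-value of Wilcoxon's signed-rank test
    (H0: no effect, H1: positive differences) when hidden bias is at most gamma.

    diffs: matched-pair differences (assumed: no ties among |diff|).
    Uses the normal approximation to the bounding distribution (Rosenbaum 2002).
    """
    nonzero = [x for x in diffs if x != 0]  # zero differences carry no sign
    s = len(nonzero)
    order = sorted(range(s), key=lambda i: abs(nonzero[i]))
    ranks = [0] * s
    for rank, i in enumerate(order, start=1):  # rank pairs by |difference|
        ranks[i] = rank
    t_plus = sum(r for r, x in zip(ranks, nonzero) if x > 0)  # observed statistic

    # In the worst case allowed by gamma, each pair is positive with prob p_plus.
    p_plus = gamma / (1.0 + gamma)
    mean = p_plus * s * (s + 1) / 2.0
    var = p_plus * (1.0 - p_plus) * s * (s + 1) * (2 * s + 1) / 6.0
    z = (t_plus - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)


if __name__ == "__main__":
    diffs = [3.1, -0.4, 2.2, 1.7, -1.1, 2.9, 0.8, 1.5, 2.4, -0.2, 1.9, 3.6]
    for g in (1.0, 1.2, 1.5, 2.0):
        print(g, rosenbaum_upper_p(diffs, g))
```

Scanning Γ over a grid such as 1.0, 1.1, ..., 2.0 and reporting the first value at which the bound exceeds the chosen significance level gives the cut-off point discussed in the text. At Γ = 1 the formula reduces to the usual Wilcoxon normal approximation.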
Unlike Rosenbaum's approach, the approach by IMN assesses the sensitivity of point estimates of the ATT under different possible scenarios of deviation from the UNC.5 The underlying hypothesis is, as in the previous approaches, that assignment to treatment may be confounded given the set of observed covariates but is unconfounded given the observed covariates and an unobservable covariate U. The procedure can be summarised in the following steps:
(1) Calculate the ATT using PSM on X;
(2) Simulate a variable U representing a potential unobserved confounder;
(3) Include U together with X in the matching set and calculate the ATT;
(4) Repeat steps 2 and 3 several times (e.g. 1,000) and calculate the average ATT, to be compared with the baseline estimate obtained in (1) under the UNC.
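The four steps can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not IMN's Stata module: the propensity score is a logit fitted by Newton-Raphson, matching is 1-nearest-neighbour with replacement on the estimated logit index, the continuous outcome is dichotomised at the median only to draw U, and the probabilities P(U = 1 | D, binary outcome) are supplied as a dictionary of hypothetical values. All function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(rows, d, iters=20):
    """Logit of binary d on rows by Newton-Raphson; each row starts with a constant 1."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # X'WX, the information matrix
        for x, di in zip(rows, d):
            p = 1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(beta, x))))
            for i in range(k):
                grad[i] += (di - p) * x[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * x[i] * x[j]
        beta = [bj + sj for bj, sj in zip(beta, solve(info, grad))]
    return beta


def att_psm(y, d, covs):
    """ATT by 1-NN matching with replacement on the estimated logit index."""
    rows = [[1.0] + list(c) for c in covs]
    beta = fit_logit(rows, d)
    score = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    controls = [(score[i], y[i]) for i in range(len(y)) if d[i] == 0]
    effects = []
    for i in range(len(y)):
        if d[i] == 1:
            _, y_match = min(controls, key=lambda c: abs(c[0] - score[i]))
            effects.append(y[i] - y_match)
    return sum(effects) / len(effects)


def imn_sensitivity(y, d, x, p_kw, reps=1000, seed=0):
    """Steps (1)-(4): baseline ATT and the average ATT with a simulated binary U.

    p_kw[(k, w)] = P(U = 1 | D = k, Ybin = w), with Ybin = 1 if y is above the median.
    Ybin is used only to draw U; the ATT is always computed on the original y.
    """
    rng = random.Random(seed)
    med = sorted(y)[len(y) // 2]
    ybin = [1 if yi > med else 0 for yi in y]
    baseline = att_psm(y, d, x)  # step (1)
    sims = []
    for _ in range(reps):  # step (4)
        u = [1.0 if rng.random() < p_kw[(di, wi)] else 0.0 for di, wi in zip(d, ybin)]  # (2)
        sims.append(att_psm(y, d, [list(xi) + [ui] for xi, ui in zip(x, u)]))  # step (3)
    return baseline, sum(sims) / len(sims)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [[rng.gauss(0.0, 1.0)] for _ in range(400)]  # one observed covariate
    d = [1 if rng.random() < 1.0 / (1.0 + math.exp(-xi[0])) else 0 for xi in x]
    y = [2.0 * xi[0] - 1.0 * di + rng.gauss(0.0, 1.0) for xi, di in zip(x, d)]  # true ATT = -1
    p_kw = {(0, 0): 0.5, (0, 1): 0.3, (1, 0): 0.3, (1, 1): 0.3}  # hypothetical choice
    print(imn_sensitivity(y, d, x, p_kw, reps=10))
```

Matching on the logit index rather than on the probability itself is a monotone transformation of the same score; the default of 1,000 replications mirrors the example in step (4), while the demo uses fewer to run quickly.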
In the simulation process, IMN assume that U and the outcome are binary variables. With continuous outcomes, as in our application, a transformation is needed so that the outcome takes the value 1 if it is above a certain threshold (for example, the median) and 0 otherwise; alternatively, one could consider other outcome variables, such as poverty status, which is essentially a dichotomous transformation of consumption expenditure.6 However, this transformed variable is required only to simulate the values of U (step 2) and is not used as the outcome variable when estimating the ATT (step 3). Since all the variables involved in the simulation are binary, the distribution of U is specified by four key parameters:

$$p_{kw} = P(U = 1 \mid D = k, Y = w) = P(U = 1 \mid D = k, Y = w, X), \qquad k, w \in \{0, 1\}. \tag{6}$$
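Equation (6) leaves four numbers to choose. Assuming, as we read IMN's identification argument, that the analyst fixes pu = P(U = 1) and the difference p11 − p10 and then varies q = p01 − p00 and s = p1 − p0 (with pk = P(U = 1 | D = k)), the system has a closed-form solution once the observed shares t = P(D = 1), a = P(Y = 1 | D = 1) and b = P(Y = 1 | D = 0) are plugged in. The sketch below is ours; the function name and the symbol h for p11 − p10 are hypothetical. A combination that pushes some pkw outside [0, 1] is inadmissible, which is what the *** cells in the sensitivity tables flag.

```python
def imn_pkw(pu, q, s, h, t, a, b):
    """Solve for p_kw = P(U=1 | D=k, Y=w) from the analyst's choices and observed shares.

    pu: chosen P(U=1); q = p01 - p00; s = p1 - p0 with pk = P(U=1 | D=k);
    h:  chosen p11 - p10 (hypothetical name);
    t = P(D=1), a = P(Y=1 | D=1), b = P(Y=1 | D=0): observed in the data.
    Returns {(k, w): p_kw}, or None if a value falls outside [0, 1] (inadmissible).
    """
    p00 = pu - b * q - t * s  # from pu = t*p1 + (1-t)*p0 and p0 = p00 + b*q
    p01 = p00 + q
    p10 = p00 + s + b * q - a * h  # from p1 = p0 + s and p1 = p10 + a*h
    p11 = p10 + h
    p = {(0, 0): p00, (0, 1): p01, (1, 0): p10, (1, 1): p11}
    if any(v < 0.0 or v > 1.0 for v in p.values()):
        return None
    return p


if __name__ == "__main__":
    # Hypothetical inputs: pu = 0.4, q = -0.2, s = -0.3, p11 - p10 = 0 and observed shares.
    print(imn_pkw(0.4, -0.2, -0.3, 0.0, t=0.39, a=0.3, b=0.55))
```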
It is assumed here that U is independent of X conditional on D and Y. To choose the signs of the associations between U, Y0 and D, IMN note that if q = p01 − p00 > 0, then U has a positive effect on Y0 (conditioning on X), whereas if s = p1 − p0 > 0, where pk = P(U = 1 | D = k), then U has a positive effect on D. If we fix pu = P(U = 1) and the difference p11 − p10, the four parameters pkw are uniquely identified by the values of q and s. Hence, by changing the values of q and s we can produce different scenarios for U. For example, to mimic the effect of unobserved ability we can set q to a positive value (positive effect on consumption) and s to a negative value (negative effect on fertility). Note that with this approach the values of q and s directly determine only the signs of the associations of U with D and Y0. However, the strength of these associations increases with the absolute values of q and s. The idea is therefore to use this sensitivity analysis as in the Rosenbaum approach. The difference is that, by progressively increasing the values of both q and s, we increase the association of U with both the treatment and the outcome, rather than with the treatment only. In order to have an
5 Under the assumption of an additive treatment effect, Rosenbaum also derives bounds on the Hodges–
Lehmann point estimate of the treatment effect (see Rosenbaum 2002 for details).
6 For more details on the simulations, see Ichino et al. (2008); see Nannicini (2007) for details on the STATA module sensatt, which implements this method.
easily interpretable measure of these associations, IMN propose the following parameters:

$$\Gamma = \frac{1}{rep}\sum_{r=1}^{rep} \frac{\Pr(Y=1 \mid D=0, U_r=1, X)/\Pr(Y=0 \mid D=0, U_r=1, X)}{\Pr(Y=1 \mid D=0, U_r=0, X)/\Pr(Y=0 \mid D=0, U_r=0, X)}$$

and

$$\Lambda = \frac{1}{rep}\sum_{r=1}^{rep} \frac{\Pr(D=1 \mid U_r=1, X)/\Pr(D=0 \mid U_r=1, X)}{\Pr(D=1 \mid U_r=0, X)/\Pr(D=0 \mid U_r=0, X)}$$

where rep indicates the number of replications.
The parameter Γ is the average odds ratio from the logit model of P(Y = 1 | D = 0, U, X), calculated over the replications of the simulation procedure. In other words, it measures the effect of U on Y and is in this sense an outcome effect. The parameter Λ is the average odds ratio from the logit model of P(D = 1 | U, X); it measures the effect of U on D and is therefore a selection effect. At each replication of the simulation exercise, together with these two odds ratios, the ATT is estimated using the set X and the simulated U as covariates. The final simulated ATT estimate is the average of the estimates obtained across all the replications.7
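Because both Γ and Λ come from logit models, each replication's odds ratio reduces to the exponentiated coefficient on U, and it does not depend on X. Writing, for replication r, hypothetical coefficients β_r on X and λ_r on U_r in the selection logit, and δ_r for the coefficient on U_r in the logit of Y on (U_r, X) among controls:

```latex
\log\frac{\Pr(D=1\mid U_r,X)}{\Pr(D=0\mid U_r,X)} = X'\beta_r + \lambda_r U_r
\;\Longrightarrow\;
\frac{\Pr(D=1\mid U_r=1,X)/\Pr(D=0\mid U_r=1,X)}{\Pr(D=1\mid U_r=0,X)/\Pr(D=0\mid U_r=0,X)}
  = \frac{e^{X'\beta_r+\lambda_r}}{e^{X'\beta_r}} = e^{\lambda_r},
\qquad
\Lambda = \frac{1}{rep}\sum_{r=1}^{rep} e^{\hat\lambda_r},
\qquad
\Gamma = \frac{1}{rep}\sum_{r=1}^{rep} e^{\hat\delta_r}.
```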
2.2.3 Strategy 3: IV methods
When the UNC is implausible and an instrument is available, one would naturally implement IV methods. How they are implemented depends on whether the available instrument can be thought of as randomised. In the previous discussion, we assumed that the instrument is randomised, which means that there is no need to control for covariates. In this case, AIR show that the LATE can be estimated simply by the Wald estimator. However, in many applications Z is not randomly assigned and can be confounded with D, with Y, or with both. In these contexts, the IV assumptions, such as the exclusion restriction, are usually plausible only conditional on a set of covariates; in other words, Z can be considered unconfounded only conditional on covariates. The conventional approach to accommodating covariates in IV estimation consists of parametric or semi-parametric methods (two-stage least squares being the most common); classic examples include Card (1995) and Angrist and Krueger (1991). A serious drawback of these methods is that most of them impose additive separability of the error term, which amounts to ruling out unobserved heterogeneity in the treatment effects. One approach that avoids the strong assumptions used by the aforementioned IV methods is the
7 A complementary approach proposed by Manski (1990) consists of dropping the UNC assumption entirely and constructing bounds for the ATT that rely on alternative identifying assumptions, for example that the outcome is bounded. IMN show how this approach is related to their sensitivity analysis and argue that non-parametric bounds are too conservative, because the bound calculations rely on extreme circumstances that are implausible. Moreover, in our application the outcome is continuous and has no natural bounds.
non-parametric approach suggested by Frölich (2007). The identifying assumptions in this case are basically the same as in the case of a randomised instrument, but stated conditional on covariates. In this way, we can identify the conditional LATE, that is, the LATE defined for units with specific observed characteristics. The marginal LATE is identified as follows:8

$$\text{LATE} = \frac{\int \left(E[Y \mid X, Z=1] - E[Y \mid X, Z=0]\right) dF_X}{\int \left(E[D \mid X, Z=1] - E[D \mid X, Z=0]\right) dF_X} \tag{7}$$
When the number of covariates included in X is large, non-parametric estimation of equation (7) becomes difficult, especially in small samples. An alternative is to exploit the balancing property of the propensity score mentioned above, which allows us to replace the high-dimensional set X in (7) with a scalar: π = P(Z = 1 | X).
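Equation (7) can be estimated in several ways; Frölich (2007) relies on non-parametric regression and matching. As a simpler illustration only, the sketch below swaps in an inverse-probability-weighting (IPW) estimator, replacing each integral with a weighted average based on E[ZY/π(X)] = ∫E[Y | X, Z = 1] dF_X and E[(1 − Z)Y/(1 − π(X))] = ∫E[Y | X, Z = 0] dF_X. It is not Frölich's estimator; π(X) is assumed to be supplied (for example, from a logit of Z on X), and the function names are hypothetical. With a constant π equal to the sample share of Z = 1, it reproduces the Wald estimator exactly.

```python
def mean(v):
    return sum(v) / len(v)


def wald(y, d, z):
    """Wald estimator for a randomised binary instrument z."""
    y1 = mean([yi for yi, zi in zip(y, z) if zi == 1])
    y0 = mean([yi for yi, zi in zip(y, z) if zi == 0])
    d1 = mean([di for di, zi in zip(d, z) if zi == 1])
    d0 = mean([di for di, zi in zip(d, z) if zi == 0])
    return (y1 - y0) / (d1 - d0)


def late_ipw(y, d, z, pi):
    """Sample analogue of (7) with each integral replaced by an IPW average.

    pi[i] is the instrument propensity score P(Z = 1 | X_i), assumed to lie in (0, 1).
    """
    num = mean([zi * yi / p - (1 - zi) * yi / (1 - p) for yi, zi, p in zip(y, z, pi)])
    den = mean([zi * di / p - (1 - zi) * di / (1 - p) for di, zi, p in zip(d, z, pi)])
    return num / den


if __name__ == "__main__":
    y, d, z = [5.0, 3.0, 2.0, 2.0], [1, 0, 0, 0], [1, 1, 0, 0]
    share = mean(z)
    print(wald(y, d, z), late_ipw(y, d, z, [share] * len(z)))  # both 4.0
```

When Z is confounded with X, the naive Wald ratio and the weighted version diverge, which is the situation motivating the conditional approach.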
3 Fertility and economic wellbeing and the Vietnamese context
Our application is concerned with estimating the causal effect of fertility on economic wellbeing. The interrelationship between the two has received considerable interest in development studies and the economics literature. The traditional micro-economic framework considers children an essential part of the household's work force, as they generate income. This is especially true for male children. In rural underdeveloped regions of the world, which rely largely on low-level farming technology and where households have little or no access to state benefits, this argument makes a great deal of sense (Admassie 2002). In this setting households will have a high demand for children. The downside is that a large number of children participating in household production hampers investment in human capital (Moav 2005). There are of course important supply-side considerations in this regard: rural areas in developing countries have poor access to both education and contraceptives, both of which limit the extent to which couples are able to make choices about fertility outcomes (Easterlin and Crimmins 1985). As households attain higher levels of income and wealth, they also have fewer children, either due to a quantity–quality trade-off, as suggested by Becker and Lewis (1973), or due to an increase in the opportunity cost of women earning a higher income, as suggested by Willis (1973). An important aspect with regard to Vietnam is that the country has experienced a tremendous decline in fertility over the past two decades, and at present one can safely claim that the country has completed the fertility transition. The figures speak for themselves: in 1980 the total fertility rate (TFR) was 5.0; in 2003 it was 1.9. Contraceptive availability and knowledge are widespread, and family planning programs were initiated as early as the 1960s (Scornet 2007).9
8 Note that a common support assumption is needed, as stated by Frölich: Supp(X | Z = 1) = Supp(X | Z = 0). Here we give only some intuition about the assumptions underlying this method; for a detailed and more formal discussion we refer to Frölich's paper.
9 An important factor in this change was the introduction of the "Doi Moi" (renewal) policy in the late 1980s, which consisted of the replacement of collective farms by the allocation of land to individual households;
In light of our technical discussion in Sect. 2, the key issue in this application is that fertility decisions can be driven by both observed and unobserved selection. In terms of observables, predicting their effects is relatively straightforward within an economics framework. The key is to understand the drivers behind women's perceived opportunity cost of childbearing. Higher education and labour force participation among women increase women's opportunity cost, producing a negative effect on fertility. They also increase women's income and hence consumption expenditure. Typically, the increase in opportunity costs dominates the positive income effect. Increased education among men, and therefore higher earnings, translates into a positive income effect and hence a positive effect on fertility (Ermisch 1989). However, empirical analysis shows that there is not necessarily a positive relationship between income and family size (i.e. number of children), the key explanation being that couples make trade-offs between quantity and quality (Becker and Lewis 1973), especially as the country in question develops and passes through the fertility transition.
As for unobservables, these can operate through different mechanisms. The key unobserved variables are ability and aspirations, and they play an important role in our application. In general, we would expect those with higher ability or higher work and career aspirations to have lower fertility because of their higher opportunity cost. Thus, ability is negatively correlated with fertility but positively associated with consumption expenditure. Moreover, fertility is commonly measured in terms of childbearing events, as we do here. However, childbearing outcomes are the direct result of contraceptive practices, which are typically unmeasured in household surveys. Better knowledge and higher uptake of contraceptives reduce unwanted pregnancies and hence fertility. Unobserved ability is positively associated with contraceptive use, which reinforces the negative effect of ability on fertility. Fertility is of course based on the joint decision of a couple, not of the woman alone; behind the childbearing outcomes there is also a bargaining process. Again, unobserved ability may play an important role. High-ability women may have stronger bargaining power, either as a result of the ability itself (e.g. they are better negotiators) or through the effect of higher ability on their labour supply and hence earnings. Although ability works through different mechanisms, the prediction of its effect is rather clear: high ability is associated with lower fertility but higher income and hence consumption expenditure. Consequently, its omission implies a negative bias in the estimated effect of fertility on consumption expenditure.
The data we use come from the Vietnam living standard measurement survey (VLSMS), first conducted in 1992/1993 with a follow-up in 1997/1998. The longitudinal nature of the data set allows us to measure whether any woman in the household experienced another birth between the two waves. The treatment is then defined as a binary
Footnote 9 continued
legalisation of many forms of private economic activity; removal of price controls; and legalisation and encouragement of Foreign Direct Investment (FDI). Since the introduction of Doi Moi, the country has embarked on a remarkable economic recovery, followed by a substantial reduction in poverty (Glewwe et al. 2002).
Table 1 Average equivalised household consumption expenditure at the two waves and its growth by
number of children born between the two waves
Number of children born    Observations    Average consumption    Average consumption    Average consumption
between the two waves                      in 1992                in 1997                growth 1997–1992
0                          1,232           970                    2,436                  1,466
1                          581             856                    1,892                  1,036
2                          182             790                    1,755                  965
3                          28              571                    1,154                  583
At least 1                 791             832                    1,835                  1,004
Total                      2,023           916                    2,201                  1,285
Notes: We consider the number of children of all household members born between the two waves and still alive at the second wave. All consumption measures are valued in dongs at 1992 prices. The 2,023 households represented in the table are selected by keeping only households with at least one married woman aged between 15 and 40 in the first wave. Consumption is expressed in thousands of dongs
variable taking value 1 if the household experiences a childbearing event between the two waves (treated) and 0 otherwise (untreated or control). The outcome of interest is the equivalised consumption expenditure level in the second wave. In the empirical implementation presented in the next section, we control for a range of explanatory variables measured in the first wave. Otherwise, the data follow the standard format of the World Bank LSMS, including detailed information about education, employment, fertility, expenditure and incomes. The survey also provides detailed community information from a separate community questionnaire. This information is available for the 120 rural communities sampled and consists of data on health, schooling and main economic activities. The availability of this information is important for two reasons. First, characteristics of the communities where households reside are likely to influence both economic wellbeing and fertility and are, hence, potentially relevant confounders. Second, this information provides a candidate instrument: the availability of contraceptives in the community.
The conventional proxy for household welfare is observed consumption expenditure, which requires detailed information on consumption behaviour and expenditure patterns (Coudouel et al. 2002; Deaton and Zaidi 2002). The expenditure variables are calculated with the World Bank procedure, which is readily available with the VLSMS. We choose a relatively simple equivalence scale, giving each child aged 0–14 in the household a weight of 0.65 relative to adults.10 Table 1 presents simple descriptive statistics that highlight a clear negative association between the number of children and economic wellbeing.
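The equivalence scale described above amounts to dividing household consumption by a weighted head count. The sketch below assumes that every member aged 15 or over counts as one adult and that there are no further economies of scale; the function name and the example household are hypothetical.

```python
def equivalised_consumption(total, n_adults, n_children, child_weight=0.65):
    """Household consumption per adult equivalent: each adult counts 1 and each
    child aged 0-14 counts child_weight (0.65 in the text)."""
    return total / (n_adults + child_weight * n_children)


if __name__ == "__main__":
    # Hypothetical household: 2 adults, 3 children, total consumption 6,000 (thousand dongs)
    print(round(equivalised_consumption(6000, 2, 3), 1))
```

Changing `child_weight` is the kind of robustness check on the imposed scale that footnote 10 refers to.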
Our choice of covariates is based mainly on dimensions that are important for both the household's standard of living and fertility behaviour, and hence are potential confounders that have to be included in the conditioning set X to make the UNC plausible. All these variables can theoretically have an impact on change in consumption
10 We assessed the robustness of the results to the imposed equivalence scale. For reasonable equivalence scales, the results are consistent with those presented here. This analysis is available from the authors upon request.
expenditure and on the decision to have children. Many of these variables are defined as household ratios; for example, we include the number of household members engaged in gainful employment as a ratio of the total number of household members. We also include demographic characteristics of the household, such as the sex and age of the household head, the household size and the presence of existing children. The effect of children is further distinguished by their age distribution, again expressed as a ratio of the total number of household members. Other covariates include the ratios of male and female members aged 15–45, the ratios of male and female working members aged 15–45 out of the respective groups, an education index, the level of equivalised consumption at the first wave and regional dummies. We also use two binary variables indicating, respectively, whether the household is mainly engaged in farming and whether the household head belongs to the majority ethnic group (the Kinh). Importantly, we also include community information through three indexes: (1) economic development, (2) health facilities and (3) educational infrastructure.
4 Estimating causal effects of fertility on economic wellbeing
4.1 Regression and propensity score results
Here, we present the results of the estimation of the causal effect of childbearing on consumption expenditure using multiple regression and the PSM method. Our sample is restricted to households that, in the first wave, included at least one married woman aged between 15 and 40 years.11 This selection is important since it excludes units that are in effect incapable of childbearing.
As anticipated in Sect. 2, since in general it is not clear which matching technique is best, we compare different methods, including nearest neighbour matching with and without replacement, k-nearest neighbour, radius and kernel matching. Since the estimates (ATE and ATT) and standard errors are stable across the different approaches, the choice of the matching method is not critical in this application. We present results based on the nearest neighbour method with replacement as implemented by the nnmatch module in STATA (Abadie et al. 2004).12 This approach
11 This sample selection criterion is part of the matching strategy, since it avoids comparing households having a child with households that were essentially outside the risk set (here because there are no women of fecund age in the household). Obviously, different selection strategies are possible. However, this selection criterion gives low attrition with respect to households having additional children. Moreover, we tried the following alternative selection criteria: (1) select households with at least one married woman aged 15–35 in the first wave; (2) select households where the head or the head's spouse is a married woman aged 15–40 in the first wave; and (3) select households where the head or the head's spouse is a married woman aged 15–35 in the first wave. The results are very similar to those presented here.
12 This software implements the estimators suggested by Abadie and Imbens (2002) and provides analytical standard errors that are robust to potential heteroscedasticity. We prefer analytical standard errors to bootstrapped ones because Abadie and Imbens (2004) show that the bootstrap fails for nearest neighbour matching.
Table 2 Estimates from methods based on the unconfoundedness assumption (robust standard errors in parentheses)

Method                                              Estimate (s.e.)
Regressions
  Simple (ATE = ATT)                                −462 (56)
  Multiple with no interactions (ATE = ATT)         −414 (62)
  FILM (conditioned on CS), ATE                     −421 (60)
  FILM (conditioned on CS), ATT                     −432 (59)
Propensity score matching (nearest neighbour)
  ATE                                               −411 (87)
  ATT                                               −356 (116)

Notes: CS, common support; FILM, fully interacted linear model. Figures are in thousands of dongs. Standard errors for regressions are robust to heteroscedasticity and to correlation within communities. PSM standard errors are robust to heteroscedasticity
achieves good balance in all the pre-treatment covariates, as measured by the absolute standardised bias calculated after matching.13
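To make the matching step and the balance measure concrete, the sketch below shows plain 1-nearest-neighbour matching with replacement on a scalar score (for example, an estimated propensity score) together with the absolute standardised bias defined in footnote 13. It is not the nnmatch implementation, which matches on the covariates supplied to it with its own metric and analytical standard errors; it has no bias correction or variance estimate, and all names are hypothetical.

```python
from statistics import mean, variance


def nn_match_att(score, y, d):
    """ATT from 1-nearest-neighbour matching with replacement on a scalar score.

    Returns the ATT, the treated indices and the index of each treated unit's match.
    """
    treated = [i for i, di in enumerate(d) if di == 1]
    controls = [i for i, di in enumerate(d) if di == 0]
    matches = [min(controls, key=lambda j: abs(score[j] - score[i])) for i in treated]
    att = mean([y[i] - y[j] for i, j in zip(treated, matches)])
    return att, treated, matches


def abs_std_bias(x_treated, x_control, ref_treated=None, ref_control=None):
    """Absolute standardised bias (%): |difference in means| over the square root of
    the average of the two group variances. By default the variances come from the
    samples passed in; pass the unmatched samples as ref_* to keep them fixed."""
    vt = variance(ref_treated if ref_treated is not None else x_treated)
    vc = variance(ref_control if ref_control is not None else x_control)
    return 100.0 * abs(mean(x_treated) - mean(x_control)) / ((vt + vc) / 2.0) ** 0.5


if __name__ == "__main__":
    score = [0.10, 0.22, 0.31, 0.12, 0.25, 0.40]  # hypothetical propensity scores
    y = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]
    d = [1, 1, 1, 0, 0, 0]
    att, treated, matches = nn_match_att(score, y, d)
    print(att, matches)
    x = [1.0, 2.0, 3.0, 1.2, 2.1, 3.5]  # one hypothetical covariate
    before = abs_std_bias([x[i] for i in treated], [x[j] for j in range(6) if d[j] == 0])
    after = abs_std_bias([x[i] for i in treated], [x[j] for j in matches])
    print(before, after)
```

Because matching is with replacement, the same control can appear several times in `matches`, and the matched control mean weights it accordingly.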
The results are presented in Table 2. We also report the estimate from a simple regression of Y on D without any covariates. This estimate can also be obtained from Table 1 as the difference in average consumption growth between households having at least one birth and the remaining households.
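The link between the two tables is simple arithmetic: the simple-regression coefficient equals the household-weighted mean growth of the rows with at least one birth minus the growth of the zero-births row. The sketch below uses only the published counts and averages of Table 1, so small discrepancies can come from rounding of those averages; the variable names are ours.

```python
# Table 1: children born -> (households, average consumption growth 1997-1992)
table1 = {0: (1232, 1466), 1: (581, 1036), 2: (182, 965), 3: (28, 583)}

n_treated = sum(table1[k][0] for k in (1, 2, 3))
growth_treated = sum(table1[k][0] * table1[k][1] for k in (1, 2, 3)) / n_treated
growth_control = table1[0][1]
n_total = n_treated + table1[0][0]
growth_total = (n_treated * growth_treated + table1[0][0] * growth_control) / n_total

simple_estimate = growth_treated - growth_control
print(n_treated, round(growth_treated), round(simple_estimate), round(growth_total))
# 791 1004 -462 1285
```

The reconstructed "At least 1" and "Total" rows and the −462 coefficient in Table 2 agree with the published figures.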
This approach would be acceptable if D were randomised. However, selection is clearly present: the estimated effect of fertility on expenditure is reduced by around 10% in the multiple regression.
As discussed in Sect. 2, multiple regression and PSM rely on the assumption that the treatment can be thought of as randomised after controlling for covariates (unconfoundedness). However, the assumptions imposed by the two estimation strategies differ. The standard multiple regression implicitly assumes that the effect of childbearing on poverty is constant, while the FILM, which includes all interactions between D and the covariates, allows the effect to vary with covariate values. As a consequence, the standard regression does not distinguish between ATE and ATT, since they coincide under a constant treatment effect. In contrast, FILM does, and ATE and ATT will in general differ. FILM was implemented with and without conditioning on the common support. Since the results are very similar, we show only FILM with the common support. In this case, FILM requires a first-stage estimation of the propensity score and of the common support.14 With FILM, the multiple regression model is made more similar to PSM. A critical difference, of course, is that PSM does not impose any functional form for the
13 The absolute standardised bias is defined as the absolute difference in sample means between the
matched treated and control samples as a percentage of the square root of the average sample variance in
the groups (Rosenbaum and Rubin 1985). The results obtained using the other methods are not presented
here for brevity but they are available from the authors upon request.
14 In principle, any standard probability model can be used to estimate the propensity score. For example, using the common logit or probit models, we can write Pr{D = 1 | X} = F(h(X)), where F(.) is, respectively, the logistic or the normal cumulative distribution function and h(X) is a function of covariates with linear and higher-order terms. The choice of which higher-order terms to include, as well as interactions among covariates, is determined solely by the need to balance the covariate distributions in the two treatment groups (Dehejia and Wahba 1999). We used a logit specification including some interaction terms to achieve balance. We avoided the inclusion of higher-order terms because, as demonstrated by Zhao (2005), their inclusion could have some biasing effect (whereas the inclusion of irrelevant interactions does not have this drawback).
relationship between consumption expenditure and fertility, whereas regression imposes linearity. As we can see from Table 2, the estimated ATE is similar across these methods, while the ATT estimated by PSM is slightly smaller in absolute value. Thus, relaxing linearity matters, and PSM is the preferred option because it does not impose any functional form at the stage of estimating the causal effect of the treatment. Moreover, PSM allows us to assess directly the balance achieved in the covariates and the common support. From Table 2, we note that, using PSM, the estimated ATT differs from the ATE. In general, ATT and ATE differ if the distribution of covariates differs between the treated and control groups (which is expected due to self-selection into treatment) and if the treatment interacts with the covariates, i.e. if the causal effects are heterogeneous.15
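Why FILM separates ATE from ATT can be seen in a few lines: a fully interacted linear model yields the same fitted values as separate OLS regressions in the treated and control groups, so the ATE averages the difference in fitted values over everyone and the ATT averages it over the treated only. The sketch below assumes a single covariate for brevity, omits the common-support step and standard errors, and uses hypothetical names.

```python
from statistics import mean


def ols_line(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def film_effects(y, d, x):
    """ATE and ATT from a fully interacted linear model with one covariate."""
    x1 = [xi for xi, di in zip(x, d) if di == 1]
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    x0 = [xi for xi, di in zip(x, d) if di == 0]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    a1, b1 = ols_line(x1, y1)  # outcome model for the treated
    a0, b0 = ols_line(x0, y0)  # outcome model for the controls

    def effect(xi):
        return (a1 + b1 * xi) - (a0 + b0 * xi)

    ate = mean([effect(xi) for xi in x])   # averaged over everyone
    att = mean([effect(xi) for xi in x1])  # averaged over the treated
    return ate, att


if __name__ == "__main__":
    x = [2, 3, 4, 5, 0, 1, 2, 3]            # treated units have higher x
    d = [1, 1, 1, 1, 0, 0, 0, 0]
    y = [14, 16, 18, 20, 1, 2, 3, 4]        # effect 9 + x varies with x
    print(film_effects(y, d, x))            # (11.5, 12.5)
```

With a constant effect (equal slopes) the two averages coincide, which is the no-interaction regression case in Table 2.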
Given that average consumption growth between the two waves amounts to 1,285 thousand dongs, the estimates ranging from −356 to −462 thousand dongs presented in Table 2 are clearly substantial. For comparison, the amount needed in 1992 to buy a quantity of rice providing 1,000 calories (about 300 g) each day for 1 year was 215 thousand dongs.16 Moreover, the food poverty line in 1992 was estimated at 750 thousand dongs (corresponding to 68 US dollars in 1992), which is another indication that the estimated effects are substantial.17
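The comparison in this paragraph is simple arithmetic on the figures quoted in the text; the sketch below makes it explicit. The estimates correspond to roughly 28–36% of average consumption growth, about 1.7–2.1 years of the 1,000-calorie rice ration, and about 47–62% of the 1992 food poverty line. The function and key names are ours.

```python
def relative_magnitudes(effect, growth=1285, rice_year=215, food_poverty_line=750):
    """Express an estimated effect (thousand 1992 dongs) relative to benchmarks in the text."""
    size = abs(effect)
    return {
        "share_of_average_growth": size / growth,
        "years_of_1000_calorie_rice": size / rice_year,
        "share_of_food_poverty_line": size / food_poverty_line,
    }


for est in (-356, -462):
    print(est, {k: round(v, 2) for k, v in relative_magnitudes(est).items()})
```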
4.1.1 Sensitivity analysis to violations of the unconfoundedness assumption
The reliability of the previous PSM estimates depends on the balancing property being satisfied, on the sensitivity to the imposition of the common support and on the matching method used. All of these are checked rigorously, and in general the estimates are robust.18 However, the most critical requirement of PSM is the plausibility
15 The ATE can be seen as a weighted average of the average treatment effect on the treated (ATT) and on the untreated (ATU). Since in our application ATE < ATT, it follows that ATU < ATT. We interpret this result as follows. The effect of childbearing events on consumption is estimated to be negative for households actually experiencing these events (treated). However, the negative effect of childbearing events would be even larger for households that do not experience these events (untreated). This means that the treatment interacts with household characteristics: untreated households show, on average, characteristics negatively associated with consumption, and this increases the negative effect of childbearing events in this sub-population. It may indicate optimisation on the part of the household decision maker: households in favourable economic conditions (which can easily afford it) decide to have children.
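The decomposition in footnote 15 can be inverted to back out the implied ATU from the PSM estimates in Table 2. The sketch assumes that the weights are the treated and untreated shares of the Table 1 sample (791 of 2,023 households); the PSM sample after imposing common support may differ slightly, so the result is approximate. The function name is hypothetical.

```python
def implied_atu(ate, att, share_treated):
    """Back out the ATU from ATE = p * ATT + (1 - p) * ATU, with p the treated share."""
    return (ate - share_treated * att) / (1.0 - share_treated)


p = 791 / 2023  # treated share in Table 1 (an assumption for the PSM sample)
print(round(implied_atu(-411, -356, p), 1))  # -446.3
```

An implied ATU of roughly −446 thousand dongs, larger in absolute value than the ATT of −356, is in line with the footnote's reading that ATU < ATT.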
16 These figures are derived by Molini (2006).
17 For further insights into the composition of the Vietnamese food basket and for more details about the
construction of the Vietnamese food poverty line, see Tung (2004).
18 The balancing property is checked by comparing the distribution of covariates X before and after matching and calculating the reduction in the absolute standardised bias. As already mentioned in the text, we calculated the ATE and ATT with different matching methods in order to assess the sensitivity of the method presented here. Finally, we found that very few units fall outside the common support calculated with the minima–maxima criterion. Using different methods for calculating the common support, including the thick support, we obtain results similar to those presented in Table 2. We also implemented matching on the underlying continuous index instead of the propensity score to address the heavy-tails problem. Again, the results are stable. Details are available from the authors on request.
Table 3 Rosenbaum bounds
Γ      p-value
1.0    6.10E−13
1.1    4.40E−10
1.2    7.40E−08
1.3    4.10E−06
1.4    0.000096
1.5    0.001133
1.6    0.007673
1.7    0.033198
1.8    0.099821
1.9    0.223538
2.0    0.395373
of the UNC. The UNC is an untestable assumption and has to be judged on the
basis of the data available. The richness of covariates makes us reasonably confident that the most relevant confounders are observed and included in the matching
set.
To assess the reliability of the UNC assumption, we apply the sensitivity analyses suggested by Rosenbaum and by IMN, as outlined in Sect. 2. The idea is to assess how strong the effect of an unmeasured confounder must be to undermine the conclusions drawn from the analysis based on the UNC. The results of the Rosenbaum-type sensitivity analysis are reported in Table 3. The key is to vary the effect of the potential unobserved confounder on selection into treatment by choosing different levels of the parameter Γ. As mentioned in Sect. 2.2, fixing Γ = 1 corresponds to assuming that the UNC holds, while increasing Γ progressively increases the deviation from the UNC. The exercise shows at what point the treatment effect is no longer statistically significant. The results in Table 3 suggest that Γ = 1.8 provides such a cut-off point. The interpretation is as follows. If an unobserved covariate caused the odds of experiencing a childbearing event to differ between treated and untreated households with the same observed characteristics by a factor of 1.8 (or 80%), then the 90% confidence interval of the ATT would include zero. The value 1.8 is rather large given that we have adjusted for many important observed background characteristics (Aakvik 2001). Applying this type of sensitivity analysis to the well-known minimum wage study of Card and Krueger (1994), Rosenbaum (2002) found similar cut-off points (between 1.34 and 1.5).
Finally, we apply the sensitivity approach of IMN. In particular, we implemented four separate sensitivity analyses according to the different signs of the associations of U with D and Y. The results are shown in Tables 4, 5, 6 and 7. In each table we present, for different values of the parameters q and s defined in Sect. 2.2.2, the estimated ATT, its standard error, and the values of the parameters Γ and Λ, which measure, respectively, the effect of the simulated U on the outcome and on the treatment, controlling for the observed confounders. As explained in Sect. 2, the simulation exercise consists of varying the sign and strength of the associations
123
Estimating the causal effect of fertility on economic wellbeing
375
Table 4 Sensitivity analysis to confounders such that < 1 and < 1
q = −0.1
q = −0.2
q = −0.3
q = −0.4
q = −0.5
s = −0.1
s = −0.2
s = −0.3
s = −0.4
−442 (142)
−452 (157)
−457 (172)
−464 (193)
s = −0.5
−460 (217)
= 0.73
= 0.71
= 0.67
= 0.66
= 0.62
= 0.48
= 0.29
= 0.17
= 0.09
= 0.04
−446 (139)
−456 (150)
−463 (168)
−466 (179)
−471 (207)
= 0.42
= 0.39
= 0.38
= 0.35
= 0.31
= 0.56
= 0.34
= 0.20
= 0.11
= 0.04
−451 (135)
−446 (146)
−461 (161)
−472 (173)
−477 (196)
= 0.25
= 0.22
= 0.20
= 0.18
= 0.14
= 0.64
= 0.40
= 0.23
= 0.12
= 0.05
−446 (136)
−441 (142)
−470 (156)
−469 (169)
−480 (188)
= 0.12
= 0.12
= 0.10
= 0.07
= 0.05
= 0.74
= 0.45
= 0.26
= 0.13
= 0.04
−434 (135)
−446 (140)
−471 (152)
−468 (161)
***
= 0.07
= 0.05
= 0.04
= 0.03
***
= 0.86
= 0.52
= 0.29
= 0.15
***
Note *** Combination resulting in inadmissible values of the parameters characterising the distribution of
U
between U and Y0 and between U and D, by choosing, respectively, different values
of the parameters q and s.19 Simulated values of U are calculated and the ATT is
estimated using the simulated covariate as if it was an additional observed confounder
to be included in the set of matching variables. The procedure is repeated 1,000 times.
The final ATT estimate for a given combination of the values of q and s is the average of the estimates obtained across all the replications. As already noted in Sect. 2,
the strength of the associations between the confounder, treatment and outcome cannot be specified precisely prior to the simulation. However, they can be assessed by
considering the average odd ratios and .
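The first step of the IMN procedure, generating the simulated confounder and reading off its average outcome and selection odds ratios, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code (Nannicini 2007 describes a Stata implementation): U is drawn directly from the four probabilities P(U = 1 | D = i, Y* = j), because the mapping from q and s to these probabilities is not reproduced here; the odds ratios come from 2×2 tables rather than from logit models that also include the observed covariates; and the re-estimation of the ATT by matching with U added to the matching set is omitted. All function names and data are hypothetical.

```python
import random
import statistics


def median_split(growth):
    """Y* = 1 if consumption growth is above the median, 0 otherwise."""
    m = statistics.median(growth)
    return [1 if g > m else 0 for g in growth]


def draw_confounder(d, y_star, p, rng):
    """Draw binary U with P(U = 1 | D = i, Y* = j) = p[(i, j)]."""
    return [1 if rng.random() < p[(di, yi)] else 0 for di, yi in zip(d, y_star)]


def odds_ratio(x, y):
    """Odds ratio between two binary lists from a 2x2 table (assumes no empty cell)."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return (n11 * n00) / (n10 * n01)


def average_effects(d, y_star, p, reps=100, seed=0):
    """Average outcome effect (U vs Y* among untreated units, i.e. U vs Y0)
    and selection effect (U vs D) over repeated draws of U."""
    rng = random.Random(seed)
    controls = [i for i, di in enumerate(d) if di == 0]
    out, sel = [], []
    for _ in range(reps):
        u = draw_confounder(d, y_star, p, rng)
        out.append(odds_ratio([u[i] for i in controls], [y_star[i] for i in controls]))
        sel.append(odds_ratio(u, d))
    return statistics.mean(out), statistics.mean(sel)


# Hypothetical data: treatment D and continuous consumption growth.
rng = random.Random(1)
n = 4000
d = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
y_star = median_split([rng.gauss(0, 1) for _ in range(n)])
strong = {(0, 1): 0.7, (0, 0): 0.3, (1, 1): 0.5, (1, 0): 0.5}
print(average_effects(d, y_star, strong, reps=20))
```

In the full procedure each draw of U would be added to the matching variables and the ATT re-estimated with the continuous outcome, as described in footnote 19.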
In the first simulation scenario (Table 4), we assess the sensitivity of the estimated ATT when both the outcome and the selection effects of the simulated unobserved confounder are negative. Thus, we impose negative values of the parameters q and s, which generate outcome and selection odds ratios smaller than 1. The estimated ATTs are always significant, even when both odds ratios are very low. The largest difference between the ATT and the baseline estimate (ATT = −356; Table 2) is obtained when q and s are set to −0.4 and −0.5, respectively. However, in this case the outcome and selection effects are very strong (outcome odds ratio 0.05; selection odds ratio 0.04). To assess the plausibility of the effect of the simulated confounder U on Y0 and D, as measured by these average odds ratios, we compare them with the odds ratios obtained by estimating two separate logit models, taking as outcomes the binary transformation of the original outcome, Y*, and the treatment indicator, D. The estimated odds ratios range from 0.61 to 4.99 in the first model and from 0.70 to 5.65 in the second.²⁰ Therefore, the values 0.05 and 0.04 are unrealistic, since they are very different from the observed effects.

19 As mentioned in Sect. 2, simulating the values of U requires a binary transformation of our continuous outcome, consumption growth. This dichotomous variable Y* takes value 1 if the original outcome (equivalised consumption growth) is higher than the median growth and 0 otherwise. Y* is used to simulate the values of U, depending on the values of the parameters q and s that link the unobserved confounder to Y* and D. Once U has been generated, it is included in the matching set together with the observed covariates and the ATT is estimated. In the estimation of the ATT, however, the original continuous outcome, not its dichotomous transformation Y*, is used.

Table 5 Sensitivity analysis to confounders such that the outcome odds ratio > 1 and the selection odds ratio < 1
Each cell: ATT (standard error) [average outcome odds ratio; average selection odds ratio]

| | q = +0.1 | q = +0.2 | q = +0.3 | q = +0.4 | q = +0.5 |
|---|---|---|---|---|---|
| s = −0.1 | −445 (150) [2.03; 0.36] | −444 (154) [3.33; 0.31] | −443 (163) [5.99; 0.26] | −436 (172) [11.57; 0.21] | −420 (184) [36.02; 0.17] |
| s = −0.2 | −448 (168) [1.93; 0.22] | −445 (177) [3.27; 0.18] | −435 (186) [6.33; 0.15] | −417 (197) [13.01; 0.12] | −398 (219) [30.67; 0.09] |
| s = −0.3 | −449 (195) [2.08; 0.13] | −444 (202) [3.77; 0.11] | −437 (216) [6.53; 0.09] | −417 (238) [13.29; 0.07] | −369 (271) [51.94; 0.05] |
| s = −0.4 | −446 (217) [2.17; 0.07] | −433 (238) [3.79; 0.06] | −415 (263) [7.58; 0.04] | −383 (308) [18.17; 0.03] | −338 (388) [117.46; 0.02] |
| s = −0.5 | −435 (251) [2.14; 0.03] | −417 (290) [4.28; 0.02] | −402 (324) [9.84; 0.02] | −359 (435) [31.28; 0.01] | −238 (639) [2128.61; 0.01] |

Table 6 Sensitivity analysis to confounders such that the outcome odds ratio < 1 and the selection odds ratio > 1
Each cell: ATT (standard error) [average outcome odds ratio; average selection odds ratio]

| | q = −0.1 | q = −0.2 | q = −0.3 | q = −0.4 | q = −0.5 |
|---|---|---|---|---|---|
| s = +0.1 | −431 (135) [0.73; 1.19] | −445 (135) [0.45; 1.39] | −441 (133) [0.25; 1.62] | −432 (136) [0.14; 1.90] | −429 (142) [0.08; 2.27] |
| s = +0.2 | −440 (137) [0.73; 1.85] | −432 (140) [0.43; 2.16] | −436 (147) [0.26; 2.56] | −435 (149) [0.15; 3.05] | −430 (163) [0.08; 3.74] |
| s = +0.3 | −445 (149) [0.76; 2.94] | −448 (156) [0.43; 3.43] | −441 (166) [0.25; 4.10] | −431 (172) [0.14; 4.99] | −422 (189) [0.07; 6.39] |
| s = +0.4 | −448 (171) [0.74; 4.79] | −440 (175) [0.43; 5.69] | −437 (189) [0.24; 6.98] | −427 (201) [0.13; 8.73] | −402 (222) [0.06; 11.47] |
| s = +0.5 | −450 (192) [0.69; 8.43] | −436 (200) [0.38; 10.03] | −441 (224) [0.22; 12.47] | −415 (247) [0.11; 16.41] | −389 (293) [0.05; 23.49] |

Table 7 Sensitivity analysis to confounders such that the outcome odds ratio > 1 and the selection odds ratio > 1
Each cell: ATT (standard error) [average outcome odds ratio; average selection odds ratio]

| | q = +0.1 | q = +0.2 | q = +0.3 | q = +0.4 | q = +0.5 |
|---|---|---|---|---|---|
| s = +0.1 | −428 (136) [2.12; 1.18] | −442 (134) [3.29; 1.16] | −441 (135) [6.32; 1.05] | −441 (136) [13.09; 1.05] | −433 (142) [40.72; 1.06] |
| s = +0.2 | −451 (138) [2.12; 1.38] | −434 (137) [3.85; 1.18] | −416 (133) [7.97; 1.03] | −424 (134) [15.94; 1.08] | −437 (134) [83.74; 1.05] |
| s = +0.3 | −441 (143) [2.16; 2.19] | −442 (136) [4.17; 1.88] | −447 (135) [7.78; 1.65] | −452 (135) [40.14; 1.41] | −435 (137) [232.02; 1.21] |
| s = +0.4 | −460 (155) [2.18; 3.54] | −456 (150) [4.49; 3.08] | −447 (148) [8.07; 2.67] | −449 (145) [24.41; 2.34] | −450 (142) [143.41; 2.04] |
| s = +0.5 | −459 (173) [2.40; 6.08] | −465 (169) [5.28; 5.33] | −461 (163) [10.59; 4.65] | −465 (155) [49.47; 4.14] | −465 (152) [172.65; 3.64] |
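The benchmark comparison described in footnote 20 can be made concrete. A logit coefficient β implies an odds ratio of exp(β) for a dummy covariate (values 0 and 1 compared) and exp(0.5·σ·β) for a continuous covariate with standard deviation σ (two units half a standard deviation apart compared). A minimal sketch, with hypothetical function names and illustrative coefficients (the paper's logit estimates are not reported), that computes such benchmarks and checks whether a simulated odds ratio falls inside the observed range quoted in the text:

```python
import math


def benchmark_odds_ratio(beta, sd=None):
    """Odds ratio implied by a logit coefficient.

    Dummy covariate (sd is None): compare values 0 and 1, giving exp(beta).
    Continuous covariate: compare two units 0.5 standard deviations apart.
    """
    if sd is None:
        return math.exp(beta)
    return math.exp(beta * 0.5 * sd)


def within_observed_range(simulated, observed):
    """True if a simulated odds ratio lies inside the range of observed ones."""
    return min(observed) <= simulated <= max(observed)


# Observed ranges quoted in the text (outcome model, treatment model).
OUTCOME_RANGE = (0.61, 4.99)
TREATMENT_RANGE = (0.70, 5.65)

# Table 4, q = -0.4 and s = -0.5: outcome 0.05, selection 0.04.
print(within_observed_range(0.05, OUTCOME_RANGE),
      within_observed_range(0.04, TREATMENT_RANGE))
```

Both checks fail for the extreme Table 4 combination, which is the sense in which those odds ratios are called unrealistic.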
The practical relevance of the four scenarios in Tables 4, 5, 6 and 7 depends on the assumption the researcher makes about how U relates to the treatment and the outcome. On the basis of the discussion in Sect. 3, an interesting scenario in our application is the one in Table 5, where the unobserved confounder is assumed to have a positive effect on the outcome (q > 0; outcome odds ratio > 1) and a negative effect on the treatment (s < 0; selection odds ratio < 1). This confounder mimics the potential effect of unobserved ability on consumption expenditure and fertility: unobserved ability is likely to be positively correlated with the labour supply and income of the spouse (and hence with consumption expenditure) and negatively correlated with childbearing. As Table 5 shows, the estimated effect is again almost always significant and not much different from the baseline result. Only if the associations of U with D and/or Y are unreasonably strong does the ATT become insignificant. For example, the weakest effect (ATT = −238) corresponds to q = +0.5 and s = −0.5. In this case both the outcome and the selection effects are unrealistically strong (outcome odds ratio 2128.61; selection odds ratio 0.01).²¹ These effects are unrealistic in the sense that the effect of ability, after controlling for important confounders including the observed education of family members, cannot be expected to be as strong as these odds ratios imply when compared with the effects of observed key covariates (ranging, as noted above, from 0.61 to 5.65).
The results for the other two scenarios (Tables 6, 7) are qualitatively similar to those in the first two tables. Hence the conclusion from the IMN-type sensitivity analysis is in line with that of the Rosenbaum-type approach: the ATT estimated through PSM is rather robust to the presence of potentially omitted variables.

20 For dummy variables we calculated the odds ratios comparing units with values 0 and 1, while for continuous covariates we compared two units differing by 0.5 standard deviations. The estimates from these models, not shown here, are available from the authors upon request.
21 Again, the plausibility of the values of the odds ratios is gauged with reference to the observed odds ratios reported in the text (ranging from 0.61 to 4.99 for the outcome model, and from 0.70 to 5.65 for the treatment model).
4.2 Instrumental variable results
While the previous sensitivity analysis is a useful way to test the robustness of the PSM estimates, the availability of instruments is key to avoiding the UNC assumption. Although instruments are not always easy to come by, we propose two for our setting. The first is a variable that takes value 1 if the household has no male children in the first wave and 0 otherwise. As already mentioned, this kind of instrument is widely used (see, e.g. Angrist and Evans 1998; Chun and Oh 2002; Gupta and Dubey 2003). The argument is that couples have gender preferences for their children; in particular, they tend to prefer having at least one son. In other words, couples are more likely to have another child if the previous ones were girls. Insofar as couples prefer boys, such a variable works well as an instrument, since it is expected to affect fertility but to have no direct effect on poverty. Hence, the exclusion restriction seems reasonable. The strong preference for sons in Vietnam is confirmed by many studies (Haughton and Haughton 1995; Johansson 1996, 1998; Belanger 2002). To better highlight the preference for sons, we selected only households that had at least two children in the first wave. Among these households, the percentage having an additional child between the two waves is 28% if they had at least one son, but as high as 48% if they did not (difference significant at the 1% level). Among households with at least three children, the percentages are 17 and 44%, respectively (difference significant at the 1% level). These simple analyses confirm the strong preference for sons, implying strong relevance of the instrument.
The monotonicity assumption, implying the absence of defiers, seems plausible in this application. Defiers would in this case be households that would have no further children if they already had only daughters, but would have more children if they instead had sons. In other words, defiers are households with a strong preference for daughters, which is unlikely (though of course possible) in Vietnam. In any case, the proportion of such households is small²² (see, e.g. Haughton and Haughton 1995) and should not present a serious problem for the IV estimator.²³ Another characteristic of the instrument is that it is as good as randomised, since households cannot choose the sex of their children,²⁴ meaning that in this case controlling for covariates is not necessary.

Table 8 Local average treatment effect estimates through instrumental variables

| Instrumental variable | Son preference, Wald estimator | Son preference, Frölich estimator | Contraception availability in the community, Wald estimator | Contraception availability in the community, Frölich estimator |
|---|---|---|---|---|
| LATE (standard errors in parentheses) | −429 (228) | −478 (672) | −8822 (8597) | −821 (1844) |
| Proportion of compliers | 0.28 | 0.17 | 0.09 | 0.01 |

Note Point estimates and standard errors for Frölich's method were obtained by bootstrapping over 1,000 iterations. As for the matching method, consistently with the choice made for propensity score matching, we used nearest neighbour matching with replacement.

22 We found that the percentage of households with two children in wave 1 having an additional child is 30% if they had no daughters, 27% if they had one daughter and 48% if they had two daughters.
23 In fact, AIR show that the sensitivity of IV results to the monotonicity assumption is limited if the instrument is strong, meaning that the association between the instrument and the treatment is high and the proportion of defiers is small.
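The point made in footnote 23 can be illustrated with the decomposition of the Wald estimand in AIR. If a share π_c of households are compliers with average effect LATE_c and a share π_d are defiers with average effect LATE_d, the Wald ratio equals (π_c·LATE_c − π_d·LATE_d)/(π_c − π_d), so it deviates from the complier effect by π_d·(LATE_c − LATE_d)/(π_c − π_d). A minimal sketch with hypothetical function names and values, showing that the deviation stays modest when compliers dominate and explodes when the first stage π_c − π_d is small:

```python
def wald_estimand(pi_c, pi_d, late_c, late_d):
    """Wald ratio when compliers (share pi_c) and defiers (share pi_d) coexist."""
    if pi_c <= pi_d:
        raise ValueError("this sketch assumes more compliers than defiers")
    return (pi_c * late_c - pi_d * late_d) / (pi_c - pi_d)


def defier_bias(pi_c, pi_d, late_c, late_d):
    """Deviation of the Wald estimand from the complier effect late_c."""
    return pi_d * (late_c - late_d) / (pi_c - pi_d)


# Hypothetical: complier effect -400, 1% defiers who are unaffected.
for pi_c in (0.20, 0.05, 0.02):
    print(pi_c,
          wald_estimand(pi_c, 0.01, -400.0, 0.0),
          defier_bias(pi_c, 0.01, -400.0, 0.0))
```

With the same 1% of defiers, the bias is about 5% of the complier effect when π_c = 0.20 but as large as the effect itself when π_c = 0.02, which is why a weak instrument makes the LATE fragile.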
The second instrument is a variable taking the value 1 if no contraceptive methods (mainly intrauterine devices (IUD) and condoms) were available in the community where the household resides, and 0 otherwise. The IUD and the condom are the most widely available contraceptive methods in Vietnam, and the IUD is the most widely used (Anh and Thang 2002). Instruments based on geographical variation in the availability of services are often used (see, e.g. McClellan et al. 1994; Card 1995). Such variables may work well if households living in communities with no contraceptive facilities have higher risks of childbearing and if contraceptive availability in the community has no direct effect on consumption growth. However, the unavailability of contraceptives, as a community characteristic, may be associated with the economic wellbeing of the households living in the community. In other words, the instrument cannot be thought of as randomised, and controlling for covariates is consequently important.
Monotonicity seems plausible in this case too. Here, it implies that households who would have (at least) one child between the waves if they lived in a community with available contraception would also have one if they lived in a community without contraception.
The fact that the first instrument can be assumed to be randomised means that we can apply the Wald estimator without covariates. The results in Table 8 indicate a strong negative effect of having an additional child on consumption expenditure. The Frölich estimator gives similar results, confirming that controlling for covariates does not make a large difference.²⁵ The Frölich estimates have been obtained as the ratio of two matching estimators. As for the matching method, consistently with the choice we made for the PSM, we used nearest neighbour matching with replacement.

24 This is not completely true in countries where sex-selective abortion is common practice. This is the case, for example, in India, where amniocentesis is available and used for sex-selective abortions (Gupta and Dubey 2003).
25 Point estimates and standard errors were obtained by bootstrapping over 1,000 iterations. For the Frölich method, too, the choice of the matching algorithm had little impact on the point estimates and standard errors.
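The idea of computing the Frölich estimate as the ratio of two matching estimators can be sketched as follows: the average effect of the instrument Z on the outcome Y is divided by its average effect on the treatment D, each averaged over the whole sample by imputing every household's missing counterpart with its nearest neighbour (with replacement) in the other instrument group. This is a minimal illustration under stated assumptions, not the authors' code: matching is done on a single scalar `x` (in practice this could be an estimated score P(Z = 1 | X)), the bootstrap used for the standard errors in Table 8 is omitted, and the function names and toy data are hypothetical. The data are built so that the instrument is more common at high `x`, where outcomes are also higher.

```python
def _nearest_value(x0, xs, vals):
    """Value of the unit in (xs, vals) whose x is closest to x0 (first on ties)."""
    j = min(range(len(xs)), key=lambda k: abs(xs[k] - x0))
    return vals[j]


def matched_instrument_effect(x, z, v):
    """Sample average of v(Z=1) - v(Z=0), imputing the missing side by matching on x."""
    idx1 = [i for i, zi in enumerate(z) if zi == 1]
    idx0 = [i for i, zi in enumerate(z) if zi == 0]
    x1, v1 = [x[i] for i in idx1], [v[i] for i in idx1]
    x0, v0 = [x[i] for i in idx0], [v[i] for i in idx0]
    total = 0.0
    for xi, zi, vi in zip(x, z, v):
        if zi == 1:
            total += vi - _nearest_value(xi, x0, v0)
        else:
            total += _nearest_value(xi, x1, v1) - vi
    return total / len(x)


def frolich_late(x, z, d, y):
    """LATE as the ratio of the matched effects of Z on Y and of Z on D."""
    return matched_instrument_effect(x, z, y) / matched_instrument_effect(x, z, d)


def wald(z, d, y):
    """Unadjusted Wald estimator, for comparison."""
    def group_mean(vals, g):
        sel = [v for v, zi in zip(vals, z) if zi == g]
        return sum(sel) / len(sel)
    return (group_mean(y, 1) - group_mean(y, 0)) / (group_mean(d, 1) - group_mean(d, 0))


x = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
z = [1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
d = [1, 1, 0, 0, 1, 1, 1, 1, 1, 1]
y = [10, 10, 20, 20, 50, 50, 50, 50, 50, 50]
print(frolich_late(x, z, d, y), wald(z, d, y))
```

In this toy example the matched ratio recovers the within-stratum effect of −10, while the unadjusted Wald ratio is positive because Z is confounded by `x`, mirroring on a small scale why covariates matter for the second instrument.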
Since we selected households with at least two children, these estimates strictly speaking refer only to this sub-population. Moreover, and more importantly, since IV estimates the LATE, these results refer only to the latent sub-population of compliers, namely those who choose to have another child because they do not already have a male child. The extent to which the IV estimates can be compared with the PSM ones depends on the nature of the instrument. If we can assume that the average causal effect for 'current' non-compliers (always-takers and never-takers) equals the average causal effect for 'current' compliers (the LATE), then LATE and ATE coincide. This hypothesis is a priori strong. However, when the sex ratio of the children is used as the instrument, one may expect LATE and ATE to be quite similar, for two reasons. First, the estimated proportion of compliers is about 0.2, which is quite high compared to many other studies (see, e.g. Angrist and Evans 1998). This proportion equals the estimated causal effect of the instrument on the treatment variable D, so there is evidence that the instrument is not weak.²⁶ Second, since son preference is a nationwide phenomenon in Vietnam, the effect estimated on the sub-group of compliers could be applied to the whole population. These considerations are supported by the fact that the LATE from the IV estimation (−429) is in this case almost equal to the ATE from the PSM estimation (−414). In this sense, the IV estimate can be considered a robustness check of the PSM estimates.
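Under monotonicity, the proportion of compliers is identified as the effect of the instrument on the treatment, P(D = 1 | Z = 1) − P(D = 1 | Z = 0), and the remaining households split into always-takers and never-takers. As a rough illustration only, plugging in the raw percentages reported above for households with at least two children (48% with no son, 28% with at least one son) gives a complier share of 0.20, in line with the "about 0.2" quoted in the text. The function name is hypothetical, and these unadjusted shares are not the Table 8 estimates.

```python
def principal_strata_shares(p_d_z1, p_d_z0):
    """Complier, always-taker and never-taker shares implied by the first stage.

    p_d_z1: P(D = 1 | Z = 1), share having another child when there is no son.
    p_d_z0: P(D = 1 | Z = 0), share having another child when there is a son.
    Assumes monotonicity (no defiers).
    """
    compliers = p_d_z1 - p_d_z0
    if compliers < 0:
        raise ValueError("a negative first stage contradicts monotonicity as coded")
    return {
        "compliers": compliers,
        "always_takers": p_d_z0,       # have another child whatever Z is
        "never_takers": 1.0 - p_d_z1,  # no further child whatever Z is
    }


# Raw percentages from the text (households with at least two children).
print(principal_strata_shares(0.48, 0.28))
```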
Using community-level availability of contraception as an instrument gives a very different picture. Controlling for covariates is essential here, as there are huge differences between the Wald and the Frölich estimates. The Frölich estimate nevertheless shows a strong negative effect for the sub-population of households that are 'encouraged' to have a child by the lack of contraception in the community. This LATE is not comparable with the LATE estimated with the first instrument, since the two sub-populations of compliers are very different. The estimated proportion of compliers for the second instrument is 0.01; hence the sub-population of households that 'react' to this instrument is small, and the instrument is weak.²⁷ One could argue that, since contraception is quite widespread in Vietnam, compliers are in this case a rather selected group, perhaps made up of less educated and generally marginalised households. This could also help explain the much stronger effect. As pointed out by AIR, care is needed in interpreting this result, since IV estimators are more sensitive to violations of the exclusion restriction and monotonicity when the proportion of compliers is low. We also note that in both cases the IV estimates are quite imprecise, with high standard errors, especially for the second instrument.

26 This is also confirmed by the standard F test for relevance, which gives a value of 45.10, whereas a value of 10 is normally taken as the threshold for whether the association between the instruments and the endogenous covariate is sufficient (Staiger and Stock 1997).
27 For this instrument the F test gives a value of 10.89, just above the threshold of 10. In the standard 2SLS framework, when instruments are weak, results are sensitive in the sense that the bias of the IV estimator remains substantial even in large samples. Similarly, in the non-parametric LATE approach, weak instruments make results sensitive to violations of the identifying assumptions (i.e. monotonicity and the exclusion restriction). Interestingly, the weakness of an instrument can be interpreted as the existence of a small proportion of compliers. We also performed the Sargan test of overidentifying restrictions with both instruments included and did not reject the null hypothesis of valid instruments (p value = 0.4537).
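The relevance test in footnotes 26 and 27 can be illustrated for the simplest case of a single binary instrument and no covariates. There, the first-stage F statistic equals the squared t statistic of the slope in an OLS regression of D on a constant and Z: the slope is the difference in treatment rates between the two instrument groups, and its conventional variance is σ²(1/n₁ + 1/n₀), with σ² estimated from the within-group residuals. The function name and data below are hypothetical; the paper's values (45.10 and 10.89) come from its own first-stage specification, which may include covariates.

```python
def first_stage_f(z, d):
    """First-stage F for one binary instrument: OLS of D on a constant and Z."""
    n = len(z)
    g1 = [di for zi, di in zip(z, d) if zi == 1]
    g0 = [di for zi, di in zip(z, d) if zi == 0]
    n1, n0 = len(g1), len(g0)
    m1, m0 = sum(g1) / n1, sum(g0) / n0  # treatment rates by instrument value
    # residual sum of squares: deviations from each group's fitted value
    ssr = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    sigma2 = ssr / (n - 2)
    var_slope = sigma2 * (1.0 / n1 + 1.0 / n0)
    return (m1 - m0) ** 2 / var_slope  # equals the squared t statistic


# Hypothetical sample: 60% vs 20% having another child.
z = [1] * 50 + [0] * 50
d = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
f = first_stage_f(z, d)
print(f, f > 10)  # rule-of-thumb threshold (Staiger and Stock 1997)
```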
Whereas the previous considerations would favour the sex ratio over community-level availability of contraception as an instrument, it is important to note that the latter has more direct policy relevance. In fact, the availability of contraception at community level is a variable on which policy makers can act directly. The Frölich estimates indicate that the expected causal effect on poverty of a fertility reduction induced by raising contraception availability in the communities is quite high. However, the sub-population reacting to this policy (the compliers) is rather small. Policy makers must obviously consider both aspects in order to calibrate efficient policies. In our case, the policy could have a substantial effect for a rather small group.
The estimates resulting from the first instrument (son preference), as well as those obtained using regression and PSM, indicate that fertility affects economic wellbeing considerably, though the policy implications are less obvious. If policy makers are concerned that households with many children are poorer, then targeted tax reductions and other benefits could help, reducing the existing gap between households with different numbers of children. Such a policy clearly does not act on poverty through a fertility reduction and might instead encourage higher fertility. However, other analyses (not shown here) do not suggest any strong effects going from economic wellbeing onto fertility levels (Arpino and Aassve 2007).
5 Conclusions
Several approaches are available to estimate causal effects. The appropriateness and interpretation of the different strategies depend on the application at hand and, importantly, on the available data. Whereas the statistical properties of the various estimators are well established, there appears to be considerable confusion about these issues in empirical policy analysis. We consider an application where interest lies in estimating the effect of fertility on households' economic wellbeing, a topic which has received considerable interest in development studies but where issues of causality have not been considered in much detail.
In sum, if our interest lies in the ATE and/or ATT, one must consider the plausibility of the UNC, which depends heavily on the available data. If unobserved covariates are relevant, the first option is to maintain the UNC and implement a sensitivity analysis. Relying on IV methods is, by contrast, more delicate: without imposing strong assumptions we can identify only the LATE, which is the effect on the sub-population of compliers. If treatment effects are heterogeneous, the LATE generally differs from the ATE and ATT, which are usually the parameters of interest.
The LATE, on the other hand, answers our research question if we are interested in the average causal effect of a change in a treatment variable caused by an intervention on a third variable. In this case, an IV method can be used whether or not the UNC is plausible. Apart from this case, using IV methods in situations where the UNC is reliable of course makes little sense.
The methods are discussed in light of an application where we consider the effect of fertility on changes in consumption expenditure. Childbearing events cannot be considered an exogenous measure of fertility when the outcome relates to economic wellbeing, in our case measured in terms of consumption expenditure. The methodological discussion is, however, general and applies to many other applications.
Using methods based on the UNC assumption, such as regression and PSM, we find that households having children between the recorded waves have considerably worse outcomes in terms of consumption expenditure. We then demonstrate how one can assess the potential effect of omitting relevant but unobserved variables without actually implementing an IV approach. This is a very useful tool, since valid and relevant instruments are often hard to come by. In our application the estimates are robust to unobserved omitted variables: the estimated effect becomes non-significant only if the association between the omitted covariate, selection and the outcome is extremely (and unreasonably) large.
Despite this robustness, we nevertheless implement the IV method using two different instruments, demonstrating two situations in which IV analysis can be useful.
The use of IV methods in our application illustrates that reasonable instruments can lead to estimates that differ from those of methods based on the UNC, and also from one another. In fact, the compliers for one instrument can differ from the compliers for another, and consequently, if the treatment effect is heterogeneous, the two estimated LATEs will generally differ. With the first instrument, which exploits the strong preference for sons in Vietnam, we estimated a negative impact of fertility on poverty of a magnitude not dramatically different from that obtained by methods based on the UNC. This result reflects the fact that son preference is rather common in Vietnam and thus does not involve any particular kind of household. The estimated proportion of compliers in this case is rather high, at 17%, in contrast with the second instrument, availability of contraceptives in the community, for which the proportion of compliers is as low as 1%. This small sub-population of households reacting to the availability of contraceptives is a highly selected group. Clearly, their opportunity to control fertility through contraceptive practices is much reduced, as it is unlikely that compliers are able to obtain contraceptives elsewhere. In this sense these households have a higher exposure to childbearing.
Whereas the estimates based on this instrument are very different from those based on sex preference, an advantage is that it has direct policy relevance, simply because the instrument itself is a policy variable. The effect on this sub-population is large and, importantly, much larger than the estimates for the whole population (ATE) and for the sub-population of the treated (ATT). However, the size of this sub-population is rather small, which is an equally important consideration for the policy maker.
From a methodological point of view, a key difference between the two instruments is that the second cannot be considered randomised, implying the need to control for covariates. Using the approach suggested by Frölich (2007), which avoids many of the typical strong assumptions, we demonstrate that the inclusion of covariates does matter for the parameter estimates.
Acknowledgements This article forms part of the project “Poverty dynamics and fertility in developing
countries”, funded by the Economic and Social Research Council (award no. RES000230462). We are
grateful for comments by Fabrizia Mealli, Stefano Mazzuco, Letizia Mencarini, Stephen Pudney, the editor
and two anonymous referees. All errors and inconsistencies in the article are our own.
References
Aakvik A (2001) Bounding a matching estimator: the case of a Norwegian training program. Oxf Bull Econ
Stat 63(1):115–143
Aassve A, Betti G, Mazzuco S, Mencarini L (2007) Marital disruption and economic well-being: a comparative analysis. J R Stat Soc Ser A 170(3):781–799
Abadie A, Imbens GW (2002) Simple and bias-corrected matching estimators for average treatment effects.
NBER working paper T0283
Abadie A, Imbens GW (2004) On the failure of the bootstrap for matching estimators. NBER working
paper T0325
Abadie A, Drukker D, Leber Herr J, Imbens GW (2004) Implementing matching estimators for average treatment effects in Stata. Stata J 4(3):290–311
Admassie A (2002) Explaining the high incidence of child labour in sub-Saharan Africa. Afr Dev Rev
14(2):251–275
Angrist JD, Evans WN (1998) Children and their parents’ labor supply: evidence from exogenous variation
in family size. Am Econ Rev 88(3):450–477
Angrist JD, Krueger A (1991) Does compulsory school attendance affect schooling and earnings? Quart J
Econ 106:979–1014
Angrist JD, Imbens GW, Rubin DB (1996) Identification of causal effects using instrumental variables. J
Am Stat Assoc 91:444–472
Anh DN, Thang NM (2002) Accessibility and use of contraceptives in Vietnam. Int Fam Plann Perspect
28(4):214–219
Arpino B, Aassve A (2007) Estimation of causal effects of fertility on economic wellbeing: evidence from
rural Vietnam. ISER working paper 2007–24. University of Essex, Colchester
Athey S, Imbens GW (2006) Identification and inference in nonlinear difference-in-differences models.
Econometrica 74(2):431–497
Becker GS, Lewis HG (1973) On the interaction between the quantity and quality of children. J Polit Econ 81(2):S279–S288
Becker SO, Ichino A (2002) Estimation of average treatment effects based on propensity scores. Stata J 2:358–377
Belanger D (2002) Son preference in a rural village in North Vietnam. Stud Fam Plann 33(4):321–334
Blundell R, Dearden L, Sianesi B (2005) Evaluating the impact of education on earnings in the UK: models,
methods and results from the NCDS. J R Stat Soc Ser A 168(3):473–512
Bryson A, Dorsett R, Purdon S (2002) The use of propensity score matching in the evaluation of labour
market policies. Working paper no. 4, Department for Work and Pensions
Caliendo M, Kopeinig S (2005) Some practical guidance for the implementation of propensity score matching. IZA discussion paper no. 1588
Card D (1995) Using geographic variation in college proximity to estimate the return to schooling. In:
Christofides L, Grant E, Swidinsky R (eds) Aspects of labor market behaviour: essays in honour of
John Vanderkamp. University of Toronto Press, Toronto pp 201–222
Card D, Krueger AB (1994) Minimum wages and employment: a case study of the fast-food industry in New Jersey and Pennsylvania. Am Econ Rev 84:772–793
Chun H, Oh J (2002) An instrumental variable estimate of the effect of fertility on the labour force participation of married women. Appl Econ Lett 9:631–634
Costa Dias M, Ichimura H, van den Berg GJ (2008) The matching method for treatment evaluation with
selective participation and ineligibles. IZA discussion paper no. 3280
Coudouel A, Hentschel J, Wodon Q (2002) Poverty measurement and analysis, poverty reduction strategy paper sourcebook. World Bank, Washington D.C.
… regressors. J Econom 124:335–361
Dawid AP (1979) Conditional independence in statistical theory. J R Stat Soc Ser B 41:1–31
Deaton A, Zaidi S (2002) Guidelines for constructing consumption aggregates for welfare analysis. Living
standards measurement study working paper no. 135, The World Bank
Dehejia R, Wahba S (1999) Causal effects in non-experimental studies: re-evaluating the evaluation of
training programs. J Am Stat Assoc 94(448):1053–1062
Duy LV, Haughton D, Haughton J, Kiem DA, Ky LD (2001) Fertility decline. In: Haughton D, Haughton J,
Phong N (eds) Living standards during an economic boom. Vietnam 1993–1998. Statistical Publishing
House, Hanoi
Easterlin RA, Crimmins EM (1985) The fertility revolution. University of Chicago Press, Chicago
Ermisch J (1989) Purchased child care, optimal family size and mother’s employment: theory and econometric analysis. J Popul Econ 2:79–102
Frölich M (2007) Nonparametric IV estimation of local average treatment effects with covariates. J Econom 139:35–75
Glewwe P, Gragnolati M, Zaman H (2002) Who gained from Vietnam’s boom in the 1990s? Econ Dev Cult
Change 50(4):773–792
Goodman A, Sianesi B (2005) Early education and children’s outcomes: how long the impacts last. Fisc
Stud 26(4)
Gupta ND, Dubey A (2003) Poverty and fertility: an instrumental variables analysis on Indian micro data.
Working paper 03–11, Aarhus School of Business
Härdle W, Linton O (1994) Applied nonparametric methods. In: Engle RF, McFadden D (eds) Handbook of econometrics, vol 4. North Holland, Amsterdam pp 391–448
Haughton J, Haughton D (1995) Son preference in Vietnam. Stud Fam Plann 26(6):325–337
Heckman JJ (1997) Instrumental variables: a study of implicit behavioural assumptions used in making
program evaluations. J Hum Resour 32:441–462
Heckman JJ, Ichimura H, Todd P (1997) Matching as an econometric evaluation estimator. Rev Econ Stud
65:261–294
Ichino A, Mealli F, Nannicini T (2008) From temporary help jobs to permanent employment: what can we
learn from matching estimators and their sensitivity? J Appl Econom 23(3):305–327
Imbens GW (2003) Sensitivity to exogeneity assumptions in program evaluation. AEA Pap Proc 93(2):126–
132
Imbens GW (2004) Nonparametric estimation of average treatment effects under exogeneity: a review. Rev Econ Stat 86:4–30
Imbens GW, Angrist JD (1994) Identification and estimation of local average treatment effects. Econometrica 62:467–475
Johansson A (1996) Family planning in Vietnam—women’s experiences and dilemma: a community study
from the Red River Delta. J Psychosom Obstet Gynecol 17:59–67
Johansson A (1998) Population policy, son preference and the use of IUDs in North Vietnam. Reprod Health
Matters 6:66–76
Manski CF (1990) Nonparametric bounds on treatment effects. Am Econ Rev Pap Proc 80:319–323
McClellan M, McNeil BJ, Newhouse JP (1994) Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? J Am Med Assoc 272(11):859–866
Moav O (2005) Cheap children and the persistence of poverty. Econ J 115(11):88–110
Molini V (2006) Food security in Vietnam during the 1990s—the empirical evidence. Research paper no. 2006/67, United Nations University
Nannicini T (2007) Simulation-based sensitivity analysis for matching estimators. Stata J 7(3):334–350
Neyman J [1990(1923)] On the application of probability theory to agricultural experiments: essay on
principles, section 9. Transl Stat Sci 5(4):465–480
Nguyen-Dinh H (1997) A socioeconomic analysis of the determinants of fertility: the case of Vietnam. J
Popul Econ 10(3):251–271
Rosenbaum PR (1984) The consequences of adjustment for a concomitant variable that has been affected
by the treatment. J R Stat Soc Ser A 147:656–666
Rosenbaum PR (1987a) The role of a second control group in an observational study (with discussion). Stat Sci 2(3):292–316
Rosenbaum PR (1987b) Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika 74(1):13–26
Rosenbaum PR (2002) Observational studies. Springer, New York
Rosenbaum PR, Rubin DB (1983a) The central role of the propensity score in observational studies for
causal effects. Biometrika 70:41–55
Rosenbaum PR, Rubin DB (1983b) Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc Ser B 45:212–218
Rosenbaum PR, Rubin DB (1985) Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat 39(1):33–38
Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ
Psychol 66:688–701
Rubin DB (1978) Bayesian inference for causal effects: the role of randomization. Ann Stat 6:34–58
Rubin DB (1980) Discussion of randomization analysis of experimental data: the Fisher randomization test by D. Basu. J Am Stat Assoc 75:591–593
Schoumaker B, Tabutin D (1999) Relationship between poverty and fertility in Southern countries: knowledge, methodology and cases. Working paper no. 2, Department of Science of Population and Development, Université Catholique de Louvain
Scornet C (2007) 1963–2003: Quarante ans de planification familiale au Viêt-Nam. In: Adjamagbo A,
Msellati P, Vimard P (eds) Santé de la reproduction et fécondité dans les pays du Sud. Nouveaux
contextes et nouveaux comportements. Academia Bruylant, Louvain la Neuve, pp 142–171
Smith J (2000) A critical survey of empirical methods for evaluating active labor market policies. Schweiz
Z Volkswirtsch Stat 136(3):1–22
Smith JA, Todd P (2005) Does matching overcome LaLonde’s critique of nonexperimental estimators? J Econom 125:305–353
Staiger D, Stock J (1997) Instrumental variables regression with weak instruments. Econometrica
65(3):557–586
Tung PD (2004) Poverty line, poverty measurement, monitoring and assessment of MDG in Viet Nam. Report presented at the “2004 international conference on official poverty statistics—methodology and comparability”, Manila, October 2004
Willis RJ (1973) A new approach to the economic theory of fertility behavior. J Polit Econ 81(2, part 2):S14–S64
Wooldridge J (2002) Econometric analysis of cross section and panel data. MIT Press, Cambridge
Zhao Z (2005) Sensitivity of propensity score methods to the specifications. IZA discussion paper no. 1873