Empir Econ (2013) 44:355–385
DOI 10.1007/s00181-010-0356-9

Estimating the causal effect of fertility on economic wellbeing: data requirements, identifying assumptions and estimation methods

Bruno Arpino · Arnstein Aassve

Received: 21 April 2008 / Accepted: 20 November 2009 / Published online: 14 March 2010
© Springer-Verlag 2010

B. Arpino · A. Aassve, Department of Decision Sciences, DONDENA Centre for Research on Social Dynamics, Bocconi University, Via Roentgen, 20136 Milan, Italy

Abstract This article aims to answer to what extent fertility has a causal effect on households’ economic wellbeing, an issue that has received considerable interest in development studies and policy analysis. However, only recently has this literature begun to give importance to adequate modelling for estimation of causal effects. We discuss several strategies for causal inference, stressing that their validity must be judged on the assumptions we can plausibly formulate in a given application, which in turn depends on the richness of available data. We contrast methods relying on the unconfoundedness assumption, which include regressions and propensity score matching, with instrumental variable methods. This discussion has a general importance, representing a set of guidelines that are useful for choosing an appropriate strategy of analysis. The discussion is valid for both cross-sectional and panel data.

Keywords Fertility · Poverty · Causal inference · Unconfoundedness · Instrumental variables · VLSMS

JEL Classification D19 · I32 · J13

1 Introduction

There is a strong positive correlation between poverty and family size in most developing countries (Schoumaker and Tabutin 1999). Not much is known, however, about the extent to which fertility has a causal impact on households’ wellbeing. Needless to say, the issue is of critical importance for implementing sound policies. This article considers different strategies for establishing the causal effect of fertility on households’ wellbeing. We take a quasi-experimental approach where fertility is considered as a treatment and the outcome is the equivalised household consumption expenditure. We adopt the potential outcomes framework (Neyman 1923; Rubin 1974, 1978) where recorded childbearing events are used as a measure of fertility. Consequently, each household $i$ has two potential outcomes: $Y_{i1}$ if it experiences a childbearing event between two points in time (treated) and $Y_{i0}$ otherwise (untreated or control). However, childbearing is, at least in part, down to individual choice, giving rise to self-selection: households that choose to have more children (self-selected into the treatment) may be very different from households that choose to have fewer children irrespective of the treatment. Hence, if we observe that the first group of households has on average lower per capita expenditure, we cannot necessarily assert that this is due to fertility, since the two groups of households are likely to differ with respect to many other characteristics, such as education. Thus, a simple difference in the average consumption (or income) for the two groups of households gives a biased estimate.

We discuss different strategies to deal with the self-selection problem, stressing that their validity must be judged on the assumptions we can formulate in a given application, which in turn depends on the richness of available data. A key distinction is between those situations where we can assume that selection depends only on characteristics that are observed by the researcher (selection on observables) and those situations where one or more of the relevant characteristics are unobserved (selection on unobservables). In the first case, we compare units of similar characteristics that differ only by the treatment status.
For these units the observed difference in the outcome can be reasonably assumed to be due to the treatment. Propensity score matching (PSM) relies on the selection on observables assumption, which is referred to as the unconfoundedness assumption (UNC). Multiple regression also relies on this assumption, though the identifying assumption can be stated in a weaker way (see, e.g. Wooldridge 2002).

The empirical analyses use the Vietnam Living Standard Measurement Survey, a rich panel data set that was first fielded in 1992/1993, with a follow-up in 1997/1998. Exploiting the longitudinal structure of the data, we develop our estimators in a pre–post treatment setting. This has several advantages. First, covariates are measured before the exposure to the treatment, which makes it more likely that covariates are not affected by the treatment (e.g. Rosenbaum 1984; Imbens 2004). A second advantage is that the lagged value of the outcome variable, $Y_{t_1}$, can be included in the set of matching covariates, all of which are measured at the first wave. This is important because the household’s level of living standard prior to treatment is relevant both for the probability of experiencing a childbearing event between the two waves and for the consumption expenditure levels at the second wave, $Y_{t_2}$. Having information at two points in time, the dependent variable can be defined as the difference between the levels of the outcome after and before the treatment. In particular, we match individuals in the treatment group with individuals in the control group having similar first-period values, and their changes in outcomes are compared. An advantage of taking the difference in the pre- and post-treatment outcomes is that this helps remove residual imbalance in the average values of $Y_{t_1}$ between the treated and control groups.
Moreover, it is likely (in our application at least) that the variance is lower when the outcome is defined as a change rather than as a level. Hence, the resulting estimator will be more efficient than the one relying on levels. Importantly, specifying the outcome as a difference does not change the estimand. The interest remains in the effect of childbearing events between the two waves on the consumption expenditure level at the second wave. Our approach is useful in the sense that the general discussion of methods based on selection on observables, compared with those based on selection on unobservables, applies independently of whether the application is based on longitudinal or cross-sectional data.

The standard solution to deal with selection on unobservables is to use an instrumental variable (IV) method. This relies on the availability of a good instrument, which in our case should be a variable that influences fertility and has no direct impact on consumption expenditure. However, even if such a variable is available the estimator can be unsatisfactory. The reason is that, unless we are willing to impose very strong assumptions, IV estimates refer only to the unobserved sub-sample of the population that reacts to the chosen instrument, i.e. the compliers (Imbens and Angrist 1994; Angrist et al. 1996). The corresponding parameter is, consequently, the local average treatment effect (LATE) which, in the presence of heterogeneous treatment effects, may be different from the average treatment effect (ATE) and the average treatment effect for the treated (ATT), which are usually the parameters of interest. This is important for policy analysis, since only if the instrument coincides with a variable of real policy relevance can we argue that the estimated LATE has direct policy usefulness (Heckman 1997).
Moreover, LATE estimates based on different instruments generally differ because the identified sub-populations of compliers are different.

We implement the IV approach, demonstrating its benefits and drawbacks, by using two very different instruments. The first instrument is the sex composition of existing children. This is a widely used instrument (see, e.g. Angrist and Evans 1998; Chun and Oh 2002; Gupta and Dubey 2003) and is based on the fact that parents in Vietnam tend to have a strong preference for boys, especially in the North (Haughton and Haughton 1995; Johansson 1996, 1998; Belanger 2002). Since the preference for sons is a widespread phenomenon among Vietnamese households, we expect the proportion of compliers to be rather high. The second instrument is the availability of contraception at the community level. This is similar to other well-used instruments related to the availability of services in the neighbourhood or their distance from the dwelling (examples include McClellan et al. (1994), who use proximity to cardiac care centres, and Card (1995), who uses college proximity). An interesting aspect of this instrument is that it corresponds to a potential policy variable on which policy makers can act to both reduce fertility and, through it, make an impact on poverty. However, areas without availability of contraceptives in Vietnam are few (Nguyen-Dinh 1997; Duy et al. 2001; Anh and Thang 2002), which means that it cannot be considered as a general policy tool. From a statistical point of view, a key difference between the two instruments is that the second one cannot be considered as randomised, because availability of contraception is related to other characteristics of the community, which in turn may influence households’ wellbeing. As a consequence, detailed control for covariates is required, which is usually accomplished by imposing functional form and additive separability in the error term.
However, these and other strong assumptions can be avoided by implementing a non-parametric approach, such as the one suggested by Frölich (2007).

Although we have access to valid instruments in our application, this is not always the case. In those situations, IV estimators cannot be used and it becomes important to implement sensitivity analysis for estimators based on selection on observables. So far, this is not very common in the applied literature, but it is a critical tool for assessing the credibility of the identifying assumption. The key idea of the approach is to evaluate how strong the associations among an unmeasured variable, the treatment and the outcome must be in order to undermine the results of the analysis based on the UNC. If the results are highly sensitive, the validity of the identifying assumption becomes questionable. Among the different approaches for sensitivity analysis proposed in the literature, we discuss and apply those suggested by Rosenbaum (1987b) and Ichino et al. (2008).

The article is organised as follows. Section 2 reviews the statistical issues, Sect. 3 provides background information about the application, Sect. 4 shows the results, and Sect. 5 concludes.

2 Causal inference in observational studies under the potential outcomes approach

The potential outcomes approach was introduced by Neyman (1923) and extended by Rubin (1974) to observational studies. We invoke the stable unit treatment value assumption (SUTVA) (Rubin 1980), which states that the potential outcomes for each unit are not affected by the treatments assigned to any other units and that there are no hidden versions of the treatment. Potential outcomes are denoted by $Y_1$, the outcome that would have resulted if the unit was exposed to the treatment, and $Y_0$, the outcome if it was exposed to the control (Rosenbaum and Rubin 1983a). Since each unit receives only the treatment or the control, only one of $Y_1$ and $Y_0$ is observed for each unit.
Assume that we have a random sample of $N$ individual units under study, $\{d_i, y_i, x_i\}_{i=1}^{N}$. $D$ represents the treatment indicator that takes the value 1 for treated units and 0 for untreated units (the controls), $Y$ indicates the observed outcome, and $X$ indicates the set of covariates or confounders (see footnote 1). The two causal parameters usually of interest are the ATE and the ATT, which are defined as:

$$\mathrm{ATE} = E(Y_1 - Y_0) \tag{1}$$

$$\mathrm{ATT} = E(Y_1 - Y_0 \mid D = 1). \tag{2}$$

The ATE is the expected effect of the treatment on a randomly drawn unit from the population, while the ATT gives the expected effect of the treatment on a randomly drawn unit from the population of treated. The ATT is consequently the parameter that tends to be of interest to policy makers (Heckman et al. 1997).

Footnote 1: As a convention, capital letters usually denote random variables, whereas small letters indicate their realisations. For simplicity, population units are usually not indexed by unit indicators unless this is necessary for clarity.

2.1 Identifying assumptions

A critical distinction is between situations where selection depends only on observed characteristics and those where selection also depends on unobserved characteristics. The selection on observables assumption is also known as the UNC and represents the fundamental identifying assumption for a large range of empirical studies (see footnote 2):

Assumption A.1 (Unconfoundedness) $(Y_1, Y_0) \perp D \mid X$

where $\perp$, in the notation introduced by Dawid (1979), indicates independence. Assumption A.1 implies that after conditioning on variables influencing both the selection and the outcome, the dependence between the potential outcomes and the treatment is cancelled out. Regression and matching techniques, as well as stratification and weighting methods, all rely on this assumption.
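To make the role of Assumption A.1 concrete, the following is a minimal simulation sketch in Python. It is an illustration only: the single binary confounder, the selection probabilities, the outcome equation and the constant effect of −1 are all hypothetical choices and are not taken from the VLSMS data. Selection into treatment depends only on the observed $X$, so the unadjusted contrast between treated and controls is biased, while a contrast computed within strata of $X$ and averaged over the distribution of $X$ should recover the ATE.

```python
import numpy as np

def simulate(n=200_000, seed=0):
    """Hypothetical data where selection depends only on an observed binary X."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n)                 # observed confounder
    p_treat = np.where(x == 1, 0.7, 0.3)        # P(D = 1 | X), strictly inside (0, 1)
    d = rng.binomial(1, p_treat)                # treatment indicator
    y0 = 10.0 - 3.0 * x + rng.normal(0, 1, n)   # potential outcome under control
    y1 = y0 - 1.0                               # assumed constant effect of -1
    y = np.where(d == 1, y1, y0)                # only one potential outcome is observed
    return x, d, y

def naive_difference(d, y):
    """Unadjusted contrast E[Y | D = 1] - E[Y | D = 0]."""
    return y[d == 1].mean() - y[d == 0].mean()

def stratified_ate(x, d, y):
    """Treated-control contrast within each stratum of X, weighted by P(X = x)."""
    estimate = 0.0
    for value in np.unique(x):
        in_stratum = x == value
        diff = y[in_stratum & (d == 1)].mean() - y[in_stratum & (d == 0)].mean()
        estimate += in_stratum.mean() * diff
    return estimate

if __name__ == "__main__":
    x, d, y = simulate()
    print("naive:", naive_difference(d, y))
    print("stratified:", stratified_ate(x, d, y))
```

In this design the naive contrast mixes the treatment effect with the difference in $X$ between the two groups, whereas the stratified contrast compares like with like, which is the logic shared by regression, matching, stratification and weighting under the UNC.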
In regression analysis, it suffices to assume that conditional independence of the potential outcomes from the treatment holds in expected values (see, e.g. Wooldridge 2002, p. 607). That is, we can substitute Assumption A.1 with the weaker: $E(Y_1 \mid D, X) = E(Y_1 \mid X)$ and $E(Y_0 \mid D, X) = E(Y_0 \mid X)$. The fundamental idea behind these methods is to compare treated units with control units that are similar in their characteristics. Another assumption, termed overlap, is also required.

Assumption A.2 (Overlap) $0 < P(D = 1 \mid X) < 1$

where $P(D = 1 \mid X)$ is the conditional probability of receiving the treatment given the covariates $X$. Assumption A.2 implies equality of the support of $X$ in the two groups of treated and controls (i.e. $\mathrm{Support}(X \mid D = 1) = \mathrm{Support}(X \mid D = 0)$), which guarantees that the ATE is well defined (Heckman et al. 1997). If the assumption does not hold, then it is possible that for some values of the covariates there are no comparable units.

The most common approach to deal with selection on unobservables is to exploit the availability of an IV, that is, a variable assumed to impact the selection into treatment but to have no direct influence on the outcome. The concrete possibility of using an IV method relies, of course, on the availability of such a variable. In practice, instruments are often difficult to find. In this case, a sensitivity analysis becomes very useful because it can be used to assess the importance of a violation of the UNC for the estimated causal effect. Of course, it does not represent an alternative to the IV approach.

Footnote 2: The unconfoundedness assumption is sometimes referred to as the conditional independence or the exogeneity assumption (Imbens 2004).

Like methods based on the UNC, IV methods also impose a range of critical identifying assumptions. Let us consider a binary instrument indicated by $Z$.
In randomised settings, the levels of the instrument can be seen as the assignment to the treatment, which is different from the treatment actually taken due to non-compliance. Under SUTVA, we indicate with $D_z$ and $Y_{z,d}$, respectively, the binary potential treatment indicator and the potential outcomes for unit $i$. The identifying assumptions for the estimation of causal effects using one IV are clarified by Angrist et al. (1996, in the following AIR). Apart from the SUTVA and the randomisation of the instrument, the fundamental assumptions are:

Assumption B.1 (Exclusion Restriction) $Y_{z,d} = Y_d$

Assumption B.2 (Nonzero Average Causal Effect of Z on D) $E[D_1 - D_0] \neq 0$

Assumption B.3 (Monotonicity) $D_{i1} \geq D_{i0}$ for all $i = 1, \ldots, N$.

The exclusion restriction means that the instrument $Z$ impacts on $Y$ only through $D$ and corresponds to validity of the instrument. Assumption B.2 requires that for at least some units the instrument changes the treatment status and corresponds to the hypothesis of nonzero correlation between the instrument and the endogenous variable (i.e. relevance of the instrument). The assumption of monotonicity is critical when comparing the IV approach to methods based on the UNC. To see how, we have to characterise units by the way they might react to the level of the instrument. A first group is termed compliers and is defined by units that are induced to take the treatment by the instrument: $D_{i1} - D_{i0} = 1$. Other units may not be influenced by the instrument and are defined as either always-takers, where $D_{i1} = D_{i0} = 1$ (they always take the treatment whatever the level of the instrument), or never-takers, where $D_{i1} = D_{i0} = 0$ (they always take the control). Finally, we might encounter defiers, who are units that do the opposite of their assignment status ($D_{i1} = 0$ and $D_{i0} = 1$).
The monotonicity assumption implies that there are no defiers and is crucial for identification, since otherwise the treatment effect for those who shift from non-participation to participation when $Z$ shifts from 0 to 1 can be cancelled out by the treatment effect of those who shift from participation to non-participation (Imbens and Angrist 1994). Importantly, the monotonicity assumption, like the exclusion restriction and the UNC, is untestable and its plausibility has to be evaluated in the context of the given application. AIR demonstrate that under the aforementioned assumptions we can only identify the average causal effect calculated on the sub-population of compliers, which is termed the LATE:

$$\mathrm{LATE} = E[Y_1 - Y_0 \mid D_1 - D_0 = 1]. \tag{3}$$

Critically important for empirical work is that, in the case of heterogeneous treatment effects, the LATE is in general different from the ATE and the ATT, which tend to be the parameters of interest. This is because the LATE refers only to the sub-population of compliers, while the ATE and ATT are defined, respectively, on the whole population and on the sub-population of treated. Moreover, a serious drawback of the LATE is that the sub-population of compliers is not identifiable from the data. Finally, the estimated LATE depends on the instrument used because different instruments identify different sub-populations of compliers.

In specific applications the LATE becomes an interesting parameter for policy. Suppose that the policy maker wants to know the (average) causal effect of $D$ on $Y$ when we obtain a change in $D$ by manipulating it through $Z$. In this case, the interest lies in the (average) causal effect of $D$ on $Y$ for units that react to the policy intervention on $Z$ (the compliers). In this situation, however, the policy maker cannot identify which units are the compliers, but can only estimate the size of this group.
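The points above can be illustrated with a small simulation. The sketch below is hypothetical throughout: the shares of compliers, always-takers and never-takers, the randomised binary instrument and the effect sizes are invented for illustration, and the ratio estimator used (often called the Wald estimator) is one standard way of estimating the LATE with a binary instrument rather than the specific estimator applied in Sect. 4. There are no defiers, so monotonicity holds by construction, and the difference in treatment rates between the two instrument levels estimates the size of the complier group even though individual compliers cannot be identified.

```python
import numpy as np

def simulate_iv(n=200_000, seed=1):
    """Hypothetical population with compliers, always-takers and never-takers."""
    rng = np.random.default_rng(seed)
    types = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
    z = rng.binomial(1, 0.5, n)                    # randomised binary instrument
    # Treatment actually taken: compliers follow Z, the other types ignore it.
    d = np.where(types == "always", 1, np.where(types == "never", 0, z))
    # Heterogeneous effects (assumed): -2 for compliers, -0.5 for everyone else.
    effect = np.where(types == "complier", -2.0, -0.5)
    y0 = rng.normal(10.0, 1.0, n)
    y = y0 + d * effect                            # Z affects Y only through D
    return z, d, y

def wald_estimate(z, d, y):
    """Return (LATE estimate, estimated share of compliers)."""
    first_stage = d[z == 1].mean() - d[z == 0].mean()    # effect of Z on D
    reduced_form = y[z == 1].mean() - y[z == 0].mean()   # effect of Z on Y
    return reduced_form / first_stage, first_stage

if __name__ == "__main__":
    late, share = wald_estimate(*simulate_iv())
    print("LATE estimate:", late, "complier share:", share)
```

Under these assumed values the population ATE is 0.5 × (−2) + 0.5 × (−0.5) = −1.25, whereas the ratio targets the complier effect of −2, which illustrates why the LATE generally differs from the ATE when effects are heterogeneous.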
The presumption in such cases is that the average causal effect calculated on the sub-population of units whose behaviour was modified by assignment is likely to be informative about sub-populations that will comply in the future.

2.2 Strategies for the estimation of causal effects

We discuss here three strategies for the estimation of causal effects. The first is based on Assumptions A.1 and A.2, and includes regression and PSM. The second strategy consists of combining these methods with a sensitivity analysis. In essence, the sensitivity analysis assesses the robustness of estimates when we suspect failure of the UNC. The third method is the IV approach. Rather than discussing the technical details of each estimator in depth, we present the general ideas and limitations of the different techniques. For a formal comparison of these methods, we refer to Blundell et al. (2005) and Imbens (2004).

2.2.1 Strategy 1: methods based on the UNC

In the standard multivariate regression model, we assume a linear relationship between the outcome and the independent variables, and homogeneity of treatment effects; in fact, in the simplest regression model the treatment variable is not interacted with covariates and its coefficient is the same for all units. This model constrains the ATE to coincide with the ATT, and if treatment effects are heterogeneous we are not able to make separate estimates of the two quantities (see footnote 3). Moreover, if the true model is nonlinear, the OLS estimates of the treatment effects would in general be biased. In parametric regression, the overlap assumption is not required insofar as we can be sure to have the correct specification of the model. Otherwise, the comparison of treated and control units outside the common support relies heavily on linear extrapolation. Of course, the standard model can be extended and made flexible to overcome these limitations.
For example, the common support problem can be circumvented by first estimating the common support and then running the regression only on observations within it. Moreover, we can avoid assuming homogeneous treatment effects by including a complete set of interactions between each of the covariates in $X$ and the treatment indicator $D$. This gives rise to the so-called fully interacted linear model (FILM in the following; see Goodman and Sianesi 2005). Since in the FILM, unlike in a fully saturated model, covariates are not recoded into qualitative variables, this approach is still parametric with respect to the way continuous covariates enter the regression function and interact with the treatment. Also, the linearity assumption can be avoided if we use a non-parametric method, such as a kernel estimator (see Härdle and Linton 1994), which allows the functional form between the outcome and the independent variables to be determined by the data themselves. Non-parametric methods, however, have computational drawbacks when the set of covariates is large and many of them are multi-valued or, worse, continuous. This problem, known as the curse of dimensionality, is also relevant for matching methods.

Footnote 3: In general, the ATE and ATT are expected to differ if the distributions of covariates in the treated and control groups are different and if the treatment interacts with covariates (at least some of them).

A popular way to overcome the dimensionality problem is to implement the matching on the basis of a univariate propensity score (Rosenbaum and Rubin 1983a). This is defined as the conditional probability of receiving a treatment given pre-treatment characteristics: $e(X) \equiv \Pr\{D = 1 \mid X\} = E\{D \mid X\}$. When the propensity scores are balanced across the treatment and control groups, the distribution of all covariates $X$ is balanced in expectation across the two groups (balancing property of the propensity score). Therefore, matching on the propensity score is equivalent to matching on $X$.
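As an illustration of how the propensity score $e(X)$ might be estimated in practice, the sketch below fits a logit model by Newton-Raphson using NumPy only. The logit specification is an assumption made here for illustration (this section does not state which model is used for $e(X)$), and the function names `fit_logit` and `propensity_score` are hypothetical.

```python
import numpy as np

def fit_logit(X, d, n_iter=25):
    """Maximum-likelihood logit coefficients (intercept first) via Newton-Raphson."""
    design = np.column_stack([np.ones(len(d)), X])   # add an intercept column
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))     # current fitted probabilities
        gradient = design.T @ (d - p)                # score of the log-likelihood
        weights = p * (1.0 - p)
        hessian = (design * weights[:, None]).T @ design
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def propensity_score(X, d):
    """Estimated e(X) = Pr(D = 1 | X) for every unit, under the assumed logit model."""
    beta = fit_logit(X, d)
    design = np.column_stack([np.ones(len(d)), X])
    return 1.0 / (1.0 + np.exp(-design @ beta))
```

Once the scores are available, the overlap assumption A.2 can be inspected informally by comparing the range of estimated scores among treated and control units, and the balancing property can be checked by comparing covariate means between units with similar scores.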
Once the propensity score is estimated, several methods of matching are available. The most common ones are kernel (Gaussian and Epanechnikov), nearest neighbour, radius and stratification matching (for a discussion of these methods see Caliendo and Kopeinig 2005; Smith and Todd 2005; Becker and Ichino 2002). Asymptotically, all PSM estimators should yield the same results (Smith 2000), while in small samples the choice of the matching algorithm can be important and generally a trade-off between bias and variance arises (Caliendo and Kopeinig 2005). As noted by Bryson et al. (2002), it is sensible to try a number of approaches. If they give similar results, the choice is irrelevant. Otherwise, further investigation is needed in order to reveal the source of the disparity. As will be explained in Sect. 4, we adopt this pragmatic approach and assess the sensitivity of results with respect to the matching method. Consistent with many previous studies (see, e.g. Smith and Todd 2005), the different estimators yield very similar results (both in terms of point estimates and standard errors).

The analysis in Sect. 4 is based on a nearest neighbour matching method, meaning that for each treated (control) unit the algorithm finds the control (treated) unit with the closest propensity score. We use the variant with replacement, implying that we allow a control (treated) individual to be used more than once as a match for individuals in the treated (control) sample. Among the methods we tried (nearest neighbour without replacement, k-nearest neighbour, radius and kernel), this approach guarantees the best quality of matches, because only units with the closest propensity score are matched, but at the cost of higher variance (Caliendo and Kopeinig 2005). Focussing on the estimation of the ATT, to estimate the treatment effect for a treated unit $i$, the observed outcome $y_{i1}$ is compared to the outcome $y_{j0}$ of the matched unit $j$ in the untreated sample.
The ATT estimator can be written as:

$$\widehat{\mathrm{ATT}} = \frac{1}{n_D} \sum_{i:\, d_i = 1} \left( y_{i1} - y_{m(i)0} \right), \tag{4}$$

where $n_D$ is the number of treated units that find a match in the untreated group and $m(i)$ indicates the matched control for treated unit $i$.

Under Assumptions A.1 and A.2, regression and matching techniques can be used with cross-sectional data to estimate the ATE and ATT, in which case $Y$, $X$ and $D$ are all measured at the same time. Longitudinal data available for at least two time points offer some important practical advantages. First, one is in a better position to measure covariates before the exposure to treatment. As is well known, one should only control for those covariates that are not affected by the treatment itself (e.g. Rosenbaum 1984). Hence, being able to measure variables before the treatment makes this condition more likely to hold (Imbens 2004). To make this explicit we indicate the covariates as $X_{t_1}$ and the outcome as $Y_{t_2}$. The treatment indicator, $D$, measures childbearing events between $t_1$ and $t_2$, and the ATT estimator can be written as:

$$\widehat{\mathrm{ATT}} = \frac{1}{n_D} \sum_{i:\, d_i = 1} \left( y_{i1,t_2} - y_{m(i)0,t_2} \right).$$

A second advantage is that we can include in the matching set the outcome variable of interest measured before the exposure to treatment. In our application, where the outcome is the consumption expenditure (see Sect. 4), we include this variable, $Y_{t_1}$, measured at the first wave, in the conditioning set. This reflects the household’s level of living standards prior to treatment, and is likely to be of relevance both for the probability of experiencing a childbearing event between the two waves and for the consumption expenditure levels at the second wave, $Y_{t_2}$.
The UNC assumption can be written more explicitly as:

Assumption A.3 (Unconfoundedness) $(Y_{1t_2}, Y_{0t_2}) \perp D \mid X_{t_1}, Y_{t_1}$

As noted by Athey and Imbens (2006), Assumption A.3 implies that individuals in the treatment group should be matched with individuals in the control group with similar (identical if the matching could be perfect) first-period outcomes, as well as other pre-treatment characteristics, and their second-period outcomes should be compared. However, perfect matching is not feasible, and matching on the propensity score guarantees only that, on average, covariates (including $Y_{t_1}$ in our case) are balanced in the matched treated and control groups. Importantly, taking the difference in the pre- and post-treatment outcomes helps in reducing any remaining imbalance in $Y_{t_1}$. This approach is similar in spirit to the bias-correction proposal of Abadie and Imbens (2002) to reduce bias due to residual imbalance in covariates after matching. The fact that the dependent variable is now defined as the difference in the levels of the outcome after and before the treatment implies that the ATT estimator can be written as:

$$\widehat{\mathrm{ATT}} = \frac{1}{n_D} \sum_{i:\, d_i = 1} \left[ \left( y_{iD,t_2} - y_{iD,t_1} \right) - \left( y_{m(i)U,t_2} - y_{m(i)U,t_1} \right) \right], \tag{5}$$

where the subscripts $D$ and $U$ make explicit that the first two outcomes in (5) are measured on treated units and the other two on untreated units. From formula (5) we can see that if the matching is exact on the variable $Y_{t_1}$, then the first-period terms cancel within each matched pair and the estimate obtained using the difference as outcome (5) is exactly equal to that in (4). However, even if the matching is not exact but the PSM works well (i.e. we succeed in balancing $Y_{t_1}$), the two estimators are expected to give similar results. It is worth noting that although estimator (5) uses as dependent variable the change instead of the level at time 2, the estimands of interest, namely the ATE and ATT as defined in (1) and (2), are the same.
For example, for the ATT we can note that:

$$\mathrm{ATT} = E(Y_{1t_2} - Y_{0t_2} \mid D = 1) = E[(Y_{1t_2} - Y_{t_1}) - (Y_{0t_2} - Y_{t_1}) \mid D = 1].$$

Another advantage of taking the difference is that we expect the resulting estimator to be more efficient. This is likely to be the case in our application (and we suspect in many other applications), since there will be more heterogeneity in the levels of the consumption expenditure at time $t_2$ than in the consumption growth between the two waves. In other words, the variable $Y_{t_2}$ is likely to have a higher variance than $(Y_{t_2} - Y_{t_1})$, although this is not true in general.

A related literature motivates the advantages of considering the difference in the pre–post levels of the outcome as a way to improve the robustness of the matching method through elimination of possible time-invariant unobservables (Heckman et al. 1997; Smith and Todd 2005; Aassve et al. 2007). The resulting estimator is similar to ours, apart from the fact that $Y_{t_1}$ is not included in the set of matching covariates. The estimator is labelled matching-difference-in-difference (MDID) and relies on an identifying assumption that is different from A.3. For example, for the ATT the identifying assumption can be written as (see footnote 4): $(Y_{0t_2} - Y_{0t_1}) \perp D \mid X_{t_1}$. As noted by Athey and Imbens (2006, p. 448), the two assumptions coincide under special conditions imposed on the unobserved components. Otherwise, the two identifying strategies, even though similar, are different, and A.3 remains a selection on observables assumption. The choice is a subject-matter one and depends on what the researcher believes is the best identifying strategy for his/her application. We use A.3 as a starting point and compare treated and control units with similar background characteristics $X$ and initial values of the outcome, instead of relying on the assumption of a conditional parallel trend in the outcome as with the MDID.
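The relationship between estimators (4) and (5) can be checked on a toy example. The Python sketch below is illustrative only: the scores and outcomes are made-up numbers, the function names are hypothetical, and the matching step is a bare nearest neighbour search with replacement on a given score, without the refinements a full PSM routine would include. The data are constructed so that each treated unit is matched to a control with exactly the same first-period outcome, in which case the estimate in levels and the estimate in differences coincide.

```python
import numpy as np

def nearest_neighbour(score_treated, score_control):
    """Index of the closest control (with replacement) for each treated unit."""
    distance = np.abs(score_treated[:, None] - score_control[None, :])
    return distance.argmin(axis=1)

def att_levels(y2_treated, y2_control, match):
    """Estimator (4): mean gap in second-period outcomes within matched pairs."""
    return float(np.mean(y2_treated - y2_control[match]))

def att_differences(y1_treated, y2_treated, y1_control, y2_control, match):
    """Estimator (5): mean gap in pre-post changes within matched pairs."""
    change_treated = y2_treated - y1_treated
    change_control = y2_control[match] - y1_control[match]
    return float(np.mean(change_treated - change_control))

# Made-up scores and outcomes (not VLSMS data).
score_t = np.array([0.30, 0.60])
score_c = np.array([0.28, 0.61, 0.90])
y1_t, y2_t = np.array([5.0, 8.0]), np.array([6.0, 8.0])
y1_c, y2_c = np.array([5.0, 8.0, 12.0]), np.array([7.0, 10.0, 13.0])

match = nearest_neighbour(score_t, score_c)            # pairs each treated unit with a control
levels = att_levels(y2_t, y2_c, match)
differences = att_differences(y1_t, y2_t, y1_c, y2_c, match)
print(levels, differences)                             # identical here: Y_t1 is matched exactly
```

When the first-period outcomes of matched pairs are only approximately equal, the two numbers would differ by the average remaining gap in $Y_{t_1}$ between treated units and their matches, which is the residual imbalance that differencing removes.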
Having maintained an unconfoundedness-type assumption, the discussion in this section applies also to cross-sectional studies. To deal with the possible presence of unobservables we discuss methods for sensitivity analysis and IV methods.

Footnote 4: If only ATE are to be identified, the assumption can be stated in a weaker form as mean independence instead of full independence (e.g. Heckman et al. 1997).

2.2.2 Strategy 2: sensitivity analysis of methods based on the UNC

The UNC becomes implausible once one or more relevant confounders are unobserved. If an instrument is available, then one can proceed with an IV estimator, which we discuss in the next sub-section. Several approaches are proposed in the literature to deal with situations where instruments are not available and where the plausibility of unconfoundedness is doubtful. One approach is to implement indirect tests of the UNC, relying on the estimation of a ‘pseudo’ causal effect that is known to be zero (Imbens 2004). A first type of test focuses on estimating the causal effect of the treatment of interest on a variable that is known to be unaffected by it. Another type of test relies on the presence of multiple control groups (Rosenbaum 1987a; Heckman et al. 1997) that arise, for example, when rules for eligibility are in place. The presence of ineligibility rules is also the basis for the bias-correction method proposed by Costa Dias et al. (2008).

An important alternative to the indirect tests is the implementation of sensitivity analyses. The fundamental idea of this approach is to relax unconfoundedness with the aim of assessing how strong an unmeasured variable must be in order to undermine the implications of the matching analysis. If the results are highly sensitive, then the validity of the identifying assumption becomes questionable and alternative estimation strategies must be considered.
Different approaches for sensitivity analysis have been proposed in the literature. Rosenbaum and Rubin (1983b) and Imbens (2003) propose methods to assess the sensitivity of ATE estimates in parametric regression models. Here, we apply the approaches suggested by Rosenbaum (1987b) and Ichino et al. (2008, in the following IMN), which do not rely on any parametric model for the estimation of the treatment effects. The underlying hypothesis in all of these approaches is that assignment to treatment may be confounded given the set of observed covariates but is unconfounded given the observed covariates and an unobservable covariate, U: $Y_1, Y_0 \perp D \mid X, U$. In Rosenbaum's approach, sensitivity is measured using only the relation between the unobserved covariate and the treatment assignment. To briefly describe the Rosenbaum approach, we link the probability of receiving the treatment, $\pi$, to observed characteristics, X, and an unobserved covariate, U, with a logistic regression function:

$\log\left(\frac{\pi}{1-\pi}\right) = \kappa(X) + \gamma U$, with $0 \le U \le 1$.

Under these assumptions, Rosenbaum shows that the odds ratio between two units i and j with the same X values can be bounded in the following way:

$\frac{1}{\Gamma} \le \frac{\pi_i/(1-\pi_i)}{\pi_j/(1-\pi_j)} \le \Gamma$,

where $\Gamma = e^{\gamma}$. If $\Gamma = 1$, unconfoundedness holds and no hidden bias exists. Increasing values of $\Gamma$ imply an increasingly important role for unobservables in the selection into treatment. Rosenbaum suggests progressively increasing the value of $\Gamma$ in order to assess the association required to overturn, or change substantially, the p-values of statistical tests of no effect of the treatment. If this happens only at high values of $\Gamma$, the results of the analysis based on the UNC are sensitive to the presence of an unobservable only if it is strongly associated with treatment selection.
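The bounding logic can be illustrated with a small sketch. It assumes matched pairs and a one-sided sign test, which is a simplification chosen here for illustration (the text does not state which test statistic is used), and the function name and arguments are hypothetical. Under hidden bias of magnitude Γ and no treatment effect, each pair shows the predicted sign with probability at most Γ/(1 + Γ), so the largest p-value compatible with Γ is a binomial tail probability.

```python
from math import comb


def sign_test_upper_pvalue(n_pairs, n_positive, gamma):
    """Largest one-sided sign-test p-value compatible with hidden bias gamma.

    n_pairs:    number of matched pairs with a non-zero outcome difference
    n_positive: pairs whose difference has the sign predicted by the effect
    gamma:      Rosenbaum's Gamma (>= 1); gamma = 1 means no hidden bias
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    # Under no treatment effect, each pair shows the predicted sign with
    # probability at most gamma / (1 + gamma).
    p_max = gamma / (1.0 + gamma)
    return sum(
        comb(n_pairs, k) * p_max ** k * (1.0 - p_max) ** (n_pairs - k)
        for k in range(n_positive, n_pairs + 1)
    )


# Hypothetical numbers (not from the paper): 100 pairs, 65 with the predicted sign.
for gamma in (1.0, 1.2, 1.4, 1.6, 1.8, 2.0):
    p = sign_test_upper_pvalue(100, 65, gamma)
    print(f"Gamma = {gamma:.1f}  upper-bound p-value = {p:.4f}")
```

Reading the printed values from the smallest Γ upwards shows the point at which the upper-bound p-value crosses a chosen significance level, which is the kind of cut-off the sensitivity analysis looks for.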
The plausibility of the presence of such an unobservable has to be judged by the researcher, depending on the richness of the information included in the analysis. Unlike Rosenbaum's approach, the approach of IMN assesses the sensitivity of point estimates of the ATT under different possible scenarios of deviation from the UNC (see footnote 5). The underlying hypothesis is, as in the previous approaches, that assignment to treatment may be confounded given the set of observed covariates but is unconfounded given the observed covariates and an unobservable covariate, U. The procedure can be summarised in the following steps:

(1) Calculate ATT using PSM on X;
(2) Simulate a variable U representing a potential unobserved confounder;
(3) Include U together with X in the matching set and calculate ATT;
(4) Repeat steps 2 and 3 several times (e.g. 1,000) and calculate the average ATT, to be compared with the baseline estimate obtained in (1) under UNC.

In the simulation process, IMN assume that U and the outcome are binary variables. In the case of continuous outcomes, as in our application, a transformation is needed so that the outcome takes the value 1 if it is above a certain threshold (the median, for example) and 0 otherwise; alternatively, one could consider other outcome variables such as poverty status, which is essentially a dichotomous transformation of consumption expenditure (see footnote 6). However, this transformed variable is only required to simulate the values of U (step 2) and is not used as the outcome variable when estimating the ATT (step 3). Since all the variables involved in the simulation are binary, the distribution of U is specified by four key parameters:

$p_{kw} = P(U = 1 \mid D = k, Y = w) = P(U = 1 \mid D = k, Y = w, X)$, $k, w = 0, 1$. (6)

It is assumed here that U is independent of X conditional on D and Y.
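Steps (2) to (4) can be sketched as follows. This is a minimal illustration and not the sensatt implementation: `simulate_confounder`, `average_simulated_att` and the user-supplied `estimate_att` callable are hypothetical names, and the matching estimator itself is left to the caller.

```python
import random


def simulate_confounder(d, y_star, p, rng):
    """Draw a binary U for each unit with P(U = 1 | D = k, Y* = w) = p[(k, w)]."""
    return [1 if rng.random() < p[(k, w)] else 0 for k, w in zip(d, y_star)]


def average_simulated_att(d, y_star, p, estimate_att, replications=1000, seed=0):
    """Average the ATT over replications, each with a freshly simulated U.

    estimate_att: hypothetical callable that receives the simulated U (a list
                  of 0/1 values) and returns an ATT estimate obtained by
                  matching on X and U; it is supplied by the caller.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        u = simulate_confounder(d, y_star, p, rng)  # step (2)
        estimates.append(estimate_att(u))           # step (3)
    return sum(estimates) / len(estimates)          # step (4)
```

Here `y_star` stands for the binary transformation of the outcome, which is used only to draw U; the estimator passed as `estimate_att` would work with the original continuous outcome.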
In order to choose the signs of the associations between U, $Y_0$ and D, IMN note that if $q = p_{01} - p_{00} > 0$ then U has a positive effect on $Y_0$ (conditioning on X), whereas if $s = p_1 - p_0 > 0$, where $p_k = P(U = 1 \mid D = k)$, then U has a positive effect on D. If we fix $p_u = P(U = 1)$ and the difference $p_{11} - p_{10}$, the four parameters $p_{kw}$ are uniquely identified by the values of q and s. Hence, by changing the values of q and s we can produce different scenarios for U. For example, if we want to mimic the effect of unobserved ability, we can set q to a positive value (positive effect on consumption) and s to a negative value (negative effect on fertility). It is important to note that with this approach we can only choose the signs of the associations of U with D and $Y_0$ according to the values of q and s. However, for increasingly higher absolute values of q and s the strength of the associations increases. Therefore, the idea is to use this sensitivity analysis as in the Rosenbaum approach. The difference is that, by progressively increasing the values of both q and s, we can increase the levels of association between U and both treatment and outcome, instead of treatment only.

Footnote 5: Under the assumption of an additive treatment effect, Rosenbaum also derives bounds on the Hodges–Lehmann point estimate of the treatment effect (see Rosenbaum 2002 for details).

Footnote 6: For more details on the simulations, see Ichino et al. (2008); for details on the STATA module sensatt, which implements this method, see Nannicini (2007).

In order to have an easily interpretable measure of these associations, IMN propose to use the following parameters:

$\Gamma = \frac{1}{rep} \sum_{r=1}^{rep} \frac{\Pr(Y = 1 \mid D = 0, U_r = 1, X) / \Pr(Y = 0 \mid D = 0, U_r = 1, X)}{\Pr(Y = 1 \mid D = 0, U_r = 0, X) / \Pr(Y = 0 \mid D = 0, U_r = 0, X)}$

and

$\Lambda = \frac{1}{rep} \sum_{r=1}^{rep} \frac{\Pr(D = 1 \mid U_r = 1, X) / \Pr(D = 0 \mid U_r = 1, X)}{\Pr(D = 1 \mid U_r = 0, X) / \Pr(D = 0 \mid U_r = 0, X)}$

where rep indicates the number of replications.
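Returning to the parameters $p_{kw}$ in (6), the mapping from (q, s) to the four probabilities can be written out explicitly. The sketch below is an illustration under stated assumptions rather than the authors' or IMN's code: it treats P(U = 1), the difference p11 − p10, and the sample proportions P(D = 1), P(Y = 1 | D = 0) and P(Y = 1 | D = 1) as given, and solves the law-of-total-probability identities for $p_{kw}$. All function and argument names are hypothetical.

```python
def solve_p_kw(q, s, p_u, diff_treated, pr_d1, pr_y1_d0, pr_y1_d1):
    """Return {(k, w): p_kw} implied by q, s and the fixed quantities.

    q:            p01 - p00 (association of U with the untreated outcome)
    s:            p1 - p0   (association of U with treatment)
    p_u:          P(U = 1), fixed in advance
    diff_treated: p11 - p10, fixed in advance
    pr_d1:        P(D = 1)
    pr_y1_d0:     P(Y = 1 | D = 0)
    pr_y1_d1:     P(Y = 1 | D = 1)
    """
    # P(U = 1) = p0 * P(D = 0) + p1 * P(D = 1), with p1 = p0 + s
    p0 = p_u - s * pr_d1
    p1 = p0 + s
    # p_k = p_k0 * P(Y = 0 | D = k) + p_k1 * P(Y = 1 | D = k)
    p00 = p0 - q * pr_y1_d0
    p01 = p00 + q
    p10 = p1 - diff_treated * pr_y1_d1
    p11 = p10 + diff_treated
    return {(0, 0): p00, (0, 1): p01, (1, 0): p10, (1, 1): p11}


def is_admissible(p):
    """True when every p_kw is a valid probability."""
    return all(0.0 <= value <= 1.0 for value in p.values())
```

The resulting dictionary gives, for each (D, Y) cell, the Bernoulli probability from which U is drawn. Combinations of q and s for which `is_admissible` returns False cannot be simulated, because at least one implied probability falls outside [0, 1].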
The parameter $\Gamma$ is the average odds ratio from the logit model of $P(Y = 1 \mid D = 0, U, X)$ calculated over several replications of the simulation procedure. It is, in other words, a measure of the effect of U on Y and is in this sense an outcome effect. The parameter $\Lambda$ is the average odds ratio from the logit model of $P(D = 1 \mid U, X)$. It measures the effect of U on D and is therefore a measure of the selection effect. At each replication of the simulation exercise, together with the two odds ratios just mentioned, the ATT is estimated using as covariates the set X and the simulated U. The final simulated ATT estimate is the average of the estimates obtained in all the replications (see footnote 7).

Footnote 7: A complementary approach proposed by Manski (1990) consists of dropping the UNC assumption entirely and constructing bounds for the ATT that rely on alternative identifying assumptions, for example that the outcome is bounded. IMN show how this approach is related to their sensitivity analysis and argue that non-parametric bounds are too conservative and that the bound calculations rely on extreme circumstances that are implausible. Moreover, in our application the outcome is continuous and has no natural bounds.

2.2.3 Strategy 3: IV methods

When UNC is implausible and an instrument is available, one would naturally implement IV methods. The way they are implemented depends on whether the available instrument can be thought of as randomised or not. In the previous discussion, we assumed that the instrument is randomised, which means that there is no need to control for covariates. In this case, AIR show that the LATE can be estimated simply by the Wald estimator. However, in many applications Z is not randomly assigned and can be confounded with D, with Y, or with both. The implication is that in these contexts the IV assumptions, such as the exclusion restriction, can usually be regarded as reliable only conditional on a set of covariates. In other words, in these situations Z can be considered unconfounded only conditional on covariates. The conventional approach to accommodating covariates in IV estimation consists of parametric or semi-parametric methods, two-stage least squares being the most common, and classic examples include Card (1995) and Angrist and Krueger (1991). A serious drawback of these methods is that most of them impose additive separability in the error term, which amounts to ruling out unobserved heterogeneity in the treatment effects. One approach that overcomes the strong assumptions used by the aforementioned IV methods is the non-parametric approach suggested by Frölich (2007). The identifying assumptions in this case are basically the same as in the case of a randomised instrument, but stated conditional on covariates. In this way, we can identify the conditional LATE, which is the LATE defined for units with specific observed characteristics. The marginal LATE is identified as follows (see footnote 8):

$\mathrm{LATE} = \frac{\int \left(E[Y \mid X, Z = 1] - E[Y \mid X, Z = 0]\right) dF_X}{\int \left(E[D \mid X, Z = 1] - E[D \mid X, Z = 0]\right) dF_X}$. (7)

When the number of covariates included in the set X is high, non-parametric estimation of equation (7) becomes difficult, especially in small samples. An alternative is to make use of the aforementioned balancing property of the propensity score, which allows us to substitute the high-dimensional set X in (7) by a univariate variable: $\pi = P(Z = 1 \mid X)$.

3 Fertility and economic wellbeing and the Vietnamese context

Our application is concerned with estimating the causal effect of fertility on economic wellbeing. The interrelationship between the two has received considerable interest in development studies and the economics literature. The traditional micro-economic framework considers children as an essential part of the household's work force, as they generate income. This is especially true for male children.
In rural underdeveloped regions of the world, which rely largely on a low level of farming technology and where households have little or no access to state benefits, this argument makes a great deal of sense (Admassie 2002). In this setting households will have a high demand for children. The downside is that a large number of children participating in household production hampers investment in human capital (Moav 2005). There are of course important supply-side considerations in this regard: rural areas in developing countries have poor access to both education and contraceptives, both of which limit the extent to which couples are able to make choices about fertility outcomes (Easterlin and Crimmins 1985). As households attain higher levels of income and wealth, they also have fewer children, either due to a quantity–quality trade-off, as suggested by Becker and Lewis (1973), or due to an increase in the opportunity cost of women earning a higher income, as suggested by Willis (1973). An important aspect with regard to Vietnam is that the country has experienced a tremendous decline in fertility over the past two decades, and at present one can safely claim that the country has completed the fertility transition. The figures speak for themselves: in 1980 the total fertility rate (TFR) was 5.0; in 2003 it was 1.9. Contraceptive availability and knowledge are widespread, and family planning programs were initiated as early as the 1960s (Scornet 2007) (see footnote 9).

Footnote 8: It is important to note that a common support assumption is needed, as stated by Frölich: $\mathrm{Supp}(X \mid Z = 1) = \mathrm{Supp}(X \mid Z = 0)$. However, here we give only some intuition about the assumptions underlying this method. For a detailed and more formal discussion we refer to Frölich's paper.
Footnote 9: An important factor in this change was the introduction of the "Doi Moi" (renewal) policy in the late eighties, which consisted of the replacement of collective farms by allocation of land to individual households; legalisation of many forms of private economic activity; removal of price controls; and legalisation and encouragement of Foreign Direct Investment (FDI). Since the introduction of Doi Moi, the country has embarked on a remarkable economic recovery, followed by a substantial poverty reduction (Glewwe et al. 2002).

In light of our technical discussion in Sect. 2, the key issue in this application is that fertility decisions can be driven by selection on both observed and unobserved characteristics. In terms of observables, predicting their effects is relatively straightforward within an economics framework. The key is to understand the drivers behind women's perceived opportunity cost of childbearing. Higher education and labour force participation among women increase women's opportunity cost, producing a negative effect on fertility. They also increase women's income level and hence consumption expenditure. Typically, any increase in the opportunity cost dominates the positive income effect. Increased education among men, and therefore higher earnings, translates into a positive income effect and hence has a positive effect on fertility (Ermisch 1989). However, empirical analysis shows that there is not necessarily a positive relationship between income and family size (i.e. number of children), the key explanation being that couples make trade-offs between quantity and quality (Becker and Lewis 1973), especially as the country in question develops and passes through the fertility transition. As for the unobservables, these can operate through different mechanisms. The key unobserved variables are ability and aspirations, and they play an important role in our application. In general, we would expect those with higher ability or aspirations in terms of work and career to have lower fertility because of their higher opportunity cost. Thus, ability is negatively correlated with fertility but positively associated with consumption expenditure. Moreover, fertility is commonly measured in terms of childbearing events, as we do here. However, childbearing outcomes are the direct result of contraceptive practices, which are typically unmeasured in household surveys. Better knowledge and higher uptake of contraceptives reduce unwanted pregnancies, which in turn reduces fertility. Unobserved ability is positively associated with contraceptive use, which reinforces the negative effect of ability on fertility. Fertility is of course based on the joint decision of a couple, and not the woman alone. Hence, behind the childbearing outcomes, there is also a bargaining process taking place. Again, unobserved ability may play an important role. High-ability women may have stronger bargaining power, either as a result of the ability itself (e.g. they are better negotiators) or through the effect higher ability has on their labour supply and hence earnings. Whereas ability works through different mechanisms, the prediction of its effect is rather clear in the sense that high ability is associated with lower fertility but higher income and hence consumption expenditure. Consequently, its omission implies a negative bias in the estimation of the effect of fertility on consumption expenditure.

Table 1 Average equivalised household consumption expenditure at the two waves and its growth by number of children born between the two waves

Number of children born    Observations    Average consumption    Average consumption    Average consumption
between the two waves                      in 1992                in 1997                growth in 1997–1992
0                          1,232           970                    2,436                  1,466
1                          581             856                    1,892                  1,036
2                          182             790                    1,755                  965
3                          28              571                    1,154                  583
At least 1                 791             832                    1,835                  1,004
Total                      2,023           916                    2,201                  1,285

Notes: We consider the number of children of all household members born between the two waves and still alive at the second wave. All consumption measures are valued in dongs and rescaled using prices in 1992. The 2,023 households represented in the table are selected taking only households with at least one married woman aged between 15 and 40 in the first wave. Consumption is expressed in thousands of dongs.

The data we use come from the Vietnam living standard measurement survey (VLSMS), first conducted in 1992/1993 with a follow-up in 1997/1998. The longitudinal nature of the data set allows us to measure whether any woman in the household experienced another birth between the two waves. The treatment is then defined as a binary variable taking value 1 if the household experiences a childbearing event between the two waves (treated) and 0 otherwise (untreated or control). The outcome of interest is the equivalised consumption expenditure level in the second wave. In the empirical implementation presented in the next section, we control for a range of explanatory variables measured in the first wave. The data otherwise follow the standard format of the World Bank LSMS, including detailed information about education, employment, fertility, expenditure and incomes. The survey also provides detailed community information from a separate community questionnaire. This information is available for the 120 rural communities sampled and consists of data on health, schooling and main economic activities. The availability of this information is important for two reasons. First, characteristics of communities where households reside are likely to influence both economic wellbeing and fertility and, hence, are potentially relevant confounders.
Second, from this information we obtain an interesting IV, represented by the availability of contraceptives in the community. The conventional approximation of the household's welfare is the household's observed consumption expenditure, which requires detailed information on consumption behaviour and its expenditure pattern (Coudouel et al. 2002; Deaton and Zaidi 2002). The expenditure variables are calculated using the World Bank procedure, which is readily available with the VLSMS. We choose a relatively simple equivalence scale giving each child aged 0–14 in the household a weight of 0.65 relative to adults (see footnote 10). Table 1 shows a simple descriptive analysis highlighting a clear negative association between number of children and economic wellbeing. Our choice of covariates is based mainly on dimensions which are important for both the household's standard of living and fertility behaviour and hence are potential confounders that have to be included in the conditioning set X to make the UNC plausible. All these variables can theoretically have an impact on the change in consumption expenditure and on the decision to have children. Many of these variables are defined in terms of household ratios. That is, we include the number of household members that are engaged in gainful employment as a ratio of the total number of household members. We also include demographic characteristics of the household such as the sex and the age of the household head, the household size and the presence of existing children. The effect of children is further distinguished by their age distribution, and again expressed as a ratio of the total number of household members.

Footnote 10: We assessed the robustness of the results to the imposed equivalence scale. Results are consistent with those presented here for reasonable equivalence scales. This analysis is available from the authors upon request.
Other covariates include the ratio of male and female members aged 15–45, the ratio of male and female working members aged 15–45 out of the respective groups, an education index, the level of equivalised consumption at the first wave and regional dummies. We also use two binary variables indicating, respectively, whether the household is mainly engaged in farming and whether the household head belongs to the majority ethnic group (the Kinh). Importantly, we also include community information through three indexes: (1) an index of economic development, (2) health facilities and (3) educational infrastructures.

4 Estimating causal effects of fertility on economic wellbeing

4.1 Regression and propensity score results

Here, we present the results of the estimation of the causal effect of childbearing on consumption expenditures by using multiple regression and the PSM method. Our sample is restricted to households that in the first wave included at least one married woman aged between 15 and 40 years (see footnote 11). The selection is important since it avoids units who are in effect incapable of childbearing. As anticipated in Sect. 2, since in general it is not clear which is the best matching technique to use, we compare different methods including nearest neighbour with/without replacement, k-nearest neighbour, radius and kernel matching. Since the estimates (ATE and ATT) and standard errors are found to be stable across the different approaches, the choice of the matching method is not critical in this application. We present results based on the nearest neighbour method with replacement as implemented by the nnmatch module in STATA (Abadie et al. 2004) (see footnote 12). This approach achieves a good balancing in all the pre-treatment covariates, as measured by the absolute standardised bias calculated after matching (see footnote 13). The results are presented in Table 2.

Footnote 11: This sample selection criterion is part of the matching strategy since we avoid comparing households having a child with households who were essentially out of the risk set (here because there are no women of fecund age in the household). Obviously different selection strategies are possible. However, this selection criterion gives low attrition with respect to households having additional children. Moreover, we tried the following alternative selection criteria: (1) select households with at least one married woman aged 15–35 in the first wave; (2) select households where the head or the head's spouse is a married woman aged 15–40 in the first wave; and (3) select households where the head or the head's spouse is a married woman aged 15–35 in the first wave. However, results are very similar to those presented here.

Footnote 12: This software implements the estimators suggested by Abadie and Imbens (2002) and enables us to obtain analytical standard errors which are robust to potential heteroskedasticity. We prefer analytical standard errors to bootstrapped ones since Abadie and Imbens (2004) show that the bootstrap fails with nearest neighbour matching.

Table 2 Estimates from methods based on the unconfoundedness assumption (robust standard errors in parentheses)

Method                                             Estimand     Estimate (SE)
Regression, simple                                 ATE = ATT    −462 (56)
Regression, multiple with no interactions          ATE = ATT    −414 (62)
Regression, FILM (conditioned on CS)               ATE          −421 (60)
Regression, FILM (conditioned on CS)               ATT          −432 (59)
Propensity score matching (nearest neighbour)      ATE          −411 (87)
Propensity score matching (nearest neighbour)      ATT          −356 (116)

Notes: CS common support, FILM fully interacted linear model. Figures are in thousands of dongs. Standard errors for regressions are robust to heteroskedasticity and correlation within communities. PSM standard errors are robust to heteroskedasticity.

We also report the results of the estimation of a simple regression of Y on D without any covariates. This estimate can also be obtained from Table 1 as the difference in average consumption growth between households having at least one birth and the remaining households. This approach would be acceptable under the randomisation of D.
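As a quick check, the simple-regression estimate reported in Table 2 can be reproduced from the consumption-growth column of Table 1 (figures in thousands of dongs):

```latex
\underbrace{1{,}004}_{\text{growth, at least one birth}}
\;-\;
\underbrace{1{,}466}_{\text{growth, no birth}}
\;=\; -462 .
```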
However, it is clear that selection is present, and the estimated effect of fertility on expenditure is reduced by around 10% in the multiple regression. As discussed in Sect. 2, multiple regression and PSM rely on the assumption that the treatment can be thought of as randomised after having controlled for covariates (unconfoundedness). However, the assumptions imposed by the two estimation strategies differ. The standard multiple regression implicitly assumes that the effect of childbearing on poverty is constant, while the FILM, including all interactions between D and the covariates, allows it to change with covariate values. As a consequence, regression does not distinguish between ATE and ATT, since they coincide under a constant treatment effect. In contrast, FILM does, and ATE and ATT will in general differ. FILM was implemented with and without conditioning on common support. Since the results are very similar, we show only FILM with the common support. In this case, FILM requires a first-stage estimation of the propensity score and the common support (see footnote 14). With FILM, the multiple regression model is made more similar to PSM. A critical difference, of course, is that PSM does not impose any functional form for the relationship between consumption expenditure and fertility. Regression, in contrast, imposes linearity. As we can see from Table 2, the estimate of the ATE is similar across all these methods, while the estimated ATT in the PSM is slightly smaller in absolute value. Thus, relaxing linearity matters, and the PSM is the preferred option because it does not impose any functional form at the stage of estimating the causal effect of the treatment. Moreover, PSM allows us to assess directly the balancing we reach in covariates and the common support. From Table 2, we note that using PSM, the estimated ATT is different from the ATE. In general, ATT and ATE differ if the distributions of covariates in the treated and control groups are different (this is expected due to self-selection into treatment) and if the treatment interacts with covariates, i.e. if the causal effects are heterogeneous (see footnote 15). Given that the average consumption growth between the two waves amounts to 1,285 thousand dongs, the estimates ranging from −356 to −462 thousand dongs, as presented in Table 2, are clearly substantial. As a confirmation, the amount needed in 1992 to buy a quantity of rice giving 1,000 calories (about 300 g) each day for 1 year was 215 thousand dongs (see footnote 16). Moreover, the food poverty line in 1992 was estimated at 750 thousand dongs (corresponding to 68 US dollars in 1992), which is another indication that the estimated effects are substantial (see footnote 17).

Footnote 13: The absolute standardised bias is defined as the absolute difference in sample means between the matched treated and control samples as a percentage of the square root of the average sample variance in the two groups (Rosenbaum and Rubin 1985). The results obtained using the other methods are not presented here for brevity but are available from the authors upon request.

Footnote 14: In principle, any standard probability model can be used to estimate the propensity score. For example, using the common logit or probit models, we can write $\Pr\{D = 1 \mid X\} = F(h(X))$, where F(.) is, respectively, the logistic or the normal cumulative distribution function and h(X) is a function of covariates with linear and higher order terms. The choice of which higher order terms to include, as well as interactions among covariates, is determined solely by the need to balance the covariate distributions in the two treatment groups (Dehejia and Wahba 1999). We used a logit specification including some interaction terms to achieve balancing. We avoided the inclusion of higher order terms because, as demonstrated by Zhao (2005), their inclusion could have some biasing effect (while the inclusion of irrelevant interactions does not have this drawback).

Footnote 15: The ATE can be seen as a weighted average of the ATE on the treated (ATT) and the ATE on the untreated (ATU). Since in our application ATE < ATT, this means that ATU < ATT. We can interpret this result as follows. The effect of childbearing events on consumption is estimated to be negative for households actually experiencing these events (treated). However, the negative effect of childbearing events is even larger for households which do not experience these events (untreated). This means that the treatment interacts with household characteristics and that untreated households show, on average, characteristics negatively associated with consumption, which increases the negative effect of childbearing events on this sub-population. It may indicate optimisation on the part of the household decision maker: households in favourable economic conditions (that can easily afford it) decide to have children.

Footnote 16: These figures are derived by Molini (2006).

Footnote 17: For further insights into the composition of the Vietnamese food basket and for more details about the construction of the Vietnamese food poverty line, see Tung (2004).

4.1.1 Sensitivity analysis to violations of the unconfoundedness assumption

The reliability of the previous PSM estimates depends on the balancing property being satisfied and on their sensitivity to the imposition of common support and to the matching method used. All of these are checked rigorously, and in general the estimates are robust (see footnote 18). However, the most critical requirement of the PSM is the plausibility of the UNC. The UNC is an untestable assumption and has to be judged on the basis of the data available. The richness of covariates makes us reasonably confident that the most relevant confounders are observed and included in the matching set.

Footnote 18: The balancing property is checked by comparing the distribution of covariates X before and after matching and calculating the reduction in the absolute standardised bias. As already mentioned in the text, we calculated the ATE and ATT with different matching methods in order to assess the sensitivity of the results to the choice of matching method. Finally, we found that very few units fall outside the common support calculated with the minima–maxima criterion. Using different methods for calculating the CS, including the thick support, we get results similar to the ones presented in Table 2. We also implemented matching on the underlying continuous index instead of the propensity score to control for the heavy-tails problem. Again, results are stable. Details are available from the authors on request.

Table 3 Rosenbaum bounds

Γ      p-value
1.0    6.10E−13
1.1    4.40E−10
1.2    7.40E−08
1.3    4.10E−06
1.4    0.000096
1.5    0.001133
1.6    0.007673
1.7    0.033198
1.8    0.099821
1.9    0.223538
2.0    0.395373

To assess the reliability of the UNC assumption we apply the sensitivity analyses suggested by Rosenbaum and IMN, as outlined in Sect. 2. The idea is to assess how strong the effect of an unmeasured confounder must be in order to undermine conclusions drawn from the analysis based on the UNC. The results of the Rosenbaum-type sensitivity analysis are reported in Table 3. The key is to vary the effect of the potential unobserved confounder on the selection into treatment by choosing different levels of the parameter $\Gamma$. As mentioned in Sect. 2.2, fixing $\Gamma = 1$ corresponds to assuming that the UNC holds, while increasing the value of $\Gamma$ progressively increases the deviation from the UNC. The simulation exercise shows at what point the treatment effect is no longer statistically significant. The results shown in Table 3 suggest that a value of $\Gamma = 1.8$ provides such a cut-off point. The interpretation is as follows.
If an unobserved covariate caused the odds of experiencing a childbearing event to differ between treated and untreated households with the same observed characteristics by a factor of 1.8 (or 80%), then the 90% confidence interval of the ATT would include zero. The value 1.8 is a rather large number given that we have adjusted for many important observed background characteristics (Aakvik 2001). Rosenbaum (2002), applying this type of sensitivity analysis, found similar cut-off points for the well-known Card and Krueger (1994) minimum wage studies (the author found figures between 1.34 and 1.5). Finally, we apply the sensitivity approach of IMN. In particular, we implemented four separate sensitivity analyses according to the different signs of the association between U and D and Y. The results are shown in Tables 4, 5, 6, and 7. In each table we present, for different values of the parameters q and s, defined in Sect. 2, the estimated ATT, its standard error, and the values of the parameters $\Gamma$ and $\Lambda$, which measure, respectively, the effect of the simulated U on the outcome and on the treatment, controlling for the observed confounders. As explained in Sect. 2, the simulation exercise consists of varying the sign and strength of the associations between U and $Y_0$ and between U and D, by choosing, respectively, different values of the parameters q and s (see footnote 19). Simulated values of U are calculated and the ATT is estimated using the simulated covariate as if it were an additional observed confounder to be included in the set of matching variables. The procedure is repeated 1,000 times. The final ATT estimate for a given combination of the values of q and s is the average of the estimates obtained across all the replications. As already noted in Sect. 2, the strength of the associations between the confounder, treatment and outcome cannot be specified precisely prior to the simulation. However, they can be assessed by considering the average odds ratios $\Gamma$ and $\Lambda$.

Table 4 Sensitivity analysis to confounders such that Γ < 1 and Λ < 1

                       s = −0.1      s = −0.2      s = −0.3      s = −0.4      s = −0.5
q = −0.1   ATT (SE)    −442 (142)    −452 (157)    −457 (172)    −464 (193)    −460 (217)
           Γ           0.73          0.71          0.67          0.66          0.62
           Λ           0.48          0.29          0.17          0.09          0.04
q = −0.2   ATT (SE)    −446 (139)    −456 (150)    −463 (168)    −466 (179)    −471 (207)
           Γ           0.42          0.39          0.38          0.35          0.31
           Λ           0.56          0.34          0.20          0.11          0.04
q = −0.3   ATT (SE)    −451 (135)    −446 (146)    −461 (161)    −472 (173)    −477 (196)
           Γ           0.25          0.22          0.20          0.18          0.14
           Λ           0.64          0.40          0.23          0.12          0.05
q = −0.4   ATT (SE)    −446 (136)    −441 (142)    −470 (156)    −469 (169)    −480 (188)
           Γ           0.12          0.12          0.10          0.07          0.05
           Λ           0.74          0.45          0.26          0.13          0.04
q = −0.5   ATT (SE)    −434 (135)    −446 (140)    −471 (152)    −468 (161)    ***
           Γ           0.07          0.05          0.04          0.03          ***
           Λ           0.86          0.52          0.29          0.15          ***

Note: *** Combination resulting in inadmissible values of the parameters characterising the distribution of U.

In the first simulation scenario (Table 4) we assess the sensitivity of the estimated ATT when both the outcome and selection effects of the unobserved simulated confounder are negative.
Thus, we impose negative values of the parameters q and s, which generate odds ratios Γ and Λ smaller than 1. The estimated ATTs are always significant, even when Γ and Λ are very low. The largest difference between the ATT and the baseline estimate (ATT = −356; Table 2) is obtained when q and s are set to −0.4 and −0.5, respectively. However, in this case the outcome and selection effects are very strong (Γ = 0.05; Λ = 0.04). In order to assess the plausibility of the effect of the unobserved simulated confounder U on Y0 and D, as measured, respectively, by the average odds ratios Γ and Λ, we compare these values with the odds ratios obtained by estimating two separate logit models, taking as outcomes the binary transformation of the original outcome, Y*, and the treatment indicator, D.

¹⁹ As mentioned in Sect. 2, in order to simulate the values of U a binary transformation of our continuous outcome, consumption growth, is needed. This new dichotomous variable Y* takes value 1 if the original outcome (equivalised consumption growth) is higher than the median growth and 0 otherwise. Y* is used to simulate the values of U depending on the values of the parameters q and s that link the unobserved confounder to Y* and D. After U has been generated, it is included in the matching set together with the observed covariates and the ATT is estimated. However, in the estimation of the ATT the original continuous outcome, and not its dichotomous transformation Y*, is used.

Table 5 Sensitivity analysis to confounders such that Γ > 1 and Λ < 1

                      s = −0.1     s = −0.2     s = −0.3     s = −0.4     s = −0.5
q = +0.1   ATT (SE)   −445 (150)   −448 (168)   −449 (195)   −446 (217)   −435 (251)
           Γ          2.03         1.93         2.08         2.17         2.14
           Λ          0.36         0.22         0.13         0.07         0.03
q = +0.2   ATT (SE)   −444 (154)   −445 (177)   −444 (202)   −433 (238)   −417 (290)
           Γ          3.33         3.27         3.77         3.79         4.28
           Λ          0.31         0.18         0.11         0.06         0.02
q = +0.3   ATT (SE)   −443 (163)   −435 (186)   −437 (216)   −415 (263)   −402 (324)
           Γ          5.99         6.33         6.53         7.58         9.84
           Λ          0.26         0.15         0.09         0.04         0.02
q = +0.4   ATT (SE)   −436 (172)   −417 (197)   −417 (238)   −383 (308)   −359 (435)
           Γ          11.57        13.01        13.29        18.17        31.28
           Λ          0.21         0.12         0.07         0.03         0.01
q = +0.5   ATT (SE)   −420 (184)   −398 (219)   −369 (271)   −338 (388)   −238 (639)
           Γ          36.02        30.67        51.94        117.46       2128.61
           Λ          0.17         0.09         0.05         0.02         0.01

Table 6 Sensitivity analysis to confounders such that Γ < 1 and Λ > 1

                      s = +0.1     s = +0.2     s = +0.3     s = +0.4     s = +0.5
q = −0.1   ATT (SE)   −431 (135)   −440 (137)   −445 (149)   −448 (171)   −450 (192)
           Γ          0.73         0.73         0.76         0.74         0.69
           Λ          1.19         1.85         2.94         4.79         8.43
q = −0.2   ATT (SE)   −445 (135)   −432 (140)   −448 (156)   −440 (175)   −436 (200)
           Γ          0.45         0.43         0.43         0.43         0.38
           Λ          1.39         2.16         3.43         5.69         10.03
q = −0.3   ATT (SE)   −441 (133)   −436 (147)   −441 (166)   −437 (189)   −441 (224)
           Γ          0.25         0.26         0.25         0.24         0.22
           Λ          1.62         2.56         4.10         6.98         12.47
q = −0.4   ATT (SE)   −432 (136)   −435 (149)   −431 (172)   −427 (201)   −415 (247)
           Γ          0.14         0.15         0.14         0.13         0.11
           Λ          1.90         3.05         4.99         8.73         16.41
q = −0.5   ATT (SE)   −429 (142)   −430 (163)   −422 (189)   −402 (222)   −389 (293)
           Γ          0.08         0.08         0.07         0.06         0.05
           Λ          2.27         3.74         6.39         11.47        23.49
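The plausibility check described above compares the simulated odds ratios with odds ratios implied by logit coefficients on observed covariates. A minimal sketch of that conversion is given below, following the convention the authors report for their observed odds ratios (0 versus 1 for dummies, a difference of 0.5 standard deviations for continuous covariates). The function names are hypothetical, and the coefficient and standard deviation inputs would come from the fitted logit models, which are not reproduced here.

```python
import math


def odds_ratio_dummy(beta):
    """Odds ratio for a 0/1 covariate with logit coefficient beta."""
    return math.exp(beta)


def odds_ratio_continuous(beta, sd, step=0.5):
    """Odds ratio for two units differing by `step` standard deviations."""
    return math.exp(beta * step * sd)


def is_plausible(simulated_or, observed_min, observed_max):
    """True if a simulated odds ratio lies inside the range of observed ones."""
    return observed_min <= simulated_or <= observed_max
```

For instance, `is_plausible(0.05, 0.61, 4.99)` asks whether a simulated outcome effect of 0.05 falls inside the range of observed odds ratios quoted in the text for the outcome model.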
The estimated odds ratios in the first model range from 0.61 to 4.99 and in the second from 0.70 to 5.65.²⁰ Therefore, the values Γ = 0.05 and Λ = 0.04 are unrealistic since they are very different from the observed effects.

Table 7 Sensitivity analysis to confounders such that Γ > 1 and Λ > 1

                      s = +0.1     s = +0.2     s = +0.3     s = +0.4     s = +0.5
q = +0.1   ATT (SE)   −428 (136)   −451 (138)   −441 (143)   −460 (155)   −459 (173)
           Γ          2.12         2.12         2.16         2.18         2.40
           Λ          1.18         1.38         2.19         3.54         6.08
q = +0.2   ATT (SE)   −442 (134)   −434 (137)   −442 (136)   −456 (150)   −465 (169)
           Γ          3.29         3.85         4.17         4.49         5.28
           Λ          1.16         1.18         1.88         3.08         5.33
q = +0.3   ATT (SE)   −441 (135)   −416 (133)   −447 (135)   −447 (148)   −461 (163)
           Γ          6.32         7.97         7.78         8.07         10.59
           Λ          1.05         1.03         1.65         2.67         4.65
q = +0.4   ATT (SE)   −441 (136)   −424 (134)   −452 (135)   −449 (145)   −465 (155)
           Γ          13.09        15.94        40.14        24.41        49.47
           Λ          1.05         1.08         1.41         2.34         4.14
q = +0.5   ATT (SE)   −433 (142)   −437 (134)   −435 (137)   −450 (142)   −465 (152)
           Γ          40.72        83.74        232.02       143.41       172.65
           Λ          1.06         1.05         1.21         2.04         3.64

The practical relevance of the four scenarios defined in Tables 4, 5, 6, and 7 depends on the assumption the researcher makes about how U relates to the treatment and the outcome. On the basis of the discussion in Sect. 3, an interesting scenario in our application is represented by the sensitivity analysis of Table 5, where the unobserved confounder is assumed to have a positive effect on the outcome (q > 0; Γ > 1) and a negative effect on the treatment (s < 0; Λ < 1). This confounder mimics the potential effect that unobserved ability produces on consumption expenditure and fertility. That is, unobserved ability will be positively correlated with labour supply and income for the spouse (and hence consumption expenditure) and negatively correlated with childbearing. As we can see from Table 5, again the estimated effect is almost always significant and not much different from the baseline result.
Only if the associations of U with D and/or Y are unreasonably strong does the ATT become insignificant. For example, the weakest effect (ATT = −238) corresponds to q = +0.5 and s = −0.5. In this case both the outcome and the selection effects are unrealistically strong (Γ = 2128.61; Λ = 0.01).²¹ These effects are unrealistic in the sense that the effect of ability, after having controlled for important confounders, including observed education of family members, cannot be expected to be as strong as the values of the parameters Γ and Λ if we compare these effects to those of observed key covariates (ranging, as said before, from 0.61 to 5.65). The results for the other two scenarios (Tables 6, 7) are qualitatively similar to those in the first two tables. As a result, the conclusion from the IMN type sensitivity analysis is in line with the Rosenbaum type approach. The ATT estimated through PSM is rather robust to the presence of potentially omitted variables.

²⁰ For dummy variables we calculated the odds ratios comparing units with values 0 and 1, while for continuous covariates we compared two units differing by 0.5 standard deviations. The estimates from these models, not shown here, are available from the authors upon request.

²¹ Again, the plausibility of the values of the odds ratios is gauged with reference to the observed odds ratios reported in the text (ranging from 0.61 to 4.99 for the outcome model, and from 0.70 to 5.65 for the treatment model).

4.2 Instrumental variable results

While the previous sensitivity analysis is a useful way to test the robustness of the PSM estimates, the availability of instruments is the key to avoiding the UNC assumption. Although instruments are not always easy to come by, we propose two alternatives for our setting. The first is a variable that takes value 1 if the household has no male children in the first wave and 0 otherwise. As already mentioned, this kind of instrument is widely used (see, e.g.
Angrist and Evans 1998; Chun and Oh 2002; Gupta and Dubey 2003). The argument is that couples have certain gender preferences for their children; in particular, they tend to have a preference for having at least one son. In other words, couples are more likely to have another child if the previous ones were girls. Insofar as couples have a preference for boys, such a variable works well as an instrument since it is expected to have an impact on fertility but not a direct effect on poverty. Hence, the exclusion restriction seems to be reasonable. The strong preference for sons in Vietnam is confirmed by many studies (Haughton and Haughton 1995; Johansson 1996, 1998; Belanger 2002). To better highlight the preference for sons we selected only households that had at least two children in the first wave. Among these households the percentage having an additional child between the two waves is 28% if they had at least one son, but as high as 48% if they did not (difference significant at the 1% level). Among households with at least three children, the percentages are, respectively, 17 and 44% (difference significant at the 1% level). These simple analyses confirm the strong preference for sons, implying strong relevance of the instrument. The monotonicity assumption, implying absence of defiers, seems plausible in this application. In fact, defiers would in this case be households that would not have any further children if they already had daughters, whereas if they instead had sons they would have more children. In other words, defiers are households with a strong preference for daughters, which is unlikely for Vietnam (but of course possible). In any case the proportion of this type of household is small²² (see, e.g.
Haughton and Haughton 1995) and should not present a serious problem for the IV estimator.²³ Another characteristic of the instrument is that it is as good as randomised since households cannot choose the sex of their children,²⁴ meaning that in this case controlling for covariates is not necessary.

²² We found that the percentage of households with two children in wave 1 having an additional child is 30% if they had no daughters, 27% if they had one daughter and 48% if they had two daughters.

²³ In fact, AIR show that the sensitivity of IV results to the monotonicity assumption is limited if the instrument is strong, meaning that the association between the instrument and the treatment is high and the proportion of defiers is small.

Table 8 Local average treatment effect estimates through instrumental variables

Instrumental variable                          Estimator   LATE (SE)       Proportion of compliers
Son preference                                 Wald        −429 (228)      0.28
Son preference                                 Frölich     −478 (672)      0.17
Contraception availability in the community    Wald        −8822 (8597)    0.09
Contraception availability in the community    Frölich     −821 (1844)     0.01

Note: Standard errors in parentheses. Point estimates and standard errors for Frölich's method have been obtained by bootstrapping over 1,000 iterations. As for the matching method, consistently with the choice we made for the propensity score matching, we used nearest neighbour matching with replacement

The second instrument is a variable taking the value 1 if no contraceptive methods (mainly the intrauterine device (IUD) and condoms) were available in the community where the household resides, and 0 otherwise. The IUD and condoms are the most available contraceptive methods in Vietnam and the IUD is the most widely used (Anh and Thang 2002). Instruments based on geographical variation in availability of services are often used (see, e.g. McClellan et al. 1994; Card 1995).
These variables may work well if households living in communities with no contraceptive facilities have higher risks of childbearing and if contraceptive availability in the community has no direct effect on consumption growth. However, it is possible that unavailability of contraceptives as a community characteristic is associated with the economic wellbeing of the households living in the community. In other words, the instrument cannot be thought of as randomised. Controlling for covariates is consequently important. Monotonicity seems plausible also in this case. Here, it implies that households who would have (at least) one child between the waves if they live in a community with available contraception would also have one child if they live in a community without contraception. The fact that the first instrument can be assumed to be randomised means that we can apply the Wald estimator without covariates. The results presented in Table 8 indicate a strong and negative effect of having any additional children on consumption expenditure. Comparing with the Frölich estimator, we can see that the results are similar, confirming that controlling for covariates does not make a huge difference.²⁵ The Frölich estimates have been obtained as the ratio of two matching estimators. As for the matching method, consistently with the choice we made for the PSM, we used nearest neighbour matching with replacement.

²⁴ This is not completely true in those countries where selective abortion is a current practice. This is the case, for example, in India, where amniocentesis diagnoses are available and used for sex-selective abortions (Gupta and Dubey 2003).

²⁵ Point estimates and standard errors were obtained from bootstrapping over 1,000 iterations. Also for the Frölich method, the choice of the matching algorithm had little impact on the point estimates and standard errors.
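For a binary instrument that can be treated as randomised, the Wald estimator without covariates is the ratio of the difference in mean outcomes to the difference in treatment rates between the two instrument groups. The sketch below illustrates this together with a simple nonparametric bootstrap for the standard error. It is a hedged illustration only: the function names are hypothetical, the arrays `y`, `d` and `z` stand for the outcome, the treatment indicator and the instrument, and the paper's own bootstrap settings beyond the number of iterations are not known.

```python
import numpy as np


def wald_late(y, d, z):
    """Wald estimator: (E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0])."""
    y, d, z = (np.asarray(a) for a in (y, d, z))
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return reduced_form / first_stage


def bootstrap_se(y, d, z, n_boot=1000, seed=0):
    """Standard deviation of the Wald estimate over resampled households."""
    y, d, z = (np.asarray(a) for a in (y, d, z))
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample households with replacement
        draws[b] = wald_late(y[idx], d[idx], z[idx])
    return float(draws.std(ddof=1))
```

With a weak first stage the denominator can be close to zero in some resamples, which is one reason why bootstrap standard errors for IV estimates can become very large.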
Since we selected households with at least two children, these estimates refer, strictly speaking, only to this sub-population. Moreover, and more importantly, since IV estimates the LATE, these results refer only to the latent sub-population of compliers, that is, households that choose to have another child because they do not already have a male child. The extent to which the IV estimates can be compared with the PSM ones depends on the nature of the instrument. If we can assume that the average causal effect for ‘current’ non-compliers (always-takers and never-takers) is equal to the average causal effect for ‘current’ compliers (the LATE), then LATE and ATE coincide. This hypothesis is a priori strong. However, in the case where we use the sex ratio of the children as the instrument, one may expect that LATE and ATE will indeed be quite similar. Two considerations are in order here. First, we note that the estimated proportion of compliers is about 0.2, which is quite high compared to many other studies (see, e.g. Angrist and Evans 1998). This proportion of compliers is equal to the estimated causal effect of the instrument on the treatment variable D. Hence, there is evidence confirming that the instrument is not weak.²⁶ Second, the fact that male preference for children is a nationwide phenomenon in Vietnam implies that the estimated effect on the sub-group of compliers could be applied to the whole population. These considerations are supported by the fact that the LATE from the IV estimation (−429) is in this case almost equal to the ATE from the PSM estimation (−414). In this sense, the IV estimate can be considered as a robustness check of the estimates provided by PSM. Using the community level availability of contraception as an instrument gives a very different picture. Controlling for covariates is essential, as there are in this case huge differences between the Wald and the Frölich estimates.
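The statement that the proportion of compliers equals the effect of the instrument on D can be illustrated with a short first-stage calculation. The sketch below uses the childbearing shares quoted in the text for households with at least two children (28% with at least one son, 48% without), but the group sizes are hypothetical, so the resulting F statistic is purely illustrative and is not the value reported by the authors. With a single binary instrument and no covariates, the first-stage F statistic is the squared t statistic of the slope in a regression of D on Z.

```python
import numpy as np


def first_stage(d, z):
    """Return (estimated complier share, first-stage F) for binary z, no covariates."""
    d = np.asarray(d, dtype=float)
    z = np.asarray(z)
    p1, p0 = d[z == 1].mean(), d[z == 0].mean()
    slope = p1 - p0                               # effect of Z on D
    resid = d - np.where(z == 1, p1, p0)          # OLS residuals of D on Z
    n = d.size
    n1 = int((z == 1).sum())
    n0 = n - n1
    sigma2 = resid @ resid / (n - 2)              # residual variance
    var_slope = sigma2 * (1.0 / n1 + 1.0 / n0)    # variance of the OLS slope
    return float(slope), float(slope ** 2 / var_slope)


# Hypothetical group sizes (500 households each); shares taken from the text.
z = np.array([0] * 500 + [1] * 500)               # z = 1: no son in the first wave
d = np.array([1] * 140 + [0] * 360 + [1] * 240 + [0] * 260)
share, f_stat = first_stage(d, z)
```

Here `share` is the difference between the two childbearing rates, and `f_stat` can be compared with the conventional threshold of 10 mentioned in the footnotes.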
The Frölich estimate nevertheless shows a strong and negative effect for the sub-population of households that are ‘encouraged’ to have a child due to the lack of contraception in the community. This LATE is not comparable with the LATE estimated with the first instrument since the two sub-populations of compliers are very different. The estimated proportion of compliers for the second instrument is 0.01; hence the sub-population of households that ‘reacts’ to this instrument is small. One could argue that, since contraception is quite widespread in Vietnam, compliers are in this case a rather selected group, perhaps made up of less educated and generally marginalised households. This could also help to explain the much stronger effect. The low proportion of compliers implies that this instrument is weak.²⁷

²⁶ This is also confirmed by the standard F test for relevance. This test gives a value of 45.10, whereas a value of 10 is normally taken as a threshold for whether the association between instruments and the endogenous covariate is sufficient (Staiger and Stock 1997).

²⁷ For this instrument the F test gives a value of 10.89, which is just above the threshold of 10. In the standard 2SLS framework, when instruments are weak, results are sensitive in the sense that even in large samples the bias of the IV estimator remains substantial. Similarly, in the non-parametric LATE approach weak instruments make results sensitive to violations of the identifying assumptions (i.e. monotonicity and the exclusion restriction). Interestingly, the weakness of instruments can be interpreted as the existence of a small proportion of compliers. We also performed the Sargan test of overidentifying restrictions when both instruments were included and did not reject the null hypothesis of valid instruments (p value = 0.4537).

As pointed out by AIR
care is needed in the interpretation of this result since IV estimators are more sensitive to violations of the exclusion restriction and monotonicity when the proportion of compliers is low. We have to note that in both cases the IV estimates are quite imprecise, with high standard errors, especially for the second instrument. Whereas the previous considerations would favour the sex ratio as an instrument over the community level availability of contraception, it is important to note that the latter has more direct policy relevance. In fact, availability of contraception at community level is a variable on which policy makers can act directly. The Frölich estimates indicate that the expected causal effect on poverty from a fertility reduction induced by raising contraception availability in the communities is quite high. However, the size of the sub-population reacting to this policy (compliers) is rather small. Obviously, policy makers must consider both aspects in order to calibrate efficient policies. In our case, the policy could have a substantial effect on a rather small group. The estimates resulting from the second instrument, as well as those obtained using regression and PSM, indicate that fertility affects economic wellbeing considerably, though the policy implications are less obvious. If policy makers worry about the fact that households with many children are poorer, then targeted tax reductions and other benefits could help. In this way the existing gap between households with different numbers of children will be reduced. This policy clearly does not act on poverty through a fertility reduction and could instead induce an increase in fertility. However, other analyses (not shown here) do not suggest any strong effects going from economic wellbeing to fertility levels (Arpino and Aassve 2007).

5 Conclusions

Several approaches are available to estimate causal effects.
The appropriateness and interpretation of the different strategies depend on the application at hand and, importantly, on the available data. Whereas the statistical properties of the various estimators are well established, there appears to be considerable confusion on these issues in empirical policy analysis. We consider an application where interest lies in estimating the effect of fertility on households’ economic wellbeing, a topic which has received considerable interest in development studies, but where issues of causality have not been considered in much detail. In sum, if our interest lies in the ATE and/or ATT, one must consider the plausibility of the UNC, which depends heavily on available data. If unobserved covariates are relevant, the first option is to maintain the UNC and implement a sensitivity analysis. Using methods based on IV is, instead, more problematic. Unless we are willing to impose strong assumptions, we can identify only the LATE, which is the effect on the sub-population of compliers. If treatment effects are heterogeneous, LATE is generally different from ATE and ATT, which are usually the parameters of interest. LATE, on the contrary, answers our research question if we are interested in the average causal effect of a change in a treatment variable caused by intervention on a third variable. In this case, an IV method can be used whether or not UNC is plausible. Apart from this case, using IV methods in situations where UNC is plausible makes little sense. The methods are discussed in light of an application where we consider the effect of fertility on changes in consumption expenditure. Childbearing events cannot be considered as an exogenous measure of fertility when the outcome relates to economic wellbeing (in our case measured in terms of consumption expenditure). However, the methodological discussion is general and applies to many other applications.
Using methods based on the UNC assumption, such as regression and PSM, we find that those households having children between the recorded waves have considerably worse outcomes in terms of consumption expenditure. We then demonstrate how one can make an assessment of the potential effect of omitting relevant but unobserved variables without actually implementing an IV approach. This is a very useful tool in the sense that valid and relevant instruments are often hard to come by. In our application the estimates are robust with respect to unobserved omitted variables. We find that the estimated effect becomes non-significant only if the association between the omitted covariate, selection and the outcome is extremely (and unreasonably) large. Despite the robustness of the UNC-based estimates in our application, we nevertheless implement the IV method using two different instruments, demonstrating two situations in which IV analysis can be useful. The use of IV methods in our application illustrates that reasonable instruments can lead to estimates that differ from those of methods based on UNC and also differ from each other. In fact, compliers for one instrument can be different from compliers for another instrument and consequently, if the treatment effect is heterogeneous, the estimated LATE in the two cases will generally differ. With the first instrument, which exploits the strong preference for sons in Vietnam, we estimated a negative impact of fertility on poverty with a magnitude not dramatically different from that obtained by methods based on the UNC. This effect is driven by the fact that son preference is rather common in Vietnam, thus not involving any particular kinds of households. The estimated proportion of compliers in this case is rather high at 17%, which contrasts with the second instrument, availability of contraceptives in the community, where the proportion of compliers is as low as 1%.
This small sub-population of households reacting to availability of contraceptives is a highly selected group. Clearly their opportunity to control fertility through contraceptive practices is much reduced as it is unlikely that compliers are able to get contraceptives from elsewhere. In this sense these households have a higher exposure to childbearing. Whereas the estimates based on this instrument are very different from those based on son preference, an advantage is that the instrument has direct policy relevance, simply because it is itself a policy variable. The effect on this sub-population is high and, importantly, much higher than the estimates for the whole population (ATE) and for the sub-population of treated (ATT). However, the size of this sub-population is rather small, which is an equally important consideration for the policy maker. From a methodological point of view, a key difference between the two instruments is that the second cannot be considered as randomised, implying the need to control for covariates. Using the approach suggested by Frölich (2007), which overcomes many of the typical strong assumptions, we demonstrate that inclusion of covariates does matter for the parameter estimates.

Acknowledgements

This article forms part of the project “Poverty dynamics and fertility in developing countries”, funded by the Economic and Social Research Council (award no. RES000230462). We are grateful for comments by Fabrizia Mealli, Stefano Mazzuco, Letizia Mencarini, Stephen Pudney, the editor and two anonymous referees. All errors and inconsistencies in the article are our own.

References

Aakvik A (2001) Bounding a matching estimator: the case of a Norwegian training program. Oxf Bull Econ Stat 63(1):115–143
Aassve A, Betti G, Mazzuco S, Mencarini L (2007) Marital disruption and economic well-being: a comparative analysis.
J R Stat Soc Ser A 170(3):781–799
Abadie A, Imbens GW (2002) Simple and bias-corrected matching estimators for average treatment effects. NBER working paper T0283
Abadie A, Imbens GW (2004) On the failure of the bootstrap for matching estimators. NBER working paper T0325
Abadie A, Drukker D, Leber Herr J, Imbens GW (2004) Implementing matching estimators for average treatment effects in Stata. Stata J 4(3):290–311
Admassie A (2002) Explaining the high incidence of child labour in sub-Saharan Africa. Afr Dev Rev 14(2):251–275
Angrist JD, Evans WN (1998) Children and their parents’ labor supply: evidence from exogenous variation in family size. Am Econ Rev 88(3):450–477
Angrist JD, Krueger A (1991) Does compulsory school attendance affect schooling and earnings? Quart J Econ 106:979–1014
Angrist JD, Imbens GW, Rubin DB (1996) Identification of causal effects using instrumental variables. J Am Stat Assoc 91:444–472
Anh DN, Thang NM (2002) Accessibility and use of contraceptives in Vietnam. Int Fam Plann Perspect 28(4):214–219
Arpino B, Aassve A (2007) Estimation of causal effects of fertility on economic wellbeing: evidence from rural Vietnam. ISER working paper 2007–24. University of Essex, Colchester
Athey S, Imbens GW (2006) Identification and inference in nonlinear difference-in-differences models. Econometrica 74(2):431–497
Becker SO, Ichino A (2002) Estimation of average treatment effects based on propensity scores. Stata J 2:358–377
Becker GS, Lewis HG (1973) On the interaction between the quantity and quality of children. J Polit Econ 81(2):S279–S288
Belanger D (2002) Son preference in a rural village in North Vietnam. Stud Fam Plann 33(4):321–334
Blundell R, Dearden L, Sianesi B (2005) Evaluating the impact of education on earnings in the UK: models, methods and results from the NCDS. J R Stat Soc Ser A 168(3):473–512
Bryson A, Dorsett R, Purdon S (2002) The use of propensity score matching in the evaluation of labour market policies. Working paper no.
4, Department for Work and Pensions
Caliendo M, Kopeinig S (2005) Some practical guidance for the implementation of propensity score matching. IZA discussion paper no. 1588
Card D (1995) Using geographic variation in college proximity to estimate the return to schooling. In: Christofides L, Grant E, Swidinsky R (eds) Aspects of labor market behaviour: essays in honour of John Vanderkamp. University of Toronto Press, Toronto, pp 201–222
Card D, Krueger BA (1994) Minimum wages and employment: a case study of the fast-food industry in New Jersey and Pennsylvania. Am Econ Rev 84:772–793
Chun H, Oh J (2002) An instrumental variable estimate of the effect of fertility on the labour force participation of married women. Appl Econ Lett 9:631–634
Costa Dias M, Ichimura H, van den Berg GJ (2008) The matching method for treatment evaluation with selective participation and ineligibles. IZA discussion paper no. 3280
Coudouel A, Hentschel J, Wodon Q (2002) Poverty measurement and analysis, poverty reduction strategy paper sourcebook, World Bank, Washington D.C.
regressors. J Econom 124:335–361
Dawid AP (1979) Conditional independence in statistical theory. J R Stat Soc Ser B 41:1–31
Deaton A, Zaidi S (2002) Guidelines for constructing consumption aggregates for welfare analysis. Living standards measurement study working paper no. 135, The World Bank
Dehejia R, Wahba S (1999) Causal effects in non-experimental studies: re-evaluating the evaluation of training programs. J Am Stat Assoc 94(448):1053–1062
Duy LV, Haughton D, Haughton J, Kiem DA, Ky LD (2001) Fertility decline. In: Haughton D, Haughton J, Phong N (eds) Living standards during an economic boom. Vietnam 1993–1998. Statistical Publishing House, Hanoi
Easterlin RA, Crimmins EM (1985) The fertility revolution. University of Chicago Press, Chicago
Ermisch J (1989) Purchased child care, optimal family size and mother’s employment: theory and econometric analysis.
J Popul Econ 2:79–102
Frölich M (2007) Non parametric IV estimation of local average treatment effects with covariates. J Econom 139:35–75
Glewwe P, Gragnolati M, Zaman H (2002) Who gained from Vietnam’s boom in the 1990s? Econ Dev Cult Change 50(4):773–792
Goodman A, Sianesi B (2005) Early education and children’s outcomes: how long the impacts last. Fisc Stud 26(4)
Gupta ND, Dubey A (2003) Poverty and fertility: an instrumental variables analysis on Indian micro data. Working paper 03–11, Aarhus School of Business
Hardle W, Linton O (1994) Applied nonparametric methods. In: Engle RF, McFadden D (eds) Handbook of econometrics, vol 4. North Holland, Amsterdam, pp 391–448
Haughton J, Haughton D (1995) Son preference in Vietnam. Stud Fam Plann 26(6):325–337
Heckman JJ (1997) Instrumental variables: a study of implicit behavioural assumptions used in making program evaluations. J Hum Resour 32:441–462
Heckman JJ, Ichimura H, Todd P (1997) Matching as an econometric evaluation estimator. Rev Econ Stud 65:261–294
Ichino A, Mealli F, Nannicini T (2008) From temporary help jobs to permanent employment: what can we learn from matching estimators and their sensitivity? J Appl Econom 23(3):305–327
Imbens GW (2003) Sensitivity to exogeneity assumptions in program evaluation. AEA Pap Proc 93(2):126–132
Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity: a review. Rev Econ Stat 86:4–30
Imbens GW, Angrist JD (1994) Identification and estimation of local average treatment effects. Econometrica 62:467–475
Johansson A (1996) Family planning in Vietnam: women’s experiences and dilemma: a community study from the Red River Delta. J Psychosom Obstet Gynecol 17:59–67
Johansson A (1998) Population policy, son preference and the use of IUDs in North Vietnam. Reprod Health Matters 6:66–76
Manski CF (1990) Nonparametric bounds on treatment effects.
Am Econ Rev Pap Proc 80:319–323
McClellan M, McNeil BJ, Newhouse JP (1994) Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? J Am Med Assoc 272(11):859–866
Moav O (2005) Cheap children and the persistence of poverty. Econ J 115(11):88–110
Molini V (2006) Food security in Vietnam during the 1990s: the empirical evidence. Research paper no. 2006/67, United Nations University
Nannicini T (2007) Simulation-based sensitivity analysis for matching estimators. Stata J 7(3):334–350
Neyman J [1990(1923)] On the application of probability theory to agricultural experiments: essay on principles, section 9. Transl Stat Sci 5(4):465–480
Nguyen-Dinh H (1997) A socioeconomic analysis of the determinants of fertility: the case of Vietnam. J Popul Econ 10(3):251–271
Rosenbaum PR (1984) The consequences of adjustment for a concomitant variable that has been affected by the treatment. J R Stat Soc Ser A 147:656–666
Rosenbaum PR (1987a) The role of a second control group in an observational study. Stat Sci (with discussion) 2(3):292–316
Rosenbaum PR (1987b) Sensitivity analysis to certain permutation inferences in matched observational studies. Biometrika 74(1):13–26
Rosenbaum PR (2002) Observational studies. Springer, New York
Rosenbaum PR, Rubin DB (1983a) The central role of the propensity score in observational studies for causal effects. Biometrika 70:41–55
Rosenbaum PR, Rubin DB (1983b) Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc Ser B 45:212–218
Rosenbaum PR, Rubin DB (1985) Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat 39(1):33–38
Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies.
J Educ Psychol 66:688–701
Rubin DB (1978) Bayesian inference for causal effects: the role of randomization. Ann Stat 6:34–58
Rubin DB (1980) Discussion of randomization analysis of experimental data: the Fisher randomization test by D. Basu. J Am Stat Assoc 75:591–593
Schoumaker B, Tabutin D (1999) Relationship between poverty and fertility in Southern countries: knowledge, methodology and cases. Working paper no. 2, Department of Science of Population and Development, Université Catholique de Louvain
Scornet C (2007) 1963–2003: Quarante ans de planification familiale au Viêt-Nam. In: Adjamagbo A, Msellati P, Vimard P (eds) Santé de la reproduction et fécondité dans les pays du Sud. Nouveaux contextes et nouveaux comportements. Academia Bruylant, Louvain la Neuve, pp 142–171
Smith J (2000) A critical survey of empirical methods for evaluating active labor market policies. Schweiz Z Volkswirtsch Stat 136(3):1–22
Smith JA, Todd P (2005) Does matching overcome Lalonde’s critique of nonexperimental estimators? J Econom 125:305–353
Staiger D, Stock J (1997) Instrumental variables regression with weak instruments. Econometrica 65(3):557–586
Tung PD (2004) Poverty line, poverty measurement, monitoring and assessment of MDG in Viet Nam. Report presented at the “2004 international conference on official poverty statistics: methodology and comparability”, Manila, October 2004
Willis RJ (1973) A new approach to the economic theory of fertility behavior. J Polit Econ 81(2, part 2):S14–S64
Wooldridge J (2002) Econometric analysis of cross section and panel data. MIT Press, Cambridge
Zhao Z (2005) Sensitivity of propensity score methods to the specifications. IZA discussion paper no. 1873