Estimating Marginal Returns to Education

Jenna Stearns
Department of Economics
University of California, Santa Barbara

1 Introduction

Estimating returns to education is an essential part of determining optimal schooling decisions, both on an individual and social level. Because education is costly, individuals make choices about how much time to spend in school based on the difference between the marginal return to an additional year of schooling and the cost. From a policy perspective, average marginal returns to education across the population drive investment decisions in education as well as laws regarding mandatory schooling. However, the causal effects of education on earnings are difficult to measure. Although economists have been estimating returns to education for decades, the variation in empirical results along with the continuous activity in this area both suggest that there is no clear consensus on how best to measure marginal returns to education, or even how big they are.

In reality, people are heterogeneous and make decisions based on differences in individual characteristics that are both observed and unobserved by the economist. Because of this, people react to policies, standards, and choices in different ways. Economists would like to know, on average, how the marginal return to schooling changes as the number of years of education increases, and would also like to be able to evaluate policies that change the probability of attaining a certain level of schooling. Estimating the marginal returns to education can help do both of these things.

The purpose of this literature review is not to list the various empirical estimates of marginal returns to education found in studies using different methods and different data. Instead, I aim to provide an overview of the influential methods and papers in the field, and to identify and explain some of the primary challenges in estimating marginal returns to schooling.
These issues likely explain a lot of the variation in reported returns. I also explore potential avenues for future research that can help better identify marginal returns to education.

The remainder of this paper is organized as follows. Section 2 describes two distinct ways of thinking about marginal returns to education. Section 3 discusses the estimation techniques and problems with a more traditional definition. Section 4 describes the marginal treatment effect, another way of interpreting marginal returns. Section 5 compares the estimated parameters from these different methods in two empirical papers that estimate the marginal returns to college. Finally, Section 6 discusses areas for future research.

2 What are Marginal Returns to Education?

There are two related but fundamentally different ways of defining marginal returns to education, and both are used in the literature to answer different types of questions. This section identifies and explains these two concepts; subsequent sections relate each back to some of the influential literature in which they are used.

The first way to think about marginal returns to education is in line with the traditional idea of marginal benefits: each additional unit of a good provides some additional utility. People want to consume more of the good until the benefits from the last unit are equal to the cost of obtaining it. The same concept is true of education. Individuals should choose to stay in school until the marginal return is no longer greater than the marginal cost.1 Thus, we are interested in how the marginal return to schooling changes as the amount of schooling increases. In other words, holding constant the individual, economists are interested in how the marginal return changes over time.

Of course, the obvious problem with estimating an individual’s marginal return to education is that only one outcome per individual is realized.
That is, if Tom completes twelve years of schooling, only his return to twelve years of schooling can be estimated from his observed earnings. Economists do not observe the counterfactual: his earnings had he gotten a college degree or had he dropped out of school in eleventh grade. To estimate individual level marginal returns in this context, economists must make the assumption that all individuals have the same returns to education, and attempt to control for the differences driving schooling decisions. More commonly, the average marginal return is what is actually estimated.

The second way to define a marginal return to education is to hold constant the level of schooling and look at returns across individuals. To distinguish it from the above definition, this marginal return will be called the marginal treatment effect (MTE), which is consistent with the definition of the MTE used in the literature.2 The MTE measures how, as the fraction of the population with a fixed level of schooling increases, the return to education changes for the individuals who are indifferent between staying in school and not staying in school. In other words, holding constant the level of schooling, we want to know how returns vary over individuals. More specifically, we want to know the return of the person on the margin.

1 Empirically, returns to education are generally estimated using a measure of earnings. However, theoretically, returns to education are often defined more broadly to include components of utility that are more difficult to measure in the data. For example, some people (e.g., Ph.D. students) enjoy learning and derive some intrinsic benefit from knowledge; for others education is simply a means to an end or a signal in the labor market.

These two definitions are equivalent if returns to education are the same for everyone. However, as discussed in detail below, marginal returns to schooling are not homogeneous.
Therefore, each of these concepts helps answer a specific question. If economists are interested in how individuals make schooling decisions, the first definition is more useful. If instead economists want to assess the impact of a policy aimed at changing the number of people who complete a certain level of education, then the second is preferred.

2 The concept of the MTE was first introduced by Bjorklund and Moffitt (1987) as a way to estimate marginal returns, and was more formally defined by Heckman (1997).

3 Marginal Return to Education

3.1 Estimation

Any analysis of marginal returns to education starts with the assumption that individuals decide on their optimal amount of education by comparing the benefits to the costs. Benefits include improvements in earnings over the course of the lifetime, as well as non-monetary gains such as access to more desirable jobs, self-worth, and the joy of learning (“psychic earnings”). Costs usually are thought of as a combination of money spent directly on education, the forgone value of the time spent obtaining education, and disutility of studying.

Becker and Chiswick (1966) play an important role in developing the literature on estimating marginal returns to education. Their canonical model of human capital views education as an investment decision, where the costs are compared to the discounted stream of expected future benefits. Thus, schooling is an endogenous decision. The simple model says that total earnings over the lifetime are the sum of the returns to investments in education plus the earnings from any original human capital:

$$E_i = X_i + \sum_{j=1}^{m} r_{ij} C_{ij},$$

where $E_i$ is the lifetime earnings of person $i$, $C_{ij}$ is the amount spent by person $i$ on the $j$th unit of investment in education, $r_{ij}$ is the marginal rate of return on the $j$th unit of investment, and $X_i$ is the return from any original human capital.
Becker and Chiswick point out that each individual invests in education until the marginal rate of return on a dollar of investment is equal to the marginal “interest” cost of that dollar. If the rate of return is assumed to be constant across units of schooling, then the marginal return is equal to the average return. Becker (1967) extends this analysis. In a slightly more detailed model, he lays out how to solve for a condition for the optimal amount of schooling. Again, this is the point at which the marginal benefits equal the marginal costs. He also suggests that individual heterogeneity in this optimal choice can arise from one of two sources. Individuals can differ in their marginal returns to schooling, or they can face different marginal costs of schooling. However, if individuals all face the same costs (what he calls “equality of opportunity”) or have the same benefits (“equality of ability”) then the marginal return to a given level of schooling is the same for everyone. A main limitation of the Becker model is that schooling is the only specified source of human capital. If schooling and the error are uncorrelated, then an Ordinary Least Squares (OLS) regression will produce an unbiased estimate of the return to a year of education. However, if they are correlated, the estimate will be biased. Becker identifies two obvious reasons why schooling is likely correlated with the error in this model. First, human capital can also be accumulated through on-the-job training (experience). Years of schooling and years of experience, conditional on age, are highly correlated. Second, if ability is unobserved and the return to education varies by ability, then estimates of the marginal return to education will also be biased. If more able individuals make greater investments in education, then the estimated marginal rate of return to schooling is actually the return to schooling plus the returns to ability and other unobserved forms of human capital. 
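The ability-bias argument above can be illustrated with a small simulation. The sketch below is not taken from Becker's work; it assumes a hypothetical data-generating process in which log earnings depend linearly on schooling and on an unobserved ability term, and in which more able individuals choose more schooling. The function names and all parameter values (`true_return`, `ability_return`, the schooling rule) are illustrative assumptions.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def residualize(v, w):
    """Residuals from an OLS regression of v on w (with an intercept)."""
    b = ols_slope(w, v)
    mean_v = sum(v) / len(v)
    mean_w = sum(w) / len(w)
    return [(vi - mean_v) - b * (wi - mean_w) for vi, wi in zip(v, w)]


def simulate(n=20000, true_return=0.08, ability_return=0.10, seed=0):
    """Return (naive, controlled) estimates of the return to schooling.

    Hypothetical process: ability ~ N(0, 1); schooling rises with ability;
    log earnings depend on both schooling and ability.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumption: more able individuals invest more in schooling.
    schooling = [12.0 + 1.5 * a + rng.gauss(0.0, 2.0) for a in ability]
    log_earnings = [
        1.0 + true_return * s + ability_return * a + rng.gauss(0.0, 0.3)
        for s, a in zip(schooling, ability)
    ]
    # Ability omitted: the schooling coefficient absorbs part of its effect.
    naive = ols_slope(schooling, log_earnings)
    # Ability controlled for, via the Frisch-Waugh partialling-out steps.
    controlled = ols_slope(
        residualize(schooling, ability), residualize(log_earnings, ability)
    )
    return naive, controlled


if __name__ == "__main__":
    naive, controlled = simulate()
    print(f"OLS without ability: {naive:.3f}")
    print(f"OLS with ability:    {controlled:.3f}")
```

Under these assumed parameters, the regression that omits ability should overstate the return by roughly `ability_return` times Cov(S, A)/Var(S) = 0.10 × 1.5/6.25 ≈ 0.024, while the regression that controls for ability should recover a coefficient close to 0.08. Real data need not follow this process, and ability is typically unobserved, which is precisely why the bias matters.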
Mincer (1974) extends Becker’s work to try to address part of this problem. He includes in his model a measure of on-the-job training and experience. Importantly, he also shows that percent changes in earnings are strictly proportional to the absolute differences in schooling. In other words, log earnings are a linear function of years of education:

$$\ln Y_i = \ln Y_0 + r S_i + u_i,$$

where $Y_i$ is the level of earnings of individual $i$, and $Y_0$ is the mean level of earnings of someone with zero years of schooling. The coefficient $r$ is the marginal return to a year of education, which Mincer assumes to be constant in the simplest model, but can be subscripted by $S$ and allowed to vary. In this framework of log earnings, Mincer argues, years of work experience should enter additively and not multiplicatively. Additionally, the experience term is concave, resulting in the following earnings equation for an individual with $t$ years of experience:

$$\ln Y_i = \ln Y_0 + r S_i + \beta_1 t_i - \beta_2 t_i^2 + \varepsilon_i. \tag{1}$$

The above equation is the basis for many studies estimating the returns to schooling. It can be extended in several ways. There is no reason to assume that marginal returns to education are constant; for example, adding in a nonlinear schooling term ($S_i^2$) captures how marginal returns are changing as the level of schooling changes. An interaction term between schooling and experience may help predict marginal returns as well. Mincer specifies the following equation to estimate the marginal effects of education on log earnings using a cross-sectional distribution of annual earnings in 1959 for white men:

$$\ln Y_i = \alpha + r_1 S_i - r_2 S_i^2 - \gamma t_i S_i + \beta_1 t_i - \beta_2 t_i^2 + \varepsilon_i.$$

He estimates

$$\ln Y_i = 4.87 + 0.255 S_i - 0.0029 S_i^2 - 0.0043 t_i S_i + 0.148 t_i - 0.0018 t_i^2.$$

Then the marginal returns at different levels of schooling can be approximated. The marginal return to the $S$th year of education is given by:

$$r_S = \frac{d(\ln Y)}{dS} = 0.255 - 0.0058 S - 0.0043 t.$$

Clearly, the marginal returns are decreasing in the level of schooling. For someone with eight years of experience ($t = 8$), the marginal return to the eighth year of education in this sample is 17.4 percent ($0.255 - 0.0058 \times 8 - 0.0043 \times 8 \approx 0.174$). The marginal return is 15.1 percent for the twelfth year of schooling, and is 12.8 percent for the sixteenth year. Although this specification is a good illustration of a simple way to estimate non-constant marginal returns empirically, Mincer notes that both the nonlinear schooling term and the interaction term become insignificant when other covariates (namely number of weeks worked in 1959) are controlled for. Ignoring possible sources of bias, which are discussed in the following section, Mincer’s simple model in (1) seems to do a good job of estimating the marginal return to schooling.

In his 2001 survey, Card extends the Mincer model described above. He shows that one can allow for individual heterogeneity to affect both the intercept of the log earnings equation as well as the slope, through the coefficient on schooling. Individual intercepts are estimated by including an individual level fixed effect in the regression equation, and the average marginal return to education is just the expectation of the individual marginal return.

3.2 Problems with OLS Estimation

There is significant cause for concern that OLS estimates of marginal returns to education are biased. Bias means that the difference between the probability limit of the estimator and the average marginal return to schooling in the true population is not equal to zero. The issue of bias is well-acknowledged throughout the literature using OLS to estimate returns to education, and Card (2001) has a particularly nice discussion about some of the main problems.

First, suppose that log earnings are linear in years of education, and that there is no individual heterogeneity in returns.
Even if these assumptions are true, OLS is biased if there is correlation between an individual’s unobserved ability and the marginal cost of schooling. If the marginal costs are lower for people who are more able (i.e., those who would earn more at any level of schooling than those of lower ability would), then OLS is biased upward. This is known as ability bias.

If we allow for heterogeneity in returns, then there are even more issues. People with higher returns to education have an incentive to acquire more education, all else equal. Even assuming there is no ability bias, cross-sectional estimates likely yield upward biased estimates of the average marginal return to education. This endogeneity, or comparative advantage, bias arises from the fact that differences in the earnings-education relationship result from differences in returns as opposed to differences in preferences for education, costs, or ability. More simply stated, OLS estimates are upward biased if an individual’s return to schooling is positively correlated with the amount of education chosen. If there is ability bias on top of this, the upward bias in OLS is even larger.

Education can also be mismeasured or misreported. Empirically, economists only observe information on rounded years of schooling rather than a continuous measure, and self-reported schooling information is not always accurate. Measurement error in schooling causes attenuation bias in OLS: the estimated coefficient is pulled toward zero. Because returns to schooling are generally assumed to be positive, this is a downward bias. Griliches (1977) argues that measurement error is enough of a problem to at least partially offset any upward bias from the sources mentioned above. Angrist and Krueger (1999) conclude that the reliability of self-reported schooling is about 85-90 percent, which implies that the resulting measurement error is enough to offset modest ability bias. However, Card (2001) points out that measurement error in schooling is mean-regressive.
People with the highest levels of education cannot over-report, and those with the lowest levels cannot under-report. This means that conventional estimates of measurement error are likely overstated and the true attenuation bias in OLS estimates of marginal returns to education is smaller than previously thought.

Theoretically, the direction of the overall bias in OLS estimates is ambiguous. However, the literature concludes that it is most likely positive. Ability and endogeneity biases dominate measurement error. This means that many of the empirical estimates of the marginal returns to education in the literature are probably overstating the true return.

3.3 Instrumental Variables and Marginal Returns to Education

The instrumental variables (IV) method is often used as an alternative to OLS estimation of marginal returns to education in order to overcome the ability bias and measurement error issues. In the absence of heterogeneity in returns, if there exists an observable instrument that affects education choices but is uncorrelated with ability, then an IV estimator based on this instrument will yield a consistent estimate of the average marginal return to schooling. When marginal returns are the same for everyone, any valid instrument will identify the same parameter.

If there is heterogeneity in returns to schooling (endogeneity bias), however, a stronger independence assumption between the instrument, individual ability, and the error in the schooling equation is needed to produce a consistent estimate. Examples of common instruments used to estimate returns to education include distance to college (Card, 1993), tuition costs (Kane and Rouse, 1995), and minimum mandatory schooling laws (Angrist and Krueger, 1991; Oreopoulos, 2006). The stronger assumption is usually violated by these sorts of instruments.
Different instruments thus measure different effects, depending on which individuals are induced to change their optimal education choice by a change in the instrument. Imbens and Angrist (1994) formalize the notion that when there is heterogeneity in returns, IV actually measures a local average treatment effect (LATE). The LATE parameter is consistently estimated provided the instrument satisfies the standard assumptions, but it consistently estimates the marginal return to education only for a select subset of the population: those whose schooling decision is affected by a change in the instrument.

In comparison to OLS, IV estimates are unaffected by classical measurement error. This is one reason why IV estimates are generally larger than OLS estimates. In the presence of heterogeneity, when IV estimates the LATE parameter, it could produce a larger parameter than OLS because of who the instrument affects. For example, if the individuals affected are more credit constrained but have high returns to schooling, then IV will overestimate the average marginal return to education of the sample population.

The validity of the IV estimator depends crucially on the assumption that the instrument is uncorrelated with the error. Even small violations of this assumption can produce large biases in the estimated parameter. Carneiro and Heckman (2002) show that several commonly used instruments, including distance to college and tuition, are correlated with ability. If ability is not controlled for as a covariate, then these instruments are “bad” and result in upward biased IV estimates. This is problematic because many commonly used data sources do not include a measure of individual ability, and thus it cannot be controlled for.

4 Marginal Treatment Effects

4.1 Relationship between Marginal Returns and Marginal Treatment Effects

If the instrument is valid, the LATE parameter is a consistent estimate of the marginal return to education for a specific subset of the population.
However, because of the extreme sensitivity to the choice of instrument, it is often difficult to establish exactly what the LATE parameter is measuring. Furthermore, instruments such as the ones mentioned above tend to affect a particular level of schooling: changing the compulsory schooling age from 16 to 17, for example, does not directly affect the returns of people who would have completed sixteen years of education anyway.3 Because the LATE estimates the return to education of those affected by the instrument, it is not measuring the return to an additional year of education for the average individual. For these reasons, Heckman (1997) argues that LATE does not necessarily produce a parameter that is economically interesting, nor does it capture how marginal returns are changing over the level of education. It fits somewhere in between the two definitions discussed in section 2. From a policy perspective, the second definition of a marginal return to education is often more relevant. Economists want to know what the marginal return to schooling is for the people affected by a policy change, so that they can compare the benefits and costs of the policy. In a series of papers, Heckman and Vytlacil (Heckman, 1997; Heckman and Vytlacil, 1999, 2001a) attempt to define such a parameter. The marginal treatment effect is the effect of treatment for individuals indifferent to taking or not taking treatment. The MTE is the limit form of the LATE. Unlike the LATE, however, the MTE parameter does not depend on the choice of instrument. Additionally, because the MTE identifies the return to education at every margin, it can be used to construct any treatment parameter of interest.4 In particular, Carneiro, Heckman, and Vytlacil (2010) show how the MTE can be used to identify the marginal return to education of individuals affected by a specific educational policy. The following section outlines the framework used to identify the MTE. 
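The claims in this section, that different instruments identify different LATE parameters and that the LATE averages marginal returns over the individuals an instrument shifts, can be illustrated numerically. The following sketch is an illustration under assumed values, not a result from the cited papers. It assumes a binary college choice governed by a latent index with a uniformly distributed unobservable `u` (the normalization adopted in the framework of the following section), a hypothetical linear MTE curve `a - b * u`, and a randomly assigned binary instrument that moves the probability of attending college from `p_low` to `p_high`. The function name and all parameter values are invented for the example.

```python
import random


def wald_late(p_low, p_high, n=200_000, a=0.6, b=0.8, seed=0):
    """Wald (IV) estimate of the return to college for a binary instrument.

    Assumed model: u ~ Uniform[0, 1]; a person attends college when the
    college probability implied by the instrument is at least u; the gain
    from college for a person with unobservable u is a - b * u.
    """
    rng = random.Random(seed)
    # For each instrument value: [sum of earnings, sum of attendance, count].
    totals = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for _ in range(n):
        z = rng.randint(0, 1)             # randomly assigned instrument
        u = rng.random()                  # unobserved resistance to college
        p = p_high if z == 1 else p_low   # college probability given z
        d = 1 if p >= u else 0            # attend when p >= u
        y0 = rng.gauss(0.0, 0.2)          # log earnings without college
        gain = a - b * u                  # hypothetical MTE at this u
        y = y0 + d * gain                 # observed log earnings
        cell = totals[z]
        cell[0] += y
        cell[1] += d
        cell[2] += 1
    mean_y = {z: totals[z][0] / totals[z][2] for z in (0, 1)}
    mean_d = {z: totals[z][1] / totals[z][2] for z in (0, 1)}
    # Reduced-form difference in earnings over first-stage difference.
    return (mean_y[1] - mean_y[0]) / (mean_d[1] - mean_d[0])


if __name__ == "__main__":
    print(f"Instrument A (0.2 -> 0.4): {wald_late(0.2, 0.4):.3f}")
    print(f"Instrument B (0.6 -> 0.8): {wald_late(0.6, 0.8):.3f}")
```

With the assumed values a = 0.6 and b = 0.8, the average return in the simulated population is a − b/2 = 0.2. An instrument that shifts the college probability from 0.2 to 0.4 should give a Wald estimate near 0.36, while one that shifts it from 0.6 to 0.8 should give roughly 0.04. Both instruments are valid in this setup, yet neither estimate equals the average return. Narrowing the gap between `p_low` and `p_high` moves the estimate toward the return at a single margin, which is the sense in which the MTE is the limit form of the LATE.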
3 General equilibrium effects are ignored here, as in the majority of the literature. However, they are important to consider. See Heckman, Lochner, and Taber (1998) for a discussion of general equilibrium effects in education.

4 In practice, the MTE may not be defined everywhere. This limitation is discussed below.

4.2 A Generalized Roy Model to Estimate the MTE

This section summarizes the generalized Roy Model used in much of the econometric literature deriving treatment effect parameters (e.g., Imbens and Angrist, 1994; Heckman and Vytlacil, 2001b; Heckman, Urzua, and Vytlacil, 2006). This framework is directly applicable to estimating the marginal effects of schooling on earnings. For simplicity, suppose that there are only two levels of education. Individuals can choose to go to college or not go to college. Because this decision is likely correlated with unobserved ability, a valid instrument is used to solve the selection bias issues. This instrument must affect the decision to go or not go to college, but must not affect earnings in any other way.

For each individual $i$, assume there are two potential outcomes, $(Y_{0i}, Y_{1i})$, corresponding to earnings if the individual does not go to college and does go to college, respectively. Earnings in each state are a function of a vector of observable random variables $X_i$ and a state-specific unobserved random variable ($U_{0i}$ or $U_{1i}$):

$$Y_{0i} = \mu_0(X_i) + U_{0i}$$
$$Y_{1i} = \mu_1(X_i) + U_{1i}.$$

There is no reason to impose that $\mu(X_i)$ is linearly separable, but separability between the observable and unobservable characteristics is assumed. Further, the outcome of individual $i$ depends upon the college decision, given by:

$$D_i^{\star} = \mu_D(Z_i) - U_{Di},$$

where the instrument $Z_i$ is a vector of observed random variables and $U_{Di}$ is an unobserved random variable. The vector of instruments can contain all of the $X_i$s and must include at least one instrument that is excludable from the outcome equation. The value of $D_i^{\star}$ is not observed by the economist, but an indicator of treatment, $D_i$, is. The treatment indicator is defined as:

$$D_i = \begin{cases} 1 & \text{if } D_i^{\star} > 0 \\ 0 & \text{if } D_i^{\star} \le 0. \end{cases}$$

Although $D_i^{\star}$ is not observed, both the realized outcome of individual $i$ and the treatment indicator are observed:

$$Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}. \tag{2}$$

This can be written as:

$$Y_i = \mu_0(X_i) + D_i[\mu_1(X_i) - \mu_0(X_i) + U_{1i} - U_{0i}] + U_{0i}.$$

The difference $U_{1i} - U_{0i}$ can be interpreted as the idiosyncratic gain from education, and the average gain from going to college is $\mu_1(X_i) - \mu_0(X_i)$.

In this framework, Heckman and Vytlacil (2001a) define the conditional probability of going to college as:

$$P(z) \equiv \Pr(D_i = 1 \mid Z_i = z, X_i = x) = F_{U_{Di}}(\mu_D(z)),$$

where $F_{U_{Di}}$ is the cumulative distribution function (CDF) of the random variable $U_{Di}$ with realization $u$. Empirically, the selection probability can be estimated using a probit or a logit model. Given the properties of a CDF and the assumptions above, it is possible to assume $U_{Di} \sim \text{Uniform}[0, 1]$ without loss of generality.5 One way to think about this is that the transformed values of $U_{Di}$ represent the quantiles of the original values. This normalization allows us to avoid making an assumption about the true distribution of the error. It also then follows that $\mu_D(z) = P(z)$. Furthermore, because of the uniform distribution, we can define $U_{Di} = F_{U_{Di}}(U_{Di})$. Notice that $D_i = 1$ if and only if $P(Z_i) \ge U_{Di}$.

The marginal treatment effect is the mean return to going to college for people who are indifferent to going or not going, conditional on both the observed and unobserved characteristics that determine the college decision:

$$\mathrm{MTE}(x, u) = E[Y_{1i} - Y_{0i} \mid X_i = x, U_{Di} = u].$$

Because $P(z) = u$ for the person indifferent to selecting or not selecting treatment ($D_i^{\star} = 0$), the MTE can be estimated by taking the derivative of the conditional expectation of the outcome with respect to the probability of treatment:6

$$\mathrm{MTE}(x, P(z)) = \frac{\partial E[Y_i \mid X_i = x, P(Z_i) = P(z)]}{\partial P(z)}.$$

5 See Heckman and Vytlacil (2001a) for a proof of this within the latent variable framework.

Clearly, the MTE is only defined over the support of $P(Z_i)$, but is otherwise not sensitive to the instrument choice. This means that in regions of common support, two instruments will yield identical estimates of the MTE.

4.3 Advantages and Disadvantages of the MTE

The marginal treatment effect has two primary advantages over other ways of estimating marginal returns. First, it is a natural way to characterize heterogeneity in returns to education because it shows how returns vary with $X_i$ and $U_{Di}$. By estimating the MTE for all values of $U_{Di}$, it is possible to identify the returns to college at any relevant margin. In contrast, the LATE parameter estimates returns at unidentified intervals. The MTE aids in the analysis of variation in returns to education across populations, and also allows a more accurate analysis of policy changes, as described below.

Second, Heckman and Vytlacil (2001a) show that all treatment effect parameters7 can be expressed as different weighted averages of the MTE, where the weights all integrate to one. In this sense, the MTE unifies all of the treatment effect parameters. It can be used to estimate other treatment effects when endogeneity bias prevents consistent estimation of these parameters directly.

However, there are some important limitations of the MTE. Most importantly, estimation is empirically challenging. The MTE is identifiable only over the support of $P(Z_i)$. If the support is not the full unit interval, the MTE will not be defined everywhere. If the CDF is not smooth enough (the conditional probability of going to college is not continuous), the MTE cannot be estimated either.

Finally, the marginal treatment effect is in itself generally not the parameter of interest. It identifies the average return at very specific margins. Usually, economists are interested in the returns over a wider interval.
This is not such an issue because it is relatively easy to construct other treatment parameters once the MTE is defined over the relevant range. The next section describes how the MTE can be used to construct a parameter that estimates the return to education of people affected by specific policies.

6 The derivative of a conditional expectation can be estimated using standard nonparametric regression techniques (see Heckman and Vytlacil, 2001b).

7 E.g., the average treatment effect, effect of treatment on the treated, LATE, and policy relevant treatment effects.

4.4 Policy Relevant Treatment Effects

The MTE can be used to derive treatment effect parameters that directly answer the policy questions at hand. The main problem with IV estimation in the presence of heterogeneity in returns to schooling is that it is not always clear what effect the LATE parameter is identifying. However, using the MTE, economists can derive a treatment parameter that answers a specific question.

Consider a policy that affects the probability of going to college, but that does not directly affect earnings or unobservables in the decision process. Let $D_i^*$ be the college choice made under the alternative policy and $P^*(Z_i)$ be the probability that $D_i^* = 1$ conditional on $Z_i$. Then the earnings outcome under the alternative policy is:

$$Y_i^* = D_i^* Y_{1i} + (1 - D_i^*) Y_{0i}$$

and the outcome under the baseline policy is as in (2). The mean effect of going from the baseline policy to the alternative policy per person shifted is the policy relevant treatment effect (PRTE), defined by Heckman and Vytlacil (2001b). It is defined wherever $E[D_i \mid X_i = x] \neq E[D_i^* \mid X_i = x]$ as:

$$\mathrm{PRTE} = \frac{E[Y_i \mid \text{Alternative Policy}, X_i = x] - E[Y_i \mid \text{Baseline Policy}, X_i = x]}{E[D_i \mid \text{Alternative Policy}, X_i = x] - E[D_i \mid \text{Baseline Policy}, X_i = x]},$$

which can be alternatively expressed as a weighted average of the MTE.8 The PRTE depends on the policy change only through the distribution of $P^*(Z_i)$.
Thus, the CDFs of $P^*(Z_i)$ and $P(Z_i)$, along with the MTE, are sufficient to calculate the average return to education of the people affected by the policy change. Unless the instrument used corresponds exactly to the policy change, this parameter is different from the LATE estimate.

The PRTE can only be identified if the support of $P^*(Z_i)$ is contained in the support of $P(Z_i)$. This often means that the support must be the full unit interval, which again is a requirement that is empirically challenging to satisfy. The marginal policy relevant treatment effect (MPRTE) corresponds to a marginal change from a baseline policy and does not run into this problem. It still answers economically interesting questions and, according to Carneiro, Heckman and Vytlacil (2010), is the appropriate parameter with which to conduct cost-benefit analysis of policy changes. The MPRTE is expressed as a weighted average of the MTE as well, and places positive weight on the MTE only for values of $u$ where the density of $P(Z_i)$ is positive ($f_P(u) \neq 0$), so it is identified under only the assumption that $P(Z_i)$ is a continuous random variable.

8 $\mathrm{PRTE} = \int_0^1 \mathrm{MTE}(x, u)\, \omega_{\mathrm{PRTE}}(x, u)\, du$, where $\omega_{\mathrm{PRTE}}(x, u) = \dfrac{F_{P|X}(u \mid x) - F_{P^*|X}(u \mid x)}{E[P^*(Z_i) \mid X_i = x] - E[P(Z_i) \mid X_i = x]}$. The denominator is written as alternative minus baseline so that the weights integrate to one, in line with the definition of the PRTE in the text.

5 Empirical Estimation of Marginal Returns to Education

In a homogeneous world, the OLS, IV, and MTE methods of estimating returns to education would all produce the same parameter. However, in practice heterogeneity in returns, ability bias, and measurement error complicate the analysis in the ways previously discussed. How much do these factors affect different estimation methods?

Carneiro, Heckman, and Vytlacil (2011) use data from the National Longitudinal Survey of Youth (NLSY) of 1979 to show that returns to college vary across individuals. Furthermore, they provide evidence that the people in their sample act on the knowledge about their idiosyncratic returns to education.
By calculating the MTE, they compare marginal policy relevant treatment effects and the average treatment effect to the OLS and IV estimates of the return to a year of college. Their findings suggest that both OLS and IV are substantially upward biased compared to the average return to a year of college in the sample, as estimated by the average treatment effect. Just as importantly, they show that different policies produce different marginal policy relevant treatment effect parameters. The average return to a year of college of the people affected by a policy that changes the probability of going to college by a fixed amount for everyone, for example, is similar in size to the average treatment effect. However, the people affected by a policy that changes the probability of going to college by a small proportion have an average return only one third the size of the IV estimate. This suggests that the IV/LATE estimates may be far off the mark if they are used to estimate the returns to individuals affected by a certain policy change and the instrument does not correspond exactly to that policy.

Moffitt (2008) uses a similar approach to estimate the MTE. He uses data on the earnings of 33-year-old men in the United Kingdom in 1993 to estimate the returns to higher education. Again, education is considered a binary choice. The OLS and IV estimates of the return to college are very similar in size to those found by Carneiro, Heckman, and Vytlacil (2011) despite using different data and different instruments.9 He shows further evidence for heterogeneity in returns to education, and that the marginal returns to college fall as the proportion of the population with higher education rises. Encouragingly, he finds that the shape of the MTE over different values of $U_D$ (for a given level of the observable characteristics) in his sample is the same as Carneiro, Heckman, and Vytlacil estimate in their study.
While Moffitt does not calculate MPRTEs, his results imply that both OLS and IV overstate the return to college in his sample.

9 The estimates reported in Carneiro, Heckman, and Vytlacil (2011) are the true parameters divided by four, so that they can be interpreted as the return to one year of college. Moffitt reports estimates of the return to a college education, so his estimates are approximately four times as large.

6 Directions for Future Research

The existing empirical work on estimating marginal treatment effects (and thus policy relevant treatment effects) is subject to several limitations. The current empirical literature limits educational decisions to a binary choice of whether or not to go to college. From this marginal return to college, economists can back out an approximate marginal return to a year of college (as in Carneiro, Heckman, and Vytlacil (2011)). However, this analysis is somewhat misleading. Empirically, we know that returns to schooling are not constant across years. Specifically, returns are highest in degree years. More work is needed to be able to apply this methodology to multiple schooling level outcomes.

Heckman, Urzua, and Vytlacil (2006) extend their theoretical analysis of treatment effects to more than two outcomes using an ordered choice model. Ordered choice is theoretically correct for education because individuals must complete a grade before moving on to the next one. However, no one has been able to empirically estimate marginal treatment effects with multiple schooling choices. This is primarily because of data concerns. To estimate the MTE at any point, the conditional support of the probability of attaining that level of education must be the full unit interval. Adding more choices significantly increases the demands on the data.

Nevertheless, the non-linear returns are important to think about, especially when estimating policy relevant treatment effects.
A policy that increases the probability of starting college will have a very different impact on earnings than a policy that increases the probability of graduating from college. A significant proportion of individuals who start college never receive a degree. This implies that students are learning about their costs and returns to education and updating their optimal schooling choices.

While actual earnings data is used to estimate returns to schooling, it is an individual’s perceived return that influences educational choices. In a world of imperfect information, Jensen (2010) points out that there is no reason to expect the level of education chosen to be either individually or socially efficient. Using data on eighth grade boys in the Dominican Republic, he finds that perceived returns to secondary education are substantially lower than measured returns. Students who are provided with information about the observed returns to schooling in the area stay in school significantly longer than students who do not receive this information.

Stinebrickner and Stinebrickner (2012) use data from a college that serves mostly low-income students to examine the role of learning in the college dropout decision. In contrast to Jensen (2010), they find evidence that students overestimate returns to education. As they learn about their academic ability and psychic costs of education, they update their beliefs about their individual return to schooling. The authors find that in their sample, dropout would be reduced by 40 percent if learning about ability did not occur.

Because perceived returns and perceived costs affect education decisions, it is important to understand these perceptions. This is a very relevant area for future work. If low-income students are less informed about the true costs and returns to education, then policies that disseminate information may be as important as policies that change the costs of schooling.
Future research that explores the relationship between perceived marginal returns and actual marginal returns to education will help guide educational policy.

Additionally, all of the empirical work on marginal treatment effects thus far has focused on white males. This is because more data are available for this group. In most surveys the sample of minorities is smaller, and the sample of females with wage data is smaller than that of males because traditionally fewer women have entered the labor market. Yet evidence suggests that working women have a higher rate of return to college degrees than do working men (Dwyer, Hodson, and McCloud, 2013). If the observed marginal returns to education differ for different groups of people (men, women, minorities, etc.), then policies that change the probability of getting a certain level of education are going to have different effects depending on which groups of people they primarily affect. In order to be able to do cost-benefit analysis of educational policies that are targeted at particular groups, it is important to be able to estimate marginal returns for people other than white men. Further work on both the theoretical and empirical sides of estimating marginal treatment effects will hopefully result in techniques that can be applied more generally. This is especially relevant in terms of analyzing the effectiveness of policies that promote education through affirmative action or that aim to push women into STEM fields. Before economists can draw conclusions about whether such programs are good or bad, more research needs to be done in order to learn how effective they are at improving outcomes.

Finally, marginal returns to education have mostly been analyzed in a partial equilibrium setting. However, this is a simplistic view of the world. As the supply of workers with a certain level of education changes, wages will adjust and the returns to education will change as well. Consider a simple signaling model.
If education is only a signal to employers about ability, then as the proportion of individuals who get a college degree rises, for example, the conditional expectation of a college-educated person’s ability falls and the signal becomes less valuable in the market. This drives observed returns to education down.

It is important to remember that economists cannot measure true returns to education. Wages do not capture the intrinsic value that education provides. There are likely network effects and other non-monetary benefits to education as well. When interpreting either the marginal return to an extra year of education or the marginal treatment effect, it is important to think carefully about what the parameter is measuring and what it is not. The enormous amount of heterogeneity in returns highlights the need for continued discussion about how to best estimate marginal returns to education.

References

[1] Angrist, Joshua, and Alan Krueger. 1991. “Does Compulsory School Attendance Affect Schooling and Earnings?” Quarterly Journal of Economics 106(4): 979-1014.
[2] Angrist, Joshua, and Alan Krueger. 1999. “Empirical Strategies in Labor Economics.” In Handbook of Labor Economics, Volume 3A, edited by Orley Ashenfelter and David Card. Amsterdam and New York: North Holland.
[3] Becker, Gary. 1967. Human Capital and the Personal Distribution of Income. Ann Arbor, Michigan: University of Michigan Press.
[4] Becker, Gary, and Barry Chiswick. 1966. “Education and the Distribution of Earnings.” American Economic Review 56(1): 358-369.
[5] Björklund, Anders, and Robert Moffitt. 1987. “The Estimate of Wage Gains and Welfare Gains in Self-Selection Models.” The Review of Economics and Statistics 69(1): 42-49.
[6] Card, David. 1993. “Using Geographic Variation in College Proximity to Estimate the Return to Schooling.” National Bureau of Economic Research Working Paper 4483.
[7] Card, David. 2001. “Estimating the Return to Schooling: Progress on Some Persistent Econometric Problems.” Econometrica 69(5): 1127-1160.
[8] Carneiro, Pedro, and James Heckman. 2002. “The Evidence on Credit Constraints in Post-Secondary Schooling.” Economic Journal 112(482): 705-734.
[9] Carneiro, Pedro, James Heckman, and Edward Vytlacil. 2010. “Evaluating Marginal Policy Changes and the Average Effect of Treatment for Individuals at the Margin.” Econometrica 78: 377-394.
[10] Carneiro, Pedro, James Heckman, and Edward Vytlacil. 2011. “Estimating Marginal Returns to Education.” American Economic Review 101: 2754-2781.
[11] Dwyer, Rachel, Randy Hodson, and Laura McCloud. 2013. “Gender, Debt, and Dropping Out of College.” Gender and Society 27: 30-55.
[12] Griliches, Zvi. 1977. “Estimating the Returns to Schooling: Some Econometric Problems.” Econometrica 45: 1-22.
[13] Heckman, James. 1997. “Instrumental Variables: A Study of Implicit Behavioral Assumptions Used in Making Program Evaluations.” The Journal of Human Resources 32(3): 441-462.
[14] Heckman, James, Lance Lochner, and Christopher Taber. 1998. “General Equilibrium Treatment Effects: A Study of Tuition Policy.” American Economic Review 88: 381-386.
[15] Heckman, James, Sergio Urzua, and Edward Vytlacil. 2006. “Understanding Instrumental Variables in Models with Essential Heterogeneity.” National Bureau of Economic Research Working Paper 12574.
[16] Heckman, James, and Edward Vytlacil. 1999. “Instrumental Variables and Latent Variable Models for Identifying and Bounding Treatment Effects.” Proceedings of the National Academy of Sciences 96: 4730-4734.
[17] Heckman, James, and Edward Vytlacil. 2001a. “Local Instrumental Variables.” In Nonlinear Statistical Modeling: Proceedings of the Thirteenth International Symposium in Economic Theory and Econometrics: Essays in Honor of Takeshi Amemiya, edited by Cheng Hsiao, Kimio Morimune, and James Powell, 1-46. New York: Cambridge University Press.
[18] Heckman, James, and Edward Vytlacil. 2001b. “Policy-Relevant Treatment Effects.” American Economic Review 91(2): 107-111.
[19] Imbens, Guido, and Joshua Angrist. 1994. “Identification and Estimation of Local Average Treatment Effects.” Econometrica 62(2): 467-475.
[20] Jensen, Robert. 2010. “The (Perceived) Returns to Education and the Demand for Schooling.” Quarterly Journal of Economics 125(2): 515-548.
[21] Kane, Thomas, and Cecilia Rouse. 1995. “Labor Market Returns to Two- and Four-Year Colleges.” American Economic Review 85(3): 600-614.
[22] Mincer, Jacob. 1974. Schooling, Experience and Earnings. New York: Columbia University Press.
[23] Moffitt, Robert. 2008. “Estimating Marginal Treatment Effects in Heterogeneous Populations.” Annals of Economics and Statistics 91-92: 239-261.
[24] Oreopoulos, Philip. 2006. “Estimating Average and Local Average Treatment Effects of Education when Compulsory Schooling Laws Really Matter.” American Economic Review 96(1): 152-175.
[25] Stinebrickner, Todd, and Ralph Stinebrickner. 2012. “Learning about Academic Ability and the College Dropout Decision.” Journal of Labor Economics 30(4): 707-748.