Does Education cause better Health? A Panel Data Analysis Using School Reforms for Identification

Jacob Nielsen Arendt*
Institute of Local Government Studies
Nyropsgade 37, 1602 Copenhagen V

Abstract

We address whether the relationship between education and health can be interpreted causally. Alternatively both may be determined by common unobserved factors such as individual time preferences. We add to the primarily U.S. based evidence by using a Danish panel data set of employed persons, and use Danish school reforms to instrument education effects on self-reported health (SRH), body mass index (BMI) and an indicator for never having smoked. Education is related to all three measures in the expected way and the relationships are amplified in magnitude when education is instrumented. However, as the standard errors also increase when using IV, neither exogeneity nor the null of no effect of education can be rejected for SRH and BMI. We address important criticism of our identification scheme, including tests for the presence of weak instruments as suggested by Hahn and Hausman (2002). None of the supplementary analyses reveal problems with the estimation strategy, so the analyses remain inconclusive.

JEL Classification: I12, I20, J18, C35
Keywords: Education Effect; Health Inequality; Selection Bias; School Reform

* e-mail: [email protected].

1. Introduction

The relation between social status and health is not a new topic, but it has received renewed interest during the last couple of decades, both within academia and in the public¹. In this paper we address whether education related health differentials can be interpreted causally. This is of tremendous importance for our understanding of determinants of health as well as for our understanding of how schooling affects and shapes individual lives. Education may cause better health in several ways. Education influences job opportunities and income, which in turn may influence health.
Education may also enhance knowledge of how to live a healthy life, leading to improved choices of use of time and goods that affect health, see e.g. Kenkel (1991). Moreover, just as education is believed to enhance market productivity, education may enhance health productivity as argued by Grossman (1972). There might also be non-behavioural reasons for causal social gradients in health, e.g. relying on psychosocial factors as described for instance by Adler et al. (1994). However, there exist plausible reasons why education related differences in health cannot be interpreted causally. We focus on hypotheses suggesting that the relationship between education and health may be due to unobserved variables. This gives rise to a standard missing variables problem, which biases estimates of the effect of education on health. One important potential missing variable is childhood health or more broadly, health endowments shaped prior to educational attainment. Another potential missing variable is preferences for the future, because these preferences affect the likelihood of engaging in activities with current costs and benefits that are harvested in the future. Educational attainment and good health behaviour can both be regarded as such activities. Therefore, with health endowments and time preferences unobserved, we may observe adults with low schooling and poor health and vice versa, even when schooling plays no direct role for health outcomes. Compared to the significance of the subject, very few studies have addressed whether education effects on health can be given a causal interpretation.

¹ Some examples are the MacArthur Research Network on Socioeconomic Status and Health in the US and the Whitehall studies in the UK or the British Acheson report (Acheson, 1998) and the Danish Health program 1999-2008 (Sundhedsministeriet, 1999).
We add to the existing literature by employing institutional changes, Danish school reforms, to identify education effects on health. Furthermore, we use a panel data set of Danish workers, which allows us to control for unobserved heterogeneity over time. Our analysis provides evidence on data from outside the U.S., the source of most of the existing evidence. We therefore contribute to the task of providing robust evidence, less sensitive to institutional settings and cultural habits which might affect the relation between health and education. Finally, we address various criticisms of the identification scheme, including an application of two new tests for the presence of weak instruments. The results show that education is significantly related to both self-reported health (SRH) and body mass index (BMI) in the healthy range. When the education effect is instrumented, it increases in magnitude, but so does its standard error. Therefore, we reject neither that education is exogenous nor that it has no effect on SRH and BMI. The supplementary analyses, however, do not detect any problems with the estimation strategy. Therefore the results remain inconclusive.

The paper is organized as follows. In the next section we discuss theory on the relation between education and health and mention results from previous empirical studies. In section 3 we present the data, and in section 4 we discuss the empirical model. Section 5 presents the school reforms and section 6 contains the main empirical analysis. Section 7 provides a discussion and section 8 concludes.

2. Theory and Previous Evidence

A useful starting point for the discussion of health determinants is the reduced form model of the demand for health, explicitly derived from a structural model, from Grossman (1972). Related literature is discussed in Grossman (2000).
In this model, health depreciates over time but is maintained by health investments depending on consumed goods and activities that affect health. Utility maximization determines health inputs as a function of prices, wages, a depreciation rate and technology parameters, yielding a reduced form model of the demand for health as:

$$
H_{it} = \beta_w w_{it} + \beta_e E_i + \beta_a Age_{it} + \beta_t D_t + B_{it}
\tag{2.1}
$$

$H_{it}$ is the health stock of individual $i$ at time $t$, $w_{it}$ is the wage, $E_i$ is education, $D_t$ are time effects capturing e.g. prices of health inputs and $B_{it}$ is an unobserved component. The focus of our discussion is why $B_{it}$ may be correlated with education, so that standard estimates of $\beta_e$ do not necessarily reflect a causal parameter. A possible correlation between $B_{it}$ and education may be explained by the endowment hypothesis. This hypothesis states that when those with higher "ability" obtain more education and when those with a high health "endowment" (both interpreted broadly including genetics, parental care and influences from the social environment) are more healthy as adults, a correlation between ability and health endowments will most likely imply a correlation between education and adult health, see e.g. Card (1999) and Rosenzweig and Schultz (1983) for hypotheses about ability and endowments. The endowment story also implies that health may have important unobserved persistent components. A second explanation for a correlation between $B_{it}$ and education, which dates back to at least Fuchs (1982), states that individuals with higher preferences for the future are more likely to engage in activities with current costs and future benefits such as schooling and health investments. Grossman and Kaestner (1997), p. 87-93, mention studies investigating the time preference hypothesis. These studies deal with health behaviour, notably smoking, and therefore do not deal with education effects on health per se.
They conclude that the existing, sparse evidence does not favor the time preference hypothesis. The endowment story has a long history in empirical work, where it is often dealt with using proxies e.g. for family background and test scores as control variables. Examples of applications of this identification strategy are Grossman (1975) and Behrman and Wolfe (1989). The latter is also one of the few studies conducted on data from outside the U.S. (Nicaragua) where endogeneity of education is addressed. Behrman and Wolfe also apply household fixed effects methods. The results from both studies suggest that controlling for unobservables does not alter education effects considerably. A number of studies have dealt with endogeneity of education using instrumental variable techniques. Berger and Leigh (1989) use per capita income and per capita expenditures on education in the state of birth as instruments for education in their study of education effects on health on U.S. data. They find that education effects are slightly reduced, but remain significant, when correcting for endogeneity. We note that these instruments may be related to expenditures on health, which, if health expenditures affect health, would disqualify them as valid instruments. Lleras-Muney (2004) uses changes in compulsory school attendance and child labor laws from 30 U.S. states from 1915 to 1939 to identify education effects on mortality. She uses grouped U.S. census data as well as U.S. individual data (NHANES, 1985). The results show that one year of education seems to increase life expectancy at age 35 by more than a year. We note that the instrumented education effects from individual data are not significant. Adams (2001) adapts the identification strategy suggested by Angrist and Krueger (1991), using quarter of birth in a U.S. cross-section of individuals (HRS, 1992). Adams uses self-reported health and a number of variables describing functional limitations as health measures.
When instrumental variable estimation is used, education effects increase slightly but are insignificant. Furthermore, F-values on the instruments are just above 1, indicating a problem of weak instruments. Arkes (2002) estimates education effects on work-limiting health problems, on need for personal care and on mobility limitations using a large U.S. sample including white males aged 47 to 56. Arkes uses within-state differences in unemployment rates as instruments for education. He also finds that education effects increase a bit when allowing for endogeneity, and the effects are significant for two of the three health measures. Finally, Spasojevic (2003) uses changes in compulsory schooling in Sweden in the fifties, analyzing education effects on a constructed index of bad health and on an indicator of body mass index in the healthy range over the period 1981-91. She finds significant positive effects from years of education on both. As it stands, the existing evidence suggests that there is a causal component running from education to health. However, the evidence is still sparse, and many studies suffer either from use of poor instruments, low precision or from a focus on very narrow samples (e.g. with respect to age or gender). In addition, few studies apply datasets from outside the U.S. or allow for individual specific heterogeneity over time, which we do.

3. Data

We use a two-period data set of Danish workers interviewed in 1990 and 1995 (The Danish National Work Environment Cohort Study (WECS)), aged 18 to 59 in 1990. We exclude individuals under the age of 25 in 1990 and observations with missing education data. There is some attrition in the sample from 1990 to 1995, mainly across different regions, whereas it does not seem to affect the distribution of age, gender and occupation (Borg and Burr, 1997). Therefore controlling for region is necessary.
The deletion of observations with missing education might lead to sample problems, but we obtain a sample which seems fairly representative of Danish workers; see the appendix for more details. The sample consists of 1,710 men and 1,548 women observed in both 1990 and 1995. The data set includes self-reported health (SRH), graded as poor, fair, good, very good and excellent, which will be our main health outcome. The data also include measures more closely related to health behaviour. We will use body mass index (BMI) and an indicator of never having smoked as supplemental outcomes. The use of SRH as a health measure has both pros and cons. First of all it is believed to be a useful summary measure of general health (see e.g. Case & Deaton, 2002), possibly capturing dimensions of health which are difficult to capture with more "objective" health measures. It also seems to be related to intervening mechanisms leading to increased risk of functional disability and mortality, see e.g. Mossey and Shapiro (1982) and Idler (1994). However, SRH might be reported with error, see e.g. Butler et al. (1987). An example is that people in deprived situations might "justify" their situation by claiming that their health is poorer than it actually is. Changes in norms and the business cycle may lead to similar biases. Indeed, the state of the economy changed from a downturn in 1990 to an upturn in 1995, and health norms might have changed because of stagnating longevity and focus on unhealthy lifestyles in Denmark during this period. To control for such general changes in self-reporting behaviour over time, a time dummy is included in all estimations. If reporting errors in SRH are related to education, this introduces a spurious correlation between SRH and education. However, if the error is unrelated to the variables used to instrument education, instrumental variable estimation solves the problem.

4. Empirical Model

We use an ordered quantal response panel model and a linear regression model to model SRH and years of education, respectively:

$$
\begin{aligned}
H_{it}^* &= X_{1it}\beta + \gamma E_i + \alpha_i + \varepsilon_{it} \\
H_{it} &= \sum_{j=1}^{K} 1(H_{it}^* > c_j) \\
E_i &= X_{1it}\pi_1 + X_{2it}\pi_2 + V_{it}
\end{aligned}
\tag{3.1}
$$

$H^*$ is latent health, $H_{it}$ is the observed health category for individual $i$ in period $t$, changing value when $H^*$ crosses an unknown threshold, $c_j$, $E_i$ is years of education, $X_{1it}$ are exogenous regressors in the health equation and $X_{2it}$ are instrumental variables for education. $\alpha_i$ is an individual specific effect capturing the idea that unobserved individual components determined prior to educational attainment selectively sort individuals into high and low health groups, and it is assumed to be random. When $V_i$ and $\alpha_i$ are correlated, e.g. for reasons discussed in the previous section, education is endogenous. Note that it is assumed that the variables $X_1$ are uncorrelated with both $\alpha_i$ and $\varepsilon_{it}$. We therefore limit $X_1$ to consist of age, a year dummy and dummies for region of living. All may have important health components, and in addition, the former is required for identification (see the next section) whereas the two latter are required because of the sampling scheme (see the previous section). To avoid multicollinearity problems we do not control for cohort. Therefore age may be related to $\alpha_i$, because the latter includes cohort effects. Whether this gives rise to large biases can be inferred from estimations on separate cohorts, which are conducted in section 7.

To correct for endogeneity of education we use a panel version of the two-stage conditional maximum likelihood (henceforth referred to as 2SCML) estimator suggested by Vuong (1984)², assuming that the conditional distribution of $\alpha_i$ given $V_i$ has a mean that is linear in $V_i$:³

$$
E(\alpha_i \mid V_i, X) = \eta V_i, \quad \text{s.t.} \quad \alpha_i = \eta V_i + w_i
\tag{3.2}
$$

$\eta$ is the regression coefficient of $\alpha_i$ on $V_i$, such that $w_i$ is orthogonal to $E_i$ by construction.
Substituting this into the model we get:

$$
H_{it}^* = X_{1it}\beta + \gamma E_i + \eta V_i + w_i + \varepsilon_{it}
\tag{3.3}
$$

A first stage estimation of $E_i$ on $(X_{1i}, X_{2i})$ yields a consistent estimate of $V_i$, which inserted into (3.3) allows consistent estimation of $\gamma$ in (3.3) by a random effects procedure. Following our discussion in the previous section, the random effect could be a propensity to be healthy unrelated to education. We refer to this estimator as the 2SCMLR estimator, where the R stands for random effects. Note that the sign of the coefficient $\eta$ yields some useful insights, as it is of the same sign as the correlation between the individual specific health error term, $\alpha_i$, and the education error term, $V_i$. If $\eta$ is positive, models where education is assumed to be exogenous yield an upward bias, and vice versa if $\eta$ is negative. If time preferences were the only reason for endogeneity, we expect the sign of $\eta$ to be positive, because $\alpha_i$ and $V_i$ then contain only this one common factor. If endowments were the reason for endogeneity, the sign of $\eta$ describes the sign of the correlation between education ability and health endowments, which could go either way. A negative estimate is therefore an indication that the endowment hypothesis matters relatively more than the time preference hypothesis. Finally, the t-test that $\eta$ is zero is a test for exogeneity of education.

² 2SCML can be advocated on several grounds. First, Monte Carlo analyses show that 2SCML performs well in small samples (Arendt, 2002), outperforming an alternative suggested by Newey (1987). Further, 2SCML requires no distributional assumptions on first stage error terms, as for instance joint maximum likelihood does. Indeed, 2SCML outperforms MLE under misspecification of this error term (Mroz and Gilkey, 1992). Finally, 2SCML is easy to use.

³ This can be motivated from a joint normality assumption, but this is not needed.
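The two-step logic behind (3.2) and (3.3) can be illustrated with a minimal sketch. It is not the 2SCMLR estimator itself: the second stage below is a plain linear regression rather than an ordered logit with random effects, the data are simulated, and every name (`control_function`, `simulate`, the single `reform` dummy drawn independently of age) is a hypothetical choice made only for illustration.

```python
import numpy as np


def ols(X, y):
    """Return least-squares coefficients of y on X (X must contain a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def control_function(y, educ, X1, X2):
    """Two-step residual-inclusion estimate with a LINEAR second stage.

    Step 1: regress education on exogenous controls X1 and instruments X2.
    Step 2: regress the outcome on X1, education and the step-1 residual.
    Returns (gamma_hat, eta_hat): education and residual coefficients.
    """
    const = np.ones((len(y), 1))
    Z = np.hstack([const, X1, X2])
    v_hat = educ - Z @ ols(Z, educ)  # estimate of the first-stage error V_i
    W = np.hstack([const, X1, educ[:, None], v_hat[:, None]])
    coef = ols(W, y)
    return coef[-2], coef[-1]


def simulate(n=20000, gamma=-0.10, eta=0.30, seed=0):
    """Simulated data for illustration only (not the WECS sample)."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(25, 60, n)
    # Hypothetical instrument, drawn independently of age for simplicity.
    reform = (rng.uniform(size=n) < 0.5).astype(float)
    v = rng.normal(size=n)                            # first-stage error V_i
    educ = 10.0 - 0.02 * age + 1.0 * reform + v
    alpha = eta * v + rng.normal(scale=0.5, size=n)   # alpha_i = eta*V_i + w_i
    y = 1.0 + 0.02 * age + gamma * educ + alpha + rng.normal(size=n)
    return y, educ, age[:, None], reform[:, None]


y, educ, X1, X2 = simulate()
naive = ols(np.hstack([np.ones((len(y), 1)), X1, educ[:, None]]), y)[-1]
gamma_hat, eta_hat = control_function(y, educ, X1, X2)
print(f"naive: {naive:.3f}  control function: {gamma_hat:.3f}  eta: {eta_hat:.3f}")
```

With a positive $\eta$ in the simulation, the naive coefficient lies above the control-function estimate, in line with the sign argument above. In this linear case the education coefficient coincides with 2SLS; the standard errors from the second step ignore that the residual is estimated and would need a correction.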
We use a naive bootstrapping procedure to obtain corrections of standard errors for the first-stage regression, which are needed in order to do correct inference.

5. School Reforms as Instrumental Variables

In this section we discuss the use of instrumental variables for education, using information on two school reforms that took place in Denmark in 1958 and 1975. Prior to the reform in 1958, pupils had to pass a test after 5th form, in order to enter "the middle school" (the 6th to 9th or 10th year of schooling), which was necessary for further education. If they did not pass the test, they would continue schooling in another more practically orientated track, ending schooling after 7th form. From 1958 these early sorting mechanisms were abandoned, such that only mild differences in curriculum during 5th to 7th form occurred (Folketingets Årbog, 1958). The formal tests after 5th form were replaced by recommendations after 7th form, of whether to continue school after 7th form. In addition, a distinction between village and city schools was eliminated. This meant that the number of lessons, which prior to the reform was lower in the village school, was brought to the same level as in the city schools. The village schools also gained the same right as the city schools to form classes from 8th to 10th form. This increased the proximity of schools, necessary for education beyond 7th form, outside larger cities. It is a common view that the schooling structure prior to 1958 limited access to further education especially for children from less educated backgrounds and for children from the countryside, which the 1958 reform helped to alleviate (Bryld et al., 1990; Hansen, 1982). The school reform in 1958 therefore lowered educational barriers and, in village schools, increased the number of lessons and school proximity for those wishing to attend 8th to 10th form⁴.
The reform in 1975 (Folketingets Årbog, 1976) raised the minimum school-leaving-age, increasing the compulsory years of education from 7 to 9 years. The extent to which this had an impact on mean education is however dubious, since most children already obtained 9 years of schooling. However, the reform in 1975 also removed the distinction between two tracks during the 8th to 10th form, where one track prepared for gymnasium and another track for other, mostly vocational, educations. We generate two dummy instrumental variables to indicate whether individuals were affected by the reforms in 1958 and 1975. For both reforms, we assume that only pupils who entered 7th form after 1958 or 1975 were affected. This is crudely measured by whether individuals turned 14 in 1959 or later and in 1976 or later. This amounts to a division of individuals into three groups: those below age 34 in 1995, those aged 34 to 50 in 1995 and those 51 or older in 1995. Individuals in these groups were affected respectively by the 1975 reform, by the 1958 reform and by none of the reforms. To ensure that the instruments generated by the dummy variables are valid, we need to take account of upward drifts over time in health, occurring for reasons other than increasing educational attainment. If not accounted for, the instruments would be correlated with health through a common correlation over time. We have to assume that other events, driving the upward drifts in health, can be captured by a relatively smooth function, otherwise the design does not bring identification (see a formal discussion in Hahn et al., 2001).

⁴ Examples of use of related instrumental variables from the wage literature are given in Harmon and Walker (1995) and Meghir and Palme (2001), using compulsory schooling laws, and Kane and Rouse (1993) and Card (1995), using proximity of schools.

6. Results

6.1 Descriptive statistics

In table 1 we present descriptive statistics on age, education and SRH.
SRH takes value one for excellent health and five for poor health. From the second column, we see that mean SRH decreases (i.e. health improves) with length of education and increases with age. Simple logit estimations (not reported for brevity) of SRH on dummies for each length of educational attainment and age groups, show that age affects SRH linearly but that a non-linearity occurs when crossing 11 years of education. The latter is primarily due to the very poor health of semi-skilled people who are categorized as having 11 years of education. This is a maximum, which makes the outlier look more extreme. They only constitute 5% of the sample and do not seriously affect the results.

6.2 The effect of school reforms on educational attainment

In Figures 1 and 2 the mean years of education is shown (separately for men and women) by the year in which individuals turned 14. At age 14 pupils may enter 7th grade, which is the age that divides people into whether they are affected by reforms or not. Two vertical lines are placed at 1958 and 1975, the years where the reforms were introduced. From both figures we see that, on top of a general increasing trend in educational attainment, educational levels jumped after 1958. The necessary condition for the 1958 reform to affect educational attainment therefore seems to be fulfilled. The changes induced by the 1975 reform are not as evident, as expected. In order to infer more specifically to what extent the reforms have an impact on educational attainment, we report results from the first stage regressions for years of education regressed on the reform dummies, age, a 1995 dummy and regional dummies. The results are reported in table 2. The third and sixth columns contain the results with a quadratic in age, and it turns out that the quadratic term is insignificant for both men and women. The predicted profiles of the quadratic are very close to that of the linear.
As mentioned above, the same is true for the relation between age and SRH. For these reasons, we will use a linear age function. From the second and fifth columns, with age included linearly, it is seen that the impact from the 1975 reform is not statistically significant and it is lower than the effect of the 1958 reform. Those affected by the 1958 reform have significantly higher educational attainment on top of a linear age trend and regional differences. This holds for both men and women; the reform raised years of education on average by 0.35 year for women and by 0.4 year for men. The F-values of joint significance of both instruments are 3.65 for women and 4.47 for men. Although this is a bit low (see e.g. Staiger and Stock, 1997, recommending a value above 5), indicating a possible problem of weak instruments, the values are still higher than what has been reported in many other comparable studies. Finally, we show results from reduced form ordered logit models of SRH, i.e. where the reform dummies are included and education is excluded from the health equation, in column four and seven. Both reforms are seen to be insignificant (although only the impact of the 1975 reform for men has the wrong sign), but since these are reduced form coefficients, they do not rule out that the reforms have an impact on education and education has an impact on health. The results do however again indicate that the explanatory power of the reforms may be weak. We return to this issue below.

6.3 Logit estimates of the effect of education on SRH

Table 3 contains estimates of the model in (3.1) using a simple ordered logit and the 2SCML estimator with and without random effects. The model includes a year dummy for 1995, 17 regional dummies and age as control variables. Since a higher value of SRH indicates worse health, a negative coefficient implies positive covariation with better health. In table 3 column (1), a simple ordered logit for men is presented.
Both education and age are significant, education with a coefficient of -0.091, and age with a coefficient of 0.022, such that the more educated have better health and the elderly have worse health. An additional year of education therefore corresponds to an odds ratio of e.g. having good or very good health of 1.095, i.e. each year of education raises the odds of good or very good health by roughly 10%⁵. Below the parameter estimates, a chi-squared value for testing the null of joint insignificance of 17 regional dummies is presented. The null is strongly rejected. In column (2) we allow for individual normal distributed random effects, which are integrated out by Gaussian quadrature. The presence of random effects is tested, comparing estimations in (1) and (2) by means of a likelihood ratio test; see the second and third row below the parameter estimates. This is chi-squared distributed with one degree of freedom, and the null of no random effects is rejected. The random effects estimate is a bit larger in magnitude than the one without. Before continuing to column (3) and (4), which allow for endogeneity of education, we test the validity of the reforms as instruments. Two conditions must be fulfilled: a relationship with education and no relationship with SRH given controls. Recall from Table 2 that the former is fulfilled. The latter is tested by including reform dummies one at a time in the second stage conditional on the other reform (not shown). Both tests, as well as the test where the reforms are included jointly in the second stage, produce insignificant results with t-values around 0.8. Comparing likelihood values from 2SCML and 2SCMLR estimates in column (3) and (4), the null of no random effects is again rejected. The 2SCMLR estimates in column (4) show that the coefficient on education decreases to -0.223, which, however, is insignificant. Moreover, we cannot reject exogeneity, since the null that the coefficient on the residual is zero is not rejected.

⁵ Recall that in the ordered logit, coefficients and odds ratios are related by:

$$
e^{\beta} = \frac{P(SRH > j \mid x+1)}{1 - P(SRH > j \mid x+1)} \Big/ \frac{P(SRH > j \mid x)}{1 - P(SRH > j \mid x)},
$$

where $\beta$ is the coefficient to $x$.

Moving forward to the bottom part of the table, we look at results for women. Education is strongly significant in the simple logit with a coefficient of -0.079, i.e. the effect is smaller than for men. The estimators with no random effects are again rejected against estimators with random effects and the latter have education coefficients that are larger in magnitude. Column (9) shows the 2SCMLR estimates, accounting for both random effects and endogeneity of education. Again, the coefficient on education is much larger than the simple logit estimates, but as we saw for men both education and the residual from the first stage are insignificant. Leaving insignificance aside for a moment, we note that for both men and women the coefficients on education and the residual are of opposite signs. As discussed in section 4 this indicates a relatively weak influence of time preferences, but leaves room for one version of the endowment hypothesis. Indeed, it need not be unrealistic that some people are endowed with good health while less so with respect to their mental ability and vice versa.

6.4 Logit estimates of the effect of education on BMI and smoking

In this section we apply the same methods as above to examine whether education affects BMI and an indicator of never having smoked. Note that BMI and smoking are more closely related to health behaviour than SRH. There is an additional subtlety about the use of the never-smoked indicator, since a large share of smokers start smoking during their teenage years. Therefore, this is a variable that is determined simultaneously with education, which should be captured by the 2SCMLR estimator. Table 4 presents some descriptive statistics. The BMI is within a so-called healthy range (19-25) for 55% of the men and 72% of the women.
45% of the men are overweight (BMI>25), which is the case for 21% of the women. The table also shows that mean SRH follows a U-shape for men, being higher for those with BMI below 19 and above 25. Although this is not the case for women, we follow common practice in this area and model the event of having BMI in the healthy range between 19 and 25. The share who have never smoked is 30% for men and 40% for women. This is also related to SRH in the expected way. Both smoking and BMI are related to education. Table 5 presents the estimation results. The top of the table contains results for men, while results for women are found in the bottom. Both random effect and 2SCMLR estimates are presented. The second and fourth columns present evidence that for both men and women, the more educated are more likely to have BMI in the healthy range and more likely to have never smoked. Looking at 2SCMLR estimates, exogeneity of education cannot be rejected for BMI, whereas exogeneity is rejected when it comes to the never-smoked outcome for both men and women. Due to the latter result, one may say that the 2SCMLR estimator passes a test of validity. The coefficient on the education residual is positive in the never-smoked equation, which can be interpreted as showing that those with higher "schooling ability" are less likely to start smoking.

7. Discussion and additional analyses

In this section we discuss the results from the previous section. Although we cannot reject exogeneity of education, the test for exogeneity does not have particularly good power as the standard errors of the 2SCMLR estimates are large. It therefore seems reasonable to further evaluate the identification strategy. For both men and women, the estimated education effect is larger in magnitude when endogeneity is allowed for.
As in the wage literature, this finding can be interpreted as evidence of measurement error (Griliches, 1977), heterogeneity in education effects (Card, 1999, p. 1821) or a bias due to weak instruments (Card, 1999; Staiger and Stock, 1997). We choose to focus on the third interpretation, since this is more likely to burden the interpretation of our analyses. We limit attention to SRH. We noted in section 6.2 that the F-values from first stage education regressions may indicate a situation with weak instruments. However, as Hahn and Hausman (2002) point out, a weak instrument bias is determined not only by the F-value from the first stage, but also by the number of instruments and the correlation between unobserved components of the outcome and the endogenous regressor (e.g. $\alpha$ and $V$ in (3.1)). Hahn and Hausman (2002) suggest two tests using second order asymptotic theory for detection of weak instruments. The tests in Hahn and Hausman (2002) are based on the linear regression model. Although our model is not linear, we use the tests as a tentative device for searching for these sources of bias. The first test is based on the difference between 2SLS and the inverse of reverse 2SLS estimates, corrected for an estimate of the second-order bias of this difference⁶. This is a test of whether first order asymptotic theory seems reliable. The different estimators and the Hahn and Hausman test statistics are found in Table 6. The OLS estimates of the education effects on SRH are -0.029 for women and -0.034 for men. The 2SLS estimates are -0.087 for women and -0.064 for men. Their relative sizes (between men and women, and between standard and IV estimates) are in agreement with results using logistic models. The reverse 2SLS estimate for women is -0.386, and the test statistic (Test 1), which is a t-statistic, takes value 17.36.

⁶ When 2SLS is given by $y_1' y_2 / y_1' y_1$, reverse 2SLS is: $y_1' y_2 / y_2' y_2$.
The test takes value 14.31 for men, i.e. the test rejects the conventional first-order asymptotic 2SLS estimates. Hahn and Hausman then suggest the use of a test based on forward and reverse bias-corrected Nagar type estimates (see their article for details). The forward Nagar estimate is -0.087 for women and -0.064 for men, while the inverse of the reverse Nagar estimate is -0.105 for women and -0.139 for men. That is, they are reasonably close. The second test statistic is 0.09 for women and 0.40 for men, thus second-order asymptotics is not rejected. In this case, Hahn and Hausman recommend the use of LIML estimates, since they are second-order unbiased. This is also the case for the optimal Nagar estimator (Hausman, 1978, p. 413), which we have calculated. It is -0.091 for women and -0.068 for men. As these estimates are close to 2SLS, and given the similarity between relative sizes of linear and logit estimates noted above, these results indirectly suggest that the 2SCMLR results are not severely biased by the presence of weak instruments. We end the discussion by pointing out that the school reform instruments essentially compare individuals from very different cohorts. In order to narrow down the potential sources of variation, other than educational reforms, that cause improvements in health over time, we conduct estimations for individuals aged 44 to 57 in 1995. These are individuals who entered 7th form in one of the six years just before the 1958 reform and individuals who entered in one of the six years after the reform. This group consists of 750 women and 731 men. Results are presented in Table 3, columns (5) and (10). For this specific sample the education coefficient for women is similar in size to that for the larger sample, while for men it increases even further. The age coefficient does not differ from the estimates on the larger sample, suggesting that cohort effects do not severely bias the results.

8. Conclusion

It has been documented extensively that educational differences in health exist. Few investigations have considered whether the observed relationship describes a causal relationship. We have added to the existing literature by applying new instrumental variables and estimation techniques to a panel data set from outside the U.S., the source of most existing evidence. For both men and women, a longer education is associated with better SRH. When endogeneity is allowed for, this relationship increases in magnitude, but as is commonly found with IV methods, so do the standard errors. Therefore, it cannot be rejected that education is exogenous to SRH, nor can the null of no effect of education be rejected. Similar results are obtained when BMI is used as the health outcome. The results are based on the use of school reforms as instruments for education, and we showed that educational attainment jumped to a higher level in the years following the 1958 reform. As with other studies of this kind, the power of the procedure is poor. The reliability of the estimates was investigated using an outcome often determined simultaneously with education (never smoked), by looking at results for specific cohorts, and by the use of tests for weak instruments suggested by Hahn and Hausman (2002). These additional analyses did not detect any problems with the estimation strategy, so the results remain inconclusive about the effect of education on health.

Acknowledgements

The first version of this paper was part of my Ph.D. thesis at the University of Copenhagen. I am grateful to the Danish National Institute of Social Research for giving me access to the WECS Data. I appreciate very valuable comments from two referees, from the editor and from my supervisors Karsten Albaek and Martin Browning, from Thomas Crossley, Michael Grossman and Bo E.
Honoré as well as from participants at the Health Econometric Workshop in Odense, the EDGE Jamboree in Munich, the CAM Lunch Seminar in Copenhagen and at an invited seminar at the Department of Economics, Lund University. All errors are mine.

Appendix
Definition of Variables and Sample Selection

The WECS was collected by the Danish National Institute of Occupational Health (AMI) and the National Institute of Social Research (SFI) as a representative survey of Danish citizens aged 18-59 in 1990 with a follow-up survey in 1995. Borg and Burr (1997) describe the sampling method and how representative the data are. The total sample of individuals selected for interviews in 1990 and 1995 consists of 11,084 individuals. The response rate was 89.7 percent in 1990 and 80.2 percent in 1995, hence some sample attrition occurs. Most of the attrition is due to migration, and some attrition is due to death or refusal to participate. Those who refuse do not differ from those who participate with respect to gender and age. They do differ somewhat with respect to civil status (higher attrition among the divorced) and especially region (higher attrition among those living in the capital area). Among those who participated in 1990 but not in 1995, there are more medium level white collar workers and fewer unskilled or high level white collar workers. Borg and Burr (1997) conclude that the sample is representative to a reasonable degree, and that attrition does not seem to be a big problem, at least when regional status is accounted for. We create our sample in the following way. We drop 911 observations with missing education information in both 1990 and 1995. We drop 702 of the remaining individuals who are enrolled in education in 1990. 38 are excluded because they have missing, no or unreported education in 1995, in addition to missing or unreported schooling in 1995 and unreported education in 1990. 2,946 observations are deleted because individuals were not interviewed in both years.
For 10 individuals, the age reported in 1995 is more than 6 or less than 4 years above the age reported in 1990, i.e. it is reported erroneously, so we delete these. Among the 6,477 remaining observations, we keep the 3,755 with non-missing SRH in both 1990 and 1995. The large number of missing observations arises because the question was asked only of people who had been employed within the two months before the interview. Finally we drop those under age 25, yielding a sample of 1,710 men and 1,548 women. The education variables differ in 1990 and 1995, because of a finer distinction between vocational educations in 1995, and because short advanced degree is not referred to as advanced in 1990. We therefore choose to use the 1995 classification in 1990 to get a consistent measure of education. When both education variables are missing, we use the schooling variable in 1995. Years of education are defined as follows: 7 years: 7th Grade, 9 years: 9th Grade, 10 years: 10th Grade, 11 years: one year vocational or semi-skilled, 12 years: Gymnasium or Other Short Educ, 13 years: Apprenticeship, 14 years: Short Advanced, 16 years: Medium Advanced, 18 years: Long Advanced. The 18 regional dummies are one for each county (14) plus one for each of Frederiksberg, Odense, Århus and Ålborg. Compared to the original sample, see Borg and Burr (1997), tables 2.5-2.7, younger persons and those with only primary school are underrepresented in our sample. One reason could be that we exclude those still enrolled in education, who are included in the tables in Borg and Burr (1997) if they have been employed within two months. Overall, the sample seems to be reasonably representative of employed individuals who have finished their education.

References

Acheson, D. (1998). Independent Inquiry into Inequalities in Health Report. London: The Stationery Office.
Adams, S. J. (2001). Educational Attainment and Health: Evidence from a Sample of Older Adults.
Education Economics 10 (1), 97-109.
Adler, N. E., T. Boyce, M. A. Chesney, S. Cohen, S. Folkman, R. L. Kahn, and S. L. Syme (1994). Socioeconomic Status and Health, the Challenge of the Gradient. American Psychologist 49 (1), 15-24.
Angrist, J. and A. Krueger (1991). Does Compulsory School Attendance Affect Schooling and Earnings? Quarterly Journal of Economics 106 (4), 979-1014.
Arendt, J. N. (2002). Endogeneity and Heterogeneity in LDV Panel Data Models. Chapter 4 in Ph.D. thesis, Rød Serie, Nr. 82. Institute of Economics, University of Copenhagen.
Arkes, J. (2001). Does Schooling Improve Adult Health? Mimeo, RAND.
Behrman, J. and B. L. Wolfe (1989). Does More Education Make Women Better Nourished and Healthier? Journal of Human Resources 24 (1), 644-663.
Berger, M. C. and J. P. Leigh (1989). Education, Self-Selection and Health. Journal of Human Resources 24 (3), 433-455.
Borg, V. and H. Burr (1997). Danske Lønmodtageres Arbejdsmiljø og Helbred 1990-95. (The Work Environment and Health of Danish Workers, 1990-95). København, Arbejdsmiljø Instituttet.
Bryld, C.-J., H. Haue, K. H. Andersen, and I. Svane (1990). GL 100. Skole, Stand, Forening. Gymnasieskolernes Lærerforening 1890-1990. (GL 100. School, Social Status, Union. The Danish National Union of Upper Secondary School Teachers). København, Gyldendal.
Butler, J. S., R. V. Burkhauser, J. M. Mitchell and T. P. Pincus (1987). Measurement Error in Self-Reported Health Variables. The Review of Economics and Statistics 69 (4), 644-650.
Card, D. (1995). Using Geographical Variation in College Proximity to Estimate the Return to Schooling. In: Louis N. Christofides, E. Kenneth Grant and Robert Swidinsky (Eds.), Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp. Toronto: University of Toronto Press, 210-222.
Card, D. (1999). Causal Effects of Education on Earnings. In: Ashenfelter and Card (Eds.), Handbook of Labor Economics 3A. Amsterdam, Elsevier Science, 1801-1863.
Case, A. and A.
Deaton (2002). Consumption, Health, Gender and Poverty. Research Program in Development Studies, Working Paper no. 07/02.
Folketingets Årbog 1957-58 (1958). (The Yearbook of the Danish Parliament, 1957-58). København: J. H. Schultz Forlag.
Folketingets Årbog 1974-75 (1976). (The Yearbook of the Danish Parliament, 1974-75). København: J. H. Schultz Forlag.
Fuchs, V. (1982). Time Preference and Health: An Exploratory Study. In: Fuchs, V. (Ed.), Economic Aspects of Health, Second NBER Conference on Health in Stanford. University of Chicago Press, 93-119.
Griliches, Z. (1977). Estimating the Returns to Schooling: Some Econometric Problems. Econometrica 45 (1), 1-22.
Grossman, M. (1972). On the Concept of Health Capital and the Demand for Health. Journal of Political Economy 80 (2), 223-255.
Grossman, M. (1975). The Correlation between Health and Education. In: N. Terleckyj (Ed.), Household Production and Consumption. New York: Columbia University Press, 147-211.
Grossman, M. (2000). The Human Capital Model of the Demand for Health. In: Culyer, A. J. and J. P. Newhouse (Eds.), Handbook of Health Economics 1A. Amsterdam, Elsevier Science, 347-408.
Grossman, M. and R. Kaestner (1997). Effects of Education on Health. In: Behrman, J. and N. Stacey (Eds.), The Social Benefits of Education. Ann Arbor, The University of Michigan Press.
Hahn, J., P. Todd, and W. Van der Klaauw (2001). Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design. Econometrica 69 (1), 201-209.
Hahn, J. and J. Hausman (2002). A New Specification Test for the Validity of Instrumental Variables. Econometrica 70, 163-189.
Hansen, E. J. (1982). Hvem bryder den Sociale Arv? (Who Breaks the Social Inheritance?). Socialforskningsinstituttet, publikation 112.
Harmon, C. and I. Walker (1995). Estimates of the Economic Return to Schooling for the United Kingdom. American Economic Review 85 (5), 1278-86.
Idler, E. (1994).
Self-Ratings of Health: Mortality, Morbidity, and Meaning. In: Schlechter, S. (Ed.), Proceedings of the 1993 NCHS Conference on the Cognitive Aspects of Self-Reported Health Status. NCHS working paper series no. 10, 36-59.
Kane, T. J. and C. E. Rouse (1993). Labor Market Returns to Two- and Four-Year Colleges: Is a Credit a Credit and Do Degrees Matter? NBER Working Paper No. 7235.
Kenkel, D. (1991). Health Behavior, Health Knowledge, and Education. Journal of Political Economy 99 (2), 287-305.
Lleras-Muney, A. (2004). The Relationship Between Education and Adult Mortality in the U.S. The Review of Economic Studies (forthcoming).
Meghir, C. and M. Palme (2001). The Effect of a Social Experiment in Education. The Institute of Fiscal Studies, Working Paper 01/11.
Mossey, J. and E. Shapiro (1982). Self-rated Health: A Predictor of Mortality among the Elderly. American Journal of Public Health 72, 800-808.
Mroz, T. A. and D. K. Guilkey (1992). Discrete Factor Approximations for Use in Simultaneous Equation Models with Both Continuous and Discrete Endogenous Variables. Carolina Population Center Papers 92-03, University of North Carolina at Chapel Hill.
Newey, W. K. (1987). Efficient Estimation of Limited Dependent Variable Models with Endogenous Explanatory Variables. Journal of Econometrics 36, 231-250.
Rosenzweig, M. R. and T. P. Schultz (1983). Estimating a Household Production Function: Heterogeneity, the Demand for Health Inputs, and Their Effects on Birth Weight. The Journal of Political Economy 91 (5), 723-746.
Spasojevic, J. (2003). Effects of Education on Adult Health in Sweden: Results from a Natural Experiment. Ph.D. thesis submitted to the Graduate School for Public Affairs and Administration, Metropolitan College of New York.
Staiger, D. and J. H. Stock (1997). Instrumental Variables Regressions with Weak Instruments. Econometrica 65 (3), 557-586.
Sundhedsministeriet (1999). Regeringens Folkesundhedsprogram 1999-2008.
(The Danish Government's Public Health Program 1999-2008). TL Offset, København.
Vuong, Q. H. (1984). Two-Stage Conditional Maximum Likelihood Estimation of Econometric Models. Social Science Working Paper 538, California Institute of Technology.

Figure 1. Mean years of education for men, by year when they are 14 years of age. [Lowess smoother, bandwidth = 0.15. Vertical axis: mean years of education, 12.5 to 14. Horizontal axis: year when turned 14, 1948 to 1978.]

Figure 2. Mean years of education for women, by year when they are 14 years of age. [Lowess smoother, bandwidth = 0.14. Vertical axis: mean years of education, 11.5 to 13.8333. Horizontal axis: year when turned 14, 1948 to 1978.]

Table 1. Summary Statistics from WECS Data, 1990 and 1995.

                               Men                            Women
Variable              Share  Mean SRH  Share PH      Share  Mean SRH  Share PH
Years of Education
  7                   0.046    1.797     0.184       0.063    1.903     0.224
  9                   0.035    1.746     0.144       0.048    1.853     0.227
  10                  0.026    1.611     0.133       0.045    1.636     0.114
  11                  0.052    1.876     0.208       0.047    1.826     0.208
  12                  0.040    1.573     0.110       0.103    1.636     0.125
  13                  0.486    1.636     0.127       0.309    1.551     0.096
  14                  0.069    1.551     0.089       0.146    1.636     0.121
  16                  0.133    1.546     0.103       0.198    1.544     0.088
  18                  0.113    1.473     0.049       0.040    1.524     0.097
Age
  25-34               0.242    1.514     0.080       0.214    1.489     0.077
  35-44               0.370    1.600     0.108       0.392    1.596     0.116
  45-54               0.291    1.687     0.157       0.314    1.693     0.140
  55-65               0.097    1.767     0.148       0.080    1.842     0.198
All                            1.623     0.122                1.621     0.119
No. Obs               3,420                          3,096

Notes: Calculated for pooled data in 1990 and 1995. "Share" is the share of observations in the given group. Mean SRH is mean self-reported health status with value 1 corresponding to very excellent health and 5 to very poor. PH is poor health, which corresponds to the three worst outcomes.

Table 2. First stage education regressions and reduced form SRH estimations.

Women
Outcome:               Education   Education      SRH
Constant                 14.970      13.525
                         (0.556)     (1.303)
Reform 1958               0.352       0.254      -0.048
                         (0.172)     (0.190)     (0.132)
Reform 1975               0.166       0.173       0.066
                         (0.302)     (0.302)     (0.235)
Age                      -0.042       0.034       0.031
                         (0.011)     (0.063)     (0.008)
Age2/1000                            -0.911
                                     (0.743)
1995                      0.207       0.212       0.084
                         (0.104)     (0.104)     (0.081)
F-value 17 Regions        8.77        8.80       31.42
F-value 2 reforms         3.65        1.04        0.98
Adj. R-squared            0.067       0.067
Observations              3,096       3,096       3,096

Men
Outcome:               Education   Education      SRH
Constant                 13.387      13.988
                         (0.545)     (1.193)
Reform 1958               0.401       0.437      -0.144
                         (0.172)     (0.184)     (0.133)
Reform 1975               0.262       0.249      -0.240
                         (0.285)     (0.286)     (0.221)
Age                       0.013      -0.179       0.014
                         (0.010)     (0.055)     (0.008)
Age2/1000                             0.365
                                     (0.644)
1995                     -0.065      -0.064       0.132
                         (0.100)     (0.100)     (0.077)
F-value 17 Regions       12.22       12.21       56.00
F-value 2 reforms         4.47        3.82        1.28
Adj. R-squared            0.054       0.054
Observations              3,420       3,420       3,420

Notes: OLS regressions for education and logit estimation for SRH with standard errors in parentheses.

Table 3. Ordered Logit Models of SRH.

Men
Estimator                  Logit    RE-Logit     2SCML     2SCMLR     2SCMLR
                            (1)        (2)        (3)        (4)        (5)
Education                 -0.091     -0.109     -0.153     -0.223     -0.430
                          (0.013)    (0.021)    (0.275)    (0.292)    (0.820)
Age                        0.022      0.027      0.022      0.027      0.022
                          (0.004)    (0.006)    (0.004)    (0.091)    (0.034)
1995                       0.099      0.111      0.103      0.118      0.029
                          (0.065)    (0.089)    (0.041)    (0.091)    (0.165)
Educ. Residual                                   0.061      0.115      0.333
                                                (0.276)    (0.289)    (0.823)
Chi2-test, 17 regions      61.44      50.48      53.55      44.26      33.73
-2*Log-Likelihood       6,699.77   6,533.21   6,699.73   6,533.12   3,013.01
Tests for RE                         166.56                166.61

Women
Estimator                  Logit    RE-Logit     2SCML     2SCMLR     2SCMLR
                            (6)        (7)        (8)        (9)       (10)
Education                 -0.079     -0.089     -0.207     -0.251     -0.223
                          (0.014)    (0.016)    (0.248)    (0.241)    (0.414)
Age                        0.027      0.032      0.021      0.023      0.024
                          (0.005)    (0.005)    (0.004)    (0.004)    (0.030)
1995                       0.105      0.125      0.117      0.141      0.097
                          (0.075)    (0.077)    (0.074)    (0.076)    (0.162)
Educ. Residual                                   0.129      0.162      0.144
                                                (0.250)    (0.238)    (0.415)
Chi2-test, 17 regions      29.46      26.48      30.98      33.67      37.19
-2*Log-Likelihood       6,149.08   6,091.42   6,148.88   6,091.17   3,056.45
Tests for RE                          57.66                 57.71

Notes: All estimations are from ordered logits. In (1) and (6) all regressors are exogenous. In (2) and (7), random effects are allowed for. (3) and (8) allow for endogeneity of education. (4) and (9) allow for endogeneity and random effects.
(5) and (10) contain estimates for the cohort aged 44 to 57 in 1995. All are with age included linearly. All standard errors in two-stage estimations are bootstrapped using a naive bootstrapping procedure with 200 replications. School reforms in 1958 and 1975 are used as instruments. All estimations also include 17 regional dummies, and a Chi2 test for their joint significance is presented. In the row "Tests for RE", the difference between -2logL for models with and without random effects is given.

Table 4. Descriptive statistics for BMI and smoking, 1990 and 1995.

                           Men                             Women
                  Share  Mean SRH  Mean educ      Share  Mean SRH  Mean educ
BMI<19            0.010    1.636     14.303       0.072    1.561     13.646
19<=BMI<25        0.546    1.565     13.761       0.725    1.597     13.172
25<=BMI<30        0.376    1.656     13.067       0.158    1.693     12.501
30<=BMI           0.070    1.862     12.180       0.045    1.900     11.707
Never smoked      0.303    1.506     13.839       0.398    1.598     13.316
Have smoked       0.697    1.671     13.201       0.602    1.640     12.848

Notes: Calculated for pooled data in 1990 and 1995. "Share" is the share of observations in the given group. Mean SRH is mean self-reported health status with value 1 corresponding to very excellent health and 5 to very poor. Mean educ is mean years of education in the given group.

Table 5. Education effects on BMI in healthy range and never smoking indicator.

                          BMI in healthy range        Never smoked
Estimator                 RE-logit     2SCMLR      RE-logit     2SCMLR
Men
Education                   0.128       0.117        0.116      -0.640
                           (0.015)     (0.275)      (0.016)     (0.307)
Age                        -0.030      -0.030       -0.046      -0.045
                           (0.004)     (0.004)      (0.005)     (0.005)
1995                       -0.176      -0.176        0.216       0.212
                           (0.073)     (0.073)      (0.080)     (0.080)
Educ. Residual                          0.011                    0.759
                                       (0.028)                  (0.302)
Chi2-test, 17 regions      19.26       19.22        25.81       26.24

Women
Education                   0.063       0.084        0.084      -0.639
                           (0.016)     (0.336)      (0.015)     (0.305)
Age                        -0.021      -0.020        0.006      -0.033
                           (0.005)     (0.019)      (0.005)     (0.017)
1995                       -0.087      -0.093       -0.013       0.178
                           (0.085)     (0.124)      (0.078)     (0.112)
Educ. Residual                         -0.021                    0.725
                                       (0.336)                  (0.306)
Chi2-test, 17 regions      18.99       18.87        39.36       32.74

Notes: From logit models, see Table 3.

Table 6. Testing for weak instruments, in a linear model for SRH.

Education effect         Men      Women
OLS                    -0.034    -0.029
2SLS                   -0.064    -0.087
Reverse 2SLS           -0.453    -0.386
2.order Bias           -0.119    -0.108
Nagar                  -0.064    -0.087
Reverse Nagar          -0.139    -0.105
Optimal Nagar          -0.068    -0.091
Test 1                  14.31     17.36
Test 2                   0.40      0.09

Notes: All estimates are from linear models with control for age, a 1995 dummy and 17 regions. The estimators and tests are from Hahn and Hausman (2002).
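The comparison of forward 2SLS with the inverse of reverse 2SLS reported in Table 6 can be illustrated with a minimal Python sketch on simulated data. This is not the paper's estimation code: the data-generating process, the function names (project, forward_2sls, inverse_reverse_2sls, simulate) and all parameter values are assumptions made for illustration, the variables are treated as already demeaned with no further controls, and the second-order bias correction and test statistics of Hahn and Hausman (2002) are not implemented.

```python
import numpy as np


def project(Z, v):
    """Project the vector v onto the column space of the instrument matrix Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return Z @ coef


def forward_2sls(y, x, Z):
    """Forward 2SLS slope of y on x: (x' P y) / (x' P x), P = projection on Z."""
    x_hat = project(Z, x)
    return float(x_hat @ y) / float(x_hat @ x)


def inverse_reverse_2sls(y, x, Z):
    """Inverse of the reverse 2SLS slope of x on y: (y' P y) / (y' P x)."""
    y_hat = project(Z, y)
    return float(y_hat @ y) / float(y_hat @ x)


def simulate(n, beta, pi, rho, seed):
    """Hypothetical design: two instruments, one endogenous regressor.

    x = pi * (z1 + z2) + v and y = beta * x + u, with corr(u, v) = rho.
    """
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, 2))
    v = rng.standard_normal(n)
    u = rho * v + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    x = Z @ np.array([pi, pi]) + v
    y = beta * x + u
    return y, x, Z


if __name__ == "__main__":
    # Strong versus weak first stage; only the instrument strength pi differs.
    for label, pi in [("strong", 1.0), ("weak", 0.05)]:
        y, x, Z = simulate(n=2000, beta=-0.5, pi=pi, rho=0.5, seed=1)
        print(label, forward_2sls(y, x, Z), inverse_reverse_2sls(y, x, Z))
```

With a strong first stage the two estimates should nearly coincide, since both are consistent for the same coefficient under first-order asymptotics. A large gap between them, as between the 2SLS and reverse 2SLS rows of Table 6, is the kind of discrepancy the first Hahn and Hausman test is designed to pick up.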