Letters to the Editor
sequence of aggregation that the range of values is decreased (refer to the standard deviations in table 2 (1)). At a simple level, it is easy to see that the aggregate-level income attached to the lowest-income person in a sample will be above the individual income (there is no one on the same or lower income level to aggregate with); for the richest person, the aggregate income will be lower than the individual income for the same reason. The individual and aggregate coefficients are simply not comparable, as they have different meanings (e.g., the increased risk of mortality for someone earning $10,000 less than someone else as opposed to the increased risk of mortality for someone living in an area in which the median income is $10,000 less than in another area). As the range of area-based incomes is highly compressed compared with individual incomes, it is almost certain that the coefficients for the former will be considerably larger than those for the latter. In fact, taking the standard deviations into account, the standardized regression coefficients are, if anything, larger for individual than for area-based measures. For example, the standardized coefficient for microlevel education is 0.375; for aggregate 1980 education based on zip code, it is 0.225.
We think that the work of Geronimus and Bound (1) should provide some reassurance to researchers on the use of area-based measures when no individual data on socioeconomic position exist. Certainly, when other sources of data are not available, as in some routine and administrative data systems, it can be valuable to link area-based measures to the data. However, it is also clear that individual and aggregate indices do not measure the same thing. Area-based measures are not a reliable substitute for individual data; equally, individual-based measures do not provide complete socioeconomic data.
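The compression of range under aggregation can be illustrated with a toy calculation. The sketch below is not taken from the paper under discussion: the three areas and nine incomes are invented, and the area median is used as the aggregate measure purely for illustration.

```python
from statistics import median, pstdev

# Invented incomes (US dollars) for nine people in three hypothetical areas.
incomes_by_area = {
    "A": [8_000, 12_000, 15_000],
    "B": [20_000, 30_000, 45_000],
    "C": [50_000, 70_000, 120_000],
}

# Individual incomes, listed area by area.
individual = [inc for area in incomes_by_area.values() for inc in area]
# Aggregate measure: each person is assigned the median income of their area.
attached = [median(area) for area in incomes_by_area.values() for _ in area]

poorest = individual.index(min(individual))
richest = individual.index(max(individual))

# The poorest person's area value lies above their own income, and the
# richest person's area value lies below theirs.
print("poorest:", individual[poorest], "area median:", attached[poorest])
print("richest:", individual[richest], "area median:", attached[richest])

# The spread of the area-based measure is smaller than that of the
# individual measure.
print("SD individual:", round(pstdev(individual)))
print("SD area-based:", round(pstdev(attached)))
```

With any grouping of this kind the area-based values cannot extend beyond the individual extremes, so their standard deviation is at most that of the individual incomes.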
If possible, information should be collected about individual socioeconomic circumstances across the life course (10), together with area-based measures. Only by measuring both micro- and macrolevel data and exploring their interactions will researchers better understand the respective roles of individual and area-based socioeconomic influences on health.
REFERENCES
1. Geronimus AT, Bound J. Use of census-based aggregate variables to proxy for socioeconomic group: evidence from national samples. Am J Epidemiol 1998;148:475-86.
2. Dolk H, Mertens B, Kleinschmidt I, et al. A standardisation approach to the control of socioeconomic confounding in small area studies of environment and health. J Epidemiol Community Health 1995;49(suppl 2):S9-14.
3. Carr-Hill R, Rice N. Is enumeration district level an improvement on ward level analysis in studies of deprivation and health? J Epidemiol Community Health 1995;49(suppl 2):S28-9.
4. Hyndman JC, Holman CD, Hockey RL, et al. Misclassification of social disadvantage based on geographical areas: comparison of postcode and collector's district analyses. Int J Epidemiol 1995;24:165-76.
5. Krieger N. Overcoming the absence of socioeconomic data in medical records: validation and application of a census-based methodology. Am J Public Health 1992;82:703-10.
6. Carstairs V, Morris R. Deprivation and health in Scotland. Aberdeen, Scotland: Aberdeen University Press, 1991.
7. Davey Smith G, Hart C, Watt G, et al. Individual social class, area-based deprivation, cardiovascular disease risk factors, and mortality: the Renfrew and Paisley Study. J Epidemiol Community Health 1998;52:399-405.
8. Kunst AE, Mackenbach JP. The size of mortality differences associated with educational level in nine industrialized countries. Am J Public Health 1994;84:932-7.
9. Davey Smith G, Hart C, Hole D, et al. Education and occupational social class: which is the more important indicator of mortality risk? J Epidemiol Community Health 1998;52:153-60.
10. Davey Smith G, Hart C, Blane D, et al. Lifetime socioeconomic position and mortality: prospective observational study. BMJ 1997;314:547-52.
Am J Epidemiol Vol. 150, No. 9, 1999
George Davey Smith
Yoav Ben-Shlomo
Department of Social Medicine
University of Bristol
Bristol, United Kingdom BS8 2PR
Carole Hart
Department of Public Health
University of Glasgow
Glasgow, Scotland G12 8RZ
THE AUTHORS REPLY
We welcome the comments of Davey Smith et al. (1) on our paper (2) reporting on our empirical assessment of the increasingly common practice of using census-based aggregate socioeconomic variables to measure individual socioeconomic characteristics when microlevel data are unavailable. We are pleased to see the additional support, conceptual elaboration, and empirical examples that Davey Smith et al. provide for our principal conclusions that area-based measures are not a reliable substitute for individual data; that, in many cases, when aggregate socioeconomic data are used, the size of the area does not appear to be key; and that the knowledge to be gained from using this approach is limited. Major advances in the understanding of social inequalities in health are likely to require substantially improved data collection, not overreliance on geocoding to substitute for uncollected microdata as well as to measure important contextual effects.
Davey Smith et al. (1) do raise a question about our interpretation of the fact that a unit change in an aggregate variable will typically have a larger effect on health outcomes than a unit change in a comparable individual-level variable. Here, let us clarify the context in which we were working and in which we believe this interpretation is reasonable (refer to Geronimus et al. (3) for more details). We are imagining a situation in which the only data that researchers have available to them are aggregate-level variables. Researchers use these aggregate-level variables as proxies for individual characteristics.
How fair is it to interpret such estimates as comparable to those that would have been obtained if microlevel information were available? In the fields of sociology and economics, a substantial literature discusses the circumstances under which it is appropriate to use aggregate-level variables to proxy for individual-level ones (4-7). However, different circumstances apply in epidemiology, motivating us to explore this question in the specific context of health outcomes.
Suppose that some health outcome, $y$ (one could think of $y$ as the mortality rate), depends linearly on individual-level socioeconomic position, $x$, and other factors that are independent of socioeconomic position, $\varepsilon$. Thus, we can write the following: $y = \alpha + x\beta + \varepsilon$, where $\alpha$ and $\beta$ are parameters and, by assumption, $\varepsilon$ is independent of $x$. Now suppose that we have information on $y$ and $x$ from a random sample of persons arrayed across distinct geographic areas. Let $i$ index the person within an area and $j$ the geographic areas. Thus, we can write $y$ as $y_{ij}$, $x$ as $x_{ij}$, and $\varepsilon$ as $\varepsilon_{ij}$. Each of these variables can be decomposed into components that vary within and between geographic areas; for example, defining $\bar{x}_j$ as the mean of $x$ within location $j$ and $v_{ij}$ as the individual-specific deviation around this mean, we have $x_{ij} = \bar{x}_j + v_{ij}$. Now, one could imagine estimating $\beta$ by running a regression analysis of $y_{ij}$ on $x_{ij}$. However, if one had data only on the location means of $x$, then one could imagine estimating $\beta$ by running a regression analysis of $y_{ij}$ on $\bar{x}_j$. If not only $x_{ij}$ but also $\bar{x}_j$ are independent of $\varepsilon$, then either procedure will consistently estimate $\beta$. Since we would expect substantially less variance in $\bar{x}_j$ than in $x_{ij}$, the estimates will imply that a standard deviation change in the aggregate variable will have less of an impact on the outcome than a standard deviation change in the individual-level variable.
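The variance argument can be written out explicitly. The following is a sketch under the assumptions stated above, plus the additional assumption that the area mean $\bar{x}_j$ and the within-area deviation $v_{ij}$ are uncorrelated; the symbols $\sigma$ (standard deviation) and $\beta^{*}$ (standardized coefficient) are introduced here for illustration and are not part of the original notation.

```latex
\begin{align*}
\operatorname{var}(x_{ij})
  &= \operatorname{var}(\bar{x}_j) + \operatorname{var}(v_{ij})
  \;\ge\; \operatorname{var}(\bar{x}_j), \\
\beta^{*}_{\mathrm{ind}}
  &= \beta\,\frac{\sigma_{x_{ij}}}{\sigma_{y}}, \qquad
\beta^{*}_{\mathrm{agg}}
  = \beta\,\frac{\sigma_{\bar{x}_j}}{\sigma_{y}}, \\
\frac{\beta^{*}_{\mathrm{agg}}}{\beta^{*}_{\mathrm{ind}}}
  &= \frac{\sigma_{\bar{x}_j}}{\sigma_{x_{ij}}} \;\le\; 1 .
\end{align*}
```

Here $\sigma_{y}$ is the standard deviation of $y_{ij}$, which is the outcome in both regressions. Both regressions recover the same unstandardized slope $\beta$, so the standardized coefficients differ only through the smaller spread of the aggregate regressor.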
In this context, the unstandardized coefficients would be comparable (i.e., they would consistently estimate the same parameter), while standardized coefficients would not be comparable. The situation we have outlined is one in which it would be perfectly legitimate to use an aggregate variable as a proxy for a microlevel one.
However, the assumption that $\varepsilon$ is independent of $x$ is unlikely to hold in practice. In particular, suppose that $x$ represents only a component of socioeconomic position and that, among other things, $\varepsilon$ includes other unmeasured components of socioeconomic position. In this case, we would expect $\operatorname{cov}(\varepsilon, x) > 0$, and our estimates of $\beta$ will tend to exaggerate the causal effect of $x$ alone on the outcome variable. Under what circumstances will this tend to be more true when we use $\bar{x}_j$ rather than $x_{ij}$? The use of $\bar{x}_j$ will produce the larger magnitude coefficient if, when both are included in the same regression, the coefficients on the two variables are of the same sign (6, 8). In the context where $x$ is an indicator of socioeconomic position, a natural interpretation is that the aggregate variable represents a broader construct than the microlevel variable and is likely to exaggerate the effect of the microlevel counterpart on outcomes of interest (2, 3, 7).
Let us emphasize that we are talking within the context in which the aggregate-level variable ($\bar{x}_j$) represents the location-specific mean (or something like it) of the individual-level variable ($x_{ij}$) and in which the investigator wishes to use the aggregate variable as a proxy for the microlevel one. An example is when the investigator wants to estimate the effect of individual income on a health outcome but uses the areal measure of median income in a defined geographic unit in which the person resides as a proxy for individual income.
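Both cases discussed above can be explored with a small simulation. The sketch below is illustrative only and is not the analysis reported in the paper: the data-generating process, the parameter values, and the names `simulate`, `gamma`, and `u` are assumptions made for this example. Setting `gamma = 0` makes the error independent of x; setting `gamma > 0` adds an unmeasured socioeconomic component `u` that shares the area-level component of x.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def simulate(gamma, beta=1.0, n_areas=400, n_per_area=50, seed=0):
    """Regress y on individual x and on the area mean of x for one sample.

    Hypothetical data-generating process (an assumption of this sketch):
    x_ij = area_j + v_ij, u_ij = area_j + noise, and
    y_ij = 2 + beta * x_ij + gamma * u_ij + noise.
    """
    rng = np.random.default_rng(seed)
    shape = (n_areas, n_per_area)
    area = rng.normal(size=(n_areas, 1))       # component shared within area j
    x = area + rng.normal(size=shape)          # individual-level x_ij
    x_bar = np.broadcast_to(x.mean(axis=1, keepdims=True), shape)  # area mean
    u = area + rng.normal(size=shape)          # unmeasured component
    y = 2.0 + beta * x + gamma * u + rng.normal(size=shape)

    x, x_bar, y = x.ravel(), x_bar.ravel(), y.ravel()
    b_ind = ols_slope(x, y)        # y_ij regressed on x_ij
    b_agg = ols_slope(x_bar, y)    # y_ij regressed on the area mean of x
    return {
        "b_ind": b_ind,
        "b_agg": b_agg,
        "std_ind": b_ind * x.std() / y.std(),
        "std_agg": b_agg * x_bar.std() / y.std(),
    }


independent = simulate(gamma=0.0)   # error independent of x
confounded = simulate(gamma=1.0)    # error shares the area component of x
print("independent:", independent)
print("confounded: ", confounded)
```

In this setup, the independent case should give two unstandardized slopes close to the true value of 1, with the smaller standardized coefficient on the aggregate regressor. In the confounded case both slopes overstate the effect of x alone, and the aggregate slope does so by more, because the unmeasured component is tied to x only through the shared area term.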
It is within this context (in which, among other things, the aggregate- and individual-level variables, $\bar{x}_j$ and $x_{ij}$, are measured in the same units) that it makes sense to compare unstandardized regression coefficients.
The context just described is precisely the context of our paper: In our health outcome equations, we compared coefficients for income, education, and occupation, first measured at the microlevel and then at the aggregate level. We were simulating what many US researchers have in fact been doing, that is, using median income in a census area to substitute for uncollected data on individual income. We did this exercise to arrive at an empirical sense of the validity of this increasingly used approach to remediating shortcomings in health data sets.
One source of the confusion may be that Davey Smith et al. (1) come from a research tradition in the United Kingdom, where socioeconomic variables measured at the individual level are often collected. The availability of microlevel economic characteristics on health data sets offers researchers the option of including them in health outcome equations with the clear intention that they have meanings different from economic characteristics measured areally. For example, individual occupation can be used to measure a person's social class, while the Carstairs deprivation score can be included as a measure of the socioeconomic character of the area in which that person resides. Each of these variables is conceptualized explicitly as making an independent contribution to the health outcome. In such a case, it is important to find a way to compare the coefficients on these conceptually and metrically distinct variables to assess the relative magnitudes of each variable's independent contribution to a person's health. Using standardized coefficients represents one way to do this, while the relative index of inequality (RII) represents another.
Unfortunately, it is the exceptional US health data set that includes detailed individual-level socioeconomic information and the geocodes necessary to make linkages to areal information on the characteristics of respondents' residential areas. This lack of individual and contextual information inhibits our ability to better understand health inequalities. Instead, social epidemiologists too often find themselves in the rather confusing position of having to rely on census-based variables to do "double duty": serving sometimes as proxies for an individual characteristic, sometimes as measures of contextual effects.
Our findings that neither size of area nor length of time since census data collection (at least within a 20-year period) is critical (2) should be reassuring to those investigators who are forced to continue to use this approach in the absence of microlevel socioeconomic data. Davey Smith et al. (1) offer additional examples of the question of size that should also offer a measure of comfort to the many investigators who have access to zip-code-level data only. Even so, we believe that our broader findings imply that results arrived at by substituting aggregate socioeconomic variables for uncollected microlevel data require cautious interpretation, with attention to the demonstrated limits (2, 3).
Our ultimate goal should be to get both sorts of socioeconomic measures (individual and contextual) together in the same data sets. While it currently may be expedient or economically prudent to use census-based proxies on double duty, greater advances in our understanding of health inequalities will likely be derived from analyses that conceptually and empirically separate individual and contextual socioeconomic indicators.
REFERENCES
1. Davey Smith G, Hart C, Ben-Shlomo Y. Re: "Use of census-based aggregate variables to proxy for socioeconomic group: evidence from national samples." (Letter). Am J Epidemiol 1999;150:996-7.
2. Geronimus AT, Bound J. Use of census-based aggregate variables to proxy for socioeconomic group: evidence from national samples. Am J Epidemiol 1998;148:475-86.
3. Geronimus AT, Bound J, Neidert LJ. On the validity of using census geocode characteristics to proxy individual socioeconomic characteristics. J Am Stat Assoc 1996;91:529-37.
4. Theil H. Linear aggregation of economic relations. Amsterdam, the Netherlands: North-Holland Publishing Co, 1954.
5. Hannan MT. Aggregation and disaggregation in the social sciences. Lexington, MA: Lexington Books, 1971.
6. Firebaugh G. A rule for inferring individual-level relationships from aggregate data. Am Sociol Rev 1978;43:557-72.
7. Hammond JL. Two sources of error in ecological correlations. Am Sociol Rev 1973;38:764-77.
8. Mundlak Y. On the pooling of time series and cross section data. Econometrica 1978;46:69-85.
Arline T. Geronimus
Department of Health Behavior and Health Education
University of Michigan School of Public Health
Ann Arbor, MI 48109-2029
John Bound
Department of Economics
University of Michigan
Ann Arbor, MI 48109-2029