American Journal of Epidemiology
Copyright © 2005 by the Johns Hopkins Bloomberg School of Public Health
All rights reserved; printed in U.S.A.
Vol. 163, No. 1
DOI: 10.1093/aje/kwj011
Advance Access publication November 17, 2005

Original Contribution

Estimating the Proportion of Disease due to Classes of Sufficient Causes

Kurt Hoffmann, Christin Heidemann, Cornelia Weikert, Matthias B. Schulze, and Heiner Boeing

From the Department of Epidemiology, German Institute of Human Nutrition, Potsdam-Rehbrücke, Germany.

Received for publication April 21, 2005; accepted for publication August 17, 2005.

Disease can be caused by different mechanisms. A possible causal model proposed by Rothman is a complete causal mechanism or a so-called “sufficient cause” consisting of a set of component causes that can be illustrated in a pie chart. However, this model does not allow finding out what sufficient causes produce the majority of cases. The authors’ objective was to extend Rothman’s work by quantifying the proportion of disease that can be attributed to a class of sufficient causes. The underlying idea was to consider all combinations of a given set of known risk factors and to assign each combination to a class of sufficient causes. This assignment makes it possible to evaluate a class of sufficient causes by the population attributable fraction of the corresponding combination of risk factors. The approach presented was applied to sufficient causes of myocardial infarction by use of data on participants recruited between 1994 and 1998 into the European Prospective Investigation into Cancer and Nutrition-Potsdam Study. As a result, 51.8% of cases were attributed to only four different classes of sufficient causes. In conclusion, the statistical method described in the paper may be beneficial for quantifying the importance of different sufficient causes and for improving the efficiency of public health programs.
models, statistical; myocardial infarction; risk factors; statistics

Abbreviations: EPIC, European Prospective Investigation into Cancer and Nutrition; PAF, population attributable fraction; PDC, proportion of disease due to a class of sufficient causes.

Rothman’s model of sufficient and component causes (1, 2), introduced in 1976, is one of the most discussed causal models in epidemiology. Despite some inherent limitations (3, 4), this model seems appropriate for reflecting the multiplicity of causal pathways and the biologic interaction among component causes. The pie-chart description of possible classes of sufficient causes is an illustrative method to differentiate among distinct etiologic mechanisms, leaving some unlabeled slices to represent unknown component causes. However, up to now, Rothman’s model has been only a theoretical framework for epidemiology, with no direct connection to empirical data. Although epidemiologic studies on a specific disease should give information as to which of the possible sufficient causes probably exist and which do not, no approach has been suggested to obtain quantitative results. In other words, the application of the sufficient-component cause model in epidemiologic research is hampered by the lack of an appropriate statistical method for estimating the proportion of disease due to different sufficient causes. Thus, an extension of Rothman’s work is needed.

An obvious approach to identifying the most relevant sufficient causes is based on calculating the frequencies of component causes and of their combinations within cases (5). However, a high frequency of a specific combination of component causes within cases does not necessarily mean that this combination is a sufficient cause of high relevance, because the corresponding frequency within noncases may also be high. Moreover, this approach does not allow adjustment for other known factors and is not related to the common notions and concepts of epidemiology.
In general, the importance of sufficient causes depends on both the relative risks and the frequencies of all possible component causes and their combinations. Because the population attributable fraction (PAF) already comprises the strength of effect and the frequency of exposure, it is promising to look for a complex relation between sufficient causes and PAF. Such a relation should enable us to evaluate the importance of sufficient causes by using the same epidemiologic data that are useful for estimating PAF.

In the present paper, a system of equations is derived and represented that relates adjusted PAF to the proportion of disease due to a class of sufficient causes (PDC). Solving these equations leads to an estimated frequency distribution for sufficient causes. Moreover, PDC itself can be interpreted as the PAF of a complex event that is characterized by a specific combination of present and absent single risk factors. Thus, estimation formulas and confidence intervals proposed for PAF can be carried over to PDC. We applied this approach to data from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam Study to evaluate classes of sufficient causes for incident myocardial infarction.

Correspondence to Dr. Kurt Hoffmann, Department of Epidemiology, German Institute of Human Nutrition, Arthur-Scheunert-Allee 114–116, 14558 Nuthetal, Germany (e-mail: [email protected]).

Am J Epidemiol 2006;163:76–83

SUFFICIENT CAUSES AND POPULATION ATTRIBUTABLE FRACTION

A “sufficient cause” is defined as a set of minimal conditions that inevitably produce disease (2). It consists of a number of component causes. Each component cause is necessary for the completion of the sufficient cause. The completion of a sufficient cause is equivalent to the onset of the earliest stage of disease. In other words, the component cause that acts last determines the time of the onset.
If one or more of the component causes are absent, the causal mechanism cannot operate. Two sufficient causes can be similar because they share some of the same components. Differentiating between these sufficient causes can be difficult, since some of the distinct component causes are often not known or not measured. Leaving some pieces unlabeled in a pie chart is not only a way to represent unknown component causes but also a way to group similar sufficient causes. Thus, any pie chart with an unlabeled piece can be considered a class of sufficient causes. The common characteristic of sufficient causes belonging to the same class is the same set of known component causes. Formally, such classes can also include sufficient causes with unpredictable components to include random events (6).

For the sake of simplicity, we first regard the case of only two known component causes, for example, X1 and X2. Then, four different classes of sufficient causes can be distinguished as shown in figure 1. The class S1 comprises all sufficient causes that have X1, but not X2, as a component cause. The symbol U1 stands for one or more unknown component causes. Analogously, S2 is defined as the class of sufficient causes that contains X2 as the only known component. S12 stands for causal mechanisms that require the presence of both X1 and X2. Finally, the class S0 comprises all sufficient causes that produce disease in the absence of X1 and X2. Obviously, any single sufficient cause belongs to one and only one of the four classes depicted in figure 1.

FIGURE 1. The four classes (S1, S2, S12, and S0) of sufficient causes in the case of two known risk factors (X1 and X2). The U symbols stand for one or more unknown component causes.

Now, consider the PAF of the component causes X1 and X2. The PAF of a single condition is defined here as the fraction of cases that would not have occurred if the condition were not present (7).
If stratification successfully removes confounding, the PAF can be validly estimated by the formula of Miettinen (8), which requires estimates of the condition prevalence among cases and the risk ratio standardized to the condition (2, p. 295). Analogously, the joint PAF of two conditions, defined as the fraction of cases that would not have occurred if both conditions had been absent, can also be estimated by Miettinen’s formula. Denoting the proportion of disease due to a class S of sufficient causes by PDC(S), the following four equations hold:

$$
\begin{aligned}
\mathrm{PAF}(X_1) &= \mathrm{PDC}(S_1) + \mathrm{PDC}(S_{12}) \\
\mathrm{PAF}(X_2) &= \mathrm{PDC}(S_2) + \mathrm{PDC}(S_{12}) \\
\mathrm{PAF}(X_1, X_2) &= \mathrm{PDC}(S_1) + \mathrm{PDC}(S_2) + \mathrm{PDC}(S_{12}) \\
\mathrm{PAF}(X_1, X_2) &= 1 - \mathrm{PDC}(S_0).
\end{aligned}
\tag{1}
$$

The equations can be verified by the pie charts of figure 1. First, consider the case that X1 has not occurred. Then, sufficient causes of types S1 and S12 could not have been completed, whereas the completion of all other sufficient causes would not have been affected. Therefore, the fraction of cases that can be attributed to X1 must be the sum of the proportions of disease caused by S1 and S12, as stated by the first equation. The second equation has a quite similar interpretation, but here X2 plays the role of X1. The remaining two equations refer to the joint PAF of both component causes.

A generalization of these equations to the situation of more than two known component causes is straightforward. In general, the joint PAF of any m conditions can be calculated by summing up the PDC values of all sufficient causes that contain at least one of the m conditions as a component cause.

Equations 1 do not directly give the frequencies of sufficient causes in terms of the single and joint PAF of component causes.
For this purpose, equations 1 should be rewritten in the following form:

$$
\begin{aligned}
\mathrm{PDC}(S_1) &= \mathrm{PAF}(X_1, X_2) - \mathrm{PAF}(X_2) \\
\mathrm{PDC}(S_2) &= \mathrm{PAF}(X_1, X_2) - \mathrm{PAF}(X_1) \\
\mathrm{PDC}(S_{12}) &= \mathrm{PAF}(X_1) + \mathrm{PAF}(X_2) - \mathrm{PAF}(X_1, X_2) \\
\mathrm{PDC}(S_0) &= 1 - \mathrm{PAF}(X_1, X_2).
\end{aligned}
\tag{2}
$$

To get an impression of equations 2, we considered a very strong component cause X1 with a PAF of 0.5 and a strong component cause X2 with a PAF of 0.3. For simplicity, we assumed that both components are risk factors and that the presence of one of the components is not protective for disease in any etiologic mechanism. Then, the joint PAF cannot be smaller than the larger of the two individual PAFs and cannot be larger than their sum. Therefore, in this example, the joint PAF can vary only between 0.5 and 0.8. For different values of the joint PAF, table 1 gives the frequency distribution for sufficient causes. Obviously, as the joint PAF increases, the proportions of S1 and S2 also increase, whereas the proportions of S12 and S0 decrease. In the case of the minimal joint PAF, no sufficient cause that contains only the weaker component X2 can exist. Thus, the proportion of disease due to S2 has to be zero. More interesting is the opposite case of the maximal joint PAF. Here, the joint PAF is equal to the sum of the single PAFs, which means that no interaction in public health contexts between X1 and X2 exists (9, 10). As we can see in the last row of table 1, the additivity of attributable fractions implies that both causes X1 and X2 cannot belong to a common sufficient cause. In other words, they act in different etiologic mechanisms and have no biologic interaction (2). Thus, the two concepts of public health and biologic interaction have the same referent point corresponding to PAF(X1, X2) = 0.8.

Equations 2 can be generalized to the case of more than two component causes. Let k be the number of known component causes; then 2^k classes of sufficient causes can be distinguished.
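The two-factor relations above can be sketched in a few lines of Python. This is only an illustration of equations 1 and 2, not the authors' software (the paper reports using SAS); the function names `pdc_two_factors` and `paf_from_pdc` and the tolerance argument are assumptions introduced here. The admissible range check encodes the stated assumption that neither component is protective.

```python
def pdc_two_factors(paf_x1, paf_x2, paf_joint, tol=1e-12):
    """Equations 2: PDC of classes S1, S2, S12, S0 from single and joint PAFs."""
    lower = max(paf_x1, paf_x2)        # joint PAF cannot be below the larger single PAF
    upper = min(paf_x1 + paf_x2, 1.0)  # nor above the sum of the single PAFs (or above 1)
    if not (lower - tol <= paf_joint <= upper + tol):
        raise ValueError("joint PAF outside the admissible range")
    return {
        "S1": paf_joint - paf_x2,
        "S2": paf_joint - paf_x1,
        "S12": paf_x1 + paf_x2 - paf_joint,
        "S0": 1.0 - paf_joint,
    }


def paf_from_pdc(pdc):
    """Equations 1: recover the single and joint PAFs from the four PDC values."""
    return {
        "X1": pdc["S1"] + pdc["S12"],
        "X2": pdc["S2"] + pdc["S12"],
        "X1,X2": pdc["S1"] + pdc["S2"] + pdc["S12"],
    }


if __name__ == "__main__":
    # Same inputs as the fictitious example: PAF(X1) = 0.5, PAF(X2) = 0.3.
    for joint in (0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8):
        row = pdc_two_factors(0.5, 0.3, joint)
        print(joint, {name: round(value, 2) for name, value in row.items()})
```

Running the loop with the inputs of the fictitious example should give the rows of table 1, and the four PDC values always sum to 1 because every sufficient cause belongs to exactly one class.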
To calculate the PDC for all 2^k classes from the PAF of the component causes, the same number of equations is necessary. These 2^k equations are derived in appendix 1.

TABLE 1. Relation between the population attributable fraction of component causes and the frequency distribution of sufficient causes in a fictitious example

  Population attributable fraction     Frequency of classes of sufficient causes*
  X1      X2      X1, X2               S1†     S2      S12     S0
  0.5     0.3     0.5                  0.2     0       0.3     0.5
  0.5     0.3     0.55                 0.25    0.05    0.25    0.45
  0.5     0.3     0.6                  0.3     0.1     0.2     0.4
  0.5     0.3     0.65                 0.35    0.15    0.15    0.35
  0.5     0.3     0.7                  0.4     0.2     0.1     0.3
  0.5     0.3     0.75                 0.45    0.25    0.05    0.25
  0.5     0.3     0.8                  0.5     0.3     0       0.2

* Frequencies were calculated from the population attributable fractions by use of equations 2 given in the paper.
† S1 is the class of sufficient causes containing X1 and not X2 as component cause, S2 contains X2 and not X1, S12 contains both X1 and X2, and S0 contains neither X1 nor X2.

EVALUATING SUFFICIENT CAUSES

Up to now, we have only a vague impression of PDC as the proportion of cases that is due to a specific class of sufficient causes. Indeed, we need a precise definition of this notion by using the terminology of probability theory. For this purpose, consider the class of sufficient causes that contains, from the k known factors X1, ..., Xk, only the first m ones as component causes. We denote this multifactorial event by $E = (X_1 = 1, \ldots, X_m = 1, X_{m+1} = 0, \ldots, X_k = 0)$ and the corresponding class of sufficient causes by $S_E$. Obviously, the PDC of $S_E$ cannot be larger than the proportion of cases that had the event E at the onset of disease. Formally, the PDC is bounded above by $P(E \mid D)$, which denotes the conditional probability of the event E under the condition that the individual develops the disease D. However, within the stratum of cases that have the event E, a fraction of individuals would also have developed the disease if all known component causes of $S_E$ had been absent.
The remaining fraction in this stratum is equal to the proportion of cases that is due to $S_E$. Thus, we get the definition

$$
\mathrm{PDC}(S_E) = P(E \mid D)\,\frac{P(D \mid E) - P(D \mid E_0)}{P(D \mid E)}, \tag{3}
$$

where $E_0$ stands for the zero event (X1 = 0, ..., Xk = 0), whereas $P(D \mid E)$ and $P(D \mid E_0)$ are the disease rates in the strata defined by E and $E_0$, respectively. Throughout the paper, we assume that the disease rate among the individuals with event E exceeds the disease rate among the unexposed. Dividing the numerator and the denominator of the right-hand term by $P(D \mid E_0)$ yields the representation

$$
\mathrm{PDC}(S_E) = P(E \mid D)\,\frac{RR - 1}{RR}, \tag{4}
$$

with RR denoting the relative risk of the disease in individuals with event E versus individuals with $E_0$. Obviously, equation 4 is similar to the formula of Miettinen (8) for attributable fraction. Actually, the PDC of the class $S_E$ of sufficient causes is equal to the PAF of the multifactorial event E. By definition, the class of sufficient causes and the multifactorial event form an inseparable pair. The 2^k different combinations for presence and absence of the k known factors define the same number of strata in the sample and the same number of classes of sufficient causes.

It can be proven that the definition of PDC by equation 3 or equation 4 is consistent with the relations between PDC and PAF represented before. Actually, equation 4 gives a solution of the equation systems 1 and 2 and their generalization to the multicomponent case (appendix 2). Because 2^k equations for 2^k different PDC variables cannot have more than one solution for each PDC, the solution just given is the unique one.

Equation 4 is suitable for obtaining a valid estimate of PDC from any valid estimate of RR. Because equation 4 is the PAF formula of Miettinen for the event E, it also holds if adjustment is needed (2). The estimate of PDC can be adjusted for confounding factors by applying any one of the available statistical methods (11–13).
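As a small numerical check of equations 3 and 4, the following Python sketch computes a PDC from the prevalence of an event among cases and a relative risk. The function names are assumptions introduced here, and the stratum-specific disease rates in the second call are hypothetical values chosen only so that their ratio is 5.4. The inputs 29.6 percent and 5.4 are the values the paper reports in table 2 for the combination of all four risk factors.

```python
def pdc_from_rr(prev_in_cases, rr):
    """Equation 4: PDC(S_E) = P(E|D) * (RR - 1) / RR."""
    if not 0.0 <= prev_in_cases <= 1.0:
        raise ValueError("P(E|D) must be a proportion between 0 and 1")
    if rr < 1.0:
        # The paper assumes the disease rate under E exceeds that under E0.
        raise ValueError("relative risk below 1 is outside the paper's assumption")
    return prev_in_cases * (rr - 1.0) / rr


def pdc_from_rates(prev_in_cases, rate_e, rate_e0):
    """Equation 3: the same quantity written with stratum-specific disease rates."""
    return prev_in_cases * (rate_e - rate_e0) / rate_e


if __name__ == "__main__":
    # Four-factor combination: P(E|D) = 0.296 and RR = 5.4 as reported in table 2.
    print(round(100 * pdc_from_rr(0.296, 5.4), 1))
    # Hypothetical rates with ratio 5.4; equation 3 gives the same value as equation 4.
    print(round(100 * pdc_from_rates(0.296, 0.054, 0.010), 1))
```

With these inputs, equation 4 gives about 24.1 percent, which agrees with the estimate the paper reports for this class. For other classes, small deviations from the published values are to be expected because the relative risks in table 2 are rounded.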
We chose the most flexible and general adjustment method based on RR estimates from a logistic regression model that includes all the confounders assessed in the study. Confidence limits for the estimated PDC can be derived from the theory of PAF by using an implicit delta method (14, 15), the delta method ignoring some variances and covariances (16, 17), the asymptotic variance formula for the maximum likelihood estimator (18), or a variance formula originally proposed for case-control studies (2, 19, 20). In this paper, we utilized the last one.

To differentiate between relevant and nonrelevant sufficient causes and to simultaneously estimate the PDC for all relevant classes of sufficient causes, we propose the following multistep procedure.

1. Define indicator variables (dichotomous variables) for all 2^k − 1 multifactorial events different from the zero event $E_0$.
2. Define all covariates that are planned for the analysis.
3. Starting with a model with all the covariates defined before, apply stepwise regression on the indicator variables only, to include all significant indicator variables in the final model.
4. Calculate the relative frequencies of the remaining significant multifactorial events within all cases.
5. Estimate the relative risks for the significant multifactorial events and insert these estimates into equation 4 to estimate the corresponding PDC.

This procedure includes statistical tests for the stratum-specific effects (step 3). A significant rejection of the hypothesis that the stratum-specific effect parameter is zero is equivalent to a significant rejection of the corresponding hypothesis RR = 1 and is a necessary condition for the significant rejection of the hypothesis PDC = 0. Thus, exclusion of nonsignificant indicator variables ignores only such classes of sufficient causes for which the confidence interval for PDC does contain zero. The subsequent statistical analysis was performed with SAS software (21).
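Step 1 of the procedure can be sketched as follows. This Python fragment only enumerates the 2^k − 1 multifactorial events and codes one indicator variable per event for a single subject; it does not attempt the stepwise regression of step 3, which the paper carried out in SAS. The function names and the dictionary layout of a subject record are assumptions made for illustration.

```python
from itertools import product


def multifactorial_events(factors):
    """All 2^k - 1 presence/absence patterns other than the zero event E0."""
    return [pattern for pattern in product((0, 1), repeat=len(factors)) if any(pattern)]


def indicator_row(subject, factors):
    """One 0/1 indicator per event; at most one indicator equals 1 for any subject."""
    profile = tuple(subject[name] for name in factors)
    return {event: int(event == profile) for event in multifactorial_events(factors)}


if __name__ == "__main__":
    factors = ("smoking", "hypertension", "obesity", "lack_of_exercise")
    print(len(multifactorial_events(factors)))  # number of indicator variables for k = 4
    # Hypothetical subject: smoker, no hypertension, obese, lacking exercise.
    subject = {"smoking": 1, "hypertension": 0, "obesity": 1, "lack_of_exercise": 1}
    row = indicator_row(subject, factors)
    print([event for event, flag in row.items() if flag])
```

Because the events are mutually exclusive, a subject in the zero event has all indicators equal to 0, which makes that stratum the referent category of the regression model.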
APPLICATION TO REAL DATA

Study population

The study population was the EPIC-Potsdam cohort, which is one of the two German cohorts contributing to the EPIC Study, a multicenter cohort study of diet and chronic diseases (22). A total of 27,548 subjects aged 35 to 65 years were recruited from the general population between 1994 and 1998 (23). Baseline examinations included anthropometric measurements, blood sampling, the completion of a food frequency questionnaire, and a personal interview on lifestyle habits and medical history. Follow-up questionnaires are sent to the study participants every 2–3 years.

All potential cases of incident myocardial infarction were identified by self-reports or death certificates. Cases were defined as participants who developed myocardial infarction according to International Classification of Diseases, Tenth Revision, codes I21.0–I21.9 (24). All incident cases of nonfatal or fatal myocardial infarction were verified by patients’ medical records or death certificates according to World Health Organization Monitoring of Trends and Determinants in Cardiovascular Disease (MONICA) criteria (25). Participants with missing follow-up status or prevalent myocardial infarction at baseline were excluded, leaving 26,972 participants (10,470 men and 16,502 women) for analyses. Among these, we identified 159 newly diagnosed cases of myocardial infarction (116 nonfatal and 43 fatal) that occurred between baseline and April 30, 2004. Mean follow-up was 4.6 years.

Risk factors

From the potentially modifiable risk factors associated with myocardial infarction (26–30), the following four were considered: smoking, hypertension, obesity, and lack of exercise. We classified individuals as at risk from smoking if they were current smokers or had quit smoking within the past 5 years.
Hypertension was defined as having a systolic blood pressure of 140 mmHg or more, having a diastolic blood pressure of 90 mmHg or more, taking antihypertensive medication, or self-reporting a hypertension diagnosis. According to previous studies (30), obesity was considered abdominal obesity and defined by a waist/hip ratio of 0.9 or more in men and 0.8 or more in women. Individuals were judged to have a lack of exercise if they engaged in sporting activities for less than 2 hours a week.

Estimated risks and frequencies

Indicator variables were defined for all 15 combinations of the four risk factors that include at least one present risk factor. The prevalence of each combination within cases and noncases is given in table 2. As expected, the percentage of individuals with three or four risk factors was markedly higher within cases than within noncases. Indeed, 78.1 percent (48.5 percent plus 29.6 percent) of cases had at least three risk factors, whereas the corresponding percentage within noncases was only 37.6 percent (30.0 percent plus 7.6 percent). The inverse relation was observed for individuals with none, one, or two risk factors.

Relative risks for the 15 combinations were estimated using a logistic regression model adjusted for age, sex, prevalence of diabetes, and history of dyslipidemia based on self-reports of a diagnosis or of taking cholesterol-lowering medication. The results are given in the last column of table 2. The relative risks for combinations with one or two present risk factors were not significantly different from one. In contrast, lack of exercise together with any two other risk factors was associated with a significantly elevated relative risk for myocardial infarction. The highest relative risk of 5.4 (95 percent confidence interval: 3.5, 8.5) was associated with the simultaneous presence of all four risk factors.

TABLE 2.
Prevalence and relative risk for combinations of risk factors for incident myocardial infarction in the European Prospective Investigation into Cancer and Nutrition-Potsdam Study (159 cases, 26,813 noncases), 2004

                                                                      Prevalence (%)      Relative risk
  Smoking†   Hypertension‡   Obesity§   Lack of exercise¶      Cases    Noncases      (95% confidence interval)*

  No risk factor
  Absent     Absent          Absent     Absent                   0.0       5.4         1

  One risk factor
  Present    Absent          Absent     Absent                   0.0       2.3         NS#
  Absent     Present         Absent     Absent                   2.5       2.3         NS
  Absent     Absent          Present    Absent                   0.0       2.8         NS
  Absent     Absent          Absent     Present                  2.5      16.5         NS
  Total                                                          5.0      23.9

  Two risk factors
  Present    Present         Absent     Absent                   0.0       0.7         NS
  Present    Absent          Present    Absent                   1.3       1.3         NS
  Present    Absent          Absent     Present                  3.1       6.4         NS
  Absent     Present         Present    Absent                   3.1       3.6         NS
  Absent     Present         Absent     Present                  3.1       9.4         NS
  Absent     Absent          Present    Present                  6.3      11.7         NS
  Total                                                         16.9      33.1

  Three risk factors
  Present    Present         Present    Absent                   1.9       1.3         NS
  Present    Present         Absent     Present                  4.4       2.6         4.9 (2.2, 11.1)
  Present    Absent          Present    Present                 12.6       5.6         4.3 (2.5, 7.4)
  Absent     Present         Present    Present                 29.6      20.5         2.0 (1.3, 3.1)
  Total                                                         48.5      30.0

  Four risk factors
  Present    Present         Present    Present                 29.6       7.6         5.4 (3.5, 8.5)

* Relative risk of the combination of risk factors, with the event of no present risk factor as the referent category.
† “Smoking” was defined as current smoking or stopping smoking within the past 5 years.
‡ “Hypertension” was defined as having a systolic blood pressure of ≥140 mmHg, a diastolic blood pressure of ≥90 mmHg, taking antihypertensive medication, or self-reporting a hypertension diagnosis.
§ “Obesity” was defined by a waist/hip ratio of ≥0.9 in men and ≥0.8 in women.
¶ “Lack of exercise” was defined as individuals engaging in sporting activities for less than 2 hours a week.
# NS, nonsignificant.
The indicator variable for the combination of risk factors was excluded by stepwise regression, starting with a model that includes age, sex, prevalence of diabetes, and history of dyslipidemia as covariates.

Following the procedure described before, PDC was estimated for all classes of sufficient causes that correspond to significant multifactorial events. The estimates and the 95 percent confidence intervals are given in figure 2. As expected from table 2, the class of sufficient causes containing all four risk factors as component causes was the most important one. The estimated proportion of disease due to this class was 24.1 percent. Somewhat surprisingly, the class of sufficient causes that requires the presence of hypertension, obesity, and lack of exercise ranked second with 14.6 percent of cases, although the corresponding relative risk was only 2.0. This high percentage can be explained by the high prevalence of the corresponding multifactorial event. The PDCs of the other two classes with three known component causes were 9.6 percent and 3.5 percent.

The PAF of the single risk factors can be estimated by summing up the PDC values of all sufficient causes that contain the risk factor as component cause. Thus, lack of exercise is the risk factor with the highest PAF of 51.8 percent, followed by obesity (48.3 percent), hypertension (42.2 percent), and smoking (37.2 percent).

FIGURE 2. Proportion of myocardial infarction (95% confidence intervals) due to classes of sufficient causes in the European Prospective Investigation into Cancer and Nutrition-Potsdam Study, 2004.

DISCUSSION

The concept of sufficient cause and component causes introduced by Rothman is a tempting approach to model causality. In contrast to a completely deterministic model, stochastic elements for treating unmeasured or unpredictable component causes can be included in the pie charts by unlabeled slices.
Thus, sufficient causes can be modeled without knowing the component causes completely. In this paper, we classified sufficient causes by multifactorial events that reflect specific combinations of known risk factors. Therefore, sufficient causes are not distinguishable if they belong to the same class. However, sufficient causes have different known causal components if they belong to different classes. Based on such a classification scheme, a statistical approach was described to estimate the proportion of disease that is due to a class of sufficient causes.

Although the PDC is defined as the attributable fraction of a multifactorial event, it clearly differs from the joint PAF of multiple factors because the known component causes of the sufficient causes are fixed. The estimated PDC of all possible classes allows a partition of cases into strata characterized by combinations of known risk factors, whereas PAF estimates refer to overlapping events in general. Therefore, the practical value of the presented estimation procedure consists in deriving more detailed and precise results on the associations between the presence of risk factors and development of disease. To illustrate this issue, consider abdominal obesity as a risk factor for myocardial infarction in the EPIC-Potsdam Study. A recommendation to lose weight would be addressed to all obese individuals if only the PAF estimates are given, whereas the PDC estimates allow restricting the recommendation to obese individuals lacking exercise who are smokers or have hypertension. For both strategies, the proportion of disease that can be potentially prevented is 48.3 percent. However, the efficiency of the second strategy is much higher, because only 33.7 percent of subjects have the described risk profile compared with 53.1 percent of individuals who are obese (obtained by summing the respective prevalences within noncases in table 2).
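The arithmetic behind this comparison can be retraced with a short Python sketch. It uses the four PDC estimates reported in the text and the noncase prevalences of the same four events from table 2. Which three-factor class carries 9.6 percent and which carries 3.5 percent is not stated explicitly in the text; the assignment below is an inference from the reported single-factor PAFs, and all names in the code are assumptions introduced for illustration.

```python
FACTORS = ("smoking", "hypertension", "obesity", "lack_of_exercise")

# Reported PDC estimates (percent), keyed by presence (1) or absence (0) of each factor.
# Assigning 9.6 and 3.5 to these two classes is inferred from the reported PAFs.
PDC = {
    (1, 1, 1, 1): 24.1,
    (0, 1, 1, 1): 14.6,
    (1, 0, 1, 1): 9.6,
    (1, 1, 0, 1): 3.5,
}

# Prevalence of the same events among noncases (percent), from table 2.
NONCASE_PREVALENCE = {
    (1, 1, 1, 1): 7.6,
    (0, 1, 1, 1): 20.5,
    (1, 0, 1, 1): 5.6,
    (1, 1, 0, 1): 2.6,
}


def paf_of_factor(name):
    """PAF of one risk factor: sum of PDC over the classes that contain it."""
    index = FACTORS.index(name)
    return sum(value for event, value in PDC.items() if event[index])


def targeted_share(name):
    """Share of noncases in the significant events that contain the factor."""
    index = FACTORS.index(name)
    return sum(value for event, value in NONCASE_PREVALENCE.items() if event[index])


if __name__ == "__main__":
    for name in FACTORS:
        print(name, round(paf_of_factor(name), 1))
    print("targeted obese subgroup:", round(targeted_share("obesity"), 1))
```

Under this assignment, the sums reproduce the reported PAFs of 51.8, 48.3, 42.2, and 37.2 percent, and the targeted obese subgroup covers 33.7 percent of noncases.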
Some limitations of the described estimation procedure should be noted. First, the simultaneous consideration of many risk factors and the differentiation among many classes of sufficient causes require a high number of cases. If the sample size and especially the number of cases are too low, stepwise regression will exclude many indicator variables, and the estimated PDC of some important classes of sufficient causes may not be significantly different from zero. This limitation was apparent in the EPIC-Potsdam Study of incident myocardial infarction, in which all combinations with one and two risk factors present and one combination with three risk factors present were excluded by stepwise regression. Large studies, such as the INTERHEART study (30) with about 15,000 cases, do not encounter this problem and should show a higher number of important classes of sufficient causes. Second, the estimation procedure for PDC supposes dichotomous risk factors. If originally continuous variables are dichotomized, the cutoffs chosen may be arbitrary, and a change of cutoffs may lead to different results. Although it is possible to include different possible cutoffs by different types of sufficient causes, a conclusive strategy to handle this problem is not available.

Different concepts and interpretations of PAF have been proposed and discussed over the last 50 years (2, 31, 32). The concept used in our paper is that of “excess fraction,” defined as the fraction of cases that would not have occurred if the exposure (risk factor) had not occurred. The advantage of this concept is that an excess fraction can be validly estimated by the formulas of Miettinen (8) and Bruzzi et al. (33) by inserting an adjusted odds ratio from logistic regression, provided that there are no unobserved confounders and disease is rare (2).
Because we have defined the PDC of a specific class $S_E$ as the PAF of the corresponding multifactorial event E, it can be interpreted as the excess fraction of cases. Clearly, from the standpoint of biology, the fraction of cases that are etiologically attributable to $S_E$ seems to be more relevant than the fraction of excess cases. Unfortunately, one cannot estimate the etiologic fraction without resorting to very strong biologic assumptions. The etiologic fraction is larger than the excess fraction (2, 31), because some cases caused by $S_E$ would still have become cases within the considered time period had E never occurred and, therefore, do not belong to the excess fraction. On the other hand, because the etiologic fraction attributable to $S_E$ cannot be larger than the proportion of cases with the event E, the current approach gives lower and upper bounds for the unknown etiologic fraction. In our example of myocardial infarction, for the class of sufficient causes with all four known risk factors as component causes, the etiologic fraction cannot be smaller than 24.1 percent (figure 2) and not larger than 29.6 percent (table 2).

Besides sufficient-component cause models, three other major types of causal models have been applied in health-sciences research: causal diagrams (34, 35), potential-outcome models (36, 37), and structural-equations models (38). Whereas the last two models provide a basis for quantitative analysis of effects, Rothman’s model of sufficient and component causes was considered only to illustrate the specific hypotheses about mechanisms of action (39). The extension of Rothman’s work presented may be helpful for quantitative analyses and for using the sufficient-component cause model beyond teaching examples.
Furthermore, because a class of sufficient causes is characterized by a specific combination of present and absent risk factors, recommendations aimed at reducing the risk of disease can be tailored to subgroups of individuals with the same risk profile. An application of the method to a study of myocardial infarction demonstrated the possible public health benefit of quantifying the importance of different sufficient causes.

ACKNOWLEDGMENTS

Conflict of interest: none declared.

REFERENCES

1. Rothman KJ. Causes. Am J Epidemiol 1976;104:587–92.
2. Rothman KJ, Greenland S. Modern epidemiology. 2nd ed. Philadelphia, PA: Lippincott-Raven, 1998.
3. Karhausen LR. Causation in epidemiology: a Socratic dialogue: Plato. Int J Epidemiol 2001;30:704–6.
4. Parascandola M, Weed DL. Causation in epidemiology. J Epidemiol Community Health 2001;55:905–12.
5. Reiber GE, Vileikyte L, Boyko EJ, et al. Causal pathways for incident lower-extremity ulcers in patients with diabetes from two settings. Diabetes Care 1999;22:157–62.
6. Poole C. Commentary: positivized epidemiology and the model of sufficient and component causes. Int J Epidemiol 2001;30:707–9.
7. Levin ML. The occurrence of lung cancer in man. Acta Unio Int Contra Cancrum 1953;9:531–41.
8. Miettinen OS. Proportion of disease caused or prevented by a given exposure, trait, or intervention. Am J Epidemiol 1974;99:325–32.
9. Blot WJ, Day NE. Synergism and interaction: are they equivalent? Am J Epidemiol 1979;110:99–100.
10. Rothman KJ, Greenland S, Walker AM. Concepts of interaction. Am J Epidemiol 1980;112:467–70.
11. Gefeller O. Comparison of adjusted attributable risk estimators. Stat Med 1992;11:2083–91.
12. Benichou J. Methods of adjustment for estimating the attributable risk in case-control studies: a review. Stat Med 1991;10:1753–73.
13. Benichou J. A review of adjusted estimators of attributable risk. Stat Methods Med Res 2001;10:195–216.
14. Benichou J, Gail MH. Variance calculations and confidence intervals for estimates of the attributable risk based on logistic models. Biometrics 1990;46:991–1003.
15. Benichou J, Chow WH, McLaughlin JK, et al. Population attributable risk of renal cell cancer in Minnesota. Am J Epidemiol 1998;148:424–30.
16. Wilson PD, Loffredo CA, Correa-Villasenor A, et al. Attributable fraction for cardiac malformations. Am J Epidemiol 1998;148:414–23.
17. Platz EA, Willett WC, Colditz GA, et al. Proportion of colon cancer risk that might be preventable in a cohort of middle-aged US men. Cancer Causes Control 2000;11:579–88.
18. Greenland S, Drescher K. Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics 1993;49:865–72.
19. Whittemore AS. Statistical methods for estimating attributable risk from retrospective data. Stat Med 1982;1:229–43.
20. Whittemore AS. Estimating attributable risk from case-control studies. Am J Epidemiol 1983;117:76–85.
21. SAS Institute, Inc. SAS/STAT user’s guide, version 8. Cary, NC: SAS Institute, Inc, 1999.
22. Riboli E, Kaaks R. The EPIC Project: rationale and study design. Int J Epidemiol 1997;26(suppl 1):S6–14.
23. Boeing H, Korfmann A, Bergmann MM. Recruitment procedures of EPIC-Germany. Ann Nutr Metab 1999;43:205–15.
24. World Health Organization. International statistical classification of diseases and related health problems. Geneva, Switzerland: World Health Organization, 1992.
25. Tunstall-Pedoe H, Kuulasmaa K, Amouyel P, et al. Myocardial infarction and coronary deaths in the World Health Organization MONICA Project. Registration procedures, event rates, and case-fatality rates in 38 populations from 21 countries in four continents. Circulation 1994;90:583–612.
26. Grundy SM, Pasternak R, Greenland P, et al. Assessment of cardiovascular risk by use of multiple-risk-factor assessment equations: a statement for healthcare professionals from the American Heart Association and the American College of Cardiology. Circulation 1999;100:1481–92.
27. Greenland P, Knoll MD, Stamler J, et al. Major risk factors as antecedents of fatal and nonfatal coronary heart disease events. JAMA 2003;290:891–7.
28. Poirier P, Eckel RH. Obesity and cardiovascular disease. Curr Atheroscler Rep 2002;4:448–53.
29. Thompson PD, Buchner D, Pina IL, et al. Exercise and physical activity in the prevention and treatment of atherosclerotic cardiovascular disease: a statement from the Council on Clinical Cardiology (Subcommittee on Exercise, Rehabilitation, and Prevention) and the Council on Nutrition, Physical Activity, and Metabolism (Subcommittee on Physical Activity). Circulation 2003;107:3109–16.
30. Yusuf S, Hawken S, Ounpuu S, et al. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. Lancet 2004;364:937–52.
31. Greenland S, Robins JM. Conceptual problems in the definition and interpretation of attributable fractions. Am J Epidemiol 1988;128:1185–97.
32. Rockhill B, Newman B, Weinberg C. Use and misuse of population attributable fractions. Am J Public Health 1998;88:15–19.
33. Bruzzi P, Green SB, Byar DP, et al. Estimating the population attributable risk for multiple risk factors using case-control data. Am J Epidemiol 1985;122:904–14.
34. Pearl J. Causal diagrams for empirical research (with discussion). Biometrika 1995;82:669–710.
35. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37–48.
36. Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology 1992;3:143–55.
37. Little RJ, Rubin DB. Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annu Rev Public Health 2000;21:121–45.
38. Pearl J. Causality. New York, NY: Cambridge University Press, 2000.
39. Greenland S, Brumback B.
An overview of relations among causal modeling methods. Int J Epidemiol 2002;31:1030–7.

APPENDIX 1

Equation System for PDC in the Multicomponent Case

This appendix provides the equations for calculating the frequency distribution for sufficient causes from the single and joint PAF of component causes. Let $X_1, \ldots, X_k$ be known risk factors of disease, let $I$ be a subset of $\{1, \ldots, k\}$, and let $S_I$ be the class of sufficient causes that all contain $X_i$, $i \in I$, as component causes but that do not contain $X_i$ for $i \notin I$. Obviously, the joint PAF of $X_1, \ldots, X_k$ must be greater than or equal to the joint PAF of $X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_k$. The difference between these two PAFs is equal to the fraction of cases that is due to the class $S_i$ of sufficient causes containing $X_i$ as the only known component cause. Thus, the following equations hold.

\[
\mathrm{PDC}(S_i) = \mathrm{PAF}(X_1, \ldots, X_k) - \mathrm{PAF}(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_k). \tag{A1}
\]

Now consider any permutation $X_{(1)}, \ldots, X_{(k)}$ of the risk factors and let $X_{(m+1)}, \ldots, X_{(k)}$ be the last $k - m$ factors in this arrangement. Then, the difference between the PAF of $X_1, \ldots, X_k$ and the PAF of $X_{(m+1)}, \ldots, X_{(k)}$ can be attributed to sufficient causes containing only risk factors as known component causes that have to belong to $\{X_{(1)}, \ldots, X_{(m)}\}$. Therefore, we obtain the following relations:

\[
\sum_{I \subseteq \{(1), \ldots, (m)\}} \mathrm{PDC}(S_I) = \mathrm{PAF}(X_1, \ldots, X_k) - \mathrm{PAF}(X_{(m+1)}, \ldots, X_{(k)}), \tag{A2}
\]

where the sum on the left-hand side runs over all subsets of $\{(1), \ldots, (m)\}$. If we isolate the class of sufficient causes with the largest number of known component causes, equation A2 can be rewritten as

\[
\mathrm{PDC}(S_{(1), \ldots, (m)}) = \mathrm{PAF}(X_1, \ldots, X_k) - \mathrm{PAF}(X_{(m+1)}, \ldots, X_{(k)}) - \sum_{I \subset \{(1), \ldots, (m)\}} \mathrm{PDC}(S_I), \tag{A3}
\]

where the sum on the right-hand side runs over all proper subsets of $\{(1), \ldots, (m)\}$. In the special case $m = k$, the second term on the right-hand side of equation A3 must be set equal to zero. The recursive equations A3, together with the initial equations A1, allow the calculation of almost all frequencies of sufficient causes. The remaining frequency refers to $S_0$ that can be calculated by the simple additional formula:

\[
\mathrm{PDC}(S_0) = 1 - \mathrm{PAF}(X_1, \ldots, X_k). \tag{A4}
\]

The equations A1, A3, and A4 together form a system of $2^k$ equations appropriate to determine the frequencies of the $2^k$ classes of sufficient causes.

APPENDIX 2

Solution of the Equation System in the Multicomponent Case

Again, let $X_{(1)}, \ldots, X_{(k)}$ be any permutation of the risk factors $X_1, \ldots, X_k$, and let $S_I$, $I \subseteq \{1, \ldots, k\}$, be the class of sufficient causes that contains all $X_i$, $i \in I$, as component causes but that does not contain $X_i$ for $i \notin I$. Then, in the generalization of equation 1, the joint PAF of the first $m$ risk factors $X_{(1)}, \ldots, X_{(m)}$, $m \le k$, can be obtained by summing up the PDC values of all sufficient causes that contain at least one of the $m$ risk factors as the component cause. Thus, we have the following equation:

\[
\mathrm{PAF}(X_{(1)}, \ldots, X_{(m)}) = \sum_{I \subseteq \{(1), \ldots, (m)\}} \mathrm{PDC}(S_I). \tag{A5}
\]

On the other hand, applying the formula of Bruzzi et al. (33) that was formally proven by Benichou (12), the PAF of $X_{(1)}, \ldots, X_{(m)}$ is equal to

\[
\mathrm{PAF}(X_{(1)}, \ldots, X_{(m)}) = \sum_{I \subseteq \{(1), \ldots, (m)\}} P(D \mid E_I)\, \frac{RR_I - 1}{RR_I}, \tag{A6}
\]

where $E_I$ is the multifactorial event with $X_i = 1$ for $i \in I$ and $X_i = 0$ for $i \notin I$, and where $RR_I$ is the relative risk of $E_I$ versus $E_0$. Thus, PDC defined by equation 4 is a solution of the equation system A5. Since equation A5 is equivalent to equations A1 and A2, equation 4 also represents the solution for the latter ones.
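
The recursion described by equations A1, A3, and A4 of Appendix 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `pdc_from_paf`, the callable `paf`, and all numbers are hypothetical. The sketch assumes that the joint PAF of any subset of risk factors has already been estimated elsewhere, and that the sums in equations A2 and A3 range over nonempty subsets only, so that the class $S_0$ is handled solely by equation A4.

```python
from itertools import combinations


def pdc_from_paf(k, paf):
    """Solve the recursion A1/A3/A4 for all 2**k classes of sufficient causes.

    k   : number of known risk factors, indexed 0..k-1 (hypothetical indexing).
    paf : callable mapping a frozenset of factor indices to their joint PAF
          (assumed to be supplied by a separate estimation step).
    Returns a dict mapping each frozenset I to PDC(S_I).
    """
    all_factors = frozenset(range(k))
    paf_all = paf(all_factors)

    # Equation A4: class without any known component cause.
    pdc = {frozenset(): 1.0 - paf_all}

    # Work through subsets by increasing size, so that every nonempty
    # proper subset has already been solved when it is needed.
    for size in range(1, k + 1):
        for combo in combinations(range(k), size):
            subset = frozenset(combo)
            rest = all_factors - subset
            # Special case m = k: the second term is set to zero.
            paf_rest = paf(rest) if rest else 0.0
            # Assumption: only nonempty proper subsets enter the sum.
            lower = sum(
                pdc[frozenset(c)]
                for r in range(1, size)
                for c in combinations(combo, r)
            )
            # Equation A3 (reduces to equation A1 when size == 1).
            pdc[subset] = paf_all - paf_rest - lower
    return pdc


# Hypothetical two-factor illustration (numbers are not from the study).
true_pdc = {
    frozenset(): 0.4,
    frozenset({0}): 0.1,
    frozenset({1}): 0.2,
    frozenset({0, 1}): 0.3,
}


def example_paf(factors):
    # Joint PAF built by summing the PDC of every class that contains
    # at least one of the given factors (the verbal rule in Appendix 2).
    return sum(v for key, v in true_pdc.items() if key & factors)


result = pdc_from_paf(2, example_paf)
```

In this illustration the recursion recovers the four hypothetical PDC values from the three joint PAFs, up to floating-point rounding. Processing subsets in order of increasing size is one convenient way to guarantee that the right-hand side of equation A3 contains only quantities that are already known.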