Estimating the Proportion of Disease due to Classes of Sufficient

American Journal of Epidemiology
Copyright ª 2005 by the Johns Hopkins Bloomberg School of Public Health
All rights reserved; printed in U.S.A.
Vol. 163, No. 1
DOI: 10.1093/aje/kwj011
Advance Access publication November 17, 2005
Original Contribution
Estimating the Proportion of Disease due to Classes of Sufficient Causes
Kurt Hoffmann, Christin Heidemann, Cornelia Weikert, Matthias B. Schulze, and Heiner Boeing
From the Department of Epidemiology, German Institute of Human Nutrition, Potsdam-Rehbrücke, Germany.
Received for publication April 21, 2005; accepted for publication August 17, 2005.
Disease can be caused by different mechanisms. A possible causal model proposed by Rothman is a complete
causal mechanism or a so-called ‘‘sufficient cause’’ consisting of a set of component causes that can be illustrated in
a pie chart. However, this model does not allow finding out what sufficient causes produce the majority of cases. The
authors’ objective was to extend Rothman’s work by quantifying the proportion of disease that can be attributed to
a class of sufficient causes. The underlying idea was to consider all combinations of a given set of known risk factors
and to assign each combination to a class of sufficient causes. This assignment makes it possible to evaluate a class
of sufficient causes by the population attributable fraction of the corresponding combination of risk factors. The
approach presented was applied to sufficient causes of myocardial infarction by use of data on participants recruited
between 1994 and 1998 into the European Prospective Investigation into Cancer and Nutrition-Potsdam Study. As a
result, 51.8% of cases were attributed to only four different classes of sufficient causes. In conclusion, the statistical
method described in the paper may be beneficial for quantifying the importance of different sufficient causes and for
improving the efficiency of public health programs.
models, statistical; myocardial infarction; risk factors; statistics
Abbreviations: EPIC, European Prospective Investigation into Cancer and Nutrition; PAF, population attributable fraction;
PDC, proportion of disease due to a class of sufficient causes.
Rothman’s model of sufficient and component causes (1, 2)
introduced in 1976 is one of the most discussed causal models
in epidemiology. Despite some inherent limitations (3, 4),
this model seems to be appropriate to reflect the multiplicity
of causal pathways and the biologic interaction among component causes. The pie-chart description of possible classes
of sufficient causes is an illustrative method to differentiate
among distinct etiologic mechanisms, leaving some unlabeled slices to represent unknown component causes. However, up to now, Rothman’s model is only a theoretical
framework for epidemiology, with no direct connection to
empirical data. Although epidemiologic studies on a specific
disease should give information as to which of the possible
sufficient causes probably exist and which do not, no approach has been suggested to get quantitative results. In other
words, the application of the sufficient-component cause
model in epidemiologic research is hampered by lack of an
appropriate statistical method for estimating the proportion
of disease due to different sufficient causes. Thus, an extension of Rothman’s work is needed.
An obvious approach to find out the most relevant sufficient causes is based on calculating the frequencies of component causes and of their combinations within cases (5).
However, a high frequency of a specific combination of component causes within cases does not necessarily mean that
this combination is a sufficient cause of high relevance, because the corresponding frequency within noncases may
also be high. Moreover, this approach does not allow adjustment for other known factors and is not related to the common notions and concepts of epidemiology. In general, the
importance of sufficient causes depends on both the relative
risks and the frequencies of all possible component causes
Correspondence to Dr. Kurt Hoffmann, Department of Epidemiology, German Institute of Human Nutrition, Arthur-Scheunert-Allee 114–116,
14558 Nuthetal, Germany (e-mail: [email protected]).
76
Am J Epidemiol 2006;163:76–83
Classes of Sufficient Causes
and their combinations. Because the population attributable
fraction (PAF) already comprises the strength of effect and
the frequency of exposure, it is promising to look for a
complex relation between sufficient causes and PAF. Such
a relation should enable us to evaluate the importance of
sufficient causes by using the same epidemiologic data that
are useful for estimating PAF.
In the present paper, a system of equations is derived
and represented that relates adjusted PAF to the proportion
of disease due to a class of sufficient causes (PDC). Solving
these equations leads to an estimated frequency distribution for sufficient causes. Moreover, PDC itself can be interpreted as the PAF of a complex event that is characterized
by a specific combination of present and absent single risk
factors. Thus, estimation formulas and confidence intervals
proposed for PAF can be carried over to PDC. We applied
this approach to data from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam Study to
evaluate classes of sufficient causes for incident myocardial
infarction.
SUFFICIENT CAUSES AND POPULATION
ATTRIBUTABLE FRACTION
A ‘‘sufficient cause’’ is defined as a set of minimal conditions that inevitably produce disease (2). It consists of
a number of component causes. Each component cause is
necessary for the completion of the sufficient cause. The
completion of a sufficient cause is equivalent to the onset
of the earliest stage of disease. In other words, the component cause that acts last determines the time of the onset. If
one or more of the component causes are absent, the causal
mechanism cannot operate.
Two sufficient causes can be similar because of having
some of the same components. Differentiating between
these sufficient causes can possibly be difficult, since some
of the distinct component causes are often not known or not
measured. Leaving some pieces unlabeled in a pie chart is
not only a way to represent unknown component causes but
also a possibility to group similar sufficient causes. Thus,
any pie chart with an unlabeled piece can be considered
a class of sufficient causes. The common characteristic of
sufficient causes belonging to the same class is the same set
of known component causes. Formally, such classes can also
include sufficient causes with unpredictable components to
include random events (6).
For the sake of simplicity, we first regard the case of only
two known component causes, for example, X1 and X2.
Then, four different classes of sufficient causes can be distinguished as shown in figure 1. The class S1 comprises all
sufficient causes that have X1, but not X2, as the component
cause. The symbol U1 stands for one or more unknown
component causes. Analogously, S2 is defined as the class
of sufficient causes that contains X2 as the only known component. S12 stands for causal mechanisms that require the
presence of both X1 and X2. Finally, the class S0 comprises
all sufficient causes that produce disease in the absence of
X1 and X2. Obviously, any single sufficient cause belongs to
one and only one of the four classes depicted in figure 1.
Am J Epidemiol 2006;163:76–83
77
FIGURE 1. The four classes (S1, S2, S12, and S0) of sufficient
causes in the case of two known risk factors (X1 and X2). The U symbols stand for one or more unknown component causes.
Now, consider the PAF of the component causes X1 and
X2. The PAF of a single condition is defined here as the
fraction of cases that would not have occurred if the condition were not present (7). If stratification successfully removes confounding, the PAF can be validly estimated by the
formula of Miettinen (8) that requires estimates of the condition prevalence among cases and the risk ratio standardized to the condition (2, p. 295). Analogously, the joint PAF
of two conditions defined as the fraction of cases that would
not have occurred if both conditions were not present can
also be estimated by Miettinen’s formula. Denoting the proportion of disease due to a class S of sufficient causes by
PDC(S), the following four equations hold.
PAFðX1 Þ ¼ PDCðS1 Þ þ PDCðS12 Þ
PAFðX2 Þ ¼ PDCðS2 Þ þ PDCðS12 Þ
PAFðX1 ;X2 Þ ¼ PDCðS1 Þ þ PDCðS2 Þ þ PDCðS12 Þ
PAFðX1 ;X2 Þ ¼ 1 PDCðS0 Þ:
ð1Þ
The equations can be verified by the pie charts of figure 1.
First, consider the case that X1 has not occurred. Then,
sufficient causes of types S1 and S12 could not have been
completed, whereas the completion of all other sufficient
causes would not have been affected. Therefore, the fraction
of cases that can be attributed to X1 must be the sum of the
proportions of disease caused by S1 and S12 as stated by the
first equation. The second equation has a quite similar interpretation, but here X2 plays the role of X1. The remaining
two equations refer to the joint PAF of both component
causes. A generalization of these equations to the situation
of more than two known component causes is straightforward. In general, the joint PAF of any m conditions can be
calculated by summing up the PDC values of all sufficient
causes that contain at least one of the m conditions as component cause.
78
Hoffmann et al.
The equations 1 are not suitable to calculate the frequencies of sufficient causes from the single and joint PAF of
component causes. For this purpose, the equations 1 should
be rewritten in the following form:
PDCðS1 Þ ¼ PAFðX1 ; X2 Þ PAFðX2 Þ
PDCðS2 Þ ¼ PAFðX1 ; X2 Þ PAFðX1 Þ
PDCðS12 Þ ¼ PAFðX1 Þ þ PAFðX2 Þ PAFðX1 ;X2 Þ
PDCðS0 Þ ¼ 1 PAFðX1 ;X2 Þ:
ð2Þ
To get an impression of equations 2, we considered a very
strong component cause X1 with a PAF of 0.5 and a strong
component cause X2 with a PAF of 0.3. For simplicity, we
assumed that both components are risk factors and that the
presence of one of the components is not protective for
disease in any etiologic mechanism. Then, the joint PAF
cannot be smaller than the larger of each individual PAF
and cannot be larger than their sum. Therefore, in this example, the joint PAF can vary only between 0.5 and 0.8.
For different values of the joint PAF, table 1 gives the
frequency distribution for sufficient causes. Obviously, for
the increasing value of the joint PAF, the proportions of
S1 and S2 also increase, whereas the proportions of S12 and
S0 decrease. In the case of the minimal joint PAF, no sufficient cause that contains only the weaker component X2 can
exist. Thus, the proportion of disease due to S2 has to be
zero. More interesting is the opposite case of the maximal
joint PAF. Here, the joint PAF is equal to the sum of single
PAFs, which means that no interaction in public health contexts between X1 and X2 exists (9, 10). As we can see in the
last row of table 1, the additivity of attributable fractions
implies that both causes X1 and X2 cannot belong to a common sufficient cause. In other words, they act in different
etiologic mechanisms and have no biologic interaction (2).
Thus, the two concepts of public health and biologic interaction have the same referent point corresponding to
PAF(X1,X2) ¼ 0.8.
The equations 2 can be generalized to the case of more
than two component causes. Let k be the number of known
component causes, and then 2k classes of sufficient causes
can be distinguished. To calculate the PDC for all 2k classes
from the PAF of the component causes, the same number of
equations is necessary. These 2k equations are derived in
appendix 1.
EVALUATING SUFFICIENT CAUSES
Up to now, we have only a vague impression of PDC as
the proportion of cases that is due to a specific class of
sufficient causes. Indeed, we need a precise definition of
this notion by using the terminology of probability theory.
For this purpose, consider the class of sufficient causes that
contains from the k known factors X1, . . ., Xk only the first
m ones as component causes. We denote this multifactorial
event by E ¼ (X1 ¼ 1, . . ., Xm ¼ 1, Xmþ1 ¼ 0, . . ., Xk ¼ 0)
and the corresponding class of sufficient causes by SE. Obviously, the PDC of SE cannot be larger than the proportion of
cases that had the event E at the onset of disease. Formally,
the PDC is bounded above by P(EjD) that denotes the con-
TABLE 1. Relation between the population attributable
fraction of component causes and the frequency distribution
of sufficient causes in a fictive example
Population
attributable fraction
Frequency of classes of
sufficient causes*
X1
X2
X1, X2
S1y
0.5
0.3
0.5
0.2
0
0.3
0.5
0.5
0.3
0.55
0.25
0.05
0.25
0.45
0.5
0.3
0.6
0.3
0.1
0.2
0.4
0.5
0.3
0.65
0.35
0.15
0.15
0.35
0.5
0.3
0.7
0.4
0.2
0.1
0.3
0.5
0.3
0.75
0.45
0.25
0.05
0.25
0.5
0.3
0.8
0.5
0.3
0
0.2
S2
S12
S0
* Frequencies were calculated from the population attributable
fractions by use of equation 2 given in the paper.
y S1 is the class of sufficient causes containing X1 and not X2
as component cause, S2 contains X2 and not X1, S12 contains both
X1 and X2, and S0 contains neither X1 nor X2.
ditional probability of the event E under the condition that
the individual develops the disease D. However, within the
stratum of cases that have the event E, a fraction of individuals would also develop the disease if all known component
causes of SE were absent before. The remaining fraction in
this stratum is equal to the proportion of cases that is due to
SE. Thus, we get the definition,
PDCðSE Þ ¼ PðEjDÞ
PðDjEÞ PðDjE0 Þ
;
PðDjEÞ
ð3Þ
where E0 stands for the zero event (X1 ¼ 0, . . ., Xk ¼ 0),
whereas P(DjE) and P(DjE0) are the disease rates in the
strata defined by E and E0, respectively. Throughout the
paper, we assume that the disease rate among the individuals
with event E exceeds the disease rate among the unexposed.
Dividing the numerator and the denominator of the righthand term by P(DjE0) yields the representation,
RR 1
;
ð4Þ
RR
with RR denoting the relative risk of the disease in individuals with event E versus individuals with E0. Obviously,
equation 4 is similar to the formula of Miettinen (8) for
attributable fraction. Actually, the PDC of the class SE of
sufficient causes is equal to the PAF of the multifactorial
event E. By definition, the class of sufficient causes and the
multifactorial event form an inseparable pair. The 2k different combinations for presence and absence of the k known
factors define the same number of strata in the sample and
the same number of classes of sufficient causes.
It can be proven that the definition of PDC by equation
3 or equation 4 is consistent with the relations between PDC
and PAF represented before. Actually, equation 4 gives a
solution of the equation systems 1 and 2 and its generalization to the multicomponent case (appendix 2). Because 2k
equations for 2k different PDC variables cannot have more
than one solution for each PDC, the solution just given is the
unique one.
PDCðSE Þ ¼ PðEjDÞ
Am J Epidemiol 2006;163:76–83
Classes of Sufficient Causes
Equation 4 is suitable to get a valid estimate of PDC from
any valid estimate of RR. Because equation 4 is the PAF
formula of Miettinen for the event E, it holds also if adjustment is needed (2). The estimate of PDC can be adjusted for
confounding factors by applying any one of the available
statistical methods (11–13). We chose the most flexible and
general adjustment method based on RR estimates from
a logistic regression model that includes all the confounders
assessed in the study. Confidence limits for the estimated
PDC can be derived from the theory of PAF by using an
implicit delta method (14, 15), the delta method ignoring
some variances and covariances (16, 17), the asymptotic
variance formula for the maximum likelihood estimator
(18), or a variance formula originally proposed for casecontrol studies (2, 19, 20). In this paper, we utilized the
last one.
To differentiate between relevant and nonrelevant sufficient causes and to simultaneously estimate the PDC for all
relevant classes of sufficient causes, we propose the following multistep procedure.
1. Define indicator variables (dichotomous variables) for
all 2k 1 multifactorial events different from the zero
event E0.
2. Define all covariates that are planned for the analysis.
3. Starting with a model with all the covariates defined
before, apply stepwise regression on the indicator variables only, to include all significant indicator variables in
the final model.
4. Calculate the relative frequencies of the remaining
significant multifactorial events within all cases.
5. Estimate the relative risks for the significant multifactorial events and insert these estimates into equation 4 to
estimate the corresponding PDC.
This procedure includes statistical tests for the stratumspecific effects (step 3). A significant rejection of the hypothesis that the stratum-specific effect parameter is zero is
equivalent to a significant rejection of the corresponding
hypothesis RR ¼ 1 and is a necessary condition for the
significant rejection of the hypothesis PDC ¼ 0. Thus,
exclusion of nonsignificant indicator variables ignores
only such classes of sufficient causes for which the confidence interval for PDC does contain zero. The subsequent
statistical analysis was performed with SAS software (21).
APPLICATION TO REAL DATA
Study population
The study population was the EPIC-Potsdam cohort,
which is one of the two German cohorts contributing to
the EPIC Study, a multicenter cohort study into diet and
chronic diseases (22). A total of 27,548 subjects aged from
35 to 65 years were recruited from the general population
between 1994 and 1998 (23). Baseline examinations included anthropometric measurements, blood sampling, the
completion of a food frequency questionnaire, and a personal
interview on lifestyle habits and medical history. Follow-up
questionnaires are sent to the study participants every 2–3
years.
Am J Epidemiol 2006;163:76–83
79
All potential cases of incident myocardial infarction were
identified by self-reports or death certificates. Cases were
defined as participants who developed myocardial infarction
according to International Classification of Diseases, Tenth
Revision, codes I21.0–I21.9 (24). All incident cases of nonfatal or fatal myocardial infarction were verified by patients’
medical records or death certificates according to World
Health Organization Monitoring of Trends and Determinants in Cardiovascular Disease (MONICA) criteria (25).
Participants with missing follow-up status or prevalent myocardial infarction at baseline were excluded, leaving 26,972
participants (10,470 men and 16,502 women) for analyses.
Among these, we identified 159 newly diagnosed cases of
myocardial infarction (116 nonfatal and 43 fatal) that occurred between baseline and April 30, 2004. Mean followup was 4.6 years.
Risk factors
From the potentially modifiable risk factors associated
with myocardial infarction (26–30), the following four were
considered: smoking, hypertension, obesity, and lack of exercise. We defined individuals at risk from smoking if they
were current smokers or had quit smoking within the past 5
years. Hypertension was defined as having a systolic blood
pressure of 140 mmHg or more, having a diastolic blood
pressure of 90 mmHg or more, taking antihypertensive medication, or self-reporting a hypertension diagnosis. According
to previous studies (30), obesity was considered abdominal
obesity and defined by a waist/hip ratio of 0.9 or more in men
and 0.8 or more in women. Individuals were judged to have
a lack of exercise if they engaged in less than 2 hours a week
in sporting activities.
Estimated risks and frequencies
Indicator variables were defined for all 15 combinations
of the four risk factors that include at least one present risk
factor. The prevalence of each combination within cases and
noncases is given in table 2. As expected, the percentage of
individuals with three or four risk factors was markedly
higher within cases than within noncases. Indeed, 78.1 percent (48.5 percent plus 29.6 percent) of cases had at least
three risk factors, whereas the corresponding percentage
within noncases was only 37.6 percent (30.0 percent plus
7.6 percent). However, the inverse relation was observed for
individuals with none, one, or two risk factors.
Relative risks for the 15 combinations were estimated
using a logistic regression model adjusted for age, sex, prevalence of diabetes, and history of dyslipidemia based on self
reports of a diagnosis or of taking cholesterol-lowering
medication. The results are given in the last column of table
2. The relative risks for combinations with one or two present risk factors were not significantly different from one. In
contrast, lack of exercise together with any two other risk
factors was associated with a significantly elevated relative
risk for myocardial infarction. The highest relative risk of
5.4 (95 percent confidence interval: 3.5, 8.5) was associated
with the simultaneous presence of all four risk factors.
80
Hoffmann et al.
TABLE 2. Prevalence and relative risk for combinations of risk factors for incident myocardial infarction
in the European Prospective Investigation into Cancer and Nutrition-Potsdam Study (159 cases, 26,813
noncases), 2004
Combinations
Smokingy
Prevalence (%)
Hypertensionz
Obesity§
Lack of
exercise{
Absent
Absent
Absent
Cases
Noncases
Relative risk
(95% confidence interval)*
No risk factor
Absent
0.0
5.4
1
One risk factor
Present
Absent
Absent
Absent
0.0
2.3
NS#
Absent
Present
Absent
Absent
2.5
2.3
NS
Absent
Absent
Present
Absent
0.0
2.8
NS
Absent
Absent
Absent
Present
2.5
16.5
NS
5.0
23.9
Total
Two risk factors
Present
Present
Absent
Absent
0.0
0.7
NS
Present
Absent
Present
Absent
1.3
1.3
NS
Present
Absent
Absent
Present
3.1
6.4
NS
Absent
Present
Present
Absent
3.1
3.6
NS
Absent
Present
Absent
Present
3.1
9.4
NS
Absent
Absent
Present
Present
6.3
11.7
NS
16.9
33.1
Total
Three risk factors
Present
Present
Present
Absent
1.9
1.3
Present
Present
Absent
Present
4.4
2.6
4.9 (2.2, 11.1)
Present
Absent
Present
Present
12.6
5.6
4.3 (2.5, 7.4)
Absent
Present
Present
Present
29.6
20.5
2.0 (1.3, 3.1)
48.5
30.0
29.6
7.6
Total
NS
Four risk factors
Present
Present
Present
Present
5.4 (3.5, 8.5)
* Relative risk of the combination of risk factors, with the event of no present risk factor as the referent category.
y ‘‘Smoking’’ was defined as current smoking or stopping smoking within the past 5 years.
z ‘‘Hypertension’’ was defined as having a systolic blood pressure of 140 mmHg, a diastolic blood pressure of
90 mmHg, taking antihypertensive medication, or self-reporting a hypertension diagnosis.
§ ‘‘Obesity’’ was defined by a waist/hip ratio of 0.9 in men and 0.8 in women.
{ ‘‘Lack of exercise’’ was defined as individuals engaging in less than 2 hours a week in sporting activities.
# NS, nonsignificant. The indicator variable for the combination of risk factors was excluded by stepwise
regression, starting with a model that includes age, sex, prevalence of diabetes, and history of dislipidemia as
covariates.
Following the procedure described before, PDC was estimated for all classes of sufficient causes that correspond to
significant multifactorial events. The estimates and the 95
percent confidence intervals are given in figure 2. As expected from table 2, the class of sufficient causes containing
all four risk factors as component causes was the most
important one. The estimated proportion of disease due to
this class was 24.1 percent. Somewhat surprisingly, the class
of sufficient causes that requires the presence of hypertension, obesity, and lack of exercise ranked second with 14.6
percent of cases, although the corresponding relative risk
was only 2.0. This high percentage can be explained by
the high prevalence of the corresponding multifactorial
event. The PDCs of the other two classes with three known
component causes were 9.6 percent and 3.5 percent. The
PAF of the single risk factors can be estimated by summing
up the PDC values of all sufficient causes that contain the
risk factor as component cause. Thus, lack of exercise is the
risk factor with the highest PAF of 51.8 percent, followed
by obesity (48.3 percent), hypertension (42.2 percent), and
smoking (37.2 percent).
DISCUSSION
The concept of sufficient cause and component causes
introduced by Rothman is a tempting approach to model
causality. In contrast to a completely deterministic model,
stochastic elements for treating unmeasured or unpredictable
Am J Epidemiol 2006;163:76–83
Classes of Sufficient Causes
81
FIGURE 2. Proportion of myocardial infarction (95% confidence intervals) due to classes of sufficient causes in the European Prospective
Investigation into Cancer and Nutrition-Potsdam Study, 2004.
component causes can be included in the pie charts by unlabeled slices. Thus, sufficient causes can be modeled without knowing the component causes completely. In this
paper, we classified sufficient causes by multifactorial events
that reflect specific combinations of known risk factors.
Therefore, sufficient causes are not distinguishable if they
belong to the same class. However, sufficient causes have
different known causal components if they belong to different classes. Based on such a classification scheme, a statistical approach was described to estimate the proportion of
disease that is due to a class of sufficient causes.
Although the PDC is defined as attributable fraction of
a multifactorial event, it clearly differs from the joint PAF
of multiple factors because the known component causes of
the sufficient causes are fixed. The estimated PDC of all
possible classes allows a partition of cases into strata characterized by combinations of known risk factors, whereas
PAF estimates refer to overlapping events in general. Therefore, the practical value of the presented estimation procedure consists in deriving more detailed and precise results
on the associations between the presence of risk factors and
development of disease. To illustrate this issue, consider
abdominal obesity as a risk factor for myocardial infarction
in the EPIC-Potsdam Study. A recommendation to lose
weight would be addressed to all obese individuals if only
the PAF estimates are given, whereas the PDC estimates
allow restricting the recommendation to obese individuals
lacking exercise who are smokers or have hypertension. For
both strategies, the proportion of disease that can be potentially prevented is 48.3 percent. However, the efficiency of
the second strategy is much higher, because only 33.7 percent of subjects have the described risk profile compared
with 53.1 percent of individuals who are obese (refer to table 2
and summing up respective prevalence within noncases).
Am J Epidemiol 2006;163:76–83
Some limitations for the described estimation procedure
should be noted. First, the simultaneous consideration of a
lot of risk factors and the differentiation among many classes of sufficient causes require a high number of cases. If the
sample size and especially the number of cases are too low,
stepwise regression will exclude many indicator variables,
and the estimated PDC of some important classes of sufficient causes may not be significantly different from zero.
This limitation was apparent in the EPIC-Potsdam Study of
incident myocardial infarction in which all combinations
with one and two risk factors present and one combination
with three risk factors present were excluded by stepwise
regression. Large studies, such as the INTERHEART study
(30) with about 15,000 cases, do not encounter this problem
and should show a higher number of important classes of
sufficient causes. Second, the estimation procedure for PDC
supposes dichotomous risk factors. If originally continuous
variables are dichotomized, the cutoffs chosen may be arbitrary, and change of cutoffs may lead to different results.
Although it is possible to include different possible cutoffs
by different types of sufficient causes, a conclusive strategy
to handle this problem is not available.
Different concepts and interpretations of PAF have been
proposed and discussed over the last 50 years (2, 31, 32).
The concept used in our paper is that of ‘‘excess fraction,’’
defined as the fraction of cases that would not have occurred
if the exposure (risk factor) had not occurred. The advantage
of this concept is that an excess fraction can be validly
estimated by the formulas of Miettinen (8) and Bruzzi
et al. (33) by inserting an adjusted odds ratio from logistic
regression, provided that there are no unobserved confounders and disease is rare (2). Because we have defined the PDC
of a specific class SE as the PAF of the corresponding multifactorial event E, it can be interpreted as the excess
82
Hoffmann et al.
fraction of cases. Clearly, from the standpoint of biology, the
fraction of cases that are etiologically attributable to SE
seems to be more relevant than the fraction of excess cases.
Unfortunately, one cannot estimate the etiologic fraction
without resorting to very strong biologic assumptions. The
etiologic fraction is larger than the excess faction (2, 31),
because some cases caused by SE still have become cases
within the considered time period had E never occurred and,
therefore, do not belong to the excess fraction. On the other
hand, because the etiologic fraction attributable to SE cannot
be larger than the proportion of cases with the event E, the
current approach gives lower and upper bounds for the unknown etiologic fraction. In our example of myocardial infarction, for the class of sufficient causes with all four
known risk factors as component causes, the etiologic fraction cannot be smaller than 24.1 percent (figure 2) and not
larger than 29.6 percent (table 2).
Besides sufficient-component cause models, three other
major types of causal models have been applied in healthsciences research: causal diagrams (34, 35), potentialoutcome models (36, 37), and structural-equations models
(38). Whereas the last two models provide a basis for quantitative analysis of effects, Rothman’s model of sufficient
and component causes was considered only to illustrate the
specific hypotheses about mechanisms of action (39). The
extension of Rothman’s work presented may be helpful for
quantitative analyses and for using the sufficient-component
cause model beyond teaching examples. Furthermore, because a class of sufficient causes is characterized by a specific
combination of present and absent risk factors, recommendations aimed to reduce the risk of disease can be specified to
subgroups of individuals with the same risk profile. An application of the method to a study of myocardial infarction
demonstrated the possible public health benefit of quantifying the importance of different sufficient causes.
ACKNOWLEDGMENTS
Conflict of interest: none declared.
REFERENCES
1. Rothman KJ. Causes. Am J Epidemiol 1976;104:587–92.
2. Rothman KJ, Greenland S. Modern epidemiology. 2nd ed.
Philadelphia, PA: Lippincott-Raven, 1998.
3. Karhausen LR. Causation in epidemiology: a Socratic dialogue: Plato. Int J Epidemiol 2001;30:704–6.
4. Parascandola M, Weed DL. Causation in epidemiology.
J Epidemiol Community Health 2001;55:905–12.
5. Reiber GE, Vileikyte L, Boyko EJ, et al. Causal pathways
for incident lower-extremity ulcers in patients with diabetes
from two settings. Diabetes Care 1999;22:157–62.
6. Poole C. Commentary: positivized epidemiology and the
model of sufficient and component causes. Int J Epidemiol
2001;30:707–9.
7. Levin ML. The occurrence of lung cancer in man. Acta Unio
Int Contra Cancrum 1953;9:531–41.
8. Miettinen OS. Proportion of disease caused or prevented by
a given exposure, trait, or intervention. Am J Epidemiol 1974;
99:325–32.
9. Blot WJ, Day NE. Synergism and interaction: are they equivalent? Am J Epidemiol 1979;110:99–100.
10. Rothman KJ, Greenland S, Walker AM. Concepts of interaction. Am J Epidemiol 1980;112:467–70.
11. Gefeller O. Comparison of adjusted attributable risk estimators. Stat Med 1992;11:2083–91.
12. Benichou J. Methods of adjustment for estimating the attributable risk in case-control studies: a review. Stat Med 1991;10:
1753–73.
13. Benichou J. A review of adjusted estimators of attributable
risk. Stat Methods Med Res 2001;10:195–216.
14. Benichou J, Gail MH. Variance calculations and confidence
intervals for estimates of the attributable risk based on logistic
models. Biometrics 1990;46:991–1003.
15. Benichou J, Chow WH, McLaughlin JK, et al. Population
attributable risk of renal cell cancer in Minnesota. Am J Epidemiol 1998;148:424–30.
16. Wilson PD, Loffredo CA, Correa-Villasenor A, et al. Attributable fraction for cardiac malformations. Am J Epidemiol
1998;148:414–23.
17. Platz EA, Willett WC, Colditz GA, et al. Proportion of colon
cancer risk that might be preventable in a cohort of middleaged US men. Cancer Causes Control 2000;11:579–88.
18. Greenland S, Drescher K. Maximum likelihood estimation of
the attributable fraction from logistic models. Biometrics
1993;49:865–72.
19. Whittemore AS. Statistical methods for estimating attributable
risk from retrospective data. Stat Med 1982;1:229–43.
20. Whittemore AS. Estimating attributable risk from case-control
studies. Am J Epidemiol 1983;117:76–85.
21. SAS Institute, Inc. SAS/STAT user’s guide, version 8. Cary,
NC: SAS Institute, Inc, 1999.
22. Riboli E, Kaaks R. The EPIC Project: rationale and study
design. Int J Epidemiol 1997;26(suppl 1):S6–14.
23. Boeing H, Korfmann A, Bergmann MM. Recruitment procedures of EPIC-Germany. Ann Nutr Metab 1999;43:
205–15.
24. World Health Organization. International statistical classification of diseases and related health problems. Geneva,
Switzerland: World Health Organization, 1992.
25. Tunstall-Pedoe H, Kuulasmaa K, Amouyel P, et al. Myocardial
infarction and coronary deaths in the World Health Organization MONICA Project. Registration procedures, event rates,
and case-fatality rates in 38 populations from 21 countries in
four continents. Circulation 1994;90:583–612.
26. Grundy SM, Pasternak R, Greenland P, et al. Assessment of
cardiovascular risk by use of multiple-risk-factor assessment
equations: a statement for healthcare professionals from the
American Heart Association and the American College of
Cardiology. Circulation 1999;100:1481–92.
27. Greenland P, Knoll MD, Stamler J, et al. Major risk factors
as antecedents of fatal and nonfatal coronary heart disease
events. JAMA 2003;290:891–7.
28. Poirier P, Eckel RH. Obesity and cardiovascular disease. Curr
Atheroscler Rep 2002;4:448–53.
29. Thompson PD, Buchner D, Pina IL, et al. Exercise and
physical activity in the prevention and treatment of atherosclerotic cardiovascular disease: a statement from the Council
on Clinical Cardiology (Subcommittee on Exercise, Rehabilitation, and Prevention) and the Council on Nutrition, Physical
Activity, and Metabolism (Subcommittee on Physical Activity).
Circulation 2003;107:3109–16.
Am J Epidemiol 2006;163:76–83
Classes of Sufficient Causes
30. Yusuf S, Hawken S, Ounpuu S, et al. Effect of potentially
modifiable risk factors associated with myocardial infarction
in 52 countries (the INTERHEART study): case-control study.
Lancet 2004;364:937–52.
31. Greenland S, Robins JM. Conceptual problems in the definition
and interpretation of attributable fractions. Am J Epidemiol
1988;128:1185–97.
32. Rockhill B, Newman B, Weinberg C. Use and misuse of
population attributable fractions. Am J Public Health 1998;
88:15–19.
33. Bruzzi P, Green SB, Byar DP, et al. Estimating the population
attributable risk for multiple risk factors using case-control
data. Am J Epidemiol 1985;122:904–14.
34. Pearl J. Causal diagrams for empirical research (with discussion). Biometrika 1995;82:669–710.
35. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37–48.
36. Robins JM, Greenland S. Identifiability and exchangeability
for direct and indirect effects. Epidemiology 1992;3:143–55.
37. Little RJ, Rubin DB. Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical
approaches. Am Rev Public Health 2000;21:121–45.
38. Pearl J. Causality. New York, NY: Cambridge University
Press, 2000.
39. Greenland S, Brumback B. An overview of relations among
causal modeling methods. Int J Epidemiol 2002;31:1030–7.
where the sum on the left-hand side runs over all subsets of
f(1), . . ., (m)g. If we isolate the class of sufficient causes
with the largest number of known component causes, equation A2 can be rewritten as
PDCðSð1Þ;...;ðmÞ Þ ¼ PAFðX1 ; . . . ; Xk Þ
PAFðXðmþ1Þ ; . . . ; XðkÞ Þ
X
PDCðSI Þ;
where the sum on the right-hand side runs over all proper
subsets of f(1), . . ., (m)g. In the special case m ¼ k, the
second term on the right-hand side of equation A3 must be
set equal to zero. The recursive equations A3, together with
the initial equations A1, allow the calculation of almost all
frequencies of sufficient causes. The remaining frequency
refers to S0 that can be calculated by the simple additional
formula:
PDCðS0 Þ ¼ 1 PAFðX1 ; . . . ; Xk Þ:
Equation System for PDC in the
Multicomponent Case
Solution of the Equation System in the
Multicomponent Case
ðA1Þ
Now consider any permutation X(1), . . ., X(k) of the risk
factors and let X(mþ1), . . ., X(k) be the last k m factors in
this arrangement. Then, the difference between the PAF of
X1, . . ., Xk and the PAF of X(mþ1), . . ., X(k) can be attributed
to sufficient causes containing only risk factors as known
component causes that have to belong to fX(1), . . ., X(m)g.
Therefore, we obtain the following relations:
X
PDCðSI Þ ¼ PAFðX1 ; . . . ; Xk Þ
I4fð1Þ;...;ðmÞg
PAFðXðmþ1Þ ; . . . ; XðkÞ Þ;
Am J Epidemiol 2006;163:76–83
ðA2Þ
ðA4Þ
The equations A1, A3, and A4 together form a system of 2k
equations appropriate to determine the frequencies of the 2k
classes of sufficient causes.
APPENDIX 2
PDCðSi Þ ¼ PAFðX1 ; . . . ; Xk Þ
PAFðX1 ; . . . ; Xi1 ; Xiþ1 ; . . . ; Xk Þ:
ðA3Þ
Ifð1Þ;...;ðmÞg
APPENDIX 1
This appendix provides the equations for calculating the
frequency distribution for sufficient causes from the single
and joint PAF of component causes. Let X1, . . ., Xk be known
risk factors of disease, let I be a subset of f1, . . ., kg, and let
SI be the class of sufficient causes that all contain Xi, i 2 I, as
component causes but that do not contain Xi for i ; I.
Obviously, the joint PAF of X1, . . ., Xk must be greater
than or equal to the joint PAF of X1, . . ., Xi1, Xiþ1, . . ., Xk.
The difference between these two PAFs is equal to the fraction of cases that is due to the class Si of sufficient causes
containing Xi as the only known component cause. Thus, the
following equations hold.
83
Again, let X(1), . . ., X(k) be any permutation of the risk
factors X1, . . ., Xk, and let SI, I 4 f1, . . ., kg, be the class of
sufficient causes that contains all Xi, i 2 I, as component
causes but that does not contain Xi for i ; I. Then, in the
generalization of equation 1, the joint PAF of the first m risk
factors X(1), . . ., X(m), m k, can be obtained by summing up
the PDC values of all sufficient causes that contain at least
one of the m risk factors as the component cause. Thus, we
have the following equation:
X
PDCðSI Þ:
ðA5Þ
PAFðXð1Þ ; . . . ; XðmÞ Þ ¼
I4fð1Þ;...;ðmÞg
On the other hand, applying the formula of Bruzzi et al.
(33) that was formally proven by Benichou (12), the PAF
of X(1), . . ., X(m) is equal to
PAFðXð1Þ ; . . . ; XðmÞ Þ ¼
X
I4fð1Þ;...;ðmÞg
PðDjEI Þ
RRI 1
; ðA6Þ
RRI
where EI is the multifactorial event with Xi ¼ 1 for i 2 I
and Xi ¼ 0 for i ; I, and where RRI is the relative risk of
EI versus E0. Thus, PDC defined by equation 4 is a solution
of the equation system A5. Since equation A5 is equivalent
to equations A1 and A2, equation 4 also represents the solution for the latter ones.