Systematic evaluation of environmental and behavioural factors

Published by Oxford University Press on behalf of the International Epidemiological Association
© The Author 2013; all rights reserved.
International Journal of Epidemiology 2013;42:1795–1810
doi:10.1093/ije/dyt208
Systematic evaluation of environmental and
behavioural factors associated with all-cause
mortality in the United States National Health
and Nutrition Examination Survey
Chirag J Patel,1 David H Rehkopf,2 John T Leppert,3 Walter M Bortz,4 Mark R Cullen,2
Glenn M Chertow4 and John PA Ioannidis1*
1Stanford Prevention Research Center, Stanford University School of Medicine, CA, USA, 2Division of General Medical Disciplines,
Stanford University School of Medicine, CA, USA, 3Department of Urology, Stanford University School of Medicine, CA, USA and
4Division of Nephrology, Department of Medicine, Stanford University School of Medicine, CA, USA
*Corresponding author. Stanford Prevention Research Center, Stanford University School of Medicine, 1265 Welch Rd, Stanford
94305, CA, USA Email: [email protected]
Accepted
4 September 2013
Background Environmental and behavioural factors are thought to contribute to
all-cause mortality. Here, we develop a method to systematically
screen and validate the potential independent contributions to all-cause mortality of 249 environmental and behavioural factors in the
National Health and Nutrition Examination Survey (NHANES).
Methods
We used Cox proportional hazards regression to associate 249 factors
with all-cause mortality while adjusting for sociodemographic factors
on data in the 1999–2000 and 2001–02 surveys (median 5.5 follow-up
years). We controlled for multiple comparisons with the false discovery rate (FDR) and validated significant findings in the 2003–04
survey (median 2.8 follow-up years). We selected 249 factors from
a set of all possible factors based on their presence in both the 1999–
2002 and 2003–04 surveys and linkage with at least 20 deceased
participants. We evaluated the correlation pattern of validated factors
and built a multivariable model to identify their independent contribution to mortality.
Results
We identified seven environmental and behavioural factors associated
with all-cause mortality, including serum and urinary cadmium, serum
lycopene levels, smoking (3-level factor) and physical activity. In a
multivariable model, only physical activity, past smoking, smoking in
participant’s home and lycopene were independently associated with
mortality. These three factors explained 2.1% of the variance of all-cause
mortality after adjusting for demographic and socio-economic factors.
Conclusions Our association study suggests that, of the set of 249 factors in NHANES,
physical activity, smoking, serum lycopene and serum/urinary cadmium
are associated with all-cause mortality as identified in previous studies
and after controlling for multiple hypotheses and validation in an independent survey. Whereas other NHANES factors may be associated with
mortality, they may require larger cohorts with longer time of follow-up
to detect. It is possible to use a systematic association study to prioritize
risk factors for further investigation.
Keywords
All-cause mortality, exposure, behaviour, environment-wide association study
Introduction
Identification of environmental and behavioural factors associated with mortality is critical for public
health and preventive care. Many of these factors
may be possible to modify, as opposed to genetic
and demographic factors (age, sex, race/ethnicity)
that are impossible to change and socio-economic factors (e.g. income, education and occupation) that are
very difficult to change. McGinnis, Foege, Mokdad
et al. identify behavioural and environmental risk factors as ‘actual causes of deaths in the United States’,
requiring as much attention and response as standard
proximate clinical conditions.1,2 One way to ascertain
and compare environmental and behavioural risks for
mortality is to integrate data from national health
surveys linked with mortality registries.2,3 There is a
large body of literature on studies that try to identify
environmental factors and behaviours that may increase or decrease death risk. However, these studies
typically assess and report one or a few factors at a
time, and may lack systematic validation in independent datasets. Modern humans are now exposed to a
complex array of environmental and behavioural factors4,5 and in theory many behaviours may entail
health risks and benefits. However, there is a lack
of analytic strategies that aim to decipher concurrently how multiple environmental and behavioural
factors are associated with mortality. Further, potential environmental exposure and behavioural risk may
be modified or determined by demographic attributes,
such as sex, race/ethnicity and socio-economic status.
Lack of standardization in the analysis may lead to
inflated or spurious irreproducible effects.6,7 This is in
contrast to current-day genome-wide association studies (GWAS), a systematic analytic strategy to correlate millions of common genetic factors with disease
traits.8 These investigations have resulted in a robust
literature of genetic findings in contrast to environmentally- or behaviourally-based investigations.8
We have recently developed methods for environment-wide association study (EWAS), aiming to
search for and validate environmental factors
associated with disease and disease-related phenotypes.9–11 Here, we extend this methodology to systematically evaluate the associations of 249
environmental and behavioural factors, such as
blood and urine biomarkers of exposure (e.g. pollutants and nutrients), and behavioural factors (e.g.
physical activity, smoking and alcohol consumption),
with all-cause mortality. We analyze the association
of 249 factors with all-cause mortality using information collected from participants of the 1999–2002
United States National Health and Nutrition
Examination Survey (NHANES) with linked mortality
information ascertained by the National Death Index
(NDI) in 2006. We subsequently validate findings in
an independent survey, 2003–04 NHANES. Last, we
evaluate the correlation pattern between tentatively
validated factors and identify those that have
independent effects on all-cause mortality and how
these interplay with demographic and socio-economic
attributes.
Methods
NHANES 1999–2000, 2001–02 and 2003–04
We downloaded NHANES laboratory, questionnaire
and National Death Index (NDI) linked mortality
data for 1999–00, 2001–02 and 2003–04 surveys.
Mortality information was collected from the date of
the survey participation through 31 December 2006
and ascertained via a probabilistic match between
NHANES and NDI death certificate information. The
NDI matches individuals on personal and demographic criteria, such as social security number and
date of birth, and its performance has been described
elsewhere (e.g. ref 12). Overall, 9555, 11 021, and
10 100 participants were followed in the 1999–2000,
2001–02 and 2003–04 surveys, respectively, with 611,
470 and 276 assumed death events, respectively. We
used the 1999–2000 and 2001–02 surveys to scan for
factors associated with all-cause mortality (‘training’
dataset) and reserved the 2003–04 survey to replicate
findings from the training set.
Factors such as age, sex, race/ethnicity, educational
attainment, occupation and income are hypothesized
to be associated with both mortality and environmental/behavioural factors and we estimated their
association with mortality.13 Further, these sociodemographic factors may also confound associations of
environmental/behavioural factors with death. In
NHANES, race/ethnicity was coded as Non-Hispanic
White (‘White’), Mexican American (‘Mexican’),
Non-Hispanic Black (‘Black’), Other Hispanic and
Other. We coded educational attainment as less
than high school, high school equivalent and greater
than high school education. We estimated socio-economic status (SES) as the categorical quintile of
income/poverty index as previously described.9,10 We
estimated occupation in categories corresponding to
white-collar and professional (reference group),
white-collar and semi-routine (e.g. technicians),
blue-collar and high-skill (e.g. mechanics, construction
trades and military) and blue-collar and semi-routine
(e.g. personal services, farmworkers) as previously
described.14
Figure 1 Methodology to scan for environmental and behavioural factors associated with mortality. (A) Summary of
environmental and behavioural variables in three independent NHANES surveys (1999–2000, 2001–02, 2003–04). (B)
Training (combined 1999–2000 and 2001–02 surveys) and testing survey information. (C) Associating each of the 249 variables
with all-cause mortality (SES, socio-economic status estimate, quintile of income/poverty ratio). (D) Empirical false discovery rate (FDR) estimation in training surveys. (E) Proportional hazards assumption verification. (F) Tentative validation
(P <0.05 in testing surveys). (G) Estimation of variance explained by tentatively validated factors with independent contribution and interaction with demographic variables

Figure 1 depicts our procedure. We assessed a total
of 249 environmental and behavioural factors (see
Table 1 and Supplementary Table S1, available as
Supplementary data at IJE online). These factors
were either (i) information on behaviours, such as
self-reported dietary intake (from a food frequency
questionnaire), self-reported alcohol consumption,
self-reported smoking, body mass index (BMI) from
a physical examination or self-reported physical activity; or (ii) physical/chemical biomarkers of external
exposures measured in serum or urine, such as
blood lead concentration. Table 1 shows examples of
factors and Table S1 (available as Supplementary data
at IJE online) provides a listing of all factors. There
were a total of 416, 467 and 574 factors in the 1999–
2000, 2001–02 and 2003–04 surveys, respectively. Next,
from these 406, 457 and 564 factors, we identified a
total of 347 that were present in all three surveys. Of
these 347 factors, we found 249 that could be linked
with at least 20 deceased participants in the training
(1999–2000 and 2001–02 surveys) and testing (2003–
04) datasets independently (Figure 1A, B).
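The factor-selection rule described above (present in all three surveys, and linked with at least 20 deceased participants in the training and testing datasets independently) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the dictionary layout and the toy factor names and death counts are all hypothetical.

```python
def select_factors(factors_by_survey, deaths_train, deaths_test, min_deaths=20):
    """Keep factors measured in every survey and linked to enough deaths."""
    # Factors present in all surveys (set intersection).
    common = set.intersection(*(set(f) for f in factors_by_survey.values()))
    # Require at least `min_deaths` deceased participants in BOTH datasets.
    return sorted(
        f for f in common
        if deaths_train.get(f, 0) >= min_deaths
        and deaths_test.get(f, 0) >= min_deaths
    )


# Hypothetical toy inputs, not NHANES data.
surveys = {
    "1999-2000": ["cadmium", "lycopene", "lead"],
    "2001-02": ["cadmium", "lycopene", "lead", "pcb170"],
    "2003-04": ["cadmium", "lycopene", "pcb170"],
}
train_deaths = {"cadmium": 120, "lycopene": 15}
test_deaths = {"cadmium": 40, "lycopene": 30}
selected = select_factors(surveys, train_deaths, test_deaths)
```

In this toy example only one factor survives, because the other common factor has too few deaths in the training data.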
Table 1 Number and examples of environmental and behavioural factors

Factor category             No.   Examples
Behavioural factors:
Alcohol use                 3     Drink five per day (yes/no)?; Quantity drinks per day (ordinal)
Personal smoking            19    Current or past smoker (referent: no smoking); Smoke cigars 20 times in life (yes/no)?
Family smoking              4     Does anyone smoke in the home?; Total cigarette smokers in home (ordinal)
Cotinine                    1     Serum levels of nicotine metabolite (log and per 1 SD)
Physical activity           1     Health.gov guideline activity levels (ordinal)
Social support              3     Anyone to help (yes/no)?; First-degree support (yes/no)?
Street drug use             1     Ever used cocaine or street drugs (yes/no)?
Body mass index             4     <18.5 kg/m2, or ≥25 and <30 kg/m2, or ≥30 and <35 kg/m2, or ≥35 kg/m2 (referent: ≥18.5 and <25 kg/m2)
Food nutrient recall        58    Dietary nutrient intake levels derived from food frequency questionnaire (FFQ) (continuous and adjusted for caloric intake)
Environmental factors (serum- and urine-based):
Bacterial infection         2     MRSA 1 present (yes/no); S. aureus present (yes/no)
Viral infection             5     Hepatitis B antibody (yes/no); Hepatitis A antibody (yes/no)
Dialkyl                     6     Urinary dimethylphosphate (log and per 1 SD)
Dioxins                     7     2,3,7,8-tetrachlorodibenzodioxin (log and per 1 SD)
Furans                      10    2,3,7,8-tetrachlorodibenzofuran (log and per 1 SD)
Heavy metals                15    Urinary cadmium (log and per 1 SD); Serum cadmium (log and per 1 SD)
Hydrocarbons                21    Urinary 1-hydroxyfluorene (log and per 1 SD)
Nutrients and minerals      15    Serum folate (log and per 1 SD); Serum vitamin D (log and per 1 SD)
Polychlorinated biphenyls   34    Serum (polychlorinated biphenyls) PCB170 (log and per 1 SD)
Pesticides                  22    Serum heptachlor epoxide (log and per 1 SD)
Phthalates                  12    Urinary mono-n-butyl phthalate (log and per 1 SD)
Phytoestrogens              6     Urinary enterolactone (log and per 1 SD)
Total                       249

Behavioural factors included three surveying alcohol
consumption, one on ‘street drug’ use, 58 factors on
food and nutrient consumption, 23 on smoking-related behaviours [e.g. ‘current or past smoker?
(versus never smoker)’, ‘does anyone in your household smoke (yes/no)?’], one on physical activity and
three on social support (e.g. ‘have anyone to help?’,
‘how many close friends do you have?’). We discuss
these variables in the following. First, the three factors on alcohol consumption included five or more
drinks per day, number of drinks per day in last
month [z-standardized (divided by the population
standard deviation to facilitate comparison of effects)
ordinal factor] and how many total days drinking per
year (z-standardized ordinal factor).
The 23 smoking factors included four regarding
family smoking behaviour and 19 on personal smoking behaviour. The four family smoking behaviour
factors included any smokers in the household (referent group: no smokers in household), total number of
cigarette smokers in the household (z-standardized
ordinal factor) and the total number of cigarettes
smoked at home (z-standardized ordinal factor).
The 19 factors regarding personal behaviour included
a categorical factor on current or past smoking (analyzed as a two-level categorical factor with never
smoking as a referent) and four on ever-used cigars,
chewing tobacco, snuff and pipes (referent group:
never smoked the item). Specifically for current and
past smokers, factors included the number of cigarettes smoked just before quitting (z-standardized
ordinal factor), how many years smoked (z-standardized ordinal factor), number of cigarettes currently
smoking (z-standardized ordinal factor), the average
number of cigarettes smoked per day in the past
month (z-standardized ordinal factor) and an estimated nicotine, tar and carbon monoxide content of
smoked item (z-standardized ordinal factors). Other
factors for current smokers included years since
started smoking (z-standardized ordinal variable).
Physical activity was estimated by deriving metabolic
equivalents for self-reported leisure and normal-time
activities15 and treated as an ordinal factor based on
Health.gov physical activity guideline categories for no
aerobic activity, low activity (medium intensity activity
greater than baseline but fewer than 150 min/week),
moderate activity (150 to 300 medium intensity min/week)
and high activity (>300 min medium intensity
activity per week or >150 min high intensity per week)
as previously described.10,16
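The ordinal coding of physical activity described above can be sketched as a simple threshold rule. This is a simplified illustration under stated assumptions: it takes weekly minutes of medium- and high-intensity activity as inputs and does not reproduce the metabolic-equivalent derivation the authors used; the function name and the 0 to 3 coding are hypothetical.

```python
def activity_level(moderate_min, vigorous_min):
    """Map weekly activity minutes to an ordinal level.

    0 = no aerobic activity, 1 = low, 2 = moderate, 3 = high.
    Assumption: medium- and high-intensity minutes are checked separately
    and are not pooled into a single metabolic-equivalent score.
    """
    if moderate_min > 300 or vigorous_min > 150:
        return 3  # high activity
    if moderate_min >= 150:
        return 2  # moderate activity (150 to 300 min/week)
    if moderate_min > 0 or vigorous_min > 0:
        return 1  # some activity, but below 150 min/week
    return 0      # no aerobic activity


levels = [activity_level(m, v) for m, v in [(0, 0), (100, 0), (200, 0), (301, 0), (0, 151)]]
```

Treating the result as an ordinal factor means the model estimates a single trend across the four levels rather than a separate effect for each level.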
The 58 self-reported food and nutrient consumption
factors were determined from one in-person 24-h
interview (1999–2000, 2001–02) or two 24-h (2003–
04) in-person and telephone interviews using the
United States Department of Agriculture and
Department of Health and Human Services food
recall questionnaires.17–20 These food and nutrient
consumption factors were linearly adjusted by total
caloric intake and z-standardized.
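One common way to adjust a nutrient linearly for total caloric intake is the residual method: regress the nutrient on calories and keep the residuals, then z-standardize them. The sketch below assumes that method and ignores survey weights; the source does not specify the exact procedure, and the function name and toy values are hypothetical.

```python
from statistics import mean, pstdev


def calorie_adjusted_z(nutrient, calories):
    """Residuals of a simple linear regression of nutrient on calories, z-standardized."""
    mx, my = mean(calories), mean(nutrient)
    sxx = sum((x - mx) ** 2 for x in calories)
    slope = sum((x - mx) * (y - my) for x, y in zip(calories, nutrient)) / sxx
    # Residual = observed intake minus the intake predicted from calories alone.
    resid = [y - (my + slope * (x - mx)) for x, y in zip(calories, nutrient)]
    sd = pstdev(resid)  # population standard deviation
    return [r / sd for r in resid]


# Hypothetical toy data (kcal/day and an arbitrary nutrient unit).
calories = [1500, 2000, 2500, 3000]
nutrient = [10, 14, 13, 20]
z = calorie_adjusted_z(nutrient, calories)
```

The resulting values have mean 0 and standard deviation 1 and are uncorrelated with caloric intake, so a hazard ratio per unit corresponds to a 1-SD difference in calorie-adjusted intake.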
We considered BMI as another behavioural factor, analyzed as a categorical factor with four non-referent levels. We divided BMI into five categories
as previously described,21 including <18.5 kg/m2,
≥18.5 and <25 kg/m2, ≥25 and <30 kg/m2,
≥30 and <35 kg/m2, and ≥35 kg/m2. The ≥18.5 and
<25 kg/m2 category was the reference group.
The remaining 156 factors were serum or urine-based measures
of environmental exposure, including infectious
agents, environmental chemicals and nutrients.
Broadly, these included a serum marker of nicotine
metabolism (cotinine), dioxins (n = 7 markers),
furans (n = 10), heavy metals (n = 15), hydrocarbons
(n = 21), nutrients (n = 15), polychlorinated biphenyls
(n = 34), pesticides (n = 22), phthalates (n = 12), oestrogenic compounds (n = 6), bacterial (n = 2) and viral
organisms (n = 5). With the exception of assays detecting infectious agents (which were positive/negative assays), factors were continuous in scale.
Continuous biomarker factors that had a right-skewed distribution were log-transformed and z-standardized as previously described.9,10
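The log-transformation and z-standardization of a right-skewed biomarker can be sketched as below. This is an unweighted illustration with hypothetical values; it assumes natural logarithms and strictly positive measurements, neither of which is stated in the source.

```python
import math
from statistics import mean, pstdev


def log_z(values):
    """Natural-log transform, then centre and scale to unit standard deviation."""
    logs = [math.log(v) for v in values]
    m, sd = mean(logs), pstdev(logs)
    return [(x - m) / sd for x in logs]


# Hypothetical right-skewed concentrations (arbitrary units).
standardized = log_z([1.0, 10.0, 100.0])
```

After this transformation, a hazard ratio for the factor is read as the change in risk per 1-SD increase in the logged exposure.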
Different measures of environmental and behavioural
factors had different numbers of eligible participants
for mortality follow-up assessment (Figure 1B). In
the training surveys (1999–2002), there were 330–
6008 eligible participants (with 26–655 death events).
For the replication survey (2003–04), there were
177–3258 eligible participants (with 20–202 deaths)
(Supplementary Table S1, available as Supplementary
data at IJE online). We used the R-project survival and
survey libraries for all analyses and accounted for
pseudo-strata, pseudo-sampling units (clusters) and participant
weights to accommodate the complex sampling of
the data.22,23 Estimates were verified with STATA.24
Systematic scan of environmental and
behavioural factors associated with all-cause
mortality
We associated each of the 249 factors to all-cause
mortality serially using proportional hazards (Cox) regression, while adjusting for sociodemographic attributes described above, including age, sex, an estimate
of SES (categorical quintiles of poverty to income
ratio), educational attainment, occupation and race/
ethnicity in the training surveys, the 1999–2002
NHANES (the ‘training’ step, Figure 1C). We used
the FDR to correct for multiple hypotheses as
described previously9–11 (Figure 1D). The FDR is the
estimated proportion of the false discoveries made
over the number of total discoveries made at a
given significance level. We used a permutation simulation method to estimate the numerator, the number
of false positives incurred at a significance threshold
as documented earlier.9,11,25 Specifically, to estimate
the expected number of false positives, we permuted
the censorship and follow-up time variable within
each stratum of the survey; in other words, participants were randomly assigned mortality status. Then,
we re-ran survival analyses for each of the 249 factors. We repeated this process 100 times to attain a
distribution of P-values drawn from the null distribution. The permutation method accounts for the correlation amongst factors.26 We set an FDR threshold of
5% to identify findings in the training step for validation in the testing survey. For each factor that passed
the FDR threshold in the training step, we assessed
violation of proportional hazards by examining interaction between the factor and follow-up time.
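The permutation-based FDR estimate described above can be sketched as follows: at a given P-value threshold, the average number of permuted (null) associations passing the threshold is divided by the number of observed associations passing it. The sketch takes P-values as given and does not re-run any survival model; the function name and toy P-values are hypothetical.

```python
def empirical_fdr(observed_p, permuted_p_sets, alpha):
    """Estimated FDR at threshold alpha: expected false positives / observed discoveries."""
    discoveries = sum(p <= alpha for p in observed_p)
    if discoveries == 0:
        return 0.0
    # Average count of null P-values passing alpha across permutations.
    expected_false = sum(
        sum(p <= alpha for p in perm) for perm in permuted_p_sets
    ) / len(permuted_p_sets)
    return min(1.0, expected_false / discoveries)


# Hypothetical toy P-values: 5 factors and 2 permutations (the study used 249 and 100).
observed = [0.0001, 0.0002, 0.01, 0.2, 0.6]
permuted = [
    [0.0002, 0.3, 0.5, 0.7, 0.9],
    [0.1, 0.2, 0.4, 0.6, 0.8],
]
fdr = empirical_fdr(observed, permuted, alpha=0.0003)
```

Because each permutation reshuffles the outcome for all factors at once, the null P-values keep the correlation structure among the factors.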
We deemed a factor tentatively validated if it had
achieved FDR <5% significance in the training scan
(1999–2002 surveys) and achieved nominal statistical
significance in the test (2003–04) survey (P-value
<0.05, Figure 1D–F). For validated findings, we computed an overall adjusted hazard ratio (referred to as
‘overall HR’) by combining both the training and testing survey datasets (Figure 1F). We verified whether
the validated factors violated the proportional hazards
assumption by checking their interaction with follow-up time. We did not have evidence that any of these
factors significantly violated the assumption (P > 0.05).
We assessed the non-parametric correlations among
factors that had an FDR < 5% in the training step,
specifically bi-serial correlations between binary factors and Spearman correlations when considering
quantitative factors. We visualized these pairwise correlations in a heat map and arranged the factors using
a hierarchical clustering algorithm27 as previously
described.11
We computed the power for detection of factors at
P-value corresponding to FDR <5% (equivalent to
P = 0.0003) for sample sizes corresponding to each
factor tested at a range of adjusted HR of (1.1, 1.3,
1.5, 1.7 and 1.9) with the powerSurvEpi R library.28
Specifically, this library implements methods that
take into account the correlation among the factor
and adjustment co-variates29,30 sample size and
number of death events to estimate power at a
given P-value threshold and HR. We then estimated
how many factors we would detect if every one of the
249 were associated with all-cause mortality for FDR
<5% (P <0.0003) and each HR above by totalling the
power estimations for each factor tested (Supplementary Table S2, available as Supplementary data at
IJE online). At HRs of 1.1, 1.3, 1.5, 1.7 and 1.9, we
estimated we would find 7 out of 249 (3%), 120/249
(49%), 194/249 (79%), 221/249 (89%) and 233/249
(94%), respectively, if all 249 factors were associated
with all-cause mortality. We concluded we were
adequately powered to detect modest and large
associations (HR >1.3 or HR <0.8), but not weak
associations with all-cause mortality.
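The step of totalling per-factor power to obtain the expected number of detected factors can be sketched as below. The power values are hypothetical placeholders; in the study they came from the powerSurvEpi R library, which this sketch does not reproduce.

```python
def expected_discoveries(power_by_factor):
    """Sum of per-factor power: expected detections if every factor were truly associated."""
    return sum(power_by_factor.values())


# Hypothetical power estimates at one hazard ratio and P-value threshold.
toy_power = {"factor_a": 0.95, "factor_b": 0.50, "factor_c": 0.05}
expected = expected_discoveries(toy_power)
share = expected / len(toy_power)
```

The sum need not be a whole number, so a reported count and its percentage may reflect rounding of the summed power.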
Interaction checks with two lowest SES
categories, male sex and Non-Hispanic Black
race/ethnicity
For tentatively validated factors, we aimed to assess
their interaction with demographic and socio-economic characteristics associated with risk for all-cause mortality, namely male sex, two lowest SES
quintiles and Non-Hispanic Black race/ethnicity in
the combined cohort (training and testing cohorts,
Figure 1G). Specifically, we modelled the interactions
among each of the validated findings and the three
demographic factors with a multiplicative term in the
Cox proportional hazards model while controlling for
the remaining demographic co-variates above (age,
sex, education, quintile of SES, occupation and race/
ethnicity). As one example, the interaction between
serum cadmium exposure (‘X’) and male sex would
have been modelled as: $\log(\mathrm{HR}) = \beta_1 X + \beta_2\,\mathrm{male} + \beta_3\,(X \times \mathrm{male})$ + other adjustment covariates
(age, race/ethnicity, education, SES, occupation). We
assessed whether inclusion of the interaction term
($\beta_3$) was significant at the Bonferroni level of significance after considering 7 times 3 interaction tests
(P < 0.05/21 = 0.002).
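The role of the multiplicative interaction term and the Bonferroni threshold can be illustrated with a short sketch. The coefficients below are hypothetical and chosen only to show the arithmetic: `b1`, `b2` and `b3` stand for the exposure, male-sex and interaction coefficients, and the other adjustment covariates are omitted.

```python
import math


def log_hazard_ratio(x, male, b1, b2, b3):
    """Linear predictor for exposure x, a male indicator and their product term."""
    return b1 * x + b2 * male + b3 * x * male


# Hypothetical coefficients for illustration only.
b1, b2, b3 = 0.30, 0.50, 0.10

# Hazard ratio per 1-unit (1-SD) increase in exposure, within each sex.
hr_per_sd_women = math.exp(
    log_hazard_ratio(1, 0, b1, b2, b3) - log_hazard_ratio(0, 0, b1, b2, b3)
)
hr_per_sd_men = math.exp(
    log_hazard_ratio(1, 1, b1, b2, b3) - log_hazard_ratio(0, 1, b1, b2, b3)
)

# Bonferroni threshold for 7 factors x 3 demographic characteristics = 21 tests.
n_tests = 7 * 3
bonferroni_alpha = 0.05 / n_tests
```

With these toy values the per-SD hazard ratio is exp(b1) for women and exp(b1 + b3) for men, so a non-zero interaction coefficient means the exposure effect differs by sex.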
Variance explained of validated factors
To estimate the additive effects and overall variance
explained by identified factors, we built three multivariable models that included tentatively validated
factors (Figure 1G). The first models contained tentatively validated factors in addition to age, sex, quintiles of SES, education and occupation as defined
above. The third model contained tentatively validated
factors in addition to age, sex and race/ethnicity but
excluded socio-economic factors. We hypothesized
that the socio-economic factors may influence some
of the environmental and behavioural factors. Under
this hypothesis, the strength of the associations of the
environmental/behavioural factors might be stronger
in a model without socio-economic co-variates (SES,
education and occupation) versus models with socio-economic factors. We computed the Nagelkerke R2 to
estimate the variance explained for each model and,
in addition, ascribed solely to the environmental and
behavioural factors. We computed standard errors
around the Nagelkerke R2 with a bootstrapping procedure that accommodated stratified data.31
Results
Baseline characteristics of deceased and
surviving participants in NHANES 99–02
and NHANES 03–04
There were a total of 6008 eligible participants for
study in NHANES 1999–2002 with a median time to
follow-up of 66 months. As expected, we found important associations among demographic characteristics and mortality, including older age [adjusted
hazard ratio (HR) = 2.2 (2.1, 2.4) for a 10-year
increase], male sex [HR = 1.7 (1.4, 2.1)], and non-Hispanic Black race/ethnicity [HR = 1.4 (1.1, 1.8) relative to non-Hispanic Whites]. We also observed higher
risk depending on SES (as defined by quintile of
income-poverty ratio). Individuals at the two lowest
SES standings had greater than 2-fold risk for
death [HR = 2.3 (1.5, 3.6) and 2.4 (1.7, 3.4) for first
and second quintiles, respectively] versus the highest
SES (Table 2). Supplementary Table S1 (available as
Supplementary data at IJE online) shows factors that
differed among alive and deceased participants.
NHANES 2003–04 was used for validation. There
were a total of 3262 eligible participants for study in
NHANES 2003–04 with a median follow-up of 34
months. We observed similar trends in NHANES
2003–04 (Table 3). Participants in NHANES 2003–04
had higher mean survivor age of 44.5 years and
deceased mean age of 71.4 years. The adjusted HR
for a 10-year increase in age was 2.6 [95% CI: (2.3,
3.0)] versus 2.2 in NHANES 1999–2002. We observed
double the risk for men [adjusted HR = 2.0 (1.5, 2.7)].
Limited cause of death information was available for
deceased participants and was coded as International
Classification of Diseases version 10 (ICD10) codes.
The Centers for Disease Control and Prevention
(CDC)/National Center for Health Statistics (NCHS)
binned ICD10 codes into 113 groups. The top five
causes of death for participants in the 1999–2002
surveys included the groups ‘other forms of ischaemic
heart disease’ (ICD10 codes I20, I25.1–I25.9, 10% of the
deceased population), ‘cerebrovascular diseases’
(ICD10: I60–I69, 8% of deceased participants), ‘other
diseases’ (more than ten ICD10 groups, 7% of deceased
participants), ‘malignant neoplasms of trachea, bronchus and lung’ (ICD10: C33–C34, 7%), and ‘acute myocardial infarction’ (ICD10: I21–I22, 6% of deceased
participants). The top five causes of death for deceased
participants in the 2003–04 survey were similar and
included ‘other forms of ischaemic heart disease’
(12% of deceased participants), ‘malignant neoplasms
of trachea, bronchus and lung’ (10% of deceased
participants), ‘other diseases’ (8%), ‘acute myocardial
infarction’ (8%) and ‘cerebrovascular diseases’ (6%).

Table 2 Demographic and socio-economic attributes and hazard ratios (HRs) for NHANES 1999–2002 ‘training’ samples

                       Survivors       Deceased        Age-adjusted        Demographic-adjusted
                       n = 5353 (a)    n = 655 (a)     HR (b)              HR (b)
Age                    43.45 (0.34)    68.25 (0.83)    2.32 [2.15,2.49]    2.24 [2.07,2.43]
Male                   47.9 (0.6)      52.9 (2.4)      1.56 [1.26,1.93]    1.72 [1.38,2.13]
Race (%):
  Non-Hispanic Black   10.4 (1.2)      11.4 (1.4)      1.66 [1.33,2.08]    1.40 [1.09,1.81]
  Mexican American     7.2 (0.9)       3.6 (0.8)       1.15 [0.80,1.64]    0.86 [0.60,1.23]
  Other                4.7 (0.6)       1.6 (0.6)       0.57 [0.30,1.08]    0.54 [0.30,0.98]
  Other Hispanic       6.8 (1.7)       5.6 (2.2)       1.18 [0.77,1.80]    0.92 [0.59,1.43]
  Non-Hispanic White   70.8 (1.8)      77.7 (2.4)      ref                 ref
Education (%):
  <High school         20.8 (0.9)      37.5 (2.4)      1.42 [1.14,1.77]    1.24 [0.98,1.57]
  High school          25.7 (1.0)      29.2 (1.9)      1.65 [1.33,2.05]    1.23 [0.98,1.54]
  >High school         53.5 (1.5)      33.2 (2.5)      ref                 ref
Income (quintile of income/poverty) (%):
  Quintile 1           16.9 (0.9)      19.2 (2.0)      2.39 [1.70,3.38]    2.32 [1.51,3.57]
  Quintile 2           18.5 (1.1)      33.5 (2.8)      2.47 [1.74,3.50]    2.41 [1.69,3.44]
  Quintile 3           19.9 (0.7)      22.0 (2.3)      1.89 [1.29,2.75]    1.76 [1.20,2.57]
  Quintile 4           19.6 (0.6)      13.8 (1.7)      1.68 [1.08,2.59]    1.60 [1.03,2.48]
  Quintile 5           25.1 (1.4)      11.4 (2.0)      ref                 ref
Occupation:
  Blue-collar semi     38.1 (1.0)      39.1 (2.8)      1.58 [1.25,1.99]    1.18 [0.87,1.59]
  Blue-collar high     10.3 (0.7)      14.6 (1.8)      1.73 [1.22,2.43]    1.10 [0.78,1.55]
  Never worked         2.6 (0.2)       2.8 (0.7)       0.78 [0.40,1.50]    0.76 [0.37,1.58]
  White-collar semi    20.9 (0.8)      19.0 (1.4)      0.96 [0.77,1.19]    0.98 [0.74,1.30]
  White-collar high    23.5 (0.8)      19.1 (1.8)      ref                 ref

Semi, semi-routine; high, high skill; ref, referent.
(a) Unweighted sample size.
(b) HR adjusted for all other demographic and socio-economic factors.
Systematic scan of environmental and
behavioural factors associated with
all-cause mortality
We associated each of the 249 environmental and
behavioural factors (self-reported or biomarkers of
exposure) with all-cause mortality in turn, adjusting
for age, sex, race/ethnicity, SES, occupation and educational attainment in the NHANES 1999–2002 surveys (the ‘training’ dataset). Figure 2 shows the
results, visualizing the adjusted hazard ratio versus
the P-value of the association. Adjusted HR denotes
risk for all-cause mortality per 1 SD for continuous
factors or per incremental change for ordinal values.
For categorical or binary factors, adjusted hazard
ratios denote risk relative to the referent category
(‘negative’ for an exposure).
We found 7 (out of 249) factors at FDR <5% in the
training surveys (1999–2002 NHANES) and were able
to tentatively validate all 7 factors in the test survey
(P <0.05 in 2003–04 NHANES) (Table 4, Figure 2).
The strongest association was with physical activity,
analyzed as an ordinal factor (representing the trend
from no, low, medium and high activity as defined by
Health.gov categories) with adjusted HR of 0.72 for
the trend [CI: (0.66, 0.79), P-value = 4 × 10⁻¹²] in
the training surveys (Figure 2) and an adjusted HR
of 0.63 (P = 1 × 10⁻¹⁰) in the test survey (Table 4). We
also estimated the adjusted HR of each physical activity level relative to other categories. In the combined
surveys, the adjusted HR for low activity relative to
zero activity was 0.60 [95% CI: (0.47, 0.73),
P = 3 × 10⁻⁶]. The adjusted HR for moderate activity
versus low activity was 0.58 [95% CI: (0.41, 0.82),
P = 3 × 10⁻³] and high activity versus moderate activity was not significant, with an adjusted HR of 1.2
[95% CI: (0.80, 1.7), P = 0.39]. We had evidence
for multiple associations of environmental and
behavioural factors with all-cause mortality as seen
in the deviance from uniform distribution of
P-values (Supplementary Figure S1, available as
Supplementary data at IJE online).

Table 3 Demographic and socio-economic attributes and hazard ratios (HRs) for NHANES 2003–2004 ‘testing’ samples

                       Survivors       Deceased        Age-adjusted        Demographics-adjusted
                       n = 3059 (a)    n = 203 (a)     HR (b)              HR (b)
Age                    44.49 (0.55)    71.42 (1.37)    2.60 [2.28,2.96]    2.28 [1.97,2.64]
Male                   48.0 (0.7)      56.7 (3.5)      1.81 [1.40,2.36]    2.28 [1.55,3.35]
Race (%):
  Non-Hispanic Black   11.3 (1.8)      12.4 (3.3)      1.54 [1.11,2.12]    1.35 [0.98,1.86]
  Mexican American     8.0 (2.0)       4.3 (2.7)       1.21 [0.76,1.92]    0.78 [0.48,1.29]
  Other                5.2 (0.7)       4.8 (2.1)       1.53 [0.62,3.74]    1.48 [0.50,4.38]
  Other Hispanic       3.7 (0.7)       3.2 (1.9)       1.69 [0.58,4.87]    1.68 [0.59,4.82]
  Non-Hispanic White   71.8 (3.4)      75.3 (3.8)      ref                 ref
Education (%):
  <High school         18.5 (1.2)      33.7 (4.6)      1.31 [0.96,1.80]    1.05 [0.67,1.65]
  High school          27.0 (1.0)      25.7 (4.8)      1.00 [0.56,1.77]    0.97 [0.53,1.76]
  >High school         54.6 (1.2)      40.6 (5.1)      ref                 ref
Income (quintile of income/poverty) (%):
  Quintile 1           16.9 (1.5)      22.2 (4.0)      2.33 [1.24,4.36]    2.05 [1.04,4.04]
  Quintile 2           19.7 (0.9)      31.9 (3.9)      1.62 [0.79,3.30]    1.31 [0.59,2.92]
  Quintile 3           19.6 (1.0)      21.6 (5.2)      1.38 [0.59,3.21]    1.23 [0.52,2.91]
  Quintile 4           22.0 (1.2)      12.5 (2.9)      0.85 [0.51,1.43]    0.72 [0.37,1.39]
  Quintile 5           21.9 (1.8)      11.9 (3.2)      ref                 ref
Occupation:
  Blue-collar semi     39.0 (2.0)      27.0 (5.4)      0.87 [0.52,1.44]    0.80 [0.47,1.38]
  Blue-collar high     11.8 (1.2)      23.4 (3.8)      1.89 [1.25,2.86]    1.40 [0.78,2.49]
  Never worked         2.2 (0.2)       5.9 (1.6)       1.37 [0.69,2.71]    1.61 [0.73,3.54]
  White-collar semi    23.1 (1.2)      17.9 (3.0)      0.71 [0.48,1.07]    0.94 [0.55,1.58]
  White-collar high    22.0 (1.4)      26.0 (3.8)      ref                 ref

Semi, semi-routine; high, high skill; ref, referent.
(a) Unweighted sample size.
(b) HR adjusted for all other demographic and socio-economic factors.
Three self-reported smoking factors were associated
with mortality. These included the categorical factor
past and current smoking (versus never smoking).
The adjusted HR for past smoking was 1.5 [95% CI:
(1.2, 1.8), P = 8 × 10⁻⁵ in training surveys] and 2.0 for
current smoking [95% CI: (1.4, 2.9), P = 2 × 10⁻⁴]. The
third self-reported smoking factor included anyone
smoking in the participant’s home [adjusted HR: 2.0
(1.6, 2.6), P = 1 × 10⁻⁷ in the training surveys]. We
observed slightly larger estimates in the test survey
for these factors. For example, the adjusted HR for
current smokers and past smokers versus never smokers was 1.7 and 3.0, respectively (P < 8 × 10⁻⁵ and
2 × 10⁻⁴).
We found urine and serum cadmium levels associated with mortality. Serum cadmium had an adjusted HR of 1.4 for a 1-SD change in logged
exposure value [CI: (1.2, 1.6), P = 1 × 10⁻⁵] and for
urinary cadmium the adjusted HR was 1.6 [CI: (1.3,
2.0), P-value = 6 × 10⁻⁵]. Adjusted HR in the test surveys were higher [1.6 (P = 6 × 10⁻⁷) and 2.0
(P = 2 × 10⁻⁵), respectively].
Figure 2 Volcano plot of 249 environmental and behavioural factor associations with all-cause mortality in training step
(all black points). Red horizontal line denotes FDR-adjusted level of statistical significance (FDR = 5%, P-value = 0.0003).
Red points show the standard demographic and socio-economic factors considered for adjustments. For SES: SES_0: 1st
quintile of SES, SES_1: 2nd quintile of SES, SES_2: 3rd quintile of SES, SES_3: 4th quintile of SES; SES HR are relative to
highest quintile of SES. For education: education_hs: high school education, education_less_hs: less than high school
education; education HR relative to greater than high-school education. For occupation: occupation_blue_semi: semi
blue-collar, occupation_blue_high: high blue-collar, occupation_white_semi: semi white-collar, occupation_never: never worked.
Filled black markers denote validated factors. –log10(P-value) for physical activity and age are annotated in parentheses,
since they are extreme. Y-axis is discontinuous to accommodate higher –log10(P-values) for physical activity and age
We also found a serum nutrient marker associated
with all-cause mortality. Specifically, the serum carotenoid trans-lycopene was negatively associated with
all-cause mortality (Figure 2; Table 4).
Trans-lycopene had an HR of 0.6, and higher levels
of trans-lycopene were associated with a 20%
decreased risk for mortality (Table 4). Adjusted HRs
in the test surveys for these variables were similar.
Several factors had higher HR (>2) but did not have
FDR <5%, including hepatitis C antibody and hepatitis B surface antigen. Antibodies to hepatitis C had
an adjusted HR of 2.7 in the training surveys [95%
CI: (1.4, 5.0), P = 0.002, FDR = 10%] and 2.2 in the
combined surveys [95% CI: (1.2, 3.9), P = 0.009].
Hepatitis B surface antigen presence had an adjusted
HR of 2.6 in the training surveys [95% CI: (0.9, 7.3),
P = 0.08, FDR >30%] and 2.1 in the combined surveys
[95% CI: (0.8, 6.0), P = 0.1].
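As a rough consistency check, a two-sided P-value can be back-calculated from a reported hazard ratio and its 95% confidence interval. The sketch below assumes a Wald test on the log hazard scale with a normal reference distribution; the survey-weighted models in the paper may use a different reference distribution, so agreement is only expected to be approximate. The helper name `approx_p_from_ci` is hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def approx_p_from_ci(hr, lower, upper):
    """Approximate two-sided Wald P-value from an HR and its 95% CI."""
    # Standard error of log(HR), recovered from the width of the CI.
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))

# Training-survey estimates quoted in the text (rounded values).
p_hcv = approx_p_from_ci(2.7, 1.4, 5.0)   # hepatitis C antibody
p_hbv = approx_p_from_ci(2.6, 0.9, 7.3)   # hepatitis B surface antigen

print(f"hepatitis C: P ~ {p_hcv:.3f}")
print(f"hepatitis B: P ~ {p_hbv:.2f}")
```

Both approximations land close to the reported values (0.002 and 0.08), which suggests the quoted intervals and P-values are internally consistent.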
Interaction checks with two lowest SES categories, male sex and Non-Hispanic Black race/ethnicity
We estimated whether the seven validated factors
interacted with three demographic categories (total
of 21 tests of interaction). We could not conclude
that any of these demographic factors modified associations for all-cause mortality after consideration of
multiple hypotheses (P > 0.05 for all 21 interaction
tests).
Correlation pattern between putative risk factors
We assessed the correlations among each of the environmental and behavioural factors with FDR <5%
(n = 7) and adjustment covariates (n = 21) and
observed that there were many modest correlations
among the 378 pairwise correlations that were calculated; 210 of the 378 correlations were significant
(Bonferroni-adjusted P < 0.05). The 5th to 95th percentile range of the absolute value of r was 0.005 to
0.30 (Figure 3) and the correlations that were significant had absolute values ranging from 0.04 to 0.62.
There were significant correlations between similar
factors belonging to the same group, such as smoking- and cadmium-related factors. For example, the correlation between serum and urinary cadmium levels was
0.45 (adjusted P < 1 × 10⁻¹²). Self-reported anybody
1.14x106
1.66 [1.35,2.04]
All estimates are adjusted by age, sex, race, SES, education and occupation. n and number of events are unweighted.
FDR, false discovery rate; MET, Metabolic equivalent.
1.66x105
2.03 [1.47,2.80]
59
1.58x102
186
1783
Cadmium, urine (1 SD log)
1.62 [1.28,2.04]
5.7x105
1079
1.14x109
1.99 [1.59,2.48]
1.18x103
1.88 [1.28,2.76]
202
3258
3.5x103
1.1x107
655
6008
Does anyone smoke
in home?
2.00 [1.55,2.58]
4.47x1018
0.71 [0.66,0.77]
1.27x10
0.63 [0.54,0.72]
191
2989
<0.001
4.0x10
619
5534
Physical activity
(MET-based rank)
262
3096
Trans-lycopene(1 SD log)
0.72 [0.66,0.79]
1.20x109
0.80 [0.74,0.86]
1.45x107
0.79 [0.73,0.86]
179
3054
3.20x102
10
12
2.9x104
9.33x10
1.2x10
1.37 [1.19,1.57]
591
5722
Cadmium (1 SD log)
0.81 [0.72,0.91]
3.97x109
1.45 [1.28,1.65]
6.59x107
2.20 [1.61,3.00]
3
3120
188
1.63 [1.34,1.97]
5.66x10
7
9.51x106
3.17 [1.90,5.28]
201
2.80x102
5
652
Current smoker
5409
2.00 [1.38,2.89]
2.3x104
2911
5.31x106
1.53 [1.27,1.83]
1.65x102
1.66 [1.10,2.52]
201
1.68x102
652
Past smoker
5409
1.50 [1.23,1.83]
7.8x105
2911
P-value
Adjusted HR
[95% CI]
P-value
Adjusted HR
[95% CI]
Events
FDR
Description
n
Events
Current/past smoker (vs never smoker)
Adjusted HR
[95% CI]
P-value
n
Testing survey (2003–04)
Combined surveys
(1999–2004)
Training survey (1999–2002)
Table 4 Tentatively validated factors. Training denotes estimate from training survey, NHANES 1999–2002. Testing denotes estimates from testing survey, NHANES
2003–04. Combined denotes estimate from combining training and testing surveys
smoking at home was significantly positively correlated
with current smoking status (r = 0.6, adjusted
P < 1 × 10⁻¹²) and negatively correlated with past
smoking status (r = −0.2, adjusted P < 1 × 10⁻¹²).
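The number of pairwise correlations and the Bonferroni adjustment described in this section can be reproduced with a few lines of arithmetic. The sketch below assumes that every unordered pair among the 7 validated factors and 21 adjustment covariates was tested once; the helper `bonferroni_adjust` is an illustrative name, not a function from the original analysis.

```python
import math

n_validated = 7     # factors with FDR < 5%
n_covariates = 21   # adjustment covariates
n_variables = n_validated + n_covariates

# Number of unordered pairs among 28 variables: 28 * 27 / 2.
n_pairs = math.comb(n_variables, 2)

# Per-test threshold that keeps the family-wise error rate at 5%.
alpha = 0.05
per_test_threshold = alpha / n_pairs

def bonferroni_adjust(p, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p * n_tests)

print(n_pairs)                        # number of pairwise correlations
print(f"{per_test_threshold:.2e}")    # raw P needed for adjusted P < 0.05
```

With 28 variables there are 378 pairs, so a raw P-value must fall below about 1.3 × 10^-4 for the Bonferroni-adjusted value to stay under 0.05.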
We observed correlations between smoking-related
behaviours, physical activity, levels of cadmium and
levels of trans-lycopene. First, smoking behaviour was
significantly correlated with cadmium biomarker
levels. Specifically, current smoking was correlated
with both serum and urine cadmium levels (r = 0.52
and 0.21, respectively, adjusted P < 1 × 10⁻¹²).
Physical activity was modestly correlated with trans-lycopene (r = 0.2, adjusted P < 1 × 10⁻¹²). Urine
cadmium was modestly correlated with past smoking
(r = 0.1, adjusted P = 1 × 10⁻⁵). On the other hand,
trans-lycopene was modestly but significantly negatively correlated with serum and urine cadmium
(r = −0.13 and −0.16, adjusted P < 1 × 10⁻¹² and
P = 8 × 10⁻¹¹, respectively).
Moreover, there were modest correlations between
the tentatively validated factors and demographic and
socio-economic factors (|r| ≥ 0.1 and adjusted P < 0.01).
First, physical activity was positively correlated with
above high school education and 5th quintile of SES
(r = 0.2 and 0.2, respectively, adjusted P < 1 × 10⁻¹²)
and negatively correlated with age (r = −0.14, adjusted
P < 1 × 10⁻¹²). Trans-lycopene was inversely correlated
with age (r = −0.3, adjusted P < 1 × 10⁻¹²) and less
so than for high school education (r = 0.13, P
< 1 × 10⁻¹²). Serum and urinary cadmium were directly
correlated with age (r = 0.24 and 0.34, respectively, adjusted P < 1 × 10⁻¹²). Serum cadmium was additionally
correlated with less than high school education
(r = 0.13, adjusted P < 1 × 10⁻¹²) and urinary cadmium
was correlated with Non-Hispanic Black race/ethnicity
(r = 0.16, adjusted P < 1 × 10⁻¹²).
Smoking-related factors also exhibited correlations
with demographic factors. Self-reported current
smoking was correlated with male sex (r = 0.1, adjusted P < 1 × 10⁻¹²) and inversely correlated with
age (r = −0.15, adjusted P < 1 × 10⁻¹²). Current
smoking was also correlated with first quintile of
SES (r = 0.12, adjusted P < 1 × 10⁻¹²). Similarly,
anyone smoking at home correlated with first quintile
of SES (r = 0.11, adjusted P < 1 × 10⁻¹²) and with
Non-Hispanic Black race/ethnicity (r = 0.11, adjusted
P < 1 × 10⁻¹²). Past smoking was strongly correlated
with age (r = 0.26, adjusted P < 1 × 10⁻¹²) and male
sex (r = 0.14, adjusted P < 1 × 10⁻¹²).
Figure 3 Pairwise correlations of factors with FDR <5% in the training set and of the standard demographic and
socio-economic factors used for adjustments
Multivariable models and variance explained by tentatively validated factors
We built three multivariable models to estimate the
variance explained by the tentatively validated factors.
We opted to remove urinary cadmium from consideration in these models due to extensive missing information (only 1694 participants with 134 death events
versus 5155 participants with 416 events). In the first,
we entered five of the seven tentatively validated
factors [serum cadmium, physical activity, anyone
smoked in home, current smokers/past smokers
(versus never smokers)] while adjusting for demographic covariates (Table 5) for participants from
both the training and testing surveys (Model A).
The second model was similar to the first, containing
six of seven validated factors including trans-lycopene
(Model B). The third multivariate model contained six
of seven validated factors but omitted socio-economic
factors, such as SES, education and occupation
(Model C). The total number of participants available
in the combined testing and training surveys in
Model A was 7381 (733 deaths). The total number
of participants available for Models B and C was
5155 (416 deaths).
The total variance explained by Models A, B and C
was 14.4, 13.2 and 11%, respectively. The variance
explained by the tentatively validated environmental
and behavioural factors in these models was 1.6, 2.1
and 2.3%, respectively. Thus, models not including
trans-lycopene but built on more complete data
were not inferior to models including trans-lycopene. Moreover, models that did not consider socio-economic factors had a modestly lower R² than those
that did (11 versus 13%). The contribution of environmental and behavioural factors was similar in
a
Past/current smoker, serum cadmium, total physical activity, anyone smoke in home?, trans-lycopene,
age, sex, race/ethnicity
Model C variables
MET, Metabolic equivalent.
a
CI computed by bootstrap.
Past/current smoker, serum cadmium, total physical activity, anyone smoke in home?, trans-lycopene, age,
sex, race/ethnicity, education, SES, education, occupation
0.023 [0.015,0.030]a
0.11 [0.102, 0.129]a
Model B variables
0.021 [0.012, 0.028]
a
0.132 [0.111, 0.1438]
4.61x105
Past/current smoker, serum cadmium, total physical activity, anyone smoke in home?, age, sex, race/ethnicity,
education, SES, education, occupation
a
5,155 (416)
0.84[0.77,0.91]
3.02x105
1.98x103
1.11x1016
Model A variables
0.144 [0.127, 0.156]
a
1.76[1.23,2.51]
4.85x10
0.65[0.59,0.72]
0.017
0.80
P-value
0.113
0.016 [0.009, 0.022]
2
5,155 (416)
0.84[0.77,0.91]
1.69[1.17,2.44]
16
1.26[1.04,1.52]
0.92[0.5,1.7]
Multivariate HR
[95% CI]
1.28[0.94,1.73]
3
9.99x10
0.021
0.775
P-value
0.013
Model C
Nagelkerke R (full-reduced)
Nagelkerke R
7,381 (733)
n (number of events)
2
.
Trans-lycopene (1 SD of log)
.
7.30x10
0.67[0.6,0.74]
4
1.24[1.03,1.49]
14
0.91[0.49,1.71]
2.88x104
1.67[1.24,2.24]
Model B
Multivariate
HR [95% CI]
1.27[0.94,1.71]
0.655
Does anyone smoke in home?
Serum cadmium (1 SD of log)
3.06x10
1.24[1.11,1.4]
Current Smoker (vs. Never smoker)
P-value
1.49x103
Total physical activity (MET-based rank) 0.73[0.68,0.79]
1.1[0.72,1.67]
Past smoker (vs. never Smoker)
Model A
Multivariate
HR [95% CI]
1.36[1.12,1.64]
Table 5 Multivariable model coefficients
Models B and C. On the other hand, current smoking
(P > 0.7 in Models A–C, Table 5) and past smoking
(Models B–C, Table 5) lost nominal significance
(P > 0.05) in multivariable models, indicating the correlative nature of the tentatively validated factors.
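The "full minus reduced" Nagelkerke R² reported in Table 5 can be illustrated with a small calculation. The sketch below uses the usual definition of Nagelkerke's R² from model log-likelihoods; how the original analysis obtained the log-likelihoods from its survey-weighted Cox models is not stated here, and the log-likelihood values are invented purely for illustration. The names `nagelkerke_r2`, `ll_null`, `ll_reduced` and `ll_full` are hypothetical.

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R^2 from the null and fitted model log-likelihoods."""
    # Cox-Snell R^2: improvement in likelihood per observation.
    cox_snell = 1.0 - math.exp(-2.0 * (ll_model - ll_null) / n)
    # Largest value Cox-Snell R^2 can reach for this null model.
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_r2

# Hypothetical values, NOT taken from the paper.
n = 5155                # participants in Models B and C
ll_null = -3400.0       # no covariates
ll_reduced = -3150.0    # demographic and socio-economic covariates only
ll_full = -3100.0       # covariates plus environmental/behavioural factors

r2_reduced = nagelkerke_r2(ll_null, ll_reduced, n)
r2_full = nagelkerke_r2(ll_null, ll_full, n)

# Variance attributed to the added factors: full minus reduced.
delta_r2 = r2_full - r2_reduced
print(f"full={r2_full:.3f} reduced={r2_reduced:.3f} delta={delta_r2:.3f}")
```

A bootstrap confidence interval, as in the table footnote, would repeat this calculation on resampled participants and take percentiles of `delta_r2`.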
Discussion
Out of the 249 tested environmental and behavioural
factors, we found that only physical activity, smoking
and cadmium levels have consistent evidence for
strong and validated associations with all-cause mortality. Some other factors might have been missed due
to limited power. This suggests that the study of putative environmental and behavioural risk factors that
regulate all-cause mortality at the general population
level will require very large studies and careful validation. Given the small effect sizes of validated factors,
continuing to perform modest-size studies with selective reporting of a few putative risk factors is unlikely
to yield reliable and conclusive answers. Tentatively
validated factors accounted for approximately 2% of
the risk variance when demographic factors were
accounted for, and this decreased little when socioeconomic factors (income, education and occupation)
were also accounted for. This suggests that little of
the impact of these two modifiable behaviours out
of 249 examined is explained by the few measured
socio-economic forces that possibly influence physical inactivity or smoking.
Whereas we did not subject the socio-economic factors to validation, the volcano plot (Figure 2) shows
that descriptively the two lowest quintiles of income
were strongly associated with mortality, with a hazard
ratio larger than any of the individual environmental
and behavioural factors tested. This relation specific to
the two lowest levels of income and mortality is consistent with prior work done on an earlier wave of
NHANES14 and thus should be further investigated,
given the out-of-sample replication and strength of
association we observe.
Reassuringly, we were able to elicit well-known associations between smoking and physical activity and
all-cause mortality. Our estimates of increased risk
with current and past smoking are very similar to
those of a recent meta-analysis where relative risks
were 1.83 for current smokers and 1.34 for past smokers.32 Our protective estimates from physical activity
are also similar to those identified by a recent large
meta-analysis of 80 cohorts.33 Physical activity34 and
current smoking35,36 are associated with an average increase of 3–4 years and a decrease of 10 years in life expectancy, respectively, and physical inactivity and
smoking are each thought to be responsible for approximately five million deaths worldwide. Our multivariable analysis also reiterates the combined effect of
behavioural/environmental risk factors on mortality.37
Nutrition and nutrition quality have also been connected with mortality risk. We found a marker for a
carotenoid nutrient (trans-lycopene) associated with
all-cause mortality. Several observational studies
have found correlations between carotenoid levels
and mortality among the elderly, for example among
women38 and among Italian individuals.39 In
NHANES III, Shardell and colleagues observed a
modest decrease in mortality for 2nd and 3rd quartiles of lycopene.40 However, interventions focusing
on carotenoid-related nutrients have not shown any
benefits in clinical trials for prevention of chronic disease and cause-specific death (e.g. ref. 41). Further, a
randomized trial of a ‘tomato-rich diet’ containing
high amounts of lycopene failed to change chronic
disease risk profiles of 255 UK-based participants.42
Some investigations have suggested harm.43
Therefore, trans-lycopene may be a surrogate marker
of other ‘healthy’ behaviours and possibly of a
‘healthy diet’ profile. It is unclear which measured
or unmeasured correlate of trans-lycopene levels
may be responsible for the association with mortality
risk. Further, what exactly constitutes a ‘healthy diet’
is currently very difficult to define, in contrast to earlier claims.44 We have previously documented a large
correlation matrix of environmental factors,10 and
further studies should investigate how nutritional
and other environmental and behavioural factors
relate to one another8 to potentially trace sources of
bias and harness confounding.
We found that urinary and serum cadmium were
also associated with all-cause mortality. Tellez-Plaza
et al. have reported similar results in these participants for blood and urinary cadmium on both
all-cause and cardiovascular-related mortality, while
adjusting for many other cardiovascular-related risk
factors including smoking, cholesterol, blood pressure
and medication use.45 Further studies will need to
evaluate the relationship of behaviours that lead to
cadmium exposure and all-cause or cause-specific
mortality. For example, serum cadmium is postulated
to indicate current exposure whereas urinary cadmium may reflect total body burden of cadmium,
but urinary cadmium is reflective of serum levels.46
Serum cadmium levels increase as humans age, and
sources include ambient air pollution (through fossil
fuel combustion), diet and smoking.46 Cadmium
levels were significantly correlated with smoking
and age; however, the association of death risk with
serum levels of cadmium was significant in the multivariable models even after smoking had been
accounted for.
Our analysis of all-cause mortality has several
limitations. First, to consider multiple factors in a systematic and standard fashion, we had to make assumptions about what covariates to adjust for in our
initial scan and replication procedure. Investigators
may consider a different set of adjusting covariates
specific for each factor; however, it is unclear how
to attain a ‘standard’ set of covariates. We focused
on a set of demographic factors (age, sex and
race/ethnicity) that are impossible to modify, and on a
set of socio-economic factors (income, education, occupation) that are very difficult for individuals to
modify (although they are amenable to social and
other multi-level interventions). Second, not all participants in NHANES have available measurements on
the entire set of all factors assayed; thus, it is not
possible to subject the scan to the same number of
participants for each environmental and behavioural
factor considered. This type of non-random missing
information may lead to biased findings, as the subsamples may not be representative of the larger
sample. Although we did not detect any differences
in population demographics of sub-samples, we acknowledge that missing data could have led to loss
of power. Third, along these lines, our power calculations based on the available data with non-missing
information suggest that we had power to detect
modest relative risks, but many small effects could
easily have been missed. The need to understand
small effects requires a recalibration of our thinking
about risk-factor epidemiology, with emphasis on very
large studies and careful replication. For small effects,
differentiating noise from genuine signals is difficult.
Fourth, residual confounding is always possible in any
observational associations, even those that are seemingly consistent and validated in different datasets.
Fifth, our data had a relatively short follow-up in both
the training (median 6 years) and testing (median 3
years) surveys and lacked repeated measurements of
factors to assess the longer-term risk of these factors
on mortality through time. We emphasize that in an
investigation of non-institutionalized people in the
general population, many environmental and behavioural factors will require longer exposure and follow-up times to detect associations with mortality.
Whereas factors found through this study have
strong evidence for association with all-cause mortality, we cannot rule out important factors not identified by these methods or these data. Further, the
deceased participants considered here are older individuals (68 and 73 years mean age of deceased participants in the 1999–2002 and 2003–04 surveys,
respectively), many of whose cause of death included
chronic cardiovascular-related disease, such as heart
disease.
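The power limitation discussed above can be made concrete with a standard sample-size approximation for Cox regression. The sketch below uses the form given by Hsieh and Lavori for a continuous covariate scaled to unit variance, and ignores the inflation needed when the covariate is correlated with the adjustment variables; it is an illustration under those assumptions, not the calculation performed in the paper. The function name `events_needed` is hypothetical.

```python
import math
from statistics import NormalDist

def events_needed(hr_per_sd, alpha=0.05, power=0.80):
    """Approximate number of deaths needed to detect a given HR per 1 SD.

    Assumes a two-sided Wald test, a covariate with unit variance and
    no correlation with other covariates (simplifying assumptions).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil((z_alpha + z_beta) ** 2 / math.log(hr_per_sd) ** 2)

# Events required at a nominal 5% level versus a stricter screening level
# similar to the FDR-derived P-value threshold of 0.0003 in Figure 2.
for hr in (1.1, 1.2, 1.5):
    nominal = events_needed(hr)
    strict = events_needed(hr, alpha=0.0003)
    print(f"HR {hr}: {nominal} events (alpha=0.05), {strict} events (alpha=0.0003)")
```

The required number of events grows quickly as the hazard ratio approaches 1 and as the significance threshold tightens, which is why small effects are easily missed in a scan of this size.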
Relatedly, many environmental factors, such as infectious agents, are only applicable to a small subset
of the population, have lower prevalence and/or play a
role in cause-specific mortality such as cancer.
Therefore, a systematic scan of factors in a general
population will be underpowered to detect these putative associations in the context of all-cause mortality. Specifically, the top causes of death in the general
population included cardiovascular-related diseases
(e.g. ischaemic heart disease, stroke and myocardial
infarction) and the findings may only be pertinent to
aetiologies of these diseases. On the other hand, factors with larger effect sizes but higher FDR (outliers
on the volcano plot) can be noted for further investigation. For example, outliers in this study included
hepatitis B and C, factors that may cause liver
cancer.47 We emphasize that factors that are not top
findings in such a scan may still play a large role in
mortality risk, albeit in smaller sub-populations.
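Flagging volcano-plot outliers, meaning factors with a large hazard ratio that do not pass the FDR threshold, can be sketched as follows. The Benjamini-Hochberg procedure is used here as a stand-in, since the exact FDR estimation used in the paper may differ. The factor names, hazard ratios and P-values are hypothetical and chosen only to show the mechanics.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Hypothetical scan results: (name, hazard ratio, P-value).
scan = [
    ("factor_a", 1.4, 0.00001),
    ("factor_b", 2.7, 0.04),
    ("factor_c", 1.1, 0.6),
    ("factor_d", 0.8, 0.0004),
    ("factor_e", 2.6, 0.2),
]

qvalues = benjamini_hochberg([p for _, _, p in scan])

# Outliers: HR above 2 but FDR above 5%, candidates for follow-up.
outliers = [name for (name, hr, _), q in zip(scan, qvalues)
            if hr > 2 and q > 0.05]
print(outliers)
```

Such a list is only a prompt for further investigation in larger or more targeted samples; it is not evidence of association by itself.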
Sixth, we cannot claim that our systematic scan of
environmental and behavioural factors in NHANES
covers the entire space of the ‘exposome’.48 The CDC
and NCHS have selected an array of behavioural and
environmental factors based on their prevalence,
measurement feasibility and hypothesized influence
on population health. Furthermore, unlike static
genetic factors, there is heterogeneity in exposure or
self-reported factor ascertainment and exposures/
behavioural factors will follow unique temporal
patterns throughout an individual’s lifetime.49 For example, factors such as pollutants (e.g. polychlorinated
biphenyls) are lipophilic, persist in fatty tissue
and accrue in tissue over time.10 Other factors,
such as bisphenol A, are metabolized rapidly and are
short-lived, so their measurement assumes that individuals are continuously exposed to the factor (e.g. ref. 50). Further, the
relationships between the biomarkers and actual exposure are also difficult to surmise due to issues of
sample timing and differential elimination. Self-reported dietary factors derived from a single point
in time can be error-prone51,52 and there are documented examples of lack of concordance with objective indicators of intake.53,54 As a result of lack of
comprehensive measures and heterogeneity, our systematic scan will have missed other candidate factors
putatively associated with mortality risk.
Acknowledging these caveats, we have shown a generalized and systematic approach to identify strong
and validated correlates of all-cause mortality and prioritize hypotheses regarding the association between
environmental and behavioural factors and mortality.
Instead of focusing on a few putative risk factors at a
time, our approach gives a wider perspective about
the strength of the evidence (or lack thereof) and
the impact of a wide array of putative risk factors
that may be possible to modify.
Supplementary Data
Supplementary data are available at IJE online.
Funding
This work was supported by the National Heart, Lung,
and Blood Institute [T32 HL007034] to C.J.P., and the
National Institute of Diabetes and Digestive Diseases
[K24 DK085446] to G.M.C. and [K23 DK089086] to
J.T.L.
Conflict of interest: None declared.
KEY MESSAGES
Identification of environmental and behavioural factors associated with mortality is critical for public
health and preventive care. However, there are few investigations that systematically search for
associations between environmental and behavioural factors and all-cause mortality.
Here, we systematically associate 249 environmental and behavioural factors, such as urinary or
serum markers of environmental exposure and self-reported nutrients, with time-to-death, and were
able to tentatively validate five factors robustly associated with mortality.
Instead of focusing on a few risk factors at a time, our approach gives a wider perspective about the
strength of the evidence (or lack thereof) and the impact of a wide array of risk factors that may be
possible to modify.
References
1. McGinnis JM, Foege WH. Actual causes of death in the United States. JAMA 1993;270:2207–12.
2. Mokdad AH, Marks JS, Stroup DF, Gerberding JL. Actual causes of death in the United States, 2000. JAMA 2004;291:1238–45.
3. Danaei G, Ding EL, Mozaffarian D et al. The preventable causes of death in the United States: comparative risk assessment of dietary, lifestyle, and metabolic risk factors. PLoS Med 2009;6:e1000058.
4. Wild CP. The exposome: from concept to utility. Int J Epidemiol 2012;41:24–32.
5. Schwartz D, Collins F. Medicine. Environmental biology and human disease. Science 2007;316:695–96.
6. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology 2008;19:640–48.
7. Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Med 2005;2:e124.
8. Ioannidis J, Loy EY, Poulton R, Chia KS. Researching Genetic Versus Nongenetic Determinants of Disease: A Comparison and Proposed Unification. Sci Transl Med 2009;1:8.
9. Patel CJ, Bhattacharya J, Butte AJ. An Environment-Wide Association Study (EWAS) on type 2 diabetes mellitus. PLoS One 2010;5:e10746.
10. Patel CJ, Cullen MR, Ioannidis JP, Butte AJ. Systematic evaluation of environmental factors: persistent pollutants and nutrients correlated with serum lipid levels. Int J Epidemiol 2012;41:828–43.
11. Tzoulaki I, Patel CJ, Okamura T et al. A nutrient-wide association study on blood pressure. Circulation 2012;126:2456–64.
12. Fillenbaum GG, Burchett BM, Blazer DG. Identifying a national death index match. Am J Epidemiol 2009;170:515–18.
13. Adler NE, Rehkopf DH. U.S. disparities in health: descriptions, causes, and mechanisms. Annu Rev Public Health 2008;29:235–52.
14. Rehkopf DH, Berkman LF, Coull B, Krieger N. The nonlinear risk of mortality by income level in a healthy population: US National Health and Nutrition Examination Survey mortality follow-up cohort, 1988-2001. BMC Public Health 2008;8:383.
15. Ainsworth BE, Haskell WL, Whitt MC et al. Compendium of physical activities: an update of activity codes and MET intensities. Med Sci Sports Exerc 2000;32(Suppl 9):S498–504.
16. US Department of Health and Human Services. 2008 Physical Activity Guidelines for Americans. Available from: http://www.health.gov/paguidelines/pdf/paguide.pdf (15 January 2013, date last accessed).
17. Blanton CA, Moshfegh AJ, Baer DJ, Kretsch MJ. The USDA Automated Multiple-Pass Method accurately estimates group total energy and nutrient intake. J Nutr 2006;136:2594–99.
18. U.S. Department of Agriculture, Agricultural Research Service, Beltsville Human Nutrition Research Center et al. What We Eat in America, NHANES 2003-2004. Beltsville, MD: Beltsville Human Nutrition Research Center; Available from: ftp://ftp.cdc.gov/pub/Health_Statistics/nchs/nhanes/2003-2004/DR1TOT_C.XPT (15 January 2013, date last accessed).
19. U.S. Department of Agriculture, Agricultural Research Service, Beltsville Human Nutrition Research Center et al. What We Eat in America, NHANES 2001-2002. Beltsville, MD: Beltsville Human Nutrition Research Center; Available from: ftp://ftp.cdc.gov/pub/Health_Statistics/nchs/nhanes/2001-2002/DRXTOT_B.XPT (15 January 2013, date last accessed).
20. U.S. Department of Agriculture, Agricultural Research Service, Beltsville Human Nutrition Research Center et al. What We Eat in America, NHANES 1999-2000. Beltsville, MD: Beltsville Human Nutrition Research Center; Available from: ftp://ftp.cdc.gov/pub/Health_Statistics/nchs/nhanes/1999-2000/DRXTOT.XPT (15 January 2013, date last accessed).
21. Flegal KM, Graubard BI, Williamson DF, Gail MH. Cause-specific excess deaths associated with underweight, overweight, and obesity. JAMA 2007;298:2028–37.
22. Therneau T. A Package for Survival Analysis in S. R package version 2.36-14. 2012.
23. Lumley T. Survey: Analysis of Complex Survey Samples. R package version 3.14; 2009.
24. StataCorp. Stata Statistical Software: Release 10. 10th edn. College Station, TX: StataCorp LP, 2007.
25. Patel CJ, Chen R, Kodama K, Ioannidis JP, Butte AJ. Systematic identification of interaction effects between genome- and environment-wide associations in type 2 diabetes mellitus. Hum Genet 2013;132:495–509.
26. Efron B. Large-Scale Inference. Cambridge, UK: Cambridge University Press, 2010.
27. Gordon A. Classification. 2nd edn. Boca Raton, FL: Chapman and Hall/CRC, 1999.
28. Qiu W, Chavarro J, Lazarus R, Rosner B, Ma J. powerSurvEpi: Power and sample size calculation for survival analysis of epidemiological studies; R package version 0.0.6; 2012.
29. Hsieh FY, Lavori PW. Sample-size calculations for the Cox proportional hazards regression model with nonbinary covariates. Control Clin Trials 2000;21:552–60.
30. Schoenfeld DA. Sample-size formula for the proportional-hazards regression model. Biometrics 1983;39:499–503.
31. Davison A, Hinkley D. Bootstrap Methods and Their Application. Cambridge, UK: Cambridge University Press, 1997.
32. Gellert C, Schottker B, Brenner H. Smoking and all-cause mortality in older people: systematic review and meta-analysis. Arch Intern Med 2012;172:837–44.
33. Samitz G, Egger M, Zwahlen M. Domains of physical activity and all-cause mortality: systematic review and dose-response meta-analysis of cohort studies. Int J Epidemiol 2011;40:1382–400.
34. Moore SC, Patel AV, Matthews CE et al. Leisure time physical activity of moderate to vigorous intensity and mortality: a large pooled cohort analysis. PLoS Med 2012;9:e1001335.
35. Jha P, Ramasundarahettige C, Landsman V et al. 21st-century hazards of smoking and benefits of cessation in the United States. N Engl J Med 2013;368:341–50.
36. Pirie K, Peto R, Reeves GK, Green J, Beral V. The 21st century hazards of smoking and benefits of stopping: a prospective study of one million women in the UK. Lancet 2013;381:133–41.
37. Loef M, Walach H. The combined effects of healthy lifestyle behaviours on all cause mortality: a systematic review and meta-analysis. Prev Med 2012;5:163–70.
38. Nicklett EJ, Semba RD, Xue QL et al. Fruit and vegetable intake, physical activity, and mortality in older community-dwelling women. J Am Geriatr Soc 2012;60:862–68.
39. Lauretani F, Semba RD, Dayhoff-Brannigan M et al. Low total plasma carotenoids are independent predictors of mortality among older persons: the InCHIANTI study. Eur J Nutr 2008;47:335–40.
40. Shardell MD, Alley DE, Hicks GE et al. Low-serum carotenoid concentrations and carotenoid interactions predict mortality in US adults: the Third National Health and Nutrition Examination Survey. Nutr Res 2011;31:178–89.
41. MRC/BHF Heart Protection Study of antioxidant vitamin supplementation in 20,536 high-risk individuals: a randomised placebo-controlled trial. Lancet 2002;360:23–33.
42. Thies F, Masson LF, Rudd A et al. Effect of a tomato-rich diet on markers of cardiovascular disease risk in moderately overweight, disease-free, middle-aged adults: a randomized controlled trial. Am J Clin Nutr 2012;95:1013–22.
43. Bjelakovic G, Nikolova D, Gluud LL, Simonetti RG, Gluud C. Antioxidant supplements for prevention of mortality in healthy participants and patients with various diseases. Cochrane Database Syst Rev 2012;3:CD007176.
44. Hu FB, Willett WC. Optimal Diets for Prevention of Coronary Heart Disease. JAMA 2002;288:2569–78.
45. Tellez-Plaza M, Navas-Acien A, Menke A, Crainiceanu CM, Pastor-Barriuso R, Guallar E. Cadmium exposure and all-cause and cardiovascular mortality in the U.S. general population. Environ Health Perspect 2012;120:1017–22.
46. Centers for Disease Control and Prevention, Agency for Toxic Substances and Disease Registry. ATSDR–Toxicological Profile: Cadmium. http://www.atsdr.cdc.gov/toxprofiles/tp.asp?id=48&tid=15 (15 January 2013, date last accessed).
47. Altekruse SF, McGlynn KA, Reichman ME. Hepatocellular carcinoma incidence, mortality, and survival trends in the United States from 1975 to 2005. J Clin Oncol 2009;27:1485–91.
48. Rappaport SM, Smith MT. Environment and Disease Risks. Science 2010;330:460–61.
49. Athersuch TJ. The role of metabolomics in characterizing the human exposome. Bioanalysis 2012;4:2207–12.
50. Calafat AM, Ye X, Wong LY, Reidy JA, Needham LL. Exposure of the U.S. population to bisphenol A and 4-tertiary-octylphenol: 2003-2004. Environ Health Perspect 2008;116:39–44.
51. Briefel RR, Flegal KM, Winn DM, Loria CM, Johnson CL, Sempos CT. Assessing the nation’s diet: limitations of the food frequency questionnaire. J Am Diet Assoc 1992;92:959–62.
52. Byers T. Food frequency dietary assessment: how bad is good enough? Am J Epidemiol 2001;154:1087–88.
53. Brown D. Do food frequency questionnaires have too many limitations? J Am Diet Assoc 2006;106:1541–42.
54. Schatzkin A, Kipnis V, Carroll RJ et al. A comparison of a food frequency questionnaire with a 24-hour recall for use in an epidemiological cohort study: results from the biomarker-based Observing Protein and Energy Nutrition (OPEN) study. Int J Epidemiol 2003;32:1054–62.