Published by Oxford University Press on behalf of the International Epidemiological Association
© The Author 2013; all rights reserved.
International Journal of Epidemiology 2013;42:1795–1810
doi:10.1093/ije/dyt208

Systematic evaluation of environmental and behavioural factors associated with all-cause mortality in the United States National Health and Nutrition Examination Survey

Chirag J Patel,1 David H Rehkopf,2 John T Leppert,3 Walter M Bortz,4 Mark R Cullen,2 Glenn M Chertow4 and John PA Ioannidis1*

1Stanford Prevention Research Center, Stanford University School of Medicine, CA, USA, 2Division of General Medical Disciplines, Stanford University School of Medicine, CA, USA, 3Department of Urology, Stanford University School of Medicine, CA, USA and 4Division of Nephrology, Department of Medicine, Stanford University School of Medicine, CA, USA

*Corresponding author. Stanford Prevention Research Center, Stanford University School of Medicine, 1265 Welch Rd, Stanford 94305, CA, USA. Email: [email protected]

Accepted 4 September 2013

Background Environmental and behavioural factors are thought to contribute to all-cause mortality. Here, we develop a method to systematically screen and validate the potential independent contributions to all-cause mortality of 249 environmental and behavioural factors in the National Health and Nutrition Examination Survey (NHANES).

Methods We used Cox proportional hazards regression to associate 249 factors with all-cause mortality while adjusting for sociodemographic factors on data in the 1999–2000 and 2001–02 surveys (median 5.5 follow-up years). We controlled for multiple comparisons with the false discovery rate (FDR) and validated significant findings in the 2003–04 survey (median 2.8 follow-up years). We selected 249 factors from a set of all possible factors based on their presence in both the 1999–2002 and 2003–04 surveys and linkage with at least 20 deceased participants.
We evaluated the correlation pattern of validated factors and built a multivariable model to identify their independent contribution to mortality.

Results We identified seven environmental and behavioural factors associated with all-cause mortality, including serum and urinary cadmium, serum lycopene levels, smoking (3-level factor) and physical activity. In a multivariable model, only physical activity, past smoking, smoking in participant’s home and lycopene were independently associated with mortality. These three factors explained 2.1% of the variance of all-cause mortality after adjusting for demographic and socio-economic factors.

Conclusions Our association study suggests that, of the set of 249 factors in NHANES, physical activity, smoking, serum lycopene and serum/urinary cadmium are associated with all-cause mortality as identified in previous studies and after controlling for multiple hypotheses and validation in an independent survey. Whereas other NHANES factors may be associated with mortality, they may require larger cohorts with longer time of follow-up to detect. It is possible to use a systematic association study to prioritize risk factors for further investigation.

Keywords All-cause mortality, exposure, behaviour, environment-wide association study

Introduction

Identification of environmental and behavioural factors associated with mortality is critical for public health and preventive care. Many of these factors may be possible to modify, as opposed to genetic and demographic factors (age, sex, race/ethnicity) that are impossible to change and socio-economic factors (e.g. income, education and occupation) that are very difficult to change. McGinnis, Foege, Mokdad et al.
identify behavioural and environmental risk factors as ‘actual causes of deaths in the United States’, requiring as much attention and response as standard proximate clinical conditions.1,2 One way to ascertain and compare environmental and behavioural risks for mortality is to integrate data from national health surveys linked with mortality registries.2,3 There is a large body of literature on studies that try to identify environmental factors and behaviours that may increase or decrease death risk. However, these studies typically assess and report one or a few factors at a time, and may lack systematic validation in independent datasets. Modern humans are now exposed to a complex array of environmental and behavioural factors4,5 and in theory many behaviours may entail health risks and benefits. However, there is a lack of analytic strategies that aim to decipher concurrently how multiple environmental and behavioural factors are associated with mortality. Further, potential environmental exposure and behavioural risk may be modified or determined by demographic attributes, such as sex, race/ethnicity and socio-economic status. Lack of standardization in the analysis may lead to inflated or spurious irreproducible effects.6,7 This is in contrast to current-day genome-wide association studies (GWAS), a systematic analytic strategy to correlate millions of common genetic factors with disease traits.8 These investigations have resulted in a robust literature of genetic findings in contrast to environmentally- or behaviourally-based investigations.8 We have recently developed methods for environment-wide association study (EWAS), aiming to search for and validate environmental factors associated with disease and disease-related phenotypes.9–11 Here, we extend this methodology to systematically evaluate the associations of 249 environmental and behavioural factors, such as blood and urine biomarkers of exposure (e.g. 
pollutants and nutrients), and behavioural factors (e.g. physical activity, smoking and alcohol consumption), with all-cause mortality. We analyze the association of 249 factors with all-cause mortality using information collected from participants of the 1999–2002 United States National Health and Nutrition Examination Survey (NHANES) with linked mortality information ascertained by the National Death Index (NDI) in 2006. We subsequently validate findings in an independent survey, 2003–04 NHANES. Last, we evaluate the correlation pattern between tentatively validated factors and identify those that have independent effects on all-cause mortality and how these interplay with demographic and socio-economic attributes.

Methods

NHANES 1999–2000, 2001–02 and 2003–04

We downloaded NHANES laboratory, questionnaire and National Death Index (NDI) linked mortality data for the 1999–2000, 2001–02 and 2003–04 surveys. Mortality information was collected from the date of the survey participation through 31 December 2006 and ascertained via a probabilistic match between NHANES and NDI death certificate information. The NDI matches individuals on personal and demographic criteria, such as social security number and date of birth, and its performance has been described elsewhere (e.g. ref 12). Overall, 9555, 11 021, and 10 100 participants were followed in the 1999–2000, 2001–02 and 2003–04 surveys, respectively, with 611, 470 and 276 assumed death events, respectively. We used the 1999–2000 and 2001–02 surveys to scan for factors associated with all-cause mortality (‘training’ dataset) and reserved the 2003–04 survey to replicate findings from the training set.
Factors such as age, sex, race/ethnicity, educational attainment, occupation and income are hypothesized to be associated with both mortality and environmental/behavioural factors and we estimated their association with mortality.13 Further, these sociodemographic factors may also confound associations of environmental/behavioural factors with death. In NHANES, race/ethnicity was coded as Non-Hispanic White (‘White’), Mexican American (‘Mexican’), Non-Hispanic Black (‘Black’), Other Hispanic and Other. We coded educational attainment as less than high school, high school equivalent and greater than high school education. We estimated socioeconomic status (SES) as the categorical quintile of income/poverty index as previously described.9,10 We estimated occupation in categories corresponding to white-collar and professional (reference group), white-collar and semi-routine (e.g. technicians), blue-collar and high-skill (e.g. mechanics, construction trades and military) and blue-collar and semi-routine (e.g. personal services, farmworkers) as previously described.14

Figure 1 depicts our procedure. We assessed a total of 249 environmental and behavioural factors, see Table 1 and Supplementary Table S1 (available as Supplementary data at IJE online). These factors were either (i) information on behaviours, such as self-reported dietary intake (from a food frequency questionnaire), self-reported alcohol consumption, self-reported smoking, body mass index (BMI) from a physical examination or self-reported physical activity; or (ii) physical/chemical biomarkers of external exposures measured in serum or urine, such as blood lead concentration. Table 1 shows examples of factors and Table S1 (available as Supplementary data at IJE online) provides a listing of all factors.

There were a total of 416, 467 and 574 factors in the 1999–2000, 2001–02 and 2003–04 surveys, respectively. Next, from these 406, 457 and 564 factors, we identified a total of 347 that were present in all three surveys. Of these 347 factors, we found 249 that could be linked with at least 20 deceased participants in the training (1999–2000 and 2001–02 surveys) and testing (2003–04) datasets independently (Figure 1A, B).

Figure 1 Methodology to scan for environmental and behavioural factors associated with mortality. (A) Summary of environmental and behavioural variables in three independent NHANES surveys (1999–2000, 2001–02, 2003–04). (B) Training (combined 1999–2000 and 2001–02 surveys) and testing survey information. (C) Associating each of the 249 variables with all-cause mortality (SES, socio-economic status estimate, quintile of income/poverty ratio). (D) Empirical false discovery rate (FDR) estimation in training surveys. (E) Proportional hazards assumption verification. (F) Tentative validation (P <0.05 in testing surveys). (G) Estimation of variance explained by tentatively validated factors with independent contribution and interaction with demographic variables

Behavioural factors included three surveying alcohol consumption, one on ‘street drug’ use, 58 factors on food and nutrient consumption, 23 on smoking-related behaviours [e.g. ‘current or past smoker? (versus never smoker)’, ‘does anyone in your household smoke (yes/no)?’], one on physical activity and three on social support (e.g. ‘have anyone to help?’, ‘how many close friends do you have?’). We discuss these variables in the following.

Table 1 Number and examples of environmental and behavioural factors

| Factor category | No. | Examples |
|---|---|---|
| Behavioural factors: | | |
| Alcohol use | 3 | Drink five per day (yes/no)?; Quantity drinks per day (ordinal) |
| Personal smoking | 19 | Current or past smoker (referent: no smoking); Smoke cigars 20 times in life (yes/no)? |
| Family smoking | 4 | Does anyone smoke in the home?; Total cigarette smokers in home (ordinal) |
| Cotinine | 1 | Serum levels of nicotine metabolite (log and per 1 SD) |
| Physical activity | 1 | Health.gov guideline activity levels (ordinal) |
| Social support | 3 | Anyone to help (yes/no)?; First-degree support (yes/no)? |
| Street drug use | 1 | Ever used cocaine or street drugs (yes/no)? |
| Body mass index | 4 | <18.5 kg/m2, or ≥25 and <30 kg/m2, or ≥30 and <35 kg/m2, or ≥35 kg/m2 (referent: ≥18.5 and <25 kg/m2) |
| Food nutrient recall | 58 | Dietary nutrient intake levels derived from food frequency questionnaire (FFQ) (continuous and adjusted for caloric intake) |
| Environmental factors (serum- and urine-based): | | |
| Bacterial infection | 2 | MRSA 1 present (yes/no); S. aureus present (yes/no) |
| Viral infection | 5 | Hepatitis B antibody (yes/no); Hepatitis A antibody (yes/no) |
| Dialkyl | 6 | Urinary dimethylphosphate (log per 1 SD) |
| Dioxins | 7 | 2,3,7,8-tetrachlorodibenzodioxin (log and per 1 SD) |
| Furans | 10 | 2,3,7,8-tetrachlorodibenzofuran (log and per 1 SD) |
| Heavy metals | 15 | Urinary cadmium (log and per 1 SD); Serum cadmium (log and per 1 SD) |
| Hydrocarbons | 21 | Urinary 1-hydroxyfluorene (log and per 1 SD) |
| Nutrients and minerals | 15 | Serum folate (log and per 1 SD); Serum vitamin D (log and per 1 SD) |
| Polychlorinated biphenyls | 34 | Serum polychlorinated biphenyl PCB170 (log and per 1 SD) |
| Pesticides | 22 | Serum heptachlor epoxide (log and per 1 SD) |
| Phthalates | 12 | Urinary mono-n-butyl phthalate (log and per 1 SD) |
| Phytoestrogens | 6 | Urinary enterolactone (log and per 1 SD) |
| Total | 249 | |

First, the three factors on alcohol consumption included five or more drinks per day, number of drinks per day in last month [z-standardized (divided by the population standard deviation to facilitate comparison of effects) ordinal factor] and how many total days drinking per year (z-standardized ordinal factor). The 23 smoking factors included four regarding family smoking behaviour and 19 on personal smoking behaviour.
The four family smoking behaviour factors included any smokers in the household (referent group: no smokers in household), total number of cigarette smokers in the household (z-standardized ordinal factor) and the total number of cigarettes smoked at home (z-standardized ordinal factor). The 19 factors regarding personal behaviour included a categorical factor on current or past smoking (analyzed as a three-level categorical factor with never smoking as a referent) and four on ever-used cigars, chewing tobacco, snuff and pipes (referent group: never smoked the item). Specifically for current and past smokers, factors included the number of cigarettes smoked just before quitting (z-standardized ordinal factor), how many years smoked (z-standardized ordinal factor), number of cigarettes currently smoking (z-standardized ordinal factor), the average number of cigarettes smoked per day in the past month (z-standardized ordinal factor) and an estimated nicotine, tar and carbon monoxide content of smoked item (z-standardized ordinal factors). Other factors for current smokers included years since started smoking (z-standardized ordinal variable).
Physical activity was estimated by deriving metabolic equivalents for self-reported leisure and normal-time activities15 and treated as an ordinal factor based on Health.gov physical activity guideline categories for no aerobic activity, low activity (medium intensity activity greater than baseline but fewer than 150 min/week), moderate activity (150 to 300 medium intensity min/week) and high activity (>300 min medium intensity activity per week or >150 min high intensity per week) as previously described.10,16

The 58 self-reported food and nutrient consumption factors were determined from one in-person 24-h interview (1999–2000, 2001–02) or two 24-h (2003–04) in-person and telephone interviews using the United States Department of Agriculture and Department of Health and Human Services food recall questionnaires.17–20 These food and nutrient consumption factors were linearly adjusted by total caloric intake and z-standardized.

We considered BMI as another behavioural five-level categorical factor. We divided BMI into five categories as previously described,21 including <18.5 kg/m2, ≥18.5 and <25 kg/m2, ≥25 and <30 kg/m2, ≥30 and <35 kg/m2, and ≥35 kg/m2. The ≥18.5 and <25 kg/m2 category was the reference group.

The 156 factors were serum or urine-based measures of environmental exposure, including infectious agents, environmental chemicals and nutrients. Broadly, these included a serum marker of nicotine metabolism (cotinine), dioxins (n = 7 markers), furans (n = 10), heavy metals (n = 15), hydrocarbons (n = 21), nutrients (n = 15), polychlorinated biphenyls (n = 34), pesticides (n = 22), phthalates (n = 12), oestrogenic compounds (n = 6), bacterial (n = 2) and viral organisms (n = 6). With the exception of assays detecting infectious agents (which were positive/negative assays), factors were continuous in scale.
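Many of the factors described above are z-standardized, and right-skewed biomarker concentrations are additionally log-transformed, so that hazard ratios are expressed per 1 SD. A minimal Python sketch of that rescaling follows. It is illustrative only: the function name and the example values are hypothetical, the NHANES survey weights are ignored, and centring on the mean is an assumption (the text specifies only division by the population standard deviation).

```python
import math
import statistics


def z_standardize(values, log_transform=False):
    """Rescale a factor so that one unit equals one population SD.

    Assumptions (not taken from the paper): unweighted data and
    mean-centring before dividing by the standard deviation.
    """
    if log_transform:
        # Right-skewed concentrations must be strictly positive to be logged.
        values = [math.log(v) for v in values]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]


# Hypothetical right-skewed biomarker concentrations (arbitrary units).
concentrations = [0.2, 0.3, 0.5, 1.1, 4.0]
scaled = z_standardize(concentrations, log_transform=True)
```

After this rescaling, a Cox coefficient for the factor is read as the log hazard ratio per 1 SD of the (logged) exposure, which is what makes effect sizes comparable across factors measured in different units.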
Continuous biomarker factors that had a right-skewed distribution were log-transformed and z-standardized as previously described.9,10

Different measures of environmental and behavioural factors had different numbers of eligible participants for mortality follow-up assessment (Figure 1B). In the training surveys (1999–2002), there were 330–6008 eligible participants (with 26–655 death events). For the replication survey (2003–04), there were 177–3258 eligible participants (with 20–202 deaths) (Supplementary Table S1, available as Supplementary data at IJE online).

We used the R-project survival and survey libraries for all analyses and accounted for clusters (pseudo-strata and pseudo-sampling units) and participant weights to accommodate the complex sampling of the data.22,23 Estimates were verified with STATA.24

Systematic scan of environmental and behavioural factors associated with all-cause mortality

We associated each of the 249 factors with all-cause mortality serially using proportional hazards (Cox) regression, while adjusting for sociodemographic attributes described above, including age, sex, an estimate of SES (categorical quintiles of poverty to income ratio), educational attainment, occupation and race/ethnicity in the training surveys, the 1999–2002 NHANES (the ‘training’ step, Figure 1C).

We used the FDR to correct for multiple hypotheses as described previously9–11 (Figure 1D). The FDR is the estimated proportion of the false discoveries made over the number of total discoveries made at a given significance level. We used a permutation simulation method to estimate the numerator, the number of false positives incurred at a significance threshold as documented earlier.9,11,25 Specifically, to estimate the expected number of false positives, we permuted the censorship and follow-up time variable within each stratum of the survey; in other words, participants were randomly assigned mortality status.
Then, we re-ran survival analyses for each of the 249 factors. We repeated this process 100 times to attain a distribution of P-values drawn from the null distribution. The permutation method accounts for the correlation amongst factors.26 We set an FDR threshold of 5% to identify findings in the training step for validation in the testing survey. For each factor that passed the FDR threshold in the training step, we assessed violation of proportional hazards by examining interaction between the factor and follow-up time.

We deemed a factor tentatively validated if it had achieved FDR <5% significance in the training scan (1999–2002 surveys) and achieved nominal statistical significance in the test (2003–04) survey (P-value <0.05, Figure 1D–F). For validated findings, we computed an overall adjusted hazard ratio (referred to as ‘overall HR’) by combining both the training and testing survey datasets (Figure 1F). We verified whether the validated factors violated the proportional hazards assumption by checking their interaction with follow-up time. We did not have evidence that any of these factors significantly violated the assumption (P >0.05).

We assessed the non-parametric correlations among factors that had an FDR <5% in the training step, specifically bi-serial correlations between binary factors and Spearman correlations when considering quantitative factors.
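The permutation-based FDR estimate described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the use of "less than or equal to" at the threshold is an assumption, and the P-values are taken as given rather than computed, since fitting the survey-weighted Cox models is outside the scope of the sketch.

```python
import random
from collections import defaultdict


def permute_within_strata(outcomes, strata, rng):
    """Shuffle (follow_up_time, died) pairs among participants of one stratum.

    Keeping time and censoring status together and shuffling only inside a
    stratum breaks any factor-outcome association while preserving the
    survey's stratified structure.
    """
    by_stratum = defaultdict(list)
    for index, stratum in enumerate(strata):
        by_stratum[stratum].append(index)
    permuted = list(outcomes)
    for indices in by_stratum.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        for target, source in zip(indices, shuffled):
            permuted[target] = outcomes[source]
    return permuted


def empirical_fdr(observed_p, null_p_runs, threshold):
    """Estimate the FDR at a P-value threshold.

    Numerator: mean number of null P-values at or below the threshold per
    permutation run. Denominator: number of observed P-values at or below it.
    """
    discoveries = sum(p <= threshold for p in observed_p)
    if discoveries == 0:
        return 0.0
    false_per_run = [sum(p <= threshold for p in run) for run in null_p_runs]
    expected_false = sum(false_per_run) / len(null_p_runs)
    return min(1.0, expected_false / discoveries)


# Hypothetical P-values: five factors, two permutation runs.
observed = [0.0001, 0.0002, 0.01, 0.2, 0.5]
null_runs = [[0.0002, 0.3, 0.4, 0.6, 0.9], [0.1, 0.2, 0.5, 0.7, 0.8]]
fdr = empirical_fdr(observed, null_runs, threshold=0.0003)
# 0.5 expected false positives / 2 discoveries = 0.25
```

In the workflow described above, each permutation run would re-fit all 249 models on outcomes produced by a step like `permute_within_strata`, and the threshold would be lowered until the estimated FDR falls below 5%.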
We visualized these pairwise correlations in a heat map and arranged the factors using a hierarchical clustering algorithm27 as previously described.11

We computed the power for detection of factors at the P-value corresponding to FDR <5% (equivalent to P = 0.0003) for sample sizes corresponding to each factor tested at a range of adjusted HRs (1.1, 1.3, 1.5, 1.7 and 1.9) with the powerSurvEpi R library.28 Specifically, this library implements methods that take into account the correlation among the factor and adjustment co-variates,29,30 sample size and number of death events to estimate power at a given P-value threshold and HR. We then estimated how many factors we would detect if every one of the 249 were associated with all-cause mortality for FDR <5% (P <0.0003) and each HR above by totalling the power estimations for each factor tested (Supplementary Table S2, available as Supplementary data at IJE online). At HRs of 1.1, 1.3, 1.5, 1.7 and 1.9, we estimated we would find 7 out of 249 (3%), 120/249 (49%), 194/249 (79%), 221/249 (89%) and 233/249 (94%), respectively, if all 249 factors were associated with all-cause mortality. We concluded we were adequately powered to detect modest and large associations (HR >1.3 or HR <0.8), but not weak associations with all-cause mortality.

Interaction checks with two lowest SES categories, male sex and Non-Hispanic Black race/ethnicity

For tentatively validated factors, we aimed to assess their interaction with demographic and socio-economic characteristics associated with risk for all-cause mortality, namely male sex, two lowest SES quintiles and Non-Hispanic Black race/ethnicity in the combined cohort (training and testing cohorts, Figure 1G).
Specifically, we modelled the interactions among each of the validated findings and the three demographic factors with a multiplicative term in the Cox proportional hazards model while controlling for the remaining demographic co-variates above (age, sex, education, quintile of SES, occupation and race/ethnicity). As one example, the interaction between serum cadmium exposure (‘X’) and male sex would have been modelled as:

$\log(\mathrm{HR}) = \beta_1 X + \beta_2\,\mathrm{male} + \beta_3\,(X \times \mathrm{male}) + \text{other adjustment covariates}$

where the other adjustment covariates are age, race/ethnicity, education, SES and occupation. We assessed whether inclusion of the interaction term ($\beta_3$) was significant at the Bonferroni level of significance after considering 7 times 3 interaction tests (P < 0.05/21 = 0.002).

Variance explained of validated factors

To estimate the additive effects and overall variance explained by identified factors, we built two multivariable models that included tentatively validated factors (Figure 1G). The first model contained tentatively validated factors in addition to age, sex, quintiles of SES, education and occupation as defined above. The second model contained tentatively validated factors in addition to age, sex and race/ethnicity but excluded socio-economic factors. We hypothesized that the socio-economic factors may influence some of the environmental and behavioural factors. Under this hypothesis, the associations of the environmental/behavioural factors might be stronger in a model without socio-economic co-variates (SES, education and occupation) than in models with socio-economic factors. We computed the Nagelkerke R2 to estimate the variance explained for each model and, in addition, ascribed solely to the environmental and behavioural factors.
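The Nagelkerke R2 used above can be sketched in Python from model log-likelihoods. The formula is the standard definition (a Cox and Snell R2 rescaled by its maximum attainable value); the paper does not spell it out, so its application here is an assumption. The function names and the log-likelihood values are hypothetical, the log-likelihoods for a Cox model would be partial log-likelihoods, and taking the share ascribed to the environmental and behavioural factors as the difference between a full and a covariates-only model is one plausible reading rather than the authors' documented procedure.

```python
import math


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke R2 from the log-likelihoods of a null and a fitted model.

    n is the number of observations (an assumption: some implementations for
    survival models use the number of events instead).
    """
    cox_snell = 1.0 - math.exp((2.0 / n) * (loglik_null - loglik_model))
    upper_bound = 1.0 - math.exp((2.0 / n) * loglik_null)
    return cox_snell / upper_bound


def r2_attributable(loglik_null, loglik_covariates, loglik_full, n):
    """R2 gained by adding the factors of interest to a covariates-only model."""
    full = nagelkerke_r2(loglik_null, loglik_full, n)
    base = nagelkerke_r2(loglik_null, loglik_covariates, n)
    return full - base


# Hypothetical partial log-likelihoods for a cohort of 1000 participants.
gain = r2_attributable(
    loglik_null=-3000.0,
    loglik_covariates=-2900.0,
    loglik_full=-2890.0,
    n=1000,
)
```

Standard errors for such a quantity, as described in the text, would come from recomputing it on bootstrap resamples drawn within strata.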
We computed standard errors around the Nagelkerke R2 with a bootstrapping procedure that accommodated stratified data.31

Results

Baseline characteristics of deceased and surviving participants in NHANES 99–02 and NHANES 03–04

There were a total of 6008 eligible participants for study in NHANES 1999–2002 with a median time to follow-up of 66 months. As expected, we found important associations among demographic characteristics and mortality, including older age [adjusted hazard ratio (HR) = 2.2 (2.1, 2.4) for a 10-year increase], male sex [HR = 1.7 (1.4, 2.1)], and non-Hispanic Black race/ethnicity [HR = 1.4 (1.1, 1.8) relative to non-Hispanic Whites]. We also observed higher risk depending on SES (as defined by quintile of income-poverty ratio). Individuals at the two lowest SES standings had greater than 2-fold risk for death [HR = 2.3 (1.5, 3.6) and 2.4 (1.7, 3.4) for first and second quintiles, respectively] versus the highest SES (Table 2). Supplementary Table S1 (available as Supplementary data at IJE online) shows factors that differed among alive and deceased participants.

NHANES 2003–04 was used for validation. There were a total of 3262 eligible participants for study in NHANES 2003–04 with a median follow-up of 34 months. We observed similar trends in NHANES 2003–04 (Table 3). Participants in NHANES 2003–04 had higher mean survivor age of 44.5 years and deceased mean age of 71.4 years. The adjusted HR for a 10-year increase in age was 2.6 [95% CI: (2.3, 3.0)] versus 2.2 in NHANES 1999–2002. We observed double the risk for men [adjusted HR = 2.0 (1.5, 2.7)].

Limited cause of death information was available for deceased participants and was coded as International Classification of Diseases version 10 (ICD10) codes. The Centers for Disease Control and Prevention (CDC)/National Center for Health Statistics (NCHS) binned ICD10 codes into 113 groups.
Table 2 Demographic and socio-economic attributes and hazard ratios (HRs) for NHANES 1999–2002 ‘training’ samples

| | Survivors n = 5353a | Deceased n = 655a | Age-adjusted HRb | Demographic-adjusted HRb |
|---|---|---|---|---|
| Age | 43.45 (0.34) | 68.25 (0.83) | 2.32 [2.15,2.49] | 2.24 [2.07,2.43] |
| Male | 47.9 (0.6) | 52.9 (2.4) | 1.56 [1.26,1.93] | 1.72 [1.38,2.13] |
| Race (%): | | | | |
| Non-Hispanic Black | 10.4 (1.2) | 11.4 (1.4) | 1.66 [1.33,2.08] | 1.40 [1.09,1.81] |
| Mexican American | 7.2 (0.9) | 3.6 (0.8) | 1.15 [0.80,1.64] | 0.86 [0.60,1.23] |
| Other | 4.7 (0.6) | 1.6 (0.6) | 0.57 [0.30,1.08] | 0.54 [0.30,0.98] |
| Other Hispanic | 6.8 (1.7) | 5.6 (2.2) | 1.18 [0.77,1.80] | 0.92 [0.59,1.43] |
| Non-Hispanic White | 70.8 (1.8) | 77.7 (2.4) | ref | ref |
| Education (%): | | | | |
| <High school | 20.8 (0.9) | 37.5 (2.4) | 1.42 [1.14,1.77] | 1.24 [0.98,1.57] |
| High school | 25.7 (1.0) | 29.2 (1.9) | 1.65 [1.33,2.05] | 1.23 [0.98,1.54] |
| >High school | 53.5 (1.5) | 33.2 (2.5) | ref | ref |
| Income (quintile of income/poverty) (%): | | | | |
| Quintile 1 | 16.9 (0.9) | 19.2 (2.0) | 2.39 [1.70,3.38] | 2.32 [1.51,3.57] |
| Quintile 2 | 18.5 (1.1) | 33.5 (2.8) | 2.47 [1.74,3.50] | 2.41 [1.69,3.44] |
| Quintile 3 | 19.9 (0.7) | 22.0 (2.3) | 1.89 [1.29,2.75] | 1.76 [1.20,2.57] |
| Quintile 4 | 19.6 (0.6) | 13.8 (1.7) | 1.68 [1.08,2.59] | 1.60 [1.03,2.48] |
| Quintile 5 | 25.1 (1.4) | 11.4 (2.0) | ref | ref |
| Occupation: | | | | |
| Blue-collar semi | 38.1 (1.0) | 39.1 (2.8) | 1.58 [1.25,1.99] | 1.18 [0.87,1.59] |
| Blue-collar high | 10.3 (0.7) | 14.6 (1.8) | 1.73 [1.22,2.43] | 1.10 [0.78,1.55] |
| Never worked | 2.6 (0.2) | 2.8 (0.7) | 0.78 [0.40,1.50] | 0.76 [0.37,1.58] |
| White-collar semi | 20.9 (0.8) | 19.0 (1.4) | 0.96 [0.77,1.19] | 0.98 [0.74,1.30] |
| White-collar high | 23.5 (0.8) | 19.1 (1.8) | ref | ref |

Semi, semi-routine; high, high skill; ref, referent.
a Unweighted sample size.
b HR adjusted for all other demographic and socio-economic factors.

The top five causes of death for participants in the 1999–2002
surveys included the groups ‘other forms of ischaemic heart disease’ (ICD10 codes I20, I25.1–I25.9, 10% of the deceased population), ‘cerebrovascular diseases’ (ICD10: I60–I69, 8% of deceased participants), ‘other diseases’ (more than ten ICD10 groups, 7% of deceased participants), ‘malignant neoplasms of trachea, bronchus and lung’ (ICD10: C33–C34, 7%), and ‘acute myocardial infarction’ (ICD10: I21–I22, 6% of deceased participants). The top five causes of death for deceased participants in the 2003–04 survey were similar and included ‘other forms of ischaemic heart disease’ (12% of deceased participants), ‘malignant neoplasms of trachea, bronchus and lung’ (10% of deceased participants), ‘other diseases’ (8%), ‘acute myocardial infarction’ (8%) and ‘cerebrovascular diseases’ (6%).

Systematic scan of environmental and behavioural factors associated with all-cause mortality

We associated each of the 249 environmental and behavioural factors (self-reported or biomarkers of exposure) with all-cause mortality in turn, adjusting for age, sex, race/ethnicity, SES, occupation and educational attainment in the NHANES 1999–2002 surveys (the ‘training’ dataset). Figure 2 shows the results, visualizing the adjusted hazard ratio versus the P-value of the association. Adjusted HR denotes risk for all-cause mortality per 1 SD for continuous factors or per incremental change for ordinal values. For categorical or binary factors, adjusted hazard ratios denote risk relative to the referent category (‘negative’ for an exposure). We found 7 (out of 249) factors at FDR <5% in the training surveys (1999–2002 NHANES) and were able to tentatively validate all 7 factors in the test survey (P <0.05 in 2003–04 NHANES) (Table 4, Figure 2).
The strongest association included physical activity, analyzed as an ordinal factor (representing the trend from no, low, medium and high activity as defined by Health.gov categories) with adjusted HR of 0.72 for the trend [CI: (0.66, 0.79), P-value = 4 × 10⁻¹²] in the training surveys (Figure 2) and an adjusted HR of 0.63 (P = 1 × 10⁻¹⁰) in the test survey (Table 4). We also estimated the adjusted HR of each physical activity level relative to other categories. In the combined surveys, the adjusted HR for low activity relative to zero activity was 0.60 [95% CI: (0.47, 0.73), P = 3 × 10⁻⁶]. The adjusted HR for moderate activity versus low activity was 0.58 [95% CI: (0.41, 0.82), P = 3 × 10⁻³] and high activity versus moderate activity was not significant, with an adjusted HR of 1.2 [95% CI: (0.80, 1.7), P = 0.39].

Table 3 Demographic and socio-economic attributes and hazard ratios (HRs) for NHANES 2003–2004 ‘testing’ samples

| | Survivors n = 3059a | Deceased n = 203a | Age-adjusted HRb | Demographics-adjusted HRb |
|---|---|---|---|---|
| Age | 44.49 (0.55) | 71.42 (1.37) | 2.60 [2.28,2.96] | 2.28 [1.97,2.64] |
| Male | 48.0 (0.7) | 56.7 (3.5) | 1.81 [1.40,2.36] | 2.28 [1.55,3.35] |
| Race (%): | | | | |
| Non-Hispanic Black | 11.3 (1.8) | 12.4 (3.3) | 1.54 [1.11,2.12] | 1.35 [0.98,1.86] |
| Mexican American | 8.0 (2.0) | 4.3 (2.7) | 1.21 [0.76,1.92] | 0.78 [0.48,1.29] |
| Other | 5.2 (0.7) | 4.8 (2.1) | 1.53 [0.62,3.74] | 1.48 [0.50,4.38] |
| Other Hispanic | 3.7 (0.7) | 3.2 (1.9) | 1.69 [0.58,4.87] | 1.68 [0.59,4.82] |
| Non-Hispanic White | 71.8 (3.4) | 75.3 (3.8) | ref | ref |
| Education (%): | | | | |
| <High school | 18.5 (1.2) | 33.7 (4.6) | 1.31 [0.96,1.80] | 1.05 [0.67,1.65] |
| High school | 27.0 (1.0) | 25.7 (4.8) | 1.00 [0.56,1.77] | 0.97 [0.53,1.76] |
| >High school | 54.6 (1.2) | 40.6 (5.1) | ref | ref |
| Income (quintile of income/poverty) (%): | | | | |
| Quintile 1 | 16.9 (1.5) | 22.2 (4.0) | 2.33 [1.24,4.36] | 2.05 [1.04,4.04] |
| Quintile 2 | 19.7 (0.9) | 31.9 (3.9) | 1.62 [0.79,3.30] | 1.31 [0.59,2.92] |
| Quintile 3 | 19.6 (1.0) | 21.6 (5.2) | 1.38 [0.59,3.21] | 1.23 [0.52,2.91] |
| Quintile 4 | 22.0 (1.2) | 12.5 (2.9) | 0.85 [0.51,1.43] | 0.72 [0.37,1.39] |
| Quintile 5 | 21.9 (1.8) | 11.9 (3.2) | ref | ref |
| Occupation: | | | | |
| Blue-collar semi | 39.0 (2.0) | 27.0 (5.4) | 0.87 [0.52,1.44] | 0.80 [0.47,1.38] |
| Blue-collar high | 11.8 (1.2) | 23.4 (3.8) | 1.89 [1.25,2.86] | 1.40 [0.78,2.49] |
| Never worked | 2.2 (0.2) | 5.9 (1.6) | 1.37 [0.69,2.71] | 1.61 [0.73,3.54] |
| White-collar semi | 23.1 (1.2) | 17.9 (3.0) | 0.71 [0.48,1.07] | 0.94 [0.55,1.58] |
| White-collar high | 22.0 (1.4) | 26.0 (3.8) | ref | ref |

Semi, semi-routine; high, high skill; ref, referent.
a Unweighted sample size.
b HR adjusted for all other demographic and socio-economic factors.

We had evidence for multiple associations of environmental and behavioural factors with all-cause mortality as seen in the deviance from uniform distribution of P-values (Supplementary Figure S1, available as Supplementary data at IJE online).

Three self-reported smoking factors were associated with mortality. These included the categorical factor past and current smoking (versus never smoking). The adjusted HR for past smoking was 1.5 [95% CI: (1.2, 1.8), P = 8 × 10⁻⁵ in training surveys] and 2.0 for current smoking [95% CI: (1.4, 2.9), P = 2 × 10⁻⁴]. The third self-reported smoking factor included anyone smoking in the participant’s home [adjusted HR: 2.0 (1.6, 2.6), P = 1 × 10⁻⁷ in the training surveys]. We observed slightly larger estimates in the test survey for these factors. For example, the adjusted HR for current smokers and past smokers versus never smokers was 1.7 and 3.0, respectively (P < 8 × 10⁻⁵ and 2 × 10⁻⁴).

We found urine and serum cadmium levels associated with mortality. Serum cadmium had an adjusted HR of 1.4 for a 1-SD change in logged exposure value [CI: (1.2, 1.6), P = 1 × 10⁻⁵] and for urinary cadmium the adjusted HR was 1.6 [CI: (1.3, 2.0), P-value = 6 × 10⁻⁵]. Adjusted HRs in the test surveys were higher [1.6 (P = 6 × 10⁻⁷) and 2.0 (P = 2 × 10⁻⁵), respectively].

We also found a serum nutrient marker associated with all-cause mortality.
Specifically, the serum carotenoid trans-lycopene was negatively associated with all-cause mortality (Figure 2; Table 4). Trans-lycopene had an HR of 0.8, so that higher levels of trans-lycopene were associated with a 20% decreased risk of mortality (Table 4). Adjusted HRs in the test surveys for these variables were similar.

Figure 2 Volcano plot of 249 environmental and behavioural factor associations with all-cause mortality in the training step (all black points). The red horizontal line denotes the FDR-adjusted level of statistical significance (FDR = 5%, P-value = 0.0003). Red points show the standard demographic and socio-economic factors considered for adjustments. For SES: SES_0: 1st quintile of SES, SES_1: 2nd quintile of SES, SES_2: 3rd quintile of SES, SES_3: 4th quintile of SES; SES HRs are relative to the highest quintile of SES. For education: education_hs: high school education, education_less_hs: less than high school education; education HRs are relative to greater than high-school education. For occupation: occupation_blue_semi: semi blue-collar, occupation_blue_high: high blue-collar, occupation_white_semi: semi white-collar, occupation_never: never worked. Filled black markers denote validated factors. –log10(P-value) for physical activity and age are annotated in parentheses, since they are extreme. The y-axis is discontinuous to accommodate the higher –log10(P-values) for physical activity and age.

Several factors had higher HRs (>2) but did not have FDR <5%, including hepatitis C antibody and hepatitis B surface antigen. Antibodies to hepatitis C had an adjusted HR of 2.7 in the training surveys [95% CI: (1.4, 5.0), P = 0.002, FDR = 10%] and 2.2 in the combined surveys [95% CI: (1.2, 3.9), P = 0.009]. Hepatitis B surface antigen presence had an adjusted HR of 2.6 in the training surveys [95% CI: (0.9, 7.3), P = 0.08, FDR > 30%] and 2.1 in the combined surveys [95% CI: (0.8, 6.0), P = 0.1].
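As a reading aid, the hepatitis estimates above can be checked for internal consistency: a hazard ratio and its 95% CI imply an approximate two-sided Wald P-value. The sketch below is a minimal illustration that assumes a normal approximation on the log-HR scale; the function name `wald_p_from_ci` is hypothetical, and the published P-values come from survey-weighted Cox models, so small discrepancies are expected.

```python
import math

def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided Wald P-value implied by an HR and its 95% CI.

    Assumes the CI is symmetric on the log scale (normal approximation).
    """
    # Standard error of log(HR), recovered from the width of the log-scale CI.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Training-survey estimates quoted in the text.
p_hcv = wald_p_from_ci(2.7, 1.4, 5.0)   # hepatitis C antibody
p_hbv = wald_p_from_ci(2.6, 0.9, 7.3)   # hepatitis B surface antigen

print(f"hepatitis C: P ~ {p_hcv:.4f}")
print(f"hepatitis B: P ~ {p_hbv:.3f}")
```

Under these assumptions the implied values are of the same order as the reported P = 0.002 and P = 0.08, which suggests the garbled numbers were transcribed in the right positions.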
Table 4 Tentatively validated factors. Training denotes estimates from the training survey, NHANES 1999–2002. Testing denotes estimates from the testing survey, NHANES 2003–04. Combined denotes estimates from combining the training and testing surveys (1999–2004)

| Description | Training n | Training events | Training adjusted HR [95% CI] | Training P-value | FDR | Testing n | Testing events | Testing adjusted HR [95% CI] | Testing P-value | Combined adjusted HR [95% CI] | Combined P-value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Past smoker (vs never smoker) | 5409 | 652 | 1.50 [1.23,1.83] | 7.8 × 10^-5 | 1.68 × 10^-2 | 2911 | 201 | 1.66 [1.10,2.52] | 1.65 × 10^-2 | 1.53 [1.27,1.83] | 5.31 × 10^-6 |
| Current smoker (vs never smoker) | 5409 | 652 | 2.00 [1.38,2.89] | 2.3 × 10^-4 | 2.80 × 10^-2 | 2911 | 201 | 3.17 [1.90,5.28] | 9.51 × 10^-6 | 2.20 [1.61,3.00] | 6.59 × 10^-7 |
| Cadmium (1 SD log) | 5722 | 591 | 1.37 [1.19,1.57] | 1.2 × 10^-5 | 9.33 × 10^-3 | 3120 | 188 | 1.63 [1.34,1.97] | 5.66 × 10^-7 | 1.45 [1.28,1.65] | 3.97 × 10^-9 |
| Trans-lycopene (1 SD log) | 3096 | 262 | 0.81 [0.72,0.91] | 2.9 × 10^-4 | 3.20 × 10^-2 | 3054 | 179 | 0.79 [0.73,0.86] | 1.45 × 10^-7 | 0.80 [0.74,0.86] | 1.20 × 10^-9 |
| Physical activity (MET-based rank) | 5534 | 619 | 0.72 [0.66,0.79] | 4.0 × 10^-12 | <0.001 | 2989 | 191 | 0.63 [0.54,0.72] | 1.27 × 10^-10 | 0.71 [0.66,0.77] | 4.47 × 10^-18 |
| Does anyone smoke in home? | 6008 | 655 | 2.00 [1.55,2.58] | 1.1 × 10^-7 | 3.5 × 10^-3 | 3258 | 202 | 1.88 [1.28,2.76] | 1.18 × 10^-3 | 1.99 [1.59,2.48] | 1.14 × 10^-9 |
| Cadmium, urine (1 SD log) | 1783 | 186 | 1.62 [1.28,2.04] | 5.7 × 10^-5 | 1.58 × 10^-2 | 1079 | 59 | 2.03 [1.47,2.80] | 1.66 × 10^-5 | 1.66 [1.35,2.04] | 1.14 × 10^-6 |

All estimates are adjusted by age, sex, race, SES, education and occupation. n and number of events are unweighted. FDR, false discovery rate; MET, metabolic equivalent.

Interaction checks with two lowest SES categories, male sex and Non-Hispanic Black race/ethnicity

We estimated whether the seven validated factors interacted with three demographic categories (total of 21 tests of interaction). We could not conclude that any of these demographic factors modified associations for all-cause mortality after consideration of multiple hypotheses (P > 0.05 for all 21 interaction tests).

Correlation pattern between putative risk factors

We assessed the correlations among the environmental and behavioural factors with FDR <5% (n = 7) and the adjustment covariates (n = 21) and observed many modest correlations among the 378 pairwise correlations that were calculated; 210 of the 378 correlations were significant (Bonferroni-adjusted P < 0.05). The 5th to 95th percentile range of the absolute value of r was 0.005 to 0.30 (Figure 3), and the correlations that were significant had absolute values ranging from 0.04 to 0.62.

There were significant correlations between similar factors belonging to the same group, such as smoking- and cadmium-related factors. For example, the correlation between serum and urinary cadmium levels was 0.45 (adjusted P < 1 × 10^-12). Self-reported anybody smoking at home was significantly positively correlated with current smoking status (r = 0.6, adjusted P < 1 × 10^-12) and negatively correlated with past smoking status (r = -0.2, adjusted P < 1 × 10^-12).

We observed correlations between smoking-related behaviours, physical activity, levels of cadmium and levels of trans-lycopene. First, smoking behaviour was significantly correlated with cadmium biomarker levels. Specifically, current smoking was correlated with both serum and urine cadmium levels (r = 0.52 and 0.21, respectively, adjusted P < 1 × 10^-12). Physical activity was modestly correlated with trans-lycopene (r = 0.2, adjusted P < 1 × 10^-12). Urine cadmium was modestly correlated with past smoking (r = 0.1, adjusted P = 1 × 10^-5).
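To make the multiplicity behind the correlation screen concrete: 7 validated factors plus 21 adjustment covariates give 28 variables, and every unordered pair among them is one test. The sketch below is only an illustration of that arithmetic and of a plain Bonferroni adjustment; the helper `bonferroni_adjust` is a hypothetical name, and the authors' actual software is not shown in this section.

```python
from math import comb

n_validated = 7    # factors with FDR < 5%
n_covariates = 21  # demographic and socio-economic adjustment covariates

# Number of unordered pairs among all 28 variables.
n_pairs = comb(n_validated + n_covariates, 2)

# Per-test threshold that keeps the family-wise error rate at 5%.
alpha = 0.05
bonferroni_threshold = alpha / n_pairs

def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)

print(n_pairs, bonferroni_threshold)
print(bonferroni_adjust(1e-5, n_pairs))
```

With 378 pairs, a raw P-value must fall below roughly 1.3 × 10^-4 to remain significant after adjustment, which is why only fairly clear correlations are reported as significant.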
Trans-lycopene, in contrast, was modestly but significantly negatively correlated with serum and urine cadmium (r = -0.13 and -0.16, adjusted P < 1 × 10^-12 and P = 8 × 10^-11, respectively).

Moreover, there were modest correlations between the tentatively validated factors and demographic and socio-economic factors (r ≥ 0.1 in absolute value and adjusted P < 0.01). First, physical activity was positively correlated with above high school education and 5th quintile of SES (r = 0.2 and 0.2, respectively, adjusted P < 1 × 10^-12) and negatively correlated with age (r = -0.14, adjusted P < 1 × 10^-12). Trans-lycopene was inversely correlated with age (r = -0.3, adjusted P < 1 × 10^-12) and, to a lesser degree, with less than high school education (r = -0.13, P < 1 × 10^-12). Serum and urinary cadmium were directly correlated with age (r = 0.24 and 0.34, respectively, adjusted P < 1 × 10^-12). Serum cadmium was additionally correlated with less than high school education (r = 0.13, adjusted P < 1 × 10^-12) and urinary cadmium was correlated with Non-Hispanic Black race/ethnicity (r = 0.16, adjusted P < 1 × 10^-12).

Smoking-related factors also exhibited correlations with demographic factors. Self-reported current smoking was correlated with male sex (r = 0.1, adjusted P < 1 × 10^-12) and inversely correlated with age (r = -0.15, adjusted P < 1 × 10^-12). Current smoking was also correlated with first quintile of SES (r = 0.12, adjusted P < 1 × 10^-12). Similarly, anyone smoking at home correlated with first quintile of SES (r = 0.11, adjusted P < 1 × 10^-12) and with Non-Hispanic Black race/ethnicity (r = 0.11, adjusted P < 1 × 10^-12). Past smoking was strongly correlated with age (r = 0.26, adjusted P < 1 × 10^-12) and male sex (r = 0.14, adjusted P < 1 × 10^-12).

Multivariable models and variance explained by tentatively validated factors

We built three multivariable models to estimate the variance explained by the tentatively validated factors.
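The variance-explained figures in this part of the paper are reported as Nagelkerke R2 values and as the difference between a full model and a reduced (covariates-only) model. The sketch below shows how such a difference could be computed, assuming the standard likelihood-based definition of Nagelkerke's R2; the function name and all log-likelihood values are hypothetical placeholders, not quantities from the study, and the bootstrap confidence intervals are not reproduced.

```python
import math

def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke R2 from log-likelihoods (assumed standard definition).

    loglik_null  : log-likelihood of the model with no predictors
    loglik_model : log-likelihood of the fitted model
    n            : number of observations
    """
    # Cox-Snell R2: 1 - (L_null / L_model)^(2/n)
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    # Maximum attainable Cox-Snell R2: 1 - L_null^(2/n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell

# Hypothetical log partial likelihoods, for illustration only.
n = 5155
ll_null = -3400.0      # no predictors
ll_reduced = -3080.0   # demographic and socio-economic covariates only
ll_full = -3030.0      # covariates plus environmental/behavioural factors

r2_reduced = nagelkerke_r2(ll_null, ll_reduced, n)
r2_full = nagelkerke_r2(ll_null, ll_full, n)

# Share of variance attributed to the added factors (full minus reduced).
r2_gain = r2_full - r2_reduced
print(round(r2_full, 3), round(r2_reduced, 3), round(r2_gain, 3))
```

The full-minus-reduced difference is the quantity of interest here: it isolates what the environmental and behavioural factors add once the adjustment covariates are already in the model.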
We opted to remove urinary cadmium from consideration in these models due to extensive missing information (only 1694 participants with 134 death events versus 5155 participants with 416 events). In the first, we entered five of the seven tentatively validated factors [serum cadmium, physical activity, anyone smoked in home, current smokers/past smokers (versus never smokers)] while adjusting for demographic and socio-economic covariates (Table 5) for participants from both the training and testing surveys (Model A). The second model was similar to the first, containing six of seven validated factors including trans-lycopene (Model B). The third multivariable model contained six of seven validated factors but omitted socio-economic factors, such as SES, education and occupation (Model C). The total number of participants available in the combined testing and training surveys in Model A was 7381 (733 deaths). The total number of participants available for Models B and C was 5155 (416 deaths).

The total variance explained by Models A, B and C was 14.4, 13.2 and 11%, respectively. The variance explained by the tentatively validated environmental and behavioural factors in these models was 1.6, 2.1 and 2.3%, respectively. Thus, models not including trans-lycopene but built on more complete data were not inferior to models including trans-lycopene. Moreover, models that did not consider socio-economic factors had a modestly lower R2 than those that did (11% versus 13%). The contribution of environmental and behavioural factors was similar in Models B and C. On the other hand, current smoking (P > 0.6 in Models A–C, Table 5) and past smoking (Models B–C, Table 5) lost nominal significance (P > 0.05) in multivariable models, indicating the correlative nature of the tentatively validated factors.

Figure 3 Pairwise correlations of factors with FDR <5% in the training set and of the standard demographic and socio-economic factors used for adjustments

Table 5 Multivariable model coefficients

| Factor | Model A multivariate HR [95% CI] | Model A P-value | Model B multivariate HR [95% CI] | Model B P-value | Model C multivariate HR [95% CI] | Model C P-value |
|---|---|---|---|---|---|---|
| Past smoker (vs never smoker) | 1.36 [1.12,1.64] | 1.49 × 10^-3 | 1.27 [0.94,1.71] | 0.013 | 1.28 [0.94,1.73] | 0.113 |
| Current smoker (vs never smoker) | 1.1 [0.72,1.67] | 0.655 | 0.91 [0.49,1.71] | 0.775 | 0.92 [0.5,1.7] | 0.80 |
| Total physical activity (MET-based rank) | 0.73 [0.68,0.79] | 3.06 × 10^-14 | 0.67 [0.6,0.74] | 9.99 × 10^-16 | 0.65 [0.59,0.72] | 1.11 × 10^-16 |
| Serum cadmium (1 SD of log) | 1.24 [1.11,1.4] | 2.88 × 10^-4 | 1.24 [1.03,1.49] | 0.021 | 1.26 [1.04,1.52] | 0.017 |
| Does anyone smoke in home? | 1.67 [1.24,2.24] | 7.30 × 10^-4 | 1.69 [1.17,2.44] | 4.85 × 10^-3 | 1.76 [1.23,2.51] | 1.98 × 10^-3 |
| Trans-lycopene (1 SD of log) | . | . | 0.84 [0.77,0.91] | 4.61 × 10^-5 | 0.84 [0.77,0.91] | 3.02 × 10^-5 |
| n (number of events) | 7,381 (733) | | 5,155 (416) | | 5,155 (416) | |
| Nagelkerke R2 | 0.144 [0.127, 0.156] (a) | | 0.132 [0.111, 0.1438] (a) | | 0.11 [0.102, 0.129] (a) | |
| Nagelkerke R2 (full-reduced) | 0.016 [0.009, 0.022] (a) | | 0.021 [0.012, 0.028] (a) | | 0.023 [0.015, 0.030] (a) | |

Model A variables: past/current smoker, serum cadmium, total physical activity, anyone smoke in home?, age, sex, race/ethnicity, education, SES, occupation. Model B variables: past/current smoker, serum cadmium, total physical activity, anyone smoke in home?, trans-lycopene, age, sex, race/ethnicity, education, SES, occupation. Model C variables: past/current smoker, serum cadmium, total physical activity, anyone smoke in home?, trans-lycopene, age, sex, race/ethnicity. MET, metabolic equivalent. (a) CI computed by bootstrap.

Discussion

Out of the 249 tested environmental and behavioural factors, we found that only physical activity, smoking and cadmium levels have consistent evidence for strong and validated associations with all-cause mortality. Some other factors might have been missed due to limited power.
This suggests that the study of putative environmental and behavioural risk factors that regulate all-cause mortality at the general population level will require very large studies and careful validation. Given the small effect sizes of validated factors, continuing to perform modest-size studies with selective reporting of a few putative risk factors is unlikely to yield reliable and conclusive answers. Tentatively validated factors accounted for approximately 2% of the risk variance when demographic factors were accounted for, and this decreased little when socio-economic factors (income, education and occupation) were also accounted for. This suggests that little of the impact of these two modifiable behaviours out of 249 examined is explained by the few measured socio-economic forces that possibly influence physical inactivity or smoking. Whereas we did not subject the socio-economic factors to validation, the volcano plot (Figure 2) shows that, descriptively, the two lowest quintiles of income were strongly associated with mortality, with a hazard ratio larger than any of the individual environmental and behavioural factors tested. This relation specific to the two lowest levels of income and mortality is consistent with prior work done on an earlier wave of NHANES [14] and thus should be further investigated, given the out-of-sample replication and strength of association we observe. Reassuringly, we were able to elicit well-known associations between smoking and physical activity and all-cause mortality.
Our estimates of increased risk with current and past smoking are very similar to those of a recent meta-analysis where relative risks were 1.83 for current smokers and 1.34 for past smokers [32]. Our protective estimates from physical activity are also similar to those identified by a recent large meta-analysis of 80 cohorts [33]. Physical activity [34] and current smoking [35,36] are associated with an average increase of 3–4 years and a decrease of 10 years in life expectancy, respectively, and physical inactivity and smoking are thought to be each responsible for approximately five million deaths worldwide. Our multivariable analysis also reiterates the combined effect of behavioural/environmental risk factors on mortality [37]. Nutrition and nutrition quality have also been connected with mortality risk. We found a marker for a carotenoid nutrient (trans-lycopene) associated with all-cause mortality. Several observational studies have found correlations between carotenoid levels and mortality among the elderly, for example among women [38] and among Italian individuals [39]. In NHANES III, Shardell and colleagues observed a modest decrease in mortality for the 2nd and 3rd quartiles of lycopene [40]. However, interventions focusing on carotenoid-related nutrients have not shown any benefits in clinical trials for prevention of chronic disease and cause-specific death (e.g. ref. [41]). Further, a randomized trial of a 'tomato-rich diet' containing high amounts of lycopene failed to change chronic disease risk profiles of 255 UK-based participants [42]. Some investigations have suggested harm [43]. Therefore, trans-lycopene may be a surrogate marker of other 'healthy' behaviours and possibly of a 'healthy diet' profile. It is unclear which measured or unmeasured correlate of trans-lycopene levels may be responsible for the association with mortality risk.
Further, what exactly constitutes a 'healthy diet' is currently very difficult to define, in contrast to earlier claims [44]. We have previously documented a large correlation matrix of environmental factors [10], and further studies should investigate how nutritional and other environmental and behavioural factors relate to one another [8] to potentially trace sources of bias and harness confounding. We found that urinary and serum cadmium were also associated with all-cause mortality. Tellez-Plaza et al. have reported similar results in these participants for blood and urinary cadmium on both all-cause and cardiovascular-related mortality, while adjusting for many other cardiovascular-related risk factors including smoking, cholesterol, blood pressure and medication use [45]. Further studies will need to evaluate the relationship of behaviours that lead to cadmium exposure and all-cause or cause-specific mortality. For example, serum cadmium is postulated to indicate current exposure whereas urinary cadmium may reflect total body burden of cadmium, but urinary cadmium is reflective of serum levels [46]. Serum cadmium levels increase as humans age, and sources include ambient air pollution (through fossil fuel combustion), diet and smoking [46]. Cadmium levels were significantly correlated with smoking and age; however, the association of death risk with serum levels of cadmium was significant in the multivariable models even after smoking had been accounted for. Our analysis on all-cause mortality has several limitations. First, to consider multiple factors in systematic and standard fashion, we had to make assumptions about what covariates to adjust for in our initial scan and replication procedure. Investigators may consider a different set of adjusting covariates specific for each factor; however, it is unclear how to attain a 'standard' set of covariates.
We focused on a set of demographic factors (age, sex and race/ethnicity) that are impossible to modify, and on a set of socio-economic factors (income, education, occupation) that are very difficult for individuals to modify (although they are amenable to social and other multi-level interventions). Second, not all participants in NHANES have available measurements on the entire set of all factors assayed; thus, it is not possible to subject the scan to the same number of participants for each environmental and behavioural factor considered. This type of non-random missing information may lead to biased findings, as the subsamples may not be representative of the larger sample. Although we did not detect any differences in population demographics of sub-samples, we acknowledge that missing data could have led to loss of power. Third, along these lines, our power calculations based on the available data with non-missing information suggest that we had power to detect modest relative risks, but many small effects could easily have been missed. The need to understand small effects requires a recalibration of our thinking about risk-factor epidemiology, with emphasis on very large studies and careful replication. For small effects, differentiating noise from genuine signals is difficult. Fourth, residual confounding is always possible in any observational associations, even those that are seemingly consistent and validated in different datasets. Fifth, our data had a relatively short follow-up in both the training (median 6 years) and testing (median 3 years) surveys and lacked repeated measurements of factors to assess the longer-term risk of these factors on mortality through time. We emphasize that in an investigation of non-institutionalized people in the general population, many environmental and behavioural factors will require longer exposure and follow-up times to detect associations with mortality.
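The power limitation noted above can be made tangible with a rough calculation. The sketch below uses a Schoenfeld-type approximation for the number of deaths needed to detect a given HR per 1-SD change in a standardized covariate, under the assumptions of no correlation with other covariates and no survey-design effect. It is an illustration only, not the authors' power calculation; the function name is hypothetical, and the significance level 0.0003 is taken from the FDR-equivalent P-value threshold quoted for Figure 2.

```python
import math
from statistics import NormalDist

def events_needed(hr_per_sd, alpha, power):
    """Approximate deaths required to detect hr_per_sd in a Cox model.

    Schoenfeld-type approximation for a covariate with unit variance:
        d = (z_{1-alpha/2} + z_{power})^2 / (ln HR)^2
    Ignores correlation with adjustment covariates and survey weighting.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) ** 2 / math.log(hr_per_sd) ** 2

alpha = 0.0003  # P-value threshold corresponding to FDR = 5% in the scan
power = 0.80

for hr in (1.5, 1.2, 1.1):
    print(hr, math.ceil(events_needed(hr, alpha, power)))
```

Under these simplifying assumptions, an HR of 1.2 per SD needs on the order of several hundred deaths, while an HR of 1.1 needs a few thousand, which exceeds the 733 deaths available in the combined surveys. This is consistent with the authors' point that small effects could easily have been missed.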
Whereas factors found through this study have strong evidence for association with all-cause mortality, we cannot rule out important factors not identified by these methods or these data. Further, the deceased participants considered here are older individuals (68 and 73 years mean age of deceased participants in the 1999–2002 and 2003–04 surveys, respectively), many of whose cause of death included chronic cardiovascular-related disease, such as heart disease. Relatedly, many environmental factors, such as infectious agents, are only applicable to a small subset of the population, have lower prevalence and/or play a role in cause-specific mortality such as cancer. Therefore, a systematic scan of factors in a general population will be underpowered to detect these putative associations in the context of all-cause mortality. Specifically, the top causes of death in the general population included cardiovascular-related diseases (e.g. ischaemic heart disease, stroke and myocardial infarction) and the findings may only be pertinent to aetiologies of these diseases. On the other hand, factors with larger effect sizes but higher FDR (outliers on the volcano plot) can be noted for further investigation. For example, outliers in this study included hepatitis B and C, factors that may cause liver cancer [47]. We emphasize that factors that are not top findings in such a scan may still play a large role in mortality risk, albeit in smaller sub-populations. Sixth, we cannot claim that our systematic scan of environmental and behavioural factors in NHANES covers the entire space of the 'exposome' [48]. The CDC and NCHS have selected an array of behavioural and environmental factors based on their prevalence, measurement feasibility and hypothesized influence on population health.
Furthermore, unlike static genetic factors, there is heterogeneity in exposure or self-reported factor ascertainment, and exposures/behavioural factors will follow unique temporal patterns throughout an individual's lifetime [49]. For example, factors such as pollutants (e.g. polychlorinated biphenyls) are lipophilic and persistent in fatty tissue and accrue in tissue over time [10]. Other factors, such as bisphenol A, are metabolized rapidly and are short-lived, so their measurement assumes that individuals are continuously exposed to the factor (e.g. ref. [50]). Further, the relationships between the biomarkers and actual exposure are also difficult to surmise due to issues of sample timing and differential elimination. Self-reported dietary factors derived from a single point in time can be error-prone [51,52] and there are documented examples of lack of concordance with objective indicators of intake [53,54]. As a result of lack of comprehensive measures and heterogeneity, our systematic scan will have missed other candidate factors putatively associated with mortality risk.

Acknowledging these caveats, we have shown a generalized and systematic approach to identify strong and validated correlates of all-cause mortality and prioritize hypotheses regarding the association between environmental and behavioural factors and mortality. Instead of focusing on a few putative risk factors at a time, our approach gives a wider perspective about the strength of the evidence (or lack thereof) and the impact of a wide array of putative risk factors that may be possible to modify.

Supplementary Data

Supplementary data are available at IJE online.

Funding

This work was supported by the National Heart, Lung, and Blood Institute [T32 HL007034] to C.J.P., and the National Institute of Diabetes and Digestive Diseases [K24 DK085446] to G.M.C. and [K23 DK089086] to J.T.L.

Conflict of interest: None declared.
KEY MESSAGES

Identification of environmental and behavioural factors associated with mortality is critical for public health and preventive care. However, there are few investigations that systematically search for associations between environmental and behavioural factors and all-cause mortality.

Here, we systematically associate 249 environmental and behavioural factors, such as urinary or serum markers of environmental exposure and self-reported nutrients, with time-to-death, and were able to tentatively validate five factors robustly associated with mortality.

Instead of focusing on a few risk factors at a time, our approach gives a wider perspective about the strength of the evidence (or lack thereof) and the impact of a wide array of risk factors that may be possible to modify.

References

1. McGinnis JM, Foege WH. Actual causes of death in the United States. JAMA 1993;270:2207–12.
2. Mokdad AH, Marks JS, Stroup DF, Gerberding JL. Actual causes of death in the United States, 2000. JAMA 2004;291:1238–45.
3. Danaei G, Ding EL, Mozaffarian D et al. The preventable causes of death in the United States: comparative risk assessment of dietary, lifestyle, and metabolic risk factors. PLoS Med 2009;6:e1000058.
4. Wild CP. The exposome: from concept to utility. Int J Epidemiol 2012;41:24–32.
5. Schwartz D, Collins F. Medicine. Environmental biology and human disease. Science 2007;316:695–96.
6. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology 2008;19:640–48.
7. Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Med 2005;2:e124.
8. Ioannidis J, Loy EY, Poulton R, Chia KS. Researching Genetic Versus Nongenetic Determinants of Disease: A Comparison and Proposed Unification. Sci Transl Med 2009;1:8.
9. Patel CJ, Bhattacharya J, Butte AJ. An Environment-Wide Association Study (EWAS) on type 2 diabetes mellitus. PLoS One 2010;5:e10746.
10. Patel CJ, Cullen MR, Ioannidis JP, Butte AJ. Systematic evaluation of environmental factors: persistent pollutants and nutrients correlated with serum lipid levels. Int J Epidemiol 2012;41:828–43.
11. Tzoulaki I, Patel CJ, Okamura T et al. A nutrient-wide association study on blood pressure. Circulation 2012;126:2456–64.
12. Fillenbaum GG, Burchett BM, Blazer DG. Identifying a national death index match. Am J Epidemiol 2009;170:515–18.
13. Adler NE, Rehkopf DH. U.S. disparities in health: descriptions, causes, and mechanisms. Annu Rev Public Health 2008;29:235–52.
14. Rehkopf DH, Berkman LF, Coull B, Krieger N. The non-linear risk of mortality by income level in a healthy population: US National Health and Nutrition Examination Survey mortality follow-up cohort, 1988-2001. BMC Public Health 2008;8:383.
15. Ainsworth BE, Haskell WL, Whitt MC et al. Compendium of physical activities: an update of activity codes and MET intensities. Med Sci Sports Exerc 2000;32(Suppl 9):S498–504.
16. US Department of Health and Human Services. 2008 Physical Activity Guidelines for Americans. Available from: http://www.health.gov/paguidelines/pdf/paguide.pdf (15 January 2013, date last accessed).
17. Blanton CA, Moshfegh AJ, Baer DJ, Kretsch MJ. The USDA Automated Multiple-Pass Method accurately estimates group total energy and nutrient intake. J Nutr 2006;136:2594–99.
18. U.S. Department of Agriculture, Agricultural Research Service, Beltsville Human Nutrition Research Center et al. What We Eat in America, NHANES 2003-2004. Beltsville, MD: Beltsville Human Nutrition Research Center; Available from: ftp://ftp.cdc.gov/pub/Health_Statistics/nchs/nhanes/2003-2004/DR1TOT_C.XPT (15 January 2013, date last accessed).
19. U.S. Department of Agriculture, Agricultural Research Service, Beltsville Human Nutrition Research Center et al. What We Eat in America, NHANES 2001-2002. Beltsville, MD: Beltsville Human Nutrition Research Center; Available from: ftp://ftp.cdc.gov/pub/Health_Statistics/nchs/nhanes/2001-2002/DRXTOT_B.XPT (15 January 2013, date last accessed).
20. U.S. Department of Agriculture, Agricultural Research Service, Beltsville Human Nutrition Research Center et al. What We Eat in America, NHANES 1999-2000. Beltsville, MD: Beltsville Human Nutrition Research Center; Available from: ftp://ftp.cdc.gov/pub/Health_Statistics/nchs/nhanes/1999-2000/DRXTOT.XPT (15 January 2013, date last accessed).
21. Flegal KM, Graubard BI, Williamson DF, Gail MH. Cause-specific excess deaths associated with underweight, overweight, and obesity. JAMA 2007;298:2028–37.
22. Therneau T. A Package for Survival Analysis in S. R package version 2.36-14. 2012.
23. Lumley T. Survey: Analysis of Complex Survey Samples. R package version 3.14; 2009.
24. StataCorp. Stata Statistical Software: Release 10. 10th edn. College Station, TX: StataCorp LP, 2007.
25. Patel CJ, Chen R, Kodama K, Ioannidis JP, Butte AJ. Systematic identification of interaction effects between genome- and environment-wide associations in type 2 diabetes mellitus. Hum Genet 2013;132:495–509.
26. Efron B. Large-Scale Inference. Cambridge, UK: Cambridge University Press, 2010.
27. Gordon A. Classification. 2nd edn. Boca Raton, FL: Chapman and Hall/CRC, 1999.
28. Qiu W, Chavarro J, Lazarus R, Rosner B, Ma J. powerSurvEpi: Power and sample size calculation for survival analysis of epidemiological studies; R package version 0.0.6; 2012.
29. Hsieh FY, Lavori PW. Sample-size calculations for the Cox proportional hazards regression model with nonbinary covariates. Control Clin Trials 2000;21:552–60.
30. Schoenfeld DA. Sample-size formula for the proportional-hazards regression model. Biometrics 1983;39:499–503.
31. Davison A, Hinkley D. Bootstrap Methods and Their Application. Cambridge, UK: Cambridge University Press, 1997.
32. Gellert C, Schottker B, Brenner H. Smoking and all-cause mortality in older people: systematic review and meta-analysis. Arch Intern Med 2012;172:837–44.
33. Samitz G, Egger M, Zwahlen M. Domains of physical activity and all-cause mortality: systematic review and dose-response meta-analysis of cohort studies. Int J Epidemiol 2011;40:1382–400.
34. Moore SC, Patel AV, Matthews CE et al. Leisure time physical activity of moderate to vigorous intensity and mortality: a large pooled cohort analysis. PLoS Med 2012;9:e1001335.
35. Jha P, Ramasundarahettige C, Landsman V et al. 21st-century hazards of smoking and benefits of cessation in the United States. N Engl J Med 2013;368:341–50.
36. Pirie K, Peto R, Reeves GK, Green J, Beral V. The 21st century hazards of smoking and benefits of stopping: a prospective study of one million women in the UK. Lancet 2013;381:133–41.
37. Loef M, Walach H. The combined effects of healthy lifestyle behaviours on all cause mortality: a systematic review and meta-analysis. Prev Med 2012;5:163–70.
38. Nicklett EJ, Semba RD, Xue QL et al. Fruit and vegetable intake, physical activity, and mortality in older community-dwelling women. J Am Geriatr Soc 2012;60:862–68.
39. Lauretani F, Semba RD, Dayhoff-Brannigan M et al. Low total plasma carotenoids are independent predictors of mortality among older persons: the InCHIANTI study. Eur J Nutr 2008;47:335–40.
40. Shardell MD, Alley DE, Hicks GE et al. Low-serum carotenoid concentrations and carotenoid interactions predict mortality in US adults: the Third National Health and Nutrition Examination Survey. Nutr Res 2011;31:178–89.
41. MRC/BHF Heart Protection Study of antioxidant vitamin supplementation in 20,536 high-risk individuals: a randomised placebo-controlled trial. Lancet 2002;360:23–33.
42. Thies F, Masson LF, Rudd A et al. Effect of a tomato-rich diet on markers of cardiovascular disease risk in moderately overweight, disease-free, middle-aged adults: a randomized controlled trial. Am J Clin Nutr 2012;95:1013–22.
43. Bjelakovic G, Nikolova D, Gluud LL, Simonetti RG, Gluud C. Antioxidant supplements for prevention of mortality in healthy participants and patients with various diseases. Cochrane Database Syst Rev 2012;3:CD007176.
44. Hu FB, Willett WC. Optimal Diets for Prevention of Coronary Heart Disease. JAMA 2002;288:2569–78.
45. Tellez-Plaza M, Navas-Acien A, Menke A, Crainiceanu CM, Pastor-Barriuso R, Guallar E. Cadmium exposure and all-cause and cardiovascular mortality in the U.S. general population. Environ Health Perspect 2012;120:1017–22.
46. Centers for Disease Control and Prevention, Agency for Toxic Substances and Disease Registry. ATSDR Toxicological Profile: Cadmium. http://www.atsdr.cdc.gov/toxprofiles/tp.asp?id=48&tid=15 (15 January 2013, date last accessed).
47. Altekruse SF, McGlynn KA, Reichman ME. Hepatocellular carcinoma incidence, mortality, and survival trends in the United States from 1975 to 2005. J Clin Oncol 2009;27:1485–91.
48. Rappaport SM, Smith MT. Environment and Disease Risks. Science 2010;330:460–61.
49. Athersuch TJ. The role of metabolomics in characterizing the human exposome. Bioanalysis 2012;4:2207–12.
50. Calafat AM, Ye X, Wong LY, Reidy JA, Needham LL. Exposure of the U.S. population to bisphenol A and 4-tertiary-octylphenol: 2003-2004. Environ Health Perspect 2008;116:39–44.
51. Briefel RR, Flegal KM, Winn DM, Loria CM, Johnson CL, Sempos CT. Assessing the nation's diet: limitations of the food frequency questionnaire. J Am Diet Assoc 1992;92:959–62.
52. Byers T. Food frequency dietary assessment: how bad is good enough? Am J Epidemiol 2001;154:1087–88.
53. Brown D. Do food frequency questionnaires have too many limitations? J Am Diet Assoc 2006;106:1541–42.
54. Schatzkin A, Kipnis V, Carroll RJ et al. A comparison of a food frequency questionnaire with a 24-hour recall for use in an epidemiological cohort study: results from the biomarker-based Observing Protein and Energy Nutrition (OPEN) study. Int J Epidemiol 2003;32:1054–62.