Supplemental Material

Sperrin et al. Slowing down of adult body mass index trend increases in England: a latent class analysis of cross-sectional surveys (1992-2010)

Figure S1: Flow diagram of the steps taken to create the HSE obesity dataset (1991-2009)

1. Download Health Survey for England datasets: 1991/2, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
2. Drop population subgroups: drop children (ages 0-19) and those aged 75 and older
3. Drop boost samples: ethnic boost (1999 and 2004); older people (2000 and 2005)
4. Extract core variables of interest: age, sex, weight (kg), height (cm), BMI, waist and hip measure, waist/hip ratio, variables indicating that there is a valid measure of height/weight/BMI/hip/waist, smoking status, smoking frequency, tenure, limiting long term illness, equivalised household income, ethnicity, social class, alcohol consumption, cotinine (from blood sample), sampling weights, person identifier, qualifications, systolic and diastolic blood pressure
5. Recode and derive new variables: recode variables to ensure consistency over time and to meet aims of analysis
6. Append datasets: append all the datasets for each year of HSE to create a single dataset
7. Drop redundant variables: final data set includes the following variables:

Table S1: Variables in Health Survey for England dataset (1991/2-2010)
* For information on the variables used within the syntax, see the HSE documentation available at http://www.esds.ac.uk/findingData/hseTitles.asp

Variable name: Pserial
Variable categories: n/a
Description: Unique identifier of survey respondent within each year
Years available: All years (1991/2-2010)
Syntax (derived variables)*: 1991/2 and 1993:
    rename hserno pserial

Variable name: Year
Variable categories: 1992, 1993, ..., 2010
Description: Health Survey for England year
Years available: All years (1991/2-2010)
Syntax (derived variables)*: n/a

Variable name: Age
Variable categories: Continuous
Description: Age
Years available: All years (1991/2-2010)
Syntax (derived variables)*: n/a

Variable name: Agegp
Variable categories: 16-34; 35-54; 55-74
Description: Grouped age variable
Years available: All years (1991/2-2010)
Syntax (derived variables)*:
    gen agegp=.
    replace agegp=1 if age>=16 & age <=34
    replace agegp=2 if age>=35 & age <=54
    replace agegp=3 if age>=55
    label define agegp 1 "16-34" 2 "35-54" 3 "55-74"
    label values agegp agegp

Variable name: Sex
Variable categories: Male; Female
Description: Sex
Years available: All years (1991/2-2010)
Syntax (derived variables)*: n/a

Variable name: Ht_valid
Variable categories: Continuous (cm)
Description: All valid heights (cm)
Years available: All years (1991/2-2010)
Syntax (derived variables)*:
    gen htvalid=htval if year>=1997
    replace htvalid=height if htok==1 & year<=1996
    codebook htvalid
    mvdecode htvalid, mv (-8=.\-1=.)

Variable name: BMI_valid
Variable categories: Continuous
Description: All valid BMIs
Years available: All years (1991/2-2010)
Syntax (derived variables)*:
    gen bmivalid=bmival if year>=1997
    replace bmivalid=bmi if bmiok==1 & year<=1996
    codebook bmivalid
    mvdecode bmivalid, mv (-1=.)
    codebook bmivalid

Variable name: smoke
Variable categories: Current; Ex-regular; Never-regular
Description: Smoking status
Years available: All years (1991/2-2010)
Syntax (derived variables)*:
    gen smokeVH=newsmok if year==1994
    recode smokeVH 7/11=1 2/6=2 1=3 *=.
    gen smoke=cigsta3 if year>=1999
    replace smoke=cigsmk2 if year==1996 | year==1995 | year==1993 | year==1992
    gen smokeAM=cigst1 if year==1997|year==1998
    recode smokeAM 4=1 2/3=2 1=3
    replace smoke=smokeAM if year==1997|year==1998
    replace smoke=smokeVH if year==1994
    mvdecode smoke, mv (-9=.\-8=.\-7=.\-6=.\-2=.\-1=.)
    label define smoke 1 "current" 2 "ex-regular" 3 "never regular"
    label values smoke smoke

Variable name: smkquant
Variable categories: Light <10; Moderate 10-19; Heavy 20+
Description: Number of cigarettes smoked per day
Years available: All years (1991/2-2010)
Syntax (derived variables)*:
    *recoding into temp variables to match value labels across years
    gen smkqt96=newsmok2 if year==1996
    recode smkqt96 4=1 5=2 6=3 *=.
    gen smkqt9495=newsmok if year==1994|year==1995
    recode smkqt9495 8=1 9=2 10=3 *=.
    gen smkqt9293=cigsmk1 if year==1992 | year==1993
    recode smkqt9293 1=30 2=20 3=10 *=.
    recode smkqt9293 30=3 20=2 10=1 *=.
    gen smkqt9709=cigst2 if year>=1997
    recode smkqt9709 1=1 2=2 3=3 *=.
    *generating a consistent smoking quantity variable
    gen smkquant=smkqt9709 if year>=1997
    replace smkquant=smkqt96 if year==1996
    replace smkquant=smkqt9495 if year==1994 | year==1995
    replace smkquant=smkqt9293 if year==1992 | year==1993
    mvdecode smkquant, mv (-9=.\-8=.\-1=.)
    ta smkquant
    label define smkquant 1 "light <10" 2 "moderate 10-19" 3 "heavy 20+"
    label values smkquant smkquant

Variable name: Topqual2
Variable categories: NVQ4/5 or degree; Higher education below degree; NVQ3/GCE A-level; NVQ2/GCE O-level; NVQ1/CSE other grade; No qualification; Full time student
Description: Highest qualification
Years available: All years except 1995 and 1996
Syntax (derived variables)*:
    mvdecode topqual2, mv (-1=.\-9=.\-8=.\-7=.)

Variable name: sclass
Variable categories: i - professional; ii - managerial/technical; iiin - skilled non-manual; iiim - skilled manual; iv - semi-skilled manual; v - unskilled manual
Description: Social class (occupational)
Years available: All years from 1996 onwards
Syntax (derived variables)*:
    mvdecode sclass, mv (-1=.)
    **recoding armed forces and 'not fully described' as missing (not clear where to put armed forces and only a handful of cases)
    mvdecode sclass, mv (7=.\8=.)

Variable name: Eqv5
Variable categories: <=£10,665.74; £10,665.74 to £16,900.00; £16,900.00 to £26,787.88; £26,787.88 to £41,864.41; >£41,864.41
Description: Quintiles based on distribution of equivalised household income
Years available: All years from 1997 onwards excluding 1999 and 2004
Syntax (derived variables)*:
    mvdecode eqv5, mv (-1=.\-90=.)

Variable name: Ethnic
Variable categories: White; Mixed; Asian or Asian British; Black or Black British; Other
Description: Ethnic group
Years available: All years from 1999 onwards
Syntax (derived variables)*:
    gen ethcindVH=ethcind
    recode ethcindVH 3=44 4=33
    recode ethcindVH 44=4 33=3
    label define ethcindVH 1 "white" 2 "mixed ethnic group" 4 "black or Black british" 3 "asian or asian british" 5 "any other group"
    label values ethcindVH ethcindVH
    mvdecode ethcindVH, mv (-9=.\-8=.)
    gen ethniciVH=ethnici
    recode ethniciVH 3=44 4=44 5=33 6=33
    recode ethniciVH 44=4 33=3 7=5
    label define ethniciVH 1 "white" 2 "mixed ethnic group" 4 "black or Black british" 3 "asian or asian british" 5 "any other group"
    label values ethniciVH ethniciVH
    mvdecode ethniciVH, mv (-9=.\-8=.)
    gen originVH=origin
    recode originVH 1/3=1 4/7=2 8/11=3 12/14=4 15/16=5
    label define originVH 1 "white" 2 "mixed ethnic group" 4 "black or Black british" 3 "asian or asian british" 5 "any other group"
    label values originVH originVH
    mvdecode originVH, mv (-9=.\-8=.)
    *****
    gen ethnicgp=ethniciVH if year>=1999 & year<=2003
    replace ethnicgp=ethcindVH if year==2004
    replace ethnicgp=ethinda if year>=2005 & year<=2007
    replace ethnicgp=originVH if year>=2008 & year<=2009
    label define ethnicgp 1 "white" 2 "mixed ethnic group" 4 "black or Black british" 3 "asian or asian british" 5 "any other group"
    label values ethnicgp ethnicgp
    mvdecode ethnicgp, mv (-9=.\-8=.)

Latent Class Analysis or mixture modelling

Latent class analysis, or mixture modelling, is a tool to model population heterogeneity, as well as a form of semi-parametric modelling, for some measured variable(s) (McLachlan & Peel, 2000). For the population heterogeneity interpretation, we assume that the population we are examining consists of $k$ sub-populations. Within each sub-population $j$ ($j = 1, \dots, k$) the measured variable(s) are assumed to be independent and identically distributed with some distribution (probability density function) $f_j$. Let $\pi_j$ denote the proportion of individuals belonging to sub-population $j$, so that $\sum_{j=1}^{k} \pi_j = 1$. Then we can write the overall mixture density as

$$ f(y) = \sum_{j=1}^{k} \pi_j f_j(y). $$

In the context of this paper, we take the (univariate) response to be BMI, in the data obtained from HSE (Health Survey for England). Figure S2 shows empirical density plots of the distributions of BMI, separated by gender, for three years: 1993, 2001 and 2008. The densities are right skewed; this is particularly the case for women.
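As a minimal sketch of the mixture density defined above (not the analysis code used for the paper), the Python below evaluates a two-component normal mixture and its closed-form mean, variance and skewness. The proportions, means and standard deviations are illustrative assumptions rather than estimates from the HSE data, and the function names are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and SD."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, proportions, means, sds):
    """Mixture density: sum over components of proportion times density."""
    return sum(p * normal_pdf(y, m, s)
               for p, m, s in zip(proportions, means, sds))


def mixture_moments(proportions, means, sds):
    """Closed-form mean, variance and skewness of a normal mixture."""
    mean = sum(p * m for p, m in zip(proportions, means))
    var = sum(p * (s ** 2 + (m - mean) ** 2)
              for p, m, s in zip(proportions, means, sds))
    # Third central moment: each normal component contributes d^3 + 3*d*s^2,
    # where d is the offset of the component mean from the overall mean.
    third = sum(p * ((m - mean) ** 3 + 3.0 * (m - mean) * s ** 2)
                for p, m, s in zip(proportions, means, sds))
    return mean, var, third / var ** 1.5


# Assumed, illustrative parameters (not taken from the HSE data):
# a larger component with a lower mean and a smaller one with a higher mean.
props, means, sds = (0.7, 0.3), (24.0, 29.0), (3.0, 5.0)
mean, var, skew = mixture_moments(props, means, sds)
print(f"mean={mean:.2f} variance={var:.2f} skewness={skew:.2f}")
```

With these assumed values the smaller component has the larger mean, so the skewness is positive, which is the qualitative pattern described for Figure S2.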
We hypothesize that these skewed densities can be well approximated by a mixture of at least two normal densities, with one larger component capturing the main body of the density, and a second smaller component, with a larger mean, accounting for the right skew.

To illustrate the idea, we focus on the BMI distribution for women in 2001. Figure S3 compares a one-component normal fit with a two-component normal mixture fit to these data. We see that the two-component version captures the skew, and appears to give a better representation of the data (we shall use more formal model fit statistics at the modelling stage). Clearly, one can argue that, in the sense of fitting a semi-parametric model that describes the data, a two-component normal model does a good job. A strength of the mixture modelling approach is that one can moreover (with some caution) interpret the mixture components in terms of their possible correspondence to subpopulations.

Figure S2: Density plots of BMI distribution in 1993, 2001 and 2008; men and women. The vertical red line corresponds to 25 kg/m².

Figure S3: BMI distribution in 2001. Dashed line: normal distribution fit; dotted line: 2-component normal distribution fit.

Mixture Regression

Mixture models, or latent class analysis, can be extended into a regression context (Grun & Leisch, 2008). In this paper, we use age, sex, and calendar year as explanatory variables for BMI. In particular, we are interested in how the distribution of BMI (conditional on age and sex) changes over calendar time. The simplest example of mixture models used in the regression context is where they describe different relationships between a single predictor and a response, implicitly assuming that the relationship between the two variables is moderated by a third, latent, discrete variable. We propose modelling BMI in this fashion.
In the main model, we model the BMI variable as a 2-component mixture model, adjusting for the variables year, age and sex (all taken as factors rather than continuous variables). We tentatively label the two components as an ‘average’ or normal group, making up the bulk of the distribution, and a ‘susceptible’ group, accounting for the right skew present in the distribution. Importantly, it is only the mean that is being modelled as dependent on the explanatory variables, so the variances and the proportions are invariant with respect to the explanatory variables. To mitigate potential limitations of this, we consider sub-models for different ages and genders, with proportions and variances then allowed to vary between the sub-models.

Formally, we are considering models of the form

$$ f(y_i \mid x_i) = \pi_1 \, \phi(y_i; \mu_{1i}, \sigma_1^2) + \pi_2 \, \phi(y_i; \mu_{2i}, \sigma_2^2), $$

where $\phi(\cdot; \mu, \sigma^2)$ denotes the normal density with mean $\mu$ and variance $\sigma^2$, $\pi_1 + \pi_2 = 1$, $y_i$ represents the BMI of the $i$th subject and $x_i$ represents the explanatory variables of the $i$th subject: gender, year of measurement and age. The means of the distributions, $\mu_{1i}$ and $\mu_{2i}$, depend on the explanatory variables in a regression:

$$ \mu_{ji} = x_i^{\top} \beta_j, $$

where $\beta_j$ is a vector of coefficients to be estimated for $j = 1, 2$. Because all explanatory variables are viewed as factors, multiple coefficients are needed for each variable (e.g. a separate coefficient is needed for each year).

Testing mixture model fit

All 2-component models are tested against their 1-component analogues, and all sub-model combinations are tested against their parent models. AIC is used for model comparison. This is commonly recommended in mixture models, as opposed to other approaches such as the likelihood ratio test, which rely on various assumptions that are not satisfied by mixture models (McLachlan & Peel, 2000). Of particular interest is the behaviour of the component means, $\mu_{1i}$ and $\mu_{2i}$, and how they vary with age and calendar year.

Assignment of individuals to clusters

The parameters for the mixture model are estimated using a maximum likelihood approach (Dempster et al, 1977), using the flexmix package in R (Leisch, 2004).
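The cited Dempster et al (1977) paper introduces the EM algorithm, the usual route to maximum likelihood estimates for mixture models. The Python below is a minimal sketch of EM for a two-component normal mixture, under loudly simplifying assumptions: there are no covariates (each component has a single mean rather than a regression on year, age and sex), the data are simulated from assumed parameters rather than taken from HSE, and the function names are hypothetical. It is not the flexmix implementation.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and SD."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_normals(data, n_iter=200):
    """EM for a two-component univariate normal mixture (no covariates)."""
    n = len(data)
    ordered = sorted(data)
    # Crude starting values: quartiles as means, overall SD for both components.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    grand_mean = sum(data) / n
    sd0 = math.sqrt(sum((y - grand_mean) ** 2 for y in data) / n)
    sds = [sd0, sd0]
    props = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for y in data:
            w = [props[j] * normal_pdf(y, means[j], sds[j]) for j in range(2)]
            total = w[0] + w[1]
            resp.append((w[0] / total, w[1] / total))
        # M-step: weighted updates of proportions, means and SDs.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            props[j] = nj / n
            means[j] = sum(r[j] * y for r, y in zip(resp, data)) / nj
            var = sum(r[j] * (y - means[j]) ** 2
                      for r, y in zip(resp, data)) / nj
            sds[j] = math.sqrt(max(var, 1e-8))  # guard against collapse
    return props, means, sds


# Simulated data under assumed, illustrative parameters (not HSE values).
random.seed(1)
data = [random.gauss(22.0, 2.5) for _ in range(1050)]
data += [random.gauss(30.0, 4.0) for _ in range(450)]
props, means, sds = em_two_normals(data)
print(props, means, sds)
```

Extending this sketch to the regression setting would replace the weighted mean in the M-step with a weighted least-squares fit of BMI on the factor-coded covariates within each component.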
Note that individuals are not explicitly assigned to these clusters (hence this is soft clustering), but a posterior probability of an individual’s membership can then be calculated via

$$ P(\text{class } j \mid y_i, x_i) = \frac{\pi_j \, \phi(y_i; \mu_{ji}, \sigma_j^2)}{\sum_{l=1}^{2} \pi_l \, \phi(y_i; \mu_{li}, \sigma_l^2)}, $$

where $\pi_j$ is the proportion of component $j$ and $\phi(\cdot; \mu_{ji}, \sigma_j^2)$ is the normal density with component mean $\mu_{ji}$ for subject $i$ and variance $\sigma_j^2$. When a definite latent class membership is required (such as for sensitivity analysis), we assign each individual to the class for which their posterior probability of membership is largest.

Sensitivity Analysis

The parameters for the mixture model are estimated using a maximum likelihood approach.

Table S2: Exploring component models (k = 1, 2, 3, and 4)

Model          Class   Proportion   Intercept   AIC                   BIC
MEN
1 component    C1      1.000        23.62       416014.9 (df = 29)    416281.9 (df = 29)
2 component    C1      0.765        22.86       409530.6 (df = 59)    410073.9 (df = 59)
               C2      0.235        26.89
WOMEN
1 component    C1      1.000        23.46       518826.0 (df = 29)    519097.0 (df = 29)
2 component    C1      0.663        21.90       501773.0 (df = 59)    502324.0 (df = 59)
               C2      0.337        27.39
3 component    C1      0.220        28.99       499795.1 (df = 89)    500626.7 (df = 89)
               C2      0.373        23.04
               C3      0.407        21.35
4 component    C1      0.305        24.52       499288.3 (df = 119)   500400.3 (df = 119)
               C2      0.156        29.93
               C3      0.250        21.06
               C4      0.289        21.46

C: component or latent class. df: degrees of freedom. AIC: Akaike’s information criterion. BIC: Bayesian information criterion.

Figure S4: One to four component models

In men, models of 3 and 4 components were unstable (i.e. wide variation year on year) and not considered further. The cause of the instability is a combination of identifiability issues in the model fitting (common to mixture models), and the components fitting idiosyncrasies in the data rather than biologically plausible subgroups.
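The AIC comparison can be sketched directly from the values in Table S2. The Python below is only an illustration of the arithmetic: the dictionary is transcribed from the table, and the helper names are hypothetical.

```python
# AIC values transcribed from Table S2, keyed by number of components.
aic = {
    "men": {1: 416014.9, 2: 409530.6},
    "women": {1: 518826.0, 2: 501773.0, 3: 499795.1, 4: 499288.3},
}


def aic_improvements(aic_by_k):
    """AIC reduction of each model relative to the 1-component model."""
    base = aic_by_k[1]
    return {k: base - value for k, value in aic_by_k.items()}


def best_by_aic(aic_by_k):
    """Number of components with the lowest AIC."""
    return min(aic_by_k, key=aic_by_k.get)


for group, values in aic.items():
    print(group, best_by_aic(values), aic_improvements(values))
```

For women the AIC keeps decreasing as components are added, so a ranking by AIC alone is only one input alongside the stability and plausibility considerations described above.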
[Figure S4 (women): four panels (1 component, 2 component, 3 component, 4 component), each plotting the component mean BMI (kg/m²) against year (1990-2010) and against age (20-70). Proportions shown on the panels: 2 component, 66.3% and 33.7%; 3 component, 22.0%, 37.3% and 40.7%; 4 component, 15.6%, 30.5%, 28.9% and 25.0%.]

One to four component mixture models for BMI distributions per year and age in women. Only means for each component are shown. The percentages indicate the subpopulation proportion.

Potential Response Bias

Analyses were conducted on the 183,259 cases in the HSE dataset (1992-2010; ages 20-75 and with gender data) for which BMI measurements were available. There were a further 19,093 cases who had agreed to interview but for whom BMI measurements were unavailable, i.e. 89.6% of those interviewed had BMI measurements available. Among those who agreed to interview, we can consider whether those with and without BMI measurements differ. However, it is untestable from these data whether response bias exists between those who do and do not agree to interview for HSE altogether (approximately 70% agreed to interview). Henceforth we consider missingness as a proportion of those interviewed.

First, separating by gender, the missingness rate was 8.9% for males, and 11.7% for females. The extra missingness in females could be because weight measurements were not taken from pregnant women.
Missingness also changed slightly by survey year, with more missingness generally present in later survey years:

Year    % missing
1992    7.6
1993    10.7
1994    6.5
1995    8.9
1996    7.2
1997    6.5
1998    8.5
1999    9.8
2000    11.7
2001    10.7
2002    10.9
2003    10.0
2004    14.7
2005    15.1
2006    12.9
2007    12.2
2008    13.6
2009    13.3
2010    15.3

We also considered differences in age, social class, educational status, smoking status and income: here there were no discernible differences between missing and non-missing cases.

References for supplemental material

Allman-Farinelli MA, Chey T, Bauman AE, Gill T, James WP (2008) Age, period and birth cohort effects on prevalence of overweight and obesity in Australian adults from 1990 to 2000. Eur J Clin Nutr 62(7): 898-907

Carstensen B (2007) Age-period-cohort models for the Lexis diagram. Stat Med 26(15): 3018-45

Dempster A, Laird N, Rubin D (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society 39(B): 1-38

Grun B, Leisch F (2008) FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters. Journal of Statistical Software 28(4): 1-35. URL http://www.jstatsoft.org/v28/i04/

Holford TR (1983) The estimation of age, period and cohort effects for vital rates. Biometrics 39(2): 311-24

Holford TR (1991) Understanding the effects of age, period, and cohort on incidence and mortality rates. Annu Rev Public Health 12: 425-57

Howel D (2011) Trends in the prevalence of obesity and overweight in English adults by age and birth cohort, 1991-2006. Public Health Nutr 14(1): 27-33

Leisch F (2004) FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software 11(8). URL http://www.jstatsoft.org/v11/i08/

McLachlan GJ, Peel D (eds) (2000) Finite Mixture Models. New York: Wiley

Rutherford MJ, Lambert PC, Thompson JR (2011) Age-Period-Cohort Modelling. The Stata Journal 10(4)