MRC Career Development Award in Biostatistics
A Bayesian machine learning approach to latent variable modelling to accelerate endotype discovery
Dr Danielle Belgrave, PhD
Research Fellow in Biomedical Modelling
15th July 2016

Outline
• Introduction: What Is Endotype Discovery?
• Motivating Example: The Allergic March
• Probabilistic Programming in Infer.NET
• Modelling Strategies for Analysing Latent Trajectories of Comorbidities
• Concluding Remarks

Endotype Discovery: The Grand Challenge
To identify subgroups of complex disease risk or treatment outcome explained by a distinctive underlying mechanism ("endotypes")
The foundation of stratified medicine, which seeks better-targeted interventions
[Image: M.C. Escher, Order and Chaos, 1950]

Asthma Genetics: Low Yield
• Legacy of non-replicated genetic epidemiology, typical of most common chronic disorders
[Figure legend: linkage in 1 study only; linkage in >1 study]

Big Data: Feast or Fog?
[Figure: problem space, observation space and data space, with an example model y = b1x1 + b2x2 + b3x3 + c]
Health is measured with error and missingness: endophenotypes are resolved as if the researcher were looking at the problem through a doily and a prism.

STELAR e-Lab
[Figure: the STELAR e-Lab platform linking the cohorts (ALSPAC, MAAS, SEATON, Ashford, Isle of Wight) with STELAR researchers: ongoing data collection, data exports and requests for data; methods and results; comments and questions]

Motivating Example
Latent Trajectory Models to Understand Disease Comorbidities of Eczema, Asthma and Rhinitis

Received Wisdom: Allergic March
• Progression of allergy: Eczema → Asthma → Rhinitis (From: Spergel & Paller, 2003)
• Cross-sectional, population-level → used to explain mechanisms leading to allergy
• Eczema causally linked to asthma and rhinitis
• Prevention strategy: target children with eczema to reduce progression to asthma
From: World Allergy Organization, 2014

What Should General Practice Trainees Learn about Atopic Eczema? (1)
• Effective atopic eczema (AE) control not only improves quality of life but may also prevent the atopic march (1).
• "Atopic dermatitis … is often the first step in the atopic march leading to the development of asthma or allergic rhinitis" (2).
• "Early and effective treatment of atopic dermatitis could theoretically interrupt the atopic march and decrease the risk of asthma" (3).
1. Munidasa D et al. J Clin Med. 2015 Feb 12;4(2):360-8.
2. Spergel JM. Ann Allergy Asthma Immunol. 2010;105(2):99-106.
3. Gali et al. Allergy Asthma Proc. 2007;28:540-3.

Stopping the Atopic March
Millions spent, complete failure…
Warner JO. J Allergy Clin Immunol. 2001;108(6):929-37.
Aim: The goal of this study was to determine whether early intervention with pimecrolimus limits the atopic march in infants with AD.
Conclusion: This longitudinal observation of infants with AD provides evidence of the atopic march. Pimecrolimus was safe and effective in infants with mild to moderate AD.
Schneider L et al. Pediatr Dermatol. 2016 Jun 7. doi: 10.1111/pde.12867

Data Description
• STELAR e-Lab: population-based birth cohort studies
• Manchester Asthma and Allergy Study (MAAS): 1184 subjects recruited prenatally
• Avon Longitudinal Study of Parents and Children (ALSPAC): 8665 subjects recruited prenatally
• Question 1: Has your child had eczema in the past 12 months?
• Question 2: Has your child had wheeze in the past 12 months?
• Question 3: Has your child had rhinitis in the past 12 months?
• Assessed at ages 1, 3, 5, 8 and 11 years

A Tale of Two Cohorts: Cross-Sectional Patterns Show the Allergic March
Data from two population-based birth cohort studies: MAAS (1184 subjects recruited prenatally) and ALSPAC (8665 subjects recruited prenatally)
From: Belgrave et al. Developmental Profiles of Eczema, Wheeze, and Rhinitis: Two Population-Based Birth Cohort Studies. PLOS Medicine, 2014.

Does the Atopic March Exist at the Individual Level?
• Alternative hypothesis: the allergic march is an imposed paradigm that does not adequately describe the natural history of Eczema → Asthma → Rhinitis
• To capture disease heterogeneity and encapsulate different patterns of symptom progression in individual children in the first 11 years of life
• To use probabilistic machine learning techniques to model the development of Eczema → Asthma → Rhinitis
• To identify homogeneous groups of children with similar profiles of Eczema → Asthma → Rhinitis and understand the characteristics of each group
• To develop a Bayesian network of probabilistic dependencies in order to identify meaningful latent structure

Probabilistic Model to Capture Disease Profile Heterogeneity
• A Bayesian machine learning framework to identify distinct latent classes based on each child's changing disease profile of Eczema → Asthma → Rhinitis
• Assumption: each child belongs to one of N latent classes
• Children in the same class have similar Eczema → Asthma → Rhinitis transitions over time
• The number and size of the classes are not known a priori
• Each child's posterior probability of belonging to each latent class is calculated, and the child is assigned to the class with the greatest posterior probability
• Models are compared using the model evidence

Probabilistic Programming in Infer.NET
What Is Probabilistic Programming?
Pfeffer, Avi. "Practical probabilistic programming." International Conference on Inductive Logic Programming. Springer Berlin Heidelberg, 2010.

Infer.NET
• Compiles probabilistic programs into inference code using expectation propagation (EP), variational message passing (VMP) or Gibbs sampling
• Supports many (but not all) probabilistic program elements
• Extensible: a distribution channel for new machine learning research
• Consists of a chain of code transformations
[Figure: Infer.NET architecture]

Modelling Strategies for Analysing Latent Transition Profiles
Disambiguating Longitudinal Comorbidities of Eczema, Asthma and Rhinitis

Generalised Framework: Hidden Markov Model for Approximating Disease Transition States
• An HMM consists of 5 elements:
1. A finite set of M observation states: $Y_1, Y_2, \dots, Y_M$
2. A finite set of K latent states: $S_1, S_2, \dots, S_K$
3. Initial probabilities $P(S)$ for each state S, defining the probability of starting in state S
4. Transition probabilities between latent states, $P(S_t \mid S_{t-1})$
5. Emission probabilities of the observed states given the latent state at each time step, $P(Y_i \mid S_j)$
• HMM: a tool for representing probability distributions over sequences of observations
• Assumptions:
1. The observations are sampled at discrete, equally spaced time intervals, so t can be an integer-valued time index
2. The observation at time t was generated by some process whose state $S_t$ is hidden from the observer
3. The state of this hidden process satisfies the Markov property: given the value of $S_{t-1}$, the current state $S_t$ is independent of all the states prior to t-1
4. The hidden state variable is discrete: $S_t$ can take one of K values
• Three fundamental problems for HMMs:
1. Evaluation: given the model parameters and observed data, calculate the likelihood of the data
2. Decoding: given the model parameters and observed data, estimate the optimal sequence of hidden states
3. Learning: given the observed data, estimate the model parameters
• Joint distribution of a sequence of states and observations:
$$P(S_{1:T}, Y_{1:T}) = P(S_1)\,P(Y_1 \mid S_1)\prod_{t=2}^{T} P(S_t \mid S_{t-1})\,P(Y_t \mid S_t)$$

Generalised Framework: Hidden Markov Model for Approximating Transition States of the Allergic March
• Bayesian approach: learning starts with some a priori knowledge about the model structure and model parameters
• This knowledge is represented as a prior probability distribution over model structures and parameters
• The prior is updated using the data to obtain a posterior probability distribution over models and parameters
• Assume a single model structure M and estimate the parameters θ that maximise the likelihood $P(D \mid \theta, M)$ under that model
• Assume a symmetric Dirichlet prior, assigning equal prior probability to each latent class
• Symmetric Dirichlet prior $\mathrm{Dir}(1/m, \dots, 1/m)$ for the observation matrix
• Symmetric Dirichlet prior $\mathrm{Dir}(1/k, \dots, 1/k)$ for the latent state membership

Excerpt of the Infer.NET model definition (C#):

public ClusterSimpleChain(int numYears)
{
    …
    // One probability per latent class k, each drawn from its own Beta prior
    probState0 = Variable.Array<double>(k).Named("probState0");
    probState0Prior = Variable.Array<Beta>(k).Named("probState0Prior");
    probState0[k] = Variable<double>.Random(probState0Prior[k]);

    for (int y = 0; y < numYears; y++)
    {
        ...
#if clusterQ
        // Class-specific case: per-year probabilities indexed by class k and
        // latent state s, each drawn from its own Beta prior
        Q_T[y] = Variable.Array(Variable.Array<double>(s), k).Named("Q_T" + y);
        Q_F[y] = Variable.Array(Variable.Array<double>(s), k).Named("Q_F" + y);
        QTPriorArr[y] = Variable.Array(Variable.Array<Beta>(s), k).Named("QTPriorArr" + y);
        QFPriorArr[y] = Variable.Array(Variable.Array<Beta>(s), k).Named("QFPriorArr" + y);
        Q_T[y][k][s] = Variable<double>.Random(QTPriorArr[y][k][s]);
        Q_F[y][k][s] = Variable<double>.Random(QFPriorArr[y][k][s]);
#else
        // Shared case: per-year probabilities indexed by latent state s only
        Q_T[y] = Variable.Array<double>(s).Named("Q_T" + y);
        Q_F[y] = Variable.Array<double>(s).Named("Q_F" + y);
        QTPriorArr[y] = Variable.Array<Beta>(s).Named("QTPriorArr" + y);
        QFPriorArr[y] = Variable.Array<Beta>(s).Named("QFPriorArr" + y);
        Q_T[y][s] = Variable<double>.Random(QTPriorArr[y][s]);
        Q_F[y][s] = Variable<double>.Random(QFPriorArr[y][s]);
#endif
        …
    }

Hidden Markov Model 1: Eczema, Wheeze and Rhinitis Independent Disease Profiles
[Figure: children (n=9801); a separate latent class for each of eczema, wheeze and rhinitis, each with latent disease states at ages 1, 3, 5, 8 and 11 years (latent class disease profile)]

Hidden Markov Model 2: Eczema, Wheeze and Rhinitis Follow an "Allergic March" Profile
[Figure: children (n=9801); eczema, wheeze and rhinitis classes, each with latent disease states at ages 1, 3, 5, 8 and 11 years (latent class disease profile)]

Model 3: Individual-Level Longitudinal Latent Disease Profile
[Figure: children (n=9801); a single latent class (Class = 1, …, k) with a latent state at ages 1, 3, 5, 8 and 11 years, and eczema, wheeze and rhinitis observed at each of these ages (latent class disease profile)]

Myth-Busting by Learning from Data
MRC STELAR consortium working at scale across the MAAS and ALSPAC cohorts
From: Belgrave et al. Developmental Profiles of Eczema, Wheeze, and Rhinitis: Two Population-Based Birth Cohort Studies. PLOS Medicine, 2014.

Investigating Model Sensitivity to Priors
Log model evidence for different numbers of inferred latent classes under different priors. The number of prior pseudo-counts was varied from 1/k up to 2 to investigate whether the choice of prior influences the model evidence.

Table of Model Evidence (columns: number of inferred classes k; rows: prior pseudo-counts)
Pseudo-counts | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | k=9
1/k | -50177 | -49030 | -48297 | -47774 | -47367 | -47130 | -46989 | -47109*
2/k | -50200 | -49104 | -48310 | -47797 | -47357 | -47143 | -46994 | -47334*
1 | -49920 | -48448 | -47506 | -46930 | -46845 | -46658 | -46503 | -46424*
2 | -49920 | -48448 | -47506 | -46930 | -46845 | -46733 | -46596 | -46431*

Limitations of HMMs
• Exact inference in richer generalisations of the HMM is usually intractable
• Approximate inference algorithms, such as Markov chain Monte Carlo sampling and variational methods, can be used instead
• If the disease state of the system is made up of M variables, each with K outcomes, there are $K^M$ possible states of the system
• An HMM would require $K^M$ distinct states to model this system, with $K^{2M}$ parameters in the transition matrix
• Parameters need to be constrained to avoid overfitting
• Can be computationally expensive

Conclusions
1. Discoveries about asthma and allergic disease, not hypothesised a priori, have been made by experts explaining structure learned from data by algorithms tuned by those experts
2. A heuristic blend of biostatistics and machine learning reveals more than either approach alone
3. Data-complete epidemiology requires persistent integration of:
• Data
• Methods
• Expertise

Thank You
Professor Adnan Custovic
Professor Angela Simpson
Professor Iain Buchan
Professor Christopher Bishop
Professor John Henderson
Dr Raquel Granell
Dr John Guiver
Dr John Winn
Dr Phillip Couch
Mike Porter
The e-Lab Dev Team
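The evaluation and decoding problems listed for HMMs can be illustrated with the classical forward and Viterbi recursions. This is a minimal sketch in plain C#, not the Infer.NET model used in this work (which compiles to EP, VMP or Gibbs inference instead). It assumes a single binary symptom recorded at the five assessment ages, a hypothetical two-state HMM and made-up parameter values; `ForwardLogLikelihood`, `Normalise` and `Viterbi` are illustrative names introduced here. It requires .NET 6 or later (top-level statements with static local functions).

```csharp
using System;
using System.Linq;

// Hypothetical two-state HMM for one binary symptom (e.g. "wheeze in the past
// 12 months") recorded at ages 1, 3, 5, 8 and 11. All values are made up.
double[] initial = { 0.8, 0.2 };                         // P(S_1 = i)
double[,] transition = { { 0.9, 0.1 }, { 0.3, 0.7 } };   // P(S_t = j | S_{t-1} = i)
double[,] emission = { { 0.95, 0.05 }, { 0.2, 0.8 } };   // P(Y_t = y | S_t = i), y in {0, 1}
int[] observed = { 0, 1, 1, 1, 0 };                      // 1 = symptom reported

double logLikelihood = ForwardLogLikelihood(initial, transition, emission, observed);
int[] mapPath = Viterbi(initial, transition, emission, observed);
Console.WriteLine($"log P(Y) = {logLikelihood:F4}");
Console.WriteLine("Most probable state path: " + string.Join(" ", mapPath));

// Evaluation: forward algorithm, normalising alpha at every step to avoid underflow.
static double ForwardLogLikelihood(double[] pi, double[,] A, double[,] B, int[] y)
{
    int K = pi.Length;
    var alpha = new double[K];
    for (int i = 0; i < K; i++) alpha[i] = pi[i] * B[i, y[0]];
    double logLik = Normalise(alpha);                    // log P(y_1)
    for (int t = 1; t < y.Length; t++)
    {
        var next = new double[K];
        for (int j = 0; j < K; j++)
        {
            double sum = 0.0;
            for (int i = 0; i < K; i++) sum += alpha[i] * A[i, j];
            next[j] = sum * B[j, y[t]];
        }
        logLik += Normalise(next);                       // adds log P(y_t | y_1..y_{t-1})
        alpha = next;
    }
    return logLik;
}

// Divides v by its sum in place and returns the log of that sum.
static double Normalise(double[] v)
{
    double c = v.Sum();
    for (int i = 0; i < v.Length; i++) v[i] /= c;
    return Math.Log(c);
}

// Decoding: Viterbi algorithm in log space, with back-pointers to recover the path.
static int[] Viterbi(double[] pi, double[,] A, double[,] B, int[] y)
{
    int K = pi.Length, T = y.Length;
    var delta = new double[T, K];
    var back = new int[T, K];
    for (int i = 0; i < K; i++) delta[0, i] = Math.Log(pi[i]) + Math.Log(B[i, y[0]]);
    for (int t = 1; t < T; t++)
        for (int j = 0; j < K; j++)
        {
            double best = double.NegativeInfinity;
            for (int i = 0; i < K; i++)
            {
                double v = delta[t - 1, i] + Math.Log(A[i, j]);
                if (v > best) { best = v; back[t, j] = i; }
            }
            delta[t, j] = best + Math.Log(B[j, y[t]]);
        }
    var path = new int[T];
    for (int i = 1; i < K; i++)
        if (delta[T - 1, i] > delta[T - 1, path[T - 1]]) path[T - 1] = i;
    for (int t = T - 1; t > 0; t--) path[t - 1] = back[t, path[t]];
    return path;
}
```

Working in normalised or log space keeps both recursions numerically stable over long sequences. For five time points and two states, a brute-force sum over all 2^5 = 32 state paths gives the same likelihood, which is a convenient correctness check.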
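Assigning each child to the latent class with the greatest posterior probability amounts to Bayes' rule over classes: the posterior for class c is proportional to the class weight times the likelihood of the child's symptom sequence under that class. A minimal sketch, assuming hypothetical class weights and per-class log-likelihoods for one child (in practice these would come from the fitted model); `ClassPosterior` is an illustrative helper name, and the log-sum-exp normalisation is a standard numerical-stability choice rather than a detail taken from the original analysis.

```csharp
using System;
using System.Linq;

// Hypothetical inputs for one child: prior class weights P(class = c) and the
// log-likelihood of the child's symptom sequence under each class-specific model,
// log P(Y | class = c) (for example from a forward pass per class).
double[] classWeights = { 0.5, 0.3, 0.2 };
double[] classLogLik = { -6.2, -4.9, -5.5 };

double[] posterior = ClassPosterior(classWeights, classLogLik);
int assigned = Array.IndexOf(posterior, posterior.Max());   // class with the greatest posterior
Console.WriteLine("Posterior: " + string.Join(", ", posterior.Select(p => p.ToString("F3"))));
Console.WriteLine($"Assigned to class {assigned}");

// Bayes' rule in log space: P(c | Y) is proportional to P(c) P(Y | c),
// normalised with the log-sum-exp trick to avoid underflow.
static double[] ClassPosterior(double[] weights, double[] logLik)
{
    double[] logJoint = weights.Select((w, c) => Math.Log(w) + logLik[c]).ToArray();
    double maxLog = logJoint.Max();
    double logNorm = maxLog + Math.Log(logJoint.Sum(l => Math.Exp(l - maxLog)));
    return logJoint.Select(l => Math.Exp(l - logNorm)).ToArray();
}
```

With these made-up inputs the child is assigned to class 1 even though class 0 has the largest prior weight, because the child's sequence is considerably more likely under class 1.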
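The state-space growth described under the limitations of HMMs can be made concrete with a small worked count. Assuming, for illustration only, K = 2 outcomes (symptom absent or present) and M = 3 variables (eczema, wheeze and rhinitis) combined into a single HMM state:

$$K^M = 2^3 = 8 \ \text{joint states}, \qquad K^{2M} = \left(K^M\right)^2 = 2^6 = 64 \ \text{transition probabilities}$$

Each of the 8 rows of the transition matrix must sum to one, so 8 × 7 = 56 of these are free parameters; adding a fourth binary variable would already give 16 states and 256 transition entries.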