Microbiome Data Integration and Biomarker Development

Alexander V. Alekseyenko
Biomedical Informatics Center
Departments of Public Health Sciences and Oral Health Sciences
Program for Human Microbiome Research
Medical University of South Carolina

Acknowledgments
• Vanderbilt: ZhengZheng Tang, Guanhua Chen
• UPenn: Hongzhe Li
• NYU: Jiyoung Ahn, Richard Hayes, Martin Blaser, Gregg Silverman
• UMN: Constantin Aliferis
• American Express: Alexander Statnikov
• Stanford: Susan Holmes
• MUSC: Betsy Hill, Beth Wolf, Kristin Wallace, Diane Kamen, Sarah Taylor, Keith Kirkwood, Galina Bogatkevitch, Lisa Steed, Michael Schmidt, Cassy Salgado, Wei Jiang, Charlie Strange, Bob Wilson, Jihad Obeid

High-dimensional host-microbiome characteristics
• Psycho-physiological assessment
• Tissue-specific gene expression
• Clinical and health record data
• Immune cell populations
• Epigenetic variation
• Somatic variants
• Genetic variants
• Nutrition
• Bacterial, viral, and fungal composition and abundance
• Metagenomic composition
• Phylogeographic data
• Phylogenetic data
• Taxonomic data
• Metaproteome
• Metabolome
• Glycome
HEALTHY?

Upstream microbiome research pipeline
[Figure labels: Informatics; Assay; samples S-1, S-2, …, S-16; OTU-1, OTU-2, OTU-3, OTU-4, OTU-5, …, OTU-10,000]

‘Simple’ strategy for integrating panomic data
HEALTHY?

Main challenge in microbiome analysis: curiosity.
HEALTHY?
Single-omics analytic tasks
• Supervised √
• Unsupervised √
• Feature selection √

           Df  SumsOfSqs  MeanSqs  F.Model       R2  Pr(>F)
F           2     3.1928   1.5964   5.9236  0.30497   0.001 ***
Residuals  27     7.2765   0.2695           0.69503
Total      29    10.4693                    1.00000

Microbiome analysis pipeline: Specimens → Sequence → Abundance Matrix → Distances → Statistics* → Paper
[Figure: abundance matrix for samples S-1, S-2, …, S-16 and OTU-1 through OTU-10,000, with illustrative numeric entries]
• Visualization (principal coordinates analysis)
• Statistical hypothesis testing (PERMANOVA)
*This step is not strictly necessary and can sometimes be omitted without change in results.

Causal learning is the optimal solution to the feature selection problem
[Figure: causal graph with nodes A, B, D, P1, P2, S, T, C1, C2]
• Optimal features are contained in the Markov blanket of the target variable.
• The parent-child set is usually as predictive and is easier to discover computationally.

Omnibus methods for assessing mediation relationships.

Distance-based analytics allow for inference of panomic relationships.

Discovering interacting subsystems
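
The distance-based steps of the analysis pipeline (abundance matrix, distances, principal coordinates analysis, PERMANOVA) can be sketched in Python as follows. This is a minimal illustration, not the implementation behind the slides: the Bray-Curtis dissimilarity is an assumed choice of distance (the slides do not name one), the function names bray_curtis, pcoa, pseudo_f, and permanova are hypothetical, and the data are simulated. The toy layout of 3 groups and 30 samples matches the degrees of freedom in the PERMANOVA table (2, 27, 29), but the simulated counts will not reproduce its numbers.

```python
import numpy as np

def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of X.
    Assumes every sample has at least one nonzero count."""
    X = np.asarray(X, dtype=float)
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return num / den

def pcoa(D, k=2):
    """Principal coordinates analysis (classical MDS) of a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gower-centered matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]       # keep the k largest
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0, None))

def pseudo_f(D, labels):
    """PERMANOVA pseudo-F statistic computed from squared distances."""
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    D2 = D ** 2
    ss_total = D2[np.triu_indices(n, 1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = D2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    a = len(groups)
    return (ss_between / (a - 1)) / (ss_within / (n - a))

def permanova(D, labels, permutations=999, seed=0):
    """Permutation p-value for the pseudo-F statistic."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    f_obs = pseudo_f(D, labels)
    hits = sum(pseudo_f(D, rng.permutation(labels)) >= f_obs
               for _ in range(permutations))
    return f_obs, (hits + 1) / (permutations + 1)

# Simulated toy data (an assumption): 3 groups x 10 samples, 50 OTUs,
# each group with its own mean abundance profile.
rng = np.random.default_rng(42)
group_means = rng.gamma(2.0, 10.0, size=(3, 50))
labels = np.repeat(["A", "B", "C"], 10)
counts = rng.poisson(group_means[np.repeat(np.arange(3), 10)])

D = bray_curtis(counts)               # Distances
coords = pcoa(D, k=2)                 # Visualization coordinates
f_stat, p_value = permanova(D, labels)  # Statistical hypothesis test
print(f"pseudo-F = {f_stat:.3f}, p = {p_value:.3f}")
```

The permutation p-value uses the (hits + 1) / (permutations + 1) convention, so with 999 permutations the smallest attainable value is 0.001, the value reported in the table above.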