Slides - Medical University of South Carolina

Microbiome Data Integration and Biomarker Development
Alexander V. Alekseyenko
Biomedical Informatics Center
Departments of Public Health Sciences and Oral Health Sciences
Program for Human Microbiome Research
Medical University of South Carolina
Acknowledgments
• Vanderbilt: ZhengZheng Tang, Guanhua Chen
• UPenn: Hongzhe Li
• NYU: Jiyoung Ahn, Richard Hayes, Martin Blaser, Gregg Silverman
• UMN: Constantin Aliferis
• American Express: Alexander Statnikov
• Stanford: Susan Holmes
• MUSC: Betsy Hill, Beth Wolf, Kristin Wallace, Diane Kamen, Sarah Taylor, Keith Kirkwood, Galina Bogatkevitch, Lisa Steed, Michael Schmidt, Cassy Salgado, Wei Jiang, Charlie Strange, Bob Wilson, Jihad Obeid
High-dimensional host-microbiome characteristics
Psycho-physiological assessment
Tissue-specific gene expression
Clinical and health record data
Immune cell populations
Epigenetic variation
Somatic variants
Genetic variants
Nutrition
Bacterial, viral, and fungal composition and abundance
Metagenomic composition
Phylogeographic data
Phylogenetic data
Taxonomic data
Metaproteome
Metabolome
Glycome
HEALTHY?
Upstream microbiome research pipeline
[Figure: assay and informatics stages; samples S-1 … S-16; features OTU-1 … OTU-10,000]
‘Simple’ strategy for integrating panomic data
HEALTHY?
Main challenge in microbiome analysis: curiosity.
HEALTHY?
Single-omics analytic tasks
√ Supervised
√ Unsupervised
√ Feature selection
           Df  SumsOfSqs  MeanSqs  F.Model       R2  Pr(>F)
F           2     3.1928   1.5964   5.9236  0.30497   0.001 ***
Residuals  27     7.2765   0.2695            0.69503
Total      29    10.4693                     1.00000
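The Df, SumsOfSqs, MeanSqs, F.Model and R2 values on this slide are linked by simple arithmetic, and a short check makes that explicit. The sketch below only re-derives the derived columns from the degrees of freedom and sums of squares shown on the slide. The variable names are illustrative and not taken from the slides.

```python
# Degrees of freedom and sums of squares copied from the slide's table.
df_factor, df_resid = 2, 27
ss_factor, ss_resid, ss_total = 3.1928, 7.2765, 10.4693

# Mean squares are sums of squares divided by their degrees of freedom.
ms_factor = ss_factor / df_factor
ms_resid = ss_resid / df_resid

# F.Model is the ratio of the factor mean square to the residual mean square.
f_model = ms_factor / ms_resid

# R2 is the share of the total sum of squares attributed to each row.
r2_factor = ss_factor / ss_total
r2_resid = ss_resid / ss_total

print(f"MeanSqs: {ms_factor:.4f} (factor), {ms_resid:.4f} (residuals)")
print(f"F.Model: {f_model:.4f}")
print(f"R2: {r2_factor:.5f} (factor), {r2_resid:.5f} (residuals)")
```

The recomputed values agree with the slide to the printed precision, so the factor F accounts for roughly 30% of the total variation.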
Microbiome analysis pipeline:
Specimens → Sequence → Abundance Matrix → Distances → Statistics* → Paper
[Figure: abundance matrix of samples S-1 … S-16 by OTU-1 … OTU-10,000, followed by the 16 × 16 lower-triangular matrix of pairwise sample distances]
Visualization (Principal coordinates analysis)
Statistical hypothesis testing (PERMANOVA)
*This step is not strictly necessary and can sometimes be omitted without change in results.
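The Abundance Matrix, Distances and Statistics steps of the pipeline can be sketched in plain Python. This is a minimal sketch under stated assumptions: the slides do not name a distance measure, so Bray-Curtis dissimilarity is used here as one common choice. The toy abundance table, the group labels, and all function names are hypothetical. The pseudo-F statistic follows the usual PERMANOVA partition of squared distances into within-group and among-group parts, with a permutation p-value.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(abundance):
    """Symmetric matrix of pairwise sample dissimilarities."""
    n = len(abundance)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = bray_curtis(abundance[i], abundance[j])
    return d


def pseudo_f(d, groups):
    """Return (pseudo-F, R2) from a distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(d[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(d[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    f = (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))
    return f, ss_among / ss_total


def permanova(d, groups, n_perm=999, seed=0):
    """Permutation test: shuffle labels and compare pseudo-F values."""
    rng = random.Random(seed)
    f_obs, r2 = pseudo_f(d, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(d, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 6 samples (rows) by 4 OTUs (columns), two groups.
abundance = [
    [10, 0, 5, 1],
    [9, 1, 4, 2],
    [11, 0, 6, 1],
    [0, 10, 1, 5],
    [1, 9, 2, 4],
    [0, 11, 1, 6],
]
groups = ["A", "A", "A", "B", "B", "B"]

d = distance_matrix(abundance)
f_obs, r2, p_value = permanova(d, groups)
print(f"pseudo-F = {f_obs:.3f}, R2 = {r2:.3f}, p = {p_value:.3f}")
```

With only six samples the number of distinct labelings is small, so the p-value here cannot get very low. Real studies would use more samples and, in practice, an established implementation.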
Causal learning is the optimal solution to the feature selection problem
• Optimal features are contained in the Markov Blanket of the target variable.
• The parent-child set is usually as predictive and is easier to discover computationally.
[Figure: causal graph with target T and nodes A, B, D, P1, P2, S, C1, C2]
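To make the Markov Blanket idea concrete, the sketch below reads the blanket of a target off a known directed graph: its parents, its children, and the other parents of its children (spouses). The node names are borrowed from the slide's figure, but the edges are assumed for illustration only, since the figure's arrows are not recoverable from the text. The function name is also hypothetical. In practice the graph is unknown and has to be learned from data.

```python
def markov_blanket(edges, target):
    """Return (parents, children, spouses) of target in a directed graph.

    edges is an iterable of (source, destination) pairs.
    """
    parents = {u for u, v in edges if v == target}
    children = {v for u, v in edges if u == target}
    # Spouses are the other parents of the target's children.
    spouses = {u for u, v in edges if v in children and u != target}
    return parents, children, spouses


# Assumed edges (illustrative only, not taken from the slide).
edges = [
    ("A", "P1"), ("B", "P2"),    # upstream causes of the parents
    ("P1", "T"), ("P2", "T"),    # parents of the target
    ("T", "C1"), ("T", "C2"),    # children of the target
    ("S", "C2"),                 # spouse: another parent of C2
    ("C1", "D"),                 # downstream of a child
]

parents, children, spouses = markov_blanket(edges, "T")
parent_child_set = parents | children
blanket = parent_child_set | spouses

print("Parent-child set:", sorted(parent_child_set))
print("Markov blanket:", sorted(blanket))
```

Under these assumed edges the parent-child set omits only the spouse S, which illustrates why it is often nearly as predictive as the full blanket.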
Omnibus methods for assessing mediation relationships.
Distance-based analytics allow for inference of panomic relationships.
Discovering interacting subsystems