Gateway node analysis of gene expression in the diauxic shift of Saccharomyces cerevisiae
Emily Pachunka ● Spring 2017
Motivation
DNA → RNA → Protein → Function
• High-throughput assays result in large volumes of biological data
• Many data remain unanalyzed:
• Lack of user-friendly and efficient software
• Lack of standard protocol for data modeling and pattern discovery
• Network modeling is a popular approach
Network Modeling
• Network  graph of nodes and edges
• Nodes represent some object/entity
• Edges represent relationships
A
B
E
• Correlation network
• Nodes  genes
• Edges  correlations or co-expressions
D
C
• Incorporate principles of graph theory for
analysis and pattern recognition
F
G
H
Pairwise Correlation Networks
ID   Replicate 1   Replicate 2   Replicate 3
A    1.00          3.00          6.00
B    2.00          4.00          7.00
C    4.00          2.00          3.00
[Figure: expression values plotted across Replicate 1, Replicate 2, and Replicate 3, alongside the resulting correlation network with an edge labeled "A to B"]
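The three-gene example on this slide can be worked through in a few lines. This is a minimal sketch using only the Python standard library; the helper name `pearson` and the `expression` dictionary are illustrative assumptions, not part of the original analysis.

```python
import math
from itertools import combinations

# Expression values for genes A, B, and C across Replicates 1-3 (from the slide).
expression = {
    "A": [1.00, 3.00, 6.00],
    "B": [2.00, 4.00, 7.00],
    "C": [4.00, 2.00, 3.00],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# One correlation per unordered gene pair; each pair is a candidate edge.
pairs = {
    (g1, g2): pearson(expression[g1], expression[g2])
    for g1, g2 in combinations(sorted(expression), 2)
}

for (g1, g2), r in pairs.items():
    print(f"{g1} to {g2}: r = {r:.2f}")
```

Gene B is gene A shifted up by one unit in every replicate, so the pair A to B correlates perfectly, while C correlates only weakly (and negatively) with both.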
Research Aims
• Gateway node analysis
• Uses correlation networks
• Prediction tool
• Gateway node  gene predicted to be
co-regulated in two distinct cellular
states
• Biomedical and research applications
Dempsey K., Ali H. (2014).
The goal: Strengthen the validity of gateway node analysis as a prediction tool
Yeast Quiescence
• Saccharomyces cerevisiae  yeast
• Quiescent = dormancy
• Induced by cell stress
• Low-nutrient environments
• Nonquiescent = active
• Well-studied cellular state/process
• Literature-supported genes involved in cell quiescence
Methodology Overview
Methods – data collection
Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/)
• Example queries:
• “quiescent”
• “nonquiescent”
• “G0”
• “Diauxic shift”
• “Stationary”
• “Relaxed”
• Criteria:
• Saccharomyces cerevisiae
• WT strains only
• At least 3 replicates
GSE8559  S288C quiescent and nonquiescent
GSE8542  BY4742 quiescent and nonquiescent
GSE55508  Time series: t0, t1, t2, and t3
Methods – network creation and validation
• Network creation:
• Pairwise Pearson correlation (0.7 ≤ |ρ| ≤ 1.0, p-value < 0.05)
• Visualized networks using Cytoscape
• Duplicate edges and self loops
removed
• Network validation:
• Examined degree distribution in R
• Kept if degree distribution followed
power law
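The network creation and validation steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Cytoscape/R workflow used in the project: the function names are hypothetical, the p-value filter is omitted (it needs a t-distribution, e.g. from a statistics library), and the log-log slope is only a crude check for power-law behavior.

```python
import math
from collections import Counter
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                      sum((b - mean_y) ** 2 for b in y))
    return cov / denom if denom else 0.0

def build_network(expression, threshold=0.7):
    """Undirected edges between genes with |r| >= threshold.

    Iterating over unordered pairs of distinct genes and storing each edge
    as a frozenset means no self loops or duplicate edges can appear.
    """
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add(frozenset((g1, g2)))
    return edges

def degree_distribution(edges):
    """Map each degree k to the number of genes that have degree k."""
    degree = Counter()
    for edge in edges:
        for gene in edge:
            degree[gene] += 1
    return Counter(degree.values())

def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(degree).

    A roughly linear, negative trend is consistent with a power law.
    Returns None when fewer than two distinct degrees are present.
    """
    points = [(math.log(k), math.log(c)) for k, c in distribution.items()]
    if len(points) < 2:
        return None
    mean_x = sum(p[0] for p in points) / len(points)
    mean_y = sum(p[1] for p in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Toy input: the three-gene example from the "Pairwise Correlation Networks" slide.
expression = {
    "A": [1.00, 3.00, 6.00],
    "B": [2.00, 4.00, 7.00],
    "C": [4.00, 2.00, 3.00],
}
edges = build_network(expression)
print(sorted(sorted(e) for e in edges))
print(degree_distribution(edges))
```

With real data, `loglog_slope(degree_distribution(edges))` gives a quick first look; a formal power-law fit would be a more careful test.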
Methods – data validation
• Data sets:
• GSE8559  S288C quiescent and nonquiescent (10 replicates each)
• GSE8542  BY4742 quiescent and nonquiescent (10 replicates each)
• GSE55508  Time series: t0, t1, t2, and t3 (3 replicates each)
• Four networks:
• S288C_quiescent
• S288C_nonquiescent
• BY4742_quiescent
• BY4742_nonquiescent
Methods – clustering
• Clustering → finding dense groups of genes within the network
• MCODE (in Cytoscape)
• Performed for each network (4 total)
• Kept clusters with density above 80%
• Kept network interactions within these clusters
• WGCNA (in R) – Unable to extract clusters
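The density filter applied to the clusters can be illustrated with plain graph density, 2m / (n(n − 1)) for a cluster with n genes and m internal edges. This is a sketch under that assumption; MCODE computes its own cluster scores inside Cytoscape, and the cluster names, node labels, and edges below are hypothetical toy data.

```python
def cluster_density(nodes, edges):
    """Fraction of possible gene pairs within `nodes` that are connected."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = {frozenset(e) for e in edges
                if len(set(e)) == 2 and set(e) <= nodes}
    return 2 * len(internal) / (n * (n - 1))

def filter_dense_clusters(clusters, edges, min_density=0.8):
    """Keep clusters with density above `min_density`, plus their internal edges."""
    kept = {}
    for name, nodes in clusters.items():
        if cluster_density(nodes, edges) > min_density:
            members = set(nodes)
            kept[name] = {frozenset(e) for e in edges
                          if len(set(e)) == 2 and set(e) <= members}
    return kept

# Hypothetical toy network: one fully connected cluster and one sparse chain.
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("E", "F"), ("F", "G"), ("G", "H"),
    ("D", "E"),  # edge between clusters; dropped when only internal edges are kept
]
clusters = {
    "cluster_1": ["A", "B", "C", "D"],
    "cluster_2": ["E", "F", "G", "H"],
}

kept = filter_dense_clusters(clusters, edges)
for name, internal_edges in kept.items():
    print(name, cluster_density(clusters[name], edges), len(internal_edges))
```

Here `cluster_1` has density 1.0 and is kept with its six internal edges, while `cluster_2` has density 0.5 and is discarded.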
Methods – gateway node analysis
• Gateway node  gene that connects clusters of each cellular state
1. Create networks from MCODE clusters
2. Merge nonquiescent and quiescent networks
3. Identify those genes that connect clusters from nonquiescent and
quiescent states
[Figure: merged network. Key: nonquiescent, gateway node, quiescent]
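The three steps above can be sketched in code. This is a simplified reading of the slide, not the procedure of Dempsey and Ali (2014), which may apply further criteria: a gene is treated as a gateway node if, in the merged network, it has at least one edge from a nonquiescent cluster and at least one edge from a quiescent cluster. The function name and the gene labels are hypothetical.

```python
from collections import defaultdict

def find_gateway_nodes(nonquiescent_edges, quiescent_edges):
    """Genes with cluster edges in both the nonquiescent and quiescent networks.

    Each argument is an iterable of (gene, gene) pairs taken from the dense
    clusters of one state (step 1). Tagging every edge with its state while
    collecting them into one structure plays the role of the merge (step 2).
    """
    states_by_gene = defaultdict(set)
    tagged = (("nonquiescent", nonquiescent_edges),
              ("quiescent", quiescent_edges))
    for state, edges in tagged:
        for gene_1, gene_2 in edges:
            states_by_gene[gene_1].add(state)
            states_by_gene[gene_2].add(state)
    # Step 3: keep genes that touch clusters from both states.
    return sorted(gene for gene, states in states_by_gene.items()
                  if len(states) == 2)

# Hypothetical toy clusters: gene "C" belongs to a cluster in each state.
nonquiescent_edges = [("A", "B"), ("B", "C"), ("A", "C")]
quiescent_edges = [("C", "D"), ("D", "E"), ("C", "E")]

gateway_nodes = find_gateway_nodes(nonquiescent_edges, quiescent_edges)
print(gateway_nodes)
```

In this toy input only gene "C" links the two states, so it is the single predicted gateway node.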
Results
• BY4742:
• SNZ1  member of stationary phaseinduced gene family
• S288C:
• HSP104  heat shock protein;
responsive to stresses
• FES1  heat shock protein exchange
factor
• HSP150  required for cell wall
stability
Discussion
• Predicted several genes involved in cellular shift from nonquiescent to
quiescent state
• Acquired a better understanding of GEO data sets, clustering
algorithms, etc
• Future Directions:
• Perform gene ontology enrichment analysis
• Assess differences in gateway nodes predicted by altering thresholds
• Apply the methodology to other model pathways/organisms
Gateway node analysis could be used as a “first-step” modeling tool
Bibliography
1. Dempsey K., Ali H. (2014). “Identifying aging-related genes in mouse hippocampus using gateway nodes.” BMC Systems Biology 8. Available at http://www.biomedcentral.com/1752-0509/8/62. Accessed September 23, 2015.
2. Pavlopoulos G., Secrier M., Moschopoulos C., Soldatos T., Kossida S., Aerts J., Schneider R., Bagos P. (2011). “Using graph theory to analyze biological networks.” BioData Mining 4(10). Available at http://biodatamining.biomedcentral.com/articles/10.1186/1756-0381-4-10. Accessed February 8, 2016.
3. Horvath S. (2011). Weighted network analysis: Applications in genomics and systems biology. New York: Springer.
4. Gray J., Petsko G., Johnston G., Ringe D., Singer R., Werner-Washburne M. (2004). “Sleeping beauty: Quiescence in Saccharomyces cerevisiae.” Microbiology and Molecular Biology Reviews 68: 187-206.
5. Bader G., Hogue C. (2003). “An automated method for finding molecular complexes in large protein interaction networks.” BMC Bioinformatics 4. Available at http://www.biomedcentral.com/1471-2105/4/2. Accessed September 24, 2015.
6. Langfelder P., Horvath S. (2008). “WGCNA: an R package for weighted correlation network analysis.” BMC Bioinformatics 9(559). Available at http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559.
Thank you!
Any questions or comments?