GlyDeR Analysis and Validation
1. Introduction
2. Methods
2.1. Defining microbial genome-specific Glycan Degradation (GlyDeR) scores.
After building the CAZyme table (see The Construction of GlyDeR, Section 2.2,
Construction of the CAZyme table), I associated these CAZymes with the
genomes of the HMP taxa. For every taxon-specific gene I calculated, based
on its enzymatic annotation and those enzymes' subcellular localization, a
GlyDeR score. Given a bacterial taxon i and a glycan j, the GlyDeR score is
calculated as follows:
\[
\mathrm{GlyDeRScore}_{ij} = \sum \frac{n_{ik}}{g_k}, \quad \forall e_k
\]
where e_k is an enzyme that can degrade glycan j, n_ik is the number of genes
in the genome of taxon i that encode enzyme e_k, and g_k is the number of
glycans broken by enzyme e_k. This metric down-weights the contribution of
promiscuous CAZymes relative to those specifically geared to degrade the
glycan in question (Figure 1).
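The definition above can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the enzyme and glycan identifiers are hypothetical and chosen only for illustration; they are not part of the GlyDeR code base. Exact fractions are used so the result can be compared directly with the values quoted in the text.

```python
from fractions import Fraction


def glyder_score(gene_counts, enzyme_glycans, glycan):
    """GlyDeR score of one genome for one glycan.

    gene_counts:    enzyme id -> number of genes encoding it (n_ik)
    enzyme_glycans: enzyme id -> set of glycans the enzyme can break
    """
    score = Fraction(0)
    for enzyme, n_genes in gene_counts.items():
        targets = enzyme_glycans.get(enzyme, set())
        if glycan in targets:
            # Each gene copy contributes 1/g_k, where g_k = len(targets).
            score += Fraction(n_genes, len(targets))
    return score


# Hypothetical identifiers mirroring the bottom panel of Figure 1:
# two enzymes that break 3 and 4 glycans respectively.
enzyme_glycans = {
    "enzA": {"purple", "g2", "g3"},
    "enzB": {"purple", "g4", "g5", "g6"},
}
gene_counts = {"enzA": 1, "enzB": 1}
print(glyder_score(gene_counts, enzyme_glycans, "purple"))  # 7/12
```

An enzyme that breaks only the glycan in question contributes a full point per gene copy, whereas an enzyme shared among many glycans contributes correspondingly less.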
For some categories of glycans, such as "Long Polysaccharides" and
"Plant-specific glycans", I defined a category-specific score GS_ic, which is
the sum of the GlyDeR scores for the glycans that belong to that category:
\[
\mathrm{GlyDeRScore}_{ic} = \sum GS_{ij}, \quad \forall j \in c
\]
where GS_ij is the GlyDeR score for genome i and glycan j, and c is the
set of glycans that belong to category C. This scoring system has the
property that summing the GlyDeR scores of a given genome over all glycans
gives the total number of CAZymes in that genome:
\[
\mathrm{Total\ CAZymes}_{i} = \sum GS_{ij}, \quad \forall j \in J
\]
where J is the set of all glycans, and i is the index of a specific taxon.
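A short derivation makes this identity explicit. The notation E_j for the set of enzymes that degrade glycan j is introduced here only for convenience, and the argument assumes that g_k counts exactly the glycans in J that enzyme e_k can break. Swapping the order of summation gives:

```latex
\begin{aligned}
\sum_{j \in J} GS_{ij}
  &= \sum_{j \in J} \; \sum_{k \,:\, e_k \in E_j} \frac{n_{ik}}{g_k} \\
  &= \sum_{k} \frac{n_{ik}}{g_k} \, \bigl|\{\, j \in J : e_k \in E_j \,\}\bigr| \\
  &= \sum_{k} \frac{n_{ik}}{g_k} \, g_k
   = \sum_{k} n_{ik}
\end{aligned}
```

Each CAZyme gene therefore distributes exactly one unit of score across the glycans its enzyme can break, so the scores of a genome add up to its CAZyme gene count.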
Subsequently, the GlyDeR Profile of a bacterial taxon is defined as:
\[
\mathrm{GlyDeRProfile}_{i} = \{ GS_{i1}, GS_{i2}, \ldots, GS_{i,n-1}, GS_{in} \}
\]
where J is the set of all glycans, n = |J| is the number of glycans, and i is
the index of a specific taxon.
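As a rough sketch of how the category score and the profile relate to the per-glycan scores, the following Python snippet assumes the scores of one taxon are already available as a dictionary. The function names, glycan names, category membership, and score values are all hypothetical.

```python
def glyder_profile(scores, all_glycans):
    """Ordered vector of GlyDeR scores; glycans with no score count as 0."""
    return [scores.get(glycan, 0.0) for glycan in all_glycans]


def category_score(scores, category):
    """Sum of the GlyDeR scores over the glycans in one category."""
    return sum(scores.get(glycan, 0.0) for glycan in category)


# Hypothetical per-glycan scores for a single taxon.
scores = {"starch": 1.5, "xylan": 0.5, "pectin": 0.25}
all_glycans = ["pectin", "starch", "xylan", "chitin"]
plant_specific = {"xylan", "pectin"}  # assumed category membership

profile = glyder_profile(scores, all_glycans)
print(profile)                                  # [0.25, 1.5, 0.5, 0.0]
print(category_score(scores, plant_specific))   # 0.75
print(sum(profile))                             # 2.25
```

Fixing the glycan order once for all taxa keeps the profiles comparable across genomes, since position j always refers to the same glycan.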
3. Results
3.1. GlyDeR Usage
Microbial (meta-)genomes were annotated for CAZymes using BLAST (1) against
three reference databases: the Carbohydrate-Active enZymes (CAZy) database
(2), the SEED-RAST annotation (3), and KEGG (4) (see The Construction of
GlyDeR, Section 2.1.5, CAZyme annotation). CAZyme scores can then be assigned
to genes, and a (meta-)genome-specific GlyDeR score calculated for each glycan
as follows:
\[
\mathrm{GlyDeRScore}_{ij} = \sum \frac{n_{ik}}{g_k}, \quad \forall e_k
\]
where e_k is an enzyme that can degrade glycan j, n_ik is the number of genes in
(meta-)genome i that encode enzyme e_k, and g_k is the number of glycans
broken by enzyme e_k.
The GlyDeR score represents the predicted efficiency with which the glycan can
be degraded by that (meta-)genome, taking into account how many CAZymes
can degrade the glycan and down-weighting promiscuous enzymes with low
specificity. For example, an organism containing three enzymes that degrade
maltotetraose, each of which also degrades four other glycans, would have a
GlyDeR score of 3 x 1/5 = 3/5 (see two further examples in Figure 1). The
GlyDeR workflow is summarized in Figure 2.
3.2. Cross Validation
When applied to all the glycans available in KEGG, the GlyDeR pipeline
produced a list of 114,573 intermediate glycan products, most of which were
novel and thus do not appear in the original KEGG database. To test how
consistent GlyDeR is, I performed a cross-validation in which I picked a
random subset of 1,000 glycans from KEGG and applied GlyDeR to degrade
them. I then tested whether the products obtained from these 1,000 glycans
were enriched with known versus novel intermediate glycans. A hypergeometric
test indicated that the products were highly enriched for known glycans
(p-value = 10^-19; Figure 3). A sensitivity analysis with different random
subsets and subset sizes still resulted in highly significant enrichments
(data not shown). This result indicates that GlyDeR is capable of
recapitulating the biochemical knowledge encoded in the CAZymes that
constitute its computational foundation.
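The enrichment test can be sketched as a one-sided hypergeometric tail probability. The text does not state how the test was parameterized, so the mapping below (population = all candidate glycans, successes = glycans already in KEGG, draws = products of the sampled subset) is an assumption, and the counts in the usage example are invented placeholders rather than values from the analysis.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, known, drawn, known_drawn):
    """P(X >= known_drawn) for X ~ Hypergeometric(population, known, drawn).

    population:  total number of candidate glycans
    known:       how many of them are known (e.g. present in KEGG)
    drawn:       number of products obtained
    known_drawn: number of products that are known glycans
    """
    upper = min(known, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(known, x) * comb(population - known, drawn - x)
        for x in range(known_drawn, upper + 1)
    )
    return tail / total


# Placeholder counts, for illustration only.
print(hypergeom_enrichment_pvalue(population=500, known=50, drawn=40, known_drawn=15))
```

A small tail probability means that drawing this many known glycans by chance would be unlikely, which is the sense in which the products are "enriched" for known glycans.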
3.3. Examining individual genomes
To further validate GlyDeR, I compared the predicted genome-specific GlyDeR
scores of each bacterial strain with the glycans that the strain is able to
break according to KEGG (KEGG glycans; Supplementary Table 2). Because this
information from KEGG was not used to construct the set of GlyDeR
reactions, a circular argument is avoided. Across all strains, the mean
GlyDeR score for KEGG glycans was significantly higher than the mean
GlyDeR score for non-KEGG glycans (Figure 4). Notably, this analysis produced
GlyDeR scores for roughly 85-fold more unique glycan degradation reactions
than are currently reported in KEGG (116,388 vs. 1,374), highlighting the
limited scope of glycan metabolism captured in the KEGG database.
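The comparison of mean scores can be sketched as a two-sample Student's t statistic, the test named in the Figure 4 caption. The pooled-variance form used below is an assumption, since the text does not say which variant was applied, and the sample values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


# Hypothetical log GlyDeR scores for KEGG and non-KEGG glycans.
kegg_scores = [2.0, 4.0, 6.0]
non_kegg_scores = [1.0, 2.0, 3.0]
print(student_t(kegg_scores, non_kegg_scores))
```

A positive statistic indicates a higher mean score in the first group; converting it to a p-value additionally requires the t distribution with n_a + n_b - 2 degrees of freedom.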
3.4. Experimental Validation
Given a bacterial species and a specific glycan, the ultimate goal of the
GlyDeR platform is to predict whether the species can utilize the glycan for
growth. Therefore, I sought to validate GlyDeR against the experimental
glycan utilization data of two Bacteroides species published by Martens et al.
(5). After removing monosaccharides and replicates from the data, 24 glycans
remained for which the growth rates of Bacteroides thetaiotaomicron and
Bacteroides ovatus were examined in vitro. Importantly, in each growth
experiment a single glycan was used as the sole carbon source in the
medium. The GlyDeR results were able to predict the experimental growth
outcomes (see table below; p-values in parentheses), which testifies to the
ability of the platform to predict glycan source utilization.
Species                        Accuracy      Precision     Recall
Bacteroides thetaiotaomicron   0.62 (0.04)   0.71 (0.33)   0.67 (0.001)
Bacteroides ovatus             0.58 (0.03)   0.84 (0.31)   0.56 (0.001)
The full results of this analysis appear in Figure 5.
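For reference, the three reported metrics can be computed from binary predictions as in the sketch below. The text does not state how a GlyDeR score is converted into a growth prediction, so the rule used here (a score above a threshold of 0 predicts growth) is an assumption, and the scores and growth outcomes are made-up examples rather than the Martens et al. data.

```python
def growth_metrics(scores, grew, threshold=0.0):
    """Accuracy, precision and recall of score-based growth predictions.

    scores: GlyDeR score per glycan
    grew:   observed growth per glycan (True/False)
    """
    predicted = [score > threshold for score in scores]  # assumed decision rule
    tp = sum(p and g for p, g in zip(predicted, grew))
    fp = sum(p and not g for p, g in zip(predicted, grew))
    fn = sum(not p and g for p, g in zip(predicted, grew))
    tn = sum(not p and not g for p, g in zip(predicted, grew))
    accuracy = (tp + tn) / len(grew)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return accuracy, precision, recall


# Hypothetical scores and growth outcomes for six glycans.
example_scores = [2.0, 0.5, 0.0, 1.0, 0.0, 0.0]
example_grew = [True, False, False, True, True, False]
print(growth_metrics(example_scores, example_grew))
```

Precision is the fraction of predicted growers that actually grew, while recall is the fraction of actual growers that were predicted, so the two can diverge when the predictor is conservative.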
Figure 1: GlyDeR Score Calculation. Top: The organism has one enzyme (yellow
pacman) dedicated to the degradation of one glycan (purple), therefore the GlyDeR
score for the purple glycan equals 1. Bottom: The organism has two enzymes
capable of degrading 3 and 4 glycans respectively, therefore the GlyDeR score for the
purple glycan equals 7/12.
[Figure 2 flowchart: (Meta-)genome sequence -> BLAST against CAZy to infer CAZymes -> (meta-)genome GlyDeR score table (samples x glycans score matrix)]
Figure 2: (Meta-)genomes are annotated for CAZymes using the CAZy, SEED and
KEGG databases, and using the CAZyme table a GlyDeR score can be calculated,
reflecting the capacity of a (meta-)genome to degrade a specific glycan.
GlyDeR: Glycan Degradation. CAZymes: Carbohydrate Active Enzymes.
Figure 3: A cross-validation process was performed which shows that GlyDeR
reaction products are enriched with existing glycans rather than intermediate
(hypothetical) ones. The Venn diagram depicts a significant overlap between the
glycans created by GlyDeR (yellow) and the original glycans in KEGG Glycan (blue).
Average hypergeometric p-value < 10^-19.
[Figure 4 plot: y axis, GlyDeR Score (log); groups, Degraded in KEGG vs. Not Degraded in KEGG; significance marker ***]
Figure 4: Distribution of genome-specific GlyDeR scores (y axis) for all the glycans in
KEGG. Glycans that are known to be broken by a certain organism according to KEGG
appear on the left and those that are not known appear on the right (Student's
t = 6.14, p < 0.0001).
Figure 5: Experimental Validation of GlyDeR. The table compares between the
GlyDeR predictions and the experimental growth indicators of Bacteroides
thetaiotaomicron and Bacteroides ovatus. Green indicates a true (positive or
negative) prediction, while red indicates a false (positive or negative) prediction.
Acc: Accuracy score.
1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment
search tool. Journal of Molecular Biology 215:403-410.
2. Cantarel B, Coutinho P, Rancurel C, Bernard T, Lombard V, Henrissat B. 2009. The
Carbohydrate-Active EnZymes database (CAZy): an expert resource for
Glycogenomics. Nucleic Acids Research 37:8.
3. Aziz R, Devoid S, Disz T, Edwards R, Henry C, Olsen G, Olson R, Overbeek R, Parrello
B, Pusch G, Stevens R, Vonstein V, Xia F. 2012. SEED servers: high-performance
access to the SEED genomes, annotations, and metabolic models. PLoS ONE 7.
4. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. 1999. KEGG: Kyoto
Encyclopedia of Genes and Genomes. Nucleic Acids Research 27:29-34.
5. Martens E, Lowe E, Chiang H, Pudlo N, Wu M, McNulty N, Abbott D, Henrissat B,
Gilbert H, Bolam D, Gordon J. 2011. Recognition and degradation of plant cell wall
polysaccharides by two human gut symbionts. PLoS Biology 9.