Supplement

We compared a number of naïve and model-based co-occurrence profiling methods
based on a set of orthologous groups for 25 fungal species and the microsporidium
Encephalitozoon cuniculi (Supplement Figure S1). As benchmarking data we used
functional associations for Saccharomyces cerevisiae from the MIPS (Mewes et al.
2006) and KEGG (Kanehisa et al. 2004) databases. The fungi are the most
densely sampled eukaryotic kingdom, and their phylogenetic closeness allows for
high-resolution orthology prediction. Nevertheless, this is a phylogenetically and
quantitatively limited dataset, which we use to illustrate some observations that have also
been made independently in the literature. Due to the strong overrepresentation of
some complexes, such as the ribosome, in the MIPS dataset, we used a bootstrapping
approach that weights each complex equally. The same procedure was applied to the
KEGG dataset. We quantified the performance on the benchmarking datasets by two
measures: the area under the ROC curve (AUC) and the positive predictive value
(PPV). The AUC is a measure of overall performance that is independent of a specific
cut-off value. The PPV is the fraction of positive controls among the positive
predictions and thus has an intuitive interpretation. Its disadvantage is that, in
contrast to the AUC, its value depends on the ratio of positive to negative
controls in the benchmarking dataset, a fact that has not been accounted for in some
studies (e.g. Snitkin et al. 2006; Wu et al. 2006). To account for this, we sampled
the same number of positive and negative controls for each bootstrap sample when
calculating the PPV, which gives a controlled 1:1 ratio. We report the fraction of positive
controls among the first 5% of positive predictions by each method (PPV0.05).
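The two measures can be illustrated with a minimal sketch. It assumes that each method yields one score per pair of orthologous groups, that higher scores mean stronger predicted association, and that "the first 5% of positive predictions" means the 5% highest-scoring pairs. The function names `auc` and `ppv_at_fraction` are hypothetical and are not taken from the original analysis.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    # 1-based ranks in ascending score order
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(scores.size, dtype=float)
    ranks[order] = np.arange(1, scores.size + 1)
    # Tied scores share the mean of their ranks
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def ppv_at_fraction(scores, labels, fraction=0.05):
    """Fraction of positive controls among the top `fraction` of scored pairs."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    k = max(1, int(round(fraction * scores.size)))  # at least one prediction
    top = np.argsort(-scores, kind="mergesort")[:k]
    return labels[top].mean()
```

The AUC depends only on the ranking of positives relative to negatives, whereas the PPV of the top-scoring pairs changes with the class ratio of the sample it is computed on.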
Bootstrapping
Normal bootstrapping: For each control dataset (MIPS, KEGG) the areas under the
ROC curve were calculated for 100 uniform random samples with replacement from
the pairs of orthologous groups. The ratio of positive and negative controls in the
bootstrap samples was fixed to the ratio in the original data. The positive predictive
values were calculated for 100 uniform random samples with replacement, with equal
probability of sampling a positive or a negative control. This was done to achieve a 1:1
ratio of positive controls to negative controls.
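The two sampling schemes of the normal bootstrap might be sketched as follows. This is one possible reading, in which the fixed ratio is obtained by resampling positives and negatives separately and the 1:1 ratio by giving each class half of the total sampling probability. The function names are hypothetical.

```python
import numpy as np

def bootstrap_fixed_ratio(labels, rng):
    """Resample indices with replacement, keeping the original class counts."""
    labels = np.asarray(labels, dtype=bool)
    pos = np.flatnonzero(labels)
    neg = np.flatnonzero(~labels)
    return np.concatenate([
        rng.choice(pos, size=pos.size, replace=True),
        rng.choice(neg, size=neg.size, replace=True),
    ])

def bootstrap_balanced(labels, rng):
    """Resample indices with replacement so that every draw is a positive
    or a negative control with probability 0.5 each."""
    labels = np.asarray(labels, dtype=bool)
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    # Each class receives half of the probability mass, spread uniformly
    p = np.where(labels, 0.5 / n_pos, 0.5 / n_neg)
    return rng.choice(labels.size, size=labels.size, replace=True, p=p)
```

Calling either function 100 times with a generator such as `np.random.default_rng(0)` and evaluating the AUC or PPV on each returned index set would give bootstrap distributions of the kind summarised in Figure S2.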
Weighted bootstrapping: The members of each complex/pathway were assigned the
weight 1/(complex size). Proteins shared between complexes/pathways were assigned
the average of the per complex/pathway weights. The negative controls together were
given the same total weight as the positive controls. Hence, positive and
negative controls were drawn with the same probability, and among the positive
controls each complex/pathway was drawn with the same probability.
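The weighting scheme can be made concrete with a small sketch. It assumes that each positive control comes with a list of the complexes/pathways it belongs to, that complex size is the number of positive controls assigned to the complex, and that the total negative weight is spread uniformly over the negative controls. The function name `sampling_probabilities` and the input layout are hypothetical.

```python
from collections import Counter
import numpy as np

def sampling_probabilities(positive_memberships, n_negatives):
    """Return sampling probabilities for positives (first) and negatives (last).

    positive_memberships: one list of complex/pathway names per positive control.
    n_negatives: number of negative controls.
    """
    # Complex size = number of positive controls assigned to the complex
    sizes = Counter(c for complexes in positive_memberships for c in complexes)
    # Weight 1/(complex size), averaged over all complexes a control belongs to
    pos_w = np.array([
        np.mean([1.0 / sizes[c] for c in complexes])
        for complexes in positive_memberships
    ])
    # Negatives share a total weight equal to the summed positive weights
    neg_w = np.full(n_negatives, pos_w.sum() / n_negatives)
    w = np.concatenate([pos_w, neg_w])
    return w / w.sum()
```

For example, with four controls in a complex "A", one control in a complex "B" and two negative controls, complex A and complex B are each drawn with probability 0.25 and the negatives together with probability 0.5. When controls are shared between complexes, the averaging makes the per-complex probabilities only approximately equal.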
Figure S1. The phylogenetic tree used for tree-guided methods, differential parsimony, tree-kernel
method and maximum likelihood method. Numbers indicate the bootstrap support for the
corresponding branches.
Table S1. The numbers of pairs of orthologous groups in the MIPS and KEGG positive and negative
controls. The analyses used the remaining orthologous group pairs after filtering out anti-correlating
pairs and pairs referring to pan-orthologous groups.
Dataset  Controls    full  anti-correlating  pan-orthologs  remaining
MIPS     positive    6225              1458            134       4633
MIPS     negative    9406              2984            240       6182
KEGG     positive   16907              4154            524      12229
KEGG     negative   41343              9504           1212      30627
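As a consistency check on Table S1, the sketch below recomputes the remaining counts from the other columns and reports the positive-to-negative ratio after filtering. It assumes that the anti-correlating and pan-orthologous pairs do not overlap, which the published counts are consistent with. The names `TABLE_S1` and `remaining_after_filtering` are hypothetical.

```python
# Counts transcribed from Table S1:
# (full, anti-correlating, pan-orthologs, remaining)
TABLE_S1 = {
    ("MIPS", "positive"): (6225, 1458, 134, 4633),
    ("MIPS", "negative"): (9406, 2984, 240, 6182),
    ("KEGG", "positive"): (16907, 4154, 524, 12229),
    ("KEGG", "negative"): (41343, 9504, 1212, 30627),
}

def remaining_after_filtering(full, anti_correlating, pan_orthologs):
    """Pairs left after removing both filtered sets (assumed disjoint)."""
    return full - anti_correlating - pan_orthologs

# Every row of the table satisfies remaining = full - filtered pairs
for key, (full, anti, pan, remaining) in TABLE_S1.items():
    assert remaining_after_filtering(full, anti, pan) == remaining

# Positive-to-negative ratio among the remaining pairs, per dataset
ratios = {
    dataset: TABLE_S1[(dataset, "positive")][3] / TABLE_S1[(dataset, "negative")][3]
    for dataset in ("MIPS", "KEGG")
}
```

The ratios differ between the two datasets (roughly 0.75 for MIPS and 0.40 for KEGG), which illustrates why a PPV computed without a controlled 1:1 ratio is not comparable across benchmarks.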
Figure S2. Bootstrap distributions (n=100) of the area under the ROC curve (AUC) and the positive
predictive value of the first 5% of the predictions (PPV0.05) for the MIPS and KEGG datasets. Boxes:
interquartile range; whiskers: extend up to 1.5 times the interquartile range beyond the boxes; circles:
points farther than 1.5 times the interquartile range away from the boxes. In comparison to the AUC, the
spread of the PPV estimate is very large, in particular for the normal bootstraps. The differences
between methods are small in comparison to the spread for normal bootstraps on the MIPS dataset.
About 40% of orthologous group pairs in the MIPS dataset belong to the cytoplasmic
or mitochondrial ribosomes, and the dataset is thus biased towards these functional classes. To
prevent our performance estimates from being determined mainly by a few large
complexes or pathways, we use a ‘weighted’ bootstrapping approach that chooses
pairs from each functional category with the same probability. For example,
differential parsimony had a lower AUC than any other method on the ‘normal’ MIPS
dataset (Figure S2). Similarly, Fisher’s exact test had a higher PPV0.05 than any
other method on this dataset. These outlier behaviours can be explained by the
mitochondrial and cytoplasmic large ribosomal subunits, the two largest complexes,
which together make up about 30% of the pairs of orthologous groups (Figure S3).
Figure S3. Receiver operating characteristic (ROC) curves for the cytoplasmic and mitochondrial large
ribosomal subunits. Differential parsimony performs worse than any other method for both large complexes. In
contrast, Fisher’s test based on the Dollo-parsimony gene losses outperforms all other methods on the
cytoplasmic large subunit. This corresponds to the worse-than-average AUC of differential parsimony
and the better-than-average overall PPV0.05 of Fisher’s exact test (Figure S2).
References
Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y. & Hattori, M. 2004 The KEGG
resource for deciphering the genome. Nucleic Acids Research 32, D277-D280.
Mewes, H. W., Frishman, D., Mayer, K. F., Munsterkotter, M., Noubibou, O., Pagel,
P., Rattei, T., Oesterheld, M., Ruepp, A. & Stumpflen, V. 2006 MIPS:
analysis and annotation of proteins from whole genomes in 2005. Nucleic
Acids Research 34, D169-D172.
Snitkin, E. S., Gustafson, A. M., Mellor, J., Wu, J. & DeLisi, C. 2006 Comparative
assessment of performance and genome dependence among phylogenetic
profiling methods. BMC Bioinformatics 7, 420.
Wu, J., Hu, Z. & DeLisi, C. 2006 Gene annotation and network inference by
phylogenetic profiling. BMC Bioinformatics 7, 80.