Computing fragmentation trees from tandem mass spectrometry data
Florian Rasche1, Aleš Svatoš2, Ravi Kumar Maddula2, Christoph Böttcher3 & Sebastian Böcker1*
1Chair for Bioinformatics, Friedrich-Schiller-University Jena, Ernst-Abbe-Platz 2, D-07743 Jena, Germany
The Crux
• Mass spec identification of small molecules depends on
spectral library search
• What about unknown compounds?
• Proposed solution
▫ At least annotate the MS2 peaks as something.
Data
instrument           compounds  used  ppm(a)  CID (eV)               IP(b)  mass range   median  average
Orbitrap             37         37    5       35, 45, 55, 70         yes    152.0−822.4  298.1   345.2
API QSTAR (16)       42         42    20      15, 25, 45, 55, 90(e)  yes    89.0−441.2   174.6   207.5
Micromass QTOF (11)  102        100   20      10, 20, 30, 40, 50     no     137.1−609.3  357.7   372.5
Left: Fragmentation graph for (S,R)-noscapine (C22H23NO7) using Orbitrap data. Nodes of the same color correspond to
annotations of one measured peak (m/z, intensity, and collision energies). Arcs correspond to potential neutral losses. The
weight of arcs is encoded by different line types. Neutral losses (NLs) can be computed by subtracting the molecular formula of
the end node from that of the start node. Right: The corresponding hypothetical fragmentation tree of noscapine computed by our
method. Nodes (blue) correspond to peaks in the tandem mass spectra and their annotated molecular formula (CE is the range of
collision energies); arcs (red) correspond to hypothetical neutral losses.
Published in: Florian Rasche; Aleš Svatoš; Ravi Kumar Maddula; Christoph Böttcher; Sebastian Böcker; Anal. Chem. Article ASAP
DOI: 10.1021/ac101825k
Copyright © 2011 American Chemical Society
Construction of graph
• Properties
▫ Each vertex is a molecular formula associated with
a peak.
▫ A vertex's color identifies its peak; candidate formulas
for the same peak share a color.
▫ A directed edge (neutral loss) u -> v implies that v is a
fragment of u.
• Weighting (real serious math here)
▫ Vertex weight: w(u) = I_p + \log( Gauss[ m_p - m, (mass error)/3 ] )
▫ Edge weight: w(e_{uv}) = \log(see ref) + \log( \ldots / n ) + \log( 1 - (mass of neutral loss)/(parent mass) )
• Goal:
▫ Find a “colorful” tree with maximal score.
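A minimal sketch of the graph construction described on this slide, assuming each peak already comes with a list of candidate molecular formulas. The function names (`parse_formula`, `neutral_loss`, `build_graph`) and the fragment formulas in the example are hypothetical and chosen for illustration only; edge and vertex weights are left out.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Parse a formula such as 'C22H23NO7' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def neutral_loss(parent, fragment):
    """Return parent minus fragment as element counts, or None if the
    fragment is not a proper sub-formula of the parent."""
    loss = {}
    for element in set(parent) | set(fragment):
        diff = parent[element] - fragment[element]
        if diff < 0:
            return None  # fragment has atoms the parent lacks
        if diff > 0:
            loss[element] = diff
    return loss or None  # identical formulas: no loss, no edge

def build_graph(annotations):
    """annotations: list of (peak_index, formula) pairs. The peak index
    serves as the vertex color. Returns edges (u_formula, v_formula, loss)."""
    vertices = [(color, f, parse_formula(f)) for color, f in annotations]
    edges = []
    for cu, fu, pu in vertices:
        for cv, fv, pv in vertices:
            if cu == cv:
                continue  # assumption: no edges within one peak's annotations
            loss = neutral_loss(pu, pv)
            if loss is not None:
                edges.append((fu, fv, loss))
    return edges

# Hypothetical annotations: the noscapine parent plus two made-up fragments.
edges = build_graph([(0, "C22H23NO7"), (1, "C21H19NO6"), (2, "C11H9NO3")])
```

Each edge carries the neutral loss obtained by subtracting the fragment formula from the parent formula, which is the same subtraction the figure caption describes.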
Generating Fragmentation Tree
• Given a directed acyclic graph G = (V, E), a set of
colors C with c(u) \in C for every vertex u, and edge
weights w(u,v) where u,v \in V.
• Output a directed tree of maximum total edge weight
that is “colorful”, i.e. uses each color at most once.
▫ NP-Hard
▫ Heuristics were bad.
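To make the objective concrete, here is a small sketch that scores a candidate tree and rejects it if it is not colorful. The function name `score_if_colorful` and the toy vertices, colors, and weights are assumptions for illustration; the sketch does not check that the edges actually form a tree.

```python
def score_if_colorful(tree_edges, color, weight):
    """Return the total edge weight of a candidate tree, or None if two
    of its vertices share a color. tree_edges is a list of (u, v) pairs."""
    vertices = {x for edge in tree_edges for x in edge}
    colors = [color[v] for v in vertices]
    if len(colors) != len(set(colors)):
        return None  # some peak would be explained twice
    return sum(weight[e] for e in tree_edges)

# Toy instance: b and b2 are two candidate annotations of the same peak.
color = {"a": 0, "b": 1, "b2": 1, "c": 2}
weight = {("a", "b"): 2.0, ("a", "b2"): 3.0, ("b", "c"): 1.5}

ok = score_if_colorful([("a", "b"), ("b", "c")], color, weight)
bad = score_if_colorful([("a", "b"), ("a", "b2")], color, weight)
```

The second candidate is rejected because it would annotate one peak with two different formulas.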
Dynamic Programming Solution
• Find the maximum score of the subtree rooted at
v using the color set S, where S \subset C.
F(v, S) = \max \{
    \max_{u \in V,\ c(u) \in S \setminus \{c(v)\}} F(u, S \setminus \{c(v)\}) + w(v, u),
    \max_{S_1 \cap S_2 = \{c(v)\},\ S = S_1 \cup S_2} F(v, S_1) + F(v, S_2)
\}
• They don’t specify, but “efficient runtime” looks
like
▫ O(|V|2^|C|)?
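The recurrence can be sketched with memoization over (vertex, color bitmask) pairs. This is an illustrative reading of the slide's recurrence, not the authors' code: the function name `max_colorful_tree_score`, the input layout, and the toy graph are assumptions. The memo table has at most |V|·2^|C| entries, and the second case additionally enumerates the ways of splitting S.

```python
from functools import lru_cache

def max_colorful_tree_score(vertices, color, edges, root):
    """vertices: list of names; color: dict name -> int in range(k);
    edges: dict (u, v) -> weight. Returns the best score of a colorful
    subtree rooted at root, maximized over all color sets."""
    children = {v: [] for v in vertices}
    for (u, v), w in edges.items():
        children[u].append((v, w))
    k = max(color.values()) + 1
    NEG = float("-inf")

    @lru_cache(maxsize=None)
    def F(v, S):
        cv = 1 << color[v]
        if not S & cv:
            return NEG  # S must contain the color of v
        if S == cv:
            return 0.0  # tree consisting of v alone
        best = NEG
        rest = S & ~cv
        # Case 1: v has a single child u whose subtree uses S \ {c(v)}.
        for u, w in children[v]:
            if rest & (1 << color[u]):
                sub_score = F(u, rest)
                if sub_score > NEG:
                    best = max(best, sub_score + w)
        # Case 2: glue two trees rooted at v whose color sets overlap
        # only in c(v) and together cover S.
        sub = (rest - 1) & rest  # proper non-empty subsets of rest
        while sub:
            a = F(v, sub | cv)
            b = F(v, (rest & ~sub) | cv)
            if a > NEG and b > NEG:
                best = max(best, a + b)
            sub = (sub - 1) & rest
        return best

    return max(F(root, S) for S in range(1 << k))

# Toy graph: b and b2 share a color, so only one of them may be used.
color = {"a": 0, "b": 1, "b2": 1, "c": 2, "d": 3}
edges = {("a", "b"): 2.0, ("a", "b2"): 3.0, ("b", "c"): 4.0,
         ("b2", "c"): 1.0, ("a", "d"): 1.0, ("a", "c"): 0.5}
best = max_colorful_tree_score(list(color), color, edges, "a")
```

In this toy graph, greedily taking the heaviest edge out of the root (a -> b2) leads to a worse tree than the optimum a -> b -> c plus a -> d.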
Results
• MS1 – mostly correct identification of the molecular formula
• Evaluation against Expert Knowledge and MSn
▫ Checked if the Neutral Losses were consistent with
expert expectations
Orbitrap: 76.9% “correct”, 12.4% “unsure”, 10.7% “wrong”
▫ Analyzed fragmentation trees generated by Greedy
solution (pointless)
• Evaluation against Mass Frontier (predicts spectrum
based on molecular structure)
▫ FragTrees annotated 4x more
▫ 97% agreement where peak annotations overlap (p-value
10^-167)
• Comparing Fragmentation Trees
▫ Eyeballing.
Critiques?
• Not very systematic in the analysis
• They describe useless bits in the paper
• Are fragmentation trees useful?