Assigning Adduct and Charge States to High-resolution Accurate-mass Mass Spectral Data
Using Frequency of Assignment in Multiple
Difference Networks
Thomas D. McClure, Matthew D. Kump, Michael Athanas
Thermo Fisher Scientific, San Jose, California
Overview
Purpose: To use a graph-theory-based approach to determine adduct and neutral-loss
species within a mass spectrum as part of a small-molecule component detection
workflow.
Methods: Graph-theory-based difference networks are used to determine adduct
and charge states for component detection analysis of mass spectrometry data.
Results: This work shows that difference networks provide accurate results when
applied to complicated collections of adduct and charge states in a mass spectrum.
Introduction
Liquid or gas chromatography coupled with mass spectrometry has been
demonstrated to be a powerful tool for characterizing small molecules in biological
samples. Often the goal is to understand differences in the types and amounts of these
small molecules in metabolomic studies, metabolism studies, and virtually any
approach involving a complex matrix where untargeted profile information is desired.
The data sets from these experiments are usually large and complex. Such complexity
results, in no small part, from more than one signal for each compound detected.
These multiple signals per compound come from the formation of multiple charge
states, in-source neutral-loss fragmentations, chemical adducts, and the formation of
gas-phase polymers. In addition, each of these species also produces a number of
isotope signals.
In this presentation, we demonstrate the utility of applying a graph-theory-based
algorithm for reducing complexity in an LC-MS experiment. Graph-theory mathematics
is not a new tool in the analysis of MS data. It has been used for analyzing mass
spectrometric data in a number of applications, including de novo peptide
sequencing (1, 2), isotope assignment, and protein identification and quantification. We use
this algorithm to assign and group the multiple signals that arise from
chemical adducts, in-source neutral-loss fragmentations, and
adduct-related multiple charge states for individual compounds in a mixture of
compounds.
Methods
Synthetic Data Creation for Algorithm Verification
An LC-MS data set was created containing thirty compounds each with three adducts
([M+H], [M+Na], and [M+NH4]) as well as the isotopes for each of these species. A
mass sorted peak list was generated from the LC-MS data file described above using
software that produces extracted ion current chromatograms which are then evaluated
using parameter-less peak detection.
Amino Acid Mixture Sample Preparation
A commercially available standard mixture, Thermo Scientific™ Pierce™ Amino Acid
Standard H (P/N 20088), was used. The concentration of the amino acids in
this mixture was 2.5 µmol/ml, except cystine at 1.25 µmol/ml. A diluted stock solution
was prepared using 100 µl of Amino Acid Standard H + 900 µl of water. The final
sample was then prepared by serially diluting the stock with HPLC-grade water to a
final dilution of 1:1000. A 2 µl aliquot was injected directly onto the HPLC column.
HPLC Conditions
A Thermo Scientific™ Dionex™ UltiMate™ 3000 RSLC system was used with a
Thermo Scientific™ Hypersil GOLD™ HPLC column (150 × 2.1 mm, 1.9 µm, P/N 25002-152130).
The HPLC solvents were: A – 0.1% formic acid in water; B – 0.1% formic acid
in methanol. The elution gradient was: 0.5% B to 55% B in 5.5 min, 50% B to 98% B in
0.5 min, hold 98% B for 6 min. The flow rate was 450 µl/min and the column was
heated to 55 °C.
MS Conditions
A Thermo Scientific™ Q Exactive™ mass spectrometer with a Thermo Scientific™
HESI-II source was used with the following gas settings: the sheath gas was 45 and the aux
gas was 8. The spray voltage was 3.8 kV, and the capillary temperature was
320 °C. The HESI heater temperature was 350 °C. The mass analyzer had the
following settings: positive polarity; full MS, 67–1000 AMU; AGC target, 3E6;
resolution, 70,000; and maximum ion injection time, 100 ms.
Results
Algorithm Verification Using Synthetic Data
We verified the operation and accuracy of the assignments by the difference network
using a synthetically generated raw file with 30 components, each having three adducts
(MH+, M+Na+, and M+NH4+). Each adduct had a minimum of three, and usually four,
isotopes. Two component examples are shown below.
Synthetic Example 1
C19H43N5, m/z 342.35912, Apex RT 4.90

Adduct  Charge  Isotope  m/z        Intensity  Peak Area
M+H     1       A0       342.35912  669135     5867777
M+H     1       A1       343.36230  147786     1295968
M+H     1       A2       344.36543  15569      135995
M+H     1       A3       345.36827  951        7155
M+Na    1       A0       364.34106  891390     7816780
M+Na    1       A1       365.34420  197788     1734444
M+Na    1       A2       366.34732  20891      181364
M+Na    1       A3       367.35016  1353       10581
M+NH4   1       A0       359.38567  656801     5759615
M+NH4   1       A1       360.38875  147162     1290490
M+NH4   1       A2       361.39184  15687      137002
M+NH4   1       A3       362.39456  1034       7830
Synthetic Example 2
C45H88N4O, m/z 701.70310, Apex RT 4.65

Adduct  Charge  Isotope  m/z        Intensity  Peak Area
M+H     1       A0       701.70310  586712     7854121
M+H     1       A1       702.70633  302768     4052986
M+H     1       A2       703.70959  77602      1038850
M+H     1       A3       704.71275  13145      176007
M+H     1       A4       705.71652  1271       17021
M+Na    1       A0       723.68501  743036     9947609
M+Na    1       A1       724.68832  383766     5134197
M+Na    1       A2       725.69154  98287      1316105
M+Na    1       A3       726.69469  16661      223059
M+Na    1       A4       727.69848  1611       21570
M+NH4   1       A0       718.72963  778223     10421120
M+NH4   1       A1       719.73287  404727     5414875
M+NH4   1       A2       720.73603  104345     1396982
M+NH4   1       A3       721.73925  17783      238128
M+NH4   1       A4       722.74231  1961       24241
In both cases, the adduct and charge states are correctly assigned.
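As a quick arithmetic check on the two synthetic examples, the sketch below compares the tabulated monoisotopic (A0) m/z values of the sodium and ammonium adducts with the values predicted from the [M+H]+ signal. The ion masses are standard monoisotopic values rounded to six decimals; the function and variable names are illustrative and are not taken from the software used in this work.

```python
# Monoisotopic masses of the charge carriers (electron mass already removed).
PROTON = 1.007276     # H+
SODIUM = 22.989221    # Na+
AMMONIUM = 18.033826  # NH4+


def expected_adduct_mz(mh_mz, carrier_mass):
    """Predict a singly charged adduct m/z from the observed [M+H]+ m/z."""
    return mh_mz - PROTON + carrier_mass


def ppm_error(observed_mz, expected_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - expected_mz) / expected_mz * 1e6


# A0 m/z values copied from the two synthetic example tables.
examples = {
    "C19H43N5": {"M+H": 342.35912, "M+Na": 364.34106, "M+NH4": 359.38567},
    "C45H88N4O": {"M+H": 701.70310, "M+Na": 723.68501, "M+NH4": 718.72963},
}

errors = {}
for formula, mz in examples.items():
    errors[(formula, "M+Na")] = ppm_error(
        mz["M+Na"], expected_adduct_mz(mz["M+H"], SODIUM))
    errors[(formula, "M+NH4")] = ppm_error(
        mz["M+NH4"], expected_adduct_mz(mz["M+H"], AMMONIUM))

for key, err in errors.items():
    print(key, f"{err:+.3f} ppm")
```

All four spacings agree with the predicted values to well under 1 ppm, which is what one would expect for synthetically generated data.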
Thermo Scientific Poster Note • PN-64093-ASMS-EN-0614S 3
Analysis of a Dilute Amino Acid Mixture
Lysine, m/z 147.11255, Apex RT 0.46

Adduct      Charge  Isotope  m/z        Intensity  Peak Area
M+H         1       A0       147.11255  54321440   66288519
M+H         1       A1       148.11589  3604857    4250354
M+H         1       A1       148.10962  192584     316075
M+H         1       A2       149.11687  457894     2173745
M+H-NH3     1       A0       130.08606  10109494   11013528
M+H-NH3     1       A1       131.08937  812226     559401
M+Na+CH3CN  1       A0       210.12015  195011     143738
M+Na        1       A0       169.09444  2110288    2464782
M+Na        1       A1       170.09763  114523     72849
Tyrosine, m/z 182.08078, Apex RT 1.40

Adduct   Charge  Isotope  m/z        Intensity  Peak Area
M+H      1       A0       182.08078  67884942   130861879
M+H      1       A1       183.08418  7242606    11166390
M+H      1       A2       184.08751  306655     349548
M+H      1       A2       184.08508  370365     627524
M+H-NH3  1       A0       165.05429  7440865    12655904
M+H-NH3  1       A1       166.05765  753306     1231796
M+Na     1       A0       204.06277  766400     1321692
Analysis of a Complicated Mixture of Adducts
In our final example, from an unpublished study, we analyzed a sample that contained 13
adducts in one mass spectrum. Of these 13, we detected the following 10 species:
M+CaCOOH (z=1), M+MgCOOH (z=1), M+Fe-H (z=1), M+Na (z=1), M+H (z=1),
M+H-2(H2O) (z=1), M+Ca+2(CH3CN) (z=2), M+Ca+CH3CN (z=2), M+Mg+CH3CN (z=2), and
M+Ca+H2O (z=2).
The amino acid mixture previously described was analyzed using the difference network
algorithm. The chromatogram is labeled with the amino acids in the mixture
and their elution times:

RT (min)  Amino Acid
0.52      Lysine
0.54      Histidine
0.55      Arginine
0.56      Cystine
0.57      Serine
0.58      Aspartic Acid
0.59      Alanine
0.61      Threonine
0.61      Glutamic Acid

[Chromatogram: the remaining peak labels are Proline, Valine, Methionine, Tyrosine, IsoLeucine, Leucine, and Phenylalanine, at retention times between 0.69 and 2.39 min.]
Notice that the adducts detected in the complicated mixture include both singly and doubly charged species.
Discussion
Adduct and Neutral-Loss Assignment Algorithm
We use the following formula to identify the important mass-carrying components that
are present in each species detected by the mass spectrometer:
\[ \Delta m/z = \frac{n_1 M_1 + M_{a1} + M_{cc1}}{Z_1} - \frac{n_2 M_2 + M_{a2} + M_{cc2}}{Z_2} \qquad (1) \]
where \Delta m/z is the difference in mass-to-charge between two different species of the same
molecule, n is the gas-phase cluster or polymeric number for the base molecule, M is
the parent neutral molecule mass, M_a is the total mass contributed to the species
by the adduct(s) or neutral loss(es), M_cc is the total mass
contributed to the species by the charge carrier(s), and Z is the total charge of
the detected species. The subscripts 1 and 2 denote the two species being compared.
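To make the terms of equation 1 concrete, here is a minimal sketch that evaluates the m/z of a species from n, M, Ma, Mcc, and Z and then takes the difference between two species of the same molecule. The `Species` class and its field names are hypothetical and chosen only to mirror the symbols above; the neutral mass is back-calculated from the [M+H]+ value of Synthetic Example 1.

```python
from dataclasses import dataclass

PROTON = 1.007276   # monoisotopic mass of H+
SODIUM = 22.989221  # monoisotopic mass of Na+


@dataclass(frozen=True)
class Species:
    n: int       # gas-phase cluster (polymeric) number
    m_a: float   # total adduct / neutral-loss mass (negative for a loss)
    m_cc: float  # total charge-carrier mass
    z: int       # total charge

    def mz(self, m: float) -> float:
        """One side of equation 1: (n*M + Ma + Mcc) / Z."""
        return (self.n * m + self.m_a + self.m_cc) / self.z


def delta_mz(m: float, first: Species, second: Species) -> float:
    """Equation 1 for two species that share the same neutral mass M."""
    return first.mz(m) - second.mz(m)


M_PLUS_H = Species(n=1, m_a=0.0, m_cc=PROTON, z=1)
M_PLUS_NA = Species(n=1, m_a=0.0, m_cc=SODIUM, z=1)
M_PLUS_2H = Species(n=1, m_a=0.0, m_cc=2 * PROTON, z=2)

# Neutral mass inferred from the [M+H]+ signal of C19H43N5 (342.35912).
neutral = 342.35912 - PROTON
shift = delta_mz(neutral, M_PLUS_NA, M_PLUS_H)
print(f"[M+Na]+ minus [M+H]+: {shift:.6f}")
print(f"predicted [M+Na]+ m/z: {M_PLUS_NA.mz(neutral):.5f}")
```

For two singly charged monomeric species the difference does not depend on M. When the charges or cluster numbers differ, as with `M_PLUS_2H`, the difference does depend on M, so the neutral mass has to be carried through the calculation.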
1. Establishing a List of Known Adducted Species
Using the terms in equation 1, we construct a table of candidate “modifying” species
with combinations of charge carriers, neutral adducts, and neutral losses that can
combine with M to form additional signals. These candidate “modifying” species are
used to generate a combinatorial table of mass differences.
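A combinatorial table of this kind could be generated along the following lines. This is a sketch under simplifying assumptions: only singly charged monomeric species are enumerated, and the short lists of charge carriers and neutral modifiers are illustrative examples, not the list used in this work.

```python
from itertools import permutations, product

# Illustrative charge carriers (monoisotopic ion masses).
CHARGE_CARRIERS = {"H": 1.007276, "Na": 22.989221, "NH4": 18.033826}

# Illustrative neutral adducts (positive) and neutral losses (negative).
NEUTRAL_MODIFIERS = {
    "": 0.0,
    "+H2O": 18.010565,
    "+CH3CN": 41.026549,
    "-H2O": -18.010565,
    "-NH3": -17.026549,
}


def candidate_species():
    """Map a label such as 'M+Na+CH3CN' to the mass added to M (Ma + Mcc)."""
    return {
        f"M+{carrier}{modifier}": carrier_mass + modifier_mass
        for (carrier, carrier_mass), (modifier, modifier_mass)
        in product(CHARGE_CARRIERS.items(), NEUTRAL_MODIFIERS.items())
    }


def difference_table(species):
    """Mass difference for every ordered pair of candidate species."""
    return {(a, b): species[a] - species[b] for a, b in permutations(species, 2)}


species = candidate_species()
table = difference_table(species)
print(len(species), "candidate species,", len(table), "ordered pairs")
print("M+Na vs M+H:", round(table[("M+Na", "M+H")], 6))
```

The M+H to M+H-NH3 entry in this table (17.026549) corresponds to the spacing between the 147.11255 and 130.08606 signals reported for lysine.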
2. Generating the Nodes for the Graph
Nodes are generated by matching differences in the list of candidate species with mass
differences determined from monoisotopic m/z values from the mass spectrum being
analyzed. The matching m/z values from the spectrum are added as nodes to the graph
and an edge is drawn between the two nodes. The result is a large number of nodes
and edges for the mass spectrum. This is illustrated below using hypothetical data.
[Figure: hypothetical difference network. Node labels, assigned species, and m/z values:]

Node 1   (M+H)+                 308.22202
Node 2   (M+Na)+                330.20396
Node 3   (M+NH4)+               325.24857
Node 4   (M+K+H2O)+             264.18847
Node 5   (M+2H)++               154.61465
Node 6   (M+H+Na+H2O)++         174.62281
Node 7   (2M+H)+                614.42894
Node 8   (2M+Na+H2O)+           655.45307
Node 9   (2M+Na+H)++            319.21299
Node 10  (2M+H+Na+H2O+ACN)++    348.73165
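The node and edge generation step can be sketched as follows. Instead of looking up each observed m/z difference in a precomputed table, this version back-calculates the neutral mass implied by each (peak, candidate species) pair and draws an edge when two peaks imply the same neutral mass; for species related by equation 1 this amounts to the same test and also handles different charge states. The candidate list, the tolerance, and all names are assumptions made for illustration, and the four peaks are hypothetical node values from this step.

```python
from itertools import combinations

PROTON, SODIUM, AMMONIUM = 1.007276, 22.989221, 18.033826

# Hypothetical candidates: label -> (cluster number n, Ma + Mcc, charge Z).
CANDIDATES = {
    "(M+H)+": (1, PROTON, 1),
    "(M+Na)+": (1, SODIUM, 1),
    "(M+NH4)+": (1, AMMONIUM, 1),
    "(M+2H)++": (1, 2 * PROTON, 2),
}


def neutral_mass(mz, species):
    """Neutral mass M implied if the peak at mz were the given species."""
    n, added, z = CANDIDATES[species]
    return (mz * z - added) / n


def build_edges(peaks, tol=0.002):
    """Return (mz_i, species_i, mz_j, species_j) for consistent peak pairs."""
    edges = []
    for mz_i, mz_j in combinations(sorted(peaks), 2):
        for sp_i in CANDIDATES:
            for sp_j in CANDIDATES:
                if sp_i == sp_j:
                    continue
                mass_gap = neutral_mass(mz_i, sp_i) - neutral_mass(mz_j, sp_j)
                if abs(mass_gap) <= tol:
                    edges.append((mz_i, sp_i, mz_j, sp_j))
    return edges


peaks = [308.22202, 330.20396, 325.24857, 154.61465]
edges = build_edges(peaks)
for edge in edges:
    print(edge)
```

With real spectra and a longer candidate list, many peak pairs match more than one candidate difference, which is why an edge trimming step is needed afterwards.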
3. Edge Trimming and Assignment of Species
Because of the large number of nodes and edges, ambiguities in assignment can arise.
Nodes with multiple assignments are then ordered by frequency of assignment for M+H
(positive ion mode) or M–H (negative ion mode). The algorithm also allows for weighting
factors to be applied to predicted species should a priori information provide appropriate
insight. The weighting factors can be node specific, which causes edges to move up or
down on the ordered list.
4. Assignment of Species
The assignments of the charge carrier(s) and neutral adduct(s) or neutral loss(es) are
then made according to final position in the ordered list. Should ambiguities still arise, all
possibilities are reported.
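One possible reading of steps 3 and 4 is sketched below: for each node, count how often each candidate assignment is proposed by the edges that touch it, scale the counts by optional weighting factors, and keep the top-ranked assignment, reporting ties in full. This is an interpretation for illustration only; the edge format, the weighting scheme, and the function names are assumptions. The example uses a genuine ambiguity: the spacing between M+H and M+Na is the same as the spacing between M+Na and M+2Na-H.

```python
from collections import Counter, defaultdict


def rank_assignments(edges, weights=None):
    """Order each node's candidate labels by weighted frequency of assignment."""
    weights = weights or {}
    tallies = defaultdict(Counter)
    for mz_i, sp_i, mz_j, sp_j in edges:
        tallies[mz_i][sp_i] += 1
        tallies[mz_j][sp_j] += 1
    ranked = {}
    for mz, counts in tallies.items():
        scored = {sp: count * weights.get(sp, 1.0) for sp, count in counts.items()}
        # Highest score first; label name breaks ties deterministically.
        ranked[mz] = sorted(scored.items(), key=lambda item: (-item[1], item[0]))
    return ranked


def assign(ranked):
    """Keep the top-ranked label per node; report every label if tied."""
    result = {}
    for mz, ordered in ranked.items():
        best_score = ordered[0][1]
        result[mz] = [sp for sp, score in ordered if score == best_score]
    return result


# Hypothetical edges; the last one is a competing explanation of the
# 308.22202 / 330.20396 pair.
edges = [
    (308.22202, "(M+H)+", 330.20396, "(M+Na)+"),
    (308.22202, "(M+H)+", 325.24857, "(M+NH4)+"),
    (325.24857, "(M+NH4)+", 330.20396, "(M+Na)+"),
    (308.22202, "(M+Na)+", 330.20396, "(M+2Na-H)+"),
]
assigned = assign(rank_assignments(edges))
for mz, labels in sorted(assigned.items()):
    print(mz, labels)
```

Passing, for example, `weights={"(M+2Na-H)+": 0.5}` would push that label further down the ordered list, which is how a priori knowledge could be expressed in this sketch.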
5. Overview
The current approach is able to identify different charge states as well as numerous
uncommon adducts in a complicated mixture all within the same mass spectrum. This
approach is also able to identify species resulting from in-source neutral losses, such as
loss of water or ammonia, as shown by the amino acid samples.
6. Example Results
Using the synthetic data we were able to verify the operation of the algorithm and
accurately assign both the neutral adducts and charged species for a number of
combinations.
The amino acid data sets provided additional complexity, showing that neutral losses,
adducts, and charge-carrying species are accurately assigned in the presence of other
potentially interfering signals. The assignment of loss of ammonia from both lysine and
tyrosine is consistent with the elemental composition determined from accurate mass.
The assignment of the acetonitrile and sodium adduct (M+Na+CH3CN) to lysine is
questionable, given that the mobile phase used methanol and not acetonitrile and that
the signal is comparatively weak. In this case, improved accuracy could be achieved by using
additional information, such as the mobile phase composition, to restrict the list of
predicted adducts.
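Restricting the candidate list by mobile phase composition might look like the sketch below. The mapping from adduct labels to required solvents and the simple substring matching are hypothetical; they only illustrate the idea of removing candidates, such as the acetonitrile adduct, that the experimental conditions make unlikely.

```python
# Hypothetical mapping: neutral adduct label -> solvent that must be present.
ADDUCT_REQUIRES = {
    "+CH3CN": "acetonitrile",
    "+CH3OH": "methanol",
    "+H2O": "water",
}


def restrict_candidates(candidates, mobile_phase):
    """Drop candidates whose solvent adduct is absent from the mobile phase."""
    allowed = []
    for name in candidates:
        required = [solvent for label, solvent in ADDUCT_REQUIRES.items()
                    if label in name]
        if all(solvent in mobile_phase for solvent in required):
            allowed.append(name)
    return allowed


candidates = ["M+H", "M+H-NH3", "M+Na", "M+Na+CH3CN"]
# Solvents A and B in this work: formic acid in water and in methanol.
kept = restrict_candidates(candidates, {"water", "methanol", "formic acid"})
print(kept)
```

A softer alternative to outright removal would be to assign such candidates a low weighting factor, so that they can still be reported when no other explanation fits.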
Uncommon adducts such as those containing calcium, magnesium, and iron were
detected and labeled as shown in the final example. Some of the calcium and
magnesium adducts also included neutral solvent species and were doubly-charged.
Conclusion
• Using the described graph-theory-based algorithm to assign neutral adducts,
in-source neutral losses, and charge species to mass spectral data produces
useful results.
• Using the method described here, complex mixtures of adducts with varied charge
states and in-source neutral losses are detected and properly assigned.
• The complexity of the mass spectral information is reduced as a consequence of
identification of the aforementioned species, which provides the analyst the
capability to group compound-related signals.
• The accuracy of this algorithm is further enhanced by incorporation of information
such as mobile phase composition and knowledge of in-source fragmentation.
Preference can be given to species known to occur through the use of weighting
factors for the predicted adducts, charge carriers, and neutral-loss species.
References
1. Taylor, J. A.; Johnson, R. S. Rapid Commun. Mass Spectrom. 1997, 11 (9),
1067–1075.
2. Clauser, K. R.; Baker, P.; Burlingame, A. L. Anal. Chem. 1999, 71 (14), 2871–2882.
3. Cox, J.; Mann, M. Nat. Biotechnol. 2008, 26 (12), 1367–1372.
All trademarks are the property of Thermo Fisher Scientific and its subsidiaries.
This information is not intended to encourage use of these products in any manners that might infringe the
intellectual property rights of others.
PO64093-EN 0614S
www.thermofisher.com
©2016 Thermo Fisher Scientific Inc. All rights reserved. All trademarks are the property of Thermo Fisher Scientific and its
subsidiaries. This information is presented as an example of the capabilities of Thermo Fisher Scientific products. It is not intended
to encourage use of these products in any manners that might infringe the intellectual property rights of others. Specifications, terms
and pricing are subject to change. Not all products are available in all countries. Please consult your local sales representative for
details.
PN-64093-EN-0716S