
Application Note LCMS-116
What are we eating?
MetaboScape® Software: Enabling the De-replication and
Identification of Unknowns in Food Metabolomics
Introduction
Determining the structure of secondary metabolites is a
significant bottleneck often faced by today’s plant and food
metabolomics scientists. The identification of compounds
of interest is a key step for enabling the biological
interpretation of observed changes in metabolite profiles.
Additionally, there is a need to quickly tag those compounds
which have been characterised previously. This so-called
de-replication process saves time which might otherwise
be spent on the repetitive annotation of already known
compounds.
Here we re-evaluated part of the data acquired for
the showcase study of the “Metabolomics 2015 - 11th
International Conference of the Metabolomics Society”,
which was organized by the local conference hosts based at
the University of California, Davis.
They prepared three different food plates, chosen on
the basis of large differences in dietary components,
representing a fast food meal (coined the “USA” food plate),
a California food plate (based on USDA MyPlate dietary
recommendations - http://www.choosemyplate.gov/) and a
“Davis” food plate, which was inspired by Korean cuisine.
In 2015, we employed complementary approaches such
as high resolution accurate mass LC-QTOF-MS/MS and
GC-APCI-QTOF-MS/MS for a comprehensive analysis
of both food metabolites and natural products found in
the three different food plates. Data evaluation focused
on the identification and annotation of characteristic, i.e.
differentiating, small molecules found in the food samples.
In the present study we re-investigated the data acquired
by LC-QTOF-MS/MS in ESI positive ionisation mode
and present novel features of the MetaboScape 2.0 software
solution which facilitate the identification of natural
products. The information can subsequently be used to
build well-characterised MS/MS libraries, enabling a quick
de-replication of known compounds. The MetaboScape
software addresses the challenge of identifying unknowns
and enables the confident assignment of known target
compounds, both of which are critical steps in turning raw
MS data into knowledge.
Authors
Nikolas Kessler, Heiko Neuweger,
Verena Tellström, Aiko Barsch
Bruker Daltonik GmbH, Bremen, Germany
Keywords
Metabolomics, Structure elucidation, Structure confirmation,
in-silico fragmentation, Library search, de-replication,
unknown ID, Food, Profiling
Technology and Software
impact II, MetaboScape, CompoundCrawler, SmartFormula3D,
MetFrag
Experimental
According to the organizers of the Metabolomics 2015
showcase study, food plates were homogenized with an
industrial-grade food service blender, lyophilized under
vacuum (except for volatile profiling), and stored in a -80 °C
freezer prior to shipment.
In our lab, three replicates each of the USA, Davis and
California food plate samples were dissolved in 100 µL 80%
methanol. Five µL of each sample were analysed in two
technical replicates by UHPLC-QTOF-MS/MS, resulting in a
total of eighteen runs (3 food plates x 3 replicates x 2
technical replicates), excluding blank and quality control
samples.
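The run count above can be checked by enumerating the design. The following is a minimal sketch; the run labels are hypothetical and not the file names used in the study.

```python
from itertools import product

# Design as described in the text: 3 food plates, 3 sample replicates per
# plate, 2 technical replicates per sample.
plates = ["USA", "Davis", "California"]
sample_replicates = [1, 2, 3]
technical_replicates = [1, 2]

# Hypothetical run labels, one per LC-MS/MS acquisition.
runs = [
    f"{plate}_rep{rep}_tech{tech}"
    for plate, rep, tech in product(plates, sample_replicates, technical_replicates)
]

print(len(runs))  # blanks and quality control samples are not included
```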
Chromatographic separation was carried out using a Dionex
RSLC system (Thermo Fisher Scientific) with a 100 x 2 mm
Acclaim RSLC 120 C18 column at a flow rate of 0.3 mL/min
(solvent A: water + 0.1% HCOOH; solvent B: acetonitrile
+ 0.1% HCOOH) using the following gradient: 0 - 2 min
1% B; 2 - 17 min linear gradient from 1% to 99% B; 17 - 20
min 99% B; 20.1 min 1% B; total run time 30 min. MS
detection was performed using a Bruker impact II Qq-TOF
mass spectrometer (Bruker Daltonics). The instrument was
operated in ESI positive mode acquiring full scan MS and
MS/MS data using the InstantExpertise™ routine.
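To make the gradient programme easier to reason about, the breakpoints given above can be written as a piecewise-linear function. This is only a sketch: it assumes that the change from 99% B at 20 min to 1% B at 20.1 min is a linear ramp and that 1% B is held until the end of the 30 min run. It returns the programmed composition and ignores any delay between pump and column.

```python
# Gradient breakpoints as (time in min, % solvent B), taken from the text.
# Assumption: 99% -> 1% B between 20 and 20.1 min is a linear ramp, and
# 1% B is then held for re-equilibration until 30 min.
GRADIENT = [
    (0.0, 1.0),
    (2.0, 1.0),
    (17.0, 99.0),
    (20.0, 99.0),
    (20.1, 1.0),
    (30.0, 1.0),
]


def percent_b(t_min: float) -> float:
    """Programmed % solvent B at time t_min (linear interpolation)."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0 - 30 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole run")


# Programmed composition at the retention time of the bucket discussed
# in the Results section (10.86 min).
print(round(percent_b(10.86), 1))
```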
The resulting data was processed using the
FindMolecularFeatures (FMF) algorithm and clustered
in a bucket table with ProfileAnalysis 2.3 software. The
subsequent data analysis and compound identification
workflow was performed using tools integrated into the
MetaboScape 2.0 software: Automatic molecular formula
determination was carried out by combined evaluation of
mass accuracy, isotopic patterns, adduct and fragment
information using SmartFormula3D™ software. Statistical
data evaluation and structure identification including MetFrag
[1] based in-silico fragmentation were accomplished on
the same data. MS/MS spectra of confirmed compounds
were stored in the spectral Library Editor integrated in
MetaboScape 2.0.
Results
Data pre-processing for statistical analysis
In the non-targeted metabolomics workflow presented here,
the detection of compounds via the FindMolecularFeatures
(FMF) peak finder was an important initial step of data
pre-processing prior to statistical analysis. The FMF
algorithm combines ions belonging to one compound
such as common adducts (e.g. +Na, +K, +NH4), fragments
originating from neutral losses, isotopes and charge states
to one FMF compound. In a subsequent bucketing process
the extracted features from the different samples were
aligned across all samples and combined into a so-called
bucket table. Here, a bucket table containing the 18
samples from the USA, Davis and California food plates
was calculated and 1163 features were assigned throughout
the samples. Following the import of the bucket table to
the client-server based MetaboScape 2.0 software, an
automated assignment of high-resolution accurate mass
(HRAM) MS/MS spectra to the respective buckets enabled
the subsequent confident de-replication of known compounds
and the structure elucidation of unknown compounds.
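The idea of combining adduct ions into one compound can be illustrated with a small sketch. It is not the FindMolecularFeatures algorithm, whose internals are not described here; it only shows how co-eluting ions can be tested for consistency with a single neutral mass. The function names, the tolerance and the example peak list are assumptions made for illustration.

```python
# Mass shifts (Da) added to the neutral monoisotopic mass M for common
# singly charged positive-mode adducts.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def group_adducts(mz_values, tol_da=0.003):
    """Return the adduct assignment that explains the most co-eluting peaks.

    Every peak is tried as every adduct type; the implied neutral mass is
    then used to predict the other adducts, which are looked up in the list.
    """
    best = {}
    for mz in mz_values:
        for adduct, shift in ADDUCT_SHIFTS.items():
            neutral = mz - shift
            assignment = {}
            for other in mz_values:
                for name, other_shift in ADDUCT_SHIFTS.items():
                    if abs(other - (neutral + other_shift)) <= tol_da:
                        assignment[name] = other
            if len(assignment) > len(best):
                best = assignment
    return best


# Illustrative peak list at one retention time: two ions that differ by the
# Na/H exchange mass and one unrelated ion (values are made up).
peaks = [943.526, 965.508, 500.123]
print(group_adducts(peaks))
```

In this example the first two ions are assigned to one compound as [M+H]+ and [M+Na]+, while the third ion stays unassigned.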
Confident, automatic de-replication
The information for extracted features contained in the
bucket table included retention time, accurate mass and
isotopic pattern (TIP™ - True Isotopic Pattern) of precursor
and fragment spectra and hence enabled the automatic
annotation of compounds at different confidence levels:
1. A custom “Analyte List” was used to confidently
annotate compounds in the bucket table. This list of
known target compounds included metabolite name,
molecular formula, retention time information from
the applied C18 reversed phase chromatography and
MS/MS library spectra. The graphical Annotation
Quality (“AQ”) representation made it possible to readily
derive the confidence for each annotation based on
user-definable levels for matching of accurate mass,
retention time, isotopic fidelity and MS/MS library
score (see Figure 1).
2. Buckets which were not annotated using the Analyte
List were queried against two complementary MS/MS
spectral libraries: the “Bruker HMDB Metabolite
Library” and the “Bruker MetaboBASE Personal
Library”. This allowed the assignment of features
based on spectral similarity. Since no retention time
information is evaluated in this workflow, compound
identification is considered “tentative”.
3. For the buckets which were not annotated by the
first two approaches, molecular formulas were
automatically calculated by the SmartFormula3D
software. The implemented algorithm considers
accurate mass and isotopic pattern information in MS
and MS/MS spectra. Furthermore, information from
adducts and neutral losses, as well as additional filters
for elemental compositions [2, 3], was applied to
narrow down the list of possible molecular formulas
to biologically relevant candidates.
Figure 1. Overview perspective in MetaboScape 2.0.
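The Annotation Quality concept from step 1, where each matching criterion is graded against user-definable levels, can be sketched as follows. The thresholds, grade names and function names are hypothetical and are not the MetaboScape defaults; only the principle of grading each criterion separately is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Two user-definable limits for a criterion where smaller is better."""
    narrow: float
    wide: float

    def grade(self, deviation: float) -> str:
        deviation = abs(deviation)
        if deviation <= self.narrow:
            return "good"
        if deviation <= self.wide:
            return "acceptable"
        return "fail"


# Hypothetical limits, chosen only for illustration.
MZ_PPM = Tolerance(narrow=2.0, wide=5.0)     # mass accuracy in ppm
RT_MIN = Tolerance(narrow=0.1, wide=0.3)     # retention time deviation in min
MSIGMA = Tolerance(narrow=20.0, wide=50.0)   # isotopic pattern fit
MSMS_GOOD, MSMS_ACCEPTABLE = 800.0, 600.0    # library score, higher is better


def annotation_quality(mz_ppm, rt_dev_min, msigma, msms_score):
    """Grade the four criteria of one annotation independently."""
    if msms_score >= MSMS_GOOD:
        msms = "good"
    elif msms_score >= MSMS_ACCEPTABLE:
        msms = "acceptable"
    else:
        msms = "fail"
    return {
        "mz": MZ_PPM.grade(mz_ppm),
        "rt": RT_MIN.grade(rt_dev_min),
        "isotopes": MSIGMA.grade(msigma),
        "msms": msms,
    }


# Made-up example values for a single annotation.
print(annotation_quality(mz_ppm=0.9, rt_dev_min=0.05, msigma=3.0, msms_score=650))
```

Keeping one grade per criterion, rather than a single merged score, mirrors the idea that the user can see at a glance which piece of evidence supports an annotation and which does not.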
Identification of Soyasaponin I as a characteristic compound
for “Davis” samples - SmartFormula3D, CompoundCrawler
and in-silico fragmentation with MetFrag
Statistical evaluation via PCA and ANOVA in MetaboScape
software revealed a characteristic compound with 943.525
m/z eluting at 10.86 min to be much more abundant in
“Davis” food platter samples compared to “CA” and
“USA” (see Figure 2). This compound was not annotated
by the Analyte List or via the MS/MS spectral library
query, but was selected for further characterisation due to
its relevance as a differentiating feature. The first critical
piece of information allowing for the identification of this
metabolite was the correct molecular formula.
Figure 2. Box Plot representation for Bucket 10.86 min: 943.525 m/z
revealing higher abundance in Davis compared to CA and USA food
platter samples.
Evaluation by the SmartFormula3D software readily
assigned the molecular formula C48H78O18 to the
precursor with high confidence (see Figure 3), with a mass
accuracy of 0.94 ppm and an mSigma value of 2.71 for the
[M+H]+ ion (the lower the mSigma value, the better the fit
between measured and simulated isotopic pattern; the scale
ranges from 0 to 1000). The [M+Na]+ adduct contained
in the extracted feature also pointed to this molecular formula
consisting only of C, H, and O atoms. Additional confidence
in this molecular formula was derived from 80 MS/MS
fragment peaks for which formulas could be assigned,
covering 92% of fragment peak intensity.
Figure 3. Assignment of elemental composition via SmartFormula3D.
Based on precursor m/z information alone, dozens of candidate formulas
in a 1 mDa mass accuracy window are possible. In addition to mass
accuracy, SmartFormula3D considers the True Isotopic Pattern and
MS/MS fragment information and returned the molecular formula
C48H78O18 as the most likely candidate. Confidence in this result is not
only based on the 0.94 ppm mass accuracy and very good isotopic
pattern fit (2.71 mSigma value) but is also supported by 80
fragment ions, constituting 92% of the MS/MS spectral intensity, for
each of which an unambiguous molecular formula could be assigned.
A search for this molecular formula in public databases
using the integrated CompoundCrawler software
functionality generated multiple hits for possible structures.
In-silico fragmentation of the selected candidates via the
fully integrated MetFrag algorithm delivered Soyasaponin I
as the compound with the best MetFrag score (see Figure
4 A). The characteristic aglycon fragment with 441.373 m/z
highlighted in Figure 4 A and additional in-silico generated
structures (Figure 4 B) matching measured fragment ion
peaks substantiated this structural hypothesis.
Figure 4. A) Searching online compound databases with
CompoundCrawler for C48H78O18 returned multiple candidate
structures. In-silico fragmentation of selected candidates using the
MetFrag [1] algorithm generated scores for the likelihood of the
structures to match the MS/MS fragment peaks. The best candidate
molecule was Soyasaponin I. The characteristic aglycon fragment with
441.373 m/z highlighted on the Soyasaponin I molecule substantiated
this structural hypothesis.
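The mass accuracy argument for C48H78O18 can be retraced with a few lines of code. This is a minimal sketch and not part of SmartFormula3D; the helper names are hypothetical, and only C, H, N and O are supported. Note that the bucket m/z quoted above (943.525) is rounded to three decimals, so the deviation computed from it only approximates the reported 0.94 ppm.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.007276  # mass of a proton (H atom minus one electron)


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C48H78O18'."""
    return sum(
        MONOISOTOPIC[element] * int(count or 1)
        for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    )


def ppm_error(measured: float, theoretical: float) -> float:
    """Relative mass deviation in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


mz_theory = monoisotopic_mass("C48H78O18") + PROTON  # [M+H]+
print(round(mz_theory, 4))
print(round(ppm_error(943.525, mz_theory), 2))
```

The theoretical [M+H]+ m/z of about 943.526 agrees with the measured bucket m/z to within roughly 1 mDa.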
Confirmation of Soyasaponin I with a reference standard
The identity of the compound could be confirmed by
measuring the reference standard of Soyasaponin I and
comparing retention time and MS/MS spectra (see Figure
5 A and B). Data for the reference compound were acquired
approximately 12 months after analysing the original
showcase samples using the same general setup, but
not the identical LC-MS/MS system. To demonstrate the
transferability and reproducibility from one setup to another,
a replicate of a Davis sample that was not analysed during
the initial study was redissolved and analyzed on the
new setup. The retention time and MS/MS spectrum of
the candidate bucket acquired in 2015 matched the data
acquired in 2016 (see Figure 5 B and C).
Considering that the Davis food platter was inspired by
Korean cuisine, the identification of Soyasaponin I is in
agreement with the “biological” context: the organizers of
the showcase study disclosed that the Davis food platter
contained bean sprouts, which have previously been described
to contain Soyasaponin I [4].
Figure 4. B) Further in-silico generated fragment structures matching
measured fragment ion peaks added to the annotation confidence
(panel labels: 617.405 m/z, 781.473 m/z, 485.151 m/z).
Figure 5. Retention time and MS/MS spectrum of the Soyasaponin I
reference standard (A) match the chromatographic signal in the Davis
food study samples reanalyzed in 2016 (B) and the corresponding
data acquired in 2015 (approximately 12 months before) (C).
Figure 6. MS/MS Bucket matches: Three connected buckets based
on similar HRAM MS/MS spectra - the similarity indicates that these
analytes are related to Soyasaponin.
Identification of further Soyasaponins by MS/MS spectral
similarity search
In addition to Soyasaponin I, several other soyasaponins
have been described in black beans [4]. Since chemically
related compounds typically reveal similar MS/MS
fragmentation patterns, an MS/MS spectral similarity
search was performed with the aim of discovering further
soyasaponins within the current data set.
Figure 6 represents the outcome of an MS/MS similarity
match between the MS/MS spectrum of the identified
Soyasaponin I and all other MS/MS spectra of buckets
contained in the bucket table. Similar to a typical MS/MS
spectral library query, a query spectrum is compared to
other MS/MS spectra and a matching score is calculated.
The difference is that the query spectrum is not matched
against a spectral library of known compounds but against
the other MS/MS spectra contained in the same bucket table.
Two buckets with similar MS/MS spectra were returned:
11.16 min: 797.468 m/z with a score of 899 and
10.69 min: 959.520 m/z with a score of 917.
Following the same workflow as described
for Soyasaponin I - molecular formula generation followed
by database searches for candidate structures and in-silico
fragmentation - resulted in the tentative identification of
Soyasaponin III and Soyasaponin V, respectively.
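The principle of such a spectrum-to-spectrum comparison can be sketched with a plain cosine similarity scaled to 0 - 1000. The scoring function used by MetaboScape is not described here, so this is a stand-in technique and not the software's method; the function name, the m/z tolerance and the example intensities are assumptions.

```python
import math


def spectral_similarity(query, other, tol_da=0.005):
    """Cosine similarity of two centroided spectra, scaled to 0 - 1000.

    Each spectrum is a list of (m/z, intensity) pairs. Peaks are paired
    greedily: every query peak takes the closest still-unused peak of the
    other spectrum within tol_da.
    """
    used = set()
    dot = 0.0
    for mz_q, int_q in query:
        best_index, best_diff = None, tol_da
        for index, (mz_o, _) in enumerate(other):
            diff = abs(mz_q - mz_o)
            if index not in used and diff <= best_diff:
                best_index, best_diff = index, diff
        if best_index is not None:
            used.add(best_index)
            dot += int_q * other[best_index][1]
    norm = math.sqrt(sum(i * i for _, i in query)) * math.sqrt(
        sum(i * i for _, i in other)
    )
    return 1000.0 * dot / norm if norm else 0.0


# Illustrative spectra: the fragment m/z values appear in this note, the
# intensities are made up.
spectrum_a = [(441.373, 100.0), (617.405, 50.0)]
spectrum_b = [(441.374, 100.0), (781.473, 50.0)]
print(round(spectral_similarity(spectrum_a, spectrum_b)))
```

Running the query spectrum against every other bucket's MS/MS spectrum and keeping the matches above a chosen score would then give a list of candidate related compounds, comparable in spirit to the bucket matches shown in Figure 6.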
Conclusions
The Bruker impact II series of Q-TOF MS instruments,
with its Full Sensitivity Resolution (FSR) mode, provides
an uncompromising combination of mass accuracy, isotopic
fidelity, resolution, dynamic range, sensitivity and MS/MS
performance - a key requirement for analysing highly complex
samples. Fully exploiting this high quality data using the
novel MetaboScape 2.0 software allowed for an automated
and confident de-replication of known target compounds
based on user definable confidence levels for mass
accuracy, isotopic fidelity, retention time and MS/MS score.
Additionally, the integrated structure elucidation solutions
SmartFormula3D™ and MetFrag software enabled the
identification of a secondary metabolite with m/z > 900 as a
characteristic compound for the Davis food platter samples.
In detail, unambiguous molecular formula assignment to
the precursor ion followed by in-silico fragmentation of a
structure candidate obtained from public database queries
led to the successful identification of Soyasaponin I. The
Davis food platter sample, inspired by Korean cuisine,
contained, among other ingredients, bean sprouts, which are
known to contain soyasaponins as the predominant saponins.
A subsequent MS/MS spectral similarity search allowed
the tentative annotation of two additional Soyasaponins, III
and V. These three target compounds can now be added
to a custom MS/MS library in the MetaboScape software,
extending the list of ‘known knowns’, and in combination
with an extended Analyte List, will enable the quick
identification of these compounds in other metabolite extracts.
Acknowledgements
Food study samples were provided by Arpana Vaniya (Fiehn
laboratory) from the University of California, Davis and Nancy Keim
(USDA team at Davis, California) as part of the Metabolomics 2015
conference showcase study.
We also thank Steffen Neumann and his team at the IPB in Halle,
Germany for helpful discussions and for providing the source code of
the MetFrag algorithm.
References
[1] Wolf et al. BMC Bioinformatics 2010, 11:148.
[2] Kind T. and Fiehn O. BMC Bioinformatics 2007, 8:105.
[3] Kessler N. et al. PLOS One 2014, 9(11):e113909.
[4] Lee M.R. et al. J Mass Spectrom. 1999, 34(8):804-12.
For research use only. Not for use in diagnostic procedures.
Bruker Daltonics is continually improving its products and reserves the right
to change specifications without notice. © Bruker Daltonics 08-2016, LCMS-116, 1845963
Bruker Daltonik GmbH
Bremen · Germany
Phone +49 (0)421-2205-0
Fax +49 (0)421-2205-103
Bruker Daltonics Inc.
Billerica, MA · USA
Phone +1 (978) 663-3660
Fax +1 (978) 667-5993
[email protected] - www.bruker.com