Building the NIST Tandem Mass Spectral Library 2014 Xiaoyu (Sara) Yang Mass Spectrometry Data Center Biomolecular Measurement Division Seminar September 9, 2014 1 Outline • Methods of building the NIST tandem mass spectral library • Quality Control o Peak annotation o Noise Removal o Chemical information consistency • Major types of mass spectra • Major types of compounds 2 Mass Spectral Library Searching Experimental mass spectrum Searching Reference Spectra in MS Library Experimental matched Reference MS Search 3 NIST Tandem Mass Spectral Library Rationale and Objectives: • Chemical identification through Electrospray ionization (ESI) tandem mass spectrometry (MS/MS) is becoming a routine technique in metabolomics, proteomics and other fields. • The identification can be aided by matching the acquired tandem mass spectra against reference library spectra. • We are developing a comprehensive library of high quality reference ESI tandem mass spectra for the identification of compounds through the ion fragmentations. Goals: • Develop a tandem mass spectral library of all biologically relevant metabolite ions. • Provide the library in a form that is easily searchable using software tools. 4 Steps of Building the NIST Tandem Mass Spectral Library Authentic samples metabolites, drugs, peptides lipids, pesticides, surfactants, glycans, sugars LC/MS/MS Ion Trap (LTQ, IT/FTMS) First clustering Second clustering Collision Cell (HCD, QQQ, QTOF) Cluster MS2 spectra using precursor m/z count-based clustering algorithm cluster spectra of the similar precursor m/z values Create consensus spectra of MS2, MS3 and MS4 adjusted dot product-based clustering algorithm Chemical structure, formula, name, synonyms, and CAS are consistent cluster spectra of the similar fragmentations Precursor type identification • precursor purity • mass accuracy • peak annotation • noise removal Manual inspection MSMS library with multiple precursor types 5 Count-based Algorithm for Clustering Precursors Steps: 1. Count the number of precursors (m/z count) within 0.1 m/z; 2. Sort the precursors by the m/z count in descending order; 3. Group similar precursors into the same cluster by using the precursor with the highest m/z count as the cluster center; 4. Repeat step 3 until all the precursors are clustered. intensity 100,000,00035 m/z count 40 Intensity vs m/z m/z count vs m/z 30 10,000,00025 437 spectra 30 20 33 clusters 1,000,000 15 20 10 10 100,000 5 β-Dihydroequilin [M+H]+ 10,0000 HCD 10eV 0 0 100 100 200 200 300 300 400 400 500 500 600 600 m/z 6 Clustering Algorithm for Generating Consensus Spectra Steps: 1. Group similar spectra into the same cluster; 2. Generate one consensus spectrum from each cluster; 3. Pick the best consensus spectrum for the library. Dot Product (DP) = 1 3 2 4 I 1* I 2 5 I 1* I 2 I – peak intensity MS/MS spectra 1. Calculate adjusted dot product (DP) 2. Find a cluster center with the highest average DP (>0.7) cluster 2 cluster 1 3 5 1 2 4 Cluster the same ion at the same collision energy Consensus spectrum: Median peak intensity Median m/z consensus spectrum 1 consensus spectrum 2 bin size bin = 700 x 10ppm x 10-6 = 0.0070 m/z when m/z=700 best consensus spectrum 7 Using Consensus Spectrum in the Library • Eliminated low quality spectra by spectral clustering. • Improved the spectrum quality by using the median of the m/z and intensity values. • Realistically represented the characteristic fragmentations. Same compound Same precursor type Same instrument Same energy (cone, collision) Same mode (+/-) Same spectrum type (MS2, MS3, MS4) 10-20 spectra / consensus spectrum 10-20 energy levels 8 What Precursor Types are in the NIST MSMS Library? Compound Type Neutral Molecule Organic Salt Cations ESI Product Ionic Species [M+H]+, [M+2H]2+, [2M+H]+, [M+H-H2O]+, +, [M+H-OH]+, [M+H+H O]+, [M+H-NH ] 3 2 Positive Ions [M+NH4]+, [M+Na]+, [M-H+2Na]+, [M(129,662; 84%) 2H+3Na]+, [M+K]+, [M-H+2K]+, [M-2H+3K]+, [M+Li]+, [M-H+2Li]+, [M-2H+3Li]+ Negative Ions [M-H]-, [M-2H]2-, [2M-H]-, [M-H-H2O]-, [M-H-, [M-H+H O]-, [M-H+NH ]NH ] 3 2 3 (23,638; 15%) Positive Ions [Cat]+, [Cat+H]2+, [Cat-H2O]+, [Cat-NH3]+, (1,751; 1%) Negative Ions (40; <0.1%) [Cat+H2O]+ [Cat-2H]-, [Cat-2H-H2O]-, [Cat-2H-NH3]-, [Cat2H+H2O]-, [Cat-2H+NH3]9 Multiple Precursor Ions for More Flexible Identification 100 p-C5H10O10P2 137.0459 O p=[M+H]+ Relative Intensity 0 N N HO O 100 50 0 100 50 0 O P OH p 429.0211 p-C5H10O9N4P2 97.0283 200 300 400 p-C5H4ON4 336.9462 Na2PO3 124.9375 p-C5H6O2N4 318.9356 500 200 300 m/z OH 50 0 100 p 472.9848 50 400 500 p=[M+Na]+ Energy=18eV P p=[M-H+2Na]+ Energy=28eV 100 200 300 400 500 p-C5H10O10P2 HP2O6 p=[M-H]135.0316 158.9257 Energy=27eV PO3 p 78.9592 p-H3PO4 329.0298 427.0067 100 OH O O O HO 100 100 N Energy=12eV 50 p-C5H6O2N4 296.9537 H N 0 100 50 0 p-C5H9O9N4NaP2 97.0283 90 NaH4P2O7+ 200.9325 180 270 360 p-C8H11NaO4N4 p-C H ON 5 4 4 244.8964 358.9282 100 p 494.9668 200 HP2O6 158.9256 300 p-C5H4ON4 272.9574 200 inosine 5'-diphosphate acquired on Orbitrap HCD 450 p=[M-2H+3Na]+ Energy=29eV Na3HPO4 164.9300 PO3100 78.9592 p-C5H4ON4 314.9643 p 451.0031 400 500 p=[M-H-H2O]Energy=32eV p 408.9958 300 m/z 400 500 10 What Precursor Types are in the NIST MSMS Library? Structure dependent losses: 2H2O, 3H2O, NH3+H2O, H2S, HCl, H3PO4, HCN, H2, CO, CO2, HCOOH, CH4, CH3, CH3OH, CH3SH, C2H5OH, ... 11 In Source Fragmentation Confirms Metabolite Identification MS2 spectra acquired on Orbitrap HCD 12 n MS Orbitrap FT/IT Orbitrap ion trap MS2 MS22 MS Estra-1,3,5(10),7-tetraene-3,17β-diol [M+H]+ 13 n MS Estra-1,3,5(10),7-tetraene-3,17β-diol [M+H]+ MS2 MS3 Orbitrap ion trap MS3 MS3 MS4 14 Noise Removal Relative Intensity 1. Satellite peaks in Orbitrap HCD spectra: due to the Fourier transform ringing artifacts 100 359.0117 A 130.0612 50 0 <1% intensity of the main peak within ± 2 m/z 218.9783 70 140 210 m/z 2 m/z 280 thiencarbazone -methyl [M+H]+ 350 2 m/z 15 Noise Removal Relative Intensity 2. Tailing peaks in QTOF spectra: due to the imperfect centroiding 100 A 84.04 102.05 130.04 1-Eicosatrienoyl-snglycero-3phosphoethanolamine [M+H]+ 190.08 50 172.07 0 60 90 84.044 20 84.044 m/z 2020 <20% intensity of the main peak after 0.3 m/z 150 190.080 20 190.080 202020 B 10 1010 00 00 120 84 84 1010 10 2020 84.044 20 84.044 10 101010 84.236 84 190 000 00 0 0.3 190 84 84.236 101010 10 180 C 190.246 0.3 B 202020 190.080 20 190.080 190.246 85 85 191 85 191 85 C 16 Noise Removal 3. Random noise peaks: in all mass spectra due to unstable instrument, impurity… The occurrence <25% occurrence=1/3 =33.3% occurrence=2/3 =66.7% 17 Noise Removal – an Example of Voting Algorithm 100 A p-C4 75.0 50 … Relative Intensity 0 100 50 100 0 50 0 22 Relative Intensity C 100 p-C4H8O2 75.0229 p-C2H5O2 50 102.0466 0 22 11 0 00 1 1 60 80 100 120 140 160 m/z B A p-C4H8O2 p-CH4O2 75.0229 p-C3Hp-C 4O22H115.0545 5O2 102.0466 115.0545 p-C4H8O2 60 75.0229 80 B p-C 100 2H5O2 102.0466 11 22 0 00 11 1 01 00 2 12 1 22 80 100 120 140 160 60 80 100 120 140 160 B C 11 120 140 160 m/z 60 80 100 120 140 160 B60 80 100 120 140 160 60 60 80 80 100 100 120 120 140 140 160 160 60 80 100 120 140 160 after 0 00 1 1 2 2 before 2 2 60 60 C C p-CH4O2 p-C3H4O2 115.0545 115.0545 A A p-CH4O2 p-C3H4O2 115.0545 115.0545 m/z 4-methylcinnamic acid [M+H]+ HCD 18 60 60 Peak Annotation – Peptide (for low and high resolution MS/MS spectra) H+ ADGK/1 Int-DG • p, y, b, a, internal fragments, neutral losses(-H2O, -NH3, -CO) • y+10(CO-H2O), a2-45(CONH3) • 52 immonium ions (e.g. IHA: C6H7N3O + H - CO --> C5H8N3) 19 Peak Annotation - Small Molecule (for high resolution MS/MS spectra) • Peaks were annotated with the most probable chemical formula consistent with the precursor formula formula valence=∑Count*(valence-2) + 2 + charge count of each element formula valence<=# of H; H:C ratio>=0.125 |(Observed m/z – Theoretical m/z)| Accuracy (ppm) = x 106 Observed m/z Sum of intensities of unassigned peaks Unassigned % = x 100 Sum of intensities of all peaks T. Kind and O. Fiehn, Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinformatics 2007, 8:105 doi:10.1186/1471-2105-8-105 10 ppm 10 % 20 Peak Annotation - Small Molecule (for high resolution MS/MS spectra) H+ /-4.2ppm C10H12O H+ Cuminaldehyde [M+H]+ theoretical precursor m/z= 149.0961 HCD energy=19eV Unassigned=0.9% /-1.3ppm /-0.6ppm /-0.1ppm 21 Anal. Chem. 2014, 86: 6393−6400 22 What Types of Mass Spectra are in the NIST MSMS Library? 8% 6% MS2 86% MS3 MS4 Instruments: Micromass Quattro Micro: Triple Quadrupole Thermo Finnigan LTQ: IT/ion trap Agilent QTOF 6530: Q-TOF Thermo Finnigan Elite Orbitrap: HCD IT-FT/ion trap with FTMS IT/ion trap 23 What Types of Compounds are in the NIST MSMS Library? Metabolites ~50% Ureidosuccinic acid [M+H]+ HCD 14eV 24 What Types of Compounds are in the NIST MSMS Library? Drugs ~20% Acetaminophen [M+H]+ QQQ 24V 25 What Types of Compounds are in the NIST MSMS Library? Bioactive peptides ~10% ARLDVASEFRKKWNKWALSR [M+4H]4+ QQQ 27V 26 What Types of Compounds are in the NIST MSMS Library? All amino acids (20) All dipeptides (400) All tryptic tripeptides (800) YNK [M+H]+ Ion Trap 35% 27 What Types of Compounds are in the NIST MSMS Library? Lipids ~ 5% Cholesterol [M+H-H2O]+ QTOF 20V 28 What Types of Compounds are in the NIST MSMS Library? Glycans ~ 2% Glycan A1F-MIX [M+2Na]2+ HCD 78eV 29 What Types of Compounds are in the NIST MSMS Library? Sugars ~ 2% Like sugar? So does diabetes… D-(+)-Galactose [M+H]+ QQQ 12V 30 What Types of Compounds are in the NIST MSMS Library? Surfactants and Contaminants Sorbitan monopalmitate [M+H]+ QQQ 14V 31 What Types of Compounds are in the NIST MSMS Library? Pesticides Atrazine [M+H]+ HCD 32eV 32 NIST Tandem Mass Spectral Library 2014 33 Conclusions • Two level clustering algorithms (counter-based and distance-based) were developed and provide a robust means of generating consensus spectra of multiple precursor types for the NIST Tandem Mass Spectral Library. • Quality control programs such as peak annotation and noise removal methods were developed and are used in building the reference quality NIST Tandem Mass Spectral Library. • NIST Tandem Mass Spectral Library 2014 can be applied in chemical identification in Metabolomics, Proteomics and other fields. Plans for the future: • Improve peak annotation program by using compound’s structure and physical chemical properties. • Develop more quality control methods to eliminate low quality spectra for the library. 34 Acknowledgements 35
© Copyright 2026 Paperzz