PeakErazor – Calibrating MALDI Mass Spectra Peter Højrup and Karin Hjernø PeakErazor.exe Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M, Denmark. Solution To address these problems we have created a small, user-friendly Windowsbased program, PeakErazor, using a simple concept: By comparing all mass values from a tryptic digest against a list of known contaminants, contaminating peaks can easily be identified and removed. At the same time you can perform a multipoint calibration leading to improved precision. These features leaves you with a much larger precision in the actual PMS search. Furthermore, the program helps you detect new contaminants either in a given dataset or after general use. A novel development of the program enables the calibration using the mass defect which results in the calibration of mass spectra that we hitherto was not able to calibrate. In general the final precision is in the 30-50 ppm region. 2 - General function of PeakErazor A 80 The average mass defect is 494 ppm. B 1410.5 Intensity 1800.7 2299.0 1000 P 1607.7 1426.5 2244.0 772.43 2225.0 2232.9 1019.5 1219.6 1476.6 1832.7 2085.8 853.43 1147.6 1490.51707.7 1940.8 804.41 2856.2 2519.0 2760.1 1106.5 1112.6 2959.2 2871.2 2839.2 2390.9 679.55 2976.2 0 1000 3500 m/z C:\Documents and Settings\Peter\My Documents\WORD\Teaching\BM131\Day4\spec2.massml (12:57 01/30/05) Description: SSI 39.5 A. A MALDI mass spectrum is annotated, the mass list is copied to the clipboard. B. The mass list is pasted into PeakErazor. The program compares it to the currently loaded ‘Erazor list’ and calculates the difference to all ‘contaminants’ that fall within the given precision. F H Y S M Contaminants: Blue: trypsin Red: keratin QT W NG E C. In the separate graph window the deviations are plotted against the mass. A number of spots are seen around the x-axis, indicating a likely calibration line. The three outliers are unchecked prior to calibrating on the five remaining values. D. After calibration the spots are within 13 ppm (which gives a norm for the database search). One of the outliers is directly on the +1 Da grey line and is thus a candidate for a wrongly assigned isotope. These outlying peaks are unchecked from the list before all unchecked mass values are copied to the clipboard for PMS. The final protein (Calreticulin) was identified with 10 peptides (± 25 ppm). After a two-point calibration, the same protein was identified with only 9 peptides (± 30 ppm) due to a ~10 ppm offset and a slight slope relative to the multipoint calibration. A. The initial mass spectrum of an HPLC separated peptide shows a peptide peak at m/z 1815. Not being micropurified, the spectrum shows a number of adduct ions. The precision is quite poor at 602 ppm. 3000 C B. Decreasing the signal to noise ratio to 2.5 labels a large number of peaks, most of which are peptide related, but of very low intensity. This peptide mass list is copied into PeakErazor. Calibrate on these peaks 1 tryptic, 5 keratin An initial attempt to calibrate on the mass defect, i.e. performing a ‘straight’ multipoint linear calibration on the deviation from the average mass defect, yielded a reasonable mass precision in the order of 50-80 ppm. This was better than no calibration, but prompted an investigation into the nature of the mass defect. As the most common enzyme by far is trypsin, an enzymatic digestion of the entire Swiss-Prot database (164.000 proteins) was done in silico, yielding 6.55 million peptides, of which 4.1 million was in the interesting region of 500..4000 Da. A: When the mass spectrum has been annotated (in this case using MoverZ), copy the list of monoisotopic masses to the clipboard. C The mass defect The distribution of the mass defect for all amino acid residues shown on a scale from 0 to 1000 ppm. The blue line indicates the average mass defect, the green line average not including Cys. B: Paste the list into PeakErazor. Note that the mass values, which fit to values in the chosen ’Erazor list’ (e.g. known contaminants) will be checked and listed with deviation and id. C: In the mass vs. deviation graph it is clear to see that several of the contaminants are located on a straight line. D: After deselecting the outlying peak (m/z 2224) the remainder is used for a linear calibration. After the target protein has been identified, it is digested in silico and the peptide mass values A reloaded into PeakErazor as <blank>. The identified peptides can be seen as 9 hits within 12 ppm. One mass value (m/z 1338) can be seen on the bottom gray line (- 1 Da) and upon inspection of the mass spectrum, it could be seen to be a wrongly marked isotope. CMD = actual mass value / av. mass defect mass Peptide mass list of hit loaded as <blank> 9 hits within 12 ppm Wrongly marked isotope (cluster) B E: After switching the graph display to ’Mass defect view’, the same raw mass list is pasted into PeakErazor. The mass defect for each peptide is now plotted relative to the mass and can be seen to form a diffuse sloping line. Blue: Total CMD x 100 Figure A. E Deviation from mass defect calibration of tryptic peptides based on the average mass defect of 494 ppm. Less than 1% of tryptic peptides above mass 1050 Da. deviate with more than 125 ppm. Figure D. Adjusting the average mass defect factor in the low and high mass regions yields a total CMD close to zero. Also the number of peptides >125 ppm decreases considerably. D. A following ’Auto’ calibration performs a linear fit to the existing data. A check of the resulting mass values shows that the precision is now 57 ppm. 602 ppm 1 8 1 5 .3 1921 3 .2 1 8 5 3 .2 1 3 8 5 .3 7 1 6 .1 9 2 0 6 6 .4 0 1000 3000 m /z \\ B m b -fils rv\ P R p u b lic \ In g e rM Z \M y o g lo b in s e p t. 0 4 \p 4 9 8 9 7 ic , to p 1 5 .m a s s m l (1 3 : 1 8 0 1 / 0 4 / 0 5 ) D e s c rip t io n : M y o g lo b . a k ta 2 2 6 4 ic , t o p 1 5 s/n 2.5 B 250 1 8 1 5 .3 1921 3 .2 1 8 5 3 .2 1 8 3 7 .2 1 3 8 5 .3 1 719894.07 .3 7 1 6 .1 9 15 7 4 58.1 28 .2 1 813807.25 .2 7 329.1 43 1 3 7 1844.4 136 7 7.1 1 .3 1 925608 54 9.2 919.2 0704.2 11.1 57 2 6 .4 1 00 .3 1 10 .3 0 .2112 .2 124 256 077.3 .3 .21 14140 7 2 7 .1 68856699.1 12146 7.2 4.4.1 1 651115777 .0 2 12.2 .2 .3 1 928 620 .3.3 02 96.0 2 163.9 31 0 0 1 00 0 3 0 00 m /z \\ B m b - fils rv\P R p u b lic \ In g e r M Z\M y o g lo b in s e p t. 0 4 \p 4 9 8 9 7 ic , to p 1 5 . m a s s m l (1 3 :1 8 0 1 /0 4 / 0 5 ) D e s c r ip t io n : M y o g lo b . a k ta 2 2 6 4 ic , to p 1 5 C C manual + autocalc. 57 ppm C delete + recalc. Final 51 ppm 6 - Conclusion The current program depends on a combination of the human brain to pick out visual trends in a graph followed by a linear multipoint calibration. Using the ‘standard’ calibration option of calibrating on contaminations a 20-50% improvement in the precision of the mass list can usually be achieved. In addition, contaminant mass values are removed from the search list, thus improving the search as the mass list gets smaller. The program saves the mass search data as either removed or included in the search, and after several hundred spectra have been analyzed, the program enables you to identify system specific contaminants. Another functions enables you to identify common values in a smaller dataset (i.e. contaminants specific to a particular gel or when analyzing isoforms/alleles). The most recent addition to the program, mass defect calibration, comes into its own in the cases where no internal calibrants or contaminants can be located in the data. By taking advantage of the fact that peptide mass values are not evenly distributed, but most values fall within ±125 ppm regions it is possible to calibrate any peptide containing mass spectrum given that a sufficient number of peptides are present. As a side effect, most non-peptide mass values can be removed from the list, as they will have a mass defect larger than 125 ppm. C F Figure C. Adjusting the average mass defect for the database residue distribution (494 ppm) brings the total CMD to zero in the range 2000-3000 Da., but the low and high mass ranges are not improved due to the fact that every peptide has a Lys or Arg, both of which have a high mass deviation. The number of peptides with large deviations actually gets worse. C. The mass default deviation map show a distribution both above and below the x-axis. This necessitates a manual calibration, performed by selecting ’Manual’ followed by a click at both ends of the low mass ’line of deviations’ as shown. E. Deleting the mass values with a mass defect >125 ppm (matrix and adduct ions) by pressing ’Delete’ followed by another automatic calibration yields the final mass values which shows a precision of 51 ppm. D Omitting the mass defect of Cys from the average mass defect (448 ppm) improves the total CMD and lowers the number of peptides with large deviations. D m/z H:\MSdata\P H0467A human serum proteins\p50301ic, 6_C4B_P01028.m assml (14:21 01/02/05) Description: PH0467 A, µ-o pr. 1/3 af 6 D Figure B. C 3000.1 2383.6 3097.0 2500.92716.6 2704.7 2955.0 3131.9 2982.8 2932.0 A Red: – number of peptides (in %) deviating more than 125 ppm 2195.1 2626.8 0 Green: - Peptide distribution (x 65.500) 1442.5 2289.8 2550.8 2777.9 2052.7 881.17 1196.4 1436.5 1938.7 1851.6 2042.7 1967.7 842.421052.4 1179.5 1 338 .5 1191.5 1743.5 1833.7 2084.8 1707.5 1518.6 Calibrating on the mass defect The figure shows: 842.541045.6 1324.5 1213 .5 1307.5 1379.5 1381.5 1638.6 1475.5 1353.5 1499.6 A 250 Intensity 966.32 K V R Total CMD: Σ (Round (CMD) – CMD) / pepCount 2283.0 A major problem when analyzing (HPLC) isolated peptides is that your only choice for calibration is either an external calibration or calibrating on matrix peaks. However, in many cases the mass defect calibration can also be used here – it is mainly a case of finding enough (low intensity) monoisotopic mass peaks. B Intensity A 2482.9 L Mass defect of the 20 amino acid residues in ppm: Ala 523 Arg 648 Asn 377 Asp 234 Cys 89 Glu 330 Gln 457 Gly 376 His 430 Leu 743 Lys 741 Met 309 Phe 465 Pro 544 Ser 368 Thr 472 Trp 426 Tyr 388 Val 691 2224.8 983.34 The mass defect is the difference between the monoisotopic and the integer mass value of a given amino acid residue, i.e. the mass defect of Alanine with a monoisotopic mass of 71.03711 Da is 0.03711 Da or 523 ppm. Peptides were divided into 50 Da. slots. For each slot a corrected mass defect (CMD) was calculated as 2211.0 50 1000 ppm The mass defect 5 - Isolated peptides 4 - Multipoint and mass defect calibration 3 - Compensating for variations in the mass defect Problem Peptide mass searching/fingerprinting (PMS) is one of the most common methods for identifying proteins in proteomics. The method relies on identifying proteins based on the mass values of the peptides generated by enzymatic digestions, typically tryptic digestion. Very advanced search programs are available today, making identification of proteins by PMS using tryptic MALDITOF MS data quite straightforward. Two criteria are essential for a successful identification: A high number of (significant) mass values and a high precision. In the current software we have tried to address both problems, with most emphasis on improving the precision. In a given dataset the PMS can be improved by removing non-significant values, i.e. remove peptide contamination. 2-D gel separated proteins are typically contaminated by keratin and tryptic autodigest peptides, but other project specific contaminants may be present. Including these peaks in the peptide mass search will obviously make the search less precise and consequently they have to be excluded from the peptide mass list before using the list as input in a protein identification. Calibration is usually performed as a two- or three-point calibration (either external or internal) of the mass spectra are used and in most cases the obtained accuracy is sufficient for identification of the protein in question. However, performing a multipoint internal calibration will improve the precision of the final dataset. Furthermore, the peaks used for calibration may be missing, of poor quality or the isotope envelope may overlap with other peaks, making internal calibration next to impossible. Intensity 1 - Introduction G The mass defect calibration does not work if the spectrum is heavily contaminated with non-peptide peaks (i.e. matrix, detergent or adduct ions). These cases are usually immediately recognized, as the spread of the mass defect gets larger than 125 ppm and it is not possible to perform a ‘stable’ calibration. Few non-peptide mass values are easily removed and recalibration is just the press of a button. Furthermore, the higher the number of peptides, the more accurate the calibration. In praxis you need ~15 mass values for a proper calibration. In a few cases (in our experience <5%) the method fails and we have to resort to external calibration. Reloading the theoretical peptide mass list of the target protein enables you to identify wrongly assigned peptides (larger than average deviation) and helps to locate potentially modified peptides. D F: Performing an linear multipoint calibration on the mass defect with removal of ’outliers’ (>125 ppm) yields a diffuse scattering of points centered around the x-axis. The mass list is now calibrated and can be used for PMS. 8 - Where to find PeakErazor The PeakErazor program can be downloaded for free from G: Reloading the target protein peptide list shows 9 hits within 10 ppm. http://welcome.to/gpmaw Please go to the ’download’ section of the web site. If you have any problems or have suggestions for improvements, please contact the author on [email protected].
© Copyright 2026 Paperzz