Mass spectrometry from genomics to proteomics

Large-scale DNA sequencing has stimulated the development of proteomics by providing a sequence infrastructure for protein analysis. Rapid and automated protein identification can be achieved by searching protein and nucleotide sequence databases directly with data generated by mass spectrometry. A high-throughput and large-scale approach to identifying proteins has been the result. These technological changes have advanced protein expression studies and the identification of proteins in complexes, two types of studies that are essential in deciphering the networks of proteins that are involved in biological processes.

0168-9525/00/$ – see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(99)01879-X

The elucidation of an organism’s genome is the first and important step towards understanding its biology, and the data created by whole-genome sequencing have significant benefits in fields outside those of genomics and bioinformatics. One area to benefit is proteomics. The term proteomics, or more appropriately functional proteomics, describes the ability to apply global (proteome-wide or system-wide) experimental approaches to assess protein function. Proteomics has emerged as a new experimental approach in part because mass spectrometry has simplified protein analysis and characterization, and several important recent innovations have extended the capability of mass spectrometry.

Mass spectrometry of biological molecules

Mass spectrometers consist of three essential parts (Fig. 1). The first, an ionization source, converts molecules into gas-phase ions. Once ions are created, they are separated according to their mass-to-charge ratios (m/z; see Box 1) by a second device, a mass analyzer, and transferred to the third, an ion detector. A mass analyzer uses a physical property [e.g. electric or magnetic fields, or time-of-flight (TOF)] to separate ions of a particular m/z value that then strike the ion detector. The magnitude of the current that is produced at the detector as a function of time (i.e. the physical field in the mass analyzer is changed as a function of time) is used to determine the m/z value of the ion.

Although mass analyzers are an important (and continually improving) component of mass spectrometers and determine critical performance characteristics, an important innovation for proteomics has been the development of two robust techniques to create ions of large molecules. Matrix-assisted laser desorption ionization (MALDI) creates ions by excitation of molecules that are isolated from the energy of the laser by an energy-absorbing matrix. The laser energy strikes the crystalline matrix to cause rapid excitation of the matrix and subsequent ejection of matrix and analyte ions into the gas phase. Electrospray ionization (ESI) creates ions by applying a potential to a flowing liquid, which causes the liquid to charge and subsequently spray. The electrospray creates very small droplets of solvent that contain the analyte. As the droplets enter the mass spectrometer, solvent is removed by heat or some other form of energy (e.g. energetic collisions with a gas), and multiply-charged ions are formed in the process. The detection limits that can be achieved with ESI have improved as flow rates have been reduced [1].
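As a small worked illustration of the m/z arithmetic behind the singly and multiply charged ions described above, the following Python sketch computes m/z for a peptide that carries one or more added protons. The function name `mz` and the rounded proton mass of 1.00728 Da are assumptions introduced here for illustration; Box 1 quotes the same results, rounded to 1001 and 501, for a 1000 Da peptide.

```python
PROTON_MASS = 1.00728  # approximate mass of a proton in Da (assumed, rounded)


def mz(neutral_mass: float, charge: int) -> float:
    """Return m/z for a molecule ionized by the addition of `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# A 1000 Da peptide carrying one, two or three protons.
for z in (1, 2, 3):
    print(f"{z} proton(s): m/z = {mz(1000.0, z):.2f}")
```

Because each added proton increases the mass only slightly while dividing it by a larger charge, multiply-charged ions of large molecules appear at much lower m/z values than their molecular weight would suggest.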
These ionization techniques have stimulated developments in mass spectrometers to enhance the production of two different types of information. The first type of information is the accurate measurement of molecular weight. To measure molecular weight to the low ppm level, MALDI is typically used in conjunction with TOF mass analyzers. The second type of information, produced by tandem mass spectrometers (MS/MS), is diagnostic of amino acid sequence (Fig. 1b). Many types of tandem mass spectrometer have been developed [2], and recent innovations allow greater automation and efficiency in data acquisition. Data can be generated in a data-dependent manner: a computer program inspects the m/z data in each scan and uses them to control the type of experiment that is performed next [3]. For example, a scan of the mass range can reveal the presence of several ions above a preset ion-abundance threshold. The computer can then signal the instrument to perform tandem mass spectrometry on each of these ions, thus improving the efficiency of data acquisition, particularly during separations when ions appear for only a brief period of time.

FIGURE 1. The mass spectrometry approach
[Panel (a): schematic of a mass spectrometer (ion source, TOF mass analyzer, detector) beside a peptide mass map; axes: counts × 10^3 versus mass (m/z), 800–2000; labelled peaks at m/z 766.4868, 836.4362, 904.4685, 997.5691, 1209.5710, 1221.7473, 1406.7220, 1570.6782, 1697.8175, 1800.9144, 1890.9643 and 2061.1366. Panel (b): schematic of a tandem mass spectrometer (ion source, mass analyzer-1, collision cell, mass analyzer-2, detector) beside a peptide fragmentation pattern for AVANESGANFISVK, (M + 2H)2+ = 703.5; axes: relative abundance versus mass (m/z), 200–1400; labelled peaks at m/z 333.1, 468.1, 619.0, 778.5, 835.4, 922.4, 961.4, 1051.6, 1074.5 and 1236.7.]
(a) A single-stage mass spectrometer. The instrument consists of three components: an ionization source, mass analyzer and ion detector. The mass analyzer that is shown is a time-of-flight (TOF) mass spectrometer. Mass-to-charge ratio (m/z) values are determined by measuring the time it takes ions to move from the ion source to the detector. The time that is required to move this distance can be directly correlated with the m/z value. A mass spectrum of a protein digest is shown to the right of the figure. (b) The components of one type of tandem mass spectrometer. The instrument consists of an ion source, first mass analyzer, gas-phase collision cell, second mass analyzer and ion detector. The first mass analyzer can be used to isolate a particular m/z value for dissociation in the collision cell. The dissociation products are then analyzed in the second mass analyzer. A tandem mass spectrum for a peptide produces a ladder of fragment ions that represent amide bond cleavage. A peptide spectrum is shown to the right of the mass spectrometer.

John R. Yates, III (jyates@u.washington.edu), Department of Molecular Biotechnology, University of Washington, Seattle, WA 98195-7730, USA. TIG January 2000, volume 16, No. 1

Identifying proteins using mass spectrometry data and database searching

Mass spectrometers are capable of generating data quickly and thus have great potential for high-throughput analysis. An essential component of achieving greater throughput is simplifying data analysis. There is a direct relationship between mass spectrometry data and amino acid sequences. Peptide molecular weight measurements are predictive of amino acid composition, and peptide fragmentation information (see Box 1) relates to amino acid sequence. Both types of information can be correlated to protein sequences in the database. A single peptide molecular weight, however, is not generally unique to a specific protein; thus, a collection of peptides (≥3) that are derived from the same protein must be used to find a unique match.
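The peptide-mass matching idea in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the search software used in the cited work: the three database sequences are made up, the cleavage rule (after K or R unless followed by P) is a simplified trypsin-like rule that the text does not specify, the 20 ppm tolerance is arbitrary, and the "observed" masses are simulated from one database entry rather than measured.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Hypothetical toy database; the sequences are invented for illustration only.
DATABASE = {
    "protein_A": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK",
    "protein_B": "MSDNGELEDKPPAPPVRMSSTIFSTGGKDPLANKVAEIHR",
    "protein_C": "MGSSHHHHHHSSGLVPRGSHMASMTGGQQMGR",
}


def digest(sequence, min_len=5):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tol_ppm):
    """Number of observed masses that match a predicted peptide within tol_ppm."""
    theoretical = [peptide_mass(p) for p in digest(sequence)]
    return sum(
        any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical)
        for obs in observed
    )


def identify(observed, database, tol_ppm=20.0, min_matches=3):
    """Return (name, matches) for the best protein, or None if fewer than min_matches."""
    scores = {name: count_matches(observed, seq, tol_ppm) for name, seq in database.items()}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] >= min_matches else None


# Simulated mass map: four peptides of protein_A with a 5 ppm error,
# plus one contaminating peptide that belongs to protein_B.
observed = [peptide_mass(p) * (1 + 5e-6) for p in digest(DATABASE["protein_A"])[:4]]
observed.append(peptide_mass("DPLANK"))
result = identify(observed, DATABASE)
print(result)
```

The `min_matches=3` default mirrors the requirement stated above that at least three peptides from the same protein are needed; a single matching mass, such as the contaminant here, is not treated as an identification.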
The identity of an ‘unknown’ protein is determined by comparing the molecular-weight map of the ‘unknown’ protein with the theoretical molecular weights of the peptides that would be produced by digestion of each of the proteins in a database [2,4]. A protein whose predicted peptide molecular weights match a preponderance of the m/z values in the mass spectrum is then considered a match. The ability to acquire highly accurate m/z values has helped this method of protein identification a great deal: as the accuracy of molecular weight measurement increases, the number of peptides in the database that match a given weight decreases [5].

A second method employs peptide fragmentation data that are generated by MS/MS [6,7]. In this method, the data that are collected are specific to an individual peptide and are diagnostic of its amino acid sequence. In the collision-induced dissociation (CID) process, peptides fragment in a predictable manner; thus, sequences from the database can be used to predict an expected fragmentation pattern, which is then matched to the pattern observed in the spectrum. An advantage of this approach is that each peptide tandem mass spectrum represents a unique piece of information; consequently, matching one or more tandem mass spectra to sequences in the same protein provides a high level of confidence in the identification [6,8]. The identification process is not adversely affected by the presence of peptides from other proteins and is amenable to searching expressed sequence tag (EST) databases [9]. Thus, a collection of peptides that originate from a mixture of proteins allows the identification of the proteins that are present.
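The fragment-matching step can be illustrated with a short Python sketch. It predicts the ladder of fragment ions produced by cleavage at each amide bond (the N-terminal and C-terminal series conventionally called b and y ions, a nomenclature the text itself does not use) and counts how many observed peaks each candidate sequence explains. The observed values are the m/z labels that appear to survive from Figure 1b; the scrambled candidate, the 0.5 Da tolerance and the simple shared-peak count are assumptions made for illustration and are not the scoring method of the cited database-search programs.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def fragment_ladder(peptide):
    """Singly charged b- and y-ion m/z values for every amide-bond cleavage."""
    masses = [RESIDUE[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion: N-terminal fragment
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion: C-terminal fragment
    return ions


def shared_peaks(observed, peptide, tol=0.5):
    """Count observed peaks lying within `tol` Da of any predicted fragment."""
    predicted = fragment_ladder(peptide)
    return sum(any(abs(o - p) <= tol for p in predicted) for o in observed)


# Peak labels read from the Figure 1b residue (assumed to be fragment m/z values).
observed = [333.1, 468.1, 619.0, 778.5, 835.4, 922.4, 961.4, 1051.6, 1074.5, 1236.7]

# Two candidates with identical amino acid composition, and therefore identical
# molecular weight; the second is a hypothetical scrambled sequence.
candidates = ["AVANESGANFISVK", "SVFNAGSENAVAIK"]

scores = {pep: shared_peaks(observed, pep) for pep in candidates}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Because both candidates have the same composition, a molecular weight measurement alone could not distinguish them; only the fragmentation pattern does, which is the sense in which each tandem mass spectrum is a unique piece of information.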
Protein expression mapping

The ability to identify proteins rapidly using mass spectrometry data has catalyzed the development of methods for large-scale protein analysis, as well as the development of new approaches to analyze protein mixtures. A natural application of mass spectrometry is to identify the individual proteins that have been separated by gel electrophoresis. Two gel-separation methods are used to separate complicated protein mixtures. For simple protein mixtures (<100 components), one-dimensional (1-D) sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) is often used. Complex protein mixtures, such as total cell lysates, require the use of the highly resolving two-dimensional (2-D) SDS-PAGE. In this technique, proteins are separated by isoelectric point (pI) in the first dimension and then by molecular weight in the second. Unlike a cDNA array, in which the location of every gene is known, a 2-D gel separates proteins without revealing what they are, so the identity of each ‘spot’ is unknown. Mass spectrometry provides a rapid and sensitive method to ‘address’ the ‘spots’ with a protein identity.

Link et al. [10] and Shevchenko et al. [11] have performed large-scale identification studies on proteins that have been separated by 2-D SDS-PAGE from Haemophilus influenzae NCTC 8143 and Saccharomyces cerevisiae, respectively. Data from MS/MS, or a combination of MALDI–TOF and MS/MS, were used to identify the proteins. This approach requires efficient methods to digest proteins in the gel, extract the resulting peptides and transfer them to the mass spectrometer. By employing robotics to excise spots from the gel, or to slice the gel into a series of small cubes, and to digest the proteins, whole-gel analysis might be possible. Identification of proteins that have been purified by 1-D or 2-D SDS-PAGE can now be performed with low nanogram quantities of protein [12].
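To put "low nanogram quantities" into molar terms, the following Python sketch converts a protein mass into femtomoles. The 50 kDa molecular weight is a hypothetical value chosen only for illustration, and the function name is an assumption.

```python
def femtomoles(mass_ng: float, mol_weight_da: float) -> float:
    """Convert a protein mass in nanograms to femtomoles.

    1 ng divided by 1 g/mol is 1e-9 mol, which equals 1e6 fmol.
    """
    return mass_ng / mol_weight_da * 1e6


# 1 ng of a hypothetical 50 kDa protein.
amount = femtomoles(1.0, 50_000)
print(f"{amount:.1f} fmol")
```

For a protein of this size, one nanogram corresponds to about twenty femtomoles, so nanogram-scale gel spots fall in the femtomole range of material.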
An interesting question that is raised by the ability to identify and quantify protein production and turnover is how well these measurements correlate with gene expression. Anderson and Seilhamer [13] compared levels of gene expression in human liver, measured as the number of cDNAs that had been sequenced, with known quantities of protein that had been determined through 2-D SDS-PAGE studies. Gygi et al. [14] attempted a more rigorous comparison of the levels of radioactively labelled S. cerevisiae proteins with mRNA levels measured by serial analysis of gene expression (SAGE), and with gene codon bias. In both studies, the mRNA and protein levels correlated poorly overall. Both studies compared levels of mRNA with those of abundant proteins; consequently, further study will be required to determine the correlation between less-abundant proteins and their corresponding mRNA. Because it is difficult to observe minor protein constituents in a total cell lysate using 2-D SDS-PAGE, some form of enrichment is generally required.

Identification of proteins in complexes

To observe the proteins that are involved in a specific biological process, it is possible to enrich specifically for these proteins. However, this requires knowledge of an activity, or of at least one protein, in the biological process. Under non-denaturing conditions, interacting proteins can be co-enriched by these methods, which include chromatography, co-immunoprecipitation (using antibodies against one of the components), co-precipitation (using affinity-tagged proteins) and protein affinity-interaction chromatography (Fig. 2). The final step of the analysis employs SDS-PAGE to separate the proteins for analysis. The use of mass spectrometry to either sequence or identify proteins from 1-D SDS-PAGE separations is growing because of the relative ease of data analysis and increased levels of sensitivity, as well as the potential for comprehensive analysis [15,16].
This situation is particularly true for studies that are conducted in organisms with completed genomes or a large collection of ESTs. By using this approach, a linkage between the S. cerevisiae SAGA complex and TATA-binding-protein-associated factors (TAFIIs) has been shown convincingly [17]. Other examples include linking the S. cerevisiae ataxia telangiectasia mutated (ATM)-related cofactor Tra1 with the SAGA complex [18], identification of proteins in the S. cerevisiae spindle-pole body [19], and identification of the proteins in the yeast and human spliceosome [20,21]. This list is by no means exhaustive, but illustrates a clear trend in the use of mass spectrometry to identify the components of functionally important protein complexes.

FIGURE 2. Identifying protein-complex components by mass spectrometry
[Flow diagram. Panel labels: co-immunoprecipitation (Ig-G, agarose); protein-interaction chromatography (GST, protein, agarose); multi-protein complex. Route (a): gel electrophoresis, in-gel proteolysis, mass spectrometry, database search. Route (b): proteolysis, LC/MS/MS, database search. Final step: identification of protein components.]
The components of protein complexes can be determined using mass spectrometry from co-precipitation reactions, protein affinity-interaction chromatography or isolation. One of two approaches can be used to isolate and/or identify the components. (a) The collection of proteins can be separated using sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), and bands can then be removed for proteolysis. Each protein from the gel can be subjected to mass spectrometry. The data are then searched against databases to identify the protein. (b) A second approach subjects the collection of proteins to proteolysis directly. The complicated collection of peptides is then analyzed directly by high-performance liquid chromatography (HPLC) combined with tandem mass spectrometry (LC/MS/MS), and the resulting spectra are searched against a database. Software is used to assign peptides to their respective proteins, and thus identify the proteins present.

Direct identification of proteins in complexes

The use of MS/MS to identify proteins provides a unique analytical capability [6,8]. Because tandem mass spectrometers can separate an ion very precisely from a collection of other ions, they provide a powerful tool to analyze protein mixtures. When numerous peptide ions enter a tandem mass spectrometer, one peptide m/z value can be isolated and then dissociated to obtain a fragmentation pattern that is indicative of the amino acid sequence. The ability to search a database of sequences to match a tandem mass spectrum uniquely to a sequence allows proteins in mixtures to be identified [6,8]. Thus, an approach has been developed that is based on the proteolytic digestion of protein mixtures, followed by reversed-phase liquid chromatography to separate or partially fractionate the complex peptide mixture and direct introduction into a tandem mass spectrometer. Several advantages flow from this strategy. This approach reduces the reliance on SDS-PAGE to separate proteins for analysis, provides a more flexible strategy for proteolytic digestion and manipulation, and can take advantage of the sensitivity of mass spectrometry.

Direct identification of proteins in mixtures has been used in several types of experiments. McCormack et al. have used the method with immunoaffinity precipitation and protein affinity-interaction chromatography, and to identify proteins that interact with large protein complexes [8] (Fig. 2). This approach has also been used to identify components of the yeast ribosome using 2-D liquid chromatography and MS/MS. A total of 80 proteins were identified in a single experiment, at least ten of which were not observed by 2-D gel electrophoresis [22]. Coupling the direct-identification approach with quantitative methods to measure relative protein expression will greatly increase the value of the data that are produced [23].

Conclusions

Integration of the information that is produced through structural genomics has significantly improved protein discovery by proteomics. It is now possible to be more thorough and rigorous in the analysis of proteins that have been obtained through molecular biology experiments, and to do so with higher throughput. As more genomes are completed, the task of reconstructing metabolic and regulatory networks, pathways and subsequently the functions of all proteins, will be more straightforward. Much of this information will be derived from the study of protein–protein interactions and protein complexes, experiments that are simplified through the availability of better tools for protein analysis. Future prospects are bright: the sensitivity of protein analysis by mass spectrometry will continue to improve, as will the sophistication of data-dependent acquisition methods and data-analysis software.

Acknowledgements

The author is supported by NSF BIR921482, NIH RR11823-03, and NCI R33CA81665-01.

BOX 1. Glossary

Collision-induced dissociation (CID)
A method of energetically activating ions to dissociate. Typically, a gas-phase collision cell that is filled with argon gas is used to subject ions to low-energy collisions (10–50 eV) to cause energetic excitation. As ions become energetically excited, covalent bonds dissociate to produce structurally informative fragment ions. Often the molecular structure of the ion can be postulated from the fragmentation pattern or, in the case of peptides, the amino acid sequence can be deduced.

Mass-to-charge ratio (m/z)
Mass spectrometers measure the mass-to-charge ratios of ions. In matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI), peptides are typically ionized by the addition of one or more protons. Thus, a peptide of molecular weight 1000 Da will have an m/z value of 1001 after ionization by the addition of one proton, and 501 with the addition of two, (M+2H)2+.

Tandem mass spectrometer (MS/MS)
A tandem mass spectrometer combines two mass analyzers with a device (e.g. gas-phase collision cell) or method to energetically activate ions. In this approach, a particular m/z value can be isolated from all other ions that enter the mass analyzer at the same time and dissociated, and the m/z values of the dissociation products can be determined in the second mass analyzer. The dissociation process causes covalent bonds to fragment, leading to a collection of ions that are diagnostic of the molecular structure of the ion. In the case of peptide ions, fragmentation processes predominate at or around the amide bond, creating a ladder of ions that is indicative of an amino acid sequence (after careful deliberation).

Time-of-flight (TOF) mass spectrometer
A mass analyzer that measures m/z values by pulsing ions from the ion source into a flight tube. The time required for ions to travel a set distance and strike a detector is determined, and m/z values are calculated from the time-of-flight measurements. TOF mass spectrometers can be used with matrix-assisted laser desorption ionization (MALDI) or electrospray ionization (ESI) sources.

References

1 Wilm, M.S. and Mann, M. (1994) Electrospray and Taylor-Cone theory, Dole’s beam of macromolecules at last? Int. J. Mass Spectrom. Ion Processes 136, 2–3
2 Yates, J.R., III (1998) Mass spectrometry and the age of the proteome. J. Mass Spectrom. 33, 1–19
3 Stahl, D.C. et al. (1996) J. Am. Soc. Mass Spectrom. 7, 532–540
4 Yates, J.R., III et al. (1996) Mining genomes with mass spectrometry. Anal. Chem. 68, 534–540
5 Jensen, O.N. et al. (1996) Delayed extraction improves specificity in database searches by matrix-assisted laser desorption/ionization peptide maps. Rapid Commun. Mass Spectrom. 10, 1371–1378
6 Eng, J.K. et al. (1994) An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989
7 Mann, M. and Wilm, M. (1994) Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66, 4390–4399
8 McCormack, A.L. et al. (1997) Direct analysis and identification of proteins in mixtures by LC/MS/MS and database searching at the low-femtomole level. Anal. Chem. 69, 767–776
9 Yates, J.R., III et al. (1995) Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to nucleotide sequences. Anal. Chem. 67, 3202–3210
10 Link, A.J. et al. (1997) Identifying the major components of Haemophilus influenzae type-strain NCTC 8143. Electrophoresis 18, 1314–1334
11 Shevchenko, A. et al. (1996) Linking genome and proteome by mass spectrometry: large-scale identification of yeast proteins from two-dimensional gels. Proc. Natl. Acad. Sci. U. S. A. 93, 14440–14445
12 Figeys, D. et al. (1996) Protein identification by solid phase microextraction-capillary zone electrophoresis-microelectrospray-tandem mass spectrometry. Nat. Biotechnol. 14, 1579–1583
13 Anderson, L. and Seilhamer, J. (1997) A comparison of selected mRNA and protein abundances in human liver. Electrophoresis 18, 533–537
14 Gygi, S.P. et al. (1999) Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol. 19, 1720–1730
15 Lamond, A. and Mann, M. (1997) Cell biology and genome projects – a concerted strategy for characterizing multi-protein complexes by mass spectrometry. Trends Cell Biol. 7, 139–142
16 Blackstock, W.P. and Weir, M.P. (1999) Proteomics: quantitative and physical mapping of cellular proteins. Trends Biotechnol. 17, 121–127
17 Grant, P.A. et al. (1998) A subset of TAF(II)s are integral components of the SAGA complex required for nucleosome acetylation and transcriptional stimulation. Cell 94, 45–53
18 Grant, P.A. et al. (1998) The ATM-related cofactor Tra1 is a component of the purified SAGA complex. Mol. Cell 2, 863–867
19 Wigge, P.A. et al. (1998) Analysis of the Saccharomyces spindle pole by matrix-assisted laser desorption/ionization (MALDI) mass spectrometry. J. Cell Biol. 141, 967–977
20 Neubauer, G. et al. (1997) Identification of the proteins of the yeast U1 small nuclear ribonucleoprotein complex by mass spectrometry. Proc. Natl. Acad. Sci. U. S. A. 94, 385–390
21 Gottschalk, A. et al. (1998) A comprehensive biochemical and genetic analysis of the yeast U1 snRNP reveals five novel proteins. RNA 4, 374–393
22 Link, A.J. et al. (1999) Direct analysis of protein complexes using mass spectrometry. Nat. Biotechnol. 17, 676–682
23 Gygi, S.P. et al. Quantitative analysis of complex protein mixtures using isotope coded affinity tags. Nat. Biotechnol. (in press)