Mass spectrometry

COMMENT
Outlook
Mass spectrometry: from genomics to proteomics
Large-scale DNA sequencing has stimulated the development of proteomics by providing a sequence
infrastructure for protein analysis. Rapid and automated protein identification can be achieved by searching
protein and nucleotide sequence databases directly with data generated by mass spectrometry. A high-throughput and large-scale approach to identifying proteins has been the result. These technological changes
have advanced protein expression studies and the identification of proteins in complexes, two types of studies
that are essential in deciphering the networks of proteins that are involved in biological processes.
The elucidation of an organism’s genome is the first and
an important step towards understanding its biology, and
the data created by whole-genome sequencing have significant benefits in fields outside those of genomics and bioinformatics. One area to benefit is that of proteomics. The
term proteomics, or more appropriately functional proteomics, describes the ability to apply global (proteome-wide or system-wide) experimental approaches to assess
protein function. Proteomics has emerged as a new experimental approach in part because mass spectrometry
has simplified protein analysis and characterization, and
several important and recent innovations have extended
the capability of mass spectrometry.
Mass spectrometry of biological molecules
Mass spectrometers consist of three essential parts (Fig. 1).
The first, an ionization source, converts molecules into
gas-phase ions. Once ions are created, individual mass-to-charge ratios (m/z; see Box 1) are separated by a second
device, a mass analyzer, and transferred to the third, an
ion detector. A mass analyzer uses a physical property
[e.g. electric or magnetic fields, or time-of-flight (TOF)] to
separate ions of a particular m/z value that then strike the
ion detector. The magnitude of the current that is produced at the detector as a function of time (i.e. the physical field in the mass analyzer is changed as a function of
time) is used to determine the m/z value of the ion.
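As a rough illustration of how a physical measurement maps onto an m/z value, the sketch below uses the textbook relation for an idealized linear TOF analyzer: an ion accelerated through a potential U crosses a field-free distance d in time t, so that m/z = 2eU(t/d)^2. The 20 kV potential, the 1 m flight tube and the function names are assumptions chosen for illustration and do not come from the article; real instruments are calibrated empirically rather than computed this way.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per Da

def flight_time(mz, potential_v=20_000.0, length_m=1.0):
    """Flight time (s) of an ion with the given m/z (Da per unit charge).

    Assumes an idealized linear TOF analyzer: all ions start at rest,
    gain energy z*e*U, then drift through a field-free tube.
    """
    # z*e*U = (1/2)*m*v^2  ->  v = sqrt(2*e*U / (m/z))
    velocity = math.sqrt(2 * ELEMENTARY_CHARGE * potential_v / (mz * DALTON))
    return length_m / velocity

def mz_from_time(t, potential_v=20_000.0, length_m=1.0):
    """Invert the relation: m/z = 2*e*U*(t/d)^2, in Da per unit charge."""
    return 2 * ELEMENTARY_CHARGE * potential_v * (t / length_m) ** 2 / DALTON

for mz in (1001.0, 4004.0):
    print(f"m/z {mz:7.1f} arrives after {flight_time(mz) * 1e6:6.2f} microseconds")
```

Because flight time scales with the square root of m/z, an ion with four times the m/z value takes twice as long to reach the detector.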
Although mass analyzers are an important (and continually improving) component of mass spectrometers and
determine critical performance characteristics, an important innovation for proteomics has been the development
of two robust techniques to create ions of large molecules.
Matrix-assisted laser desorption ionization (MALDI)
creates ions by excitation of molecules that are isolated
from the energy of the laser by an energy-absorbing
matrix. The laser energy strikes the crystalline matrix to
cause rapid excitation of the matrix and subsequent
ejection of matrix and analyte ions into the gas phase.
0168-9525/00/$ – see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(99)01879-X
Electrospray ionization (ESI) creates ions by application of a potential to a flowing liquid, causing the liquid
to charge and subsequently spray. The electrospray creates
very small droplets of solvent that contain analyte. Solvent
is removed as the droplets enter the mass spectrometer by
heat or some other form of energy (e.g. energetic collisions
with a gas), and multiply-charged ions are formed in the
process. The detection limits that can be achieved with ESI
have improved with a reduction in the flow rates1.
These ionization techniques have stimulated developments in mass spectrometers to enhance the production of
two different types of information. The first type of information is the accurate measurement of molecular weight.
To measure molecular weight to the low ppm level,
MALDI is used typically in conjunction with TOF mass
analyzers. The second type of information, produced by
tandem mass spectrometers (MS/MS), is diagnostic of
amino acid sequence (Fig. 1b). Many types of MS/MS
have been developed2, and new innovations allow greater
automation and efficiency in data acquisition. Data can be
generated in a data-dependent manner through interaction
of the m/z data in each scan with a computer program to
control the type of experiment performed3. For example, a
scan of the mass range can reveal the presence of several
ions above a preset ion-abundance threshold. The computer can signal the instrument to perform tandem mass
spectrometry on each of the ions, thus improving the
efficiency of data acquisition, particularly during separations when ions appear for only a brief period of time.
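The data-dependent logic described above can be sketched in a few lines. This is a minimal illustration, not the control software of any instrument: the function name, the cap on the number of ions per scan, the ordering by abundance and the example scan values are all assumptions made for the example.

```python
def select_precursors(scan, threshold, max_ions=3):
    """Choose m/z values from one survey scan to submit to MS/MS.

    scan      -- list of (mz, abundance) pairs from a scan of the mass range
    threshold -- preset ion-abundance threshold
    max_ions  -- hypothetical cap on MS/MS experiments per survey scan
    """
    above = [(mz, abundance) for mz, abundance in scan if abundance > threshold]
    # Assumption: fragment the most abundant ions first, since ions may
    # appear for only a brief period during a separation.
    above.sort(key=lambda pair: pair[1], reverse=True)
    return [mz for mz, _ in above[:max_ions]]

# Invented survey scan: (m/z, abundance)
survey_scan = [(468.1, 1.2e4), (619.0, 8.0e5), (703.5, 2.5e6),
               (835.4, 3.0e5), (922.4, 9.0e3)]
targets = select_precursors(survey_scan, threshold=1.0e5)
print(targets)  # [703.5, 619.0, 835.4]
```

In a real acquisition loop the instrument would run one tandem experiment per selected m/z value and then return to the survey scan.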
Identifying proteins using mass spectrometry data and database searching
Mass spectrometers are capable of generating data quickly
and thus have a great potential for high-throughput analysis.
An essential component to achieving greater throughput is
simplifying data analysis. There is a direct relationship
between mass spectrometry data and amino acid sequences.
Peptide molecular weight measurements are predictive of
amino acid composition, and peptide fragmentation information (as described in the glossary) relates to amino acid
sequence. Both types of information can be correlated to
protein sequences in the database. A single peptide
TIG January 2000, volume 16, No. 1
John R. Yates, III
jyates@u.washington.edu
Department of Molecular Biotechnology, University of Washington, Seattle, WA 98195-7730, USA.
FIGURE 1. The mass spectrometry approach
[Figure. (a) Mass spectrometer: ion source, mass analyzer (TOF), detector. Peptide mass map: counts × 10^3 versus mass (m/z), with labelled peaks at 766.4868, 836.4362, 904.4685, 997.5691, 1209.5710, 1221.7473, 1406.7220, 1570.6782, 1697.8175, 1800.9144, 1890.9643 and 2061.1366. (b) Tandem mass spectrometer: ion source, mass analyzer-1, collision cell, mass analyzer-2, detector. Peptide fragmentation pattern for AVANESGANFISVK, (M + 2H)+2 = 703.5: relative abundance versus mass (m/z), with labelled peaks at 333.1, 468.1, 619.0, 778.5, 835.4, 922.4, 961.4, 1051.6, 1074.5 and 1236.7.]
trends in Genetics
(a) A single-stage mass spectrometer. The instrument consists of three components: an ionization source, mass analyzer and ion detector. The mass analyzer that is
shown is a time-of-flight (TOF) mass spectrometer. Mass-to-charge ratio (m/z) values are determined by measuring the time it takes ions to move from the ion source
to the detector. The time that is required to move this distance can be directly correlated with the m/z value. A mass spectrum of a protein digest is shown to the right
of the figure. (b) The components of one type of tandem mass spectrometer. The instrument consists of an ion source, first mass analyzer, gas-phase collision cell,
second mass analyzer and ion detector. The first mass analyzer can be used to isolate a particular m/z value for dissociation in the collision cell. The dissociation
products are then analyzed in the second mass analyzer. A tandem mass spectrum for a peptide produces a ladder of fragment ions that represent amide bond
cleavage. A peptide spectrum is shown to the right of the mass spectrometer.
molecular weight, however, is not generally unique to a specific protein, thus a collection of peptides (≥3) that are
derived from the same protein must be used to find a unique
match. The identity of an ‘unknown’ protein is determined
by comparing the molecular-weight map of the ‘unknown’
protein with the theoretical molecular weights of peptides
that are produced by digestion of each of the proteins in a
database2,4. Proteins that contain peptide molecular weights
that match a preponderance of the m/z values in the mass
spectrum are then considered a match. The ability to acquire
highly accurate m/z values has greatly improved this method of protein
identification. As the accuracy of molecular
weight measurement increases, the number of peptides that
will match that weight in the database will decrease5.
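The mass-mapping search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of any published search program: it assumes trypsin as the protease (the article does not name one), ignores missed cleavages and modifications, and scores a protein simply by counting matched masses. Both database sequences are toy examples; only the first peptide of protein_A is taken from Fig. 1b, and the function names and the 50 ppm default are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # Da, added once per peptide for the terminal H and OH

def tryptic_peptides(sequence):
    """In-silico digest: cleave after K or R unless P follows."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide (Da)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def count_matches(observed, sequence, tol_ppm):
    """Number of observed masses explained by the protein's peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical)
        for obs in observed
    )

def rank_proteins(observed, database, tol_ppm=50.0):
    """Rank database entries by how many observed masses they explain."""
    scores = {name: count_matches(observed, seq, tol_ppm)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy database; sequences assembled for illustration only.
database = {
    "protein_A": "AVANESGANFISVKLTEYGWQNADIVRSPEVDMEALHK",
    "protein_B": "GASPVTCLINDQKEMHFRYWGASPK",
}
# Simulate a measured mass map of protein_A with a 10 ppm error.
observed = [peptide_mass(p) * (1 + 10e-6)
            for p in tryptic_peptides(database["protein_A"])]

ranked = rank_proteins(observed, database)
print(ranked)  # protein_A ranks first, explaining all three masses
```

Tightening `tol_ppm` shrinks the set of peptides that can match a given mass, which is only useful if the instrument's mass accuracy is at least that good; here a 5 ppm window would reject the simulated 10 ppm measurements.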
A second method employs amino acid fragmentation
data that are generated by MS/MS (Refs 6, 7). In this
method, data that are specific to an individual peptide are
collected. These data contain information that is specific
to and diagnostic of the amino acid sequence of peptides.
In the collision-induced dissociation (CID) process, peptides fragment in a predictable manner, thus sequences
from the database can be used to predict an expected fragmentation pattern and match the expected pattern to that
observed in the spectrum. An advantage of this approach
is that each peptide tandem mass spectrum represents a
unique piece of information; consequently, matching one
or more tandem mass spectra to sequences in the same
protein provides a high level of confidence in the identification6,8. The identification process is not adversely
affected by the presence of peptides from other proteins
and is amenable to searching expressed sequence tag (EST)
databases9. Thus, a collection of peptides that originate
from a mixture of proteins allows the identification of the
proteins that are present.
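To make the idea of a predictable fragmentation pattern concrete, the sketch below predicts singly charged b and y ions (the N-terminal and C-terminal products of amide-bond cleavage) for a candidate sequence and counts how many observed peaks they explain. It is a deliberately simple shared-peak count, not the correlation scoring of Refs 6 or 7. The peak list is transcribed from the labels in Fig. 1b, while the decoy sequence, the 0.5 m/z tolerance and the function names are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # Da
PROTON = 1.00728   # Da

def fragment_ions(peptide):
    """Predicted m/z values of singly charged b and y ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion: N-terminal piece
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion: C-terminal piece
    return ions

def matched_peaks(peptide, peaks, tol=0.5):
    """Count observed peaks within `tol` (m/z units) of a predicted ion."""
    predicted = fragment_ions(peptide)
    return sum(any(abs(peak - ion) <= tol for ion in predicted)
               for peak in peaks)

# Peak labels transcribed from the tandem mass spectrum in Fig. 1b.
observed_peaks = [333.1, 468.1, 619.0, 778.5, 835.4,
                  922.4, 961.4, 1051.6, 1074.5, 1236.7]

# Candidate sequences; the second is a made-up decoy with the same composition.
candidates = ["AVANESGANFISVK", "VSIFNAGSENAVAK"]
scores = {pep: matched_peaks(pep, observed_peaks) for pep in candidates}
best = max(scores, key=scores.get)
print(scores, best)
```

In this sketch the sequence given in Fig. 1b explains most of the labelled peaks while the rearranged decoy does not. Peaks left unmatched may belong to ion types or charge states that the sketch does not model.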
Protein expression mapping
The ability to identify proteins rapidly using mass spectrometry data has catalyzed the development of methods for
large-scale protein analysis as well as the development of
new approaches to analyze protein mixtures. A natural
application of mass spectrometry is to identify the individual proteins that have been separated by gel electrophoresis.
Two gel-separation methods are used to separate complicated protein mixtures. For simple protein mixtures (<100
components), single dimension (1-D) sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) is often
used. Complex protein mixtures, such as total cell lysates,
require the use of the highly resolving two-dimensional (2-D)
SDS-PAGE. In this technique, proteins are separated by isoelectric point (pI) in the first dimension and then by
molecular weight in the second. Unlike a cDNA array, in
which the location of every gene is known, a 2-D gel separates
proteins by their physical properties alone, so the identity of each ‘spot’ is initially
unknown. Mass spectrometry provides a rapid and sensitive
method to ‘address’ the ‘spots’ with a protein identity.
Link et al.10 and Shevchenko et al.11 have performed
large-scale identification studies on proteins that have
been separated by 2-D SDS-PAGE from Haemophilus
influenzae NCTC 8143 and Saccharomyces cerevisiae,
respectively. Data from MS/MS or a combination of
MALDI–TOF and MS/MS have been used to identify the
proteins. This approach requires efficient methods to
digest proteins in the gel, extract the resulting peptides and
transfer them to the mass spectrometer. By employing
robotics to excise spots from the gel or to slice the gel into
a series of small cubes and to digest the proteins, whole-gel
analysis might be possible. Identification of proteins that
have been purified by 1-D or 2-D SDS-PAGE can now be
performed with low nanogram quantities of protein12.
An interesting question that is raised by the ability to
identify and quantify protein production and turnover is
the correlation of these measurements to gene expression.
Anderson and Seilhamer13 have compared levels of gene
expression as measured by the number of human liver
cDNAs that have been sequenced to known quantities of
protein, which have been determined through 2-D SDS-PAGE studies. Gygi et al.14 have attempted a more rigorous comparison of the levels of radioactively labelled
S. cerevisiae proteins with those of mRNA through serial
analysis of gene expression (SAGE) and with gene
codon bias. In both studies, the mRNA and protein levels
correlated poorly overall. Both studies compared levels of
mRNA with those of abundant proteins; consequently,
further study will be required to determine the correlation
between less-abundant proteins and their corresponding
mRNA. Because it is difficult to observe minor protein
constituents in a total cell lysate using 2-D SDS-PAGE,
some form of enrichment is generally required.
Identification of proteins in complexes
To observe proteins involved in specific biological
processes it is possible to specifically enrich for these proteins. However, this requires knowledge of an activity or of at
least one protein in the biological process. Under non-denaturing conditions, interacting proteins can be co-enriched by these methods, which include chromatography, co-immunoprecipitation (using antibodies against one
of the components), co-precipitation (using affinity-tagged
proteins) or protein affinity-interaction chromatography
(Fig. 2). The final step of the analysis employs SDS-PAGE
to separate the proteins for analysis. The use of mass spectrometry to either sequence or identify proteins from 1-D
SDS-PAGE separations is growing because of the relative
ease of data analysis and increased levels of sensitivity, as
well as the potential for comprehensive analysis15,16. This
situation is particularly true for studies that are conducted
in organisms with completed genomes or a large collection
of ESTs. By using this approach, a linkage between the
S. cerevisiae SAGA complex and TATA-binding-protein-associated factors (TAFIIs) has been shown convincingly17.
Other examples include linking the S. cerevisiae ataxia
telangiectasia mutated (ATM)-related cofactor Tra1 with
the SAGA complex18, identification of proteins in the
S. cerevisiae spindle-pole body19, and identification of the
proteins in the yeast and human spliceosome20,21. This list is
by no means exhaustive, but illustrates a clear trend in the
FIGURE 2. Identifying protein-complex components by mass spectrometry
[Figure. A multi-protein complex is isolated by co-immunoprecipitation (Ig-G on agarose) or protein-interaction chromatography (GST protein on agarose). (a) Gel electrophoresis, in-gel proteolysis, mass spectrometry, database search. (b) Proteolysis, LC/MS/MS, database search. Both routes end in identification of protein components.]
The components of protein complexes can be determined using mass spectrometry from co-precipitation reactions, protein affinity-interaction chromatography or
isolation. One of two approaches can be used to isolate and/or identify the components. (a) The collections of proteins can be separated using sodium dodecyl sulfate
polyacrylamide gel electrophoresis (SDS-PAGE) and then bands can be removed for proteolysis. Each protein from the gel can be subjected to mass spectrometry. Data
is then searched through databases to identify the protein. A second approach (b) subjects the collection of proteins to proteolysis directly. The complicated collection
of peptides is then analyzed directly by high-performance liquid chromatography (HPLC), combined with tandem mass spectrometry (LC/MS/MS), and the resulting
spectra searched through a database. Software is used to assign peptides to their respective proteins, and thus identify the proteins present.
BOX 1. Glossary
Collision-induced dissociation (CID)
A method of energetically activating ions to dissociate. Typically, a gas-phase collision cell that
is filled with argon gas is used to subject ions to low-energy collisions (10–50 eV) to cause energetic excitation. As ions become energetically excited, covalent bonds dissociate to produce structurally informative fragment ions. Often the molecular structure of the ion can be postulated from
the fragmentation pattern, or in the case of peptides, the amino acid sequence can be deduced.
Mass-to-charge ratio (m/z)
Mass spectrometers measure the mass-to-charge ratios of ions. In matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI), peptides are typically ionized by the addition of one or more protons. Thus, a peptide of molecular weight 1000 Da will have an m/z value
of 1001 after ionization by the addition of one proton and 501 with the addition of two (M+2H)+2.
Tandem mass spectrometer (MS/MS)
A tandem mass spectrometer combines two mass analyzers with a device (e.g. gas-phase collision cell) or method to energetically activate ions. In this approach, a particular m/z value can be
isolated from all other ions that enter the mass analyzer at the same time, dissociated, and the
m/z values of the dissociation products can be determined in the second mass analyzer. The dissociation process causes covalent bonds to fragment, leading to a collection of ions that are diagnostic of the molecular structure of the ion. In the case of peptide ions, fragmentation processes
predominate at or around the amide bond, creating a ladder of ions that is indicative of an amino
acid sequence (after careful deliberation).
Time-of-flight (TOF) mass spectrometer
A mass analyzer that measures m/z values by pulsing ions from the ion source into a flight tube.
The time required for ions to travel a set distance and strike a detector is determined and m/z values are calculated from the time-of-flight measurements. TOF mass spectrometers can be used
with matrix-assisted laser desorption ionization (MALDI) or electrospray ionization (ESI) sources.
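The m/z arithmetic in the glossary entry above can be checked with a short calculation. This is a minimal sketch; the function names are hypothetical, and the proton mass is given to five decimal places, whereas the glossary rounds it to 1 Da.

```python
PROTON = 1.00728  # Da, mass of one proton

def mz(neutral_mass, charge):
    """m/z of a molecule ionized by the addition of `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge

def neutral_mass(mz_value, charge):
    """Recover the neutral molecular weight from an observed m/z value."""
    return mz_value * charge - charge * PROTON

for z in (1, 2):
    print(f"(M+{z}H)+{z}: m/z = {mz(1000.0, z):.2f}")
```

For a 1000 Da peptide this gives values that round to 1001 for one added proton and 501 for two, in agreement with the glossary.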
use of mass spectrometry to identify the components of
functionally important protein complexes.
Direct identification of proteins in complexes
The use of MS/MS to identify proteins provides a unique
analytical capability6,8. Because tandem mass spectrometers can separate an
ion very precisely from a collection of other ions, they provide a powerful tool to analyze protein mixtures. When
numerous peptide ions enter an MS/MS, one peptide m/z
value can be isolated and then dissociated to obtain a fragmentation pattern that is indicative of the amino acid
sequence. The ability to search a database of sequences to
match a tandem mass spectrum uniquely to a sequence
allows proteins in mixtures to be identified6,8. Thus, an
approach that is based on the proteolytic digestion of protein mixtures, which is followed by reversed-phase liquid
References
1 Wilm, M.S. and Mann, M. (1994) Electrospray and Taylor-Cone
theory, Dole’s beam of macromolecules at last? Int. J. Mass
Spectrom. Ion Processes 136, 2–3
2 Yates, J.R., III (1998) Mass spectrometry and the age of the
proteome. J. Mass Spectrom. 33, 1–19
3 Stahl, D.C. et al. (1996) J. Am. Soc. Mass Spectrom. 7, 532–540
4 Yates, J.R., III et al. (1996) Mining genomes with mass
spectrometry. Anal. Chem. 68, 534–540
5 Jensen, O.N. et al. (1996) Delayed extraction improves
specificity in database searches by matrix-assisted laser
desorption/ionization peptide maps. Rapid Commun. Mass
Spectrom. 10, 1371–1378
6 Eng, J.K. et al. (1994) An approach to correlate tandem mass
spectral data of peptides with amino acid sequences in a
protein database. J. Am. Soc. Mass Spectrom. 5, 976–989
7 Mann, M. and Wilm, M. (1994) Error-tolerant identification of
peptides in sequence databases by peptide sequence tags.
Anal. Chem. 66, 4390–4399
8 McCormack, A.L. et al. (1997) Direct analysis and identification
of proteins in mixtures by LC/MS/MS and database searching at
the low-femtomole level. Anal. Chem. 69, 767–776
chromatography to separate or partially fractionate the
complex peptide mixture and direct introduction into a
tandem mass spectrometer, has been developed. Several
advantages flow from this strategy. This approach reduces
the reliance on SDS-PAGE to separate proteins for
analysis, provides a more flexible strategy for proteolytic
digestion and manipulation, and can take advantage of the
sensitivity of mass spectrometry.
Direct identification of proteins in mixtures has been
used in several types of experiments. McCormack et al.
have used the method with immunoaffinity precipitation
and protein affinity-interaction chromatography, and to identify proteins that interact with large protein complexes8
(Fig. 2). This approach has also been used to identify components of the yeast ribosome using 2-D liquid chromatography and
MS/MS. A total of 80 proteins were identified in a single
experiment, at least ten of which were not observed by
2-D gel electrophoresis22. Coupling the direct-identification
approach with quantitative methods to measure relative
protein expression will greatly increase the value of the
data that are produced23.
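The final bookkeeping step of the direct-identification approach, assigning identified peptides back to their proteins, can be sketched as below. This is an illustrative simplification rather than the software used in the cited studies: it assumes that each tandem mass spectrum has already been matched to a peptide sequence, and the database entries, peptide list, function names and the `min_peptides` parameter are all invented for the example.

```python
from collections import defaultdict

def assign_peptides(identified_peptides, database):
    """Map each protein name to the set of identified peptides it contains."""
    evidence = defaultdict(set)
    for peptide in identified_peptides:
        for name, sequence in database.items():
            if peptide in sequence:
                evidence[name].add(peptide)
    return dict(evidence)

def proteins_present(evidence, min_peptides=1):
    """Proteins supported by at least `min_peptides` distinct peptides."""
    return sorted(name for name, peptides in evidence.items()
                  if len(peptides) >= min_peptides)

# Toy database and peptide identifications, invented for illustration.
database = {
    "protein_A": "AVANESGANFISVKLTEYGWQNADIVRSPEVDMEALHK",
    "protein_B": "GASPVTCLINDQKEMHFRYWGASPK",
}
identified = ["AVANESGANFISVK", "SPEVDMEALHK", "EMHFR", "QQQQK"]

evidence = assign_peptides(identified, database)
print(proteins_present(evidence))                  # ['protein_A', 'protein_B']
print(proteins_present(evidence, min_peptides=2))  # ['protein_A']
```

Requiring more than one peptide per protein trades sensitivity for confidence. A peptide such as the invented "QQQQK", which occurs in no database entry, is simply left unassigned.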
Conclusions
Integration of the information that is produced through
structural genomics has significantly improved protein discovery by proteomics. It is now possible to be more
thorough and rigorous in the analysis of proteins that have
been obtained through molecular biology experiments and
to do so with higher throughput. As more genomes are
completed, the task of reconstructing metabolic and regulatory networks, pathways and subsequently the functions
of all proteins, will be more straightforward. Much of this
information will be derived from the study of protein–
protein interactions and protein complexes, experiments
that are simplified through the availability of better tools
for protein analysis. Future prospects are bright: the sensitivity of protein analysis by mass spectrometry will continue
to improve, as will the sophistication of data-dependent
acquisition methods and data-analysis software.
Acknowledgements
The author is supported by NSF BIR921482, NIH
RR11823-03, and NCI R33CA81665-01.
9 Yates, J.R., III et al. (1995) Mining genomes: correlating
tandem mass spectra of modified and unmodified peptides to
nucleotide sequences. Anal. Chem. 67, 3202–3210
10 Link, A.J. et al. (1997) Identifying the major components of
Haemophilus influenzae type-strain NCTC 8143.
Electrophoresis 18, 1314–1334
11 Shevchenko, A. et al. (1996) Linking genome and proteome by
mass spectrometry: large-scale identification of yeast
proteins from two-dimensional gels. Proc. Natl. Acad. Sci.
U. S. A. 93, 14440–14445
12 Figeys, D. et al. (1996) Protein identification by solid phase
microextraction-capillary zone electrophoresis-microelectrospray-tandem mass spectrometry. Nat.
Biotechnol. 14, 1579–1583
13 Anderson, L. and Seilhamer, J. (1997) A comparison of
selected mRNA and protein abundances in human liver.
Electrophoresis 18, 533–537
14 Gygi, S.P. et al. (1999) Correlation between protein and mRNA
abundance in yeast. Mol. Cell. Biol. 19, 1720–1730
15 Lamond, A. and Mann, M. (1997) Cell biology and genome
projects – a concerted strategy for characterizing multi-protein
complexes by mass spectrometry. Trends Cell Biol. 7, 139–142
16 Blackstock, W.P. and Weir, M.P. (1999) Proteomics:
quantitative and physical mapping of cellular proteins. Trends
Biotechnol. 17, 121–127
17 Grant, P.A. et al. (1998) A subset of TAF(II)s are integral
components of the SAGA complex required for nucleosome
acetylation and transcriptional stimulation. Cell 94, 45–53
18 Grant, P.A. et al. (1998) The ATM-related cofactor Tra1 is a
component of the purified SAGA complex. Mol. Cell 2, 863–867
19 Wigge, P.A. et al. (1998) Analysis of the Saccharomyces
spindle pole by matrix-assisted laser desorption/ionization
(MALDI) mass spectrometry. J. Cell Biol. 141, 967–977
20 Neubauer, G. et al. (1997) Identification of the proteins of the
yeast U1 small nuclear ribonucleoprotein complex by mass
spectrometry. Proc. Natl. Acad. Sci. U. S. A. 94, 385–390
21 Gottschalk, A. et al. (1998) A comprehensive biochemical and
genetic analysis of the yeast U1 snRNP reveals five novel
proteins. RNA 4, 374–393
22 Link, A.J. et al. (1999) Direct analysis of protein complexes
using mass spectrometry. Nat. Biotechnol. 17, 676–682
23 Gygi, S.P. et al. Quantitative analysis of complex protein
mixtures using isotope coded affinity tags. Nat. Biotechnol. (in
press)