Plant proteomics in a nutshell

Dr. Juan Antonio Vizcaíno
PRIDE Group Coordinator, Proteomics Services Team
EMBL-EBI, Hinxton, Cambridge, UK

Juan A. Vizcaíno [email protected] Agricultural-Omics Course Hinxton, 20 February 2014

Overview
• Short intro to proteomics and mass spectrometry
• Things to consider in the bioinformatics analysis
• Current existing MS proteomics approaches
• Protein sequence databases (with some specifics for plants)

From the genome to the proteome
Genomics, Transcriptomics, Proteomics

Key technologies in modern biology
Genomics: DNA sequencing is a central technology for studying DNA.
Transcriptomics: microarrays and RNA-seq are central technologies for studying RNA.
Proteomics: mass spectrometry is a central technology for studying the proteome.

Definitions to start
Proteomics is the large-scale study of proteins, particularly their structures and functions.
The proteome is the entire complement of proteins, including the modifications made to a particular set of proteins, produced by an organism or system. It varies with time and with the distinct requirements, or stresses, that a cell or organism undergoes.
proteome = ‘protein’ + ‘genome’ (M. Wilkins, 1994)

Genome vs. proteome
Genome
• Essentially static over time
• Not location specific
• Human genome mapped (2000)
• ~20,000 genes
• PCR is available to amplify DNA
Proteome
• Dynamic over time
• Location specific
• Human proteome not mapped: how many proteins???
• No equivalent of PCR for proteins

Mass spectrometry (MS)
MS is an analytical technique that measures the mass-to-charge (m/z) ratio of charged particles.
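To make the m/z definition concrete, the short Python sketch below computes the m/z value at which a molecule of a given neutral mass would be observed at several charge states. It assumes positive-mode ionization by protonation, which is the usual case for peptides; the function name `mz` and the example mass are illustrative and do not come from the slides.

```python
PROTON_MASS = 1.007276  # Da, mass of one proton


def mz(neutral_mass, charge):
    """Return the m/z of an ion formed by adding `charge` protons
    to a neutral molecule of mass `neutral_mass` (in Da)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    peptide_mass = 1000.0  # Da, hypothetical neutral monoisotopic mass
    for z in (1, 2, 3):
        print(f"z = {z}: m/z = {mz(peptide_mass, z):.4f}")
```

The same molecule therefore appears at a different position on the m/z axis for each charge state it carries.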
MS is used for determining the masses of particles, for determining the elemental composition of a sample or molecule, and for elucidating the chemical structures of molecules, such as peptides and other chemical compounds. It has many applications; one of them is proteomics.

Exclusive information available through MS proteomics
• Sometimes there is not much correlation between gene expression and protein expression…
• Biomarkers: easy access to human fluids (plasma, urine, …)
• Post-Translational Modifications (PTMs)

MAIN MS PROTEOMICS WORKFLOWS

Mass Spectrometry (MS)-based proteomics
• Many different workflows.
• Discovery mode:
  - Bottom-up proteomics
  - Top-down proteomics
• Targeted mode:
  - SRM (Selected Reaction Monitoring)

MS proteomics: tandem MS (bottom-up)
MS/MS matching identifies peptides, not proteins. Proteins are inferred from the peptide sequences.

From protein centric to peptide centric
The rapid development of genomics has allowed the development of proteomics.
Shotgun proteomics: a method of identifying proteins in a complex mixture.
[Figure: HPLC coupled to MS; spectrum of relative intensity (%) against m/z]
[Figure, three steps: start with a protein; cut with a protease (trypsin); select a peptide]

Digestion with trypsin
546 aa; 60 kDa (57 461 Da); pI = 4.75

>RBME00320 Contig0311_1089618_1091255 EC-mopA 60 KDa chaperonin GroEL
MAAKDVKFGR TAREKMLRGV DILADAVKVT LGPKGRNVVI EKSFGAPRIT KDGVSVAKEV
ELEDKFENMG AQMLREVASK TNDTAGDGTT TATVLGQAIV QEGAKAVAAG MNPMDLKRGI
DLAVNEVVAE LLKKAKKINT SEEVAQVGTI SANGEAEIGK MIAEAMQKVG NEGVITVEEA
KTAETELEVV EGMQFDRGYL SPYFVTNPEK MVADLEDAYI LLHEKKLSNL QALLPVLEAV
VQTSKPLLII AEDVEGEALA TLVVNKLRGG LKIAAVKAPG FGDCRKAMLE DIAILTGGQV
ISEDLGIKLE SVTLDMLGRA KKVSISKENT TIVDGAGQKA EIDARVGQIK QQIEETTSDY
DREKLQERLA KLAGGVAVIR VGGATEVEVK EKKDRVDDAL NATRAAVEEG IVAGGGTALL
RASTKITAKG VNADQEAGIN IVRRAIQAPA RQITTNAGEE ASVIVGKILE NTSETFGYNT
ANGEYGDLIS LGIVDPVKVV RTALQNAASV AGLLITTEAM IAELPKKDAA PAGMPGGMGG
MGGMDF

The sequence of the generated peptides is known.

Digestion with trypsin (cleavage after K and R, except before P)
MAAK DVK FGR TAR EK MLR GVDILADAVK VTLGPK GR NVVIEK SFGAPR ITK DGVSVAK
EVELEDK FENMGAQMLR EVASK TNDTAGDGTTTATVLGQAIVQEGAK AVAAGMNPMDLK R
GIDLAVNEVVAELLK K AK K INTSEEVAQVGTISANGEAEIGK MIAEAMQK VGNEGVITVEEAK
TAETELEVVEGMQFDR GYLSPYFVTNPEK MVADLEDAYILLHEK K
LSNLQALLPVLEAVVQTSKPLLIIAEDVEGEALATLVVNK LR GGLK IAAVK APGFGDCR K
AMLEDIAILTGGQVISEDLGIK LESVTLDMLGR AK K VSISK ENTTIVDGAGQK AEIDAR VGQIK
QQIEETTSDYDR EK LQER LAK LAGGVAVIR VGGATEVEVK EK K DR VDDALNATR
AAVEEGIVAGGGTALLR ASTK ITAK GVNADQEAGINIVR R AIQAPAR QITTNAGEEASVIVGK
ILENTSETFGYNTANGEYGDLISLGIVDPVK VVR TALQNAASVAGLLITTEAMIAELPK K
DAAPAGMPGGMGGMGGMDF

Mass Spec Principles
Sample → Ionization source → Mass analyzer/s → Detector

Schematic view of a generalized mass spec
sample → ion source → mass analyzer(s) → detector → digitizer
- All mass analyzers operate on gas-phase ions using electromagnetic fields. Signal intensities can be reported as absolute or relative values.
- The ion source therefore makes sure that (part of) the sample molecules are ionized and brought into the gas phase.
- The detector is responsible for actually recording the presence of ions. Time-of-flight analyzers also require a digitizer (ADC).

MS/MS
MS analysis: Peptide Mass Fingerprinting (PMF)
Fragmentation
MS/MS analysis: peptide sequence information (on top of mass and charge)
[Figure: MS spectrum and MS/MS spectrum, relative intensity (%) against m/z]

Why tandem-MS?
[Figure: peptide structure with side chains R1 to R4, annotated with the N-terminal fragment ions a1 to a3, b1 to b3 and c1 to c3 and the C-terminal fragment ions x1 to x3, y1 to y3 and z1 to z3]
There are several other ion types that can be annotated, as well as ‘internal fragments’. The latter are fragments that no longer contain an intact terminus. These are harder to use for ‘ladder sequencing’, but can still be interpreted.
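The bottom-up steps described so far (cut the protein with trypsin, select a peptide, fragment it) can be sketched in Python as below. This is a simplified illustration rather than the method of any particular tool: it assumes cleavage after every K or R except when followed by P, no missed cleavages, no modifications, and singly charged b and y ions only. The function names are hypothetical; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (unmodified amino acid residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        followed_by_proline = sequence[i + 1:i + 2] == "P"
        if residue in "KR" and not followed_by_proline:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide that does not end in K/R
        peptides.append(sequence[start:])
    return peptides


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[0] is b1 and y_ions[0] is y1.
    """
    b_ions, y_ions = [], []
    for cut in range(1, len(peptide)):
        n_term = sum(RESIDUE_MASS[aa] for aa in peptide[:cut])
        c_term = sum(RESIDUE_MASS[aa] for aa in peptide[cut:])
        b_ions.append(n_term + PROTON)
        y_ions.append(c_term + WATER + PROTON)
    # The first cut yields the longest y ion, so reverse to start at y1.
    return b_ions, y_ions[::-1]


if __name__ == "__main__":
    # First 28 residues of the GroEL example sequence from the slides.
    protein_start = "MAAKDVKFGRTAREKMLRGVDILADAVK"
    peptides = tryptic_digest(protein_start)
    print(peptides)

    selected = peptides[-1]  # "select a peptide"
    b_ions, y_ions = fragment_ions(selected)
    for n, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
        print(f"b{n} = {b:9.4f}   y{n} = {y:9.4f}")
```

Successive b ions (or y ions) differ by the mass of one residue, which is what makes reading a sequence off the fragment ladder possible.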
The fragment ion nomenclature (a, b, c and x, y, z) was coined by Roepstorff and Fohlman (Biomed. Mass Spec., 1984) and Klaus Biemann (Biomed. Environ. Mass Spec., 1988) and is commonly referred to as ‘Biemann nomenclature’. Note the link with the Roman alphabet.

Comparison between the instruments
From: Domon & Aebersold, Science, 2006

MS/MS IDENTIFICATION

Three types of MS/MS identification
Protein database based comparison: a theoretical spectrum is derived from each database sequence and compared with the experimental spectrum.
Sequential comparison (de novo approaches): a de novo sequence is derived from the experimental spectrum and compared with the database sequences.
Spectral comparison: the experimental spectrum is compared with the experimental spectra stored in a spectral library.
Modified from: Eidhammer, Flikka, Martens, Mikalsen – Wiley 2007
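As a rough sketch of the first strategy (protein database based comparison), the Python example below builds a theoretical spectrum for each candidate peptide and counts how many observed peaks it explains within a mass tolerance. The shared-peak count is a deliberately simple stand-in and is not the scoring function of any real search engine; the function names, the tolerance and the simulated "observed" peak list are all assumptions made for illustration.

```python
# Monoisotopic residue masses in Da (unmodified amino acid residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def theoretical_spectrum(peptide):
    """Sorted m/z values of the singly charged b and y ions of a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    peaks = []
    for cut in range(1, len(masses)):
        peaks.append(sum(masses[:cut]) + PROTON)          # b ion
        peaks.append(sum(masses[cut:]) + WATER + PROTON)  # y ion
    return sorted(peaks)


def shared_peak_count(observed, theoretical, tolerance=0.02):
    """Number of observed peaks lying within `tolerance` Da of a theoretical peak."""
    return sum(
        1 for peak in observed
        if any(abs(peak - t) <= tolerance for t in theoretical)
    )


def best_match(observed, candidates, tolerance=0.02):
    """Score every candidate peptide and return (score, peptide) for the best one.

    Ties are broken by the alphabetical order of the peptide strings.
    """
    scored = [
        (shared_peak_count(observed, theoretical_spectrum(p), tolerance), p)
        for p in candidates
    ]
    return max(scored)


if __name__ == "__main__":
    # Candidate peptides, as they might come from a digested sequence database.
    candidates = ["LFDQAFGLPR", "AKPLMELIER", "NGMFFSTYDR"]
    # Simulated spectrum: half of the true peptide's peaks, slightly shifted,
    # plus one unrelated noise peak.
    observed = [m + 0.005 for m in theoretical_spectrum("LFDQAFGLPR")[::2]]
    observed.append(500.0)
    score, peptide = best_match(observed, candidates)
    print(f"best match: {peptide} ({score} shared peaks)")
```

Keeping only the best-scoring candidate mirrors the common practice of reporting one peptide per spectrum; whether that match is actually correct is a separate question that the score cut-off has to answer.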
MS proteomics: peptide IDs and protein IDs
[Figure: MS/MS spectra (relative intensity against m/z) → proteins]
[Figure: MS/MS spectra → search engine with a sequence database (UniProt, IPI, RefSeq) → peptides (TDMDNQIVVSDYAQMDR, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL) → proteins]

Search engines
[Figure: sequence database matching. Proteins from a sequence database (UniProt, IPI, RefSeq) → peptides → theoretical spectra, which are compared with the experimental spectra]

Search engines
[Figure: experimental spectra against theoretical spectra, both plotted over m/z]
How good is the correlation?
- Scores are generated by search engines
- Usually the best match is kept

Search engines
[Figures taken from Nesvizhskii, J Proteomics, 2010]

The most popular algorithms
• MASCOT (Matrix Science) http://www.matrixscience.com
• SEQUEST (Scripps, Thermo Fisher Scientific) http://fields.scripps.edu/sequest
• X!Tandem (The Global Proteome Machine Organization) http://www.thegpm.org/TANDEM
• OMSSA (NCBI) http://pubchem.ncbi.nlm.nih.gov/omssa/

Overall concept of scores and cut-offs
[Figure: score distributions of incorrect and correct identifications with a threshold score; correct identifications below the threshold are false negatives, incorrect identifications above it are false positives]
Adapted from: www.proteomesoftware.com – Wiki pages

Playing with probabilistic cut-off scores
[Figure: identifications (0% to 100%) and false positives (0% to 6%) at the cut-offs p=0.05, p=0.01, p=0.005 and p=0.0005, in order of higher stringency]

PROTEIN INFERENCE
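A minimal Python sketch of the inference step this section deals with: given the identified peptides and the proteins each of them could come from, a peptide that maps to a single protein is unambiguous, whereas a peptide shared by several proteins does not tell them apart. The protein names A to D follow the labels used in the figures of this section, but the peptide-to-protein mapping is invented for illustration, and the rule shown (report only proteins supported by at least one unambiguous peptide) is just one simple option among several used in practice.

```python
from collections import defaultdict


def classify_peptides(peptide_to_proteins):
    """Split peptides into unambiguous ones (one protein) and shared ones."""
    unambiguous, shared = {}, {}
    for peptide, proteins in peptide_to_proteins.items():
        proteins = set(proteins)
        if len(proteins) == 1:
            unambiguous[peptide] = next(iter(proteins))
        else:
            shared[peptide] = proteins
    return unambiguous, shared


def proteins_with_unambiguous_evidence(peptide_to_proteins):
    """Map each protein to the unambiguous peptides that support it."""
    unambiguous, _ = classify_peptides(peptide_to_proteins)
    evidence = defaultdict(list)
    for peptide, protein in unambiguous.items():
        evidence[protein].append(peptide)
    return dict(evidence)


if __name__ == "__main__":
    # Hypothetical identification results (not taken from the slides).
    identified = {
        "peptide_1": {"A"},
        "peptide_2": {"A", "B"},
        "peptide_3": {"B", "C"},
        "peptide_4": {"C", "D"},
        "peptide_5": {"D"},
    }
    unambiguous, shared = classify_peptides(identified)
    print("unambiguous:", unambiguous)
    print("shared:", shared)
    print("supported proteins:", proteins_with_unambiguous_evidence(identified))
```

In this toy data set proteins B and C are backed only by shared peptides, so the evidence cannot show whether either of them is really present; a less redundant sequence database reduces the number of such cases.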
The protein inference problem
Slide from J. Cottrell, Matrix Science

Protein inference
[Figure series: proteins A, B, C and D; the final frame marks an unambiguous peptide]

PROTEIN SEQUENCE DATABASES

What is needed from a protein database
1. Comprehensive (whatever is not in the DB will not be included in your results).
2. Not too redundant at the protein sequence level
   - Protein inference gets easier
   - It is not very good if the database is too big.
3. Quality of annotation
4. Stability of identifiers

Main databases used
a) UniProt Knowledgebase (UniProtKB): SWISS-PROT (manually curated) / TrEMBL.
b) NCBI non-redundant database: it compiles all protein sequences available from the following databases: ‘GenBank’ translations, the Protein Data Bank (PDB), UniProtKB/Swiss-Prot, PIR and PRF.
c) Ensembl: genomics-centric resource. Integration of the information with genomics is easy.
d) IPI (International Protein Index): it has been discontinued (09/2011). Different builds for different species (Human, Mouse, Cow, Rat, Zebrafish, Dog, Arabidopsis).
e) Model organism DBs (for instance, TAIR for Arabidopsis).

Databases for non-model organisms
- This is the case for many plants!
- If the species is not well represented in the protein databases, there is a much stronger need to search ESTs or genomic databases.
- The search engine will translate the 6 possible reading frames of each nucleotide sequence.
- ESTs are not suitable for PMF approaches (incomplete proteins).
- The alternative is to filter comprehensive databases like UniProt by species or genus, or to use a protein DB from a closely related organism.

Importance of choosing the right DB
- Since each database has a different focus, the databases can vary in terms of completeness, degree of redundancy, and quality of annotations.
- More inclusive, bigger protein databases will take longer to search.
- For the bigger resources, it may also result in more false-positive identifications and reduced statistical significance (the probability of a random match is higher).

OTHER MS PROTEOMICS APPROACHES

Mass Spectrometry (MS)-based proteomics
• Many different workflows.
• Discovery mode:
  - Bottom-up proteomics
  - Top-down proteomics
• Targeted mode:
  - SRM (Selected Reaction Monitoring)

MS-based proteomics: Discovery mode
Compton & Kelleher, Nat. Methods, 2012

Targeted proteomics
Selected Reaction Monitoring: the objective is to be able to detect one particular protein in the sample.
- Obvious implications for diagnosis (biomarkers).
Image from http://demo.shimadzu.com/

Multiple/Selected Reaction Monitoring (MRM/SRM)
[Figure: peptide mixture → mass filter 1 → selected peptides → collision cell → fragments of both peptides → mass filter 2 → selected fragment]
- MRM/SRM removes noise, yielding better signal-to-noise ratio
- MRM/SRM removes ‘contaminating’ peaks, aiding targeted identification
- MRM/SRM works well with proteotypic peptides
- MRM/SRM can be performed with Q-Q-Q, Q-LIT and IT instruments

Overview
• Short intro to proteomics and mass spectrometry
• MS/MS proteomics
• Search engines and protein inference
• Protein sequence databases

No time for quantitation
Not only identify, but also quantify the amount of each protein in the sample.
The current methods rely mainly on MS: Vaudel et al., Proteomics 2010 Feb;10(4):650-670

Recommended reading
Mallick P, Kuster B.
Nat Biotechnol. 2010 Jul;28(7):695-709.
Nesvizhskii, J Proteomics, 2010 Oct 10;73(11):2092-123.

The analysis process should not be a black box!
From: Lilley et al., Proteomics, 2011

Questions?