Proteomics Informatics (BMSC-GA 4437)
Course Director: David Fenyö
Contact information: [email protected]
http://fenyolab.org/pi2015/

Proteomics Informatics – Learning Objectives
Be able to analyze proteomics data sets and understand the limitations of the results.

Proteomics Informatics – Syllabus
Lecture 1: Overview of proteomics (February 3, 2014 TRB 717 4pm)
Lecture 2: Overview of mass spectrometry (February 10, 2014 TRB 717 4pm)
Lecture 3: Signal processing I: analysis of mass spectra (February 17, 2014 TRB 718 4pm)
Lecture 4: Protein identification I: searching protein sequence collections and significance testing (February 24, 2014 TRB 718 4pm)
Lecture 5: Protein quantitation I: overview (March 3, 2014 TRB 717 4pm)
Lecture 6: Databases, data repositories and standardization (March 10, 2014 TRB 717 4pm)
Lecture 7: Protein identification II: de novo sequencing (March 17, 2014 TRB 717 4pm)
Lecture 8: Protein quantitation II: multiple reaction monitoring (March 24, 2014 TRB 717 4pm)
Lecture 9: Proteogenomics (March 31, 2014 TRB 619 4pm)
Lecture 10: Protein characterization I: post-translational modifications (April 7, 2014 TRB 717 4pm)
Lecture 11: Signal processing II: image analysis (April 21, 2014 TRB 717 4pm)
Lecture 12: Protein characterization II: protein interactions (April 28, 2014 TRB 619 4pm)
Lecture 13: Data analysis and visualization (May 5, 2014 TRB 717 4pm)
Lecture 14: Molecular signatures (May 12, 2014 TRB 717 4pm)
Lecture 15: Presentations of projects (May 19, 2014 TRB 717 4pm)

Overview of Proteomics (Week 1)
• Why proteomics?
• Bioinformatics
• Overview of the course

Motivating Example: Protein Regulation
Geiger et al., “Proteomic changes resulting from gene copy number variations in cancer cells”, PLoS Genet. 2010 Sep 2;6(9). pii: e1001090.

Motivating Example: Protein Complexes
Alber et al., Nature 2007

Motivating Example: Signaling
Choudhary & Mann, Nature Reviews Molecular Cell Biology 2010

Bioinformatics
[Diagram: Biological System, Experimental Design, Samples, Measurements, Raw Data, Data Analysis, Information]

Mass Spectrometry Based Proteomics
• Sample preparation and measurement: Lysis, Fractionation, Digestion, Mass spectrometry (MS)
• Data analysis: Peak Finding, Charge determination, De-isotoping, Integrating Peaks, Searching
• Result: Identified and Quantified Proteins

Overview of Mass spectrometry (Week 2)
[Diagram 1: Ion Source, Mass Analyzer, Detector. Output: spectrum of intensity vs. mass/charge]
[Diagram 2: Ion Source, Mass Analyzer 1, Fragmentation, Mass Analyzer 2, Detector. Fragment ions labeled b and y]
[Diagram 3: LC, Ion Source, Mass Analyzer 1, Fragmentation, Mass Analyzer 2, Detector. Output: a series of spectra (intensity vs. mass/charge) acquired over Time]

Signal processing I: Analysis of mass spectra (Week 3)
[Figure: mass spectrum, Intensity vs. m/z]

Protein identification I: searching protein sequence collections and significance testing (Week 4)
• Experiment: Lysis, Fractionation, Digestion, LC-MS, MS/MS
• Search: Sequence DB, Pick Protein, Pick Peptide, All Fragment Masses
• Compare with the MS/MS spectrum, Score, Test Significance
• Repeat for all peptides
• Repeat for all proteins

Protein quantitation I: Overview (Week 5)
Indices: Sample i, Protein j, Peptide k
Workflow: Lysis, Fractionation, Digestion, LC-MS, MS
Factors along the workflow: $p^{L}_{ij}$ (Lysis), $p^{Pr}_{ij}$ (Fractionation), $p^{D}_{ijk}$ (Digestion), $p^{Pep}_{ik}$, $p^{LC}_{ik}$ (LC), $p^{MS}_{ik}$ (MS)
$C_{ij}$: abundance of protein j in sample i. $I^{MS}_{ik}$: MS intensity of peptide k in sample i.

$$C_{ij} = \frac{I^{MS}_{ik}}{p^{L}_{ij}\, p^{Pr}_{ij}\, p^{D}_{ijk}\, p^{Pep}_{ik}\, p^{LC}_{ik}\, p^{MS}_{ik}}$$

Assumption: $p^{L}_{ij}\, p^{Pr}_{ij}\, p^{D}_{ijk}\, p^{Pep}_{ik}\, p^{LC}_{ik}\, p^{MS}_{ik}$ is constant for all samples, so for two samples $i_n$ and $i_m$:

$$\frac{C_{i_n j}}{C_{i_m j}} = \frac{I^{MS}_{i_n k}}{I^{MS}_{i_m k}}$$

Databases, data repositories and standardization (Week 6)
• Most proteins show very reproducible peptide patterns
• [Figure: Query Spectrum, Best match in GPMDB, Second best match in GPMDB]

Protein identification II: de novo sequencing (Week 7)
Amino acid masses

1-letter  3-letter  Chemical    Monoisotopic  Average
code      code      formula
A         Ala       C3H5ON      71.0371       71.0788
R         Arg       C6H12ON4    156.101       156.188
N         Asn       C4H6O2N2    114.043       114.104
D         Asp       C4H5O3N     115.027       115.089
C         Cys       C3H5ONS     103.009       103.139
E         Glu       C5H7O3N     129.043       129.116
Q         Gln       C5H8O2N2    128.059       128.131
G         Gly       C2H3ON      57.0215       57.0519
H         His       C6H7ON3     137.059       137.141
I         Ile       C6H11ON     113.084       113.159
L         Leu       C6H11ON     113.084       113.159
K         Lys       C6H12ON2    128.095       128.174
M         Met       C5H9ONS     131.04        131.193
F         Phe       C9H9ON      147.068       147.177
P         Pro       C5H7ON      97.0528       97.1167
S         Ser       C3H5O2N     87.032        87.0782
T         Thr       C4H7O2N     101.048       101.105
W         Trp       C11H10ON2   186.079       186.213
Y         Tyr       C9H9O2N     163.063       163.176
V         Val       C5H9ON      99.0684       99.1326

[Figure: MS/MS spectrum, % Relative Abundance vs. m/z, with an [M+2H]2+ peak and peaks labeled 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080]
Mass Differences: Sequences consistent with spectrum

Protein quantitation II: Targeted (Week 8)
Shared sample preparation: Lysis, Fractionation, Digestion, LC-MS

Shotgun proteomics: Data Dependent Acquisition (DDA)
1. Records m/z (MS)
2. Selects peptides based on abundance and fragments (MS/MS)
3. Protein database search for peptide identification

Targeted MS
1. Select precursor ion (MS)
2. Precursor fragmentation (MS/MS)
3. Use Precursor-Fragment pairs for identification
Uses predefined set of peptides

Proteogenomics (Week 9)
• Non-Tumor Sample: Genome sequencing. Identify germline variants.
• Tumor Sample: Genome sequencing and RNA-Seq. Identify alternative splicing, somatic variants and novel expression.
• Tumor Specific Protein DB: Reference Human Database (Ensembl) plus Variants, Alt. Splicing, Novel Expression and Fusion Genes
[Diagram: exon structures (Exon 1, Exon 2, Exon 3, Exon X) for alternative splicing and novel expression; Gene X and Gene Y exons joined in fusion genes; aligned reads TCGAGAGCTG with one variant read TCGATAGCTG]
Kelly Ruggles

Protein characterization I: post-translational modifications (Week 10)
• Peptide with two possible modification sites
• Matching against the MS/MS spectrum (Intensity vs. m/z)
• Which assignment does the data support? 1, 1 or 2, or 1 and 2?

Signal processing II: image analysis (Week 11)
Agullo-Pascual E, Reid DA, Keegan S, Sidhu M, Fenyö D, Rothenberg E, Delmar M, "Super-resolution fluorescence microscopy of the cardiac connexome reveals plakophilin-2 inside the connexin43 plaque", Cardiovasc Res. 2013

Protein Characterization II: protein interactions (Week 12)
[Diagram: interacting proteins A, B, C, D, E, F; Digestion, Mass spectrometry, Identification]

Data analysis and visualization (Week 13)

Molecular Signatures (Week 14)

Presentations of projects (Week 15)
• Select a published data set that has been made public and reanalyze it.
• Highlighted data sets: http://www.thegpm.org/
• 10 min presentations
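
Worked example for Week 7 (amino acid masses)
The monoisotopic residue masses in the Week 7 table are the basis for both database searching and de novo sequencing: a peptide mass is the sum of its residue masses plus one water, and the spacing between neighboring fragment peaks points to a residue. The Python sketch below illustrates this under stated assumptions. The masses of water and of the proton are standard values that do not appear on the slides, the function names are hypothetical, and the 0.5 Da tolerance is an arbitrary choice for integer-labeled peaks.

```python
# Monoisotopic residue masses (Da), copied from the Week 7 amino acid table.
RESIDUE_MASS = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}

WATER = 18.0106   # assumption: monoisotopic mass of H2O, not given on the slides
PROTON = 1.00728  # assumption: mass of a proton, not given on the slides


def peptide_mass(sequence):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence, charge):
    """m/z of the peptide carrying `charge` protons (charge=2 gives [M+2H]2+)."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def residues_for_difference(delta, tolerance=0.5):
    """Residues whose mass matches a peak-to-peak mass difference."""
    return sorted(
        aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tolerance
    )


if __name__ == "__main__":
    # "GAG" is a made-up tripeptide used only to exercise the functions.
    print(peptide_mass("GAG"))
    print(mz("GAG", 2))
    # Spacing between the peaks labeled 260 and 389 in the Week 7 spectrum;
    # treating these two peaks as neighbors in one ion series is an assumption.
    print(residues_for_difference(389 - 260))
```

Because Ile and Leu share the same formula in the table, a difference near 113 Da returns both residues; mass differences alone cannot distinguish them.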