Proteomics Informatics – Syllabus Lecture 1

Proteomics Informatics (BMSC-GA 4437)
Course Director
David Fenyö
Contact information
[email protected]
http://fenyolab.org/pi2015/
Proteomics Informatics – Learning Objectives
Be able to analyze proteomics data sets and understand the limitations of the results.
Proteomics Informatics – Syllabus
Lecture 1 Overview of proteomics (February 3, 2014 TRB 717 4pm)
Lecture 2 Overview of mass spectrometry (February 10, 2014 TRB 717 4pm)
Lecture 3 Signal processing I: analysis of mass spectra (February 17, 2014 TRB 718 4pm)
Lecture 4 Protein identification I: searching protein sequence collections and significance testing (February 24, 2014 TRB 718 4pm)
Lecture 5 Protein quantitation I: overview (March 3, 2014 TRB 717 4pm)
Lecture 6 Databases, data repositories and standardization (March 10, 2014 TRB 717 4pm)
Lecture 7 Protein identification II: de novo sequencing (March 17, 2014 TRB 717 4pm)
Lecture 8 Protein quantitation II: multiple reaction monitoring (March 24, 2014 TRB 717 4pm)
Lecture 9 Proteogenomics (March 31, 2014 TRB 619 4pm)
Lecture 10 Protein characterization I: post-translational modifications (April 7, 2014 TRB 717 4pm)
Lecture 11 Signal processing II: image analysis (April 21, 2014 TRB 717 4pm)
Lecture 12 Protein characterization II: protein interactions (April 28, 2014 TRB 619 4pm)
Lecture 13 Data analysis and visualization (May 5, 2014 TRB 717 4pm)
Lecture 14 Molecular signatures (May 12, 2014 TRB 717 4pm)
Lecture 15 Presentations of projects (May 19, 2014 TRB 717 4pm)
Overview of Proteomics (Week 1)
• Why proteomics?
• Bioinformatics
• Overview of the course
Motivating Example: Protein Regulation
Geiger et al., “Proteomic changes resulting from gene copy number
variations in cancer cells”, PLoS Genet. 2010 Sep 2;6(9). pii: e1001090.
Motivating Example: Protein Complexes
Alber et al., Nature 2007
Motivating Example: Signaling
Choudhary & Mann, Nature Reviews Molecular Cell Biology 2010
Bioinformatics
Biological System
Experimental Design
Samples
Measurements
Raw Data
Data Analysis
Information
Mass Spectrometry Based Proteomics
Lysis
Fractionation
Digestion
Mass spectrometry
MS
Peak Finding
Charge determination
De-isotoping
Integrating Peaks
Searching
Identified and Quantified Proteins
Overview of Mass spectrometry (Week 2)
[Figure: Ion Source → Mass Analyzer → Detector, producing a spectrum of intensity vs. mass/charge]
[Figure: Ion Source → Mass Analyzer 1 → Fragmentation → Mass Analyzer 2 → Detector, with b and y fragment ions]
[Figure: LC → Ion Source → Mass Analyzer 1 → Fragmentation → Mass Analyzer 2 → Detector, producing a series of spectra (intensity vs. mass/charge) over time]
Signal processing I: analysis of mass spectra (Week 3)
[Figure: mass spectrum, intensity vs. m/z]
Protein identification I: searching protein sequence collections and significance testing (Week 4)
Experiment: Lysis → Fractionation → Digestion → LC-MS → MS/MS
Search: Sequence DB → Pick Protein → Pick Peptide → All Fragment Masses (MS/MS) → Compare, Score, Test Significance
Repeat for all peptides
Repeat for all proteins
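The "All Fragment Masses" and "Compare, Score" steps can be sketched as follows. This is a minimal illustration, not the search engine or scoring function used in the course: it assumes singly charged b and y ions only, uses a plain shared-peak count as the score, and leaves out significance testing entirely. The function names and the 0.5 Da tolerance are hypothetical choices made for the example.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.040, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}
PROTON = 1.00728   # mass of a proton (Da)
WATER = 18.01056   # mass of H2O (Da)


def fragment_masses(peptide):
    """Return (b_ions, y_ions) as singly charged m/z values, one pair per backbone cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for cut in range(1, len(masses)):
        b_ions.append(sum(masses[:cut]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[cut:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


def shared_peak_count(observed_mz, peptide, tol=0.5):
    """Count theoretical fragments that have an observed peak within tol (Da)."""
    b_ions, y_ions = fragment_masses(peptide)
    return sum(
        any(abs(obs - theo) <= tol for obs in observed_mz)
        for theo in b_ions + y_ions
    )


def best_match(observed_mz, candidates, tol=0.5):
    """Pick the candidate peptide with the highest shared-peak count."""
    return max(candidates, key=lambda pep: shared_peak_count(observed_mz, pep, tol))
```

In a real search the candidates would come from digesting every protein in the sequence collection, and the best score would still have to be tested for significance before it is reported.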
Protein quantitation I: Overview (Week 5)
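Sample i, Protein j, Peptide k

Workflow, with the efficiency p of each step:
Lysis: $p^{L}_{ij}$
Fractionation: $p^{Pr}_{ij}$ (proteins), $p^{Pep}_{ik}$ (peptides)
Digestion: $p^{D}_{ijk}$
LC: $p^{LC}_{ik}$
MS: $p^{MS}_{ik}$

The amount $C_{ij}$ of protein j in sample i is related to the measured peptide intensities $I^{MS}_{ik}$ by

$$C_{ij} = \frac{\sum_k I^{MS}_{ik}}{\sum_k p^{L}_{ij}\, p^{Pr}_{ij}\, p^{D}_{ijk}\, p^{Pep}_{ik}\, p^{LC}_{ik}\, p^{MS}_{ik}}$$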
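Assumption: for each protein j, the sum over its peptides k

$$\sum_k p^{L}_{ij}\, p^{Pr}_{ij}\, p^{D}_{ijk}\, p^{Pep}_{ik}\, p^{LC}_{ik}\, p^{MS}_{ik}$$

is constant for all samples i. The ratio of the amounts of protein j in two samples $i_n$ and $i_m$ then equals the ratio of the measured intensities:

$$\frac{C_{i_n j}}{C_{i_m j}} = \frac{\sum_k I^{MS}_{i_n k}}{\sum_k I^{MS}_{i_m k}}$$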
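Under the assumption above, relative quantitation reduces to a ratio of measured intensities. The sketch below shows that arithmetic for one protein. The function name, the dictionary layout (peptide → intensity) and the choice to sum only over peptides observed in both samples are assumptions made for this illustration, and the intensity values in the usage example are invented.

```python
def abundance_ratio(intensities_n, intensities_m):
    """Estimate the amount of one protein in sample n relative to sample m.

    Each argument maps a peptide of the protein to its measured intensity.
    Only peptides observed in both samples are summed (an assumption of this
    sketch), so both sums run over the same set of peptides.
    """
    shared = set(intensities_n) & set(intensities_m)
    if not shared:
        raise ValueError("no peptides observed in both samples")
    total_n = sum(intensities_n[k] for k in shared)
    total_m = sum(intensities_m[k] for k in shared)
    if total_m == 0:
        raise ValueError("summed intensity in the reference sample is zero")
    return total_n / total_m


# Hypothetical intensities for three peptides of the same protein.
sample_n = {"pep1": 200.0, "pep2": 50.0, "pep3": 90.0}
sample_m = {"pep1": 100.0, "pep2": 25.0, "pep4": 10.0}
ratio = abundance_ratio(sample_n, sample_m)  # uses pep1 and pep2 only
```

The ratio is only as good as the assumption: if the efficiency of any step differs between the samples, the intensity ratio no longer equals the ratio of protein amounts.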
Databases, data repositories and standardization (Week 6)
Most proteins show very reproducible peptide patterns
[Figure: query spectrum compared with its best match and second best match in GPMDB]
Protein identification II: de novo sequencing (Week 7)
Amino acid masses
1-letter code  3-letter code  Chemical formula  Monoisotopic  Average
A              Ala            C3H5ON            71.0371       71.0788
R              Arg            C6H12ON4          156.101       156.188
N              Asn            C4H6O2N2          114.043       114.104
D              Asp            C4H5O3N           115.027       115.089
C              Cys            C3H5ONS           103.009       103.139
E              Glu            C5H7O3N           129.043       129.116
Q              Gln            C5H8O2N2          128.059       128.131
G              Gly            C2H3ON            57.0215       57.0519
H              His            C6H7ON3           137.059       137.141
I              Ile            C6H11ON           113.084       113.159
L              Leu            C6H11ON           113.084       113.159
K              Lys            C6H12ON2          128.095       128.174
M              Met            C5H9ONS           131.04        131.193
F              Phe            C9H9ON            147.068       147.177
P              Pro            C5H7ON            97.0528       97.1167
S              Ser            C3H5O2N           87.032        87.0782
T              Thr            C4H7O2N           101.048       101.105
W              Trp            C11H10ON2         186.079       186.213
Y              Tyr            C9H9O2N           163.063       163.176
V              Val            C5H9ON            99.0684       99.1326
[Figure: MS/MS spectrum, % relative abundance vs. m/z, with peaks labeled 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022 and 1080 and a peak labeled [M+2H]2+]
Mass Differences → Sequences consistent with spectrum
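The step from mass differences to candidate sequences can be illustrated with a short sketch. It takes a list of peak positions, computes the differences between neighbouring peaks, and looks up which residues from the amino acid mass table fit each difference. The peak list in the usage example is taken from the peak labels in the spectrum above; treating those seven peaks as one consecutive ion series is an assumption of this example, as are the function names and the 0.5 Da tolerance.

```python
# Monoisotopic residue masses (Da) from the amino acid mass table.
MONOISOTOPIC = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}


def residues_for_gap(delta, tol=0.5):
    """Return every residue whose mass is within tol (Da) of a peak-to-peak gap."""
    return [aa for aa, mass in MONOISOTOPIC.items() if abs(mass - delta) <= tol]


def read_ladder(peaks, tol=0.5):
    """Translate consecutive peak differences into lists of candidate residues."""
    ordered = sorted(peaks)
    return [residues_for_gap(hi - lo, tol) for lo, hi in zip(ordered, ordered[1:])]


# Peak labels read off the spectrum; assumed here to form one ion series.
ladder = [260, 389, 504, 633, 762, 875, 1022]
candidates = read_ladder(ladder)
# candidates == [["E"], ["D"], ["E"], ["E"], ["I", "L"], ["F"]]
```

Isoleucine and leucine have the same mass, so a gap of 113 Da always yields both; this is one reason a spectrum is usually consistent with more than one sequence.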
Protein quantitation II: Targeted (Week 8)
Sample preparation: Lysis → Fractionation → Digestion → LC-MS
Shotgun proteomics: Data Dependent Acquisition (DDA)
1. MS: records m/z
2. MS/MS: selects peptides based on abundance and fragments
3. Protein database search for peptide identification
Targeted MS: uses predefined set of peptides
1. MS: select precursor ion
2. MS/MS: precursor fragmentation
3. Use precursor-fragment pairs for identification
Proteogenomics (Week 9)
Non-Tumor Sample: Genome sequencing → Identify germline variants
Tumor Sample: Genome sequencing, RNA-Seq → Identify alternative splicing, somatic variants and novel expression
Reference Human Database (Ensembl) + sample-specific findings → Tumor Specific Protein DB
[Figure panels: Variants (reads TCGAGAGCTG, with one variant read TCGATAGCTG); Alt. Splicing (Exon 1, Exon 2, Exon 3); Novel Expression (Exon X); Fusion Genes (exons of Gene X joined to exons of Gene Y)]
Kelly Ruggles
Protein characterization I: post-translational modifications (Week 10)
Peptide with two possible modification sites
[Figure: matching the peptide to the MS/MS spectrum, intensity vs. m/z]
Which assignment does the data support: 1, 1 or 2, or 1 and 2?
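One simple way to approach the question is to predict the fragment ions for each candidate assignment and count how many are found in the spectrum. The sketch below does that under several assumptions that are not in the slides: the modification is taken to be phosphorylation (+79.96633 Da), only singly charged b and y ions are considered, the score is a plain matched-fragment count, and the peptide ASGSK and all function names are hypothetical.

```python
RESIDUE_MASS = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.040, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}
PROTON = 1.00728     # mass of a proton (Da)
WATER = 18.01056     # mass of H2O (Da)
PHOSPHO = 79.96633   # assumed modification: phosphorylation (HPO3)


def fragment_mz(peptide, mod_sites, mod_mass=PHOSPHO):
    """Singly charged b and y ions for a peptide modified at the given 0-based positions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    for site in mod_sites:
        masses[site] += mod_mass
    ions = []
    for cut in range(1, len(masses)):
        ions.append(sum(masses[:cut]) + PROTON)          # b ion
        ions.append(sum(masses[cut:]) + WATER + PROTON)  # y ion
    return ions


def assignment_scores(observed_mz, peptide, assignments, tol=0.02):
    """For each candidate assignment, count predicted fragments found in the spectrum."""
    scores = {}
    for sites in assignments:
        predicted = fragment_mz(peptide, sites)
        scores[sites] = sum(
            any(abs(obs - theo) <= tol for obs in observed_mz) for theo in predicted
        )
    return scores


# Simulated spectrum: hypothetical peptide ASGSK modified on the first serine.
observed = fragment_mz("ASGSK", (1,))
scores = assignment_scores(observed, "ASGSK", [(1,), (3,), (1, 3)])
# scores == {(1,): 8, (3,): 4, (1, 3): 4}
```

Only the fragments that fall between the two candidate sites differ between the single-site assignments. When those fragments are missing from a real spectrum the scores tie, and the honest answer is "1 or 2".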
Signal processing II: image analysis (Week 11)
Agullo-Pascual E, Reid DA, Keegan S, Sidhu M, Fenyö D, Rothenberg E, Delmar M, "Super-resolution fluorescence microscopy
of the cardiac connexome reveals plakophilin-2 inside the connexin43 plaque", Cardiovasc Res. 2013
Protein characterization II: protein interactions (Week 12)
[Figure: proteins A, B, C, D, E and F; workflow: Digestion → Mass spectrometry → Identification]
Data analysis and visualization (Week 13)
Molecular Signatures (Week 14)
Presentations of projects (Week 15)
Select a published data set that has been made public
and reanalyze it.
Highlighted data sets: http://www.thegpm.org/
10 min presentations