S - R Project

Novel method for estimating
isotope incorporation using the
‘half-decimal place rule‘
Ingo Fetzer
Department of Environmental Microbiology
userR Conference 2009, Rennes
Problem
1.2e+7
Intensity
1.0e+7
0%
50%
100%
8.0e+6
6.0e+6
4.0e+6
2.0e+6
0.0
835 840 845 850 855 860 865 870 875 880
mass
Substrate fluxes in Procaryotes
Function
Activity
Identitiy
Interactions: Competition
Mutualism
Page 2
Goal
Develop an algorithm for estimating 13C
incorporation by using ‘half decimal
place rule’
Page 3
‘Half-decimal place rule’ (HDPR)
Mann (1995)
Decimal places
100%
0%
m/z tryptic peptides Heliobacter pylori [Da]
Schmidt et al 2003
Page 4
Outline
1. Peptide mass calculation for 12C and 13C
2. Estimation of 12C and 13C slopes (HDPR)
3. Estimation of relative 13C incorporation rates
(of user data)
implemented in ‘R‘
(R-project.org)
Page 5
Flowchart Script 1
Peptide database
criterions
Peptide sequences
AA molecular formula
Atomic weights
Peptide sequence masses
Page 6
Peptide mass calculation for 12C and 13C
Virtual digestion with MS-Digest
dataset M. tuberculosis H37Rv
amino acid sequences
length 2 – 40
315,579 peptide fragments
ChemScore ≥ 10
Missing cleavage = 0
Modifications = Null
Mol. weight ≤ 5000 Da
90,637 peptide sequences
Sanger Institute
(ftp://ftp.sanger.ac.uk/pub/tb/sequences/TB.pep)
Page 7
Peptide mass calculation for 12C and 13C
Page 8
Peptide mass calculation for 12C and 13C
Script 1:
1. Reduction of dataset (ChemScore, Modification etc.)
315,579
90,637
2 a. Peptide mass calculation
Sequence
in DB
GAG
+
Molecular
formula of
AA
G=C2H6NO2
Why?
Page 9
A=C3H8NO2
Sum of
C, H, N, O for
each sequence
C7 H20 N3 O6
Calculation of percentage 13C
incorporation
Peptide mass calculation for 12C and 13C
2b. Peptide mass calculation
Sum of
+ Atomic weights
C, H, N, O of each
12C = 12.000000 Da
sequence
13C = 13.003355 Da
C7 H20 N3 O6
N = 14.003074 Da
O = 15.994915 Da
H = 1.007825 Da
Molecular
weights of
sequences
(with decimal
residuals)
12C=242.135212
Da
13C=249.158697 Da
Page 10
Peptide mass calculation for 12C and 13C
2a. Peptide mass calculation
+ Atomic weights
Count sum of
C, H, N, O, S for
each sequence
12C
= 12.000000 Da
= 13,003355 Da
N = 14.003074 Da
O = 15.994915 Da
H = 1.007825 Da
S = 31.972071 Da
13C
f
o
s
e
s
s
a
Molecular M
13 C peptides
and
H
C
R
1
NH3+
H
O
+
C
R
2
OH
MW(H3
Page 11
H
O
C
NH3+
O+)
C
R
1
OH
= 19.01839 Da
C
NH3+
O
Molecular
weights of
sequences
12 C
(with decimal
residuals)
H
C
C
N
H
R
2
O
C
+ H3O+
OH
Flowchart Script 2
Peptide sequence masses
Transformation
Transf. Peptide
sequence masses
k-means clustering
Groupings
Slope 0%/100% 13C
Linear fitting
Transf. Decimal
Residuals
Page 12
m/z – DR plot
Estimation of 12C and 13C slopes
0.8
0.6
0.4
m/zTemp = m/z - 1800 * DR
0.0
0.2
Decimal residuals
1.0
Script 2:
1000
2000
3000
m/z
Page 13
4000
5000
6000
0.4
0.6
0.8
Hartigan &
Wong (1979)
0.0
0.2
Decimal residuals
1.0
Estimation of 12C and 13C slopes
0
k-means clustering
using kmeans()
Page 14
1000
2000
3000
4000
m/zTemp
5000
6000
1.0
Estimation of 12C and 13C slopes
Add
2
3
0.4
0.6
0.8
1
0.0
0.2
Decimal residuals
0
Hartigan &
Wong (1979)
0
k-means clustering
using kmeans()
Page 15
1000
2000
3000
4000
m/zTemp
5000
6000
2.5
3.0
100% slope = 0.000633
0% slope = 0.000514
1.5
2.0
100%
1.0
0%
lm() function
Stats package
0.0
0.5
Decimal residuals
3.5
Estimation of 12C and 13C slopes
0
1000
2000
3000
m/z
Page 16
4000
5000
Flowchart Script 3
User data
k-means
clustering
Robust
linear fitting
Slope 0% + 100% 13C
Slope user data
reference
calculation
Estimation of relative
13C incorporation rates
Page 17
3.0
100% slope = 0.000633
0% slope = 0.000514
(with a=0)
2.0
2.5
DR = b ∗ m/z
100%
1.5
‘outliers‘
1.0
0%
Influence on
slope estimation
0.0
0.5
Decimal residuals
Script 3:
3.5
Estimation of relative 13C incorporation rates
0
1000
2000
m/z
Page 18
3000
4000
5000
Estimation of relative 13C incorporation rates
1.0
1.5
2.0
2.5
3.0
SSE = min ∑ ( DR − b ∗ m/z ) 2
0.5
Decimal residuals
3.5
Linear fitting:
minimizing the error sum of the squares
0.0
Breakdown point value = 0
0
1000
2000
m/z
Page 19
3000
4000
5000
2.0
1.5
1.0
Breakdown point value set = 0.5
>50% of datapoints must
change
0.0
0
rlm() function
1000
2000
m/z
MASS package (Venable & Ripley 2002)
Page 20
robust linear
fitting
2.5
3.0
Robust linear model (Huber 1973)
M-estimator type
Iterative re-weighted least squares (IWLS;
Huber 1981, Hampel et al 1986)
0.5
Decimal residuals
3.5
Estimation of relative 13C incorporation rates
3000
4000
5000
3.0
100% slope = 0.000633
0% slope = 0.000514
(with a=0)
Your incorporation rate
is 98.3%
1.5
2.0
2.5
DR = b ∗ m/z
100%
0%
1.0
Decimal residuals
3.5
Estimation of relative 13C incorporation rates
buser − b12C
∗100
CUser % =
b13C − b12C
0.0
0.5
13
0
1000
2000
m/z
Page 21
3000
4000
5000
User data input
+
Page 22
R
Script 3
User data output
+
Page 23
R
Script 3
Sensitivity of Method
Dataset Pseudomonas putida
1. Calculated 50% and 100% 13C incorporation
2. Randomly sampled 100 times each
10-100 (steps by 10), 150, 200, 300, 500, 1000
sequences (0%,50% and 100%)
3. Statistics on estimated incorporation rate for 0%,
50% and 100%
Page 24
Incorporation [%]
Sensitivity of Method
150
140
130
120
110
100
90
80
70
60
50
100
90
80
70
60
50
40
30
20
10
0
50
40
30
20
10
0
-10
-20
-30
-40
-50
100%
~100 samples
50%
0%
10
50
100
150
200
300
Number of peptides
Page 25
500
1000
Conclusion
1. ‚Half-decimal place rule‘ useful for
the estimation of 13C incorporation
rates
http://s3.amazonaws.com/readers/2008/08/14/draddalybig_1.jpg
2. Robust linear models better suited
for fitting of highly variable user
data than MinSSE fitting
3. >100 measurements needed for
prescision <5% incorporation
estimation
Page 26
Outlook
1. Application of HPDR on DNA
2. Backcalculation to 12C-peaks
function identification
3. Include N-isotope
incoorporation
http://www.sharp.co.jp/plasmacluster-tech/en/release/images/041117_3.gif
Page 27
Acknowledgement
Nico Jemlich (UFZ)
Carsten Vogt (UFZ)
Martin von Bergen (UFZ)
Hans-Hermann Richnow (UFZ)
http://cache.gawker.com/assets/images/gizmodo/2009/01/bactsunsuet_01.jpg
Hauke Harms (UFZ)
Frank Schmidt (Uni Greifwald)
Jens Mattow (MPI Berlin)
Bernd Thiede (Uni Oslo)
R development team
Page 28