HPCWalesUserGroupPresentation-22Oct14

HPC User Presentation
Bela Paizs, Bangor University
HPC Wales User Group Meeting
Wednesday 22nd October 2014
Computational Proteomics
Bela Paizs
School of Chemistry, Bangor University, Bangor, LL57 2UW, UK
Overview
• Proteomics
• Mass spectrometry in proteomics
• HPC & proteomics
Why is Research on Proteins Important?
Determination of the genome (hereditary information encoded in
DNA) is only the first step towards understanding biology.
Why is Research on Proteins Important?
Determination of the genome (hereditary information encoded in
DNA) is only the first step towards understanding biology.
The same genome
different expressed proteins
Pictures: http://www.edupic.net/index.html
Typical Protein Identification Workflow
Digestion
Mass
spectrometry
Protein
mixture
Peptide
mixture
Collision Induced Dissociation (CID)
Ion
creation
Ion isolation
selection
CID with
an inert gas
Activated ion
fragments
Detection of
fragment ions
MS2
MS
‡
Detector
Two or more stages of MS:
tandem mass spectrometry
Typical Protein Identification Workflow
Digestion
Mass
spectrometry
Protein
mixture
Peptide
mixture
Compare
& rank
120
120
100
100
Fragmentation
model
Intensity
Candidate
peptide
sequences
80
60
Intensity
Peptide
m/z
80
60
40
40
20
20
0
0 0
100
0
Protein
database
200
100
300
m/z 200
400
300
400
m/z
Theoretical (in silico)
spectra
Current Sequencing Software Performs Poorly
~50 % unassigned
Related biological information is lost
~50 % assigned, but with some error rate
Current Sequencing Software Performs Poorly
~50 % unassigned
Related biological information is lost
~50 % assigned, but with some error rate
NewsFocus
´Proteomics Ponders Prime Time´
„... most errors in MS-based analyses arise not because the
technology cannot spot the proteins researchers are looking
for but because software programs often misidentify them.“
Science, 26 September 2008, page 1758.
Typical Protein Identification Workflow
Digestion
Mass
spectrometry
Protein
mixture
Peptide
mixture
Compare
& rank
120
120
100
100
Fragmentation
model
Intensity
Candidate
peptide
sequences
80
60
Intensity
Peptide
m/z
80
60
40
40
20
20
0
0 0
100
0
Protein
database
200
100
300
m/z 200
400
300
400
m/z
Theoretical (in silico)
spectra
Need for Deeper Understanding of Peptide
Fragmentation
• Current software uses oversimplified
fragmentational models
• We need to develop a new generation of score
functions that utilizes advanced peptide
fragmentation chemistry
• Fragmentation chemistry is computed at HPC
Wales