Today: Analyzing Proteomic Data

Your slides

…from 2D-PAGE gels

…from Mass spectrometry
(K1)
(K2)
What exactly is proteomic data?

Data: the raw stuff

Analysis converts data into information

Interpretation converts information to knowledge


What is knowledge converted into?
What mental process converts knowledge into it?
(see next slide)
What exactly is proteomic data?

Data: the raw stuff

Analysis converts data into information

Interpretation converts information to knowledge

What is knowledge converted into?


products
wisdom
What converts knowledge into it?

design
insight
What exactly is proteomic data?

Proteomic: adjective


having to do with proteomics
Proteomics: noun
a subject of study


What is this subject of study?
Proteomics = the study of proteomes
What exactly is proteomic data?


Proteomics = the study of proteomes
Proteome: noun


the set of proteins produced by an organism
under specified conditions
(“Complete proteome” – under any condition)

E.g. A human proteome,
an arabidopsis proteome
What exactly is proteomic data?

Proteome of an organism:
– Its proteins produced under specified conditions

Genome of an organism:
– Its set of genes
Which has more, a genome or a proteome?
What other -omes are there?
What exactly is proteomic data?

What –omes are there?

Proteome:
– set of proteins in a cell, system, or organism at a given time

Genome:
– the hereditary information, usually the DNA, of an organism

Metabolome:
– set of small molecule metabolites in an organism or other sample

Transcriptome:
– mRNA molecules present in a cell or set of cells at a given time

Textome: scientific literature of a field (“biomedical textome”)
– (12/3/06 Google 43, textome 6k, bibliome 284k)
– (9/20/10 Google 425, textome 1.65m, bibliome 19k)
– (9/14/11 Google 1630, textome 2.47m, bibliome 19.1k)
– (9/9/13 textome 117m, bibliome 23.7k, literaturome 1,070)
What exactly is proteomic data?

What –ome is a microarray good for investigating?

What –ome is a gene chip good for investigating?

See
– http://www.genomicglossaries.com/content/omes.asp for a well-documented list
– http://en.wikipedia.org/wiki/Omics for a good introduction
– http://en.wikipedia.org/wiki/Bibliome – I updated it once, you can update Wikipedia too!
– omics.org used to have an overfull list of –omes…
What exactly is proteomic data?

From omics.org:

Alignmentome: conceived before 2003. The whole set of multiple sequence
and structure alignments in bioinformatics. Alignments are the most important
representation in bioinformatics especially for homology and evolution study.
Alignome: 2003 . The whole set of string alignment algorithms such as FASTA,
BLAST and HMMER.
Alternatome: 2006. The totality of alternative spliceable elements. Suggested
by people in KOBIC and UCSC. (Alternatome.org)
Animalome: 2000 . The whole set of animals and their genetic components on
Earth. While animal kingdom traditionally means the totality of animals,
animalome indicates the system of animals, animal genes, animality, and complex
network of animal genes and proteins. Animals contain proteins that are
special. (Animalome.org)
Aniome: 2003 . The whole set of any biologically relevant things in the
universe.
Antibodyome: conceived around 2003 in association with immunolome in
artificial immune system as computational system (Jong).
Archaeome: 2002 . All the species of archae and their proteins especially.
Arenayome: [get it? Say “RNA”]
Back to “Analyzing data
from 2D-PAGE results”…

2D = 2-dimensional = on a plane

PAGE = polyacrylamide gel electrophoresis
2D-PAGE Results
source: http://proteomics.cancer.dk/2d_comparison_pic.php?id=82
Source: unknown (sorry, let me know if you have a cite)
Back to “Analyzing data
from 2D-PAGE gels”…

2D = 2-dimensional = on a plane
PAGE = polyacrylamide gel electrophoresis

Polyacrylamide



poly- = many
acrylamide = “A readily polymerized amide…. It
is a carcinogen… present in some foods,
especially starches and cereals that are cooked at
high temperatures” - The American Heritage® Dictionary of the
English Language, Fourth Edition, Copyright © 2000 by Houghton Mifflin
Company

“Only the acrylamide monomer is toxic.
Acrylamide polymers are non-toxic.”
- http://www.inchem.org/documents/pims/chemical/pim652.htm
Back to “Analyzing data
from 2D-PAGE gels”…


PAGE = polyacrylamide gel electrophoresis
Gel


Semi-solid, jelly-like substance
A gel is a colloidal system with a finite, usually rather
small, yield stress.
http://www.iupac.org/reports/2001/colloid_2001/manual_of_s_and_t/node33.html

Biochemistry. a semirigid polymer, as agarose, starch, cellulose
acetate, or polyacrylamide, cast into slabs or cylinders for the
electrophoretic separation of proteins and nucleic acids.
- Dictionary.com Unabridged (v 1.0.1) Based on the Random House Unabridged
Dictionary, © Random House, Inc. 2006.
Back to “Analyzing data
from 2D-PAGE gels”…

2D = 2-dimensional = on a plane

PAGE = polyacrylamide gel electrophoresis

Electrophoresis


Electro- = relating to electricity or electric
fields
–phoresis = suff. Transmission…[From Greek
phorēsis, a carrying…
- http://www.answers.com/topic/phoresis
2D-PAGE, how it works

How does 2-dimensional polyacrylamide
gel electrophoresis work?

It separates based on
– charge in one direction
– mass in the other direction

Two proteins may have the same charge or mass
They are much less likely to share both
So, 2D electrophoresis is better than 1D!
2D-PAGE Gels
source: http://proteomics.cancer.dk/2d_comparison_pic.php?id=82
Note how many proteins would be left unseparated
if only charge or only mass was used!
2D-PAGE, how it works II

2-dimensional polyacrylamide gel electrophoresis


separates based on charge in one direction, mass in the other
It separates based on charge first:

Every protein has a charge: –, 0, or +
– Zero qualifies as a possible charge but most aren’t zero
– Don’t know why – need to read up on my chem!

The pH of the medium changes the charge
So the gel is made with a pH gradient
– low pH (acidic) at one end
– high pH (basic) at the other end

A voltage (which creates an electric field) is applied
Charged protein molecules migrate in response
– …to where the pH is such that the protein’s charge is zero
– this is the “isoelectric” point (pI); iso- = “same”
Clever, no?
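The idea that charge depends on pH, and that each protein has one pH where its net charge is zero, can be sketched in a few lines. This is only a rough illustration: the pKa values are assumed textbook-style numbers (published scales differ by a few tenths of a pH unit), the function names are made up for this example, and real pI predictors are more careful.

```python
# Approximate pKa values for ionizable groups. These are ASSUMED,
# textbook-style numbers, not an authoritative scale.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0, "N_TERM": 9.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_TERM": 2.0}


def net_charge(sequence, ph):
    """Estimate net charge at a given pH with the Henderson-Hasselbalch relation."""
    charge = 0.0
    # Basic groups contribute up to +1 each (protonated at low pH).
    for group, pka in POSITIVE_PKA.items():
        count = 1 if group == "N_TERM" else sequence.count(group)
        charge += count / (1.0 + 10 ** (ph - pka))
    # Acidic groups contribute up to -1 each (deprotonated at high pH).
    for group, pka in NEGATIVE_PKA.items():
        count = 1 if group == "C_TERM" else sequence.count(group)
        charge -= count / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH where the estimated net charge crosses zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid  # negative (or zero): the pI lies at a lower pH
    return (lo + hi) / 2.0


# A lysine-rich peptide should settle near the basic end of the gradient,
# an aspartate-rich one near the acidic end.
print(isoelectric_point("KKKK") > isoelectric_point("DDDD"))  # True
```

Bisection works here because the estimated charge only decreases as pH rises, so there is exactly one zero crossing between pH 0 and pH 14.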
2D-PAGE, how it works III

2-dimensional polyacrylamide gel electrophoresis


separates based on charge in one direction, mass in the other
It separates based on mass in the other direction

First, denature (unfold) the proteins into long rods
– Do this with sodium dodecyl sulphate or phosphate

SDS/SDP also coats the unfolded protein with negative charges
– Amount of negative charge is proportional to length
– Length is proportional to mass
– So charge-to-mass ratio is the same across molecules

Second, apply a voltage (electric field) in the other direction
– Stop before the smallest molecules reach the edge
– Why do the smallest molecules go fastest despite less charge, so less force moving them along?
2D-PAGE Gels
source: http://proteomics.cancer.dk/2d_comparison_pic.php?id=82
Voltage (electric field) is applied first in one direction,
then later at 90 degrees (other direction)
2D-PAGE, how it works III

2-dimensional polyacrylamide gel electrophoresis

In stage 2 it separates based on mass


Unfold the proteins into long rods
Coat the unfolded protein with negative charges
– Amount of negative charge is proportional to mass
– So charge-to-mass ratio is the same across molecules
Apply an electric field
Why do the smallest molecules
– 1) have the least force applied to them
– 2) go the fastest?
Galileo credit: http://www.crystalinks.com/galileo.html
Tower credit: http://hyperionzoomlover.blogspot.com/2011/05/leaningtower-of-pisagalileos.html
2D-PAGE, how it works III

2-dimensional polyacrylamide gel electrophoresis

In stage 2 it separates based on mass

Why do the smallest molecules
– 1) have the least force applied to them
– 2) go the fastest?

Think intuitive physics
– Recall Galileo’s famous experiment
– Consider small cars, big trucks, and engine power
– Consider a tilted bed of nails, with large marbles and small ball bearings
– The demo shows the idea (not commercial grade!)
Analyzing 2D-PAGE Results



You get a plane with different proteins in different locations
Stain the proteins to get a “fingerprint”
The “fingerprint” contains spots
– location, size, shape, and intensity all vary
The result is an expression profile
We’d like to analyze this profile
Analyzing 2D-PAGE Results II



1st, stain the gel
2nd, scan the stains
Do image processing:
– subtract the background
– rate the intensity of each spot
  – what is intensity? (see image, next)
The problem is not exactly like a microarray
– what are the differences? (see image, next)
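The two image-processing steps can be sketched on a toy scan. This is a simplified illustration, not how any particular gel package does it: the background is assumed to be uniform and is estimated as the median pixel value, "intensity" is taken to be the summed corrected signal in a small disc, and the 5x5 grid is made-up data.

```python
from statistics import median


def subtract_background(image):
    """Subtract a global background estimate (the median pixel value)."""
    background = median(pixel for row in image for pixel in row)
    return [[max(pixel - background, 0) for pixel in row] for row in image]


def spot_volume(image, center, radius):
    """Sum the pixel values within `radius` of `center`, given as (row, col)."""
    r0, c0 = center
    total = 0
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if (r - r0) ** 2 + (c - c0) ** 2 <= radius ** 2:
                total += pixel
    return total


# Toy 5x5 scan (made-up numbers): uniform background of 10 plus one spot.
scan = [
    [10, 10, 10, 10, 10],
    [10, 10, 30, 10, 10],
    [10, 30, 90, 30, 10],
    [10, 10, 30, 10, 10],
    [10, 10, 10, 10, 10],
]
corrected = subtract_background(scan)
print(spot_volume(corrected, center=(2, 2), radius=1))  # 160
```

Summing over the spot area rather than reading the peak pixel is one possible answer to "what is intensity?": a broad faint spot and a small dark spot can hold similar amounts of protein.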
2D-PAGE Gels
source: http://proteomics.cancer.dk/2d_comparison_pic.php?id=82
Voltage (electric field) is applied first in one direction,
then later at 90 degrees (other direction)
Analyzing 2D-PAGE Results III


Alas, spots can overlap
How can a splotch be resolved into
separate constituent spots?

(see image again)
From Product Comparison page
http://www.biocompare.com/quickcompare/209/Fast-And-Efficient-2D-Analysis.html
(bolding added)

Two-dimensional (2D) electrophoresis generates a wealth of data and
often, a wealth of challenges as well: low abundance proteins generate
weak spots which can be difficult to detect, spots can overlap, run-to-run variations can make comparing spots from different gels a
challenge, etc. It’s a lot to account for, especially when staring at one
or more spot-laden 2D gels. The 2D gel analysis software packages
below have been developed to address these and other challenges.
These packages offer high sensitivity detection for finding weak spots,
spot splitting for separating overlapping spots, and/or warping to help
counteract run differences when comparing gels. Some packages offer
fully automated analyses where the user doesn’t even need to set
parameters; others permit the user to switch off automated analyses
and perform spot-finding and spot-matching manually. Whether you’re
looking for full automation, or just a few algorithms to help with your
2D analyses, these software packages analyze your spots quickly and
efficiently.
Example: Syngene’s Dymension product
(http://www.2dymension.com/html/dymension_1_faqs.html)
Go figure!
Analyzing 2D-PAGE Results II



Alas, spots can overlap
How can a splotch be resolved into separate constituent spots?
(see image)
– Out-of-roundness indicates more than one spot
– Can look for intensity maxima
Output: a spot list
– Each spot is a different protein
  – (Could it ever happen that two proteins are in the same spot?)
– list has spot center (x,y) coordinates
  – anything else?
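Looking for intensity maxima can be sketched as follows. This is a minimal illustration under simple assumptions: the scan is a small grid of background-corrected values (made up here), a spot is any pixel strictly brighter than all of its neighbours, and the function name and threshold are hypothetical. Commercial packages use more robust spot models.

```python
def find_spots(image, threshold):
    """Return a spot list: (row, col, intensity) for each strict local maximum."""
    spots = []
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            value = image[r][c]
            if value < threshold:
                continue  # too faint to count as a spot
            # Collect the up-to-8 surrounding pixels, respecting image edges.
            neighbours = [
                image[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                spots.append((r, c, value))
    return spots


# Toy splotch (made-up numbers): one connected blob hiding two peaks.
splotch = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 3, 5, 4, 5, 3, 0],
    [0, 5, 9, 6, 8, 5, 0],
    [0, 3, 5, 4, 5, 3, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(find_spots(splotch, threshold=5))  # [(2, 2, 9), (2, 4, 8)]
```

Each entry carries the spot center plus its peak intensity, which is one candidate for the "anything else?" a spot list might record.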
Analysis: Comparing
2 Results

Different experimental conditions can yield


…differential protein expression profiles
Differential expression can be found how?
Analyzing: Comparing
2 Results


Different experimental conditions can…
– yield differential protein expression profiles
Differential expression can be found how?
– Spots on one gel but not another
– Spots on both but with noticeably different intensities
– Spots on both but in slightly different places (T or F?)
Do differences always reflect differences in protein synthesis in the organism?
(see next slide)
Analysis: Comparing
2 Results


Different experimental conditions can…
– yield differential protein expression profiles
Can differential expression be found based on…
– spots on both but in slightly different places (?)
  – No, spots do that anyway because no 2 gels are identical
Do differences always reflect differences in protein synthesis in the organism?
– Might be that
– Might be post-translational modifications
  – phosphorylation, glycosylation
– Might reflect different destruction rates
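The two valid criteria (spots present on only one gel, and spots present on both with noticeably different intensities) can be sketched as a small comparison routine. The sketch assumes spots have already been matched between gels and given shared IDs, that intensities are positive, and that "noticeably different" means at least a two-fold change. The IDs, numbers, and threshold are all illustrative.

```python
def compare_gels(gel_a, gel_b, fold_threshold=2.0):
    """Classify matched spots by presence/absence and by intensity ratio.

    gel_a and gel_b map a (hypothetical) matched spot ID to a positive intensity.
    """
    only_a = sorted(set(gel_a) - set(gel_b))
    only_b = sorted(set(gel_b) - set(gel_a))
    changed = {}
    for spot in sorted(set(gel_a) & set(gel_b)):
        ratio = gel_b[spot] / gel_a[spot]
        # Flag changes in either direction (up or down by the fold threshold).
        if ratio >= fold_threshold or ratio <= 1.0 / fold_threshold:
            changed[spot] = ratio
    return only_a, only_b, changed


# Made-up intensities for two conditions.
control = {"s1": 100.0, "s2": 50.0, "s3": 80.0}
treated = {"s1": 110.0, "s2": 200.0, "s4": 40.0}
print(compare_gels(control, treated))  # (['s3'], ['s4'], {'s2': 4.0})
```

Spot position is deliberately not compared, since small positional shifts occur between any two gels.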
Analysis: How to Match Gels Despite
Irrelevant Position Variations?

Given two gels:
– Identify corresponding landmark spots
– Warp as needed…
  – rotate & stretch images to match spots nicely
We can then get a table of experimental conditions vs. protein expression levels
Analyze like gene expression tables:
– cluster similar-behaving proteins, etc.
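The landmark-and-warp step can be sketched with the simplest possible warp: a rotation, a uniform stretch, and a shift, fitted by least squares to the landmark pairs. That is an assumption made for brevity; real gel-matching software typically allows more flexible, local warps. The function names are made up for this example.

```python
def fit_similarity(source, target):
    """Least-squares rotation + uniform scale + shift taking source onto target.

    Points are (x, y) tuples. Treating each as a complex number z = x + iy turns
    the warp into w = a*z + b, where complex a encodes rotation and scale and
    complex b encodes the shift.
    """
    zs = [complex(x, y) for x, y in source]
    ws = [complex(x, y) for x, y in target]
    z_mean = sum(zs) / len(zs)
    w_mean = sum(ws) / len(ws)
    numerator = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in zip(zs, ws))
    denominator = sum(abs(z - z_mean) ** 2 for z in zs)
    a = numerator / denominator
    b = w_mean - a * z_mean
    return a, b


def warp(point, a, b):
    """Apply the fitted warp to one (x, y) spot position."""
    w = a * complex(*point) + b
    return (w.real, w.imag)


# Three landmark spots on gel 1 and their (made-up) positions on gel 2,
# which here is gel 1 rotated 90 degrees, doubled in size, and shifted.
landmarks_gel1 = [(0, 0), (1, 0), (0, 1)]
landmarks_gel2 = [(3, 1), (3, 3), (1, 1)]
a, b = fit_similarity(landmarks_gel1, landmarks_gel2)
print(warp((1, 1), a, b))  # approximately (1.0, 3.0)
```

Once every spot on one gel is mapped into the other gel's coordinates, nearest-neighbour pairing of spots gives the rows of the conditions-by-proteins table.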
2D-PAGE Databases


Various databases of 2D-PAGE data exist
You can even compare 2 gels visually over the Web with Flicker or CAROL
Flicker uses flickering:
– rapidly alternating the two gel images
– lets you see what changes between them
CAROL also allows comparisons in a Web browser
Other software: Delta2D, ImageMaster, Melanie, PDQuest, Progenesis and REDFIN
Analysis of
Protein Mass Spectrometry
Results
(K2)

What is the general idea of MS?
It distinguishes among particles
– they must have different mass/charge ratios
Analyzing Results of
Protein Mass Spectrometry

Results can help identify proteins, e.g. by…

Peptide-mass fingerprinting
– Break the protein into pieces with, e.g., trypsin
– The pieces are short sequences of amino acids
  – Such sequences are called peptides
– Use MS to determine the masses of these peptides
– Different proteins will have different “fingerprints”
– Match the fingerprint in a database of fingerprints
  – A good match means you’ve found the right DB record
  – Return the protein name for that record!
From http://en.wikipedia.org/wiki/Trypsin

“trypsin predominantly cleaves proteins
at the carboxyl side (or "C-terminal
side") of the amino acids lysine and
arginine, except when either is followed
by proline.”
Creating a Peptide-Mass Fingerprint
from a Sequence

What we want:
– predict the peptide-mass fingerprint given a sequence
– do it for each sequence in a database
– then find the best match to an MS experimental result
– if the match is good enough, we’ve identified the protein
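The "find the best match" step can be sketched as a simple scoring loop. This is only an illustration: the score is just the number of observed masses that fall within a tolerance of some predicted mass, whereas real search engines use statistical scores. The protein names, masses, and the 0.5 Da tolerance are made up.

```python
def count_matches(observed, predicted, tolerance=0.5):
    """Count observed masses lying within `tolerance` Da of any predicted mass."""
    return sum(
        any(abs(obs - pred) <= tolerance for pred in predicted)
        for obs in observed
    )


def best_match(observed, database, tolerance=0.5):
    """Return (name, score) for the database fingerprint matching the most masses."""
    scores = {
        name: count_matches(observed, predicted, tolerance)
        for name, predicted in database.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


# Made-up predicted fingerprints for two hypothetical proteins (masses in Da).
database = {
    "protein_A": [650.3, 912.4, 1277.7, 2100.1],
    "protein_B": [701.4, 912.4, 1500.8],
}
observed = [650.4, 1277.5, 2100.0, 1800.9]
print(best_match(observed, database))  # ('protein_A', 3)
```

Whether three matches out of four is "good enough" is a separate decision; a cutoff on the score, or on the gap to the runner-up, would be needed before calling the protein identified.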
Finding the fingerprint from the sequence

Start with a very predictable cleavage agent
Trypsin is ideal
– it cleaves a sequence after every lysine or arginine
– …unless followed by a proline
Results are tryptic peptides (why “tryptic”?)
Creating a Peptide-Mass Fingerprint
from a Sequence
Finding the fingerprint from the sequence

Start with a predictable cleavage agent

Trypsin is ideal
– it cleaves a sequence after every lysine or arginine
– …unless followed by a proline
Results are tryptic peptides (why “tryptic”?)
Example (from Westhead et al. p. 186):
sequence:
MCLTAKGAATCSATFRYLIFALSLATKPACALLASALLARACATTAVA
Where are the resulting peptide boundaries?

Hint
– Arginine – R
– Lysine – K
– Proline – P
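The cleavage rule can be checked against the example sequence with a short sketch. It assumes ideal digestion (every eligible site is cut, with no missed cleavages) and unmodified residues. The function names are made up for this example, and the residue masses are standard monoisotopic values rounded to five decimals.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Cut after every K or R unless the next residue is P (ideal digestion)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


sequence = "MCLTAKGAATCSATFRYLIFALSLATKPACALLASALLARACATTAVA"
peptides = tryptic_digest(sequence)
print(peptides)
# ['MCLTAK', 'GAATCSATFR', 'YLIFALSLATKPACALLASALLAR', 'ACATTAVA']
fingerprint = sorted(peptide_mass(p) for p in peptides)
```

The lysine followed by proline is not cut, which is why the third peptide is so long. The sorted list of peptide masses is the predicted fingerprint for this sequence.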
Matching the Fingerprint: Cautions

The same amino acids but in different orders can lead to a mistaken assessment of match
Leucine and isoleucine have the same mass
– Fortunately, for DB lookup it “does not have a practical impact” – Westhead et al. p. 187
Subsequences of Ks and Rs are cleaved randomly (RKK, RK, etc.)
– Fortunately, this rarely happens
– Makes the algorithms a little more complicated, no big deal
Etc.…(see pp. 187-188 of Westhead et al.)
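The first two cautions follow directly from how a peptide's mass is computed: it is a sum over residues, so order does not matter, and leucine and isoleucine are isomers. A small check, using standard monoisotopic residue masses for just the residues involved (the peptides themselves are made-up examples):

```python
import math

# Monoisotopic residue masses in daltons for the residues used below
# (standard values, rounded; leucine and isoleucine are isomers).
RESIDUE_MASS = {
    "A": 71.03711, "T": 101.04768, "L": 113.08406,
    "I": 113.08406, "K": 128.09496,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


# Same residues in a different order: mass cannot tell them apart.
print(math.isclose(peptide_mass("TALK"), peptide_mass("LATK")))  # True
# Leucine swapped for isoleucine: again the same mass.
print(math.isclose(peptide_mass("TALK"), peptide_mass("TAIK")))  # True
```

`math.isclose` is used instead of `==` because adding the same floating-point numbers in a different order can differ in the last bit.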
Supplementary slides
SDS-PAGE
• PAGE, PolyAcrylamide Gel Electrophoresis, provides a perfect environment for proteins of different sizes to move at different rates.
• Acrylamide can be polymerized to generate gels of varying but controlled pore size.
• The electrophoretic separation will be as follows: smaller molecules will move freely in an electric field and larger molecules will be restricted in their migration.
• Protein visualization is the last part of an SDS-PAGE, where a mixture of water: acetic acid: methanol is used to cause the proteins to be "fixed." Then, the gel is stained with Coomassie Brilliant Blue.
[Figure: top view of the tunnels formed in the PAGE]
Gel Electrophoresis Cont.
• Molecular weight markers are used to help estimate the size of DNA
fragments. These markers consist of DNA molecules of known weights.
References
http://web.utk.edu/~khughes/GEL/sld001.htm
http://www.dnalc.org/ddnalc/resou