III - EMBnet

Biologically active oligomeric
assemblies
1. Oligomeric Assemblies / Quaternary Structures
• The coordinates present in a PDB entry (e.g. solved by X-ray crystallography or NMR) do not necessarily represent the correct oligomeric assembly of the macromolecule.
• Many proteins are active as (homo- or hetero-) complexes.
• How do we determine the correct oligomeric assembly from PDB entries based on
  - NMR or
  - X-ray crystallography?
X-ray crystallography
More than 80% of protein structures are solved by means of X-ray diffraction on crystals.
An X-ray diffraction experiment produces atomic coordinates of the crystal's Asymmetric Unit (ASU); these are the coordinates stored in the PDB file.
• Unit Cell = all space-group symmetry mates of the ASU
• Crystal = translated Unit Cell
In general, neither the ASU nor the Unit Cell has any relation to Biological Units, i.e. the stable protein complexes which act as units in physiological processes.
Is there a way to infer the Biological Unit from protein crystallography data?
(slides courtesy of Eugene Krissinel & Kim Henrick, MSD-EBI)
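The relation "Unit Cell = all symmetry mates of the ASU" can be illustrated with a minimal sketch. It assumes, purely as an example, space group P2₁ (unique axis b), whose two operators in fractional coordinates are (x, y, z) and (-x, y+1/2, -z). The function names and the two-atom ASU are hypothetical.

```python
# Symmetry operators of space group P2_1 (unique axis b) as
# (rotation matrix, translation vector) acting on fractional coordinates.
# The space group is chosen only as a small example.
P21_OPERATORS = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),    # x, y, z
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.0)),  # -x, y+1/2, -z
]

def apply_operator(rot, trans, xyz):
    """Apply x' = R x + t to one fractional coordinate and wrap into [0, 1)."""
    return tuple(
        (sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i]) % 1.0
        for i in range(3)
    )

def unit_cell_from_asu(asu_frac, operators):
    """Return one symmetry copy of the ASU per operator."""
    return [[apply_operator(rot, trans, atom) for atom in asu_frac]
            for rot, trans in operators]

# Hypothetical ASU with two atoms, in fractional coordinates.
asu = [(0.10, 0.20, 0.30), (0.25, 0.40, 0.05)]
cell = unit_cell_from_asu(asu, P21_OPERATORS)
for copy in cell:
    print(copy)
```

Translating the resulting unit cell by integer lattice vectors would then generate the crystal.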
Crystal interfaces
Stability of protein complexes depends on properties of protein-protein interfaces, such as
• free energy of formation ΔG_int
• solvation energy gain ΔG_S
• interface area
• hydrogen bonds and salt bridges across the interface
• hydrophobic specificity
Interface assessment
A crystal may be viewed as a packing of assemblies with biologically insignificant contacts between them.
A protein assembly is a packing of monomeric units with biologically relevant interfaces between them.
At first glance …
… the solution is as simple as 1-2:
1. Evaluate all protein contacts (interfaces) in the crystal
2. Leave only the strongest ("biologically relevant") ones
What remains has a chance of being a stable protein complex.
Small technical problem:
How to discriminate between "real" (biologically relevant) and "superficial" (inter-assembly, or crystal packing) interfaces?
1.1. MSD-PISA
Real and superficial protein interfaces
The most often used discrimination criterion is interface area.
A cut-off at 900 Å² gives about 80% success rate of discrimination between monomers and dimers.
Big proteins would always be sticky if this criterion were true …
[Figure: buried ASA (Å²) per PDB entry, for dimers and monomers]
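A single-parameter cut-off of this kind amounts to a one-line classifier. The sketch below shows how a success rate would be computed for the 900 Å² cut-off. The function names are hypothetical and the sample values are made up for illustration; they are not the data behind the plot.

```python
AREA_CUTOFF = 900.0  # Å^2, the cut-off quoted on the slide

def classify_by_area(buried_asa, cutoff=AREA_CUTOFF):
    """Call a structure a dimer if its buried ASA reaches the cut-off."""
    return "dimer" if buried_asa >= cutoff else "monomer"

def success_rate(samples, cutoff=AREA_CUTOFF):
    """Fraction of (buried ASA, known state) pairs classified correctly."""
    hits = sum(classify_by_area(area, cutoff) == state for area, state in samples)
    return hits / len(samples)

# Made-up (buried ASA in Å^2, known state) pairs, for illustration only.
samples = [
    (350.0, "monomer"), (620.0, "monomer"), (1100.0, "monomer"),
    (800.0, "dimer"), (1500.0, "dimer"), (2400.0, "dimer"),
]
print(f"success rate: {success_rate(samples):.2f}")
```

The two misclassified samples (a monomer with a large crystal contact and a dimer with a small interface) are exactly the cases that limit any area-only criterion.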
Real and superficial protein interfaces
Free energy gain of interface formation.
A cut-off at -8 kcal/mol gives about 82% success rate of discrimination between monomers and dimers.
Can an energy measure be uniform for all weights and shapes?
[Figure: free energy gain (kcal/mol) per PDB entry, for dimers and monomers]
Real and superficial protein interfaces
P-value of hydrophobic patches: a measure of the probability for the interface to be more hydrophobic than found.
A cut-off at 0.2 gives about 60% success rate of discrimination between monomers and dimers.
[Figure: P-value of hydrophobic patch per PDB entry, for dimers and monomers]
Real and superficial protein interfaces
• No ultimate discriminating parameter for the identification of biologically relevant protein interfaces may be proposed at present, even for dimeric complexes.
  Jones, S. & Thornton, J.M. (1996) Principles of protein-protein interactions. Proc. Natl. Acad. Sci. USA 93, 13-20.
• Formation of N>2-meric complexes is most probably a collective process involving a set of interfaces. Therefore the significance of an interface should not be detached from the context of the protein complex.
Making assemblies from significant interfaces
Despite the failure to find an ultimate measure of the biological relevance of an interface, two approaches were developed that use scoring of individual interfaces:
• PQS server @ MSD-EBI (Kim Henrick), Trends in Biochem. Sci. (1998) 23, 358
  Method: progressive build-up by addition of monomeric chains that suit the selection criteria. The results are partly curated.
• PITA software @ Thornton group, EBI (Hannes Ponstingl), J. Appl. Cryst. (2003) 36, 1116
  Method: recursive splitting of the largest complexes as allowed by crystal symmetry. The termination criterion is derived from the individual statistical scores of crystal contacts. The results are not curated.
Chemical stability of protein complexes
• It is not the properties of individual interfaces but rather the chemical stability of the protein complex in general that really matters
• Protein chains will most likely associate into the largest complexes that are still stable
• A protein complex is stable if its free energy of dissociation is positive:
$$\Delta G_{diss} = -\Delta G_{int} - T\Delta S > 0$$
How to calculate ΔG_diss?
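Before turning to the individual terms, the stability criterion itself can be written down in a few lines. This is only a sketch: the function names are hypothetical and the numbers in the usage example are made up, with all energies in kcal/mol.

```python
def dissociation_free_energy(dg_int, t_ds):
    """ΔG_diss = -ΔG_int - TΔS, with both inputs in kcal/mol."""
    return -dg_int - t_ds

def is_stable(dg_int, t_ds):
    """A complex counts as stable if its ΔG_diss is positive."""
    return dissociation_free_energy(dg_int, t_ds) > 0.0

# Made-up values: a strong interface and a weak one, same entropy cost.
for dg_int in (-25.0, -8.0):
    dg_diss = dissociation_free_energy(dg_int, t_ds=13.0)
    print(dg_int, dg_diss, is_stable(dg_int, t_ds=13.0))
```

The binding energy ΔG_int has to outweigh the entropy gained on dissociation, which is why a favourable interface alone does not guarantee a stable complex.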
Protein affinity
$$\Delta G_{int} = \Delta G_s(A_1, A_2 \ldots A_n) - \sum_{i=1}^{n} \Delta G_s(A_i) - E_{hb} N_{hb} - E_{sb} N_{sb}$$
• ΔG_s(A_1, A_2 … A_n): solvation energy of the protein complex
• ΔG_s(A_i): solvation energies of the dissociated subunits
• E_hb: free energy of H-bond formation
• N_hb: number of H-bonds between dissociated subunits
• E_sb: free energy of salt bridge formation
• N_sb: number of salt bridges between dissociated subunits
ΔG_int is a function of protein interfaces.
Choice of dissociation subunits: dissociation into stable subunits with minimum ΔG_diss, e.g. (A1 A2 A3) into A1 + A2 + A3.
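The affinity expression translates directly into code. The sketch below uses the fitted values E_hb = 0.51 and E_sb = 0.21 kcal/mol quoted on the benchmark slide of this deck. The function name and the input numbers in the usage example are hypothetical.

```python
E_HB = 0.51  # kcal/mol per H-bond, fitted value quoted on the benchmark slide
E_SB = 0.21  # kcal/mol per salt bridge, fitted value quoted on the benchmark slide

def interaction_free_energy(dgs_complex, dgs_subunits, n_hb, n_sb,
                            e_hb=E_HB, e_sb=E_SB):
    """ΔG_int = ΔG_s(complex) - sum ΔG_s(subunits) - E_hb*N_hb - E_sb*N_sb."""
    return dgs_complex - sum(dgs_subunits) - e_hb * n_hb - e_sb * n_sb

# Made-up solvation energies (kcal/mol) and bond counts for a dimer.
dg_int = interaction_free_energy(
    dgs_complex=-210.0,
    dgs_subunits=[-100.0, -98.0],
    n_hb=10,
    n_sb=4,
)
print(f"ΔG_int = {dg_int:.2f} kcal/mol")
```

Each H-bond and salt bridge across the interface makes ΔG_int more negative, i.e. the association more favourable.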
Solvation free energy
$$\Delta G_s(A) = \sum_k \Delta\sigma_k \left(a_k - a_k^r\right)$$
• Δσ_k: atomic solvation parameters
• a_k: atom's accessible surface area
• a_k^r: atom's accessible surface area in the reference (unfolded) state
[Figure: atom k at the protein/solvent boundary]
Eisenberg, D. & McLachlan, A.D. (1986) Nature 319, 199-203.
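The sum over atoms can be sketched as follows. The atomic solvation parameters in the table are placeholder values of a plausible magnitude, chosen for illustration only; they are not taken from the slides and should not be read as the parameters PISA uses. The atom-type keys and the function name are likewise hypothetical.

```python
# Placeholder atomic solvation parameters in kcal/(mol*Å^2), keyed by a
# hypothetical atom-type label. Illustrative values only.
ASP = {"C": 0.016, "N/O": -0.006, "S": 0.021}

def solvation_free_energy(atoms, asp=ASP):
    """ΔG_s = sum over atoms of Δσ_k * (a_k - a_k^r).

    atoms: iterable of (atom_type, asa_in_structure, asa_in_reference_state),
    with both areas in Å^2.
    """
    return sum(asp[atom_type] * (asa - asa_ref)
               for atom_type, asa, asa_ref in atoms)

# Made-up atoms: carbon and sulfur mostly buried, a polar atom partly exposed.
atoms = [("C", 5.0, 40.0), ("N/O", 12.0, 30.0), ("S", 0.0, 25.0)]
print(f"ΔG_s = {solvation_free_energy(atoms):.3f} kcal/mol")
```

With positive parameters for apolar atoms, burying apolar surface lowers ΔG_s, while burying polar surface raises it.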
Entropy of macromolecules in solutions
$$S = S_{trans}(m) + S_{rot}(\hat{I}, \sigma_S) + S_{surf}(a)$$
• Translational entropy: $S_{trans}(m) = c_t + \frac{3}{2} R \log(m)$, where m is the mass
• Rotational entropy: $S_{rot}(\hat{I}, \sigma_S) = c_r + \frac{R}{2} \log\left(I_1 I_2 I_3 \sigma_S^2\right)$, where $\hat{I}$ is the tensor of inertia with principal moments I_1, I_2, I_3 and σ_S is the symmetry number
• Sidechain entropy: $S_{surf}(a) = F a$, where a is the solvent-accessible surface area
c_t, c_r and F are semi-empirical parameters.
Murray C.W. and Verdonk M.L. (2002) J. Comput.-Aided Mol. Design 16, 741-753.
Entropy of dissociation
$$\Delta S = \sum_{i=1}^{n} S(A_i) - S(A_1, A_2 \ldots A_n)$$
$$= (n-1)C + \frac{3}{2} R \log\left(\frac{\prod_i m_i}{\sum_i m_i}\right) + \frac{R}{2} \log\left(\frac{\prod_i \left[\sigma_S^2(A_i) \prod_k I_k(A_i)\right]}{\sigma_S^2(A_1 \ldots A_n) \prod_k I_k(A_1 \ldots A_n)}\right) + F a_{buried}$$
• m_i: mass of the i-th subunit
• I_k(A_i): k-th principal moment of inertia of the i-th subunit
• C, F: fitted parameters
ΔS is a function of the protein complex.
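A sketch of the entropy term, multiplied by T so that the fitted values T·C = 11.7 kcal/mol and T·F = 0.57·10⁻³ kcal/(mol·Å²) from the benchmark slide can be used directly. Several things here are assumptions: the temperature of 300 K, the unit conventions for mass and moments of inertia (the slides state neither, and the fitted constant C depends on them), and all function names and input values. The resulting number is therefore illustrative only.

```python
import math

R = 1.987e-3    # gas constant, kcal/(mol*K)
T = 300.0       # K; assumed, the slides do not state a temperature
T_C = 11.7      # kcal/mol, fitted constant entropy term T*C
T_F = 0.57e-3   # kcal/(mol*Å^2), fitted surface entropy factor T*F

def rot_term(moments, sym):
    """log(I1 * I2 * I3 * sigma_S^2) for one rigid body."""
    return math.log(math.prod(moments) * sym ** 2)

def t_delta_s(subunits, complex_moments, complex_sym, a_buried):
    """T*ΔS of dissociation (kcal/mol); subunits are (mass, moments, sym)."""
    n = len(subunits)
    masses = [mass for mass, _, _ in subunits]
    # log(prod m_i / sum m_i), computed as a difference of logs
    log_mass = sum(math.log(m) for m in masses) - math.log(sum(masses))
    # rotational terms of the subunits minus that of the complex
    log_rot = (sum(rot_term(mom, sym) for _, mom, sym in subunits)
               - rot_term(complex_moments, complex_sym))
    return ((n - 1) * T_C + 1.5 * R * T * log_mass
            + 0.5 * R * T * log_rot + T_F * a_buried)

# Hypothetical homodimer: masses in Da, moments in arbitrary consistent units.
monomer = (20000.0, (2.0e6, 3.0e6, 4.0e6), 1)
value = t_delta_s([monomer, monomer], (9.0e6, 1.2e7, 1.5e7), 2, a_buried=1600.0)
print(round(value, 2))
```

Subtracting this value from -ΔG_int gives ΔG_diss for the chosen dissociation pattern.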
How to identify an assembly in a crystal?
We now know (or we think that we know) how to evaluate the chemical stability of protein complexes.
Given a 3D arrangement of protein chains, we can now say whether this arrangement has a chance of being a stable assembly, or biological unit.
But how do we get potential assemblies in the first place?
Enumerating assemblies in crystal
• the crystal is represented as a periodic graph with monomeric chains as vertices and interfaces as edges
• each set of assemblies is identified by its engaged interface types
• all assemblies may be enumerated by a backtracking scheme engaging all possible combinations of different interface types

Example: crystal with 3 interface types

Assembly set   Engaged interface types
1              000   - only monomers
2              001   - dimer N1
3              010   - dimer N2
4              011
5              100   - dimer N3
6              101
7              110
8              111   - all crystal
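The numbering in the example table is simply a count over all on/off combinations of interface types. A brute-force sketch (without any of the geometry checks a real implementation needs; the function name is hypothetical) reproduces it:

```python
from itertools import product

def enumerate_assembly_sets(n_interface_types):
    """Yield (set number, bit string) for every combination of engaged types.

    A '1' at position i means interface type i is engaged, '0' that it is not.
    """
    combos = product("01", repeat=n_interface_types)
    for number, bits in enumerate(combos, start=1):
        yield number, "".join(bits)

for number, engaged in enumerate_assembly_sets(3):
    print(number, engaged)
```

The number of combinations doubles with every additional interface type, which is why plain enumeration stops being practical for real crystals.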
Clever backtracking
The number of different interface types may reach a hundred. The algorithm is not going to complete backtracking of 2^100 combinations unless it is clever enough to
• check geometry and engage induced interfaces as soon as they emerge
  [Figure: engaged interfaces and the interface they induce]
• check geometry and terminate backtracking if the assembly contains two identical chains in parallel orientations; otherwise the assembly will be infinite due to translation symmetry in the crystal
• see the future and terminate backtracking if there are no stable assemblies down the current branch of the recursion tree; this is based on the observation that the entropy of dissociation of unstable assemblies only increases down the recursion tree
… only then does the algorithm complete in 0.1 seconds to 1.5 hours, depending on the structure …
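The effect of terminating a branch early can be shown with a generic pruned backtracking sketch. It is not the PISA algorithm: the geometry and stability checks are replaced by a caller-supplied, hypothetical predicate `is_feasible`, and the rule used in the usage example is made up.

```python
def backtrack(n_types, is_feasible, engaged=()):
    """Depth-first enumeration of interface-type subsets with pruning.

    is_feasible(engaged) is a hypothetical caller-supplied check; when it
    returns False, the subset and everything below it in the recursion tree
    is skipped without being visited.
    """
    if not is_feasible(engaged):
        return
    yield engaged
    start = engaged[-1] + 1 if engaged else 0
    for interface_type in range(start, n_types):
        yield from backtrack(n_types, is_feasible, engaged + (interface_type,))

# Made-up rule: engaging types 0 and 1 together is taken to give an
# infinite assembly, so every subset containing both is pruned.
def no_infinite_assembly(engaged):
    return not (0 in engaged and 1 in engaged)

visited = list(backtrack(3, no_infinite_assembly))
print(visited)
```

With three interface types, the subset (0, 1, 2) is never even generated because its parent (0, 1) was rejected; with a hundred types, this kind of early termination is what keeps the search tractable.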
PISA workflow summary
1. Calculate properties of all structures
2. Calculate all crystal contacts and their properties
3. Find all assemblies which are possible in the given crystal
4. Evaluate all assemblies for chemical stability and leave only potentially stable ones
5. Rank assemblies by their chances of being a biological unit
Benchmark results
Assembly classification on the benchmark set of 218 structures published in
Ponstingl, H., Kabir, T. and Thornton, J. (2003) Automatic inference of protein quaternary structures from crystals. J. Appl. Cryst. 36, 1116-1122.

        1mer   2mer    3mer   4mer   6mer   Other   Sum      Correct
1mer    50     4       0      1      0      0       55       91%
2mer    6      68+11   0      2+1    0      0       76+12    90%
3mer    1      0       22     0      1      0       24       92%
4mer    2      3       0      27+6   0      0       32+6     87%
6mer    0      0       0      1      10+2   0       11+2     92%
Total                                               198+20   90%

198+20 <=> 198 homomers and 20 heteromers

Fitted parameters:
1. Free energy of an H-bond: E_hb = 0.51 kcal/mol
2. Free energy of a salt bridge: E_sb = 0.21 kcal/mol
3. Constant entropy term: T·C = 11.7 kcal/mol
4. Surface entropy factor: T·F = 0.57·10^-3 kcal/(mol·Å²)
Classification error in ΔG_diss: ± 5 kcal/mol
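The "Correct" column and the overall 90% follow from the counts in the benchmark table: the diagonal entry of each row divided by the row sum. The sketch below recomputes them, with homomers and heteromers added together (e.g. 68+11 entered as 79). The variable and function names are hypothetical.

```python
COLUMNS = ["1mer", "2mer", "3mer", "4mer", "6mer", "Other"]

# Counts from the benchmark table, homomers + heteromers summed per cell.
TABLE = {
    "1mer": [50, 4, 0, 1, 0, 0],
    "2mer": [6, 79, 0, 3, 0, 0],
    "3mer": [1, 0, 22, 0, 1, 0],
    "4mer": [2, 3, 0, 33, 0, 0],
    "6mer": [0, 0, 0, 1, 12, 0],
}

def row_accuracy(label):
    """Diagonal count of a row divided by the row sum."""
    row = TABLE[label]
    return row[COLUMNS.index(label)] / sum(row)

def overall_accuracy():
    """Sum of diagonal counts divided by the total number of structures."""
    correct = sum(TABLE[label][COLUMNS.index(label)] for label in TABLE)
    total = sum(sum(row) for row in TABLE.values())
    return correct / total

per_row = {label: round(100 * row_accuracy(label)) for label in TABLE}
print(per_row)   # {'1mer': 91, '2mer': 90, '3mer': 92, '4mer': 87, '6mer': 92}
print(round(100 * overall_accuracy()))   # 90
```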
What is beyond the benchmark set?
Classification results obtained for 366 recent depositions into the PDB, in reference to manual classification in MSD-EBI:

        1mer   2mer    3mer   4mer   5mer   6mer   8mer   10mer  12mer  Other  Sum      Correct
1mer    131    11      0      4      0      2      2      0      0      0      150      87%
2mer    12+6   88+12   1      4      0      1      2      0      0      0      105+21   79%
3mer    1      2       6+2    0      0      1      0      0      0      0      7+5      67%
4mer    1+1    5+2     0      25+5   0      0      1+2    0      0      0      32+10    71%
5mer    1      0       0      0      2+1    0      0      0      0      0      2+2      75%
6mer    [a]    [a]     [a]    [a]    [a]    13+2   0      0      0      0      15+4     79%
8mer    [b]    [b]     0      0      0      0      0+2    0      0      0      1+2      67%
10mer   0      0       0      0      0      0      0      2      0      0      2        100%
12mer   2      0       0      0      0      0      0      0      5+1    0      7+1      75%
Total                                                                          321+45   81%

[a] These five cells hold 2+1, 0, 0, 0 and a single 1; their column positions are not recoverable from the source.
[b] One of these two cells holds 1 and the other 0; which is which is not recoverable from the source.

321+45 <=> 321 homomers and 45 heteromers
Classification error in ΔG_diss: ± 5 kcal/mol
Is it ever going to be 100%?
Nobody should be that naive, because:
• theoretical models for protein affinity and entropy change upon protein complexation are primitive
• coordinate (experimental) data is of limited accuracy
• there is no feasible way to take conformations in the crystal into account
• experimental data on multimeric states is very limited and not always reliable, so calibration of parameters is difficult
• protein assemblies may exist in some environments and dissociate in others, so a definite answer is simply not there
Web-server PISA
http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html
And what about Protein / DNA complexes?
• Support for Protein-DNA/RNA and DNA/RNA-DNA/RNA interactions was added in PISA 1.05 (17/02/2006)
Conclusions
• Stable protein complexes, which are likely to be biological units, may be calculated from protein crystallography data at an 80-90% success rate
• The biological relevance of a particular protein interface cannot be reliably inferred from the interface properties alone. Instead, the significance of an interface should be judged from the analysis of the relevant protein assemblies