Protein Domain - Kyoto University Bioinformatics Center

Kyushu University Intensive Lecture in Mathematical Sciences
Comparison, Analysis, and Control of
Biological Networks (3)
Domain-Based Mathematical Models for
Protein Evolution
Tatsuya Akutsu
Bioinformatics Center
Institute for Chemical Research
Kyoto University
Contents



A simple evolutionary model of protein
domains
A domain-based model of protein-protein
interaction networks
An evolutionary model of multi domain
proteins
Motivation of Our Studies
Explaining observed distributions on
proteins and PPI networks




PPI networks are scale-free [Jeong et al., 2001]
#(proteins having k domain families) follows exponential
distribution [Koonin et al., 2002]
#(proteins having k domains) follows power-law [Koonin et al.,
2002]

#(domains appearing in k proteins) follows power-law
[Wuchty, 2001]
Providing simple evolutionary models


In reality, what evolves is not the network
but the genes/proteins
An Evolutionary Model of
Protein Domains
J.C. Nacher, M. Hayashida and T. Akutsu: Physica A, 367, 538-552, 2006
Protein Domain
Domain: Well-defined region within a
protein that either performs a specific
function or constitutes a stable unit
Protein consisting of
3 domains
Evolutionary Model of Protein Domains
N proteins, each
one consists of
only one domain
(domains are different from each other)
We repeat T times the following steps:
a) With probability (1-a) we create a new protein with new domain
(MUTATION)
b) Otherwise, we randomly select one protein and make a copy of it
(PROTEIN DUPLICATION)
We assume that each protein consists of only one domain
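The two steps above can be sketched as a short simulation. This is a minimal illustration, not the authors' code: the function names, the seed handling, and the parameter values (10 initial proteins, 5000 steps, a = 0.9) are assumptions chosen only for the example.

```python
import random
from collections import Counter

def simulate_domain_model(n_initial, steps, a, seed=0):
    """Return a list holding the domain ID of every protein after `steps` steps."""
    rng = random.Random(seed)
    # Initially protein j consists of its own distinct domain j.
    proteins = list(range(n_initial))
    next_domain = n_initial
    for _ in range(steps):
        if rng.random() < a:
            # PROTEIN DUPLICATION: copy a uniformly chosen existing protein.
            proteins.append(rng.choice(proteins))
        else:
            # MUTATION (probability 1 - a): new protein with a new domain.
            proteins.append(next_domain)
            next_domain += 1
    return proteins

def domain_count_histogram(proteins):
    """Map k -> number of domains that appear in exactly k proteins."""
    k_per_domain = Counter(proteins)
    return Counter(k_per_domain.values())

# Illustrative parameter values (assumptions, not taken from the slides).
proteins = simulate_domain_model(n_initial=10, steps=5000, a=0.9, seed=1)
Q = domain_count_histogram(proteins)
for k in sorted(Q)[:5]:
    print(k, Q[k])
```

Plotting Q against k on log-log axes is one way to check by eye whether the histogram looks like a power law for a given a.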
Model (continued)
[Figure: at each of the T steps, duplication of a protein occurs with probability a and mutation with probability 1-a; a ~ 1.0]



i : i-th kind of domain
k_i : number of proteins consisting of the i-th domain
t_i : time when the i-th domain was first created
Q(k) : number of domains each of which appears in k proteins

$$\frac{dk_i}{dt} = a\,\frac{k_i}{t}, \qquad k_i = c\left(\frac{t}{t_i}\right)^{a}, \qquad Q(k) \propto k^{-[1+(1/a)]}$$

(As in Barabási & Albert, 1999)
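The step from the growth law to the exponent can be filled in with the usual mean-field argument. This is a sketch under assumptions not spelled out on the slide: the number of proteins at time t is approximately t, the creation times t_i are uniformly distributed over [0, t], and every domain starts from the same initial count c.

```latex
\frac{dk_i}{dt} = a\,\frac{k_i}{t}
\;\Longrightarrow\;
k_i(t) = c\left(\frac{t}{t_i}\right)^{a}

\Pr\bigl[k_i(t) < k\bigr]
  = \Pr\!\left[t_i > t\left(\frac{c}{k}\right)^{1/a}\right]
  = 1 - \left(\frac{c}{k}\right)^{1/a}

Q(k) \;\propto\; \frac{d}{dk}\Pr\bigl[k_i(t) < k\bigr]
  = \frac{c^{1/a}}{a}\,k^{-\left[1 + \frac{1}{a}\right]}
```

For a close to 1 the exponent is close to 2.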
Model of Protein Evolution
[Figure: protein duplication (prob. = a) and mutation (prob. = 1-a)]
Explanation of Q(k)
[Figure: six types of domains (1 to 6) and the types of proteins that contain them]
$k_1 = 1,\ k_2 = 3,\ k_3 = 2,\ k_4 = 2,\ k_5 = 2,\ k_6 = 1$
$Q(1) = 2/6,\ Q(2) = 3/6,\ Q(3) = 1/6,\ Q(4) = Q(5) = \cdots = 0$
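The example values can be recomputed directly from the k_i. The snippet below is only an illustration; the variable names are made up here, and Q is normalized by the number of domain types, as in the example.

```python
from collections import Counter
from fractions import Fraction

# k[i] = number of proteins containing the i-th domain (values from the example).
k = {1: 1, 2: 3, 3: 2, 4: 2, 5: 2, 6: 1}

# Count how many domains share each value of k, then divide by the number of domains.
counts = Counter(k.values())
Q = {value: Fraction(counts[value], len(k)) for value in range(1, 6)}

for value, fraction in Q.items():
    print(value, fraction)
```

`Fraction` reduces its result automatically, so 3/6 is stored as 1/2.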
Our Model vs. Preferential Attachment

Similarity




#(proteins with the i-th domain) ⇔ degree of the i-th node
Duplication of protein with the i-th domain ⇔ Attachment of an edge to
the i-th node
Mutation (creation of a protein with a new domain) ⇔ Addition of a new
vertex
Difference: $k^{-[1+(1/a)]}$ vs. $k^{-3}$
[Figure: example with PD(1)=3, PD(2)=1, PD(3)=1; duplication (prob. a, a ~ 1.0) corresponds to a new edge, mutation (prob. 1-a) to a new node]
A Domain-Based Model of Protein-Protein Interaction Networks
J.C. Nacher, M. Hayashida and T. Akutsu: BioSystems, 95, 155-159, 2009
A Domain-Based Model of Protein-Protein Interactions
[Sprinzak & Margalit 2001, Deng et al. 2002]

Proteins interact ⇔ There exist interacting domain pair(s)
[Figure: domain-domain interactions induce protein-protein interactions between the proteins containing those domains; labels A, B, C, D and X, Y, Z]
Combination of Domain Evolution Model and
Domain-based Protein-Protein Interaction Model
Evolutionary model of protein domains
$P_D(k) \propto k^{-[1+(1/a)]}$
Random interaction of domains
$\Pr(D_i \text{ interacts with } D_j) = \text{const.}$
Domain-based protein-protein interactions





Proteins interact ⇔ There exist interacting domain pair(s)
Scale-free property of the PPI (protein-protein interaction) network
$P_{PPI}(k) \propto k^{-[1+(1/a)]}$
Mathematical Analysis
[Figure: domain A appears in n_A = x = 3 proteins and domain B in n_B = y = 2 proteins; the 3 proteins have degree 2]
However, if the number of domain-domain interactions is large,
the degree distribution approaches a normal distribution because of the
central limit theorem
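The small example in the figure can be checked with a few lines of code. This sketch assumes single-domain proteins and reads the figure as domain A (in 3 proteins) interacting with domain B (in 2 proteins); the function name and data layout are illustrative only.

```python
from itertools import combinations

def protein_degrees(protein_domains, domain_interactions):
    """Degree of each protein when two proteins interact iff their domains interact.

    protein_domains: one domain label per (single-domain) protein.
    domain_interactions: set of frozensets, each an interacting domain pair.
    """
    degree = [0] * len(protein_domains)
    for p, q in combinations(range(len(protein_domains)), 2):
        pair = frozenset((protein_domains[p], protein_domains[q]))
        if pair in domain_interactions:
            degree[p] += 1
            degree[q] += 1
    return degree

# Assumed reading of the figure: n_A = 3, n_B = 2, and A interacts with B.
proteins = ["A", "A", "A", "B", "B"]
interactions = {frozenset(("A", "B"))}
degrees = protein_degrees(proteins, interactions)
print(degrees)
```

Each protein containing A is linked to every protein containing B, so its degree equals n_B, and vice versa. In this way the copy-number distribution of domains carries over to protein degrees when each domain has few interaction partners.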
An Evolutionary Model of Multi
Domain Proteins
J.C. Nacher, M. Hayashida and T. Akutsu: BioSystems, 101:127-135, 2010.
Domain Fusion and Internal Duplication (1)


1. Internal Duplication
   Duplication of one or more domains inside one protein
2. Domain Fusion
   Two proteins are merged
[Figure: protein duplication, mutation, internal domain duplication, and domain fusion]
Modeling of Duplication, Mutation and Fusion (1)




Ni(t) : #proteins having i domains at time t
pm : prob. mutation (creation of new protein) occurs
pd : prob. duplication occurs
pf : prob. fusion occurs
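A simulation along the following lines can be used to look at Ni(t) numerically. It is a rough sketch under several assumptions that the slides do not state: exactly one event occurs per step with pm + pd + pf = 1, a mutation creates a single-domain protein, duplication copies a uniformly chosen protein, and fusion replaces two uniformly chosen proteins by one protein holding all of their domains. The function name and parameter values are hypothetical.

```python
import random
from collections import Counter

def simulate_mutation_duplication_fusion(steps, p_m, p_d, p_f, seed=0):
    """Return a Counter mapping i -> number of proteins having i domains."""
    if abs(p_m + p_d + p_f - 1.0) > 1e-9:
        raise ValueError("p_m + p_d + p_f must equal 1 in this sketch")
    rng = random.Random(seed)
    sizes = [1]  # number of domains in each protein; start with one protein
    for _ in range(steps):
        r = rng.random()
        if r < p_m:
            # Mutation: a new protein with a single domain (assumption).
            sizes.append(1)
        elif r < p_m + p_d:
            # Duplication: copy a uniformly chosen protein.
            sizes.append(rng.choice(sizes))
        elif len(sizes) >= 2:
            # Fusion: merge two distinct proteins into one (assumption:
            # the two originals are removed; skipped if fewer than two exist).
            i, j = rng.sample(range(len(sizes)), 2)
            merged = sizes[i] + sizes[j]
            for idx in sorted((i, j), reverse=True):
                sizes.pop(idx)
            sizes.append(merged)
    return Counter(sizes)

# Illustrative probabilities only.
N = simulate_mutation_duplication_fusion(steps=20000, p_m=0.3, p_d=0.5, p_f=0.2, seed=1)
for i in sorted(N)[:8]:
    print(i, N[i])
```

Only the number of domains per protein is tracked here, which is all that Ni(t) needs; tracking domain identities would be required to count domain families.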
Modeling of Duplication, Mutation and Fusion (2)
By letting ni(t) =Ni(t) /t and ni = ni(t) for t→∞
Modeling of Duplication, Mutation and Fusion (3)
Using a generating function, we obtain an exact solution
Using Stirling’s approximation
This shows that nk follows an almost exponential distribution
Modeling of Internal Duplication
By letting ni(t) =Ni(t) /t and ni = ni(t) for t→∞
nk follows
a power law
Combination of Mutation, Fusion, Internal/External Duplications
Difficult to solve
⇒ Computer simulation
Summary


A simple (simplest?) model of protein domain
evolution, which explains the power-law distribution
A domain-based model of protein-protein
interaction network
⇒ Explains power-law property of PPI
⇒ Good agreement between simulation and real data
⇒ Simpler than existing models (e.g., duplication-divergence)

An evolutionary model of multi-domain proteins
⇒ #(proteins having k domain families) follows an exponential distribution
⇒ #(proteins having k domains) follows a power law
⇒ Good agreement between simulation and real data
⇒ Importance of role of internal duplications