BIOINF 4120
Bioinformatik II - Structures and Systems
Oliver Kohlbacher
SS 2010
21. Network Structure,
PPI Networks
Abt. Simulation biologischer Systeme
WSI/ZBIT, Eberhard Karls Universität Tübingen
Overview
• Network structure
– Statistical measures of networks
– Types of networks
– Models of network evolution
• Protein-Protein Interaction Networks and their
properties
– Recap: Experimental methods
– Structure of PPI networks
• Going full circle: systems biology and protein
structure
– 10,000 interactions
– Validating PPI networks using structure
Network Structure
• Biological networks possess a structure related to
their function
• We can distinguish several types of networks
– Random networks
– Scale-free networks
– Hierarchical networks
– …
• These networks differ in their topological structures
• These differences can be assessed using graph-theoretical/statistical measures of the graph
structure
Network Measures
• Given a biological network G(V,
E), we can define a series of
measures describing the
network in a statistical fashion
• One of the simplest measures is
the degree or connectivity k of
each node
• For directed networks, we
need to distinguish in- and out-degree, $k_{in}$ and $k_{out}$
• For an undirected graph, the
expected value <k> for the
degree is given by
<k> = 2|E|/|V|
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
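The degree and the average degree can be computed directly from an edge list. The following is a minimal sketch; the toy network and the function names are assumptions made for illustration and are not taken from the lecture.

```python
from collections import defaultdict

def degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def average_degree(nodes, edges):
    """<k> = 2|E| / |V|: every edge contributes to the degree of two nodes."""
    return 2 * len(edges) / len(nodes)

# Hypothetical toy network (not from the slides).
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]

deg = degrees(edges)
print(deg)
print(average_degree(nodes, edges))
```

Summing all node degrees and dividing by |V| gives the same value, since each edge is counted once at each of its two ends.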
Degree Distribution
• The degree distribution P(k)
for a network is obtained by
counting the nodes with a
given degree
• P(k) is characteristic for
certain types of networks and
contains essential
information on the structure
• The figure on the right shows
the (log-transformed) out-degree distributions of eleven
prokaryotic metabolic
networks
Zhu & Qin. BMC Bioinformatics 2005, 6:8
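Counting nodes per degree and normalizing by the number of nodes gives P(k). A minimal sketch, assuming the degrees are already available as a dictionary (the values below are a hypothetical toy example):

```python
from collections import Counter

def degree_distribution(degrees):
    """Return {k: P(k)}, the fraction of nodes that have degree k."""
    counts = Counter(degrees.values())
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical node degrees (not from the slides).
toy_degrees = {"A": 3, "B": 2, "C": 2, "D": 1}
p_k = degree_distribution(toy_degrees)
print(p_k)
```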
Hubs
• In air traffic, hubs are airports that serve as transfer points for an airline
• They are connected to a large number of other airports
www.hemispheresmagazine.com
Hubs
• Hubs occur in biological
networks as well
• Hubs are easily
recognized by their large
node degree
• Degree distribution
describes the likelihood
of finding a hub of a
certain size in a network
Example:
• PPI network of yeast
• Arrows mark two obvious
hubs
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
Clustering Coefficient
• Clustering coefficient $C_i$ describes how
close a given node i is to being part of a
clique
• The neighborhood $N_i$ of a node i is
defined as the set of nodes adjacent to i:
$N_i = \{ j \in V : (i, j) \in E \}$
• Clustering coefficient $C_i$ is then defined
as
$C_i = \lambda_i / \tau_i$
$C_i$ describes the ratio of the number of triangles passing
through node i ($\lambda_i$) to the number of
potential triangles ($\tau_i$) formed by its
neighbors
Example (figure: node A with neighbors B, C, D):
$N_A = \{B, C, D\}$
$\lambda_A = 2$, $\tau_A = 6$
$\Rightarrow C_A = 1/3$
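The clustering coefficient can be sketched in a few lines. This version uses the common formulation for undirected graphs, the number of linked neighbor pairs divided by k(k-1)/2; the adjacency dictionary is a hypothetical graph in which only B and C are connected among the neighbors of A. It is an assumption that the slide counts each pair in both directions (2 of 6); the resulting ratio of 1/3 is the same either way.

```python
from itertools import combinations

def clustering_coefficient(adj, i):
    """Fraction of neighbor pairs of node i that are themselves connected."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention chosen here: no neighbor pairs, no triangles
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

# Hypothetical undirected graph: A is adjacent to B, C, D; B and C are linked.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(clustering_coefficient(adj, "A"))
```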
Path-Based Properties
• Based on the computation of all-pairs shortest paths,
one can define various measures describing how close
vertices are
• The shortest distance $l_{min}(i, j)$ between nodes i and j
can be efficiently computed even for large graphs
• From this one can define several measures:
– Average shortest path length
– Graph diameter: length of the longest shortest path
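For unweighted graphs, all-pairs shortest paths can be obtained by running a breadth-first search from every node. The sketch below computes the average shortest path length and the diameter this way; it only considers pairs that are connected, which is an assumption about how to treat disconnected graphs, and the example graph is hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest distances (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_statistics(adj):
    """Return (average shortest path length, diameter) over connected pairs."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    average = total / pairs if pairs else 0.0
    return average, diameter

# Hypothetical path graph A - B - C - D.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
avg_len, diam = path_statistics(path)
print(avg_len, diam)
```

Each breadth-first search takes O(|V| + |E|) time, so the whole computation is O(|V|(|V| + |E|)), which stays feasible for sparse biological networks.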
Centrality
• Centrality is a measure of the relative
‘importance’ of a node in the network
• There are various ways to define centrality
– Degree centrality is simply the node degree
– Betweenness centrality measures how many
shortest paths between other nodes pass through a node v:
$C_B(v) = \sum_{s \neq v \neq t} \sigma_{st}(v) / \sigma_{st}$
here $\sigma_{st}$ is the number of shortest paths from node
s to t and $\sigma_{st}(v)$ the number of shortest paths from
s to t passing through v
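Betweenness centrality can be sketched by counting shortest paths with breadth-first search. The version below is deliberately simple rather than fast (faster algorithms exist), and it sums over ordered pairs (s, t), so each unordered pair of an undirected graph contributes twice; both choices are assumptions of this illustration. It uses the fact that v lies on a shortest s-t path exactly when d(s, v) + d(v, t) = d(s, t), in which case the number of such paths through v is the number of shortest s-v paths times the number of shortest v-t paths.

```python
from collections import deque

def count_shortest_paths(adj, s):
    """BFS from s; sigma[v] is the number of shortest paths from s to v."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """Sum of sigma_st(v) / sigma_st over ordered pairs (s, t), s != v != t."""
    info = {s: count_shortest_paths(adj, s) for s in adj}
    cb = {v: 0.0 for v in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for v in dist_s:
                if v == s or v == t:
                    continue
                dist_v, sigma_v = info[v]
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    cb[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return cb

# Hypothetical star graph: A in the center, leaves B, C, D.
star = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
print(betweenness(star))
```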
Scale-Free Networks
• The degree distribution of many
biological networks follows a
power law
$P(k) \sim k^{-\gamma}$
where $\gamma$ is the so-called degree
exponent
• In these networks most nodes are
of low degree, although there is a
significant number of hubs of a
degree much larger than average
• $\gamma$ usually lies between 2 and 3
• The lower $\gamma$, the more important
are the hubs
Plotting P(k) as a function of k
in log scales yields a straight
line for a degree distribution
following a power law, the
signature of a scale-free
network.
Zhu & Qin. BMC Bioinformatics 2005, 6:8
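The straight line in the log-log plot suggests a simple way to estimate the degree exponent: fit a line to log P(k) versus log k and take the negative slope. The sketch below does this with ordinary least squares on synthetic, noise-free data; the function name and the data are assumptions for illustration. On real, noisy degree distributions such a line fit is only a rough estimate.

```python
import math

def fit_degree_exponent(p_k):
    """Estimate gamma as the negative least-squares slope of log P(k) vs. log k."""
    pts = [(math.log(k), math.log(p)) for k, p in p_k.items() if k > 0 and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx

# Synthetic distribution that follows k^(-2.5) exactly (unnormalized;
# a constant factor shifts the line but does not change its slope).
synthetic = {k: k ** -2.5 for k in range(1, 50)}
gamma = fit_degree_exponent(synthetic)
print(gamma)
```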
Scale-Free Metabolic Networks
• In metabolic networks most
metabolites are involved in few
reactions only
• Co-factors such as ADP/ATP,
however, serve as metabolic hubs
and connect remote parts of the
network
• If one translates the reaction schema
(a) into a graph where any two nodes
(metabolites) are connected if they
participate in the same reaction, one
obtains a scale-free network (d)
• Paths are thus short between any
two metabolites (although these
paths are meaningless for direct
biosynthesis!) and remote parts of
metabolic networks are coupled
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
Small-World Property
• Networks in which most nodes are not adjacent, but
can be reached within a few steps are called small-world networks
• This property has also been found in many social
networks or in the world-wide web
• It requires a small average shortest path length
• Scale-free networks have the small-world property, as
hubs make it possible to traverse the network quickly
• In metabolic networks, most pairs of metabolites are
linked by just three or four reactions
• Scale-free networks are usually considered ultra
small, as their mean shortest path length is
significantly smaller than for comparable other types
of networks (e.g., random networks)
Types of Networks
We will now examine three types of networks (random
networks, scale-free networks, and hierarchical
networks) and discuss
• their properties, and
• through which processes they may arise.
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
Random Networks
• Random networks can
be constructed
according to the Erdős-Rényi (ER) model
• Construction starts with
N disconnected nodes
• For every pair of nodes,
an edge is inserted with
probability p
• This results in a graph
with approximately
pN(N-1)/2 edges
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
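The ER construction translates almost literally into code. A minimal sketch, with the function name, the seed parameter and the example values being assumptions for illustration:

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Start with n disconnected nodes; insert each possible edge with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

n, p = 200, 0.1
g = erdos_renyi(n, p, seed=1)
n_edges = sum(len(nb) for nb in g.values()) // 2
print(n_edges, "edges; expected about", p * n * (n - 1) / 2)
```

The loop visits all N(N-1)/2 node pairs, which is fine for a demonstration but slow for very large N.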
Random Networks
• In ER random networks the degree
distribution (approximately) follows a Poisson distribution
with mean <k> = p(N-1):
$P(k) = e^{-\langle k \rangle} \langle k \rangle^k / k!$
• Most nodes thus have a node degree close
to <k>
• The clustering coefficient C(k) is constant: it
does not depend on the node degree k
• The mean shortest path length is
proportional to log N, so ER networks have the
small-world property
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
Random Networks
• ER networks tend to have one
large, giant component
• Initially, new edges mostly
connect isolated nodes
• As the network grows, it
becomes increasingly more
likely that a new edge
connects two existing
components
• When |E| ≈ N/2, the diameter of
the network grows suddenly
as a giant component
containing most of the nodes is
formed
• It contains O(N) nodes while
the second largest component
contains only O(log N) nodes
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
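The emergence of the giant component can be observed numerically. The sketch below inserts a fixed number of random edges (a variant of the ER model with a given edge count instead of an edge probability, chosen here for convenience) and tracks components with a union-find structure; the function name and the values of N are assumptions for illustration. Repeated random edges are not filtered out, which hardly matters for sparse graphs.

```python
import random
from collections import Counter

def largest_component_fraction(n, m, seed=None):
    """Insert m random edges among n nodes; return the share of nodes
    that end up in the largest connected component."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(m):
        u, v = rng.sample(range(n), 2)
        parent[find(u)] = find(v)  # merge the two components

    sizes = Counter(find(x) for x in range(n))
    return max(sizes.values()) / n

n = 2000
for m in (n // 4, n // 2, n, 2 * n):
    print(m, "edges:", largest_component_fraction(n, m, seed=1))
```

Well below N/2 edges the largest component should hold only a small share of the nodes, well above it most of them.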
Random Networks – Demo
http://www.ct.infn.it/~cactus/applets/Giant%20Component.html
Scale-Free Networks
• Scale-free networks have a
power-law degree
distribution
• They are characterized by
the presence of hubs, i.e.,
nodes that have significantly
higher degree than the average node
• Models for the construction
of scale-free networks are
based on preferential
attachment: nodes with high
degrees are more likely to
get new edges
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
Scale-Free Networks
• The Barabasi-Albert model for the construction of
scale-free networks is based on preferential
attachment
Algorithm:
– Start with N0 isolated nodes
– For t iterations:
• Add one new node v and connect v by m new
edges to the existing graph
• Attachment probability $\Pi_i$ to a node i depends
on node degree $k_i$:
$\Pi_i = k_i / \sum_j k_j$
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
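The algorithm above can be sketched as follows. Two details are assumptions of this illustration and are not stated on the slide: while all nodes still have degree zero, the attachment probability is undefined, so the first new node picks its targets uniformly at random; and the m targets of a new node are required to be distinct. Degree-proportional sampling is implemented with a list that contains every node once per incident edge end.

```python
import random

def barabasi_albert(n0, m, t, seed=None):
    """Grow a network by preferential attachment: n0 isolated start nodes,
    then t iterations that each add one node with m edges."""
    if not 1 <= m <= n0:
        raise ValueError("need 1 <= m <= n0")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n0)}
    stubs = []  # node i appears k_i times, so rng.choice(stubs) is proportional to degree
    for step in range(t):
        new = n0 + step
        targets = set()
        while len(targets) < m:
            if stubs:
                targets.add(rng.choice(stubs))
            else:
                targets.add(rng.randrange(n0))  # assumption: uniform while all degrees are 0
        adj[new] = set()
        for v in targets:
            adj[new].add(v)
            adj[v].add(new)
            stubs.extend((new, v))
    return adj

g = barabasi_albert(n0=3, m=3, t=500, seed=1)
degs = [len(nb) for nb in g.values()]
print("mean degree:", sum(degs) / len(degs), "max degree:", max(degs))
```

With n0 larger than m, start nodes that are not picked in the first iteration stay isolated in this sketch, because their attachment probability remains zero.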
Scale-Free Networks
• The Barabasi-Albert model constructs networks that
are scale-free
• In this model, the degree distribution follows a power
law with $\gamma = 3$:
$P(k) \sim k^{-3}$
• The average path length in such a network grows more slowly than
log N (as log N / log log N for $\gamma = 3$, and as log log N for
$2 < \gamma < 3$), the networks are thus ultra small
• While this model can explain some of the properties
of scale-free networks observed in biology, it
cannot account for all aspects, in particular not
for varying degree exponents
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
Scale-Free Networks
• Scale-free networks have a
power-law degree distribution
• They are characterized by the
presence of hubs, i.e., nodes
that have significantly higher
degree than the average node
• Scale-free networks
constructed by the Barabasi-Albert model do not have a
modular structure, thus C(k) is
independent of k
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
Hierarchical Networks
• Hierarchical networks account
for the modularity observed in
biological networks
(pathways, operons, …)
• A simple model for the
construction of a hierarchical
network works like this:
• A small subnetwork of
four densely connected
nodes (blue) is replicated
• These replicas (green) are
connected to the original
motif and then again
replicated (red)
• This process can be repeated
over and over again
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
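The replication scheme can be sketched in code. The slide only states that the replicas are connected to the original motif; the concrete wiring used below (three replicas per step, with the peripheral nodes of each replica linked to the central node of the original module) is an assumption that follows one common description of this model, and all names are hypothetical.

```python
def hierarchical_network(levels):
    """Build the edge set of a hierarchical network.

    Level 0 is a module of four fully connected nodes; node 0 is treated as
    its center and nodes 1-3 as peripheral. Every further level makes three
    replicas of the current network and links the peripheral nodes of each
    replica to the center (assumed wiring rule).
    """
    edges = {(u, v) for u in range(4) for v in range(u + 1, 4)}
    n, center, peripheral = 4, 0, [1, 2, 3]
    for _ in range(levels):
        new_edges, new_peripheral = set(edges), []
        for copy in range(1, 4):
            offset = copy * n  # node labels of this replica are shifted by offset
            new_edges |= {(u + offset, v + offset) for u, v in edges}
            for node in peripheral:
                new_edges.add((center, node + offset))
                new_peripheral.append(node + offset)
        edges, peripheral, n = new_edges, new_peripheral, 4 * n
    return edges, n

for level in range(3):
    edges, n = hierarchical_network(level)
    hub_degree = sum(1 for u, v in edges if 0 in (u, v))
    print(level, n, len(edges), hub_degree)
```

The central node collects new edges at every level and becomes the largest hub, while nodes inside the replicated modules keep their small, densely connected neighborhoods.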
Hierarchical Networks
• The resulting networks show a
power-law degree distribution
• Due to their modular structure, the
clustering coefficient depends on the
node degree:
• Nodes of a lower degree have a
higher clustering coefficient as
they are densely connected to
their local module
• Nodes of a higher degree (hubs)
are connected to many modules
and thus have a lower clustering
coefficient
• The degree exponent of such a
network is γ = 2.26
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
Biological Network Evolution
• The Barabasi-Albert model is
based on two fundamental
mechanisms
• Network growth
• Preferential attachment
• Together these mechanisms
generate hubs and thus a scale-free network
• Similar mechanisms are at work
in protein-protein interaction
networks:
• Gene duplication creates a
new node in the network
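• Since the two proteins are
still identical, they share
the same interactions
• Every interaction partner thus
gains an edge; highly connected
proteins are more likely to be a
partner of the duplicated protein,
which results in preferential
attachment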
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
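A single duplication step can be sketched as follows. The sketch copies one randomly chosen node together with all of its edges and ignores the later divergence of the two copies (loss or gain of interactions); that simplification, the integer node labels and the function name are assumptions for illustration.

```python
import random

def duplicate_gene(adj, rng):
    """Duplicate a randomly chosen node: the copy inherits all interactions
    of the original. Assumes integer node labels and no self-loops."""
    original = rng.choice(sorted(adj))
    copy = max(adj) + 1
    adj[copy] = set(adj[original])
    for partner in adj[original]:
        adj[partner].add(copy)  # every partner of the original gains one edge
    return original, copy

# Hypothetical small PPI network: protein 0 interacts with 1, 2 and 3.
ppi = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
rng = random.Random(1)
original, copy = duplicate_gene(ppi, rng)
print(original, copy, ppi)
```

A protein with k partners gains an edge whenever one of its k partners is duplicated, so its chance of gaining an edge per duplication event is proportional to k.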
Interactomics: PPI Networks
• Protein-protein interaction (PPI) networks are graphs containing
an edge for each PPI
• They are scale-free networks, typically with degree exponents
between 2.2 and 2.4
• They show significant functional clustering: proteins with
related function often form densely connected subgraphs
http://www.nature.com/nbt/journal/v20/n10/fig_tab/nbt1002-991_F1.html
Yeast Two-Hybrid Screening
• Yeast two-hybrid (Y2H) is a
high-throughput method for
interactomics based on
genetically engineered yeast
• To test whether two protein
domains (prey and bait)
interact, both are expressed
as fusion proteins
• Bait is fused to a DNA-binding
domain binding to a promoter
region
• Prey is fused to an activation
domain
• If both interact, they activate
a reporter gene (e.g. lacZ to
create colored colonies)
http://www.genscript.com/images/yeast_two_hybrid_system.jpg
Yeast PPI Network
• This network shows the
largest connected
component of the yeast
interactome as determined
by Y2H
• This component contains
78% of all proteins
• Nodes are color-coded by
the effect of a knock-out
mutant
– Red: lethal
– Green: non-lethal
– Orange: slow growth
– Yellow: unknown
• Hubs are often colored
red!
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
Robustness
• A key property of biological
networks is their robustness
against perturbations
• Scale-free networks are
very robust against random
deletions: since the majority of
nodes have a small degree, random
deletions mostly affect
small-degree nodes
• They are, however, very
vulnerable to the loss of hubs:
removing the hubs leads to a
disintegration of the whole
network
• This agrees well with the
results we discussed for
knock-outs in the E. coli
metabolic network
Barabasi & Oltvai, Nat. Rev. Genetics (2004), 5:101
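The difference between random deletions and the targeted removal of hubs can be illustrated with a small simulation. The sketch below grows a network by preferential attachment from a fully connected seed (an assumption made to keep the graph connected), removes 10% of the nodes either at random or in order of decreasing degree, and reports the size of the largest remaining component. All names and parameter values are hypothetical.

```python
import random
from collections import deque

def preferential_attachment_graph(n, m, rng):
    """n nodes; each new node attaches to m distinct existing nodes chosen
    with probability proportional to their degree."""
    adj = {i: set() for i in range(m + 1)}
    stubs = []  # every node appears once per incident edge end
    for u in range(m + 1):  # fully connected seed of m + 1 nodes (assumption)
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            stubs += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for v in targets:
            adj[v].add(new)
            stubs += [new, v]
    return adj

def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen, best = set(removed), 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        size, queue = 0, deque([s])
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

rng = random.Random(0)
g = preferential_attachment_graph(1000, 2, rng)
n_remove = 100
random_nodes = rng.sample(sorted(g), n_remove)
hubs = sorted(g, key=lambda v: len(g[v]), reverse=True)[:n_remove]
after_random = largest_component(g, random_nodes)
after_attack = largest_component(g, hubs)
print("random deletions:", after_random, "hub removal:", after_attack)
```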
PPIs from a Structural Perspective
• The basis of an edge in a PPI network is always
a physical interaction between two proteins, a
binding process
• It thus makes sense to examine PPIs from a
structural perspective
• For many protein-protein complexes we have
known 3D structures
• From these we can also conjecture
interactions between close homologs
• Structural studies can thus help to interpret or
validate PPI networks
Aloy & Russell, Nat. Biotech. (2004), 22:1317.
PPIs from a Structural Perspective
• Aloy & Russell define an
‘interaction type’, which they
call the interaction equivalent
of a protein fold
• An interaction type groups
together all pairs of
interacting domains that
interact in the same way
• Interaction types are often
conserved between homologs:
if an interaction is observed
between two proteins, then in
90% of all cases this
interaction is also present
between a pair of homologs
Aloy & Russell, Nat. Biotech. (2004), 22:1317.
PPIs from a Structural Perspective
From the number of known interaction types and the interactions
present in large PPI datasets, one can extrapolate that the total number
of interaction types should be on the order of 10,000.
Aloy & Russell, Nat. Biotech. (2004), 22:1317.
References
Papers
• Barabasi & Oltvai. Network Biology: Understanding the Cell’s
Functional Organization. Nat. Rev. Genetics (2004), 5:101
• Aloy & Russell. Ten thousand interactions for the molecular biologist.
Nat. Biotech. (2004), 22:1317
• Aloy & Russell. Interrogating protein interaction networks through
structural biology. Proc. Natl. Acad. Sci. USA (2002), 99:5896
• Han et al. Evidence for dynamically organized modularity in the yeast
protein-protein interaction network. Nature (2004), 430:88
• Barabasi & Albert. Emergence of scaling in random networks. Science
(1999), 286:509-12
Links
• Random graph demo
http://www.ct.infn.it/~cactus/applets/Giant%20Component.html