Networks in Biology

7.6., 14.6., and 21.6.2013
Dr. Katja Nowick
www.nowick-lab.info
Networks in Biology
Introduction
Networks in cells (molecular networks):
• Metabolic Networks
• Gene regulatory networks
• Protein-Protein-Interaction networks
Networks between cells:
• Neural networks
• Immune system
Networks in ecosystems:
• Food networks
• Cooperation/Symbiosis
Social networks:
• Friendships
• Epidemiology
The identity of the nodes (vertices)
and the meaning of the links (edges)
depend on the network being studied
Networks in cells: Metabolic Networks
A metabolic network is the complete set of metabolic and physical
processes that determine the physiological and biochemical
properties of a cell. As such, these networks comprise the chemical
reactions of metabolism, the metabolic pathways, as well as the
regulatory interactions that guide these reactions. A metabolic network
breaks down metabolic pathways (such as glycolysis and the citric acid
cycle) into their respective reactions and enzymes.
Networks in cells: Gene regulatory networks
A gene regulatory network (GRN) is a collection of DNA
segments in a cell which interact with each other indirectly
(through their RNA and protein expression products) and
with other substances in the cell, thereby governing the
expression levels of mRNA and proteins.
Networks in cells: Protein-Protein-Interaction networks
Protein–protein interactions occur when two or more
proteins bind together, often to carry out their biological
function. Many of the most important molecular
processes in the cell such as DNA replication are carried
out by large molecular machines that are built from a
large number of protein components organized by their
protein–protein interactions.
Picture: Overview of known and predicted protein–
protein interactions in pre-40S complexes. The
interaction map depicts interactions between the various
assembly factors and ribosomal proteins in pre-40S
complexes.
Networks between cells: Neural networks
Neurons in the brain are deeply connected with one
another and this results in complex networks
controlling structural and functional aspects of the
brain (e.g. behavior).
Networks between cells: Immune system
The immune system is a system of biological structures and
processes within an organism that protects against disease.
Picture: Antigen-presenting cells (APCs) present antigen on
their Class II MHC molecules (MHC2). Helper T cells
recognize these with the help of the CD4 co-receptor that
they express (CD4+). The activation of a resting helper T cell
causes it to release cytokines and other stimulatory signals
(green arrows) that stimulate the activity of macrophages,
killer T cells and B cells, the latter producing antibodies. The
stimulation of B cells and macrophages follows a
proliferation of T helper cells.
Networks in ecosystems: Food networks
All organisms are connected to each other through
feeding interactions. That is, if a species eats or is
eaten by another species, they are connected in an
intricate food web of predator and prey interactions.
Networks in ecosystems: Cooperation/Symbiosis
Symbiosis describes a close and often long-term interaction between
two or more different biological species.
Picture: In a symbiotic mutualistic relationship, the
clownfish feeds on small invertebrates that otherwise
have potential to harm the sea anemone, and the fecal
matter from the clownfish provides nutrients to the sea
anemone. The clownfish is additionally protected from
predators by the anemone's stinging cells, to which the
clownfish is immune.
Social networks: Friendships
Friendship networks are used to analyze group dynamics, the formation of subgroups,
decision making, etc.
Picture: Zachary network of a university karate club: a
division into two subgroups over a political issue led to the
formation of two separate clubs (Zachary 1977).
Social networks: Epidemiology
Epidemiology literally means “the study of what is upon the
people”. It is used to study the spread of diseases, e.g.
virus infections or sexually transmitted diseases.
Picture: Worldwide Distribution of Infectious Diseases
Directed and undirected networks
Picture: a directed network and an undirected network, each drawn on nodes 1 to 5
In-degree of a node = its number of incoming links
Out-degree of a node = its number of outgoing links
Degree of a node = its number of links
PS: node = vertex; link = edge
Node degree distribution
(= connectivity distribution)
In undirected networks, the node degree of a node n is its number of links. A self-loop of
a node is counted like two edges for the node degree. The node degree distribution gives
the number of nodes with degree k for k = 0,1,….
In directed networks, the in-degree of a node n is the number of incoming links and the
out-degree is the number of outgoing links. Similar to undirected networks, there are an
in-degree distribution and an out-degree distribution.
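The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the small example network are hypothetical (the network is not the one from the picture), and edges are assumed to be given as pairs (u, v), pointing from u to v in the directed case.

```python
from collections import Counter

def undirected_degrees(nodes, edges):
    """Degree of every node; a self-loop (n, n) adds 2 to the degree of n."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1  # for a self-loop u == v, so the node gains 2 in total
    return degree

def directed_degrees(nodes, edges):
    """In- and out-degree of every node; an edge (u, v) points from u to v."""
    in_deg = {n: 0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    return in_deg, out_deg

def degree_distribution(degree):
    """Map each degree k to the number of nodes that have degree k."""
    return dict(sorted(Counter(degree.values()).items()))

# Hypothetical example network on nodes 1-5; (4, 4) is a self-loop.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 4)]

deg = undirected_degrees(nodes, edges)
print(deg)                       # {1: 2, 2: 2, 3: 3, 4: 3, 5: 0}
print(degree_distribution(deg))  # {0: 1, 2: 2, 3: 2}

in_deg, out_deg = directed_degrees(nodes, edges)
print(in_deg)   # {1: 0, 2: 1, 3: 2, 4: 2, 5: 0}
print(out_deg)  # {1: 2, 2: 1, 3: 1, 4: 1, 5: 0}
```

Note that only degrees that actually occur are listed in the distribution; degrees with zero nodes are omitted.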
Characteristics of biological networks
1. Power law behavior
Node degree distribution (= connectivity distribution) follows a power law: a few nodes with
many links (hubs) and many nodes with only a few links
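→ The “average node degree” is not a representative value for such a distribution; such networks are
often called “scale-free”
Picture: node degree distribution (x-axis: degree, y-axis: number of nodes) with the mean marked and a long tail towards high degrees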
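Written as a formula, a power law for the degree distribution has the following form. The symbols P(k) (fraction of nodes with degree k) and γ (the exponent) are notation introduced here for illustration and do not appear on the slides.

```latex
% Power-law degree distribution: the fraction of nodes with degree k
P(k) \propto k^{-\gamma}, \qquad \gamma > 0

% Taking logarithms gives a straight line with slope -\gamma,
% which is why such distributions are usually shown on log-log axes:
\log P(k) = -\gamma \log k + \text{const}

% In an infinitely large network the second moment diverges for \gamma \le 3,
% so the spread around the mean degree is unbounded:
\langle k^2 \rangle = \sum_{k} k^2 P(k) \to \infty \quad \text{for } \gamma \le 3
```

The last line is one way to see why the mean degree says little about a typical node in such a network.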
Characteristics of biological networks
2. Small world characteristics
Every node is connected to every other node by only a small number of links. This property
is commonly achieved by a small number of central nodes (hubs) with many connections.
The distance between two nodes is the smallest number of links that have to be traversed
to connect them (= shortest path length*).
If the average distance between two nodes is 𝑙 ~ log 𝑁 or smaller, where 𝑁 is the number of nodes, then the network has
small world characteristics.
This is true even for very big networks, e.g.
• Acquaintance network of the entire world: you are connected to everybody else through just 6 connections
• Facebook: only 4 intermediate people on average between you and anybody else
*The length of the shortest path between two nodes n and m is L(n,m). The shortest path length distribution gives
the number of node pairs (n,m) with L(n,m) = k for k = 1,2,…
and can be used to analyze small-world properties of the network.
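A minimal sketch of how the shortest path length distribution and the average distance could be computed for an unweighted, undirected network, using breadth-first search. The function names and the example (a hypothetical hub "h" linked to five other nodes) are assumptions made for illustration.

```python
from collections import Counter, deque

def bfs_distances(adjacency, source):
    """Shortest path length (number of links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_distribution(adjacency):
    """Count unordered node pairs (n, m) by their shortest path length L(n, m)."""
    counts = Counter()
    nodes = sorted(adjacency)
    for i, n in enumerate(nodes):
        dist = bfs_distances(adjacency, n)
        for m in nodes[i + 1:]:
            if m in dist:  # unreachable pairs are skipped
                counts[dist[m]] += 1
    return dict(sorted(counts.items()))

def average_distance(distribution):
    """Average shortest path length over all connected node pairs."""
    pairs = sum(distribution.values())
    return sum(k * c for k, c in distribution.items()) / pairs

# Hypothetical example: one hub "h" connected to five other nodes.
adjacency = {
    "h": {"a", "b", "c", "d", "e"},
    "a": {"h"}, "b": {"h"}, "c": {"h"}, "d": {"h"}, "e": {"h"},
}
distribution = path_length_distribution(adjacency)
print(distribution)                    # {1: 5, 2: 10}
print(average_distance(distribution))  # 25 / 15, i.e. about 1.67
```

In this toy network the single hub keeps every pair of nodes within two links of each other, which is the mechanism described above.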
Characteristics of biological networks
3. Hierarchical and modular organization
Module = a natural division of a network
into groups of nodes such that there are many links within the groups and few links between
groups.
This organization allows for
a) Network robustness: some redundancy to preserve the function even if components fail
b) Network evolution: testing of mutations without being fatal for the individual
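To make the definition of a module concrete, the sketch below counts links within and between groups for a given division of the nodes. It does not find modules; the division is supplied by hand, and the function name and example network are hypothetical.

```python
def count_links(edges, module_of):
    """Return (links within modules, links between modules) for a given division."""
    within = 0
    between = 0
    for u, v in edges:
        if module_of[u] == module_of[v]:
            within += 1
        else:
            between += 1
    return within, between

# Hypothetical example: two triangles joined by a single link (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
module_of = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}

within, between = count_links(edges, module_of)
print(within, between)  # 6 1
```

A division with many links within groups and few between them, as here, is what the definition above calls modular.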
Characteristics of biological networks
4. Certain motifs are common
Network motifs = the simplest building blocks of networks
recurrent and statistically significant sub-graphs or patterns
Examples:
Positive/negative auto-regulation:
A transcription factor (TF) enhances/represses its own transcription
Feed forward loop:
Consists of two TFs, one regulating the other and both regulating the
same target gene; it can function to accelerate or delay the
regulation of the target gene
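The two example motifs can be searched for in a directed network with a brute-force scan, sketched below. The gene names and the edge list are made up for illustration. The sketch only lists occurrences; deciding whether a motif is statistically significant would additionally require a comparison with randomized networks, which is not shown.

```python
from itertools import permutations

def auto_regulators(edges):
    """Nodes with a link to themselves (auto-regulation)."""
    return sorted({u for u, v in edges if u == v})

def feed_forward_loops(edges):
    """All triples (x, y, z) of distinct nodes with links x->y, x->z and y->z."""
    edge_set = set(edges)
    nodes = sorted({n for edge in edges for n in edge})
    return [(x, y, z) for x, y, z in permutations(nodes, 3)
            if (x, y) in edge_set and (x, z) in edge_set and (y, z) in edge_set]

# Hypothetical regulatory network; an edge (u, v) means "u regulates v".
edges = [("TF1", "TF2"), ("TF1", "geneA"), ("TF2", "geneA"),
         ("TF2", "TF2"), ("TF1", "geneB")]

print(auto_regulators(edges))     # ['TF2']
print(feed_forward_loops(edges))  # [('TF1', 'TF2', 'geneA')]
```

The scan over all node triples is only practical for small networks.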
Characteristics of biological networks
5. Preferential attachment
Networks typically grow over time by adding nodes.
New nodes seem to prefer to connect to nodes that already have many connections:
“rich get richer”
→ This ultimately creates the power law connectivity distribution
e.g. Price’s model: Publications that are already famous tend to be cited more frequently
Characteristics of biological networks
6. Biological networks are dynamic
• Often need to be activated by external or internal signals
e.g. Neuronal networks: change in the environment activates a neuronal network
leading to change in behavior
• Links can be modified to achieve different outputs
e.g. Gene regulatory network: cells activate different TFs at different times or locations
during development, which affects cell fate
• Links can be added or removed
e.g. Social networks: new friendships or breaking up of friends
• Changes over evolutionary time scales
e.g. all molecular networks: duplication of genes, mutations of genes etc.
Preferential attachment
Why do (some) networks follow a power law distribution?
Originally called “cumulative advantage” (Price 1976)
The term “preferential attachment” was coined by Barabasi & Albert 1999
Price’s model considers directed networks
Barabasi and Albert’s model is for undirected networks
Preferential attachment
Price’s model of a citation network of scientific papers:
Directed network
Nodes = papers
Links = citations
c = average number of papers cited by a paper
i.e. average out-degree of the network
A new paper cites existing papers at random,
with probability proportional to the number of citations a paper already has
plus a constant a (a > 0), so that the probability is not zero for papers without citations
Preferential attachment
Computational implementation – simulation of the network growth:
Start by giving each node a fixed out-degree of c
Add new links proportional to the in-degree a node has
More precisely:
With probability c/(c+a) attach a new link in proportion to the in-degree of a node,
otherwise choose a node uniformly at random from the set of all nodes
To choose between these two options, create a random number 0 ≤ 𝑟 < 1;
if r < c/(c+a), choose in proportion to the in-degree, otherwise choose uniformly
Picking a node uniformly is easy
To pick a node proportionally to its in-degree, select a link uniformly at random
and then pick the node it points to (a node with in-degree q is q times more likely to be picked
than a node with in-degree 1)
Preferential attachment
How to do this in practice:
Make an array (list) that contains, for every link in the network, the node that the link points to
(order is not important)
Picture: example array of link targets: 1 4 2 1 3 1 3 4 1 2 2 … 5
Then you can simply choose an element from this list uniformly at random
→ This gives you a node with probability proportional to its in-degree
Preferential attachment
Computational implementation – simulation of the network growth summary
1. Generate a random number r in the range 0 ≤ 𝑟 < 1
2. If r < c/(c+a) choose an element uniformly at random from the array
3. Otherwise choose a node uniformly at random from the set of all nodes
4. Create a new link to the chosen node
5. Update the array by adding the node that got the new link
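The five steps above can be sketched in Python as follows. Several details are assumptions that the slides do not fix: the network starts with c papers that have no citations, every later paper adds exactly c links, a paper may cite the same paper more than once, and the array is updated after every single link. The function name `price_model` and its parameters are hypothetical.

```python
import random

def price_model(n_nodes, c, a, seed=None):
    """Grow a citation network; returns (edges, in_degree) with edges as (citing, cited)."""
    rng = random.Random(seed)
    targets = []                 # the array: one entry per link, the node it points to
    in_degree = [0] * n_nodes
    edges = []
    n_seed = c                   # assumption: c initial papers without citations
    for new in range(n_seed, n_nodes):
        for _ in range(c):       # each new paper cites c existing papers
            r = rng.random()     # step 1: random number 0 <= r < 1
            if targets and r < c / (c + a):
                chosen = rng.choice(targets)   # step 2: proportional to in-degree
            else:
                chosen = rng.randrange(new)    # step 3: uniform over existing nodes
            edges.append((new, chosen))        # step 4: create the new link
            in_degree[chosen] += 1
            targets.append(chosen)             # step 5: update the array
    return edges, in_degree

edges, in_degree = price_model(n_nodes=1000, c=3, a=1.0, seed=42)
print(len(edges))      # (1000 - 3) * 3 = 2991 links
print(max(in_degree))  # the most cited paper; typically far above the average of about 3
```

Sorting `in_degree` or plotting its distribution on log-log axes is one way to inspect the long tail that the model produces.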
Preferential attachment
In other networks
• World Wide Web: Wikipedia vs. a personal homepage
• Circle of friends: “everybody's” favorite vs. the “strange” person
• Protein-Protein-Interaction networks: RNA-polymerase, KAP1 vs. a specialized enzyme
• Gene regulatory network: TF with a short vs. one with a long binding motif
• Immune system: B-cells vs. macrophages
• Food network: Generalists vs. Specialists
Preferential attachment
Extensions to the model: Time
Older nodes have had more time to acquire links
→ they are expected to have more links on average
Experiment by Salganik et al. 2006: Song download preference
Website with songs of little-known artists
People could download songs for free
People were told beforehand how often a song had already been downloaded by others
Observation: songs that had been favored at the beginning of the experiment
had a strong advantage and many more downloads in the end
Control: the experiment was repeated with the download numbers shuffled between songs
Still, the songs that were (wrongly) reported to have had the highest number of previous downloads
won in the end
→ the effect is due to preferential attachment and not to real song quality
Preferential attachment
Extensions to the model: Removing links
In Price’s model, links were always added but never removed
(You cannot change a published paper by removing citations)
But in biological networks it is also typical that links get removed
Yet their degree distribution can follow a power law
Simple case: a node loses a link with a probability that is proportional to the number
of links it has (“preferential attachment in reverse”)
Note: removing a link affects both nodes that it connects
We want the network to grow, so links must be added more often than they are removed
It can be shown mathematically that the resulting network still has power law behavior
(for details, see chapter 14 in Networks – An Introduction by M.E.J. Newman)
Preferential attachment
Extensions to the model: Attractiveness
So far, all nodes with the same number of links had the same chance of gaining new links
But this is not very realistic:
e.g. some websites are more likely to receive new links (e.g. something like Wikipedia vs. a
personal homepage)
e.g. some papers are more likely to get cited (e.g. a paper in Science)
→ Some nodes are more “attractive”
If the “attractiveness” factor is very strong, we lose the power law behavior
Typical parameters analyzed in a network
http://med.bioinf.mpi-inf.mpg.de/netanalyzer/help/2.7/index.html#simple
Degree of a node = its number of links
Nodes with exceptionally many links are “hubs”
Parameters related to the neighborhood of a node
The neighborhood of a given node n is the set of its neighbors. The connectivity of n,
denoted by k_n, is the size of its neighborhood. The average number of neighbors indicates
the average connectivity of a node in the network.
Network centralization: Networks whose topologies resemble a star have a centralization
close to 1, whereas decentralized networks are characterized by having a centralization
close to 0.
The network heterogeneity reflects the tendency of a network to contain hub nodes.
Typical parameters analyzed in a network
The clustering coefficient of a node is the number of triangles (3-loops) that pass through
this node, relative to the maximum number of 3-loops that could pass through the node.
Picture:
There is one triangle that passes through node b (the triangle bcd). The maximum number
of triangles that could pass through b is three (in this case, the pairs (a, c) and (a, d) would
be connected additionally). This yields a clustering coefficient of Cb = 1 / 3.
Ravasz et al. 2002 used the average clustering coefficient distribution to identify a modular
organization of metabolic networks.
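The clustering coefficient of the worked example can be reproduced with a short sketch. Because the picture itself is not available here, the edge list (a-b, b-c, b-d, c-d) is an assumption reconstructed from the description of the example; the function name is hypothetical.

```python
from itertools import combinations

def clustering_coefficient(adjacency, node):
    """Triangles through node, relative to the maximum possible number."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no triangle is possible
    # A triangle passes through node for every pair of neighbors that is linked.
    triangles = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return triangles / (k * (k - 1) / 2)

# Assumed example network: b is linked to a, c and d; c and d are linked to each other.
adjacency = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c"},
}
print(clustering_coefficient(adjacency, "b"))  # 1 of 3 possible triangles
print(clustering_coefficient(adjacency, "c"))  # 1.0
print(clustering_coefficient(adjacency, "a"))  # 0.0
```

For node b the three neighbor pairs are (a, c), (a, d) and (c, d), of which only (c, d) is linked, which gives 1/3 as in the example.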
Which nodes are the most important nodes?
Typical parameters analyzed in a network
The betweenness centrality of a node reflects the amount of control that this node
exerts over the interactions of other nodes in the network. This measure favors nodes
that join communities (dense sub-networks), rather than nodes that lie inside a
community.
C_b(n) = ∑_{s≠n≠t} (σ_{st}(n) / σ_{st}), where s and t are nodes in the network different from n,
σ_{st} denotes the number of shortest paths from s to t, and σ_{st}(n) is the number of
shortest paths from s to t that n lies on.
The betweenness value for each node n is normalized by dividing by the number of
node pairs excluding n: (N-1)(N-2)/2, where N is the total number of nodes in the
connected component that n belongs to.
Picture:
The betweenness centrality of node b is computed as follows:
C_b(b) = ((σ_{ac}(b) / σ_{ac}) + (σ_{ad}(b) / σ_{ad}) + (σ_{ae}(b) / σ_{ae}) + (σ_{cd}(b) / σ_{cd}) + (σ_{ce}(b) / σ_{ce}) + (σ_{de}(b) / σ_{de})) / 6
= ((1 / 1) + (1 / 1) + (2 / 2) + (1 / 2) + 0 + 0) / 6 = 3.5 / 6 ≈ 0.583
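The calculation above can be checked with a small sketch for unweighted, undirected networks. The edge list (a-b, b-c, b-d, c-e, d-e) is an assumption: it is reconstructed from the path counts in the worked example, since the picture is not available here. The function names are hypothetical, and the all-pairs approach is chosen for readability rather than speed.

```python
from collections import deque
from itertools import combinations

def shortest_path_counts(adjacency, source):
    """BFS from source: distance to and number of shortest paths to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness_centrality(adjacency, n):
    """Normalized betweenness of node n, following the definition above."""
    info = {s: shortest_path_counts(adjacency, s) for s in adjacency}
    dist_n, sigma_n = info[n]
    total = 0.0
    others = [v for v in adjacency if v != n]
    for s, t in combinations(others, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s or n not in dist_s:
            continue  # s and t (or n) are not connected
        # n lies on a shortest s-t path if the distances via n add up exactly.
        if dist_s[n] + dist_n[t] == dist_s[t]:
            total += sigma_s[n] * sigma_n[t] / sigma_s[t]
    size = len(dist_n)  # N: nodes in the connected component of n
    pairs = (size - 1) * (size - 2) / 2
    return total / pairs if pairs > 0 else 0.0

# Assumed example network, reconstructed from the worked calculation.
adjacency = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "e"},
    "d": {"b", "e"},
    "e": {"c", "d"},
}
print(round(betweenness_centrality(adjacency, "b"), 3))  # 0.583
```

Node a, which lies at the edge of the network, gets a betweenness of 0 with the same function.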
Example: Betweenness centrality in a TF network
Labeled are the four nodes with highest BC
They tend to connect the two modules/communities
Nowick et al. 2009
Typical parameters analyzed in a network
Closeness centrality is a measure of how fast information spreads from a given node to other
reachable nodes in the network.
The closeness centrality Cc(n) of a node n is defined as the reciprocal of the average shortest
path length and is computed as follows:
Cc(n) = 1 / avg( L(n,m) ),
where L(n,m) is the length of the shortest path between two nodes n and m.
Picture:
For example, the closeness centrality of node b is computed as follows:
Cc(b) = 1/ ( (L(b, a) + L(b, c) + L(b, d) + L(b, e)) / 4)
= 4/ (1 + 1 + 1 + 2) = 4/5 = 0.8
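The same example can be reproduced in a few lines. As before, the edge list (a-b, b-c, b-d, c-e, d-e) is an assumption reconstructed from the distances used in the calculation, and the function name is hypothetical. The average is taken over the nodes reachable from n.

```python
from collections import deque

def closeness_centrality(adjacency, n):
    """Reciprocal of the average shortest path length from n to all reachable nodes."""
    dist = {n: 0}
    queue = deque([n])
    while queue:  # breadth-first search gives L(n, m) for every reachable m
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    lengths = [d for m, d in dist.items() if m != n]
    if not lengths:
        return 0.0  # isolated node: no other node is reachable
    return len(lengths) / sum(lengths)

# Assumed example network, reconstructed from the worked calculation.
adjacency = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "e"},
    "d": {"b", "e"},
    "e": {"c", "d"},
}
print(closeness_centrality(adjacency, "b"))  # 0.8
print(closeness_centrality(adjacency, "a"))  # 0.5
```

Node b reaches all other nodes in at most two steps and therefore has a higher closeness than the peripheral node a.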
Which nodes are the most important nodes?