Exercise 1: Analyzing the topology of interaction networks

Overview:
This exercise introduces you to the NetworkAnalyzer plugin for
Cytoscape, a plugin for statistical analysis of networks.
Getting started…
If Cytoscape and the plugins directory were not installed on your
computer during last week's exercise, please install them now:
You will need to install the newest Cytoscape and get data and plugins
from http://www.cbs.dtu.dk/chipcourse/biosys/Exercises/Cytoscape/
Install bundles (you need only one, depending on your OS):
Cytoscape_2_4_1_windows.exe
Cytoscape_2_4_1_macos.dmg
Cytoscape_2_4_1_unix.sh
Plugins (all jar files):
NetworkAnalyzer.jar
MCODE_v1_2.jar
jActiveModules.jar
jfreechart-0.9.20.jar
jfreechart-common-0.9.5.jar
Save all of these files in your Cytoscape installation’s plugins directory.
Probably something like C:\Program Files\Cytoscape_v2.4.1\plugins
Data:
RUAL.subset.sif
Save the data files to your computer (e.g. on the desktop) and make
sure that “.txt” has not been added to the end of the file
name. If it has, remove it so that the RUAL file name ends in “.sif”.
In this exercise we will use a subset of the human interaction dataset by
Rual et al. (Nature.2005 Oct 20;437(7062):1173-8).
Network statistics
The first part of the exercise deals with topology analysis of the
protein-protein interaction network.
STEP 1: Load the network RUAL.subset.sif into Cytoscape by selecting
‘Import’ under the ‘File’ menu, then selecting ‘Network (multiple file
types)’, and then specifying the location of the file.
This network consists of 1089 interactions observed between 419
human proteins, and is a small subset of a large human interaction
dataset. This subset of interactions consists of proteins that interact with
the transcription factor protein TP53.
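To see what Cytoscape reads from a SIF file, the following is a minimal
sketch of a SIF reader, assuming the common layout in which each line
holds a source node, an interaction type and one or more target nodes,
separated by whitespace. The sample lines and the interaction type “pp”
are invented for illustration and are not taken from RUAL.subset.sif.

```python
from collections import defaultdict


def parse_sif(lines):
    """Build an undirected adjacency map from SIF lines.

    Assumed line layout: source <interaction type> target [target ...].
    A line holding a single token describes an isolated node.
    """
    adjacency = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        source = fields[0]
        adjacency[source]  # register the node even if it has no edges
        for target in fields[2:]:
            adjacency[source].add(target)
            adjacency[target].add(source)
    return dict(adjacency)


# Invented sample lines; only the ID 7157 (TP53) comes from the exercise text.
sample = [
    "7157 pp 672",
    "7157 pp 4193",
    "672 pp 580",
    "9999",
]
network = parse_sif(sample)
node_count = len(network)
# Each undirected edge is stored once per end point, hence the division by 2.
edge_count = sum(len(neighbors) for neighbors in network.values()) // 2
print(node_count, edge_count)
```

To try it on the real data, pass an open file instead of the sample
list, e.g. parse_sif(open("RUAL.subset.sif")), and compare the counts
with the numbers quoted above.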
STEP 2: Try some of the different layouts (circular, organic, hierarchical
and random) by selecting the appropriate layout under ‘yFiles’ in the
‘Layout’ menu.
One of the most useful layouts for network biology is the organic layout.
Try it: under ‘Layout’ and ‘yFiles’, select ‘organic’.
STEP 3: The nodes in this network are labeled by numeric Entrez IDs,
which are the IDs employed by NCBI (www.ncbi.nlm.nih.gov). The node
representing TP53 is numbered 7157. Select this node: Select ->
Nodes -> By Name…. A popup window will appear. Enter the node id
“7157” and click Search, which should highlight TP53 (TP53 will now
appear yellow in the network).
Q1: How many proteins interact with TP53?
STEP 4: First, deselect all nodes in the network. Then apply the
NetworkAnalyzer plugin to the network by selecting ‘Network Analyzer’
from the Plugins menu (ignore the Cytoscape warning that the network
contains both directed and undirected edges). This should produce the
following window (if it doesn’t, screenshots of the relevant
windows can be seen below):
As you can see, the NetworkAnalyzer plugin calculates various network
parameters. Browse through the various network statistics/parameters
and try to answer the following questions:
Q2: What is the average degree (connectivity) of the network?
Q3: What is the most likely degree of a randomly selected node in
the network? And where is TP53 in the node degree distribution?
Q4: Use the node degree distribution and the distribution of the
average cluster coefficient (C(k)) to determine whether the network
structure is random, scale free or hierarchical (hint: look at box 2 in
the Barabasi paper).
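The quantities behind Q2 to Q4 can be sketched in a few lines. This is
not NetworkAnalyzer's own code; it is a small illustration on a made-up
four-node network, assuming an undirected network stored as a map from
each node to its set of neighbors, and assuming that nodes with fewer
than two neighbors get a cluster coefficient of 0.

```python
from collections import Counter, defaultdict


def degree_distribution(network):
    """Map each degree k to the number of nodes that have that degree."""
    return Counter(len(neighbors) for neighbors in network.values())


def cluster_coefficient(network, node):
    """Fraction of a node's neighbor pairs that are connected to each other."""
    neighbors = list(network[node])
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumption: undefined cases are reported as 0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in network[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_cluster_coefficient_by_degree(network):
    """C(k): mean cluster coefficient over all nodes with degree k."""
    by_degree = defaultdict(list)
    for node, neighbors in network.items():
        by_degree[len(neighbors)].append(cluster_coefficient(network, node))
    return {k: sum(values) / len(values) for k, values in by_degree.items()}


# Hypothetical toy network: a triangle A-B-C with one extra node D attached to A.
network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
average_degree = sum(len(n) for n in network.values()) / len(network)
print(average_degree)
print(dict(degree_distribution(network)))
print(average_cluster_coefficient_by_degree(network))
```

Plotting the degree distribution and C(k) on logarithmic axes is the
usual way to compare their shape with the network models in the hint.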
STEP 5: Have a look at the shortest path length distribution for the
entire network using the NetworkAnalyzer plugin.
Q5: What is the highest number of edges that you need to connect
any two nodes in the network?
This phenomenon is known as ‘small-world-network’ and can be found
in many real life networks, e.g. the network that connects actors who
have appeared in the same movie.
Extra: You can connect any two actors on
http://oracleofbacon.org/advanced.html. Try, just for fun, with a few
actors and see how many edges (movies) are required to connect
them.
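The shortest path length distribution from STEP 5 can be reproduced
with a breadth-first search from every node. The sketch below assumes
an undirected, unweighted network stored as a map from each node to
its set of neighbors; the four-node chain is a made-up example.

```python
from collections import Counter, deque


def bfs_distances(network, start):
    """Number of edges on the shortest path from start to each reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in network[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def path_length_distribution(network):
    """Map each shortest path length to the number of node pairs at that distance."""
    counts = Counter()
    for start in network:
        for distance in bfs_distances(network, start).values():
            if distance > 0:
                counts[distance] += 1
    # Every unordered pair was counted once from each of its two end points.
    return {distance: n // 2 for distance, n in counts.items()}


# Hypothetical toy network: a chain A - B - C - D.
network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
distribution = path_length_distribution(network)
longest_shortest_path = max(distribution)
print(distribution, longest_shortest_path)
```

The largest key of the distribution is the answer to Q5 for the
network it was computed on; pairs of nodes in different connected
components are simply not counted.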
STEP 6: As the average cluster coefficient is relatively high, some
clusters (complexes) are expected to exist in the
network. You will now be introduced to the MCODE plugin. Try to
identify the protein complexes using MCODE: under the ‘plugins’ menu,
select ‘MCODE’ and then ‘start MCODE’. There are a lot of parameters
under the ‘advanced options’ that can be changed, but just ignore those
and run MCODE on the whole network by clicking ‘Analyze’. This will
identify several complexes, which can be seen in the new Cytoscape
panel that appears.
Try to click on a complex (a complex in the ‘MCODE Results summary’).
This will highlight the complex (yellow nodes) in the large network. The
size threshold allows you to include or exclude more nodes in the
network, but you can ignore that in this exercise.
Try to browse some of the complexes with a score above 1. How many
of these complexes would you have found by manual inspection of the
large network?
Q6: Are there any interactions between proteins in complex one
and proteins in complex two? What if you increase the size
threshold?
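Q6 amounts to listing the edges that have one end in each complex. A
minimal sketch, assuming the network is stored as a map from each node
to its set of neighbors and that the two complexes do not share any
proteins; the node names and complexes below are invented.

```python
def edges_between(network, group_a, group_b):
    """Sorted list of edges with one end in group_a and the other in group_b."""
    return sorted(
        (a, b)
        for a in group_a
        for b in network.get(a, ())
        if b in group_b
    )


# Hypothetical toy network with two small complexes joined by one edge (C - D).
network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
complex_one = {"A", "B", "C"}
complex_two = {"D", "E"}
bridging_edges = edges_between(network, complex_one, complex_two)
print(bridging_edges)
```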
Exercise 2: Identifying nucleolus-located protein complexes
Overview:
In this exercise we will integrate functional information with a protein-protein
interaction network in a ‘real life’ case study for the human nucleolus. The
nucleolus is an ill-defined substructure found in the nucleus. Little is known
about it, but it is known to be involved in ribosome biogenesis.
As part of a proteomics study, an external collaborator has made a
mass-spec-based purification of proteins found in the human nucleolus. This data is
unfortunately a mixture of true nucleolus proteins and contaminants (proteins
that are not in the nucleolus). You are now faced with the challenge of
identifying true nucleolus-located protein complexes (if there are any!).
Getting started
Restart Cytoscape (this is a quick way to clear all networks and all attributes)
Data:
The protein-protein interaction data can be found in a SIF file:
http://www.cbs.dtu.dk/courses/27041/exercises/Ex2/nucleolus_exercise.sif
To aid the identification of nucleolus-located complexes, a method
for predicting nucleolus-located proteins has been developed and applied to
almost all proteins in the interaction data. Each of these proteins
has a nucleolus score between 0 and 100: the higher the score for a protein,
the more likely it is to be located in the nucleolus (however, the predictor is
not perfect and can make wrong predictions!). The predictions can be found
here:
http://www.cbs.dtu.dk/courses/27041/exercises/Ex2/nucleolus_prediction.noa
Functional descriptions are only rarely available for human proteins,
so a short functional description of each protein in the data set has been
obtained by finding the corresponding protein in yeast and transferring its
functional description to the human ortholog:
http://www.cbs.dtu.dk/courses/27041/exercises/Ex2/protein_description.noa
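If you want to inspect the two .noa files outside Cytoscape, the
sketch below reads a node attribute file, assuming the usual Cytoscape
2.x layout: one header line with the attribute name, followed by lines
of the form “node = value”. The attribute name and the values in the
sample are invented and are not taken from the course files.

```python
def parse_noa(lines):
    """Return (attribute_name, {node_id: value}) from node attribute lines.

    Assumed layout: a header line holding the attribute name (anything in
    parentheses after the name is ignored), then one "node = value" per line.
    """
    rows = [line.strip() for line in lines if line.strip()]
    attribute_name = rows[0].split("(")[0].strip()
    values = {}
    for row in rows[1:]:
        node_id, _, value = row.partition("=")
        values[node_id.strip()] = value.strip()
    return attribute_name, values


# Invented sample content.
sample = [
    "NucleolusScore",
    "1001 = 87",
    "1002 = 12",
]
name, raw_scores = parse_noa(sample)
scores = {node: float(value) for node, value in raw_scores.items()}
print(name, scores)
```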
Finding nucleolus-related protein complexes
STEP 1: Start Cytoscape and load in the interaction network, the prediction
score, and the functional annotation. The latter two should be loaded as node
attributes.
Find a suitable layout.
STEP 2: Use MCODE to identify potential protein complexes in the
network using the default settings.
Q7: How many complexes does the MCODE algorithm identify?
STEP 3: Now we will use the visual style to highlight the proteins with a
high nucleolus prediction score. Visual styles can be tailor-made; to
make a new visual style, open the “VizMapper” in the “view” menu.
Duplicate the existing visual style and define a new visual style
(click the ‘define’ button) where the color of each node represents the
prediction score from the prediction method, i.e. make a continuous
mapper that uses the prediction score as attribute. Set the color
scheme so that there is a color for scores below 25, for 25, 50, 75 and
above 80. Choose the colors so that it is easy to separate the low values
from the high values, e.g. let low values be red, intermediate values (50)
yellow, and high values green.
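A continuous mapper interpolates between the colors set at the
breakpoints. The sketch below illustrates the idea with linear
interpolation in RGB; it is not the VizMapper's own code, and the
simplified three-point scheme (red at 25, yellow at 50, green at 75)
is an assumption chosen to match the example colors above.

```python
def interpolate_color(score, breakpoints):
    """Linearly interpolate an (r, g, b) color for a score.

    breakpoints is a list of (value, (r, g, b)) pairs sorted by value.
    Scores outside the range take the color of the nearest end point.
    """
    if score <= breakpoints[0][0]:
        return breakpoints[0][1]
    if score >= breakpoints[-1][0]:
        return breakpoints[-1][1]
    for (low, low_rgb), (high, high_rgb) in zip(breakpoints, breakpoints[1:]):
        if low <= score <= high:
            t = (score - low) / (high - low)
            return tuple(round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb))


RED, YELLOW, GREEN = (255, 0, 0), (255, 255, 0), (0, 255, 0)
# Assumed breakpoints for illustration only.
scheme = [(25, RED), (50, YELLOW), (75, GREEN)]

for score in (10, 30, 50, 90):
    print(score, interpolate_color(score, scheme))
```

Scores below the first breakpoint and above the last one get a fixed
color, which corresponds to the ‘Below’ and ‘Above’ settings of a
continuous mapper.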
Q8: Can you see any regions in the network where proteins with
high nucleolus prediction scores are clustered?
STEP 4: Change the node attribute setting, so it uses the functional
description as attribute.
Look through the MCODE complexes with a score above 2.5 by selecting
the nodes in a complex (clicking on it in the MCODE output window) and
transferring them to a new network (‘select’ – ‘to new network’ – ‘selected
nodes, all edges’).
Q9: Can you find complexes where three (or more) proteins have
high prediction scores and seem to have a similar functional
description? Is this likely to occur by chance?
If you find any such protein complexes, write down the number of
proteins in the complex and the most frequent functional
description (hint: select all the proteins in the complex and display
the attributes in the attribute browser).
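One rough way to judge whether such a complex could arise by chance is
a hypergeometric calculation: if the proteins of the complex were drawn
at random from the network, how likely is it that at least three of
them have a high score? The sketch below makes that simplifying
assumption (random draws, ignoring the network structure), and all the
numbers in it are invented.

```python
from math import comb


def prob_at_least(k, complex_size, high_total, network_size):
    """P(at least k high-scoring proteins) when complex_size proteins are
    drawn at random, without replacement, from network_size proteins of
    which high_total have a high score."""
    total = comb(network_size, complex_size)
    upper = min(complex_size, high_total)
    favorable = sum(
        comb(high_total, i) * comb(network_size - high_total, complex_size - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


# Invented numbers: 400 proteins, 40 with a high score, a complex of 6.
p = prob_at_least(3, complex_size=6, high_total=40, network_size=400)
print(p)
```

What counts as a “high” score is your own choice; count how many
proteins in the network pass that cutoff and use the result as
high_total.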
Exercise 3: Active Modules
The method for finding “active modules” in interaction networks using
gene expression data was published in Ideker T, et al. Discovering
regulatory and signalling circuits in molecular interaction networks.
Bioinformatics. 2002.
jActiveModules is the plugin implementation of this module search and
scoring method. In this exercise, we will use it on the galactose data
from the sample directory. It requires that a protein-protein interaction
network and p-value attributes have been imported (as we did in
last week's exercise).
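The core of the scoring in Ideker et al. is to turn each gene's p-value
into a z-score and to combine the z-scores of the k genes in a
candidate module as the sum of z-scores divided by the square root of
k. The sketch below shows only this step for a single condition; the
calibration against random gene sets and the search over sub-networks
described in the paper are left out, and the p-values are invented.

```python
from math import sqrt
from statistics import NormalDist


def z_score(p_value):
    """Convert a p-value (strictly between 0 and 1) into a z-score.

    Small p-values give large positive z-scores.
    """
    return NormalDist().inv_cdf(1.0 - p_value)


def aggregate_z(p_values):
    """Aggregate z-score of a candidate module: sum(z_i) / sqrt(k)."""
    z_scores = [z_score(p) for p in p_values]
    return sum(z_scores) / sqrt(len(z_scores))


# Invented p-values for two hypothetical three-gene modules.
active_module = [0.01, 0.02, 0.50]
quiet_module = [0.40, 0.50, 0.60]
print(aggregate_z(active_module), aggregate_z(quiet_module))
```

A module with several small p-values gets a higher aggregate score
than a module of the same size with unremarkable p-values, which is
what the search then tries to maximize.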
STEP 1:
Load the network “galFiltered.sif” from your Cytoscape installation’s
sampleData directory (it should be).
You will also need the ORF2gene.na file, so download it to your laptop,
and import it as a node attribute.
Your network will contain a combination of protein-protein (pp) and
protein-DNA (pd) interactions.
Import expression data table: File -> Import -> Attribute/Expression
Matrix…, and select the “galExpData.pvals” file from your sampleData
directory. This file contains gene expression measurements for three
knock-out perturbation experiments. In each experiment, the expression
for a different transcription factor knock-out strain was measured.
After a brief load, a status window will appear, indicating how many
experimental conditions were found (three) and what type of significance
values were included.
Set the parameters that jActiveModules will use to score modules:
Plugins -> jActiveModules -> Set Parameters. Select all available
p-value attributes (3) under Attribute Selection and then click the Dismiss
button.
STEP 2: Run the search algorithm: Plugins -> jActiveModules -> Find
Modules. A results window should appear when the search is finished.
STEP 3: Select a module result by selecting a network row in the
jActiveModules results window. This will select the corresponding nodes
in the larger graph.
STEP 4: Select the second-ranking module and create a new sub-network.
Select the main Cytoscape window (to change focus from the
jActiveModules window) and create a new network with Ctrl+N.
STEP 5: On the VizMapper manager window, click the Duplicate button
to copy the default style into a new visual style named “Gal80” (or the
like). Click on the Define button to define your new style. This
will bring up the main VizMapper settings window.
The default tab of the VizMapper settings defines the Node Color of
this visual style. Set the Node Color as follows:
i. Under Mapping, click on the pull-down menu labeled None
and select RedGreen.
ii. In the pull-down menu labeled MapAttribute, select the
attribute “gal80Rexp”. This specifies that each node will be
colored on a color continuum according to Gal80
expression, as follows:
iii. Note that the default node color of pink may fall within this
spectrum. A useful trick is to choose a color outside this
spectrum to distinguish nodes with no expression value
defined. Under Default, click on Change Default, and
select a default color of grey.
iv. Change the color of ‘Above’ to a dark green (black will hide
the node label).
v. Finally, click on Apply to Network. You should see most
nodes colored pink, green, or white, with a few grey nodes
and a few black nodes.
Lay out the edge properties of this network: Start with the Edge
Attributes (button at top) and select the Edge Color tab. Under Mapping,
select ‘BasicDiscrete’ from the pull-down menu. Make sure that the
‘Map Attribute’ pull-down is set to the interaction type (the edge
attribute called ‘interaction’) and that the edge colors are set roughly
as in the screen image. Apply to Network.
‘pp’ stands for protein-protein interactions while ‘pd’ stands for
protein-DNA interactions (between transcription factors and DNA
promoter regions).
STEP 6: Add arrow heads to the ‘pd’ edges. Select the ‘Edge Target
Arrow’ under the ‘Edge Attributes’ tab of the VizMapper
settings window. Under ‘Mapping’, choose ‘BasicDiscrete’, the default
discrete mapping function. Choose an arrow-head for the ‘pd’ graphic
(as in the screen image) and ‘Apply to Network’.
Q10: What can we guess about the activity of Rap1p (YNL216W) in
the gal80 deletion data?
