Exercise 1: Analyzing the topology of interaction networks

Overview: This exercise introduces you to the NetworkAnalyzer plugin for Cytoscape, a plugin for statistical analysis of networks.

Getting started

If Cytoscape and the plugins directory were not installed on your computer during last week's exercise, please install them now. You will need the newest Cytoscape, plus the data and plugins from http://www.cbs.dtu.dk/chipcourse/biosys/Exercises/Cytoscape/

Install bundles (you need only one, depending on your OS):
  Cytoscape_2_4_1_windows.exe
  Cytoscape_2_4_1_macos.dmg
  Cytoscape_2_4_1_unix.sh

Plugins (all jar files):
  NetworkAnalyzer.jar
  MCODE_v1_2.jar
  jActiveModules.jar
  jfreechart-0.9.20.jar
  jfreechart-common-0.9.5.jar

Save all of these files in your Cytoscape installation’s plugins directory, probably something like C:\Program Files\Cytoscape_v2.4.1\plugins

Data:
  RUAL.subset.sif

Save the data file to your computer (e.g. on the desktop) and make sure that “.txt” has not been added to the end of the file name. If it has, remove it so that the RUAL file name ends in “.sif”. In this exercise we will use a subset of the human interaction dataset by Rual et al. (Nature. 2005 Oct 20;437(7062):1173-8).

Network statistics

The first part of the exercise deals with topology analysis of the protein-protein interaction network.

STEP 1: Load the network RUAL.subset.sif into Cytoscape by selecting ‘Import’ under the ‘File’ menu, then selecting ‘Network (multiple file types)’, and then specifying the location of the file. This network consists of 1089 interactions observed between 419 human proteins, and is a small subset of a large human interaction dataset. The subset consists of proteins that interact with the transcription factor protein TP53.

STEP 2: Try some of the different layouts (circular, organic, hierarchical and random) by selecting the appropriate layout under ‘yFiles’ in the ‘Layout’ menu.
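The node and interaction counts quoted in STEP 1 can also be cross-checked outside Cytoscape. The following is a minimal Python sketch, assuming the usual SIF layout of one source node, one interaction type and one or more target nodes per line. The function names `parse_sif` and `summarize` and the toy node names are illustrative assumptions, not part of Cytoscape or of the course material. The sketch treats the network as undirected and collapses duplicate edges and self-loops, so its counts may differ slightly from what Cytoscape reports.

```python
from collections import defaultdict


def parse_sif(lines):
    """Build {node: set(neighbors)} from SIF lines (source, type, target...)."""
    neighbors = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        source = fields[0]
        neighbors[source]  # touching the key registers isolated nodes too
        for target in fields[2:]:  # fields[1] is the interaction type
            if target != source:  # self-loops are ignored in this sketch
                neighbors[source].add(target)
                neighbors[target].add(source)
    return neighbors


def summarize(neighbors):
    """Return (number of nodes, number of undirected edges, average degree)."""
    n_nodes = len(neighbors)
    n_edges = sum(len(adjacent) for adjacent in neighbors.values()) // 2
    avg_degree = 2 * n_edges / n_nodes if n_nodes else 0.0
    return n_nodes, n_edges, avg_degree


# Toy network with made-up node names; replace with the lines of a real file.
example = ["A pp B", "A pp C", "B pp C", "C pp D"]
net = parse_sif(example)
print(summarize(net))
print(len(net["C"]))  # degree of node C
```

To apply it to the exercise data, one could pass `open("RUAL.subset.sif")` to `parse_sif` instead of the toy list, assuming the file sits in the working directory.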
One of the most useful layouts for network biology is the organic layout. To try it, select ‘organic’ under ‘yFiles’ in the ‘Layout’ menu.

STEP 3: The nodes in this network are labeled by numeric Entrez IDs, which are the IDs employed by NCBI (www.ncbi.nlm.nih.gov). The node representing TP53 is numbered 7157. Select this node: Select -> Nodes -> By Name…. A popup window will appear. Enter the node ID “7157” and click Search, which should highlight TP53 (TP53 will now appear yellow in the network).

Q1: How many proteins interact with TP53?

STEP 4: First, deselect all nodes in the network. Then apply the NetworkAnalyzer plugin to the network by selecting ‘Network Analyzer’ from the Plugins menu (ignore the Cytoscape warning that the network contains both directed and undirected edges). This should produce the following window (if it doesn’t, screenshots of the relevant windows can be seen below):

As you can see, the NetworkAnalyzer plugin calculates various network parameters. Browse through the network statistics and parameters and try to answer the following questions:

Q2: What is the average degree (connectivity) of the network?

Q3: What is the most likely degree of a randomly selected node in the network? And where is TP53 in the node degree distribution?

Q4: Use the node degree distribution and the distribution of the average clustering coefficient (C(k)) to determine whether the network structure is random, scale-free or hierarchical (hint: look at box 2 in the Barabasi paper).

STEP 5: Have a look at the shortest path length distribution for the entire network using the NetworkAnalyzer plugin.

Q5: What is the highest number of edges that you need to connect any two nodes in the network?

This phenomenon, where any two nodes are connected by only a few edges, is known as a ‘small-world network’ and can be found in many real-life networks, e.g. the network that connects actors who have appeared in the same movie.

Extra: You can connect any two actors on http://oracleofbacon.org/advanced.html.
Try, just for fun, a few actors and see how many edges (movies) are required to connect them.

STEP 6: As the average clustering coefficient is relatively high, we expect some clusters (complexes) to exist in the network. You will now be introduced to the MCODE plugin. Try to identify the protein complexes using MCODE: under the ‘Plugins’ menu, select ‘MCODE’ and then ‘Start MCODE’. There are many parameters under the ‘advanced options’ that can be changed, but ignore those and run MCODE on the whole network by clicking ‘Analyze’. This will identify several complexes, which can be seen in the new Cytoscape panel that appears. Click on a complex in the ‘MCODE Results summary’. This will highlight the complex (yellow nodes) in the large network. The size threshold allows you to include more or fewer nodes in the complex, but you can ignore that in this exercise. Browse some of the complexes with a score above 1. How many of these complexes would you have found by manual inspection of the large network?

Q6: Are there any interactions between proteins in complex one and proteins in complex two? What if you increase the size threshold?

Exercise 2: Identifying nucleolus-located protein complexes

Overview: In this exercise we will integrate functional information with a protein-protein interaction network in a ‘real life’ case study of the human nucleolus. The nucleolus is an ill-defined substructure found in the nucleus. Little is known about it, but it is known to be involved in ribosome biogenesis. In a proteomics study, an external collaborator has made a mass-spec-based purification of proteins found in the human nucleolus. Unfortunately, this data is a mixture of true nucleolus proteins and contaminants (proteins that are not in the nucleolus). You are now faced with the challenge of identifying true nucleolus-located protein complexes (if there are any!).

Getting started

Restart Cytoscape (this is a quick way to clear all networks and all attributes).

Data: The protein-protein interaction data can be found in a SIF file: http://www.cbs.dtu.dk/courses/27041/exercises/Ex2/nucleolus_exercise.sif

To aid the identification of nucleolus-located complexes, a method for predicting nucleolus-located proteins has been developed and applied to almost all proteins in the interaction data. Each of these proteins therefore has a nucleolus score between 0 and 100: the higher the score, the more likely the protein is to be located in the nucleolus (however, the predictor is not perfect and can make wrong predictions!). The predictions can be found here: http://www.cbs.dtu.dk/courses/27041/exercises/Ex2/nucleolus_prediction.noa

Functional descriptions are only rarely available for human proteins, so a short functional description of each protein in the data set has been obtained by searching for the same protein in yeast and transferring the functional description to the human ortholog: http://www.cbs.dtu.dk/courses/27041/exercises/Ex2/protein_description.noa

Finding nucleolus-related protein complexes

STEP 1: Start Cytoscape and load the interaction network, the prediction scores, and the functional annotation. The latter two should be loaded as node attributes. Find a suitable layout.

STEP 2: Use MCODE to identify potential protein complexes in the network using the default settings.

Q7: How many complexes does the MCODE algorithm identify?

STEP 3: Now we will use the visual style to highlight the proteins with a high nucleolus prediction score. The visual style can be tailor-made. To make a new visual style, open the “VizMapper” in the “View” menu. Duplicate the existing visual style and define a new one (click the ‘Define’ button) in which the color of each node represents the score from the prediction method, i.e. make a continuous mapper that uses the prediction score as attribute.
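Conceptually, a continuous mapper of the kind used in STEP 3 assigns colors to a few breakpoints on the attribute's scale and interpolates between them. The sketch below is a minimal Python illustration of that idea, not Cytoscape's implementation. The function name `continuous_color`, the breakpoint values and the RGB triples in `SCHEME` are assumptions chosen for illustration.

```python
def continuous_color(value, breakpoints):
    """Piecewise-linear interpolation between (value, (r, g, b)) breakpoints.

    Values outside the breakpoint range are clamped to the nearest end color.
    Breakpoint values are assumed to be distinct.
    """
    points = sorted(breakpoints)
    if value <= points[0][0]:
        return points[0][1]
    if value >= points[-1][0]:
        return points[-1][1]
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            t = (value - x0) / (x1 - x0)  # position within this segment, 0..1
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))


# Hypothetical scheme: red for low scores, yellow at 50, green for high scores.
SCHEME = [(0, (255, 0, 0)), (50, (255, 255, 0)), (100, (0, 255, 0))]

for score in (10, 50, 90):
    print(score, continuous_color(score, SCHEME))
```

A three-point scheme like this makes low and high scores easy to tell apart, because intermediate scores pass through a third, clearly different color instead of blending the two end colors directly.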
Set the color scheme so that there is a color for scores below 25, for 25, 50, 75 and above 80. Choose the colors so that it is easy to separate the low values from the high values, e.g. let low values be red, intermediate values (50) yellow, and high values green.

Q8: Can you see any regions in the network where proteins with high nucleolus prediction scores are clustered?

STEP 4: Change the node attribute setting so that it uses the functional description as attribute. Look through the MCODE complexes with a score above 2.5 by selecting the nodes in a complex (clicking in the MCODE output window) and transferring them to a new network (‘Select’ -> ‘To new network’ -> ‘Selected nodes, all edges’).

Q9: Can you find complexes where three (or more) proteins have high prediction scores and seem to have similar functional descriptions? Is this likely to occur at random? If you find any such protein complexes, write down the number of proteins in the complex and the most frequent functional description (hint: select all the proteins in the complex and display the attributes in the attribute browser).

Exercise 3: Active Modules

The method for finding “active modules” in interaction networks using gene expression data was published in Ideker T, et al. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002. jActiveModules is the plugin implementation of this module search and scoring method. In this exercise, we will use it on the galactose data from the sample directory. It requires that a protein-protein interaction network and p-value attributes have been imported (as we did in last week's exercise).

STEP 1: Load the network “galFiltered.sif” from your Cytoscape installation’s sampleData directory (it should be there). You will also need the ORF2gene.na file, so download it to your laptop and import it as a node attribute. Your network will contain a combination of protein-protein (pp) and protein-DNA (pd) interactions.
Import the expression data table: File -> Import -> Attribute/Expression Matrix…, and select the “galExpData.pvals” file from your sampleData directory. This file contains gene expression measurements for three knock-out perturbation experiments. In each experiment, the expression of a different transcription factor knock-out strain was measured. After a brief load, a status window will appear, indicating how many experimental conditions were found (three) and what type of significance values were included. Set the parameters that jActiveModules will use to score modules: Plugins -> jActiveModules -> Set Parameters. Select all available p-value attributes (3) under Attribute Selection and then click the Dismiss button.

STEP 2: Run the search algorithm: Plugins -> jActiveModules -> Find Modules. A results window should appear when the search is finished.

STEP 3: Select a module result by selecting a network row in the jActiveModules results window. This will select the corresponding nodes in the larger graph.

STEP 4: Select the second-ranking module and create a new subnetwork. Select the main Cytoscape window (to change focus from the jActiveModules window) and create a new network with Ctrl+N.

STEP 5: In the VizMapper manager window, click the Duplicate button to duplicate the default style, and name the new visual style “Gal80” (or the like). Click on the Define button to define your new style. This will bring up the main VizMapper settings window. The default tab of the VizMapper settings defines the Node Color of this visual style. Set the Node Color as follows: Under Mapping, click on the pull-down menu labeled None and select RedGreen. In the pull-down menu labeled Map Attribute, select the attribute “gal80Rexp”. This specifies that each node will be colored on a color continuum according to Gal80 expression, as follows: Note that the default node color of pink may fall within this spectrum.
A useful trick is to choose a color outside this spectrum to distinguish nodes with no expression value defined. Under Default, click on Change Default, and select a default color of grey. Change the color of ‘Above’ to a dark green (black will hide the node label). Finally, click on Apply to Network. You should see most nodes colored pink, green, or white, with a few grey nodes and a few black nodes.

Now set the edge properties of this network: Start with the Edge Attributes (button at top) and select the Edge Color tab. Under Mapping, select ‘BasicDiscrete’ from the pull-down menu. Make sure that the ‘Map Attribute’ pull-down is set to the interaction type (the edge attribute called ‘interaction’), and set the edge colors to something like those in the screen image. Click Apply to Network. ‘pp’ stands for protein-protein interactions, while ‘pd’ stands for protein-DNA interactions (between transcription factors and DNA promoter regions).

STEP 6: Add arrowheads to the ‘pd’ edges. Select ‘Edge Target Arrow’ under the ‘Edge Attributes’ tab of the VizMapper settings window. Under ‘Mapping’, choose ‘BasicDiscrete’, the default discrete mapping function. Choose an arrowhead for the ‘pd’ graphic (as in the screen image) and click ‘Apply to Network’.

Q10: What can we guess about the activity of Rap1p (YNL216W) in the gal80 deletion data?
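For intuition about what the module search in Exercise 3 rewards, the sketch below shows one simplified way to turn per-gene p-values into a single module score: convert each p-value to a z-score with the inverse normal CDF, then sum the z-scores and divide by the square root of the module size. This is a hedged illustration of the aggregate z-score idea from the Ideker et al. paper, for a single experimental condition only. It does not reproduce the plugin's background calibration, its handling of multiple conditions, or its search procedure. The function names `z_from_p` and `aggregate_z` and the p-values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_from_p(p):
    """Convert a p-value (0 < p < 1) to a z-score; small p gives large z."""
    return NormalDist().inv_cdf(1.0 - p)


def aggregate_z(p_values):
    """Sum of z-scores scaled by sqrt(k), for a module of k genes."""
    z_scores = [z_from_p(p) for p in p_values]
    return sum(z_scores) / sqrt(len(z_scores))


# Made-up p-values for two hypothetical modules of equal size.
consistent_module = [0.001, 0.002, 0.001, 0.003]
mixed_module = [0.001, 0.5, 0.6, 0.4]

print(aggregate_z(consistent_module))
print(aggregate_z(mixed_module))
```

Under this scoring, a module in which every gene changes significantly scores higher than a module of the same size that contains one strongly changing gene surrounded by unchanged ones, which is why connected groups of significant genes are favored.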