A review of in silico approaches for analysis and prediction of HIV

Briefings in Bioinformatics, 16(5), 2015, 830–851
doi: 10.1093/bib/bbu041
Advance Access Publication Date: 5 December 2014
Paper
A review of in silico approaches for analysis and
prediction of HIV-1-human protein–protein
interactions
Sanghamitra Bandyopadhyay*, Sumanta Ray*, Anirban Mukhopadhyay and
Ujjwal Maulik
Corresponding author. Sanghamitra Bandyopadhyay, Departmental information, Machine Intelligence Unit, Indian Statistical Institute;
E-mail: [email protected]
*These authors contributed equally to this work.
Abstract
The computational or in silico approaches for analysing the HIV-1-human protein–protein interaction (PPI) network,
predicting different host cellular factors and PPIs and discovering several pathways are gaining popularity in the field of HIV
research. Although there exist quite a few studies in this regard, no previous effort has been made to review these works in
a comprehensive manner. Here we review the computational approaches that are devoted to the analysis and prediction of
HIV-1-human PPIs. We have broadly categorized these studies into two fields: computational analysis of HIV-1-human PPI
network and prediction of novel PPIs. We have also presented a comparative assessment of these studies and proposed
some methodologies for discussing the implication of their results. We have also reviewed different computational
techniques for predicting HIV-1-human PPIs and provided a comparative study of their applicability. We believe that our
effort will provide helpful insights to the HIV research community.
Key words: HIV-1-human PPI Network; computational PPI prediction; HIV dependency factor; association rule mining; biclustering; random forest classifier; semi supervised classification; rank aggregation; topological properties of network
Introduction
Since 1980s, rapid progression has been noted in the field of
HIV/AIDS research. HIV belongs to a special class of viruses
called retroviruses. Within this class, HIV is placed in the subgroup of lentiviruses. HIV-1 virus contains a single-stranded
RNA genome, which codes for primarily 19 proteins; thus, it
mostly relies on human cellular functions for most activities
during its life cycle. The RNA genome, consisting of seven
structural landmarks (LTR, TAR, RRE, PE, SLIP, CRS and INS) and
nine genes (gag, pol, env, tat, rev, nef, vif, vpr and vpu), encodes
19 proteins. Among these genes, gag, pol and env contain information needed to make the structural proteins for new virus
particles [1].
One of the important areas of HIV research is the detailed
understanding of HIV biology, its evolution and origins [1–6] and
Sanghamitra Bandyopadhyay, PhD in Computer Science, currently serves as a Professor at Indian Statistical Institute, Kolkata, India. Her research interests include Pattern Recognition, Data Mining, Soft Computing and Bioinformatics.
Sumanta Ray is registered for PhD at Department of Computer Science and Engineering in Jadavpur University, Kolkata, India. He is presntly working as
an Assistant Professor at Aliah University, Kolkata. His research interest lie in Computational Biology, Soft Computing, Graph Theory and Data Mining.
Anirban Mukhopadhyay, PhD in Computer science, is currently an Associate Professor and Head of the Department of Computer Science and Engineering,
University of Kalyani. His research interests include soft and evolutionary computing, data mining, multiobjective optimization, pattern recognition, bioinformatics and optical networks.
Ujjwal Maulik, PhD in Computer Science, is currently serving as a Professor in the Department of Computer Science and Engineering, Jadavpur University,
Kolkata, India. His research interests include evolutionary computing, pattern recognition, data mining, bioinformatics and distributed systems.
Submitted: 22 June 2014; Revised (in revised form): 12 October 2014
C The Author 2014. Published by Oxford University Press. For Permissions, please email: [email protected]
V
830
Review on analysis and prediction of HIV-1–human PPI
the process of replication in host cell. Like all other viruses, HIV1 must hijack the host cellular machinery for increasing the
production of viral genomic material. Initially, HIV-1 virus
attaches the envelop protein GP120, which is embedded in the
outer membrane of the HIV virion, to the human receptor protein CD4 Tþ cells. To pass through the cell membrane, binding
to a second receptor is also required. Two different coreceptors
are also involved in the binding process, namely, CCR5, a chemokine receptor that serves as a coreceptor early in the infection and chemokine receptor (CXCR4) that serves as a
coreceptor later. After entering the cell, an enzyme called
Reverse Transcriptase is responsible for initiating the reverse
transcription of a single stranded viral RNA to produce a double
stranded viral DNA. After that, the viral DNA enters into the
CD4 cell nucleus and integrates into the cell DNA. An enzyme
called integrase acts as a key regulator in this process. After
that, long chain of HIV proteins is created through transcription
and translation by using the machinery of the host CD4 cell. An
HIV enzyme called protease cuts up the long chain to produce
smaller HIV proteins, which combine with HIV RNA to form a
new virus particle. Viral proteins interact with human proteins
at every vital stage of the HIV life cycle and form a complex network of molecular events, including virus–host protein–protein
interactions (PPIs) [7, 8]. This intricate network of host and viral
proteins are thus considered as an essential source for understanding the inherent mechanism of viral replication cycle [9].
In silico approaches have been developed as an alternative
for conducting research at the system level. Most of the
approaches primarily focus on the analysis of HIV-1-human PPI
network, which is constructed from the available existing information catalogued in the HIV-1-Human Protein Interaction
Database (HHPID) [10], published in 2009. There are several
approaches that use this database to uncover various relationships between the viral proteins, between the virus and the host
and ultimately between the various host proteins by exploring
diverse methodologies. This valuable information remains in
different published literature in an incoherent way. For an individual researcher, it is difficult to efficiently access this information and also time-consuming to acquire an overview in this
area. Thus, it would be beneficial to collect all the works and
compare different approaches in a single article.
Here we provide a comprehensive survey of various computational approaches that mainly focus on this HIV-1-human PPI
network. To retain the computational flavour we restrict our
analysis at the systems level. Despite being overwhelmed with
dozens of experimental and computational results, our aim is to
|
831
provide a complete view of different works on computational
analysis of the HIV-1-human PPI network, constructed from
HHPID data set. We broadly divide all these approaches in two
categories: network analysis at systems level and prediction of
novel PPIs between HIV-1 and human proteins using various
computational approaches. Figure 1 shows the classification of
the system-level approaches in these two categories and provides a simple overview of all the approaches with year of publication. Since 2006, a number of researchers have provided their
valuable opinions and shed light on different issues for uncovering the inherent relationship between HIV-1 and human
proteins. Here we try to review most of these works and accumulate their results to give a simplified view.
The HIV-1-human protein interaction
network
The National Institute of Allergy and Infectious Diseases has
published the ‘HIV1-Human Protein Interaction Database’
(HHPID) [10] in 2009. This database provides a concise but detailed summary of all known interactions including their brief
descriptions. It is a valuable resource for the HIV-1 research
community. The database was made publicly available at NCBIs
website, http://www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions/
in 2009. The HHPID database provides comprehensive data consisting of 5127 interactions between 19 HIV-1 proteins and 1432
human proteins (January, 2013). This curated database was constructed from approximately 3200 papers published between
1984 and 2007 and was validated by being linked to several publications. Over 14 312 PMID references to the original articles
were collected for validating all the interactions in this growing
database [10]. Moreover, each interaction is associated with a
specific interaction type like ‘upregulate’, ‘activate’, ‘bind’, ‘inhibited by’ etc. A visualization of this database can be seen in
Figure 2. We have extensively searched and found 68 unique
interaction types among the 5127 interactions. The bipartite
network shown in Figure 2a is constructed from this data set
and consists of 19 red nodes, which represent HIV-1 proteins,
and 1432 green nodes, which represent human proteins. We
broadly divide all the interaction types into regulatory classes:
regulating, regulated by and bidirectional (regulation is in both
ways), where the three classes contain 34, 25 and 9 interaction
types, respectively. We create three subnetworks, each corresponding to an interaction class. Figure 2b–d) show interaction
types of the three classes, respectively. The degree distribution
Figure 1. Categorization of system-level approaches in the analysis of HIV-1-human PPI network.
832
|
Bandyopadhyay et al.
Figure 2. HIV-1-human protein interaction network: 19 HIV-1 proteins (15 structural and 4 intermediate proteins) interact with 1432 human proteins via 5127 interactions. The big red nodes represent HIV-1 proteins and small green nodes are human proteins. Considering types of these interactions the edges are labelled in three
different categories: regulating, regulated by and bidirectional. The edges are coloured corresponding to these three categories. (a) represents the whole network. (b),
(c) and (d) represent the three subnetworks constructed from (a) by considering three classes of interaction types. Distributions of degrees and interaction types for
each subnetwork are also shown in figure. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
in each subnetwork approximately obeys power law i.e. the
number of nodes with large degree values are less frequent,
whereas the number of nodes with lower degree values are
more frequent. Following the distribution of interaction types in
each subnetwork, ‘upregulates’, ‘inhibits’, ‘downregualates’, ‘activates’, ‘binds’, ‘interacts with’, ‘inhibited by’ and ‘processed
by’ seem to be more common interactions in HHPID.
Additionally, a significant number of interactions are observed
to be associated with class-1 regulation type than the other two
types of regulations.
To understand which of the subnetworks mimics the topological properties of the whole HIV-1-human bipartite network,
we plot relative degree of HIV-1 proteins in each subnetwork
against the whole bipartite network. This is shown in Figure 3.
It is evident from the figure that there is a strong correlation between degree of subnetwork 1 or subnetwork 2 and the global
Review on analysis and prediction of HIV-1–human PPI
|
833
Figure 3. Relative degree of 19 HIV-1 proteins in three subnetworks versus relative degree of whole bipartite network.
Figure 4. Connectivity and betweenness centrality distributions of HIV interacting human proteins belonging to three subnetworks (positive, bidirected and negative)
measured in the context of entire human PPI network. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
network (R2 ¼ 0.9949 and 0.9895 for subnetwork 1 and subnetwork 2, respectively). The fitted regression line (R2 ¼ 0.8104) for
subnetwork 3, which is constructed by considering the class 3
type of interactions, asserts the existence of correlation in degree between subnetwork 3 and the whole bipartite network but
not so closely as subnetwork 1 and subnetwork 2. This suggests
that the relative degree of HIV-1 proteins retain the same pattern strongly in subnetwork 1 and in subnetwork 2. We have
also studied degree and betweenness centrality distributions of
HIV-1 interacting human proteins specific to each of these three
subnetworks. For this purpose we collect the human PPI data
set from Human Protein Reference Database (HPRD) [11], which
consists of 39,240 interactions among 9589 human proteins.
Figure 4a and b show, respectively, the degree and betweenness
centrality distribution of HIV-1 interacting human proteins belonging to the three subnetworks in the corpus of entire human
PPI network. The motivation behind this is to see to what extent
the HIV-1 interacting human proteins, associated with specific type of regulation, are connected in the whole human PPI
network. Not surprisingly, no such significant difference is
observed in degree and betweenness centrality pattern. This
suggests that the topological characteristics of HIV-1 interacting
human proteins are weakly correlated with specific regulation
type.
834
|
Bandyopadhyay et al.
A review on HIV-1-human PPI network
analysis approaches
Here we have reviewed several pioneering works dedicated to
the broad analysis of the HIV-1-human protein interaction network. Since 2009 with the introduction of HHPID data set, several works are published that use this data set from different
perspectives. Almost all of them primarily focus on building a
bipartite network constituting HIV-1 and human proteins as
nodes and interactions between them as edges. The created
network is analysed to gain a deeper insight into the HIV research that includes HIV-host perturbation, biological importance of the host proteins that show strong association with
specific HIV interactions, different interaction patterns of host
proteins with HIV-1 proteins and several other issues. The
works we have reviewed in this section use the HHPID data set
as well as use different data sources that are integrated with
HHPID.
Host factors essential for HIV replication
In Pinney et al. [9] an attempt was made to uncover different
new developments in HIV research that provide a helpful insight into the viral perturbation in the host proteome. For this
purpose, they have used the existing knowledge of HIV–host
interactions documented in HHPID [10]. The HHPID data set is
visualized here as a detailed map of viral perturbation of the
host proteome in the context of gene ontological annotations of
human proteins in this data set. The over-represented biological
process terms of the human proteins in this data set reveal how
the various parts of the viral replication cycle are delegated to
different genes and the way in which HIV-1 gene products cooperate to target specific parts of the human cellular system. A
minimum spanning tree (MST) is constructed by calculating the
semantic distance [12] between each pair of HIV-1 interacting
gene products. As a consequence the gene products that are
close to each other in the MST are more likely to be involved in
common biological process.
Pinney et al. [9] found little overlap between HIV-dependency
factor (HDF) (which is basically a class of human proteins that
are essential for HIV replication) identified by Brass et al. [13],
Konig et al. [14] and Zhou et al. [15] and HIV-1 interacting human
proteins reported in HHPID (shown in Figure 5). This discrepancy between HDF data sets and HHPID data is not surprising
because the experimental screening procedure for HDF detection mostly relies on the cell line, HIV strain used and particular
experimental protocol. However, 131 human proteins are
Figure 5. Overlap between host factors identified in three systematic screening
studies [13–15] and HIV interacting human proteins reported in HHPID data set.
A colour version of this figure is available at BIB online: http://bib.
oxfordjournals.org.
believed to interact with HIV-1, as supported by at least two
studies; these have a better chance of being a host factor or an
essential protein. Essential or lethal proteins are those, deletion
of which will result in lethality or infertility, i.e. the organism
cannot survive without these proteins [16, 17]. Lethality of proteins is linked with different unique topological characteristics
of proteins in the human PPI network [18–21]. It is demonstrated
that the proteins with high degree and high betweenness centrality have a tendency of being essential [22–27]. In general
‘centrality-lethality’ rule states that deletion of a hub protein is
more likely to be lethal than deletion of a non-hub protein [24].
The underlying base of the ‘centrality-lethality’ rule is set up by
some interactions between proteins that are indispensable for
the survival or reproduction of an organism. For examining the
involvement of essential proteins with the HIV-interacting
ones, Dickerson et al. [28] study the relationship between protein essentiality and degree/betweenness centrality. For this,
they have considered mouse genome knockout data from [29].
Here a human gene is considered as ‘essential’ if a knockout of
its mouse ortholog has a lethal impact [30]. They surprisingly
observed that the HIV-1 interacting proteins turn out to be nonessential according to the definition of essentiality in [30]. The
possible reason behind this may be that the essential proteins
may follow poor correlation with degree and betweenness
centrality.
It is interesting to see whether the HIV-1 interacting proteins
play important roles in human biological processes or have certain functionalities that make them essential. For this, we have
adopted a rank aggregation scheme to rank all the proteins that
are predicted to interact with HIV-1, and are supported by at
least two of the four studies (shown in Figure 5). We have used
five topological metrics, viz., density, betweenness centrality,
clustering coefficient, eigenvector centrality and average density of neighbourhood nodes, to rank all the proteins. A weighted
rank aggregation technique [31] is used to build an ordered list
of proteins. Figure 6 shows the result of applying the rank aggregation algorithm. It appears from the figure that most of the
proteins that are predicted to interact with HIV-1 and are supported by at least three studies get better rank than the others.
Table 1 shows 11 proteins interacting with HIV-1 and are supported by at least three literature studies. Columns show the
ranked list of these proteins corresponding to each metric. Four
proteins, viz., ‘CD4’, ‘TCEB3’, ‘CHST1’ and ‘DDX3X’ got aggregated rank above 50. The conserved RNA helicase DDX3X has
rank 106, which indicates that it is not so much essential as
others in the list. DDX3X plays a major role in nuclear export
of viral RNA transcripts into cellular cytoplasm. It is known to
inherit antiviral immune mechanisms, which shows its importance in viral replication as well as antiviral activity.
Interestingly in [32] it is reported that knockdown of DDX3X
drastically deteriorates HIV-1 replication in host cells. CD4 is a
glycoprotein found on the surface of immune cells such as T
helper cells and is essential for building up the human immune
system. Although CD4 was placed in the 78th position in the optimal ranking, it plays a key role in transferring signal to other
types of immune cells. Similarly, CXCR4, which acts as an important co-receptor for HIV-1 entrance in the host cell, got 35th
position in the optimal ranking list. It is notable that CXCR4 appears in the first position in the ranking list corresponding to
the degree or connectivity. All the four studies support the
interaction with protein RELA.
We have performed a significance test to investigate
whether the ranking of HIV-1 interacting proteins is significantly better than a random set of proteins. For this purpose,
Review on analysis and prediction of HIV-1–human PPI
|
835
Figure 6. Ranking of proteins. All the host proteins showing interactions with HIV-1 and supported by at least two literature study are ranked with respect to five different metrics. The X-axis represents optimal ranked list of proteins and Y-axis shows the different ranks of corresponding proteins for each metric. The optimal list of
proteins is built by using a cross-entropy (CE)-based Monte Carlo approach for rank aggregation. A colour version of this figure is available at BIB online: http://
bib.oxfordjournals.org.
Table 1. Ranking of 11 host proteins showing interactions with HIV-1 supported by at least three literatures
Protein
NUP153
AKT1
JAK1
CXCR4
CTDP1
MED6
CD4
TCEB3
CHST1
DDX3X
RELA
Aggregated
rank
5
8
16
35
37
41
78
80
81
106
72
Rank based on
Degree
Betweenness
centrality
Clustering
coefficient
Average degree
of neighbouring nodes
Eigenvector
centrality
20
23
21
1
6
65
130
80
45
46
19
96
24
21
26
1
80
6
123
130
29
57
21
53
57
52
34
41
33
23
89
106
108
47
49
19
83
101
60
51
93
40
113
77
3
38
105
117
118
81
80
29
4
110
85
we first rank all 129 HIV-1 interacting proteins supported by at
least two studies in the corpus of all human proteins. Ranking is
performed by considering the five topological metrics, used earlier. We also collected 129 human proteins randomly from the
whole human protein set and performed ranking based on
these five topological metrics. We have carried out this procedure 300 times and got 300 ranked lists of random proteins.
These are then compared with the ranking of HIV-1 interacting
proteins by using Wilcoxon Rank Sum test. The resulting
P-value is significantly low (8.6733e-10), which indicates that
the ranking of HIV-1 interacting proteins is significantly better
than a random set of proteins.
Ranking of proteins apparently induces essentiality, possibly
because of its affinity to get strongly correlated with unique
topological characteristics. So, the HIV-1 interacting proteins,
which got low ranks, are highly important for facilitating HIV-1
infection.
Topological insights into HIV interacting host factors
Besides the biological and ontological analysis of the HHPID
data set, there are several works that use this catalogued information for the purpose of analysing the network to gain interesting topological insights. For example, Dickerson et al. [28]
analysed the HIV-1-human bipartite network based on its topological properties and showed an association of interaction pattern of HIV-1 and human proteins. Their analysis mainly
concentrated on the general properties of the HIV-1-human PPI
interactions and explored the molecular specificity of the HIV1-host perturbation. The analysis showed how the HIV-1 proteins have a tendency to interact with highly connected (‘hubs’)
and central (‘bottlenecks’) proteins. The ‘hubs’ are proteins that
have high degree and ‘bottlenecks’ are proteins with high
betweenness centrality. The degree centrality represents the
number of connected edges, i.e. the number of protein interactions measured per node. Betweenness centrality is used to
measure the centrality of a node in the network by counting the
number of shortest paths that go through that node. In other
words, how many shortest paths would increase in length if the
node is removed from the network [33]. In Pinney et al. [9], ten
over-represented biological processes, viz., cell differentiation,
intracellular signalling cascade, apoptosis, immune response,
regulation of cell cycle, DNA packaging, inflammatory response,
protein–DNA complex assembly, chemotaxis and nucleosome
assembly in HIV-1 interactions were identified. A direct consequence was observed in [28] between degree and betweenness
centrality of human proteins and the over-representation of
those proteins in fundamental biological processes. The mean
degree and betweenness centrality of human proteins associated with the above biological processes were 7.27 105 and
7.40 105 , whereas these values reduced to 2.63 105 and
2.33 105 when whole human protein interaction network is
considered. Therefore, it is obvious that the proteins involved in
fundamental biological processes which are over-represented
836
|
Bandyopadhyay et al.
in HIV-1 interactions, are more connected and central. In Dijk
et al. [34] associations between proteins involved in RNA polymerase II transcription with ‘hubs’, and proteasome formation
proteins with ‘bottlenecks’ were established. In this work, the
authors hypothesized that during the evolution of HIV-1, the
viral proteins are preferentially attached with central human
proteins for direct control and with proteasomal proteins for indirect control over the cellular process. Proteosomal proteins
acted as negative regulators for HIV-1 proteins [7, 35]. In [34] it is
demonstrated that in early stage of infection, proteasomal proteins regulate some important antiviral host factors like
APOBEC3G/F and CD317. In this study, some proteasomal proteins are marked as significantly important based on high
bottleneck (or betweenness centrality) scores and linked with
important cellular processes.
To investigate the extent to which the proteasomal proteins
are associated with crucial and important cellular processes we
perform clustering of HIV-interacting proteasomal proteins
with Gene Ontology (GO) biological process. For functional annotation clustering, we have used the DAVID bioinformatics resources online service [36]. We find almost 75% proteins among
all proteasomal proteins are involved in at least one annotation
cluster and are therefore linked to crucial and important biological processes. To show the significance of this association we
randomly pick 10 sets of non-proteasomal proteins and perform
annotation clustering in similar way. We notice only 4–9.6% of
proteins are associated with important biological processes.
In addition, we collect the first and second interacting neighbours of proteasomal proteins and again perform functional annotation clustering with GO biological processes using DAVID.
We find almost 78% proteins of the first neighbours and 76%
proteins of the second neighbours are associated with important biological and cellular processes. The clustering results of
proteasomal proteins as well as their first and second neighbours are given as supplementary files (S1.xlsx, S2.xlsx and
S3.xlsx). This study suggests that proteasome proteins are not
only responsible for preventing HIV-1 infection at the early
stage of infection but also it shows a better connectivity to the
important cellular processes because of its high bottleneck
scores.
In [34], Dijk et al. find a remarkable over-representation of
disease-associated genes among HIV-1 interacting proteins
compared with non-interacting ones. In [37], it is reported that
degree and betweenness centrality of proteins are poorly correlated with disease association. However, it is tempting to investigate the correlation between the degree and betweenness
centrality of disease-associated HIV-1-interacting human proteins. For this purpose we collect list of genes associated with
different disorder/disease classes from Goh et al. [37]. They have
classified all genetic disorders into 22 primary classes. We study
the association of HIV-1 interacting human proteins in these 22
primary disease classes. We plot the degree against the betweenness centrality of all HIV-1 interacting proteins that are associated with specific disease classes. In Figure 7, a total of 22
scatter plots are shown, clearly signifying some sort of relation
between degree and betweenness centrality with certain diseases. While for cancer, the correlation appears to be strong
(0.9439), for haematological class it is low (0.4840). This indicates
that in case of cancer-associated HIV-1 interacting proteins, the
‘hubs’ (proteins with high degree) are also often ‘bottlenecks’
(proteins with high betweenness centrality). Most of the disease
classes except ‘Hematological’ and ‘Skeletal’ roughly follow an
association between degree and betweenness centrality. In
Supplementary Table S1, we have listed all the HIV-1 interacting
proteins associated with specific disease classes and their degree and betweenness centrality measured in the corpus of
whole human PPI network.
To show whether the correlation patterns between degree
and betweenness centrality of HIV-1 interacting disease associated proteins are statistically significant with respect to random
set of proteins, we have conducted the Wilcoxon Rank Sum test.
For this purpose we take 22 sets of random proteins one for
each disease class, from associated HIV-1 non-interacting proteins and measure degree and betweeness centrality in the context of whole human PPI network. We repeat this procedure 300
times to get 300 instances of 22 samples. After that, we compute
mean degree and betweenness centrality for each sample. We
plot mean degree against mean betweenness centrality for all
22 samples. This is shown in Figure 8. From this figure, it is clear
that the correlation is not as strong as it appears for HIV-1 interacting proteins. In fact, the Wilcoxon Rank Sum test shows that
there is a significant difference at 5% significance level (Pvalue ¼ 0.0265).
Interaction types guide regulation of host factor
by HIV-1
In section 2 we have seen that the HHPID data set includes 68
unique types of interactions between HIV-1 and host proteins
and depending on the regulation type we can subdivide the
whole bipartite network into three subnetworks. Similar type of
classification of interaction types depending on direction of
regulation can be observed in [28] and in [34]. In [28], the interactions are classified into three polar categories (‘positive’,
‘negative’ and ‘neutral’). Positive category involves nine interaction types, viz., ‘activated by’, ‘activates’, ‘enhanced by’, ‘enhances’, ‘stabilizes’, ‘stimulated by’, ‘stimulates’, ‘upregulated
by’ and ‘upregulates’, whereas negative category consists of 13
types of interactions, viz., ‘cleavage induced by’, ‘cleaved by’,
‘cleaves’, ‘competes with’, ‘degraded by’, ‘degrades’, ‘disrupts’,
‘downregulated by’, ‘downregulates’, ‘inactivates’, ‘induces
cleavage of’, ‘inhibited by’ and ‘inhibits’. The remaining 25 interaction types are denoted as ‘neutral’. To investigate the perturbation of multiple host processes via HIV-1 genes, an analysis is
performed to show the involvement of those three classes of
interactions in the 10 over-represented biological processes in
HIV-1 interactions identified in [9]. It is noticed that for most of
the over-represented biological processes’ GO terms, the majority of interactions are mainly ‘positive’ in nature, whereas for
immune response, majority of interactions are more ‘negative’
in nature. This is obvious because ‘negative’ set contains interactions directed from human to HIV-1 proteins and have an adverse effect on HIV-1 proteins.
In [34], almost similar classification of the interaction types
is noticed. Here interaction types are classified under two categories: regulatory (e.g. up- and downregulated, regulated by)
and activation/inhibition (activates, inhibits, inhibited by), depending on the functional level of interactions. Undirected
interactions like ‘binds’, ‘interacts with’ are not considered. Two
directed distinct subnetworks are built by considering regulatory interactions in one network and activation/inhibition interactions in the other. A connectivity analysis has been
performed for each HIV-1 protein and human protein on both
networks. As expected, HIV-1 protein Tat shows high connectivity with host proteins, as it acts as the central transactivator in
promoting viral transcription, which affects disease progression
by interacting with neighbouring cells after being released to
the intercellular medium [38]. Gp120 has almost the same
Review on analysis and prediction of HIV-1–human PPI
|
837
Figure 7. Scatter plot of HIV interacting proteins involved in 22 multiple disease classes. Here x-axis represents degree and y-axis represent betweenness centrality.
number of interactions because it is essential in facilitating
entry in different cells. On the other hand, the structural proteins P1, P6 and Nucleocapsid as well as unspliced Pol have only
a small number of interactions. For performing similar analysis
for each human protein they have enriched their HIV-1-human
PPI network with interactions from human protein interaction
databases BIND [39], BioGRID (http://thebiogrid.org/) and HPRD
(http://www.hprd.org/). The interactions between HDFs (HIV dependency factor), which is a class of human proteins that are
essential for HIV-1 replication, are considered here. They have
constructed two networks: local, which is built by considering
the interactions between HIV-1 interacting human proteins or
HDFs [13–15, 40], and global, which is constructed by considering the interactions between HDFs and non-HDF proteins.
These two networks are then analysed by measuring the degree
distributions of HDFs by considering only HDF–HDF interactions
and HDFs–HIV protein interactions separately. Besides noting
the degree distributions, these two networks are also analysed
with respect to the three network centrality metrics: degree,
betweenness and eigenvector centrality. The eigenvector centrality of a protein describes how well it is connected to other
well-connected proteins. They have compared the local HDF
network with total human protein interaction network using a
one-sided Kolmogorov–Smirnov test [41] and showed that the
measured degrees, eigenvector centralities and betweenness
scores in the local and global network are not from the same
distribution. This suggests that the local network is significantly
more central than the global network with respect to the three
metrics.
Association of small substructure and strong interaction
module with significant host cell subsystems
The biological networks have specifically been found to consist
of small recurring patterns, so-called network motifs [42–45].
These building blocks have been extensively used to study the
structure and dynamic behaviour of networks. In [34], four types
of network motifs were identified in the two subnetworks (regulatory and activation/inhibition) as mentioned in the last section. These significant patterns were identified comparing with
1000 randomized networks, which were created by randomly
rewiring the original networks. The significant patterns are
shown in Figure 9. The three-node feedback loop motif (shown
in Figure 9a) is a pattern where an HIV-1 protein regulates or
signals a human protein that in turn regulates/signals another
HIV-1 protein. The two-node feedback loop (shown in Figure 9b)
838
|
Bandyopadhyay et al.
Figure 8. Scater plot showing the correlation patterns between degree and betweenness centrality of randomly chosen HIV non-interacting proteins.
Figure 9. Four types of network motifs are identified in the two subnetworks (regulatory and activation/inhibition): (a) Three node feedback loop, (b) Two node feedback
loop, (c) coregulation/signaling, (d) clique.
consists of one HIV-1 protein that regulates/signals a human
protein, which in turn regulates/signals the HIV-1 protein.
The interactions among proteins signify the nature of feedback pattern in these motifs. For example, two upregulations or
two downregulations result in a positive feedback, whereas a
negative feedback will be the result of one up- and one
downregulation. When two HIV-1 proteins regulate/activate/
inhibit one human protein, then co-regulation or co-activation/
inhibition patterns are generated (shown in Figure 9c). The two
interactions can be of the same type (e.g. either upregulation or
inhibition). The three node ‘clique’ pattern (shown in Figure 9d)
consisting of two human proteins and one HIV-1 protein is also
identified where the interactions among the proteins have different patterns. A feed-forward type [42–45] (or self-regulatory)
motif occurs when two connected HDFs indirectly interact via
an HIV-1 protein. An exhaustive analysis of these interaction
patterns along with different interaction types may result in
understanding of dynamics and structure of significant host cell
subsystems that are perturbed during the course of HIV-1
infection.
In Macpherson et al. [46], a nice characterization of interaction patterns is observed that is used to investigate host cellular subsystems and their association in the perturbation of host
cellular functions. In this work, they have identified some
higher-level biological subsystems and were able to infer the
biological importance of these subsystems in terms of HIV-1
replication, host cell perturbation and regulation of the immune
response. For identifying key host functions that are entirely
involved in HIV-1 infection and replication, they defined a set of
host proteins that take part in a common set, or ‘profile’, of
Review on analysis and prediction of HIV-1–human PPI
HIV-1 interactions. For this they have used BiMax [47] biclustering technique to group the host proteins that encounter similar
perturbation during HIV-1 infection. In this work, they retrieved
1434 host proteins and 3939 interactions associated with the
specific interaction types from HHPID. A binary interaction matrix containing 1434 rows, 1292 columns and 3939 non-zero values was created corresponding to human proteins, all types of
HIV-1 interactions and all HIV-1-human PPIs, respectively.
Biclustering of this matrix yielded 1306 biclusters, among which
279 biclusters were identified as significant (P 0.001) by Monte
Carlo simulation. These biclusters were enriched with important sets of human proteins that experienced similar perturbations during HIV-1 infection. The biological significance of host
protein sets and their associated interaction profiles defined by
the biclusters is assessed according to three measures: PPI network clustering, semantic similarity in terms of shared GO annotation and sequence similarity. They integrated the human
proteins from significant biclusters with human PPI network. By
this, 38 biclusters were found in which the proteins shared a
greater number of interactions, 24 biclusters were found where
the proteins form a bigger largest connected component and 38
biclusters were found to be enriched with proteins that have a
smaller average shortest path length than that expected by random chance (P 0.05). These signify that HIV-1 proteins have a
strong inclination to interact with host proteins in a similar
manner, which share interactions with one another. These also
identify the presence of multiple HIV-1 interactions with host
protein complexes or other closely associated host network
modules. The host proteins in significant biclusters are more
similar in terms of their GO annotation than that expected by
random chance in all ontologies: molecular function, cellular
component (CC) and biological process. These results indicate
that HIV-1 interacts in a similar pattern with proteins that have
similar GO annotation. It is also observed that the human proteins within the significant biclusters are more similar in their
protein sequence than that expected by random chance
(P << 0:001). Among all the biclusters, 101 significant biclusters
were identified in which human proteins were more similar
in their sequences than that expected by random chance.
The results show that paralogous groups of host proteins have
more inclination to be involved in the same combinations of
regulatory and physical HIV-1 interaction. It is observed that
several biclusters are enriched with multiple host proteins that
show similar interaction profiles among them. By combining
these biclusters according to shared information, an overview
of HIV-1 interactions with a given set of host proteins is
identified.
Some distinguishable patterns of cytokine regulation by
HIV-1, such as the largely stimulatory effects of gp120, Tat and
Nef; upregulation of TNF-alpha and interleukins 1 and 6; the repressive action of Vpr and gp160; and downregulation of interleukin 2 and interferon-c are identified. Two HIV-1-host PPI
networks are also created in which the human host is depicted
as a series of cellular subsystems and the interactions of HIV-1
proteins with these subsystems are investigated to find some
recognizable patterns of interaction in these networks.
Using a neighbour joining method, identified biclusters are
combined. A tree is constructed using a distance measure based
on the overlaps between host proteins constituting the biclusters. The tree is partitioned into different sections according to
the over-representation of the GO terms of the host proteins in
the combined biclusters. Thirty-seven biological subsystems are
identified to be associated with the created partitions.
Associating the biclusters in this way with 37 biological
|
839
subsystems, strong associations between biclusters and specific
subsystems are identified.
It will be interesting to see the association between biclusters that consist of human proteins having the same regulation
type with HIV-1 and the aforesaid 37 biological subsystems. For
this purpose we use biclustering technique to define biclusters
in each of the three subnetwork (‘positive’, ‘bidirected’ and
‘negative’) identified in HIV-human PPI network by considering
direction of regulation of each interaction. A total of 110 biclusters (69 from subnetwork 1, 21 from subnetwork 2 and 20 from
subnetwork 3) are identified using Bimax [48] algorithm.
Figure 10 shows overlap of subsystems involved in the biclusters of three subnetworks, viz., positive, bidirected and negative.
It appears from the figure that most of the subsystems do not
show propensity to be involved in some specific interaction
types. Only six subsystems, viz., ‘Glutamate receptor activity’,
‘protein phosphatase type 2A complex’, ‘DNA integration’, ‘lactate dehydrogenase activity’, ‘adenylate cyclase activity’, ‘cytoskeleton’, four subsystems, viz., ‘ubiquitin’, ‘heat shock 70kDa
protein’, ‘glucosidase activity’, ‘mRNA transport’ and one subsystem, viz., ‘DNA helicase activity’ show some sort of inclination to be involved with specific regulation types like ‘positive’,
‘negative’ and ‘bidirected’, respectively.
To present a distilled representation of the relationship between over-represented biological subsystems mediated
through regulation types, we have constructed three bipartite
networks specific to each of the three regulation types.
Figure 11a has total 69 nodes, each of which corresponds to one
bicluster and 27 nodes representing the biological subsystems
associated with those biclusters. The small yellow nodes represent biclusters, whereas red nodes stand for the subsystems. To
know to what extent each subsystem is associated with each
bicluster, we vary the node size of the subsystems proportionally to the associated proteins with all the biclusters. The edges
in this network represent bicluster–subsystem association that
alienates the subsystems in a regulation-specific manner. The
edge width is proportional to the proportion of proteins in the
biclusters associated with the subsystems. It can be observed
that most of the subsystems solely associated with negative
regulation have a clear role in immune response in the host
cell. For example, it is experimentally verified that synthesis of
glucosidase show antiviral and immunosuppressive activity of
HIV-1 by inhibiting HIV-1–induced syncytia formation and
lymphocyte proliferation [48, 49]. In [50], it is also verified that
NEDD8 (a small ubiquitin-like protein) affects the restoration of
Vif-mediated degradation of the host restriction factor
APOBEC3G (A3G), and this presents a novel antiretroviral therapeutic approach and enhances the ability of the immune system to combat HIV.
Recently, Maulik et al. [51] proposed a multi-objective biclustering technique for extracting strong interaction modules in a
weighted HIV-1-human PPI network. A strong interaction module consists of a set of viral and host proteins, which are highly
interacting with each other. The human proteins involved in
those strong interaction modules are likely to be considered as
important gateways for HIV-1 proteins and may be regarded as
potential drug targets. In this work, the HIV-1-human protein
interaction network is realized as a bipartite graph where two
sets of nodes represent HIV-1 proteins and human proteins, respectively, and edges denote the interactions. The network is
weighted by interaction strength predicted by Tastan et al. [52].
Each ‘quasi-biclique’ (dense bipartite subgraph) represents a
strong interaction module in this weighted bipartite network.
The main issue in this work is that it relaxes the stricter
840
|
Bandyopadhyay et al.
Figure 10. Venn diagram showing the overlap of subsystems involved in three specific regulation types.
condition for forming bicliques [53] by introducing ‘quasibiclique’ that are considered as almost bicliques in HIV-1human bipartite protein interaction network. Searching is
carried out over three different objectives for finding the quasibicliques in HIV-1-human bipartite protein interaction network.
The resulting strong interaction modules (or ‘quasi-bicliques’)
are then analysed to investigate the involvement of dense subnetworks during HIV-1 infection cycle. Some important human
proteins that are responsible for early and late stages of HIV-1
infection were found in the extracted ‘quasi bicliques’. For example, at least one occurrence of kinase protein is observed in
all the identified quasi bicliques. Beside the Kinase family proteins the bicliques are also enriched with CD4, chemokine receptor CXCR4, AP2B1 like proteins. Gene Ontological analysis
over the human proteins identified in the ‘quasi-bicliques’ reveal several biologically homogeneous strong interaction
modules.
Incorporating other data sources with HHPID in the
analysis of HIV perturbation in host cell line
Besides the study of the network built from HHPID data set, the
key to deriving new understanding from the HHPID lies in its integration with other data sources. For example, Chen et al. [54]
analysed the association between HIV-1 infection and human
pathways of diseases that may lead to the identification of common drug targets for viral infections and other diseases. In this
work, they identified links between HIV-1 infection and other
human pathways of disease through several approaches. They
primarily investigated the overlap of human genes involved in
AIDS and other disease pathways. They gathered 220 human
pathways in disease from the Kyoto Encyclopedia of Genes and
Genomes (KEGG) (http://www.genome.jp/kegg/) and statistically
compared with three sets of HIV-1 host factors identified from
three systematic screening studies [13–15]. The evaluation of associations between HIV-1 host factors and KEGG pathways is
based on (i) overlap of human genes involved in AIDS and other
pathways, (ii) co-expression profiles and (iii) common interaction partners in a human PPI network. In particular, the
human pathways are ranked based on these evaluation scores.
The top 10 KEGG disease/pathways that emerge from the final
ranking list are ‘Pancreatic cancer’, T-cell receptor signalling
pathway’, ‘Acute myeloid leukaemia’, ‘B cell receptor signalling
pathway’, ‘small cell lung cancer’, ‘chronic myeloid leukaemia’,
‘Adipocytokine signalling pathway’, ‘toll-like receptor signalling
pathway’, ‘chemokine signalling pathway’ and ‘apoptosis’.
Figure 12 shows the proportion of HIV-1 interacting human proteins that are associated with these pathways. It can be seen
that most of the HIV-1 interacting proteins are involved in each
pathway. So, it is important to investigate whether these HIV-1
interacting proteins have different characteristics from HIV-1
non-interacting proteins that also appear in these pathways.
For this, in Figure 13 we plot a scatter diagram showing the degree versus betweenness centrality of HIV-1 interacting proteins
and HIV-1 non-interacting proteins involved in each of the
pathways. Interestingly, we find a significant correlation between degree and betweenness centrality of HIV-1 interacting
proteins than HIV-1 non-interacting counterparts. It is also noticeable that in most of the pathways, HIV-1 interacting proteins
have more tendency of being ‘hubs’ or ‘bottlenecks’ than the
non-interacting ones. So, it can be stated that HIV-1 makes
some sort of perturbation in the disease-related pathways that
affect the topological properties of the pathway-related proteins
in the context of the whole human PPI network.
Review on analysis and prediction of HIV-1–human PPI
|
841
Figure 11. Association between 37 subsystems and biclusters in three different subnetworks, viz., (a) positive, (b) bidirected and (c) negative. Edge width represents proportion of proteins in biclusters that are matched with subsystems. Node size is proportional to the number of matched proteins with all biclusters. Blue nodes indicate that these subsystems are solely associated with corresponding regulation specific biclusters. A colour version of this figure is available at BIB online:
http://bib.oxfordjournals.org.
Figure 12. Proportion of HIV interacting proteins involved in 10 top-ranked pathways reported in Chen et al. [54].
842
|
Bandyopadhyay et al.
Figure 13. Scatter diagram showing the degree versus betweenness centrality of HIV interacting protein and HIV non-interacting proteins involved in 10 top-ranked
pathways reported in [54]. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
In another work, Bandyopadhyay et al. [55] analysed a timecourse gene expression data for identifying active pathways of
connected proteins that were significantly under/overexpressed in the different stages of latent HIV-1 replication
cycle. The differentially expressed genes were traced in the
human protein interaction data set [HPRD] [56] for creating an
integrated network. Some significant ‘activated modules’ of
connected proteins at different stages of virus reactivation cycle
were identified through expression clustering. In summary, this
work permitted the identification of several human proteins
whose expression levels were associated with either the early
induction or the latent phase of viral replication.
In another study [57], several data sources are integrated
with the existing information catalogued in the HHPID to find
out the activity of miRNAs through the HIV-1 regulatory pathway in human. For this they used two TF-miRNA regulation information from two data sources [58, 59] and one putative
database [60]. An exhaustive search was conducted to find maximal bicliques from the HIV-1-human bipartite network constructed from the HHPID database. These bicliques represent
strong interaction modules consisting of a set of HIV-1 proteins
and a set of human proteins. These modules are further investigated to uncover their involvement in possible oncogenic pathways. The significant modules discovered in this study are
established as gateways to transmit the immunodeficiency signal within the complex regulatory network of non-coding and
coding genes within an organism.
A review on HIV-1-human PPI prediction
techniques
PPIs are regarded as the main biochemical reactions in cells,
which determine different biological processes, organized in the
cell. There exit several literatures that assess different methodologies used for predicting PPIs [61–63]. Most of the approaches
predominantly focus on the prediction of PPIs in a single organism such as yeast or human. Here we discuss several computational methods that concentrate on the HIV-1-human PPI
prediction. We also evaluate the applicability of different methods and perform different analysis to compare those methods
at the system level.
Computational methods for prediction of possible viral–host
interactions are one of the major tasks in PPI research for antiviral drug discovery and treatment optimization. Predicting PPIs
between viral and host proteins has contributed substantial
knowledge to the drug design area. Recently, PPI prediction has
been regarded as a promising alternative to the traditional approach to drug design [64]. Having knowledge about the interaction pattern provides a great opportunity to understand
pathogenesis mechanisms, and thus supports the development
of drugs.
There are different approaches available in the literature
that exploit different methodologies for predicting PPIs between
HIV-1 and human proteins. Here, we have classified these
approaches into three categories depending on the methodologies, which have been used in those literatures. In the first category, we collect the works that use classification-based
approaches for detection of PPIs. In the second category, we describe a work that mainly depends on the structural similarities
of HIV-1 and human proteins. Here we also describe a work that
uses the short eukaryotic linear motifs (ELMs) [65] on HIV-1 proteins and human protein counter domains (CDs) information
for predicting interactions between HIV-1 and human proteins.
Recently association rule mining techniques are gaining popularity in this domain and are established to be a useful alternative in the field of PPIs prediction. We mark these works as the
Review on analysis and prediction of HIV-1–human PPI
third category. The classification of these approaches is shown
in Table 2.
Category-1: classification-based approaches
In [52], a supervised learning framework is presented for predicting PPIs between HIV-1 and human proteins. It was the first
attempt to predict the global set of interactions between HIV-1
and human host cellular proteins. In this work, the task of predicting PPIs is formulated as a binary classification problem
where each protein pair belongs to either the interaction or
non-interaction class. A Random Forest Classifier (RF) is trained
with a rich feature set derived from several biological information sources. Total 35 features are recognized based on human
intra-PPIs and other information sources. Some features precisely capture the information from HIV-1 human protein pairs,
while others are related to only human or HIV-1 proteins, and
some are derived from the human interactome. As these extracted features are noisy and redundant, therefore a RF [73] has
been used for the classification task. RF classifier has already
been established to be useful in predicting intra-species PPIs
[74–76]. It is also suitable for handling the scenarios where feature set is noisy and redundant. For building the trees in RF in
[73], a node is chosen to be split, when the attribute causes
highest decrease in the GINI index. All the features are ranked
depending on the Gini importance of the RF classifier. The topological properties like degree and betweenness centrality of the
graph, and the GO neighbour similarity features got the top
rank. Degree is coming out here to be a top-ranked feature. It is
quite expected because degree is the property of a single node
and provides a good prior interaction probability for the prediction purpose. The other top-ranked features with respect to the
Gini importance are pairwise GO similarity, clustering coefficient, GO neighbour function, location features etc. It is pertinent that Gene Ontological features come out to be top-ranked
features. It is possibly because of the importance of GO similarity measures or other GO-related features that contribute
greater knowledge towards interaction prediction [77, 78]. The
authors of [52] have extended their work in [66] by integrating a
semi-supervised multi-tasking approach to improve the accuracy of the previous predictive model. In this work, they reduced
their feature set but while improving the accuracy by considering ‘weakly labelled’ PPIs. The ‘weakly labelled PPIs’ are not experimentally confirmed PPIs, but are likely to have interaction
relationships. Here the motivation of taking these weakly
labelled PPIs is the insufficiency of confirmed labelled PPIs. The
PPIs in the HHPID [10], which are retrieved from different scientific literature, may not have enough evidence of being true
positive labelled interaction. But these interactions may be a
possible candidate to contribute greater knowledge towards the
PPI prediction. To incorporate the ‘weakly labelled PPIs’ information with ‘true positive labelled PPIs’ the authors in [66] have
used the multi-task learning framework. For selecting the positive labelled PPIs and partially labelled PPIs (or weakly labelled
PPIs), they collected experts’ opinions for improving the quality
of the data set. The recruited HIV experts contributed their
knowledge to annotate the interactions of HHPID in truly positive label and partially positive label. The 158 pairs, annotated
by HIV-1 experts, are considered here as ‘gold standard positive
pairs’ and the remaining 2119 pairs are labelled as partially
positive. For choosing the negative set, approximately 16 000
random protein pairs are collected excluding the pairs existing
in the HHPID. To compensate the effect of random sampling of
negative training set, 5-fold cross validation (CV) with 20
|
843
repeated CV runs are performed to obtain the average performance scores.
In another study, Dyer et al. [67] proposed a supervised classification technique using Support Vector Machine (SVM) with a
linear kernel for predicting PPIs between HIV-1 and human proteins. For choosing negative examples, they also used random
pairing of HIV-1 and human proteins. In doing so, they ensured
that no randomly generated protein pair was already known to
interact. They generated three sets of negative examples containing 25, 50 and 100 times the number of the positive examples and measured the performance of SVM. The three types
of features used in this work are domains, protein sequence
k-mers and properties in the intra-species human PPI network.
In [79], a probability weighted ensemble transfer learning
model is proposed to predict PPIs between HIV-1 and human
proteins. Here, the homolog GO information is incorporated to
cope up with the three major problems of PPI prediction, viz.,
data scarcity, data unavailability and negative data sampling
[79]. SVM is adopted here as the individual classifiers of the ensemble model. In this work it is demonstrated that homolog GO
information is effective to enrich or substitute the target GO
information.
Irrespective of all the classification techniques reviewed
here, the task of choosing the negative examples is generally
observed as a common problem. In most of the cases for choosing non- interacting pairs, a set of protein pairs are extracted at
random from the set of all protein pairs. However, there is no
evidence that these chosen pairs do not interact at all. As a result, the classification task remains sensitive towards the
chosen negative samples. In [40], the negative to positive ratio
is kept as 100:1, and this ratio is optimized in each CV step in
the training phase. In [67], the negative to positive ratio was varied as 25:1, 50:1 and 100:1, and using each of these ratios the
performance of SVM classifier is measured under different feature sets. In [66], a remedy of this problem was observed by performing 5-fold CV with 20 repeated CV runs for compensating
the effect of random sampling of negative training set. Here a
clear separation is maintained for the negative labelled data
and partially positive labelled data set. A classifier is trained to
distinguish partial positive examples from negative examples.
This refinement of data set leads to a significant improvement
in identification of interacting protein pairs.
Category-2: structural similarity and motif
sequence-based approach
In Doolittle et al. [68], the significant amount of protein structural information available in the Protein Data Bank are used for
predicting PPIs between HIV-1 and human proteins. The use of
protein structure information integrated with the documented
PPIs is established to be an alternative way for the prediction of
possible protein interactions [80–82]. The motivation of using
the structural information is that given a set of proteins with
known structure and interaction partner, the proteins that
show similar structure with these given proteins are likely to
interact with those interaction partners. In [68], this argument
is validated in context of HIV-1 and human proteins. Based on
the protein structural similarity, an interaction map between
human and HIV-1 is generated. For this, interactions between
‘HIV-similar’ (i.e. structurally similar to some HIV protein)
human proteins and ‘targets’ (i.e. the human proteins interacted with ‘HIV-similar’) are identified and an association
among the interactions between HIV-1 protein and the ‘target’
proteins is established. The predicted interactions are filtered
844
|
Bandyopadhyay et al.
Table 2. Summary of the works involved in the PPI prediction field
Reference
Category-1
Tastan et al. (2009) [52]
Qi et al. (2010) [66]
Dyer et al. (2011) [67]
Category-2
Doolittle et al. (2010) [68]
Evans et al. (2009) [69]
Mukhopadhyay et al.
(2010) [70]
Mukhopadhyay et al.
(2012) [71]
Mondal et al. (2012) [72]
Operating principle
Approach
Number of
predicted
interactions
Mainly supervised classification-based approach where a RF is
trained and tested against an extensive rich feature set combining co-occurrence of functional motifs and their interaction domains and protein classes, GO annotations, post-translational
modifications, tissue distributions and gene expression profiles,
topological properties of the human protein in the interaction
network and the similarity of HIV-1 proteins to human proteins
known binding partners.
A semi supervised multi-tasking framework is introduced for predicting PPIs from a wide range of labelled (experimentally validated) and partially labelled (there are no experimental
evidence of those interactions) data set.
A SVM classifier with linear kernel is used for exploring the predicted interactions in HIV-1-human system. Here domain profiles, protein sequence k-mers and properties of human proteins
in PPI network are incorporated in feature set.
Classification-based
approach
3493
Classification-based
approach
2084
Classification-based
approach
506 (with PE:NE
ratio 1:50)
Protein structure information is integrated for prediction of PPIs.
The main idea behind this technique is that given a set of proteins with defined structures and associated interactions, proteins with similar structures or substructures with them will
tend to share the same interaction partners.
Interactions are mediated by short ELMs on HIV-1 proteins and
human protein CDs known to interact with these ELMs. The decision on which human protein may be targeted by HIV-1 is
guided by taking pair of human proteins that may interact via a
motif conserved in HIV-1 and the corresponding interacting protein domain.
Association rule-based prediction of novel PPIs. The well-known
Apriori algorithm is used for finding association rules among the
HIV-1 and human proteins. Some novel PPIs are also predicted
using those rules.
An association rule mining technique based on biclustering is used
for predicting PPIs. Predictions are defined from the association
rules that are generated using a novel biclustering-based association rule mining technique.
An integrated approach called FIST is proposed here for procuring
association rules among HIV-1 and human proteins. Here FIST is
used to coalesce biclustering and association rule mining technique in one process.
Structural similaritybased approach
884 (using literature filter)
Motif sequencebased approach
4542
Association rule
mining-based
approach
22
Association rule
mining-based
approach
180
Association rule
mining-based
approach
28
based on functional information from previous studies to make
a first set of predictions. For further reducing the list of likely
interactions, these results are refined by requiring both the HIV1 protein and the target human protein to be present in the
same location within the cell, based on GO CC annotation. This
produced a final prediction set including fewer predictions with
higher confidence.
In Evans et al. [69], the prediction task is guided by the association between short ELMs [65] conserved in HIV-1 proteins
and the human protein CDs known to interact with these ELMs.
The functional roles of interactions conceded by ELMs and their
CDs are addressed in several literature [83–85]. The availability
of motif/domain pairs associated with the HIV-1 and human
PPIs is extensively used here for discovering new interactions.
The total prediction set was built from the union of two sets of
human proteins H1 and H2. H1 contains the human proteins
that have direct interactions with one or more HIV-1 proteins
through a human CD and virus ELM. H2 consists of the human
proteins that act as a competitor of HIV-1 protein to interact
with a H1 protein via the conserved ELM. Therefore, if H1 has a
CD, then it may use this CD to interact with an ELM present on
both H2 and HIV-1 proteins.
Category-3: association rule mining-based approaches
In Mukhopadhyay et al. [70], an attempt was made to predict
interactions based on association rule mining technique. Here
the well-known Apriori [86] algorithm is used for generating association rules between human proteins interacting with HIV-1
proteins. Some novel interactions are then predicted using
those rules. Besides the prediction task, the discovered rules describe an association between different human proteins in the
context of HIV-1 interaction. The main advantage of using the
association rule mining [72] technique for prediction purpose, is
that it does not need any negative samples for prediction unlike
classification-based approaches, where predictions are highly
Review on analysis and prediction of HIV-1–human PPI
dependent on the availability of the biologically validated negative samples. In [71], the authors have extended their work by
using a biclustering technique for discovering association rules
between both human and viral proteins. Two binary matrices of
human and viral proteins, HV of size 1432 19 and its transpose
matrix VH of size 19 1432 are built where an entry of ‘1’ denotes the presence of interaction between the corresponding
pair of human and HIV-1 proteins, and an entry of ‘0’ represents
the absence of any information regarding the interaction. For
each of the matrices, a row is considered as a transaction and a
column represents an item. In each transaction, the items for
which the corresponding value in the matrix is 1 are considered
to be purchased in that transaction. Now, the columns of all-1
biclusters, which are extracted from those matrices using
Bimax algorithm, represent frequent closed itemset. The frequent itemsets are then used to generate association rules
among human proteins and viral proteins separately. These
rules basically guide the prediction task.
In another approach [87] proposed by Mukhopadhyay et al.,
regulatory interactions are predicted by using association rule
mining technique. Here all the interactions are partitioned based
on the interaction type and direction of regulation. After partitioning, three different regulatory networks are formed based on
the HIV-1-human interaction information. The interactions are
predicted by considering each of the networks separately. As
here each interaction has a specific interaction type and direction
so, all the predicted interactions also inherit these information.
The approach proposed in [88] also used the association rule
mining technique for extracting frequent closed itemset for predicting PPIs between HIV-1 and human proteins. They proposed
an integrated approach called FIST (Frequent Itemset Mining
using Suffix Tree) that combines biclustering and association
rule mining technique in a single process. The main feature of
FIST is that it efficiently mines frequent closed itemsets, generates minimal non-redundant cover of association rules and
generates hierarchical conceptual biclusters in one process. It
was applied on the HHPID database for predicting some novel
interactions.
All these three approaches are apparently correlated with respect to the methodology they have used, but different techniques are used to mine the association rules. For example, the
first work of Mukhopadhyay et al., the Apriori algorithm, is used
for the purpose of finding association rules. But in case of low
minimum support, large number of frequent itemsets is generated, which results in large computational complexity.
Generation of such a large number of frequent itemsets takes
huge time. In this context the concept of frequent closed itemset [89, 90] is used in [71] as an extension of first work. An itemset is called closed itemset if none of its proper supersets have
the same support value. Here BiMax [47] biclustering algorithm
is used to generate all maximal biclusters. As the columns of
maximal biclusters represent a closed itemset, so all extracted
biclusters satisfying a predefined minimum support condition
provide the set of frequent closed itemsets. However, in Mondal
et al. [88], a frequent closed itemset framework is proposed for
efficiently mining closed itemset. It is also used to find out the
relationship between annotations and interactions by extracting association between proteins and features.
|
845
make comparisons of those predicted interaction sets in different perspectives.
Detecting overlaps between predicted interactions
In Figure 14 we analyse the number of interactions predicted by
Tastan et al. [66], Doolittle et al. [68], Dyer et al. [67] and
Mukhopadhyay et al. [71]. Note that these studies use different
methodologies and different data sets for learning how to predict the interactions. However, the predictions, as analysed in
Figure 14, have been obtained on the same set of 19 HIV-1 proteins and 1432 human proteins, where each method makes predictions on all possible (19 1432) interactions. From this figure,
it can be seen that there is no significant overlap among the
interactions predicted by these four studies. Similar outcome
has also been noticed in Figure 5 where the overlaps among the
predicted HDF sets are low. Doolittle et al. exploited structural
similarity information of HIV-1 and human proteins for prediction purpose. Therefore, the human protein that does not show
structural similarity with any HIV-1 proteins cannot be included
in the possible prediction set. Moreover, they used HPRD and
PIG database for collecting information about interactions, Dali
and PDB database for acquiring information about structural
similarity and HHPID data set for validating the predictions.
Although Tastan et al. and Mukhopadhyay et al. used HHPID
data set for prediction purpose, the main drawback of
Mukhopadhyay et al. technique is that it cannot predict any
interaction whose protein pairs are not included in a maximal
biclique. The method proposed by Tastan et al. produces all possible pairs of interactions and is able to compute prediction
score of each possible interaction pair. However, they did not
validate their predictions by exploring literature survey. In
Figure 13, slightly larger amount of overlap is noticed among
Dyer et al. and Tastan et al., possibly because Tastan et al. incorporated PPI network properties in their feature set, which also
serve as a key predictor in Dyer et al.
Over-representation of 19 HIV proteins in the four
predicted interaction sets
We perform a study to show which HIV-1 proteins are overrepresented in the predicted interaction set in each of the four
studies. In Figure 15 we see that in most of the predicted interaction sets, HIV-1 protein TAT is significantly over-represented,
possibly because of its essentiality for efficient transcription
of the viral genome [91, 92]. However, a remarkable underrepresentation of TAT is found in the predicted interaction set
Analysis of the predicted interaction sets
Figure 14. Venn diagram showing the overlap between predicted interaction
In this section, we have analysed all predicted interaction sets
identified in different literature. In the following subsections we
sets identified in four studies. A colour version of this figure is available at BIB
online: http://bib.oxfordjournals.org.
846
|
Bandyopadhyay et al.
Figure 15. Proportion of predicted interactions involving HIV-1 proteins at each of the four studies. A colour version of this figure is available at BIB online: http://
bib.oxfordjournals.org.
discovered by Doolittle et al. This is possibly because of the lack
of structural similarity between Tat and human proteins and
small number of existing ‘Tat-similar’ human proteins. The predicted set of Doolittle et al. is enriched with transmembrane
glycoprotein gp_41 but surprisingly surface glycoprotein gp_120
is not recognized in this method. So their method is slightly
biased towards the availability of the structural similarity between HIV-1 proteins and human proteins. But when we consider the predictions extracted by association rule-based
approach developed by Mukhopadhyay et al. we see different
things. Here the interactions involving surface glycoprotein
gp_120 are significantly recognized but gp_41 remains unrecognized. Among the four methods, a small proportion of P7/nucleocapsid protein and retropepsin are recognized in the rulebased approach of Mukhopadhyay et al. The predicted interaction set of Tastan et al. is more or less enriched with most of
the HIV-1 proteins. A rich feature set including different biological and network properties used for prediction may be a possible reason for that. In the predicted set of Dyer et al. we see
there are smaller numbers of HIV-1 proteins that are involved
in the interactions with human proteins with respect to that of
Tastan et al.
HDF enrichment in the four predicted interaction sets
We have also studied these four works in the context of enrichment of HDFs in their predicted sets. For this we gathered 275
HDFs from the study done by Brass et al. [13], 296 HDFs from the
study done by Konig et al. [14] and 375 HDFs from the study
done by Zhou et al. [15]. There were 908 unique HDFs in the
union of these sets. We measure the proportion of overlaps between the predicted set of Tastan et al., Doolitle et al., Dyer et al.
and Mukhopadhyay et al. with these HDFs set. In Figure 16 the
overlaps between the predicted interaction sets for each of
these four works and the three HDF sets are shown. For this, we
intersect the human proteins in the predicted interaction set
with three HDF sets individually and measure the proportion of
the human proteins that belong to these HDF sets. We find that
a large proportion of human proteins in the predicted set of
Doolittle et al. belongs to each of the experimentally predicted
HDF sets. Figure 17 also shows the proportion of overlapped
human proteins in the interaction sets that belong to the union
of all three HDF sets.
Overall assessment of all predicted sets using conformal
approach
As the predicted interaction sets are dependent on the corresponding methodologies used for prediction, it is somewhat unfair to draw conclusion regarding the superiority of any method
based on simply comparing them. Still to get an overview of the
quality of the predicted sets, we have made an attempt to compare the predicted sets. For the purpose of a comprehensive
comparison, we take the conformal prediction approach as proposed in [93]. It uses 35 features collected from Tastan et al. [52]
and assigns a confidence level to each predicted pair. We use
this rich feature set because it integrates most of the physical
properties of the proteins. Moreover, for all possible interaction
pairs the values of corresponding features are available in [93].
Usually conformal prediction is more concerned with two or
more classes. However, in [93] conformal prediction is used to
deal with one defined class, which consists of experimentally
validated interaction pairs. Here the Non-Conformity Measure
deals with the feature sets and a P-value is assigned to individual predicted interaction to validate how likely it is with respect
to the previously defined interactions. More is the P-value of an
interaction, more it is likely to occur. Here we find P-values of
all predicted interactions of each of the four studies and plot
the proportion of interactions against the P-values. Figure 18
shows column plots of predicted interactions with P-values for
each of the four studies, separately. From this figure it is clear
that the prediction sets of Tastan et al. and Doolittle et al. have
interactions that are more likely than the interactions predicted
by Mukhopadhyay et al. and Dyer et al. (over 60% of interactions
of Tastan et al. and Doolittle et al. have P-value 0.6, but for
Dyer et al. and Mukhopadhyay et al. it is 33.05 and 56.50%, respectively). Note that this is expected because the 35 features
used in the conformal prediction were collected from Tastan
et al. [52]. For lack of better alternative, we use this approach for
comparing the predictions.
Conclusion
We provide an overview of computational methods on the analysis of HIV-1-human PPI network and the prediction of novel
PPIs between HIV-1 and human proteins. We have extensively
surveyed a number of works, from both analysis and prediction
Review on analysis and prediction of HIV-1–human PPI
|
847
Figure 16. Proportion of overlaps between human proteins in the predicted set of four studies and the three experimentally verified HDF sets. A colour version of this
figure is available at BIB online: http://bib.oxfordjournals.org.
Figure 17. Proportion of overlaps between human proteins in the predicted set of four studies and union of all HDFs gathered from Brass et al. [13], Konig et al. [14] and
zhou et al. [15].
Figure 18. Bar plots show proportion of predicted interactions having specific P-values. X-axis represents ranges of P-values and Y-axis represents proportion of interactions predicted by the four studies viz., Doolittle et al., Mukhopadhayay et al., Dyer et al. and Tastan et al.
aspects. The mostly used database, which has been extensively
used for these studies, is the HHPID database. Some of the studies refined this curated data set before using it in their methods.
For example, Qi et al. [66] provided an expert’s view of the
interaction data set for selecting the positive and negative samples before applying classification task. Similar type of rectification of this data set is noticed in several other studies. However,
the availability of this data set of HIV-interacting human
848
|
Bandyopadhyay et al.
proteins raises the possibility of identifying putative novel drug
targets by in silico methods.
The approaches designed for network analysis used
different computational and statistical methods and some topological properties of network to uncover the inherent relationship between HIV-1 and human proteins. Among all the works
we have surveyed here, biclustering technique is extensively
used in network analysis as well as for PPI prediction. In the
network analysis field, we find a gradual change in the analysis
technique. Former works are fully devoted to the analysis of
topological properties of the HIV-1-human PPI network.
Recently analysis of bicliques or biclusters extracted from the
HIV-1-human PPI network sheds light on the modular structures of the network. Several techniques we have reviewed here
pertaining to network analysis, have taken this approach.
The techniques for interaction prediction are also regarded
as important resources for discovering interesting relationships
between HIV-1 and human proteins. Although the works in this
category primarily predict some novel interactions between
HIV-1 and human proteins, the methods used for this purpose
make use of divergent techniques. Classification-based methods are widely used for prediction. For example, Dyer et al. [67]
made use of SVM-based classifier, whereas RF is used in Tastan
et al. [52] and Qi et al. [66]. However, all the classification
techniques suffer from insufficiency of negative examples. To
compensate for this problem, different techniques are used. For
example, in Dyer et al., negative to positive ratio is varied and results are collected by feeding them in SVM classifier with different feature sets. Another example can be found in Qi et al.
where 16 HIV experts are recruited for collecting experts view
on interacting and non-interacting pairs of HIV-1 and human
proteins. Among all the prediction methods studied in this assessment, only Doolittle et al. used the protein structural information for predicting PPIs between HIV-1 and human proteins.
We found significant overlap between the predicted set of
Doolittle et al. and HDF set identified in large-scale siRNA
screening. Some of the other approaches like Mukhopadhyay
et al. and Mandal et al. do not show a remarkable overlap with
the HDF sets. These approaches made use of association rule
mining technique for prediction purpose. But the rule mining
has certain limitations for predicting interactions. It cannot predict the interactions among proteins that are not a part of some
biclique. On the other hand, Dyer et al. and Tastan et al. were
able to give prediction scores to all possible interaction pairs.
However, as their predicted interactions have little overlap with
experimentally verified HDF sets, it is hard to believe that these
interactions surely exist in reality. However, the conformal approach we have studied here also gives confidence to a new
interaction based on the feature set derived by Tastan et al. As
this feature set is rich enough, the interactions with a high
P-value are reliable with some confidence. It also may be noted
that the predictions in Doolittle et al. are associated with low
P-values but high overlap with HDFs. This contrasts with the results of Tastan et al., indicating that the two methods are probably predicting two different subsets of true PPIs. One of the
aims of reviewing all these methodologies and examining their
predicted interaction sets is to highlight the common ones,
which are supported by most of the techniques. Though this
number is small, but there is enough reason to believe these
interactions to be real as most of the methodologies agree on
them.
In this article, we completely focus on the computational effort that has been made for analysing HIV-1-human PPI network
and predicting interactions between HIV-1 proteins and human
proteins. A brief discussion on some experimental approaches
for HIV-1-human PPI identification is provided in the supplementary file (experimental-technique.docx). Similar type of review may be done on other type of viruses and pathogens. If
sufficient information is available on a viral–host interaction
network, these can be analysed accordingly.
Although the works we reviewed here suffer from several
problems and deficiencies, there are reasons to believe that
these in silico approaches will soon be of immense importance
for understanding infectious diseases in general and in particular for the development of drugs and vaccines to counter viral
replication or the pathogenic effects of infection. There are
more to discover from the systems biology perspective that
focus on data integration from different sources and complex
interplay between virus and host proteins in the virus-host PPI
network. A detailed understanding of this intricate network and
a precise study of its properties will yield novel insights into the
development of new therapeutics and potentially new intervention strategies.
Supplementary data
Supplementary data are available online at http://bib.
oxfordjournals.org/
Key Points
• The computational approaches for analysing HIV-1-
human PPI network and predicting PPIs in it can
broadly be classified in two categories, viz., network
analysis and PPI prediction.
• In network analysis field HIV-1-human PPI network is
analysed using different computational and statistical
methods in which a gradual change is noticed from the
topological analysis to the modular analysis of the
network.
• The approaches in the PPI prediction field can be classified into three categories, viz., classification-based,
protein structural similarity-based and association rule
mining-based approaches.
• Due to the uncorrelated methodologies used in the
prediction field, poor overlap is observed among the
predicted sets.
• More study is required for integrating information
from different data sources for development of new
therapeutics.
References
1. Peterlin B, Trono D. Hide, shield and strike back: how HIV infected cells avoid immune eradication. Nat Rev Immunol
2003;3:97–107.
2. Heeney J, Dalgleish A, Weiss R. Origins of HIV and the evolution of resistance to aids. Science 2006;313:462–6.
3. Goff S. Host factors exploited by retroviruses. Nat Rev Microbiol
2007;5:253–63.
4. Gao F, Bailes E, Robertson D, et al. Origin of hiv-1 in the chimpanzee pan troglodytes troglodytes. Nature 1999;397:436–41.
Review on analysis and prediction of HIV-1–human PPI
5. Gougeon ML. Apoptosis as an HIV strategy to escape immune
attack. Nat Rev Immunol 2003;3:392–404.
6. Stevenson M, Stanwick T, Dempsey M et al. Hiv-1 replication
is controlled at the level of t cell activation and proviral integration. EMBO J 1990;9:1551–60.
7. Bushman F, Malani N, Fernandes J, et al. Host cell factors in
HIV replication: Meta-analysis of genome-wide studies. PLoS
Pathogens 2009;5:e1000437.
8. Jacque J, Triques K, Stevenson M. Modulation of HIV-1 replication by rna interference. Nature 2002;418:435–8.
9. Pinney J, Dickerson J, Fu W, et al. HIV-host interactions: a map
of viral perturbation of the host system. AIDS 2009;23:549–54.
10. Fu W, Sanders-Beer B, Katz K, et al. Human immunodeficiency
virus type-1, human protein interaction database at NCBI.
Nucleic Acids Res 2009;37:D417–22.
11. Peri S, Navarro J, Kristiansen T, et al. Human protein reference
database as a discovery resource for proteomics. Nucleic Acids
Res 2004;32:D497–501.
12. Jiang J, Conrath D. Semantic similarity based on corpus statistics
and lexical taxonomy. In: Proceedings of International Conference on
Research in Computational Linguistics, ROCLING’97, Taiwan, China,
1997.
13. Brass J, Dykxhoorn D, Benita Y, et al. Identification of host proteins required for HIV infection through a functional genomic
screen. Science 2008;319:921–6.
14. Konig R, Zhou Y, Elleder D, et al. Global analysis of hostpathogen interactions that regulate early-stage HIV-1 replication. Cell 2008;135:49–60.
15. Zhou H, Xu M, Huang Q, et al. Genome-scale rnai screen for
host factors required for hiv replication. Cell Host Microbe
2008;4:495–504.
16. Winzeler F, Shoemaker D, Astromoff A, et al. Functional characterization of the s. cerevisiae genome by gene deletion and
parallel analysis. Science 1999;285:901–6.
17. Kamath Rs, Fraser AG, Dong Y, et al. Systematic functional
analysis of the caenorhabditis elegans genome using RNAi.
Nature 2003;421: 231–7.
18. Li M, Zhang H, Wang J, et al. A new essential protein
discovery method based on the integration of protein-protein
interaction and gene expression data. BMC Syst Biol
2012;6:1752
19. Ren J, Wang J, Li M, et al. Prediction of essential proteins by integration of ppi network topology and protein complexes information. In: Proceedings of the 7th International Conference on
Bioinformatics Research and Applications. Berlin, Heidelberg:
Springer-Verlag, ISBRA’11, 2011, pp. 12–24.
20. Chua H, Tew K, Li X, et al. A unified scoring scheme for detecting
essential proteins in protein interaction networks. In: Proceedings
of the 2008 20th IEEE International Conference on Tools with Artificial
Intelligence - Vol. 02. Washington, DC: IEEE Computer Society,
ICTAI’08, 2008, pp. 66–73. doi:10.1109/ICTAI.2008.107.
21. Wang J, Li M, Wang H et al. Identification of essential proteins
based on edge clustering coefficient. IEEE/ACM Trans Comput
Biol Bioinformatics 2012;9(4):1070–80.
22. Jeong H, Mason S, Barabasi A, et al. Lethality and centrality in
protein networks. Nature 2001;411:41–2.
23. Yu H, Greenbaum D, Xin H, et al. Genomic analysis of
essentiality within protein networks. Trends Genet
2004;20:227–31.
24. He X, Zhang J Why do hubs tend to be essential in protein networks? PLoS Genet 2006;2:e88.
25. Hahn M, Kern A. Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks.
Mol Biol Evol 2005;22:803–6.
|
849
26. Joy M, Brock A, Ingber D, et al. High-betweenness proteins in
the yeast protein interaction network. J Biomed Biotechnol
2005;2:96–103.
27. Yu H, Kim P, Sprecher E, et al. The importance of bottlenecks
in protein networks: correlation with gene essentiality and
expression dynamics. PLoS Comput Biol 2007;3:e59.
28. Dickerson J, Pinney J, Robertson D. The biological context of
HIV-1 host interactions reveals subtle insights into a system
hijack. BMC Syst Biol 2010;4:1752.
29. Eppig J, Blake J, Bult C, et al. The mouse genome database
(mgd): new features facilitating a model system. Nucleic Acids
Res 2007;35:D630–7.
30. Liao B, Zhang J. Null mutations in human and mouse orthologs frequently result in different phenotypes. Proc Natl Acad
Sci USA 2008;105:6987–92.
31. Pihur V, Datta S, Datta S. Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. Bioinformatics 2007;23:1607–15.
32. Sharma D, Bhattacharya J. Evolutionary constraints acting on
ddx3x protein potentially interferes with rev-mediated nuclear export of hiv-1 rna. PLoS One 2010;5:e9613.
33. Ozgr A, Vu T, Erkan G, Radev D. Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics 2008;24:i277–85.
34. Dijk D, Ertaylan G, Boucher C, et al. Identifying potential survival strategies of HIV-1 through virus-host protein interaction networks. BMC Syst Biol 2010;4:1752.
35. Schwartz O, Marchal V, Friguet B, et al. Antiviral activity of the
proteasome on incoming human immunodeficiency virus
type-1. J Virol 1998;72:3845–50.
36. Huang W, Lempicki RA. Systematic and integrative analysis
of large gene lists using DAVID bioinformatics resources. Nat
Protoc 2008;4(1):44–57.
37. Goh K, Cusick M, Valle D, et al. The human disease network.
Proc Natl Acad Sci USA 2007;104:8685–90.
38. Romani B, Engelbrecht S, Glashoff R. Functions of tat: the versatile protein of human immunodeficiency virus type1. J Gen
Virol 2010;91:1–12.
39. Bader G, Donaldson I, Wolting C, et al. Bind–the biomolecular
interaction network database. Nucleic Acids Res 2001;29:241–5.
40. Murali T, Matthew D, Dyer D, et al. Network-based prediction
and analysis of HIV dependency factors. PLoS Comput Biol
2011;7:e1002164.
41. Marsaglia G, Tsang W, Wang J. Evaluating kolmogorovs distribution. J Stat Softw 2003;8:1–4.
42. Yeger-Lotem E, Sattath S, Kashtan N, et al. Network motifs in
integrated cellular networks of transcription-regulation and
protein-protein interaction. Proc Natl Acad Sci USA
2004;101:5934–9.
43. Shen-Orr S, Milo R, Mangan S, et al. Network motifs in the
transcriptional regulation network of Escherichia coli. Nat
Genet 2002;31:64–8.
44. Alon U. Network motifs: theory and experimental
approaches. Nat Rev Genet 2007;8:450–61.
45. Milo R. Network motifs: simple building blocks of complex
networks. Science 2002;298:824–7.
46. MacPherson J, Dickerson J, Pinney J, et al. Patterns of HIV-1
protein interaction identify perturbed host-cellular subsystems. PLoS Comput Bio 2010;6(7):e10000863.
47. Prelic A. A systematic comparison and evaluation of biclustering
methods for gene expression data. Bioinformatics 2006;22:1122–9.
48. Harnett S, Oosthuizen V, van de Venter M. Anti-HIV activities
of organic and aqueous extracts of sutherlandia frutescens
and lobostemon trigonus. J Ethnopharmacol 2005;4:113–9.
850
|
Bandyopadhyay et al.
49. Broek V, Nieuwenhof K, Butters T, et al. Synthesis of alphaglucosidase i inhibitors showing antiviral (HIV-1) and immunosuppressive activity. J Pharm Pharmacol 1996;48:172–8.
50. Stanley DJ, Bartholomeeusen K, Crosby DC, et al. Inhibition of
a nedd8 cascade restores restriction of hiv by Apobec3g. PLoS
Pathog 2012;8:e1003085.
51. Maulik U, Mukhopadhyay A, Bhattacharyya M, et al. Mining
quasibicliques from HIV-1–human protein interaction network: a multiobjective biclustering approach. IEEE Trans
Comput Biol Bioinformatics 2013;10(2):423–35.
52. Tastan O, Qi Y, Carbonell J, et al. Prediction of interactions between HIV- 1 and Human proteins by information integration. Pac Symp Biocomput 2009:516–27.
53. Bhattacharyya M, Bandyopadhyay S, Maulik U. Finding bicliques in digraphs: Application into viral-host protein interactome. In: International Conference on Pattern Recognition and
Machine Intelligence (PReMI), Moscow, Russia, 2011.
54. Chen K, Wang T, Chan C. Associations between HIV and
human pathways revealed by protein-protein interactions
and correlated gene expression profiles. PLoS One
2012;7(3):e34240.
55. Bandyopadhyay S, Kelley R, Ideker T. Discovering regulated
ntworks during HIV-1 latency aand reactivation. Pac Symp
Biocomput 2006;11: 354–66.
56. Jenssenea TK. A literature network of human genes for highthroughput analysis of gene expression. Nat Genetics
2001;8:8–21.
57. Maulik U, Bhattacharyya M, Mukhopadhyay A, et al.
Identifying the immunodeficiency gateway proteins in
humans and their involvement in microrna regulation. Mol
Bio Syst 2011;7:1842–51.
58. Sethupathy P, Corda B, Hatzigeorgiou A. Tarbase: A comprehensive database of experimentally supported animal microrna targets. RNA 2006;12:192–7.
59. Xiao F, Zuo Z, Cai G, et al. Mirecords: an integrated resource
for
micrornatarget
interactions.
Nucleic
Acid
Res
2009;37:D105–10.
60. Bandyopadhyay S, Mitra R. Targetminer: microRNA target
prediction with systematic identification of tissue-specific
negative examples. Bioinformatics 2009;25:2625–31.
61. Pitre S, Alamgir M, Green J, et al. Computational methods forpredicting protein-protein interactions. Adv Biochem Eng
Biotechnol 2008;110:247–67.
62. Browne F, Zheng H, Wang H et al. From experimental approaches
to computationaltechniques: A review on the prediction of protein-protein interactions. Adv Artif Intell 2010;2010:7.
63. Shoemaker B, Panchenko A. Deciphering proteinprotein
interactions part ii computationalmethods to predict protein
and domain interaction partners. PLoS Comput Biol 2007;3:e43.
64. Arkin M, Wells J. Small-molecule inhibitors of protein-protein
interactions: progressingtowards the dream. Nat Rev Drug
Discov 2004;3:301–7.
65. Puntervoll P, Linding R, Gemnd C, et al. Elm server: anew resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res 2003;31:3625–30.
66. Qi Y, Tastan O, Carbonell J, et al. Semi-supervised multi-tasklearning for predicting interactions between HIV-1 and
Human proteins. Bioinformatics 2010;26:i645–52.
67. Dyer M, Murali T, Sobral B. Supervised learning and prediction of physical interactions between human and hiv proteins. Infect Genet Evol 2011;11:917–23.
68. Doolittle J, Gomez S. Structural similarity-based predictions
of protein interactions between HIV-1 and homo sapiens.
Virology 2010;7:82.
69. Evans P, Dampier W, Ungar L, et al. Prediction of hiv-1 virushost protein interactionsusing virus and host sequence
motifs. BMC Med Genom 2009;2:27.
70. Mukhopadhyay A, Maulik U, Bandyopadhyay S, et al. Mining
association rules from hivhuman protein interactions. In:
Proceedings of International Conferance of Systems in Medicine and
Biology (ICSMB 2010), 2010, pp. 349–53.
71. Mukhopadhyay A, Maulik U, Bandyopadhyay S. A novel
biclustering approach to association rule mining for predicting
HIV-1–human
protein
interactions.
PLoS
One
2012;7:e32289.
72. Mondal KC, Pasquier N, Mukhopadhyay A, et al. A new approach for association rule mining and bi-clustering using
formal concept analysis. In: In: Proceedings International
Conference on Machine Learning and Data Mining (MLDM-2012),
Berlin/Germany, 2012, pp. 86–101.
73. Breiman, L. Random forests. Mach Learn 2001;45:5–32.
74. Qi Y, Bar-Joseph Z, Klein-Seetharaman J. Evaluation of different biological data and computationalclassification methods
for use in protein interaction prediction. Proteins 2006;63:
490–500.
75. Chen X, Liu M. Prediction of protein-protein interactions
using random decision forestframework. Bioinformatics 2005;
21:4394–400.
76. Lin N, Wu B, Jansen R, et al. Information assessment on predicting proteinproteininteractions. BMC Bioinformatics 2004;
5:154.
77. Maetschke S, Simonsen M, Ragan M et al. Gene ontologydriven inference of proteinproteininteractions using inducers. Bioinformatics 2012;1:69–75.
78. Xiaomei W, Lei Z, Jie G, et al. Prediction of yeast proteinprotein interactionnetwork: insights from the gene ontology and
annotations. Nucleic Acids Res 2006;34:2137–50.
79. Mei S. Probability weighted ensemble transfer learning for
predicting interactions between HIV-1 and human proteins.
PLoS One 2013;8(11):e79606.
80. Aloy P, Russell R. Interprets: protein interaction prediction
through tertiary structure. Bioinformatics 2003;19:161–2.
81. Lu L, Lu H, Skolnick J. Multiprospector: an algorithm for the
prediction of protein-proteininteractions by multimeric
threading. Proteins 2002;49:350–64.
82. Davis F, Braberg H, Shen M et al. Protein complex compositions predictedby structural similarity. Nucleic Acids Res
2006;34:2943–52.
83. Tonikian R, Zhang Y, Sazinsky S et al. A specificity map for the
pdz domain family. PLos Comp Biol 2008;6:e239.
84. Shelton H, Harris M. Hepatitis c virus ns 5 a protein binds the
sh 3 domain of the fyn tyrosine kinase with high affinity: mutagenic analysis of residues within the sh 3 domain that contribute to the interaction. Virology 2008;5:24.
85. Kadaveru K, Vyas J, Schiller M. Viral infection and human
diseaseinsights
from
minimotifs.
Front
Biosci
2008;13:6455–71.
86. Agrawal R, Srikant R. Fast algorithms for mining
association rules in large databases. In: Proceedings of
20th International Conference on Very Large Data Bases. San
Francisco, CA, USA: Morgan Kaufmann Publishers Inc, 1994,
pp. 487–99.
87. Mukhopadhyay A, Ray S, Maulik U. Incorporating the type
and direction information in predicting novel regulatory
interactions between HIV-1 and human proteins using a
biclustering approach. BMC Bioinformatics 2014;15(1):26.
88. Mondal KC, Pasquier N, Mukhopadhyay A, et al. (2012)
Prediction of protein interactions on HIV-1-Human ppi data
Review on analysis and prediction of HIV-1–human PPI
using a novel closure-based integrated approach. In:
Proceedings of International Conferance of Bioinformatics-2012,
Algarve, Portugal, 2012.
89. Pasquier N, Bastide Y, Taouil R, et al. Discovering frequent
closed itemsets for association rules. In: In: Proceedings 7th
International Conference on Database Theory (ICDT-99). 1999, pp.
398–416.
90. Zaki M, Hsiao C. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng
2005;17:462–78.
|
851
91. Mujeeb A, Bishop K, Peterlin B, et al. NMR structure of a biologically active peptide containing the RNA-binding domain
of human immunodeficiency virus type 1 tat. Proc Natl Acad
Sci USA 1994;91:8248–52.
92. Vaishnav Y, Wong-Staal F. The biochemistry of aids. Annu
Rev Biochem 1991;60:577–630.
93. Nouretdinov I, Gammerman A, Qi Y, et al. Determining confidence of predicted interactions between hiv-1 and human
proteins using conformal method. Pac Symp Biocomput
2012;311–2.