Supporting Information
Algorithm II:

$$P_2(X \text{ and } Y \text{ share } A \mid \theta, N) = \left[\prod_{i=1}^{m} P(Z_i \text{ connects both } X \text{ and } Y \mid \theta, N)\right] P(\text{no other protein connects both } X \text{ and } Y \mid \theta, N)$$

$$= \prod_{i \in A} \frac{\binom{N-1}{n_i-1}\binom{N-2}{n_i-2}}{\binom{N}{n_i}\binom{N-1}{n_i-1}} \prod_{i \notin A,\, i \neq X, Y} \left(1 - \frac{\binom{N-1}{n_i-1}\binom{N-2}{n_i-2}}{\binom{N}{n_i}\binom{N-1}{n_i-1}}\right) = \prod_{i \in A} \frac{n_i(n_i-1)}{N(N-1)} \prod_{i \notin A,\, i \neq X, Y} \left(1 - \frac{n_i(n_i-1)}{N(N-1)}\right)$$
Let
$$M_1 = \prod_{i_1 \notin A_1,\; i_1 \neq X_1, Y_1} \left(1 - \frac{n_{i_1}(n_{i_1}-1)}{N(N-1)}\right)$$
for any randomly picked proteins $X_1$ and $Y_1$ with shared-partner set $A_1$. Also let
$$M_2 = \prod_{i_2 \notin A_2,\; i_2 \neq X_2, Y_2} \left(1 - \frac{n_{i_2}(n_{i_2}-1)}{N(N-1)}\right)$$
for any randomly picked proteins $X_2$ and $Y_2$ with shared-partner set $A_2$, and let $Q = \{i_1\} \cap \{i_2\}$. We can easily see that $Q$ contains more than $N - 2m$ common elements, so nearly all of the factors of $M_1$ and $M_2$ are identical and cancel in the ratio $M_1/M_2$.
Therefore, we have the following inequality:
$$\frac{\min(M_1)}{\max(M_2)} \le \frac{M_1}{M_2} \le \frac{\max(M_1)}{\min(M_2)},$$

$$\prod_{i=1}^{2m}\left(1 - \frac{\max(n_i)\,(\max(n_i)-1)}{N(N-1)}\right) \le \frac{M_1}{M_2} \le \frac{1}{\prod_{i=1}^{2m}\left(1 - \frac{\max(n_i)\,(\max(n_i)-1)}{N(N-1)}\right)}.$$
For our human PPI network, N = 7,362, $1 \le m \le 45$ and $1 \le n_i \le 157$. (1,000 simulations of the simple random PPI network and of the power-law-preserving random PPI network also show the same result.) Thus we arrive at the following inequality:

$$0.96 \le \frac{M_1}{M_2} \le 1.04,$$

so $M_1 \approx M_2$ and $\log(M_1) \approx \log(M_2)$.
Thus, we consider $\prod_{i \notin A,\; i \neq X, Y}\left(1 - \frac{n_i(n_i-1)}{N(N-1)}\right)$ a constant, and we derive $P_2 \propto \prod_{i \in A} \frac{n_i(n_i-1)}{N(N-1)}$. For convenience, we use
$$P_2 = \prod_{i \in A} \frac{n_i(n_i-1)}{N(N-1)}$$
in our paper.
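A correspondingly simplified sketch (again with names and toy degrees of our own choosing) keeps only the factors for the shared partners and drops the nearly constant $(1 - \cdot)$ product:

```python
from math import prod

def p2_simplified(degrees_in_A, N):
    """Simplified P2: product of n_i(n_i-1)/(N(N-1)) over the shared partners only;
    the dropped (1 - ...) product is treated as a constant, as argued above."""
    return prod(n * (n - 1) / (N * (N - 1)) for n in degrees_in_A)

print(p2_simplified([12, 40], N=7362))  # same toy degrees as in the earlier sketch
```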

Mathematical expression for the probability that three proteins share m interacting partners: To compute this probability, we count the number of distinct ways in which three proteins with $n_1$, $n_2$ and $n_3$ interacting partners have $m$ in common. Let $n_{12}$, $n_{23}$ and $n_{13}$ denote the numbers of partners shared by proteins 1 and 2, proteins 2 and 3, and proteins 1 and 3, respectively. We divide the whole set of partners of the three proteins into seven nonoverlapping groups: (i) $m$ common protein partners that interact with proteins 1, 2 and 3; (ii) $n_{12} - m$ proteins that interact only with proteins 1 and 2; (iii) $n_{23} - m$ proteins that interact only with proteins 2 and 3; (iv) $n_{13} - m$ proteins that interact only with proteins 1 and 3; (v) $n_1 - n_{12} - n_{13} + m$ partners that interact only with protein 1; (vi) $n_2 - n_{12} - n_{23} + m$ partners that interact only with protein 2; and (vii) $n_3 - n_{13} - n_{23} + m$ partners that interact only with protein 3. We count the total number of distinct ways of assigning these seven groups to $N$ proteins. This is given by:
N N  m N  n12 N  n12  n 23  mN  n12  n 23  n13  2mN  n1  n 23  m N  n1  n 2  n12 
 






mn12  mn 23  m n13  m
 n1  n12  n13  m n 2  n12  n 23  mn 3  n13  n 23  m
The total number of ways to randomly pick $n_1$, $n_2$ and $n_3$ proteins from $N$ proteins is given by $\binom{N}{n_1}\binom{N}{n_2}\binom{N}{n_3}$.
Therefore, the probability that three proteins share m interacting partners is given as follows:
$$P_1(m \mid N, n_1, n_2, n_3) = \frac{\binom{N}{m}\binom{N-m}{n_{12}-m}\binom{N-n_{12}}{n_{23}-m}\binom{N-n_{12}-n_{23}+m}{n_{13}-m}\binom{N-n_{12}-n_{23}-n_{13}+2m}{n_1-n_{12}-n_{13}+m}\binom{N-n_1-n_{23}+m}{n_2-n_{12}-n_{23}+m}\binom{N-n_1-n_2+n_{12}}{n_3-n_{13}-n_{23}+m}}{\binom{N}{n_1}\binom{N}{n_2}\binom{N}{n_3}}$$
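This probability can be evaluated directly with exact binomial coefficients; the sketch below mirrors the seven sequential choices above. It takes the pairwise overlaps $n_{12}$, $n_{23}$ and $n_{13}$ as explicit arguments, and the function name and the example numbers are our own, not the paper's.

```python
from math import comb

def p1(m, N, n1, n2, n3, n12, n23, n13):
    """P1(m | N, n1, n2, n3): the seven sequential binomial choices for groups
    (i)-(vii), divided by the number of ways to pick the three partner sets."""
    numerator = (
        comb(N, m)
        * comb(N - m, n12 - m)
        * comb(N - n12, n23 - m)
        * comb(N - n12 - n23 + m, n13 - m)
        * comb(N - n12 - n23 - n13 + 2 * m, n1 - n12 - n13 + m)
        * comb(N - n1 - n23 + m, n2 - n12 - n23 + m)
        * comb(N - n1 - n2 + n12, n3 - n13 - n23 + m)
    )
    return numerator / (comb(N, n1) * comb(N, n2) * comb(N, n3))

# Toy example: three proteins with 20, 30 and 40 partners, pairwise overlaps
# n12 = 5, n23 = 6, n13 = 4, sharing m = 2 common partners (illustrative values).
print(p1(2, 7362, 20, 30, 40, 5, 6, 4))
```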
Assessing the reliability of functional predictions for GO and KEGG annotations. If a
protein has at least one annotated significant partner, a list of annotation(s) from its partner(s)
can be sorted by frequency. Suppose that annotations occurring n times or more will be
assigned to this protein. For an annotated protein (based on GO and KEGG annotations), if an
assigned annotation occurs among its known functions, we consider this a correct prediction.
Assuming that GO and KEGG annotations are complete for those annotated proteins, we
define the prediction precision as
$$\text{precision} = \frac{\text{total number of correct predictions}}{\text{total number of predictions}},$$
and, as $n$ varies, we obtain different precision values (Fig. S3), which we used to estimate the FDRs of our functional predictions ($\text{FDR} = 1 - \text{precision}$) and hence to assess their reliability. From Fig. S3, we decided to use $n = 2$ (for KEGG) and $n = 4$ (for GO) as the thresholds for the minimum frequency of functions shared by significant partners, which gave us relatively low FDRs (21% for KEGG and 30% for GO) without sacrificing too many predictions (466 predictions for KEGG and 123 predictions for GO were made).
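The bookkeeping described above can be sketched as follows. The function names, data structures, and toy inputs are our own illustration (assuming one annotation list per annotated significant partner and per-protein sets of known GO/KEGG functions), not the paper's code.

```python
from collections import Counter

def assign_annotations(partner_annotations, n):
    """Assign to a protein every annotation occurring n times or more among the
    annotations of its annotated significant partners."""
    counts = Counter(a for anns in partner_annotations for a in anns)
    return {a for a, c in counts.items() if c >= n}

def precision_and_fdr(predicted, known):
    """Precision = correct predictions / total predictions, where a prediction is
    correct if the assigned annotation occurs among the protein's known functions;
    FDR = 1 - precision."""
    pairs = [(p, a) for p, anns in predicted.items() for a in anns]
    correct = sum(1 for p, a in pairs if a in known.get(p, set()))
    precision = correct / len(pairs)
    return precision, 1.0 - precision

# Toy example with n = 2: two partners annotated "pathway_X", one also "pathway_Y".
pred = {"P1": assign_annotations([["pathway_X"], ["pathway_X", "pathway_Y"]], n=2)}
prec, fdr = precision_and_fdr(pred, known={"P1": {"pathway_X"}})
print(prec, fdr)  # 1.0 0.0 for this toy input
```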