SupplS1

Comparison between global permutation and local permutation in univariate procedures
In the univariate procedures, assume a gene set contains n genes. The gene level statistic of gene i is denoted by X_i, the vector of the X_i is denoted by X, and the gene set level statistic S is the average of the gene level statistics, S = (1/n) Σ_{i=1}^{n} X_i. The mean and variance of S are calculated as follows.
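$$
E(S) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
$$

$$
\begin{aligned}
\operatorname{var}(S) &= \operatorname{var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) \\
&= \frac{1}{n^2}\left[E\left(\Big(\sum_{i=1}^{n} X_i\Big)^2\right) - \left(E\Big(\sum_{i=1}^{n} X_i\Big)\right)^2\right] \\
&= \frac{1}{n^2}\left[E\left(\sum_{i=1}^{n} X_i^2 + \sum_{i,j,\,i\neq j} X_i X_j\right) - \left(\sum_{i=1}^{n} E(X_i)\right)^2\right] \\
&= \frac{1}{n^2}\left[\sum_{i=1}^{n} E(X_i^2) + \sum_{i,j,\,i\neq j} E(X_i X_j) - \sum_{i=1}^{n} \big(E(X_i)\big)^2 - \sum_{i,j,\,i\neq j} E(X_i)\,E(X_j)\right] \\
&= \frac{1}{n^2}\left[\sum_{i=1}^{n} \Big(E(X_i^2) - \big(E(X_i)\big)^2\Big) + \sum_{i,j,\,i\neq j} \Big(E(X_i X_j) - E(X_i)\,E(X_j)\Big)\right] \\
&= \frac{1}{n^2}\left[\sum_{i=1}^{n} \operatorname{var}(X_i) + \sum_{i,j,\,i\neq j} \operatorname{cov}(X_i, X_j)\right]
\end{aligned}
$$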
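As a quick numerical check of the variance formula var(S) = (1/n²)[Σ var(X_i) + Σ_{i≠j} cov(X_i, X_j)], the sketch below evaluates it from a covariance matrix and compares it with a Monte Carlo estimate of var(S). For illustration only, it assumes unit variances and one common pairwise correlation ρ; the helper names `var_of_mean` and `equicorrelated_cov` are hypothetical. Under that assumption the formula reduces to var(S) = (1 + (n − 1)ρ)/n, so with n = 10 it grows from 0.1 at ρ = 0 (all covariances zero) to 0.55 at ρ = 0.5.

```python
import numpy as np


def var_of_mean(cov: np.ndarray) -> float:
    """var(S) for S = (1/n) * sum(X_i), computed from the covariance matrix of X."""
    n = cov.shape[0]
    sum_var = np.trace(cov)            # sum_i var(X_i)
    sum_cov = cov.sum() - sum_var      # sum over i != j of cov(X_i, X_j)
    return (sum_var + sum_cov) / n**2


def equicorrelated_cov(n: int, rho: float) -> np.ndarray:
    """Hypothetical example: unit variances and a common pairwise correlation rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))


n, rho = 10, 0.5
cov = equicorrelated_cov(n, rho)

# Monte Carlo estimate of var(S) from draws of X ~ N(0, cov)
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(n), cov, size=200_000)
S = X.mean(axis=1)

print(var_of_mean(cov))                        # formula: (1 + (n - 1) * rho) / n
print(S.var())                                 # Monte Carlo estimate, close to the formula
print(var_of_mean(equicorrelated_cov(n, 0.0)))  # no covariances: 1 / n
```

The third line corresponds to the situation in which every covariance term vanishes, leaving only the variance terms.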
The null distribution of X is generated by permuting the expression matrix, and this can be done in two ways. The first permutes the sample labels once and applies the same permutation to every gene, which preserves the correlations among genes; we call this global permutation. The second permutes the sample labels independently for each gene, which removes the correlations among genes; we call this local permutation. For clarity, we assume that all correlations among genes are positive. Under global permutation, the covariance between any two genes is retained, so the variance of S increases as the pairwise correlations increase. Under local permutation, all pairwise covariances are zero, so var(S) reduces to (1/n²) Σ var(X_i) and is not affected by the correlations. Therefore, if the genes in a gene set are highly correlated, the gene set has little chance of being significant when its null distribution is generated by global permutation, because the observed S must then fall in the tail of a much wider null distribution.
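The two permutation schemes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the expression matrix has samples in rows and genes in columns, the gene level statistic is a two-sample t-value computed with separate class variances (identical to the pooled t-value when both classes have the same size), and S is the average of the t-values. The function names `t_values` and `null_set_stats` are hypothetical. The only difference between the schemes is whether one label permutation is shared by all genes (global) or a separate permutation is drawn for each gene (local).

```python
import numpy as np


def t_values(data: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Two-sample t-values for every permutation and gene.

    data:   (n_samples, n_genes) expression matrix
    labels: (n_perm, n_genes, n_samples), 1.0 = treatment, 0.0 = control
    returns (n_perm, n_genes)
    """
    n1 = labels.sum(axis=-1)
    n0 = data.shape[0] - n1
    s1 = np.einsum("pgn,ng->pg", labels, data)       # treatment sums
    q1 = np.einsum("pgn,ng->pg", labels, data**2)    # treatment sums of squares
    s0 = data.sum(axis=0) - s1
    q0 = (data**2).sum(axis=0) - q1
    m1, m0 = s1 / n1, s0 / n0
    v1 = (q1 - n1 * m1**2) / (n1 - 1)                # sample variances
    v0 = (q0 - n0 * m0**2) / (n0 - 1)
    return (m1 - m0) / np.sqrt(v1 / n1 + v0 / n0)


def null_set_stats(data, is_treat, n_perm, scheme, rng):
    """Null distribution of S = mean of the gene level t-values."""
    n_samples, n_genes = data.shape
    if scheme == "global":
        # one permutation of the sample labels per draw, shared by all genes
        idx = np.argsort(rng.random((n_perm, 1, n_samples)), axis=-1)
        idx = np.repeat(idx, n_genes, axis=1)
    elif scheme == "local":
        # an independent permutation of the sample labels for every gene
        idx = np.argsort(rng.random((n_perm, n_genes, n_samples)), axis=-1)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    labels = is_treat[idx].astype(float)
    return t_values(data, labels).mean(axis=1)


# Null data (no class difference) with strongly correlated genes
rng = np.random.default_rng(1)
n_genes, n_per_class, rho = 10, 50, 0.9
cov = (1 - rho) * np.eye(n_genes) + rho * np.ones((n_genes, n_genes))
data = rng.multivariate_normal(np.zeros(n_genes), cov, size=2 * n_per_class)
is_treat = np.repeat([True, False], n_per_class)

null_global = null_set_stats(data, is_treat, 1000, "global", rng)
null_local = null_set_stats(data, is_treat, 1000, "local", rng)
print(null_global.var(), null_local.var())  # global null of S is much wider
```

With ρ = 0.9 the variance formula above predicts a global null variance roughly (1 + 9ρ) ≈ 9 times the local one, which is what this comparison is meant to show.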

Simulation study
In this section, we show how the choice of permutation method affects which gene sets are assessed as significant.
1. Gene set with one regulation direction
Assume a gene set contains 10 genes measured under two sample classes, a treatment class and a control class, each with 50 replicates. Data are simulated from a multivariate normal distribution. The mean values of the genes are set to -0.1 in the control class and 0.1 in the treatment class. Pairwise correlations range from 0 to 0.99 in steps of 0.01. The gene level statistic is the two-sample t-value, and the gene set level statistic is the average of the gene level statistics. P-values for gene sets are calculated from 1000 permutations. As Fig. 1 illustrates, the variance of the gene set level statistic rises with the correlation under global permutation but stays constant under local permutation. As a result, a gene set analysis method that uses global permutation is more likely to miss significant gene sets. The proportion of gene sets with p-value ≤ 0.01 is 0.25 under global permutation and 0.66 under local permutation.
Fig. 1. Comparison between global permutation and local permutation. Red dots represent significant gene sets
with p-values ≤ 0.01.
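A compact sketch of this simulation follows. It uses the settings stated in the text (10 genes, 50 replicates per class, means of ±0.1, correlations from 0 to 0.99 in steps of 0.01, mean t-value as the set statistic, 1000 permutations), but several details are assumptions not given in the text: unit variances, a single common correlation for all gene pairs, a two-sided p-value computed as the fraction of permuted |S| at least as large as the observed |S|, and the random seed. All function names are hypothetical. The printed proportions are not expected to reproduce the values reported above exactly.

```python
import numpy as np


def t_values(data, labels):
    """Two-sample t-values. data: (N, G); labels: (P, G, N) with 1.0 = treatment. Returns (P, G)."""
    n1 = labels.sum(axis=-1)
    n0 = data.shape[0] - n1
    s1 = np.einsum("pgn,ng->pg", labels, data)
    q1 = np.einsum("pgn,ng->pg", labels, data**2)
    s0, q0 = data.sum(axis=0) - s1, (data**2).sum(axis=0) - q1
    m1, m0 = s1 / n1, s0 / n0
    v1 = (q1 - n1 * m1**2) / (n1 - 1)
    v0 = (q0 - n0 * m0**2) / (n0 - 1)
    return (m1 - m0) / np.sqrt(v1 / n1 + v0 / n0)


def perm_labels(is_treat, n_genes, n_perm, scheme, rng):
    """(P, G, N) permuted labels: one permutation shared by all genes (global) or one per gene (local)."""
    n = is_treat.size
    depth = 1 if scheme == "global" else n_genes
    idx = np.argsort(rng.random((n_perm, depth, n)), axis=-1)
    return np.broadcast_to(is_treat[idx], (n_perm, n_genes, n)).astype(float)


def simulate_set(rho, rng, n_genes=10, n_per_class=50, delta=0.1):
    """Treatment mean +delta, control mean -delta, unit variances, common correlation rho (assumed)."""
    cov = (1 - rho) * np.eye(n_genes) + rho * np.ones((n_genes, n_genes))
    treat = rng.multivariate_normal(np.full(n_genes, delta), cov, size=n_per_class)
    ctrl = rng.multivariate_normal(np.full(n_genes, -delta), cov, size=n_per_class)
    is_treat = np.repeat([True, False], n_per_class)
    return np.vstack([treat, ctrl]), is_treat


def set_p_value(data, is_treat, scheme, rng, n_perm=1000):
    """Two-sided permutation p-value of S = mean t-value (the two-sided rule is an assumption)."""
    n_genes = data.shape[1]
    obs_labels = np.broadcast_to(is_treat.astype(float), (1, n_genes, is_treat.size))
    observed = t_values(data, obs_labels).mean()
    null = t_values(data, perm_labels(is_treat, n_genes, n_perm, scheme, rng)).mean(axis=1)
    return np.mean(np.abs(null) >= abs(observed))


rng = np.random.default_rng(2024)
rhos = np.arange(100) / 100  # 0.00, 0.01, ..., 0.99
p = {"global": [], "local": []}
for rho in rhos:
    data, is_treat = simulate_set(rho, rng)
    for scheme in p:
        p[scheme].append(set_p_value(data, is_treat, scheme, rng))

for scheme, values in p.items():
    print(scheme, "proportion with p <= 0.01:", np.mean(np.array(values) <= 0.01))
```

Vectorizing over permutations with a (permutation, gene, sample) label array keeps the global and local schemes in one code path: they differ only in whether the label permutation is broadcast across genes.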
2. Gene set with two regulation directions
Assume again that a gene set contains 10 genes measured under a treatment class and a control class, each with 50 replicates, and that data are simulated from a multivariate normal distribution. The mean values of the first 5 genes are set to 0.1 in the treatment class and -0.1 in the control class, and the mean values of the last 5 genes are set to -0.1 in the treatment class and 0.1 in the control class. Pairwise correlations within the first 5 genes are set to ρ, correlations within the last 5 genes are set to ρ, and correlations between the first 5 genes and the last 5 genes are set to -ρ, where ρ ranges from 0 to 0.99 in steps of 0.01. The gene level statistic is the absolute value of the two-sample t-value, and the gene set level statistic is the average of the gene level statistics. P-values are calculated from 1000 permutations. As Fig. 2 illustrates, the proportion of gene sets with p-value ≤ 0.01 is 0.38 under global permutation and 0.63 under local permutation.
Fig. 2. Comparison between global permutation and local permutation. Red dots represent significant gene sets
with p-values ≤ 0.01.
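The correlation structure of this second simulation can be written compactly as Σ = (1 − ρ)I + ρvvᵀ, where v has entries +1 for the first 5 genes and −1 for the last 5. This form assumes unit variances, which the text does not state. Its eigenvalues are 1 − ρ (with multiplicity 9) and 1 + 9ρ, so Σ is a valid covariance matrix for every ρ from 0 to 0.99. The absolute value of the t-value is needed here because the positive t-values of the first 5 genes and the negative t-values of the last 5 would otherwise largely cancel in the average. The sketch below builds Σ and the class means, checks the eigenvalues numerically, and computes the gene set statistic for one simulated data set; `two_direction_cov` is a hypothetical helper name.

```python
import numpy as np


def two_direction_cov(rho: float, n_up: int = 5, n_down: int = 5) -> np.ndarray:
    """Hypothetical helper: rho within each block, -rho between blocks, unit variances."""
    v = np.concatenate([np.ones(n_up), -np.ones(n_down)])
    return (1.0 - rho) * np.eye(n_up + n_down) + rho * np.outer(v, v)


rho = 0.8
cov = two_direction_cov(rho)
mean_treat = np.array([0.1] * 5 + [-0.1] * 5)  # first 5 genes up, last 5 down
mean_ctrl = -mean_treat

eig = np.linalg.eigvalsh(cov)                   # eigenvalues in ascending order
print(eig.min(), eig.max())                     # 1 - rho and 1 + 9 * rho
print(cov[0, 1], cov[5, 6], cov[0, 5])          # rho, rho, -rho

# One simulated data set (50 replicates per class) and the set statistic mean |t|
rng = np.random.default_rng(7)
treat = rng.multivariate_normal(mean_treat, cov, size=50)
ctrl = rng.multivariate_normal(mean_ctrl, cov, size=50)
se = np.sqrt(treat.var(axis=0, ddof=1) / 50 + ctrl.var(axis=0, ddof=1) / 50)
t = (treat.mean(axis=0) - ctrl.mean(axis=0)) / se
print(np.abs(t).mean())
```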