Comparison between global permutation and local permutation in univariate procedures

In the univariate procedures, assume a gene set contains $n$ genes. The gene-level statistic of gene $i$ is denoted by $X_i$, the vector of the $X_i$ is denoted by $X$, and the gene set level statistic $S$ is calculated as the average of the gene-level statistics. The mean and variance of $S$ are calculated as follows.

$$
E(S) = E\left(\frac{1}{n}\sum_{i}^{n} X_i\right) = \frac{1}{n}\sum_{i}^{n} E(X_i)
$$

$$
\begin{aligned}
\operatorname{var}(S) &= \operatorname{var}\left(\frac{1}{n}\sum_{i}^{n} X_i\right) \\
&= \frac{1}{n^2}\left[ E\left(\Big(\sum_{i}^{n} X_i\Big)^2\right) - \left(E\Big(\sum_{i}^{n} X_i\Big)\right)^2 \right] \\
&= \frac{1}{n^2}\left[ E\left(\sum_{i}^{n} X_i^2 + \sum_{i}^{n}\sum_{j,\, j \neq i}^{n} X_i X_j\right) - \left(\sum_{i}^{n} E(X_i)\right)^2 \right] \\
&= \frac{1}{n^2}\left[ \sum_{i}^{n} E(X_i^2) + \sum_{i}^{n}\sum_{j,\, j \neq i}^{n} E(X_i X_j) - \sum_{i}^{n} \big(E(X_i)\big)^2 - \sum_{i}^{n}\sum_{j,\, j \neq i}^{n} E(X_i)E(X_j) \right] \\
&= \frac{1}{n^2}\left[ \sum_{i}^{n} \Big(E(X_i^2) - \big(E(X_i)\big)^2\Big) + \sum_{i}^{n}\sum_{j,\, j \neq i}^{n} \Big(E(X_i X_j) - E(X_i)E(X_j)\Big) \right] \\
&= \frac{1}{n^2}\left[ \sum_{i}^{n} \operatorname{var}(X_i) + \sum_{i}^{n}\sum_{j,\, j \neq i}^{n} \operatorname{cov}(X_i, X_j) \right]
\end{aligned}
$$

The distribution of $X$ is generated by permuting the expression matrix. There are two ways to permute. The first permutes the sample labels once for all genes, which keeps the correlations among genes; we call this global permutation. The second permutes the sample labels independently for every gene, so that no correlations among genes remain; we call this local permutation.

For clarity, we assume that the correlations among genes are all positive. Under global permutation, the covariance between any two genes is nonzero, and the variance of $S$ increases as the pairwise correlations increase. Under local permutation, all pairwise covariances are equal to zero, and the variance of $S$ is not affected by the correlations. Therefore, if the genes in a gene set have high pairwise correlations, the gene set has little chance of being significant when its null distribution is generated by global permutation.

Simulation study

In this section, we show how the two permutation methods affect the assessment of significant gene sets.

1. Gene set with one regulation direction

Assume a gene set contains 10 genes under two sample classes, i.e. the treatment class and the control class, where each class contains 50 replicates. Simulated data are generated from a multivariate normal distribution.
Mean values of the genes are set to -0.1 in the control class and to 0.1 in the treatment class. Pairwise correlations are set from 0 to 0.99, in steps of 0.01. The gene-level statistic is the two-sample t-value, and the gene set level statistic is the average of the gene-level statistics. P-values for gene sets are calculated from 1000 permutations. As illustrated in figure 1, the variance of the gene set level statistic increases with the correlation under global permutation, while it stays constant under local permutation. As a result, a gene set analysis method that uses global permutation is more likely to miss significant gene sets. The proportion of gene sets with p-value ≤ 0.01 is 0.25 under global permutation and 0.66 under local permutation.

Fig. 1. Comparison between global permutation and local permutation. Red dots represent significant gene sets with p-values ≤ 0.01.

2. Gene set with two regulation directions

Assume a gene set contains 10 genes under two sample classes, i.e. the treatment class and the control class, where each class contains 50 replicates. Simulated data are generated from a multivariate normal distribution. Mean values of the first 5 genes are set to 0.1 in the treatment class and -0.1 in the control class, and mean values of the last 5 genes are set to -0.1 in the treatment class and 0.1 in the control class. Pairwise correlations within the first 5 genes are set to $\rho$, correlations within the last 5 genes are set to $\rho$, and correlations between the first 5 genes and the last 5 genes are set to $-\rho$. $\rho$ is set from 0 to 0.99, in steps of 0.01. The gene-level statistic is the absolute value of the two-sample t-value, and the gene set level statistic is the average of the gene-level statistics. P-values are calculated from 1000 permutations. As illustrated in figure 2, the proportion of gene sets with p-value ≤ 0.01 is 0.38 under global permutation and 0.63 under local permutation.

Fig. 2. Comparison between global permutation and local permutation. Red dots represent significant gene sets with p-values ≤ 0.01.
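
The first simulation can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the code used to produce the figures: the function names (`t_stats`, `simulate`, `null_set_stats`), the unit variances, the upper-tail p-value, and the reduced permutation count are all choices made here for illustration. The t-value uses unpooled variances, which coincides with the pooled version when both classes have the same size. Local permutation is implemented by shuffling each gene's values independently, which is equivalent to permuting the sample labels separately for every gene.

```python
import numpy as np

def t_stats(data, labels):
    """Two-sample t-value per gene; data has shape (samples, genes)."""
    a, b = data[labels == 1], data[labels == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

def simulate(rho, n_genes=10, n_per_class=50, shift=0.1, rng=None):
    """Draw control (mean -shift) and treatment (mean +shift) samples
    with unit variances (assumed) and a common pairwise correlation rho."""
    rng = np.random.default_rng() if rng is None else rng
    cov = np.full((n_genes, n_genes), rho)
    np.fill_diagonal(cov, 1.0)
    control = rng.multivariate_normal(np.full(n_genes, -shift), cov, n_per_class)
    treatment = rng.multivariate_normal(np.full(n_genes, shift), cov, n_per_class)
    data = np.vstack([control, treatment])
    labels = np.repeat([0, 1], n_per_class)
    return data, labels

def null_set_stats(data, labels, mode, n_perm=1000, rng=None):
    """Null distribution of S (mean of gene-level t-values)."""
    rng = np.random.default_rng() if rng is None else rng
    n_genes = data.shape[1]
    null = np.empty(n_perm)
    for k in range(n_perm):
        if mode == "global":
            # One label shuffle shared by all genes: gene-gene correlation is kept.
            null[k] = t_stats(data, rng.permutation(labels)).mean()
        else:
            # Each gene shuffled on its own: gene-gene correlation is destroyed.
            shuffled = np.column_stack(
                [rng.permutation(data[:, g]) for g in range(n_genes)]
            )
            null[k] = t_stats(shuffled, labels).mean()
    return null

# Small demonstration at a single correlation value (300 permutations for speed).
rng = np.random.default_rng(0)
data, labels = simulate(rho=0.8, rng=rng)
observed = t_stats(data, labels).mean()
for mode in ("global", "local"):
    null = null_set_stats(data, labels, mode, n_perm=300, rng=rng)
    p_value = np.mean(null >= observed)  # upper-tail p-value (assumed)
    print(mode, "null variance:", null.var(ddof=1), "p-value:", p_value)
```

Repeating this over a grid of correlation values would show the pattern described above: the null variance of the set statistic grows with correlation under global permutation and stays roughly flat under local permutation.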