Pearson's chi-squared goodness-of-fit test is used to estimate the probability that the results of the RefSeq gene and custom genomic features
analyses are due to chance. The proximity of 100,000 randomly generated integration sites to the set of RefSeq genes and custom genomic features
is used as the expected distribution for these tests (a link to this file can be found under ‘Test data’ on the VISA home page). To find the
expected number of integrations for each column, the expected ratio is determined from the random sites and then multiplied by the total observed
number for that category. Yates's correction is applied when appropriate. The degrees of freedom and the chi-square statistic are used to look up
the p-value range for each chi-squared test. The specific steps for these analyses are demonstrated with the hypothetical VISA results below.
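The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not VISA's own code: the function names expected_counts and chi_square_statistic, and the yates flag, are hypothetical. It assumes the expected ratio for each column is that column's share of the random-site counts, and it takes Yates's correction to be the usual subtraction of 0.5 from each absolute difference.

```python
def expected_counts(observed, reference):
    """Scale the random-site (reference) proportions to the observed total."""
    total_observed = sum(observed)
    total_reference = sum(reference)
    return [total_observed * r / total_reference for r in reference]


def chi_square_statistic(observed, reference, yates=False):
    """Sum of (O - E)^2 / E over all columns.

    With yates=True, 0.5 is subtracted from each |O - E| before squaring.
    Clamping the corrected difference at zero is a choice made for this
    sketch, not something stated in the source.
    """
    expected = expected_counts(observed, reference)
    statistic = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(diff - 0.5, 0.0)
        statistic += diff ** 2 / e
    return statistic


# Hypothetical "within or outside of genes" counts from the results below.
observed = [307, 776]        # observed integration sites
reference = [44632, 55368]   # randomly generated integration sites

print([round(e, 3) for e in expected_counts(observed, reference)])
print(round(chi_square_statistic(observed, reference), 3))
```

With the correction left off, these inputs give the expected numbers and the chi-square statistic shown for the within-or-outside-of-genes test, up to rounding.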
RefSeq genes results

                                                          Number of observed   Number of expected
Unique RIS                                                1083                 100000

Within or outside of genes
  Within genes                                            307                  44632
  Outside of genes                                        776                  55368

Outside of genes
  Within 5 kb upstream of the start of the closest gene   45                   2731
  Within 5 kb downstream of the end of the closest gene   21                   2748
  More than 5 kb away from the closest gene               710                  49889

Within genes
  In first eighth of gene                                 47                   5673
  In second eighth of gene                                37                   5485
  In third eighth of gene                                 30                   5592
  In fourth eighth of gene                                48                   5579
  In fifth eighth of gene                                 44                   5726
  In sixth eighth of gene                                 32                   5572
  In seventh eighth of gene                               35                   5399
  In eighth eighth of gene                                34                   5606

Promoter distance
  Upstream > 50kb                                         238                  23879
  Upstream 10-50kb                                        106                  13600
  Upstream 5-10kb                                         24                   3048
  Upstream 2.5-5kb                                        24                   1804
  Upstream 1-2.5kb                                        12                   1107
  Upstream 1kb                                            13                   788
  Downstream 1kb                                          7                    807
  Downstream 1-2.5kb                                      8                    1273
  Downstream 2.5-5kb                                      19                   2018
  Downstream 5-10kb                                       30                   3459
  Downstream 10-50kb                                      127                  17036
  Downstream > 50kb                                       475                  31181

Chi-squared tests
Within or outside of genes
*Note: All unique integration sites are counted when calculating the expected values in this test.

                      Expected ratio    Observed number   Expected number   (O - E)^2 / E
Within gene           0.446             307               483.365           64.350
Outside of gene       0.554             776               599.635           51.872
Total                                   1083*                               116.222

Chi-square statistic = 116.222
Degrees of freedom = 1
p-value < 0.0001

Within genes
*Note: Only integration sites within genes are counted when calculating the expected values in this test.

                      Expected ratio    Observed number   Expected number   (O - E)^2 / E
1st eighth            0.127             47                39.022            1.631
2nd eighth            0.123             37                37.728            0.014
3rd eighth            0.125             30                38.464            1.863
4th eighth            0.125             48                38.375            2.414
5th eighth            0.128             44                39.386            0.540
6th eighth            0.125             32                38.327            1.044
7th eighth            0.121             35                37.137            0.123
8th eighth            0.126             34                38.561            0.539
Total                                   307*                                8.168

Chi-square statistic = 8.168
Degrees of freedom = 7
p-value > 0.05

Outside of genes
*Note: Only integration sites outside of genes are counted when calculating the expected values in this test.

                      Expected ratio    Observed number   Expected number   (O - E)^2 / E
< 5 kb upstream       0.049             45                38.276            1.181
< 5 kb downstream     0.050             21                38.514            7.964
> 5 kb from gene      0.901             710               699.210           0.167
Total                                   776*                                9.312

Chi-square statistic = 9.312
Degrees of freedom = 2
p-value < 0.01

Promoter distance
*Note: All unique integration sites are counted when calculating the expected values in this test.

                      Expected ratio    Observed number   Expected number   (O - E)^2 / E
Upstream > 50kb       0.239             238               258.837           1.642
Upstream 10-50kb      0.136             106               147.288           11.574
Upstream 5-10kb       0.030             24                32.490            2.459
Upstream 2.5-5kb      0.018             24                19.494            1.019
Upstream 1-2.5kb      0.011             12                11.913            0.000
Upstream 1kb          0.008             13                8.664             2.337
Downstream 1kb        0.008             7                 8.664             0.346
Downstream 1-2.5kb    0.013             8                 14.079            2.429
Downstream 2.5-5kb    0.020             19                21.660            0.373
Downstream 5-10kb     0.035             30                37.905            1.486
Downstream 10-50kb    0.170             127               184.110           17.920
Downstream > 50kb     0.312             475               337.896           55.832
Total                                   1083*                               97.418

Chi-square statistic = 97.418
Degrees of freedom = 11
p-value < 0.0001

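The p-value ranges reported for the tests above can be cross-checked by computing the upper-tail probability of the chi-squared distribution directly instead of reading it from a printed table. The sketch below is illustrative only and is not taken from VISA: the names chi2_sf and p_value_range are hypothetical, and the reporting thresholds (0.0001, 0.001, 0.01, 0.05) are an assumption loosely based on the ranges that appear in this document. It uses only the Python standard library and the standard recurrence for the regularized upper incomplete gamma function, which is exact for whole-number degrees of freedom.

```python
import math


def chi2_sf(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-squared
    distribution with a positive integer number of degrees of freedom."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    y = statistic / 2.0
    # Base cases: Q(1, y) = exp(-y) for even df, Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def p_value_range(p):
    """Report p as a range; the thresholds are an assumption of this sketch."""
    for threshold in (0.0001, 0.001, 0.01, 0.05):
        if p < threshold:
            return f"p-value < {threshold}"
    return "p-value > 0.05"


# Chi-square statistics and degrees of freedom from the tests above.
for statistic, df in [(116.222, 1), (8.168, 7), (9.312, 2), (97.418, 11)]:
    print(statistic, df, p_value_range(chi2_sf(statistic, df)))
```

For the four hypothetical tests, the computed ranges agree with the p-value ranges listed above.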
Custom genomic features
The chi-squared tests for the custom genomic features analysis results are performed using the same strategy as demonstrated above. All unique
integration sites are counted when calculating the expected values for the chi-squared tests for the custom genomic features categories.