snpEff: Evaluation of Available Versions
David Roazen
Genome Sequencing and Analysis
Medical and Population Genetics
January 3, 2012
Missense/Silent Ratio in a 1000G Gencode-Annotated
VCF, and with snpEff run on the Same Variants
Source                                                          Missense   Silent   Missense/Silent
1000G Phase 1 SNP calls with Gencode 7 coding annotations [1]     299367   208171              1.44
snpEff 2.0.2 + GRCh37.63                                          341742   146079              2.34
snpEff 2.0.4 RC3 + GRCh37.64                                      297106   202174              1.47
snpEff 2.0.5 + GRCh37.64                                          297106   202174              1.47
snpEff 2.0.5 + GRCh37.65                                          341486   150938              2.26

[1] ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20111220_coding_annotation_phase1/
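The Missense/Silent column is the missense count divided by the silent count. As a quick check, a minimal Python sketch can recompute it from the counts reported above; the dictionary keys and the function name are illustrative labels, not part of snpEff or the 1000G pipeline.

```python
# Counts copied from the table above, as (missense, silent) per source.
counts = {
    "1000G Phase 1 + Gencode 7": (299367, 208171),
    "snpEff 2.0.2 + GRCh37.63": (341742, 146079),
    "snpEff 2.0.4 RC3 + GRCh37.64": (297106, 202174),
    "snpEff 2.0.5 + GRCh37.64": (297106, 202174),
    "snpEff 2.0.5 + GRCh37.65": (341486, 150938),
}


def missense_silent_ratio(missense: int, silent: int) -> float:
    """Return the number of missense calls per silent call."""
    return missense / silent


# Round to two decimals to match the precision used in the table.
ratios = {
    name: round(missense_silent_ratio(missense, silent), 2)
    for name, (missense, silent) in counts.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

The recomputed values agree with the table: the two outlier combinations sit well above 2, while the Gencode baseline and the GRCh37.64 runs stay below 1.5.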
Overall Concordance with the 1000G Gencode SNP
Annotations
Source                          Silent   Missense   Nonsense
snpEff 2.0.2 + GRCh37.63        70.13%     97.21%     98.77%
snpEff 2.0.4 RC3 + GRCh37.64    97.11%      99.0%     98.40%
snpEff 2.0.5 + GRCh37.64        97.11%      99.0%     98.40%
snpEff 2.0.5 + GRCh37.65        72.48%     98.05%     98.34%
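The slides do not say how concordance was computed. One plausible reading, assumed here, is: of the variants that the 1000G Gencode annotation places in a category, the fraction that snpEff places in the same category. The Python sketch below illustrates that definition on invented toy data; the function name, the (chromosome, position) keys, and the labels are all hypothetical.

```python
from collections import Counter


def concordance_by_category(reference, candidate):
    """Per-category agreement of `candidate` with `reference`.

    Both arguments map a variant key to an effect label. Only variants
    present in both mappings are compared (an assumption of this sketch).
    """
    total = Counter()
    agree = Counter()
    for key, ref_label in reference.items():
        if key not in candidate:
            continue
        total[ref_label] += 1
        if candidate[key] == ref_label:
            agree[ref_label] += 1
    return {label: agree[label] / total[label] for label in total}


# Toy data, invented for illustration only.
reference = {
    ("1", 100): "silent",
    ("1", 200): "silent",
    ("1", 300): "missense",
    ("2", 50): "nonsense",
}
candidate = {
    ("1", 100): "silent",
    ("1", 200): "missense",  # a silent site called missense
    ("1", 300): "missense",
    ("2", 50): "nonsense",
}

result = concordance_by_category(reference, candidate)
for label, value in result.items():
    print(f"{label}: {value:.2%}")
```

Under this definition, a tool that relabels silent sites as missense loses silent concordance while its missense concordance stays high, which is the pattern the two outlier rows show.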
Summary:
•  Both snpEff 2.0.2 + GRCh37.63 and snpEff 2.0.5 + GRCh37.65 produce
an abnormally high Missense:Silent ratio (2.34 and 2.26, versus 1.44 for
the 1000G Gencode annotations), with elevated levels of Missense
mutations across the entire spectrum of allele counts. They also have a
relatively low (~70%-72%) level of concordance with the 1000G Gencode
annotations for Silent mutations.
•  This suggests that these combinations of snpEff/database versions
incorrectly annotate many Silent mutations as Missense.
•  snpEff 2.0.4 RC3 + GRCh37.64 and snpEff 2.0.5 + GRCh37.64 produce
a Missense:Silent ratio in line with expectations, and have a very high
(~97%-99%) level of concordance with the 1000G Gencode annotations
across all categories.