Position specific effect of SNP on signal ratio from long
oligonucleotide CGH microarrays; most single probe
aberrations represent genuine genomic variants
Noyes HA1, Rennie K4, Hulme H2, Kemp SJ1, Hoyle DC3
Detection of copy number variations (CNV) on microarrays can be hampered by the presence of single nucleotide polymorphisms (SNP). To be confident of calling genuine CNV rather than SNP, multiple contiguous probes are required to have non-zero log2 signal ratios. Consequently, only large CNV (> ~5 kb) can be detected on typical long-oligo CNV arrays with probe densities of 1 per 2 kb. However, the majority of CNV are probably < 5 kb (Nat Genet 2006, 38:82-85).
SNP data from the Perlegen 8 million SNP set and log2 signal ratios from ~300,000 long oligos were integrated to characterise the effect of SNP on log2 signal ratio and the effect of the position of the SNP within the probe. The maximum length of perfect match between probe and target appeared to be the dominant factor affecting hybridisation. Because a SNP reduces the effective length of the probe, single base changes could have a large effect on signal ratio and therefore be detectable on long-oligo arrays. In our study design, sequence differences were expected to give only high log2 signal ratios; low log2 signal ratios were therefore potentially caused by CNV. Approximately 1000 probes with low log2 signal ratios were identified as candidates for small CNV that would not have been identified by existing analysis approaches.
Most single probe aberrations appeared to be caused by genuine biological variants rather than experimental noise. Long-oligo CGH arrays can therefore provide more information than previously thought. The position-specific effect of SNP will be useful for microarray design.
Large excess of high signal ratios when C57BL/6 is used as common reference
Agilent high-density custom tiling array of 60-mer probes showing part of Mmu1. Note the much larger number of red (C57BL/6) probes with non-zero log2 signal ratios than green (A/J) probes. In all experiments C57BL/6 DNA was used as the control and A/J, BALB/c or 129 was used as the test. In hybridisations of three different test DNAs to two different array designs, 2-10 times as many C57BL/6 probes as test-strain probes had non-zero log2 signal ratios. Since the arrays were designed against the C57BL/6 genome sequence, it is possible that many of the non-zero log2 signal ratios in C57BL/6 are caused by SNP in the test strains leading to poor probe binding.
Brass A2,4
1 School of Biological Sciences, BioSciences Building, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
Alternative models
Alternative models for the effect of SNP on hybridisation to target. In A, the stretches of sequence delimited by the two SNP act as independent probes, and only the longest has a high enough melting temperature to bind at the annealing temperature of the reaction. In B, the SNP cause small loops in the probe-target duplex and have little effect on probe melting temperature.
2 School of Computer Science, Kilburn Building, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
3 North West Institute of BioHealth Informatics, School of Medicine, Stopford Building, Oxford Road, Manchester, M13 9PT, UK
4 Faculty of Life Sciences, University of Manchester, Smith Building, Oxford Road, Manchester, M13 9PT, UK
Effect of SNP on log2 signal ratio
The global log2 signal ratio was normalised to 0, but probes that contained a SNP in the Perlegen set had positive log2 signal ratios, and the ratio increased with the number of SNP in the probe (r² = 0.995). There was also a significant excess of probes with SNP amongst the probes that had a log2 signal ratio > 1 (p < 10^-5).
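The two statistics quoted here can be illustrated with a minimal sketch. The poster does not state which tests were used, so the choices below are assumptions: r² is taken as the squared Pearson correlation between SNP count and mean log2 ratio, and the excess of SNP-containing probes is assessed with a one-sided hypergeometric test. All function names and the toy values are hypothetical.

```python
from math import comb


def mean_ratio_by_snp_count(probes):
    """Group (snp_count, log2_ratio) pairs and return mean ratio per SNP count."""
    groups = {}
    for n_snp, ratio in probes:
        groups.setdefault(n_snp, []).append(ratio)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def enrichment_p(high_with_snp, high_total, snp_total, probe_total):
    """One-sided hypergeometric P(X >= high_with_snp).

    X is the number of SNP-containing probes among high_total probes drawn
    at random from probe_total probes, snp_total of which contain a SNP.
    """
    upper = min(high_total, snp_total)
    tail = sum(
        comb(snp_total, k) * comb(probe_total - snp_total, high_total - k)
        for k in range(high_with_snp, upper + 1)
    )
    return tail / comb(probe_total, high_total)


# Toy data only (not from the study): (number of SNP in probe, log2 ratio).
toy_probes = [(0, -0.1), (0, 0.1), (1, 0.4), (1, 0.6),
              (2, 0.9), (2, 1.1), (3, 1.6), (3, 1.4)]
means = mean_ratio_by_snp_count(toy_probes)
toy_r2 = r_squared(list(means.keys()), list(means.values()))
```

With real data, `enrichment_p` would be called with the number of probes above the log2 > 1 threshold and the number of probes overlapping a Perlegen SNP.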
Effect of allele combinations on signal ratio
Although all SNP were associated with high log2 signal ratios, some combinations of alleles appeared to be more destabilising than others. Pyrimidine to G mutations had the largest effect; these would lead to highly unstable GG or GA mismatches. The data also suggested that the orientation of the mismatched bases, on probe or target, might be significant.
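As an illustration of how a pyrimidine to G change yields a GG or GA mismatch, the sketch below derives the pair of bases that face each other in the duplex. It rests on a strand assumption that the poster does not state: the probe carries the complement of the reference (C57BL/6) allele, and the labelled target strand carries the test-strain allele as written. The function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mismatch_pair(ref_allele, test_allele):
    """Return (probe_base, target_base) facing each other at the SNP site.

    Assumption: the probe base is the complement of the reference allele,
    and the target strand carries the test allele unchanged.
    """
    if ref_allele == test_allele:
        raise ValueError("alleles are identical, so there is no mismatch")
    return COMPLEMENT[ref_allele], test_allele


# Pyrimidine (C or T) in the reference, G in the test strain.
c_to_g = mismatch_pair("C", "G")  # probe G opposite target G
t_to_g = mismatch_pair("T", "G")  # probe A opposite target G
```

Keeping the result as an ordered (probe, target) pair preserves the orientation of the mismatch, which the data suggest may matter.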
Length of perfect match between probe and target determines risk of log2 signal ratio exceeding any given threshold
The strong association between the presence of SNP and positive log2 signal ratios suggested that model A in panel 1 described the dominant effect. If so, the signal ratio should be related to the length of perfect match. The histogram shows the number of probes with log2 signal ratios > 0.3 and < 0.3 for each length of perfect match between 30 and 59, for probes with one SNP. The clear reduction in the number of probes with log2 signal ratio > 0.3 the nearer the SNP is to the end of the probe suggests that length of perfect match is an important component of probe-target interaction.
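The length of perfect match for a given probe can be computed as the longest run of probe bases uninterrupted by a SNP. The sketch below is one way to do this, assuming 0-based SNP positions within the probe; the function name is hypothetical. For a 60-mer with a single SNP it gives values from 30 (SNP in the middle) to 59 (SNP at either end), matching the range shown in the histogram.

```python
def longest_perfect_match(probe_length, snp_positions):
    """Length of the longest stretch of probe bases not interrupted by a SNP.

    snp_positions are 0-based offsets of SNP within the probe.
    """
    longest = 0
    start = 0  # first base of the current SNP-free stretch
    for pos in sorted(snp_positions):
        longest = max(longest, pos - start)
        start = pos + 1
    # Stretch from the last SNP to the end of the probe.
    return max(longest, probe_length - start)


# Perfect-match length for every possible position of a single SNP in a 60-mer.
single_snp_lengths = [longest_perfect_match(60, [p]) for p in range(60)]
```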
Evidence for additional SNP not in the Perlegen set
Although there was a very large excess of SNP in the probes with high signal ratios, only about one sixth of the probes with high log2 signal ratios contained a SNP from the Perlegen dataset.
To discover whether some of the remaining probes might contain SNP not detected by Perlegen, the 500 bp region either side of each probe with a log2 signal ratio > 1 but no SNP was scanned for SNP. Significantly more of these probes had a SNP within 500 bp than did the same number of randomly chosen probes, suggesting that many of the probes that did not contain a published SNP did in fact contain a SNP or other genomic variation.
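The flank scan can be sketched as a lookup in a sorted list of SNP coordinates. This is a minimal illustration, assuming all coordinates lie on one chromosome, inclusive probe coordinates, and hypothetical function names and toy positions; it is not the pipeline used in the study. The same fraction would be computed for the high-ratio probes and for an equal-sized random set of probes, then compared.

```python
from bisect import bisect_left, bisect_right


def has_snp_within(snp_positions, probe_start, probe_end, flank=500):
    """True if any SNP lies in [probe_start - flank, probe_end + flank].

    snp_positions must be sorted in ascending order.
    """
    lo = bisect_left(snp_positions, probe_start - flank)
    hi = bisect_right(snp_positions, probe_end + flank)
    return hi > lo


def fraction_with_nearby_snp(snp_positions, probes, flank=500):
    """Fraction of (start, end) probes with at least one SNP within the flank."""
    hits = sum(has_snp_within(snp_positions, s, e, flank) for s, e in probes)
    return hits / len(probes)


# Toy coordinates only (not from the study).
toy_snps = [100, 2000, 5000]
toy_probes = [(500, 559), (2600, 2659), (5400, 5459)]
toy_fraction = fraction_with_nearby_snp(toy_snps, toy_probes)
```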
Possible additional small CNV
There were 1311 probes with log2 signal ratio < -1 in at least one strain and 519 in at least two strains. Since these are unlikely to have been caused by SNP, they may have been caused by small CNV in the test strains. BLAST searches showed that 32 of these probes were in regions that are multicopy in C57BL/6. Many of the remainder may be multicopy in the test strains but single copy in C57BL/6.
ACKNOWLEDGEMENTS: We thank Leanne Wardlesworth for technical assistance and Tara Hall of Agilent for support. These studies were funded by the Wellcome Trust.