Matching Multicopy Y-STR Markers In
Closely Related Individuals
Dipl.-Ing. Thomas Krahn
Matching multicopy Y-STR markers in closely related individuals – Multicopy Y-STR
markers are located on the symmetrical arms of palindromes. The DNA sequences on the
palindromic arms are nearly identical and therefore highly prone to intrachromosomal
recombination events. The talk demonstrates how these recombination mechanisms produce
apparent mismatches in closely related individuals, and it presents new tools that help
to understand the intrachromosomal rearrangements in the Yq11 palindromic region.
What's a palindrome?
Palindromic word phrases
RADAR
Madam, I'm Adam.
So many dynamos.
Cleveland DNA: Level C
Red rum, sir, is murder.
Too bad, I hid a boot.
Sat in a taxi, left Felix at Anita's
Emil asleep, Hannah peels a lime.
Anne, I stay a day at Sienna.
Was it a car or a cat I saw?
Max, I stay away at six a.m.
Go hang a salami! I'm a lasagna hog!
Yawn a more Roman way!
What's a palindrome?
Palindromic word phrases:
Forward = Backward
RADAR
Nucleic acids:
Forward = Reverse Complement (Forward)
AGCTTCTAGTCGACTAGAAGCT
Reverse Complement (AGCTTCTAGTCGACTAGAAGCT)
= AGCTTCTAGTCGACTAGAAGCT
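The definition above can be checked mechanically. The following is a minimal Python sketch; the function names `reverse_complement` and `is_palindrome` are illustrative and not part of the talk.

```python
# Complement table for the four DNA bases (upper case only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindrome(seq: str) -> bool:
    """A nucleic-acid palindrome equals its own reverse complement."""
    return seq == reverse_complement(seq)

print(is_palindrome("AGCTTCTAGTCGACTAGAAGCT"))  # the slide's example
print(is_palindrome("GATTACA"))                 # an arbitrary non-palindrome
```

Note that a word palindrome such as RADAR is tested by plain reversal, whereas a DNA palindrome needs the complement step as well.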
Biological Relevance of Palindromes
Restriction enzymes cut DNA at
palindromic recognition sequences
EcoRI recognition site (a palindrome; ^ marks the cut position):
5'-G^A A T T C-3'
3'-C T T A A^G-5'
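As an illustration of cutting at a palindromic recognition sequence, here is a small Python sketch that splits the top strand of a sequence at every EcoRI site. The function name `digest` and the test sequence are made up for this example, and only one strand is modelled.

```python
SITE = "GAATTC"   # EcoRI recognition sequence
CUT_OFFSET = 1    # EcoRI cuts between G and A (G^AATTC)

def digest(seq: str) -> list[str]:
    """Split the top strand of `seq` at every EcoRI site."""
    fragments = []
    start = 0
    pos = seq.find(SITE)
    while pos != -1:
        fragments.append(seq[start:pos + CUT_OFFSET])
        start = pos + CUT_OFFSET
        pos = seq.find(SITE, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments

print(digest("AAGAATTCTT"))
```

Because the site is a palindrome, the bottom strand carries the same recognition sequence and is cut at the mirrored position.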
Biological Relevance of Palindromes
Formation of hairpins
Single stranded DNA with partial palindromic sequence
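GAACTAGACTTAATGTGAGCTTATAAATTATAATGCTCTCACATTAAGTCTAGTTC
[Figure: hairpin structure. The stem is drawn as the 5' end GAACTAGACTTAATGTGAGC paired with CTTGATCTGAATTACACTCG (the 3' end, read 3' to 5'); the AT-rich middle of the sequence forms the unpaired loop.]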
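A hairpin stem can be found by pairing bases inward from both ends of a single strand. The sketch below is a simplified Python illustration (Watson-Crick pairs only, no mismatches or bulges); `stem_length` and the short test sequence are assumptions made for this example.

```python
# Watson-Crick partners for DNA.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def stem_length(seq: str) -> int:
    """Count how many bases at the 5' end pair with the 3' end, working inward."""
    i, j = 0, len(seq) - 1
    paired = 0
    while i < j and PAIRS.get(seq[i]) == seq[j]:
        paired += 1
        i += 1
        j -= 1
    return paired

toy = "GAACTTTTGTTC"            # hypothetical sequence: 4 bp stem, TTTT loop
n = stem_length(toy)
print(n, toy[n:len(toy) - n])   # stem length and the unpaired loop
```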
Biological Relevance of Palindromes
Hairpins regulate RNA translation
High temperature / low ionic concentration
RNA
Ribosome
Protein
Low temperature / high ionic concentration
RNA
No translation
No protein
Palindromes in the ds Human Genome
Chromosome
Double stranded DNA
No hydrogen bond interaction
Palindromes in the ds Human Genome
Chromosome
But unexpected things happen...
Hydrogen bonding between
palindromic arms
Possible base differences will be
repaired by the cell's mutation
repair system
recombinational loss of heterozygosity (recLOH)
Recombinational Loss Of Heterozygosity (recLOH)
A palindromic region is an inverted repeat: the DNA sequence forms a loop, and the two ends of
the loop, laid parallel next to each other, have nearly exactly the same DNA sequence. We can
model this as a simple structure:
LLL
L
AAAAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBBB
L
L
L
L
L
LLL
Recombinational Loss Of Heterozygosity (recLOH)
The sequences on the two palindrome arms evolve independently, so over time each arm acquires
mismatches (M), which may be STR mutations or SNPs. When the region folds into a hairpin
conformation, base pairing is no longer perfect and the cell tries to repair the apparent mutations.
LLL
L
AAAAAAAAMAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBMBBBBBBBB
L
L
L
L
L
LLL
Recombinational Loss Of Heterozygosity (recLOH)
Depending on the direction of the repair enzyme complex, some mutations get duplicated on the
other arm and some mutations disappear.
LLL
L
AAAAAAAAMAAAAAAAAAAAAAAAAAA
AAAABBBBBBBBBBBBBB
MBBBAAAAA
BBBBBBBBBBBBBBBBBBMBBBBBBBB
L
L
L
L
L
LLL
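The two outcomes described here (a mutation copied to the other arm, or a mutation erased) can be mimicked with a toy model. In this Python sketch each arm is a string aligned by position, "-" is the ancestral state and "M" a mutation; the function `gene_conversion` and the chosen repair windows are illustrative assumptions, not a description of the actual repair machinery.

```python
def gene_conversion(donor: str, acceptor: str, start: int, end: int) -> str:
    """Overwrite acceptor[start:end] with the donor arm's sequence."""
    return acceptor[:start] + donor[start:end] + acceptor[end:]

arm_a = "-" * 8 + "M" + "-" * 18   # mutation at position 8
arm_b = "-" * 18 + "M" + "-" * 8   # mutation at position 18

# Repair tract covering arm A's mutation: it is duplicated onto arm B.
duplicated = gene_conversion(arm_a, arm_b, 0, 12)
# Repair tract covering arm B's mutation: it is overwritten and disappears.
erased = gene_conversion(arm_a, arm_b, 14, 27)

print(duplicated)
print(erased)
```

Which arm serves as the template, and how far the repair tract extends, decides whether a given difference spreads or vanishes.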
Recombinational Loss Of Heterozygosity (recLOH)
DYS459, DYS724 (CDY) and DYS464 are on the same palindrome called P1
[Figure: symbolic map of palindrome P1, drawn from centromere to telomere. Marker labels on each arm: N.N., DYF397, DYF401, DYF387, DYS459, DYF385, DYS724, DYF371 (C-type), DYF408, DYS725. Further labels: DYF399 (T-type), DYS464 (C-type), DYF399 (C-type), DYS464 (C-type), and two 188 bp segments.]
Recombinational Loss Of Heterozygosity (recLOH)
In this example the haplotype has different alleles at all marker pairs
After the recombination event the heterozygosity is lost => recLOH
[Figure: the P1 map (centromere to telomere, same marker labels as before) annotated with example allele values: 10 and 9 at DYS459, 39 and 36 at DYS724, and 14 and 16 at a third marker pair; on the second arm the values shown are 10, 36 and 16.]
Typical recLOH Patterns In Genetic Genealogy
A classical DYS459, CDY, DYS464 recLOH
Haplotype  DYS459  DYS724 (CDY)  DYS464       Genetic distance?
1          9-10    37-38         14-14-16-18  -
2          9-10    37-38         14-15-16-18  1
3          9-10    37-38         14-14-16-18  0
4          9-10    37-39         14-14-16-18  1
5          9-10    37-38         14-14-16-18  0
6          10-10   38-38         14-14-18-18  1
7          9-9     37-37         14-16-18-18  1
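The distances in this example are consistent with counting a recLOH that spans several linked markers as one event. The Python sketch below encodes that idea; the heuristic `looks_like_recloh`, the scoring rule, and the choice to count any ordinary difference as 1 per marker are assumptions made for illustration, not an established scoring standard.

```python
def looks_like_recloh(ref, obs):
    """Heuristic: same copy number, different allele mix, and no new allele value."""
    return (len(ref) == len(obs)
            and sorted(ref) != sorted(obs)
            and set(obs) <= set(ref))

def genetic_distance(ref, obs, linked):
    """Count differing markers; recLOH-like changes on linked markers count once in total."""
    distance = 0
    recloh_seen = False
    for marker, ref_alleles in ref.items():
        obs_alleles = obs[marker]
        if sorted(ref_alleles) == sorted(obs_alleles):
            continue
        if marker in linked and looks_like_recloh(ref_alleles, obs_alleles):
            recloh_seen = True
        else:
            distance += 1  # assumption: one ordinary mutation per differing marker
    return distance + (1 if recloh_seen else 0)

P1_MARKERS = {"DYS459", "CDY", "DYS464"}  # all on palindrome P1 according to the talk

def hap(dys459, cdy, dys464):
    return {"DYS459": dys459, "CDY": cdy, "DYS464": dys464}

reference = hap([9, 10], [37, 38], [14, 14, 16, 18])
others = [
    hap([9, 10], [37, 38], [14, 15, 16, 18]),
    hap([9, 10], [37, 38], [14, 14, 16, 18]),
    hap([9, 10], [37, 39], [14, 14, 16, 18]),
    hap([9, 10], [37, 38], [14, 14, 16, 18]),
    hap([10, 10], [38, 38], [14, 14, 18, 18]),
    hap([9, 9], [37, 37], [14, 16, 18, 18]),
]
distances = [genetic_distance(reference, h, P1_MARKERS) for h in others]
print(distances)
```

Under a naive per-marker count the last two haplotypes would score 3 each, although a single recombination event can explain all three changes.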
Similar recLOH events happen at all palindromic multicopy markers
Haplotype  DYS385  YCAII
1          11-14   19-23
2          11-14   19-23
3          11-14   19-19
4          11-11   19-23
Source: Wikipedia / Public Domain United States government pictures
Highest Density of Palindromes in the
Human Genome
Liqing Zhang et al. "Patterns of Segmental Duplication in the Human Genome", Mol. Biol. Evol. 22(1):135–141. 2005
Symbolic Map of the Yq11
Palindromic Region (Version 2)
[Figure: symbolic map of the Yq11 palindromic region showing palindromes P8, P5, P4, P3, P2 and P1. Marker labels: DYS413 (twice), DYS390, YCAII (twice), DYF395 (twice), DYF371 (T-type and C-type), DYF408, DYF411, DYS385a*, DYS385b*, DYS452, DYS485, DYS392, DYS461, DYS448, DYF397, DYF399 (ins G, T-type; C-type), DYS464 (G-type and C-type), DYS725, DYF401, DYF387, DYS459, DYF385, DYS724, N.N., 188 bp. Not to scale! Source: NCBI Mapviewer (modified).]
UCSC Genome Browser on Human Mar. 2006 Assembly
DYS459
Allele-pair counts (rows: first allele; columns: second allele)
        6      7      8      9     10     11     12     13
 6      4      0      3      2      2      0      0      0
 7             2      9    174    120      0      0      0
 8                  404   6467   2540     16      0      0
 9                         7676  20908    300      4      1
10                                 554      9      0      0
11                                          0      0      0
12                                                 0      0
13                                                        0
Scale: logarithmic, 1 to 100000
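Allele-pair counts such as those shown for DYS459 can be tallied from a list of two-copy results and binned by order of magnitude for a logarithmic scale. The following Python sketch uses made-up input data; `pair_counts` and `log_bin` are hypothetical helper names.

```python
from collections import Counter

def pair_counts(results):
    """Tally two-copy marker results, treating 9-10 and 10-9 as the same pair."""
    return Counter(tuple(sorted(pair)) for pair in results)

def log_bin(count):
    """Power of ten of a count (1-9 -> 0, 10-99 -> 1, ...); None for empty cells."""
    return None if count == 0 else len(str(count)) - 1

# Hypothetical DYS459 results from a handful of haplotypes.
results = [(9, 10), (10, 9), (9, 9), (8, 10), (9, 10)]
counts = pair_counts(results)
for pair, n in sorted(counts.items()):
    print(pair, n, log_bin(n))
```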
YCAII
[Figure: YCAII allele-pair frequency matrix. Both axes list alleles 11 to 27; each cell gives the count for that allele pair, on a logarithmic scale from 1 to 100000.]
DYS385
[Figure: DYS385 allele-pair frequency matrix. Both axes list alleles 6 to 26; each cell gives the count for that allele pair, on a logarithmic scale from 1 to 100000.]
DYS724
[Figure: DYS724 allele-pair frequency matrix. Both axes list alleles 9, 26 and 28 to 45; each cell gives the count for that allele pair, on a logarithmic scale from 1 to 10000.]
DYS464 – A Complex Marker With (Usually) 4 Alleles
[Figure: symbolic map of palindromes P3, P2 and P1 up to the telomere, showing the DYS464 copies: one G-type copy and three C-type copies. Other labels: DYF397, DYS448, DYF399 (ins G, T-type; T-type; C-type), DYS725, N.N., DYF401, DYF387, DYS459, DYF385, DYS724, DYF371 (C-type), DYF408, 188 bp.]
DYS464 – A Complex Marker With (Usually) 4 Alleles
[Figure: P1/P2 alternative conformation. The same markers (P3 with DYF397; DYF399 ins G T-type, T-type and C-type; DYS464G and three further DYS464 copies; DYS725; DYF408; DYF401; DYF387; DYF385; DYS459; DYF371; DYS724; N.N.; 188 bp segments) drawn in a different pairing of the palindrome arms, ending at the telomere.]
DYS464 Extended PCR
Fluorescein
JOE
TAMRA
DYS464 Extended PCR
If the primer ending with G yields a peak, the allele is classed as G-type.
If the primer ending with C yields a peak, the allele is classed as C-type.
Primer                 Label  Sequence
DYS464-forward         FAM    ttacgagctttgggctatg (exactly like Redd paper)
DYS464-reverse-G-Type  JOE    cctgggtaacagagagactctttcag (ending with G)
DYS464-reverse-C-Type  TMR    cctgggtaacagagagactctttcac (ending with C)
PCR parameters: 30 x (90 °C, 65 °C, 70 °C) and 60 °C end-elongation for 1/2 h
Peaks with a green fluorescence (JOE) are called G-type.
Peaks with a yellow fluorescence (TMR) are called C-type.
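The calling rule (JOE peak means G-type, TMR peak means C-type) maps directly onto a lookup table. This Python sketch assumes the peaks have already been sized and converted to repeat counts, and that a double-height peak is listed twice; `call_dys464x` and the input format are hypothetical.

```python
# Dye label of the allele-specific reverse primer -> allele type suffix.
DYE_TO_TYPE = {"JOE": "g", "TMR": "c"}

def call_dys464x(peaks):
    """Turn (repeat_count, dye) peaks into a result string such as '14c-15c-17c-17g'."""
    calls = sorted((repeats, DYE_TO_TYPE[dye]) for repeats, dye in peaks)
    return "-".join(f"{repeats:g}{kind}" for repeats, kind in calls)

peaks = [(17, "JOE"), (15, "TMR"), (14, "TMR"), (17, "TMR")]
print(call_dys464x(peaks))
```

Sorting by repeat count and then by type letter places C-type before G-type alleles of equal length.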
DYS464 Extended PCR
Electropherogram
DYS464 Extended PCR
Electropherogram
DYS464 Extended PCR
Haplogroup I
No C-type alleles!
Typing of DYS464X
Y hgroup  DYS 464
A         11g-13g-13g-16g
E         14g-15.3g-17g-18g
E3b1      14g-15.3g-17g-18g
G         13g-14g-15g-15g
G2*       12g-12g-12g-13g
I         12g-14g-15g-16g
I1a       12g-14g-14g-16g
I1a3      12g-12g-14g-14g-15g-16g
I1b       11g-14g-14g-14g
I1b       11g-14g-14g-15g
I1b2a     11g-14g-14g-15g
I1b2a1    11g-11g-14g-15g
I1c       14g-15g-15g-16g
J2a1*     12g-13g-15g-16g-16g-16g
N         14g-14.3g
R1a1*     12g-15g-15g-16g
R1b       16c-16c-16g-16g
R1b       15c-16c
R1b       15c-15c-17c-17g
R1b       14c-16c-17c-17g
R1b       14c-15c-16g-17c
R1b       16c-16g
R1b       15c-15c-16c-16g
R1b       14c-15c-17c-17g
R1b       15c-16c-17g-17g
R1b       15c-16c
R1b       15c-15c-15c-15c
R1b       15c-15c-17c-17g
R1b       15c-17c-17c-18g
R1b       15c-15c-17c-18g
R1b       15c-15c-16g-17c
R1b       16c-16c-17c-17g
R1b       14c-15c-15c-15g
R1b       15c-15c-16c-18g
R1b       15c-15c-16c-17.1g
R1b       15c-15c-16g-17c
R1b       13c-15c-17c-17g
R1b       15c-15c-17g-17g
R1b       15c-16c-16c-18g
R1b       15c-15c-16c-17c
Other haplogroups have
only G-type alleles
R1b usually has 3 C-type
alleles and one G-type allele
Exceptions are most likely
products of recLOH
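The rule of thumb stated here (three C-type alleles plus one G-type allele in R1b) can be tested against a typed result. A small Python sketch, with hypothetical helper names `type_counts` and `fits_r1b_pattern`:

```python
def type_counts(result: str):
    """Return (number of C-type alleles, number of G-type alleles) in a DYS464X result."""
    alleles = result.split("-")
    c = sum(a.endswith("c") for a in alleles)
    g = sum(a.endswith("g") for a in alleles)
    return c, g

def fits_r1b_pattern(result: str) -> bool:
    """True if the result shows the usual R1b pattern of 3 C-types and 1 G-type."""
    return type_counts(result) == (3, 1)

for result in ("15c-15c-17c-17g", "11g-13g-13g-16g", "16c-16c-16g-16g"):
    print(result, type_counts(result), fits_r1b_pattern(result))
```

A result with C-type alleles in a different ratio, such as 2 and 2, would be one of the exceptions that the talk attributes to recLOH.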
What Is The Use Of The DYS464X Test?
Most R1b males that we have tested show 3 C-types and 1 G-type.
All other haplogroups (including R1a) show 4 G-types.
1.) The DYS464X method can verify whether a person is really R1b.
2.) In many cases we can determine whether a single large peak really represents two different alleles.
3.) Some DYS464 patterns look the same when the conventional test is used, but really consist of
completely different alleles. For example:
14c-15c-17c-17g is completely different from
14c-15g-17c-17c,
but this cannot be seen with the conventional typing method.
4.) DYS464X is a method that helps us observe recombinational loss of heterozygosity (recLOH).
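Point 3 can be made concrete by stripping the type letters, which is effectively what the conventional test reports. In this Python sketch the function name `conventional` is an illustrative choice.

```python
def conventional(result: str) -> str:
    """Drop the c/g/t type suffixes, leaving only the repeat counts."""
    return "-".join(allele.rstrip("cgt") for allele in result.split("-"))

a = "14c-15c-17c-17g"
b = "14c-15g-17c-17c"
print(conventional(a), conventional(b))   # identical without the type letters
print(a == b)                             # different once the types are known
```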
DYS464 Examples From An Arbitrary Surname Project:
(conventional results)
surname A, 1: 14-14-16-17
surname A, 2: 14-14-16-17
surname A, 3: 14-14-16-17
surname B, 1: 14-14-16-18
surname C, 1: 14-14-15-17
If all five persons are R1b, they can be typed with an extended DYS464 test.
We might get the following results:
surname A, 1: 14c-14c-16g-17c
surname A, 2: 14c-14c-16g-17c
surname A, 3: 14c-14c-16c-17g
surname B, 1: 14c-14c-16g-18c
surname C, 1: 14c-14c-15g-17c
From that result we could say:
- A1 and A2 match completely
- A3 has at least a genetic distance of 2 from (A1 & A2)
- B1 has a genetic distance of 1 from (A1 & A2)
- C1 has a genetic distance of 1 from (A1 & A2)
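The comparison above can be reproduced by treating each typed result as a multiset of alleles and counting the alleles that find no partner in the other result. This Python sketch gives a lower bound on the distance (it counts mismatched alleles, not repeat steps); `mismatched_alleles` is a hypothetical helper and assumes both results have the same number of alleles.

```python
from collections import Counter

def mismatched_alleles(a: str, b: str) -> int:
    """Number of typed alleles in `a` that have no identical partner in `b`."""
    unmatched = Counter(a.split("-")) - Counter(b.split("-"))
    return sum(unmatched.values())

typed = {
    "A1": "14c-14c-16g-17c",
    "A2": "14c-14c-16g-17c",
    "A3": "14c-14c-16c-17g",
    "B1": "14c-14c-16g-18c",
    "C1": "14c-14c-15g-17c",
}
for name in ("A2", "A3", "B1", "C1"):
    print(name, mismatched_alleles(typed["A1"], typed[name]))
```

With the conventional results, A3 would be indistinguishable from A1 and A2; the type letters reveal that two alleles differ.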
DYF399 - A Fast Moving,
Asymmetrical Palindromic Y-STR
[Figure: symbolic map of palindromes P3, P2 and P1 showing the three DYF399 copies (ins G, T-type; T-type; C-type) among the neighbouring markers DYF397, DYS448, DYS464 (G-type and C-type), DYS725, DYF401, DYF387, DYS459, DYF385, DYS724, DYF371 (C-type), DYF408, N.N. and the 188 bp segments. Annotations: ins G, poly A, C-type and T-type reverse primers.]
DYS425 / DYF371
DYS425 = DYF371 T-type allele
The T-type SNP can get lost by a recLOH
This is seen as a “NULL-Allele” if only DYS425 is tested
[Figure: symbolic map of the Yq11 palindromic region (P8, P5, P4, P3, P2, P1) including the DYF371 T-type and C-type copies, with the same marker labels as in the Version 2 map.]
DYS425 / DYF371
The HUGO sequence also has a null allele at DYS425:
10c-10c-13c-14c
Normally in R1b (and most other haplogroups):
10c-12t-13c-14c
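Since DYS425 corresponds to the T-type copy of DYF371, the DYS425 value can be read off a typed DYF371 result, and a missing T-type copy shows up as a null allele. A minimal Python sketch, with `dys425_from_dyf371` as a hypothetical helper name:

```python
def dys425_from_dyf371(result: str):
    """Return the repeat count of the T-type DYF371 copy, or None for a DYS425 null allele."""
    t_alleles = [allele[:-1] for allele in result.split("-") if allele.endswith("t")]
    return t_alleles[0] if t_alleles else None

print(dys425_from_dyf371("10c-12t-13c-14c"))  # usual case: DYS425 is reported
print(dys425_from_dyf371("10c-10c-13c-14c"))  # T-type lost: DYS425 appears as null
```

In the second result the copy that used to carry the T-type SNP now matches a C-type copy, which is what a recLOH would produce.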
How Does A Deletion Arise?
Symmetry in the red/red (P1/P2) region allows
another irregular conformation:
[Figure: circle conformation. The P1/P2 region folds so that the DYS464, DYS725, DYF399, DYF408 and 188 bp segments pair up, with a recombination breakpoint marked; the loop containing DYF397, DYF371, DYS724, DYF385, DYF387 and DYF401 forms a circle. P3 (DYF397; DYF399 ins G, T-type) and N.N. are also labelled.]
How Does A Deletion Arise?
The circular DNA molecule can't replicate on
its own and gets lost in the next cell cycle
[Figure: the same region after the recombination, with the excised circle marked "Deletion". Labels on the circle: DYF397, DYF371, DYS724, DYF385, DYS459, DYF387, DYF401. Labels on the remaining chromosome include P3 (DYF397; DYF399 ins G, T-type), DYS464, DYS464G, DYS725, DYF399, DYF408, N.N. and the 188 bp segments.]
Famous People with a P1/P2 Deletion
Symbolic Map of the Yq11
Palindromic Region (Version 2)
[Figure: the symbolic map of the Yq11 palindromic region (palindromes P8, P5, P4, P3, P2 and P1 with their multicopy markers), shown again. Not to scale!]