DNA LiRa 2.0
R. Puch-Solis, M. Barron, R. Young, H. Tazeem
T. Clayton, J. Thomson
9th November 2016
The Team
R. Puch-Solis: Statistician
M. Barron, R. Young, H. Tazeem: Software Developers
T. Clayton: DNA Interpretation Lead
J. Thomson: DNA Technical Lead
T. Graversen: Statistician (Collaborator)
Overview
1. Statistical Evaluation Requirement
2. Software architecture
3. Bayesian Networks
4. Gamma models
5. Preliminary result
6. Further work
7. Conclusion
Requirement
1. Up to four-person mixtures
2. Low level DNA
3. Multiple profiles (replicates)
4. Dropin
5. Dropout
6. Stutters
7. About 100 users across several sites
8. About ten thousand cases per year
9. LRs and Mixture Deconvolution
Software Architecture
[Diagram: webpage, profiles database, scalable calculation servers, calculations database]
Accessible from any LGC site and externally through VPN
Stutters
Allele
Back Stutter: one STR less
Double Back Stutter: two STR less
Forward Stutter: one STR more
Stutter proportions (peaks at 15, 16, 17 and 18):
Locus   Double Back Stutter   Back Stutter   Forward Stutter
D3      0.007                 0.083          0.006
D22     0.009                 0.107          0.057
Back Stutter Proportion = Back Stutter Height / (Allele Height + All Stutter Heights)
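The ratio above can be sketched in a few lines of Python. This is only an illustration of the definition: the function name and the peak heights are hypothetical and are not taken from LiRa.

```python
def back_stutter_proportion(allele_height, back, double_back=0.0, forward=0.0):
    """Back stutter height over allele height plus all stutter heights."""
    total = allele_height + back + double_back + forward
    if total <= 0:
        raise ValueError("total peak height must be positive")
    return back / total


# Hypothetical peak heights in RFU, for illustration only.
proportion = back_stutter_proportion(
    allele_height=1000.0, back=90.0, double_back=8.0, forward=7.0
)
print(round(proportion, 4))
```

The denominator includes every stutter peak, so the proportion is slightly smaller than the plain stutter-to-allele height ratio.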
Stutters – D22
Dropin peaks can be at the same height as stutters
Main task
Pr(Stain Profile | Putative Genotypes, Mixing Proportion, Other Parameters)
= Pr(Stain Profile | g1, g2, ω, parameters)
Example putative genotypes: g1 = 29,31 and g2 = 29,32.2
Gamma Dist. per Peak
The probability density of a stain profile is obtained by multiplying together the length
of each red line and the area of the blue triangle
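A minimal sketch of this product, under an assumed reading of the figure: each red line is the gamma density at an observed peak height, and the blue area is the probability that an expected peak falls below the detection threshold. The function names, parameter values and threshold are hypothetical.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma probability density at x."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def gamma_cdf(x, shape, scale, terms=500):
    """Gamma cumulative probability, using the series for the lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, terms):
        term *= z / (shape + n)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def profile_density(observed, dropped_out, threshold):
    """Multiply densities of observed peaks by probabilities of unseen peaks.

    observed: (height, shape, scale) for each peak above the threshold.
    dropped_out: (shape, scale) for each expected peak that was not seen.
    """
    density = 1.0
    for height, shape, scale in observed:
        density *= gamma_pdf(height, shape, scale)
    for shape, scale in dropped_out:
        density *= gamma_cdf(threshold, shape, scale)
    return density


# Hypothetical values: two observed peaks and one peak below a 30 RFU threshold.
print(profile_density([(850.0, 20.0, 40.0), (210.0, 5.0, 40.0)], [(1.0, 40.0)], 30.0))
```

The series used in `gamma_cdf` converges for any positive argument, but it needs more terms as the argument grows relative to the scale.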
Extending the methodology
Question: model all stutters while using this methodology
Two issues:
1. Number of Parameters
2. Size of Conditional Probability Tables
[Diagram: Bayesian network linking the genotypes of Contributor 1 and the genotypes of Contributor 2 to the peak heights]
Size of Conditional Probability Tables
O3 conditional probability table is of size 3^6 × 2 = 1,458
Parents of O3: n_{1,3}, n_{1,4}, n_{2,3}, n_{2,4}, n_{3,3}, n_{3,4}
Size of Conditional Probability Tables
O3 conditional probability table is of size 3^12 × 2 = 1,062,882
Parents of O3: n_{1,2}, n_{1,3}, n_{1,4}, n_{1,5}, n_{2,2}, n_{2,3}, n_{2,4}, n_{2,5}, n_{3,2}, n_{3,3}, n_{3,4}, n_{3,5}
Clique tables in a junction tree become too large!
Go back to the standard method: list genotypes
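The growth in table size can be checked with a short calculation. The sketch assumes each parent node takes three values and O3 takes two, which is what the products on the slides imply; the function name is hypothetical.

```python
def cpt_size(num_parents, parent_states=3, child_states=2):
    """Number of entries in a conditional probability table."""
    return parent_states ** num_parents * child_states


# Six parents, then twelve parents once every stutter type is modelled.
print(cpt_size(6))
print(cpt_size(12))
```

Doubling the number of parents from six to twelve multiplies the table size by 3^6 = 729.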
Number of parameters
Height of an Allele: Gamma Dist. with a Shape Parameter and a Scale Par.
Stutter Proportion
To extend the methodology: seven parameters per locus l, in one group of three and one group of four, for l ∈ {1, 2, ..., 16}
It is not possible to estimate them from stain profiles (112 parameters)
Estimate from profiles of known origin (dilution series)
Following Puch-Solis et al. (2013)
Evaluating forensic DNA profiles using peak heights, allowing for
multiple donors, allelic dropout and stutters
Estimating Parameters
[Plot: Back Stutter Height against DNA Qty Proxy for D3, with 95% and 99% probability intervals]
Mean back stutter height proportional to the DNA Qty Proxy
Gamma regression through the origin
All stutters and alleles
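A minimal sketch of a gamma regression through the origin, under assumptions that are not stated on the slide: the mean height is slope × quantity and all observations share one shape parameter. Under those assumptions the maximum likelihood estimate of the slope is the mean of the height-to-quantity ratios. The shape estimate below is a simple method-of-moments choice, and all names and data are hypothetical.

```python
from statistics import mean, pvariance


def fit_gamma_through_origin(quantities, heights):
    """Fit mean height = slope * quantity with a common gamma shape.

    Returns (slope, shape). The slope is the mean of the ratios; the shape is
    the reciprocal of the variance of the ratios after scaling them to mean 1.
    """
    ratios = [h / q for q, h in zip(quantities, heights)]
    slope = mean(ratios)
    scaled = [r / slope for r in ratios]
    variance = pvariance(scaled)
    shape = 1.0 / variance if variance > 0 else float("inf")
    return slope, shape


# Hypothetical dilution-series data: DNA quantity proxy and back stutter height.
slope, shape = fit_gamma_through_origin([100.0, 200.0, 500.0], [9.0, 14.0, 41.0])
print(slope, shape)
```

With a fitted slope and shape, probability intervals at any quantity follow from the gamma quantiles with mean slope × quantity.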
Preliminary Result
Promega ESI 17 Profile (16 loci + Amelogenin)
Target Mixing Proportion: (0.8,0.2)
Estimated: (0.805,0.195)
Correct Genotype Pair
In position 1 in 11 loci
In position 2 in 4 loci
In position 3 in 1 locus
The overall testing will contain hundreds of single profiles and two-, three-
and four-person mixtures.
However ...
Mixture of Distributions
Allele Dependence
[Plot: Back Stutter Height against DNA Qty Proxy for D2S1, Allele 16 and Allele 25]
Back Stutter Proportion
Allele Effect
[Plot: back stutter proportion against allele, D2S1]
Back Stutter Proportion
Motif Effect
[Plot: back stutter proportion against length of LUS, D2S1]
LUS: Longest Uninterrupted Sequence
Brookes et al. (2012)
Bright, Curran, Buckleton (2014).
Modelling PowerPlex Y stutters and Artifacts
Motifs
D2S1
Allele  Motif                           LLUS
17      [TGCC]4[TTCC]13                 13
17      [TGCC]5[TTCC]12                 12
17      [TGCC]6[TTCC]11                 11
20      [TGCC]6[TTCC]14                 14
20      [TGCC]7[TCCC][TTCC]12           12
20      [TGCC]7[TTCC]10[GTCC][TTCC]2    10
20      [TGCC]7[TTCC]13                 13
20      [TGCC]7[TTCC]2[TTTC][TTCC]10    10
20      [TGCC]8[TTCC]12                 12
23      [TGCC]6[TTCC]14[GTCC][TTCC]2    14
23      [TGCC]7[TTCC]13[GTCC][TTCC]2    13
23      [TGCC]7[TTCC]16                 16
23      [TGCC]9[TTCC]14                 14
Source: Gettings et al. (2016), STRBase
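The allele number and the length of the LUS can be read off a bracketed motif string, as in this short sketch. It assumes that every block is a whole repeat unit, that a block without a number counts once, and that adjacent blocks never share the same unit. It does not handle partial repeats such as 32.2. The function names are hypothetical.

```python
import re

# One bracketed repeat unit followed by an optional repeat count.
REPEAT = re.compile(r"\[([ACGT]+)\](\d*)")


def parse_motif(motif):
    """Return (unit, count) pairs; a missing count means one repeat."""
    return [(unit, int(count) if count else 1)
            for unit, count in REPEAT.findall(motif)]


def allele_number(motif):
    """Total number of repeat units across all blocks."""
    return sum(count for _, count in parse_motif(motif))


def llus(motif):
    """Length of the longest uninterrupted run of one repeat unit."""
    return max(count for _, count in parse_motif(motif))


motif = "[TGCC]7[TTCC]10[GTCC][TTCC]2"
print(allele_number(motif), llus(motif))
```

Two sequences with the same allele number can therefore have different LLUS values, which is why a single allele can carry more than one stutter proportion.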
[Plot: Back Stutter Height against DNA Qty Proxy for D3, Allele 15, with 95% and 99% probability intervals]
Based on Estimation of Stut. Prop. as a function of LLUS
Higher coefficient of variation at low levels
Mixture of Gammas
If an allele has two motifs with different length of LUS,
the pdf of a stutter height h_s is:
f(h_s) = p_1 f(h_s | parameters of motif 1) + p_2 f(h_s | parameters of motif 2)
p_1: prevalence of motif 1 in a population
p_2: prevalence of motif 2 in a population
f(h_s | parameters of motif 1): gamma pdf for motif 1
f(h_s | parameters of motif 2): gamma pdf for motif 2
The sum of two mixture distributions is again a mixture distribution
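A minimal sketch of a two-component gamma mixture for a stutter height. It assumes, purely for illustration, that the motifs differ in the gamma shape and share a scale; the prevalences and parameter values are hypothetical.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma probability density at x."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def stutter_mixture_pdf(height, prevalences, shapes, scale):
    """Prevalence-weighted sum of one gamma pdf per motif."""
    if abs(sum(prevalences) - 1.0) > 1e-9:
        raise ValueError("motif prevalences must sum to 1")
    return sum(p * gamma_pdf(height, shape, scale)
               for p, shape in zip(prevalences, shapes))


# Hypothetical values: motif 1 in 30% of the population, motif 2 in 70%.
print(stutter_mixture_pdf(60.0, [0.3, 0.7], [4.0, 6.0], 10.0))
```

When one prevalence is 1, the mixture reduces to the single gamma pdf of that motif.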
Dropin
[Plot: pdf of dropin heights, extraction negative profiles]
Dropin
Stain Profile
Putative donor is 10,11
Peak 13 is explained as a dropin
Probability density of the stain profile is
multiplied by:
1. Probability of a dropin (about 0.02)
2. Probability that the dropin is allele 13
3. Probability density of the dropin height
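The three factors above can be combined as in this sketch. Only the dropin probability of about 0.02 comes from the slide; the allele probability, the height density, the profile density and the function names are hypothetical placeholders.

```python
def dropin_factor(p_dropin, p_allele, height_density):
    """Product of the three dropin terms."""
    return p_dropin * p_allele * height_density


def density_with_dropin(profile_density, p_dropin, p_allele, height_density):
    """Profile density after explaining one extra peak as a dropin."""
    return profile_density * dropin_factor(p_dropin, p_allele, height_density)


# Hypothetical inputs for peak 13 when the putative donor is 10,11.
print(density_with_dropin(profile_density=1e-6, p_dropin=0.02,
                          p_allele=0.1, height_density=0.01))
```

Each additional peak explained as a dropin contributes one more factor of this kind.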
Conclusions
1. Model is currently being implemented in LiRa 2.0
2. Method also includes uncertain peaks (other artefacts)
3. Extensive validation using hundreds of mixtures of known origin
4. It is a collage of several models. Paper in preparation.
Thank you for your kind attention.
References
Gettings et al. (2016). Sequence variation of 22 autosomal STR loci detected by next generation sequencing. Forensic Sci Int Genet. 21.
Graversen & Lauritzen (2015). Computational Aspects of DNA Mixture Analysis – Exact Inference Using Auxiliary Variables in a Bayesian Network. Stat Comput. 25, pp 527-541.
Puch-Solis (2014). A dropin peak height model. Forensic Sci Int Genet. 11, pp 80-84.
Brookes et al. (2012). Characterising stutter in forensic STR multiplexes. Forensic Sci Int Genet. 6, pp 58-63.
Bright et al. (2014). Modelling PowerPlex Y Stutters and Artifacts. Forensic Sci Int Genet. 11, pp 126-136.
Puch-Solis et al. (2013). Evaluating forensic DNA profiles using peak heights, allowing for multiple donors, allelic dropout and stutters. Forensic Sci Int Genet. 7, pp 555-563.