- Nottingham ePrints

Edwards, Victoria C. and McClure, Patrick and Brown,
Richard J.P. and Thompson, Emma and Irving, William
L. and Ball, Jonathan K. (2014) Use of short-tandem
repeat (STR) fingerprinting to validate sample origins in
hepatitis C virus molecular epidemiology studies.
Journal of General Virology, 95 (1). pp. 66-70. ISSN
1465-2099
Access from the University of Nottingham repository:
http://eprints.nottingham.ac.uk/2390/1/STR_Fingerprinting.pdf
Copyright and reuse:
The Nottingham ePrints service makes this work by researchers of the University of
Nottingham available open access under the following conditions.
This article is made available under the Creative Commons Attribution licence and may be
reused according to the conditions of the licence. For more details see:
http://creativecommons.org/licenses/by/2.5/
A note on versions:
The version presented here may differ from the published version or from the version of
record. If you wish to cite this item you are advised to consult the publisher’s version. Please
see the repository url above for details on accessing the published version and note that
access may require a subscription.
For more information, please contact [email protected]
Journal of General Virology (2014), 95, 66–70
Short
Communication
DOI 10.1099/vir.0.057828-0
Use of short tandem repeat fingerprinting to
validate sample origins in hepatitis C virus
molecular epidemiology studies
Victoria C. Edwards,1,2 C. Patrick McClure,1,2 Richard J. P. Brown,1,2
Emma Thompson,3 William L. Irving1,2 and Jonathan K. Ball1,2
Correspondence
Jonathan K. Ball
[email protected]
1
School of Molecular Medical Sciences, University of Nottingham, Queen’s Medical Centre,
Nottingham, UK
2
Nottingham Digestive Diseases Centre Biomedical Research Unit, University of Nottingham,
Queen’s Medical Centre, Nottingham, UK
3
MRC-University of Glasgow Centre for Virus Research, Garscube Campus, Glasgow, UK
Received 31 July 2013
Accepted 4 October 2013
Sequence analysis is used to define the molecular epidemiology and evolution of the hepatitis C
virus. Whilst most studies have shown that individual patients harbour viruses that are derived
from a limited number of highly related strains, some recent reports have shown that some
patients can be co-infected with very distinct variants whose frequency can fluctuate greatly.
Whilst co-infection with highly divergent strains is possible, an alternative explanation is that such
data represent contamination or sample mix-up. In this study, we have shown that DNA
fingerprinting techniques can accurately assess sample provenance and differentiate between
samples that are truly exhibiting mixed infection from those that harbour distinct virus populations
due to sample mix-up. We have argued that this approach should be adopted routinely in virus
sequence analyses to validate sample provenance.
Due to the largely asymptomatic nature of acute hepatitis C
virus (HCV) infection, identifying individuals during this
phase of the disease is problematic. As a result, studies of
cohorts of such individuals are invaluable to improve our
understanding of virus evolution and its impact on
infection outcome. Most studies have shown that infection
in humans and animal models is established by a limited
number of highly related founder viruses (Brown et al.,
2012; Bull et al., 2011; Li et al., 2012). Post-transmission
there is a genetic bottleneck characterized by outgrowth of
a selected variant (Bull et al., 2011; Wang et al., 2010).
However, at least one study has shown that individuals
presenting with acute infections demonstrating large
fluctuations in viral load are associated with infection with
multiple genetically distinct strains (Smith et al., 2010).
Whilst such a dynamic flux of viral variants could be due to
simultaneous and/or rapid reinfection by distinct viral
variants, it is also possible that this phenomenon could be
due to contamination or sample mix-up. Given the
importance of the studies of virus evolution in early
infection and the need to ensure sample provenance in
such studies, we assessed whether short tandem repeat
(STR) fingerprinting could be used to define the likely
origins of serum samples from two cohorts: one set of
One supplementary table is available with the online version of this
paper.
66
samples from a cohort of HCV/human immunodeficiency
virus-infected men and the other from a cohort of Egyptian
healthcare workers from Egypt for whom sample mix-up
was suspected.
The Egyptian study cohort consisted of 32 subjects
reported to be suffering from acute HCV infection.
Sequential samples were available and these were reported
to have been collected over a 300-day period spanning the
acute phase of infection, including the antibody-negative/
RNA-positive window period. Individual subjects were
designated a three-letter ID and sequential samples
numbered chronologically. A second, smaller, cohort
consisted of two patients (designated UK 1 and UK 2),
each suspected of harbouring distinct genotypes of HCV at
different time points during acute infection. Two sequential samples (designated ‘a’ and ‘b’) taken 1 month apart
were available for each patient.
Nucleic acids (RNA and DNA) were extracted from study
samples and control samples using a QIAamp MinElute
Virus Spin Kit (Qiagen). For amplification of the 59 noncoding region (NCR), cDNA was generated with random
hexamers and 200 U Moloney murine leukaemia virus
(MMLV) reverse transcriptase (Fermentas) according to
the manufacturer’s instructions. The viral load of the study
samples was determined by quantitative PCR (qPCR) of
the 59 NCR using a gene-specific primer and Scorpion
057828 G 2014 SGM Printed in Great Britain
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/).
DNA fingerprinting
probe. Input cDNA was quantified on an Mx4000 Multiplex
Quantitative PCR System (Agilent Technologies) alongside
standard controls and results converted to genome copies
per millilitre of serum.
For amplification of the first hypervariable region (HVR1)
of the HCV E2 glycoprotein, cDNA was generated from
control samples using the genotype 4-specific primer
OAS4M (59-CAC CAG CGG CTG AAG CAG CAT TGA39) or the genotype 1-specific primer OAS1a (59- GGG
ATG CTG CAT TGA GTA-39) with 15 U ThermoScript
reverse transcriptase (Invitrogen) and 8.5 ml RNA according to the manufacturer’s instructions. For the study
samples, cDNA was generated with random hexamers and
200 U MMLV reverse transcriptase according to the
manufacturer’s instructions. A 270 bp fragment corresponding to HVR1 of E2 and the E1 and E2 flanking
regions was amplified in a nested PCR using genotypespecific primers. For genotype 1: first round, EOS (59-GGA
CGG GGT AAA CTA TGC AAC AGG-39) and OAS1a;
second round, 170gt1 (59-CAC CAT GGG TTG CTC TTT
CTC TAT C-39) and IASGT1 (59-TTA CGC CTC CGC
TTG GGA TAT GAG TAA CAT CAT-39). For genotype 4:
first round, EOS and E10A (59-TCA TTG CAG TTC AGG
GCA GTC CTG TTG ATG-39); second round, EIIS_MOD
(59-TGG GAT ATG ATG ATG AAC TGG-39) and EIIA
(59-CTG TTG ATG TGC CAG CTG CCA-39).
PCR-positive samples were purified and sequenced. All
available sequences were aligned using MEGA4 software
(Tamura et al., 2007) and the evolutionary relationship
inferred using the neighbour-joining method (Saitou &
Nei, 1987).
STR analysis was carried out on serum-extracted nucleic
acid samples using three separate loci. Each STR was
amplified in a separate PCR from 5 ml RNA using 0.3 U
HotStarTaq DNA polymerase (Qiagen) and gene-specific
primers: TH01 F and TH01 R, vWA F and vWA R, and
D21S11 F and D21S11 R (Opel et al., 2007), according to the
manufacturer’s instructions. The sense primer in each pair
was conjugated to a different fluorophore. PCR products
were mixed and run alongside the GeneScan 500 LIZ Size
Standard (Applied Biosystems) on an ABI Prism 3130
fluorescent DNA analyser (PerkinElmer). Peak traces were
analysed using Peak Scanner 1.0 software (Applied
Biosystems) to determine the size (in base pairs) of the
allele(s) at each locus within all samples. Cluster analysis of
STR allele sizes was carried out using GeneMarker v2.2.2
software (Softgenetics) to generate a distance matrix based
on the Euclidean distance between single samples.
The Egyptian study cohort and UK cohort were sampled
longitudinally during early acute infection. Several patients
in the Egyptian study cohort were found to exhibit ‘yo-yo’
viral load dynamics (Table S1, available in JGV Online),
which has been linked to fluctuations in the complexity of
the virus population (Smith et al., 2010). Phylogenetic
analysis of HVR1 sequences isolated from the study cohort
alongside reference genotype sequences showed that the
http://vir.sgmjournals.org
majority of patients were infected with a genotype 4 virus
(Fig. 1a). A small number of viral sequences clustered most
closely to genotype 1 reference strains. HCV genotype 4 is
the most prevalent genotype within the Egyptian population
(Dusheiko et al., 1994; Ramia & Eid-Fares, 2006). Patient
UK 2 was co-infected with genotype 1a and genotype 4d
viruses, as had been shown previously (unpublished).
However, patient UK 1 contained only genotype 1 viral
sequences (Fig. 1a), although previous analyses of these
samples showed co-infection with genotype 1a and genotype
4d viruses (unpublished).
HVR1 sequences derived from the Egyptian study and UK
cohort samples were subjected to further phylogenetic
analysis to determine epidemiological relationships (Fig.
1b). In the absence of mixed infection, viral sequences that
are epidemiologically linked, i.e. derived from a single
patient during the acute phase of infection, would be
expected to cluster closely on a phylogenetic tree.
Surprisingly, the majority of Egyptian patient samples
were distributed throughout the tree. Sequential samples
from only two of the patients, HUO and XNZ, harboured
HVR1 sequences that clustered on the tree (indicated by an
asterisk, Fig. 1b). Similarly, virus sequences derived from
the UK samples UK 1a and UK 1b, as well as samples UK
2a and UK 2b, did not cluster (Fig. 1b). In a recent study,
mixed acute infections were described in which multiple
infections with up to three subtypes was observed. These
infections were also characterized by highly divergent
clades within a single subtype and multiple switches in the
dominant variant present at a specific time point (Smith
et al., 2010). Therefore, to ascertain whether our data could
be explained by mixed acute infection we further
characterized the samples using STR analysis of the
genomic DNA present in the sera.
Three discrete STR loci were analysed to define relatedness.
This technique is employed commonly in DNA fingerprint
analysis for forensic purposes (Butler, 2006; Butler et al.,
2003) and the loci selected show good discriminatory
power in an Egyptian population (Ahmed et al., 2001;
Klintschar et al., 1999). Samples derived from the same
individual should have identical patterns of STR locus size
and therefore cluster together; however, this was not true
of this cohort (Fig. 2). The majority of study samples did
not cluster according to patient ID and therefore were not
derived from a single individual. Importantly, the UK
cohort samples did cluster based on STR analysis,
demonstrating that each pair of samples was derived from
a single patient and confirming the existence of mixed
infection. Interestingly, the two Egyptian study samples in
which virus clustered closely (HUO and XNZ, Fig. 1b) also
clustered based on STR analysis (Fig. 2), confirming the
epidemiological relatedness of these samples.
Further analysis of clustering data (Figs 1a and 2) identified
some common groupings in both trees. These included
cluster 1 (ASP3, FPK2), cluster 2 (CZR1, IKW2), cluster 3
(AMG3, MHL2) and cluster 4 (CZR2, FZG2). The
67
V. C. Edwards and others
(a)
(b)
ASP 1
AWM 3
0.1
UK 2b
QAF
Z1
G2
3
)
)
17 )
3 89 63 362
2 7
G
FZ a (D 17 96
3 a (D (M
3
1b
79
85
97
99 AMG 3
MHL 2
VHI 1
CLH
3
HD
SAH P 1
1
CZ
R1
CL
R
7
R
85 1 H P H 2 Z 1
UO CQ
1 2
MZ
AM
1b
(D
(U 10
01 93
RR 214 4)
RF Z 2 )
E
AXJ 3 79
HDP 2
2
FPK
XNZ 3 1
XNZ 1
98
JZB 2
UK 1a
1a (M62321)
UK 1b 2a
UK ) 99
2
72
519
U15 RP 1
E
(
3
S
1a
B
JZ )
0
87 2)
9
3
8
43
J
(F
62 5)
4I
J4 243
F
(
6
4g FJ4
(
b
4
2
O
HU K 2 SP 3
FP A W 2
IK 1
X
DG 3
IKW 1
IKW
1
MTK
AMG 1
MHL 3
4n (FJ462441)
4d (EU392172)
97
4k
UK 2b
4d (FJ
IDF
46243
3
7)
99 AXJ 1
95 SAH
C 2
FZ ZR 2
G
DG ID 2
X F1
2
1)
43 9)
62 86
9
J4
(F J83 )
4p t (F 434
4 62 )
6
4
)
(FJ 43
3
2
4q 46
498
Z2
(FJ
MZ (AY73
82)
4c
2b
349
AY7
2b (
80
4k
(E
(E U3
U3 92
9 1
4k
(F 217 73)
4k
J
(EU 462 1)
4
3
4r 921 38)
(FJ
69
4
)
4f (
EU 6243
392
9
174 )
AWM )
ASP 3 77
1
F
4m (FJ46 ZG 1
2433)
4o (FJ462440)
5 (NC009826)
2)
6b (D8426
83)
0
2
1
6a (Y
1b
0.1
FZG 1
IDF 3
RRZ 1
CLH 2
DGX 1
IKW 3
IKW 1
MTK 1
AXJ 1
SAH 2
AMG 1
DGX 2
HUO 1
HUO 2 *
PCQ 2
MHL 3
ASP 3
1
FPK 2
CZR 1
2
IKW 2
SAH 1
HDP 1
CLH 3
VHI 1
AMG 3
MHL 2 3
MZZ 2
IDF 1
CZR 2
4
FZG 2
UK 1a
JZB 3
SRP 1
UK 2a
UK 1b
FPK 1
HDP 2
JZB 2
XNZ 1
XNZ 3 *
AXJ 2
RFE 3
RRZ 2
FZG 3
AMG 2
MZZ 1
QAF 3
Fig. 1. Genotyping viral isolates based on the 270 bp region encompassing HVR1 of E2 and the E1 and E2 flanking regions.
Evolutionary history was inferred using the neighbour-joining method and evolutionary distances computed using the maximumcomposite-likelihood method. Percentage bootstrap support from 500 replicates is shown (only values greater than 70 % are
shown). (a) Genotype was assessed by alignment to reference genotype sequences, highlighted by coloured circles: green,
genotype 1; yellow, genotype 2; pink, genotype 3; dark blue, genotype 4; light blue, genotype 5; red, genotype 6. Samples UK
1a, UK 1b, UK 2a and UK 2b are included. Study samples are listed by their three-letter/single-digit ID code. GenBank
accession numbers are given in parentheses. (b) Cluster analysis of patient-derived viral isolates and control samples based on
HVR1 sequence alignment. Study samples are highlighted with coloured shapes according to patient ID to aid the identification
of patient clusters. Patients for whom only one sample contained nucleic acid have been left blank. UK samples are highlighted
by unfilled squares (patient UK 1) or circles (patient UK 2). Samples that match according to patient ID are highlighted by an
asterisk. Clusters of interest are numbered 1–4.
clustering of these samples based on HVR1 sequences and
STR loci suggests that they were linked epidemiologically,
i.e. derived from a common source; however, it would be
necessary to analyse more STR loci in order to confirm that
samples were obtained from a single individual. The use of
three STR loci is sufficient to determine that two or more
samples are unrelated, but greater discriminating power
would be required to definitively prove a relationship
between samples.
68
This analysis was concerned with identifying the relatedness of sequential samples and did not look specifically at
within-sample contamination. However, previous analyses
using these STR loci have shown that as little as 100 pg
(~30 copies) template DNA can be amplified using this
method (Opel et al., 2007). Furthermore, mixing experiments have demonstrated that a minor contaminating
DNA template, present at only 5–10 % of the major DNA
template, can be detected and distinguished successfully
Journal of General Virology 95
DNA fingerprinting
Euclidean
distance
0.6
0.5
0.4
0.3
0.2
0.1
0.0
UK 2a
UK 2b *
HUO 1
HUO 2 *
PCQ 2
ASP 3 1
FPK 2
CZR 1
2
IKW 2
CZR 2
4
FZG 2
IDF 2
FPK 1
FZG 1
IKW 1
JZB 2
QAF 3
RFE 3
RRZ 2
HDP 2
AMG 1
MZZ 2
DGX 1
JZB 3
XNZ 1
XNZ 3 *
AMG 2
AMG 3 3
MHL 2
VHI 3
SAH 3
UK 1a
UK 1b *
MHL 3
MHL 1
MTK 1
AWM 3
MZZ 1
CLH 2
RRZ 1
DGX 2
HDP 1
SAH 1
AXJ 2
IKW 3
AXJ 3
SRP 1
FZG 3
IDF 3
CLH 3
Fig. 2. Genotyping patient serum samples using cluster analysis of STR loci. Three STR loci were amplified from serumextracted DNA samples and STR size determined by examination of peak traces on Peak Scanner 1.0 software. Cluster analysis
was carried out using GeneMarker v2.4.0 software to generate a distance matrix based on the Euclidean distance between
single samples. Study samples are highlighted with coloured shapes according to patient ID to aid the identification of patient
clusters. Patients for whom only one sample contained nucleic acid have been left blank. UK cohort samples are highlighted by
unfilled squares (patient UK 1) or circles (patient UK 2). The same colours have been used in Figs 1(b) and 2 to aid comparison
of the data. Samples that match according to patient ID are highlighted by an asterisk. Clusters of interest are numbered 1–4.
from the major DNA template (Gross et al., 2008; Opel
et al., 2007). Only two of the Egyptian study samples were
found to contain three peaks for an individual locus (data
not shown), suggesting that there may have been
contaminating DNA within these samples. All other
samples, however, appeared to contain a single DNA
template.
http://vir.sgmjournals.org
This study provides an important proof-of-principle that
STR fingerprinting can be applied to patient-derived
samples to test their provenance. This protocol enabled
us to confirm the presence of mixed acute infection in
samples obtained from our small UK cohort. Patient UK 1
was infected with two highly divergent genotype 1a strains
and patient UK 2 switched from a dominant genotype 1a to
69
V. C. Edwards and others
a 4d virus. Most importantly, we were able to demonstrate
that an extensive sample mix-up had occurred in the
Egyptian patient cohort, resulting in the appearance of
fluctuating viral load and mixed acute infection in these
samples.
In summary, this study has shown that STR analysis can
determine the provenance of serum samples used for HCV
molecular epidemiology studies.
Acknowledgements
This work was supported by the Medical Research Council (grant
G0801169), EU 305600 (HepaMAb) and through a Medical Research
Council PhD studentship. We thank Sana M. Kamal (Department of
Gastroenterology and Liver Disease, Ain Shams Faculty of Medicine,
Cairo, Egypt), for provision of samples, and the independent
reviewers, for their helpful suggestions.
References
Ahmed, A., Linacre, A. M. T., Mohammed, A. A. A., Vanezis, P. &
Goodwin, W. (2001). STR population data for 10 STR loci including
the GenePrint PowerPlex 1.2 kit from El-Minia (Central Egypt).
Forensic Sci Int 117, 233–234.
Brown, R. J., Hudson, N., Wilson, G., Rehman, S. U., Jabbari, S., Hu,
K., Tarr, A. W., Borrow, P., Joyce, M. & other authors (2012). Hepatitis
C virus envelope glycoprotein fitness defines virus population
composition following transmission to a new host. J Virol 86,
11956–11966.
Bull, R. A., Luciani, F., McElroy, K., Gaudieri, S., Pham, S. T., Chopra,
A., Cameron, B., Maher, L., Dore, G. J. & other authors (2011).
Sequential bottlenecks drive viral evolution in early acute hepatitis C
virus infection. PLoS Pathog 7, e1002243.
Butler, J. M., Shen, Y. & McCord, B. R. (2003). The development of
reduced size STR amplicons as tools for analysis of degraded DNA.
J Forensic Sci 48, 1054–1064.
Dusheiko, G., Schmilovitz-Weiss, H., Brown, D., McOmish, F., Yap,
P.-L., Sherlock, S., McIntyre, N. & Simmonds, P. (1994). Hepatitis C
virus genotypes: an investigation of type-specific differences in
geographic origin and disease. Hepatology 19, 13–18.
Gross, A. M., Liberty, A. A., Ulland, M. M. & Kuriger, J. K. (2008).
Internal validation of the AmpFlSTR Yfiler amplification kit for use in
forensic casework. J Forensic Sci 53, 125–134.
Klintschar, M., al-Hammadi, N. & Reichenpfader, B. (1999).
Population genetic studies on the tetrameric short tandem repeat
loci D3S1358, VWA, FGA, D8S1179, D21S11, D18S51, D5S818,
D13S317 and D7S820 in Egypt. Forensic Sci Int 104, 23–31.
Li, H., Stoddard, M. B., Wang, S., Blair, L. M., Giorgi, E. E., Parrish,
E. H., Learn, G. H., Hraber, P., Goepfert, P. A. & other authors (2012).
Elucidation of hepatitis C virus transmission and early diversification
by single genome sequencing. PLoS Pathog 8, e1002880.
Opel, K. L., Chung, D. T., Drábek, J., Butler, J. M. & McCord, B. R.
(2007). Developmental validation of reduced-size STR Miniplex
primer sets. J Forensic Sci 52, 1263–1271.
Ramia, S. & Eid-Fares, J. (2006). Distribution of hepatitis C virus
genotypes in the Middle East. Int J Infect Dis 10, 272–277.
Saitou, N. & Nei, M. (1987). The neighbor-joining method: a new
method for reconstructing phylogenetic trees. Mol Biol Evol 4, 406–
425.
Smith, J. A., Aberle, J. H., Fleming, V. M., Ferenci, P., Thomson, E. C.,
Karayiannis, P., McLean, A. R., Holzmann, H. & Klenerman, P.
(2010). Dynamic coinfection with multiple viral subtypes in acute
hepatitis C. J Infect Dis 202, 1770–1779.
Tamura, K., Dudley, J., Nei, M. & Kumar, S. (2007). MEGA4:
Molecular Evolutionary Genetics Analysis (MEGA) software version
4.0. Mol Biol Evol 24, 1596–1599.
Butler, J. M. (2006). Genetics and genomics of core short tandem
Wang, G. P., Sherrill-Mix, S. A., Chang, K.-M., Quince, C. & Bushman,
F. D. (2010). Hepatitis C virus transmission bottlenecks analyzed by
repeat loci used in human identity testing. J Forensic Sci 51, 253–265.
deep sequencing. J Virol 84, 6218–6228.
70
Journal of General Virology 95