Edwards, Victoria C. and McClure, Patrick and Brown, Richard J.P. and Thompson, Emma and Irving, William L. and Ball, Jonathan K. (2014) Use of short-tandem repeat (STR) fingerprinting to validate sample origins in hepatitis C virus molecular epidemiology studies. Journal of General Virology, 95 (1). pp. 66-70. ISSN 1465-2099 Access from the University of Nottingham repository: http://eprints.nottingham.ac.uk/2390/1/STR_Fingerprinting.pdf Copyright and reuse: The Nottingham ePrints service makes this work by researchers of the University of Nottingham available open access under the following conditions. This article is made available under the Creative Commons Attribution licence and may be reused according to the conditions of the licence. For more details see: http://creativecommons.org/licenses/by/2.5/ A note on versions: The version presented here may differ from the published version or from the version of record. If you wish to cite this item you are advised to consult the publisher’s version. Please see the repository url above for details on accessing the published version and note that access may require a subscription. For more information, please contact [email protected] Journal of General Virology (2014), 95, 66–70 Short Communication DOI 10.1099/vir.0.057828-0 Use of short tandem repeat fingerprinting to validate sample origins in hepatitis C virus molecular epidemiology studies Victoria C. Edwards,1,2 C. Patrick McClure,1,2 Richard J. P. Brown,1,2 Emma Thompson,3 William L. Irving1,2 and Jonathan K. Ball1,2 Correspondence Jonathan K. Ball [email protected] 1 School of Molecular Medical Sciences, University of Nottingham, Queen’s Medical Centre, Nottingham, UK 2 Nottingham Digestive Diseases Centre Biomedical Research Unit, University of Nottingham, Queen’s Medical Centre, Nottingham, UK 3 MRC-University of Glasgow Centre for Virus Research, Garscube Campus, Glasgow, UK Received 31 July 2013 Accepted 4 October 2013 Sequence analysis is used to define the molecular epidemiology and evolution of the hepatitis C virus. Whilst most studies have shown that individual patients harbour viruses that are derived from a limited number of highly related strains, some recent reports have shown that some patients can be co-infected with very distinct variants whose frequency can fluctuate greatly. Whilst co-infection with highly divergent strains is possible, an alternative explanation is that such data represent contamination or sample mix-up. In this study, we have shown that DNA fingerprinting techniques can accurately assess sample provenance and differentiate between samples that are truly exhibiting mixed infection from those that harbour distinct virus populations due to sample mix-up. We have argued that this approach should be adopted routinely in virus sequence analyses to validate sample provenance. Due to the largely asymptomatic nature of acute hepatitis C virus (HCV) infection, identifying individuals during this phase of the disease is problematic. As a result, studies of cohorts of such individuals are invaluable to improve our understanding of virus evolution and its impact on infection outcome. Most studies have shown that infection in humans and animal models is established by a limited number of highly related founder viruses (Brown et al., 2012; Bull et al., 2011; Li et al., 2012). Post-transmission there is a genetic bottleneck characterized by outgrowth of a selected variant (Bull et al., 2011; Wang et al., 2010). However, at least one study has shown that individuals presenting with acute infections demonstrating large fluctuations in viral load are associated with infection with multiple genetically distinct strains (Smith et al., 2010). Whilst such a dynamic flux of viral variants could be due to simultaneous and/or rapid reinfection by distinct viral variants, it is also possible that this phenomenon could be due to contamination or sample mix-up. Given the importance of the studies of virus evolution in early infection and the need to ensure sample provenance in such studies, we assessed whether short tandem repeat (STR) fingerprinting could be used to define the likely origins of serum samples from two cohorts: one set of One supplementary table is available with the online version of this paper. 66 samples from a cohort of HCV/human immunodeficiency virus-infected men and the other from a cohort of Egyptian healthcare workers from Egypt for whom sample mix-up was suspected. The Egyptian study cohort consisted of 32 subjects reported to be suffering from acute HCV infection. Sequential samples were available and these were reported to have been collected over a 300-day period spanning the acute phase of infection, including the antibody-negative/ RNA-positive window period. Individual subjects were designated a three-letter ID and sequential samples numbered chronologically. A second, smaller, cohort consisted of two patients (designated UK 1 and UK 2), each suspected of harbouring distinct genotypes of HCV at different time points during acute infection. Two sequential samples (designated ‘a’ and ‘b’) taken 1 month apart were available for each patient. Nucleic acids (RNA and DNA) were extracted from study samples and control samples using a QIAamp MinElute Virus Spin Kit (Qiagen). For amplification of the 59 noncoding region (NCR), cDNA was generated with random hexamers and 200 U Moloney murine leukaemia virus (MMLV) reverse transcriptase (Fermentas) according to the manufacturer’s instructions. The viral load of the study samples was determined by quantitative PCR (qPCR) of the 59 NCR using a gene-specific primer and Scorpion 057828 G 2014 SGM Printed in Great Britain This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). DNA fingerprinting probe. Input cDNA was quantified on an Mx4000 Multiplex Quantitative PCR System (Agilent Technologies) alongside standard controls and results converted to genome copies per millilitre of serum. For amplification of the first hypervariable region (HVR1) of the HCV E2 glycoprotein, cDNA was generated from control samples using the genotype 4-specific primer OAS4M (59-CAC CAG CGG CTG AAG CAG CAT TGA39) or the genotype 1-specific primer OAS1a (59- GGG ATG CTG CAT TGA GTA-39) with 15 U ThermoScript reverse transcriptase (Invitrogen) and 8.5 ml RNA according to the manufacturer’s instructions. For the study samples, cDNA was generated with random hexamers and 200 U MMLV reverse transcriptase according to the manufacturer’s instructions. A 270 bp fragment corresponding to HVR1 of E2 and the E1 and E2 flanking regions was amplified in a nested PCR using genotypespecific primers. For genotype 1: first round, EOS (59-GGA CGG GGT AAA CTA TGC AAC AGG-39) and OAS1a; second round, 170gt1 (59-CAC CAT GGG TTG CTC TTT CTC TAT C-39) and IASGT1 (59-TTA CGC CTC CGC TTG GGA TAT GAG TAA CAT CAT-39). For genotype 4: first round, EOS and E10A (59-TCA TTG CAG TTC AGG GCA GTC CTG TTG ATG-39); second round, EIIS_MOD (59-TGG GAT ATG ATG ATG AAC TGG-39) and EIIA (59-CTG TTG ATG TGC CAG CTG CCA-39). PCR-positive samples were purified and sequenced. All available sequences were aligned using MEGA4 software (Tamura et al., 2007) and the evolutionary relationship inferred using the neighbour-joining method (Saitou & Nei, 1987). STR analysis was carried out on serum-extracted nucleic acid samples using three separate loci. Each STR was amplified in a separate PCR from 5 ml RNA using 0.3 U HotStarTaq DNA polymerase (Qiagen) and gene-specific primers: TH01 F and TH01 R, vWA F and vWA R, and D21S11 F and D21S11 R (Opel et al., 2007), according to the manufacturer’s instructions. The sense primer in each pair was conjugated to a different fluorophore. PCR products were mixed and run alongside the GeneScan 500 LIZ Size Standard (Applied Biosystems) on an ABI Prism 3130 fluorescent DNA analyser (PerkinElmer). Peak traces were analysed using Peak Scanner 1.0 software (Applied Biosystems) to determine the size (in base pairs) of the allele(s) at each locus within all samples. Cluster analysis of STR allele sizes was carried out using GeneMarker v2.2.2 software (Softgenetics) to generate a distance matrix based on the Euclidean distance between single samples. The Egyptian study cohort and UK cohort were sampled longitudinally during early acute infection. Several patients in the Egyptian study cohort were found to exhibit ‘yo-yo’ viral load dynamics (Table S1, available in JGV Online), which has been linked to fluctuations in the complexity of the virus population (Smith et al., 2010). Phylogenetic analysis of HVR1 sequences isolated from the study cohort alongside reference genotype sequences showed that the http://vir.sgmjournals.org majority of patients were infected with a genotype 4 virus (Fig. 1a). A small number of viral sequences clustered most closely to genotype 1 reference strains. HCV genotype 4 is the most prevalent genotype within the Egyptian population (Dusheiko et al., 1994; Ramia & Eid-Fares, 2006). Patient UK 2 was co-infected with genotype 1a and genotype 4d viruses, as had been shown previously (unpublished). However, patient UK 1 contained only genotype 1 viral sequences (Fig. 1a), although previous analyses of these samples showed co-infection with genotype 1a and genotype 4d viruses (unpublished). HVR1 sequences derived from the Egyptian study and UK cohort samples were subjected to further phylogenetic analysis to determine epidemiological relationships (Fig. 1b). In the absence of mixed infection, viral sequences that are epidemiologically linked, i.e. derived from a single patient during the acute phase of infection, would be expected to cluster closely on a phylogenetic tree. Surprisingly, the majority of Egyptian patient samples were distributed throughout the tree. Sequential samples from only two of the patients, HUO and XNZ, harboured HVR1 sequences that clustered on the tree (indicated by an asterisk, Fig. 1b). Similarly, virus sequences derived from the UK samples UK 1a and UK 1b, as well as samples UK 2a and UK 2b, did not cluster (Fig. 1b). In a recent study, mixed acute infections were described in which multiple infections with up to three subtypes was observed. These infections were also characterized by highly divergent clades within a single subtype and multiple switches in the dominant variant present at a specific time point (Smith et al., 2010). Therefore, to ascertain whether our data could be explained by mixed acute infection we further characterized the samples using STR analysis of the genomic DNA present in the sera. Three discrete STR loci were analysed to define relatedness. This technique is employed commonly in DNA fingerprint analysis for forensic purposes (Butler, 2006; Butler et al., 2003) and the loci selected show good discriminatory power in an Egyptian population (Ahmed et al., 2001; Klintschar et al., 1999). Samples derived from the same individual should have identical patterns of STR locus size and therefore cluster together; however, this was not true of this cohort (Fig. 2). The majority of study samples did not cluster according to patient ID and therefore were not derived from a single individual. Importantly, the UK cohort samples did cluster based on STR analysis, demonstrating that each pair of samples was derived from a single patient and confirming the existence of mixed infection. Interestingly, the two Egyptian study samples in which virus clustered closely (HUO and XNZ, Fig. 1b) also clustered based on STR analysis (Fig. 2), confirming the epidemiological relatedness of these samples. Further analysis of clustering data (Figs 1a and 2) identified some common groupings in both trees. These included cluster 1 (ASP3, FPK2), cluster 2 (CZR1, IKW2), cluster 3 (AMG3, MHL2) and cluster 4 (CZR2, FZG2). The 67 V. C. Edwards and others (a) (b) ASP 1 AWM 3 0.1 UK 2b QAF Z1 G2 3 ) ) 17 ) 3 89 63 362 2 7 G FZ a (D 17 96 3 a (D (M 3 1b 79 85 97 99 AMG 3 MHL 2 VHI 1 CLH 3 HD SAH P 1 1 CZ R1 CL R 7 R 85 1 H P H 2 Z 1 UO CQ 1 2 MZ AM 1b (D (U 10 01 93 RR 214 4) RF Z 2 ) E AXJ 3 79 HDP 2 2 FPK XNZ 3 1 XNZ 1 98 JZB 2 UK 1a 1a (M62321) UK 1b 2a UK ) 99 2 72 519 U15 RP 1 E ( 3 S 1a B JZ ) 0 87 2) 9 3 8 43 J (F 62 5) 4I J4 243 F ( 6 4g FJ4 ( b 4 2 O HU K 2 SP 3 FP A W 2 IK 1 X DG 3 IKW 1 IKW 1 MTK AMG 1 MHL 3 4n (FJ462441) 4d (EU392172) 97 4k UK 2b 4d (FJ IDF 46243 3 7) 99 AXJ 1 95 SAH C 2 FZ ZR 2 G DG ID 2 X F1 2 1) 43 9) 62 86 9 J4 (F J83 ) 4p t (F 434 4 62 ) 6 4 ) (FJ 43 3 2 4q 46 498 Z2 (FJ MZ (AY73 82) 4c 2b 349 AY7 2b ( 80 4k (E (E U3 U3 92 9 1 4k (F 217 73) 4k J (EU 462 1) 4 3 4r 921 38) (FJ 69 4 ) 4f ( EU 6243 392 9 174 ) AWM ) ASP 3 77 1 F 4m (FJ46 ZG 1 2433) 4o (FJ462440) 5 (NC009826) 2) 6b (D8426 83) 0 2 1 6a (Y 1b 0.1 FZG 1 IDF 3 RRZ 1 CLH 2 DGX 1 IKW 3 IKW 1 MTK 1 AXJ 1 SAH 2 AMG 1 DGX 2 HUO 1 HUO 2 * PCQ 2 MHL 3 ASP 3 1 FPK 2 CZR 1 2 IKW 2 SAH 1 HDP 1 CLH 3 VHI 1 AMG 3 MHL 2 3 MZZ 2 IDF 1 CZR 2 4 FZG 2 UK 1a JZB 3 SRP 1 UK 2a UK 1b FPK 1 HDP 2 JZB 2 XNZ 1 XNZ 3 * AXJ 2 RFE 3 RRZ 2 FZG 3 AMG 2 MZZ 1 QAF 3 Fig. 1. Genotyping viral isolates based on the 270 bp region encompassing HVR1 of E2 and the E1 and E2 flanking regions. Evolutionary history was inferred using the neighbour-joining method and evolutionary distances computed using the maximumcomposite-likelihood method. Percentage bootstrap support from 500 replicates is shown (only values greater than 70 % are shown). (a) Genotype was assessed by alignment to reference genotype sequences, highlighted by coloured circles: green, genotype 1; yellow, genotype 2; pink, genotype 3; dark blue, genotype 4; light blue, genotype 5; red, genotype 6. Samples UK 1a, UK 1b, UK 2a and UK 2b are included. Study samples are listed by their three-letter/single-digit ID code. GenBank accession numbers are given in parentheses. (b) Cluster analysis of patient-derived viral isolates and control samples based on HVR1 sequence alignment. Study samples are highlighted with coloured shapes according to patient ID to aid the identification of patient clusters. Patients for whom only one sample contained nucleic acid have been left blank. UK samples are highlighted by unfilled squares (patient UK 1) or circles (patient UK 2). Samples that match according to patient ID are highlighted by an asterisk. Clusters of interest are numbered 1–4. clustering of these samples based on HVR1 sequences and STR loci suggests that they were linked epidemiologically, i.e. derived from a common source; however, it would be necessary to analyse more STR loci in order to confirm that samples were obtained from a single individual. The use of three STR loci is sufficient to determine that two or more samples are unrelated, but greater discriminating power would be required to definitively prove a relationship between samples. 68 This analysis was concerned with identifying the relatedness of sequential samples and did not look specifically at within-sample contamination. However, previous analyses using these STR loci have shown that as little as 100 pg (~30 copies) template DNA can be amplified using this method (Opel et al., 2007). Furthermore, mixing experiments have demonstrated that a minor contaminating DNA template, present at only 5–10 % of the major DNA template, can be detected and distinguished successfully Journal of General Virology 95 DNA fingerprinting Euclidean distance 0.6 0.5 0.4 0.3 0.2 0.1 0.0 UK 2a UK 2b * HUO 1 HUO 2 * PCQ 2 ASP 3 1 FPK 2 CZR 1 2 IKW 2 CZR 2 4 FZG 2 IDF 2 FPK 1 FZG 1 IKW 1 JZB 2 QAF 3 RFE 3 RRZ 2 HDP 2 AMG 1 MZZ 2 DGX 1 JZB 3 XNZ 1 XNZ 3 * AMG 2 AMG 3 3 MHL 2 VHI 3 SAH 3 UK 1a UK 1b * MHL 3 MHL 1 MTK 1 AWM 3 MZZ 1 CLH 2 RRZ 1 DGX 2 HDP 1 SAH 1 AXJ 2 IKW 3 AXJ 3 SRP 1 FZG 3 IDF 3 CLH 3 Fig. 2. Genotyping patient serum samples using cluster analysis of STR loci. Three STR loci were amplified from serumextracted DNA samples and STR size determined by examination of peak traces on Peak Scanner 1.0 software. Cluster analysis was carried out using GeneMarker v2.4.0 software to generate a distance matrix based on the Euclidean distance between single samples. Study samples are highlighted with coloured shapes according to patient ID to aid the identification of patient clusters. Patients for whom only one sample contained nucleic acid have been left blank. UK cohort samples are highlighted by unfilled squares (patient UK 1) or circles (patient UK 2). The same colours have been used in Figs 1(b) and 2 to aid comparison of the data. Samples that match according to patient ID are highlighted by an asterisk. Clusters of interest are numbered 1–4. from the major DNA template (Gross et al., 2008; Opel et al., 2007). Only two of the Egyptian study samples were found to contain three peaks for an individual locus (data not shown), suggesting that there may have been contaminating DNA within these samples. All other samples, however, appeared to contain a single DNA template. http://vir.sgmjournals.org This study provides an important proof-of-principle that STR fingerprinting can be applied to patient-derived samples to test their provenance. This protocol enabled us to confirm the presence of mixed acute infection in samples obtained from our small UK cohort. Patient UK 1 was infected with two highly divergent genotype 1a strains and patient UK 2 switched from a dominant genotype 1a to 69 V. C. Edwards and others a 4d virus. Most importantly, we were able to demonstrate that an extensive sample mix-up had occurred in the Egyptian patient cohort, resulting in the appearance of fluctuating viral load and mixed acute infection in these samples. In summary, this study has shown that STR analysis can determine the provenance of serum samples used for HCV molecular epidemiology studies. Acknowledgements This work was supported by the Medical Research Council (grant G0801169), EU 305600 (HepaMAb) and through a Medical Research Council PhD studentship. We thank Sana M. Kamal (Department of Gastroenterology and Liver Disease, Ain Shams Faculty of Medicine, Cairo, Egypt), for provision of samples, and the independent reviewers, for their helpful suggestions. References Ahmed, A., Linacre, A. M. T., Mohammed, A. A. A., Vanezis, P. & Goodwin, W. (2001). STR population data for 10 STR loci including the GenePrint PowerPlex 1.2 kit from El-Minia (Central Egypt). Forensic Sci Int 117, 233–234. Brown, R. J., Hudson, N., Wilson, G., Rehman, S. U., Jabbari, S., Hu, K., Tarr, A. W., Borrow, P., Joyce, M. & other authors (2012). Hepatitis C virus envelope glycoprotein fitness defines virus population composition following transmission to a new host. J Virol 86, 11956–11966. Bull, R. A., Luciani, F., McElroy, K., Gaudieri, S., Pham, S. T., Chopra, A., Cameron, B., Maher, L., Dore, G. J. & other authors (2011). Sequential bottlenecks drive viral evolution in early acute hepatitis C virus infection. PLoS Pathog 7, e1002243. Butler, J. M., Shen, Y. & McCord, B. R. (2003). The development of reduced size STR amplicons as tools for analysis of degraded DNA. J Forensic Sci 48, 1054–1064. Dusheiko, G., Schmilovitz-Weiss, H., Brown, D., McOmish, F., Yap, P.-L., Sherlock, S., McIntyre, N. & Simmonds, P. (1994). Hepatitis C virus genotypes: an investigation of type-specific differences in geographic origin and disease. Hepatology 19, 13–18. Gross, A. M., Liberty, A. A., Ulland, M. M. & Kuriger, J. K. (2008). Internal validation of the AmpFlSTR Yfiler amplification kit for use in forensic casework. J Forensic Sci 53, 125–134. Klintschar, M., al-Hammadi, N. & Reichenpfader, B. (1999). Population genetic studies on the tetrameric short tandem repeat loci D3S1358, VWA, FGA, D8S1179, D21S11, D18S51, D5S818, D13S317 and D7S820 in Egypt. Forensic Sci Int 104, 23–31. Li, H., Stoddard, M. B., Wang, S., Blair, L. M., Giorgi, E. E., Parrish, E. H., Learn, G. H., Hraber, P., Goepfert, P. A. & other authors (2012). Elucidation of hepatitis C virus transmission and early diversification by single genome sequencing. PLoS Pathog 8, e1002880. Opel, K. L., Chung, D. T., Drábek, J., Butler, J. M. & McCord, B. R. (2007). Developmental validation of reduced-size STR Miniplex primer sets. J Forensic Sci 52, 1263–1271. Ramia, S. & Eid-Fares, J. (2006). Distribution of hepatitis C virus genotypes in the Middle East. Int J Infect Dis 10, 272–277. Saitou, N. & Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4, 406– 425. Smith, J. A., Aberle, J. H., Fleming, V. M., Ferenci, P., Thomson, E. C., Karayiannis, P., McLean, A. R., Holzmann, H. & Klenerman, P. (2010). Dynamic coinfection with multiple viral subtypes in acute hepatitis C. J Infect Dis 202, 1770–1779. Tamura, K., Dudley, J., Nei, M. & Kumar, S. (2007). MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24, 1596–1599. Butler, J. M. (2006). Genetics and genomics of core short tandem Wang, G. P., Sherrill-Mix, S. A., Chang, K.-M., Quince, C. & Bushman, F. D. (2010). Hepatitis C virus transmission bottlenecks analyzed by repeat loci used in human identity testing. J Forensic Sci 51, 253–265. deep sequencing. J Virol 84, 6218–6228. 70 Journal of General Virology 95
© Copyright 2026 Paperzz