University of Groningen

Finding causal variants for complex genetic disease
Spijker, Geert Theodoor

Document version: Publisher's PDF, also known as Version of record
Publication date: 2007

Citation for published version (APA):
Spijker, G. T. (2007). Finding causal variants for complex genetic disease: the contribution of statistical methodology to fine-mapping and assay optimization. s.n.

A factorial experiment for optimizing the PCR conditions in routine genotyping*

Marijke Niens1, Geert T. Spijker1, Arjan Diepstra2 and Gerard J. te Meerman1
Departments of 1Medical Genetics and 2Pathology and Laboratory Medicine, University Medical Centre Groningen, the Netherlands

Although most PCRs would produce proper PCR products when first tried, a general optimization is required to yield the best results. This optimization is often achieved by changing one factor at a time. However, this may lead to suboptimal results, since interactions between conditions are difficult to detect with this approach.
In the present study, we describe the factorial optimization of PCR conditions for microsatellite genotyping, by introducing small systematic variations in conditions during the genotyping process. The hypothesis was that small changes will not affect genotyping results, but will provide information about the optimality of current conditions. The conditions varied were the concentrations of buffer, MgCl2, dNTPs, primers, Taq polymerase and DNA, the annealing temperature (Ta) and the number of cycles. We show that, with a 2^8 factorial experiment, it is possible to identify not only the factors responsible for obtaining good results, but also those responsible for bad results. However, the condition leading to the highest signals is not necessarily the best operational condition. The best operational condition is minimally sensitive to random pipetting fluctuations and yields reliable genotypes as well.

* Reprinted, with permission, from: Biotechnology and Applied Biochemistry (2005; 42: 157-62), Portland Press Ltd. (Chapter 2 of the thesis.)

Introduction

High-throughput genotyping is a core technology for many types of genetic investigations. In linkage and association studies, hundreds to thousands of individuals are genotyped for thousands of different markers. Nowadays, this is done in a semi-automated way using PCRs prepared with pipetting robots and run on 384-well thermal cyclers. The subsequent steps consist of capillary electrophoresis or microarray technology. This is a high-throughput, almost industrial, process in which it is crucial to maintain reliable process conditions. The first-pass genotyping results have to be as complete as possible, to avoid the high costs and workload of laborious rework after failure. Insufficient quantity of PCR products is a crucial issue in the genotyping process [1]. Optimization of PCR is often attempted by sequential optimization of each reaction variable, an approach that rarely leads to complete testing of all possible combinations.
In practice, the optimum conditions are rarely identified [2]. It also happens that routine genotyping results fall below those obtained during optimization. Such failures are often interpreted as the result of random errors that are difficult to control and identify. While there is some truth in this, it might also mean that sequential optimization of PCR conditions is not sufficient to obtain a robust working point. Production conditions should be minimally sensitive to uncontrollable and unavoidable random fluctuations, often related to pipetting variation.

In a previous study, we showed that ANOVA, applied to the first-pass results of a genetic study, can determine important factors influencing genotyping success [3]. Having recognized the potential strength of ANOVA, we introduced small, controlled and designed experimental variations into the PCR conditions of the genotyping process to obtain information about the optimality of our current working conditions and the sensitivity of those conditions to random changes. We propose to view optimization of PCR conditions as a continuing process, which does not stop after initial optimization. We applied small systematic changes (of the order of magnitude of random pipetting errors or less) to the best estimate of optimal process conditions. By systematic combination of the conditions, production runs can be set up as factorial experiments. We show data where a 2^8 factorial design was used, in which each of the eight factors assumes two levels. In this design, all combinations of factors occur equally frequently. This type of design can estimate the effect of each PCR factor and can check for possible interactions between the factors [4]. Since there are complex interactions among the individual variables in the PCR [5], one set of amplification conditions that is optimal for all situations may not exist.
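The two-level full factorial layout described above can be sketched in a few lines. This is an illustrative enumeration only, not the authors' plate layout: the factor labels are shorthand for the eight conditions named in the text, and the coding of the low and high levels as -1 and +1 is an assumption made here for convenience.

```python
from itertools import product

# Shorthand labels (assumed here) for the eight conditions varied in experiment 1.
FACTORS = ["buffer", "MgCl2", "dNTPs", "primers", "Taq", "DNA", "Ta", "cycles"]


def full_factorial(factors, levels=(-1, +1)):
    """Enumerate every combination of levels; one dict per experimental run."""
    return [dict(zip(factors, combo))
            for combo in product(levels, repeat=len(factors))]


def is_balanced(design, factors, levels=(-1, +1)):
    """True if every level of every factor occurs in an equal share of runs."""
    share = len(design) // len(levels)
    return all(
        sum(1 for run in design if run[f] == lv) == share
        for f in factors
        for lv in levels
    )


design = full_factorial(FACTORS)      # 2**8 combinations
small = full_factorial(FACTORS[:3])   # 2**3 combinations, a three-factor example

# Replicates per condition if 384 samples x 20 markers are spread evenly
# over all combinations, as reported for experiment 1.
replicates = (384 * 20) // len(design)
```

Because every combination occurs once, each factor sits at its low level in exactly half of the runs and at its high level in the other half, which is what allows main effects and interactions to be estimated from the same data.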
The experiments were performed on an existing genotyping project [6], because it is expensive and rather inefficient to generate data only to improve the genotyping process and because applying systematic small changes to process conditions is not likely to affect the success of genotyping. We report here the results of one large and one smaller experiment, in which 384 individuals were genotyped with 27 separate microsatellite markers, using fluorescence strength as the dependent variable and process conditions as experimental factors.

Table 1: PCR conditions

Factor                  −10% of standard   Standard condition    +10% of standard
DNA                     22.5 ng            25 ng                 27.5 ng
Buffer - Tris/HCl       9 mM               10 mM                 11 mM
Buffer - KCl            45 mM              50 mM                 55 mM
Buffer - MgCl2          1.35 mM            1.5 mM                1.65 mM
MgCl2*                  0.9 mM             1.0 mM                1.1 mM
dNTPs                   0.18 mM            0.2 mM                0.22 mM
Each primer             0.225 μM           0.25 μM               0.275 μM
Taq polymerase*         0.45 units         0.5 units             0.55 units
Ta* (experiment 1)      −2 °C              Primer-specific Ta    +2 °C
Ta* (experiment 2)      −1 °C              Primer-specific Ta    +1 °C
Number of PCR cycles    30                 32                    34

Standard PCR conditions and experimental conditions of experiment 1, in a final volume of 10 μl. Asterisks (*) mark factors that were also tested in experiment 2.

Materials and methods

Source of template

We performed experiments within a research project where it was required to genotype markers covering the HLA (human leucocyte antigen) region in a genetic association study for Hodgkin lymphoma [6]. Leucocyte (germline) DNA was extracted from 20 ml of EDTA/blood by standard procedures [7] and stored at −80 °C. Before use, the DNA was diluted in MilliQ water and stored wet at −20 °C.

PCR and genotyping procedure

The PCR contained the PCR buffer (100 mM Tris/HCl, pH 9.0, 500 mM KCl and 15 mM MgCl2), Taq DNA polymerase (Amersham Biosciences, Uppsala, Sweden), MgCl2, dNTPs (Roche Diagnostics, Mannheim, Germany) and primers [with one primer 5′-labelled with the fluorochrome 6-Fam (6-carboxyfluorescein) or HEX (hexachlorofluorescein); Sigma, Malden, The Netherlands].
The concentrations of the basic amplification reaction mixture for the two experiments are shown in Table 1. Reactions were carried out in a final volume of 10 μl. In experiment 1, genotyping was performed on 384 samples in a 384-well plate using 20 microsatellite markers in the HLA region (Table 2); these markers were described previously [6]. Seven newly designed markers in the HLA region were used for genotyping all 384 samples in experiment 2 (Table 2). Primer sequences were selected from NCBI sequence files (http://www.ncbi.nlm.nih.gov/genomes/sts). Thermal cycling was performed on a Primus Multiblock HT PCR system (MWG-Biotech, Ebersberg, Germany). PCR was started with an incubation of 5 min at 95 °C, followed by 30 or 34 cycles (−2 and +2 relative to the standard number of cycles) of amplification with the following thermal profile: 94 °C for 30 s, primer-specific annealing temperature (Ta; ±2 °C of the optimal temperature) for 30 s and 72 °C for 1 min; the last cycle ended with 72 °C for 5 min. The PCR conditions were tested as outlined in this paper. Subsequent to the PCR, products were pooled into predefined panels, according to allele length and fluorescent label, by use of a Biomek 2000 pipetting robot (Beckman Coulter, Allendale, NJ, U.S.A.). A sample (2.3 μl) of the pooled products was mixed with 2.5 μl of MilliQ water and 0.2 μl of ET-400R size standard (Amersham Biosciences). The products were visualized by separating the samples on a MegaBACE 1000 capillary sequencer (Amersham Biosciences) according to the manufacturer's instructions. Genotyping steps other than the PCR were kept constant during the experiments.

Table 2: Sequences of microsatellite markers

Expt.  Marker    Forward primer (5' → 3')     Reverse primer (5' → 3')
1      6SL001    CCTCACCCGATACATAGACATAGG     AGAAATACCGAAATAAGGCCTCC
1      6SL002    CTCTCGCTACTGTGGTACATGC       CAAACTGTAAGTCATGACCATGC
1      DNRNGCA   AGGAATCTAGTGCTCTCTCC         CTCTAGCAAAAGGAAGAGCC
1      RING3CA   TGCTTATAGGGAGACTACCG         GATGGGAAGTTTCCAGAGTG
1      D6S2658   AGAGAATGGATGCTGCATGAGG       TGTATAACCCGAAAGTCCAGCTCTC
1      6BO01     AGGGAATTCGGAACTCATTTTT       GTAAACTGGGCTGAGATGTACCA
1      Tap1      AGAACCAGACAGGTTTCTCCTG       GGACAATATTTTGCTCCTGAGGTA
1      G511525   GGTAAAATTCCTGACTGGCC         GACAGCTCTTCTTAACCTGC
1      D6S1666   CTTGAGGACTGAGTCTGAGTTGG      GAATCCAGCATTTTGGAGTTGT
1      D6S2670   CCACCCACTTCCTCCACTAGAATC     GTGAATTGTGACTGTGCCAGTACAC
1      D6S273    GCAACTTTTCTGTCAATCCA         GACCAAACTTCAAATTTTCGG
1      TNFα      CCTCTAGATTTCATCCAGCCACA      GCCTCTCTCCCCTGCAACACACA
1      MICA      GCCTTTTTTTCAGGGAAAGTGC       CCTTACCATCTCCAGAAACTGC
1      D6S2673   TTCTGCGTTTTCAGCCTGCTAG       GAACCACTCTTCGTACCACAGTCTC
1      D6S2678   TTGCAGTGAGCCAAGATCGC         CCCCACAAAAAACCCCTGTTTATC
1      D6S2694   TCTCTTTCCCAGTGTCCTTCTAAC     GCAATACAGCAAGACCCTGTC
1      D6S2699   CGACTCCACCTATGACGGACATAC     CCTCTTCTCAGCTCTTCCATCTCAC
1      D6S2700   CAGTTTCGCAACCTGTTTGCC        GCATCAGCAGTCATTAGGGAAATGC
1      D6S265    ACGTTCGTACCCATTAACCT         ATCGAGGTAAACAGCAGAAA
1      D6S2707   CAGTTTCGCAACCTGTTTGCC        GCATCAGCAGTCATTAGGGAAATGC
2      HL002     TACCAGGTTGTAAGGCTCAACAT      GGCTGAGATGAGAGAATCACTTC
2      D6S2701   GAGGTCTGTGGTCATAACTTTGG      TGTGGTTTCATTTCCTTCTAGTCA
2      D6S2702   ATAAAATCCAGGTCATGGTGGA       GGCCTAAATGCTTCCTTGGATA
2      HL003     TTGAAAAACAGGTCATTTTTAGGTT    GGGCAACAAGATCAAAACTCTG
2      D6S2704   CCTTCTCTCCCCAAAGATAAACA      GTAATTTTTGCCACTCTGGAGGA
2      D6S2705   GCCTTCAGGACATGTTTGTGTGTA     TTCAACTCTTTTAGCTGTTTTGG
2      D6S510    AATGTTCCTGCTTTCATTTCTTT      GTCAAAACTGCAATGGGCTACTA

Experimental design

For experiment 1, we used a factorial design of eight factors at two levels, which resulted in 256 different conditions. Table 3 shows an example of a 2^3 design. Six of the factors in the PCR varied in concentration: PCR buffer, MgCl2, dNTPs, primers, DNA and Taq. The two concentration levels applied for these variables were −10% and +10% of standard concentrations (Table 1).

Table 3: Factorial design for three factors each at two levels

Reaction mixture   Factor 1   Factor 2   Factor 3
1                  1          1          1
2                  1          1          2
3                  1          2          1
4                  1          2          2
5                  2          1          1
6                  2          1          2
7                  2          2          1
8                  2          2          2

In the table, 1 and 2 represent −10% and +10% of the standard factor condition; all possible combinations result in 8 reaction mixes.
The other two factors varied were the annealing temperature (Ta), at two levels (−2 and +2 °C), and the number of PCR cycles (30 and 34 cycles). All possible combinations of the five factors in the PCR reaction mixture (PCR buffer, MgCl2, dNTPs, primers and Taq) resulted in 32 different mixtures. These mixtures were systematically divided over the two concentration levels of the DNA samples in a 384-well plate. The 384-well plates containing the DNA samples and the different mixtures were then divided over the two levels of PCR cycles and annealing temperatures. The experiment was performed on 384 samples for each of the 20 microsatellite markers, so the conditions were divided over a total of 7680 samples; each specific condition was therefore performed 30 times.

For the second experiment, three factors at two levels were tested; these were the significant factors resulting from experiment 1, except for the buffer. The buffer was left out because it also contains MgCl2, which could interact with the factor of additional MgCl2. The factors MgCl2 and Taq were varied by −10% and +10% of optimal concentrations, and the Ta was varied by −1 and +1 °C of optimal Ta (Table 1). Experiment 2 was performed on 384 samples for seven microsatellite markers, and the eight conditions were systematically divided over 2688 samples, so each specific condition was performed 336 times. For both experiments, each DNA sample had a fixed position in the plates.

Analysis

First-pass genotyping results were used for analysis. Lengths of the alleles were visualized and analysed using Genetic Profiler version 2.0 (Amersham Biosciences). The quantity of PCR product was measured by the peak height (relative fluorescence) of the short allele.
The dataset was prepared using Excel 2000 (Microsoft Office). The genotyping success was the percentage of alleles for all markers that could be determined. The conditions producing the maximum PCR product were identified by comparing the peak heights of the different conditions by ANOVA, using SPSS version 11.0 (SPSS, Chicago, IL, U.S.A.). The eight factors, including the interactions between them, were analysed for a significant effect on the quantity of PCR product. The presence or absence of data (representing total failure of PCR) was analysed as a dependent variable as well. Homozygosity of the genotype was introduced as a covariate, because homozygous peaks are expected to have a higher signal than heterozygous peaks.

Table 4: Results of ANOVA (mean effects of experiment 1)

Factors               Type III sum of squares   Degrees of freedom   F-value   P-value
MgCl2                 630                       1                    276.53    0
Ta                    280                       1                    123       0
Taq                   1070                      1                    471.55    0
Marker                5540                      19                   121.4     0
Buffer                750                       1                    329.91    0
dNTP                  3.28                      1                    1.44      0.23
Primer                141                       1                    61.78     0
DNA                   5.23                      1                    2.3       0.129
Cycles                136                       1                    60        0
Error (a)             16000                     7052
Corrected total (b)   24700                     7079

Dependent variable: peak height of the shortest allele. R² = 0.350.
(a) The variation left unexplained after the model has been considered.
(b) The total variation in the dependent variable, corrected for the mean.

Table 5: Mean peak heights in experiment 1

Factor   Variation   Mean peak height   SEM     95% CI lower   95% CI upper   N
MgCl2    −10%        12639.2            264.2   12121.4        13157.0        3533
MgCl2    +10%        18602.3            264.2   18084.5        19120.2        3547
Ta       −2 °C       17603.1            265.3   17083.1        18123.2        3515
Ta       +2 °C       13638.4            263.0   13122.8        14154.0        3565
Taq      −10%        11699.9            261.3   11187.7        12212.1        3580
Taq      +10%        19541.6            267.2   19017.9        20065.3        3500
Buffer   −10%        18980.4            260.1   18470.6        19490.2        3475
Buffer   +10%        12452.6            252.4   11957.8        12947.3        3605
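As a worked illustration of how the ANOVA quantities relate to one another, the sketch below recomputes F-values and R² from the sums of squares and degrees of freedom printed in Table 4. It uses the textbook definitions (F as the ratio of effect mean square to error mean square, R² as one minus the error share of the corrected total) and is not the SPSS procedure itself. Because the tabulated sums of squares are rounded, the results agree with the reported values only approximately. The function and variable names are illustrative.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F statistic: effect mean square divided by error mean square."""
    return (ss_effect / df_effect) / (ss_error / df_error)


def r_squared(ss_error, ss_total):
    """Proportion of the corrected total variation explained by the model."""
    return 1.0 - ss_error / ss_total


# Rounded values as printed in Table 4 (experiment 1).
SS_ERROR, DF_ERROR = 16000.0, 7052
SS_TOTAL = 24700.0
EFFECTS = {                 # factor: (Type III sum of squares, degrees of freedom)
    "MgCl2": (630.0, 1),
    "Ta": (280.0, 1),
    "Taq": (1070.0, 1),
    "Buffer": (750.0, 1),
}

f_values = {name: f_ratio(ss, df, SS_ERROR, DF_ERROR)
            for name, (ss, df) in EFFECTS.items()}
r2 = r_squared(SS_ERROR, SS_TOTAL)

for name, f in f_values.items():
    print(f"{name}: F = {f:.1f}")
print(f"R^2 = {r2:.3f}")
```

The recomputed values land close to the tabulated F-values and to the reported R² of 0.350, which serves as a consistency check on the reconstructed table rows.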
Table 6: Results of ANOVA (mean effects of experiment 2)

Factors               Type III sum of squares   Degrees of freedom   F-value   P-value
MgCl2                 1601                      1                    125.88    .000
Ta                    11020                     1                    866.28    .000
Taq                   239.54                    1                    18.83     .000
Marker                9097                      6                    119.21    .000
MgCl2*Taq             120.55                    1                    9.48      .002
MgCl2*Ta              2.41                      1                    0.19      .663
MgCl2*marker          5653                      6                    74.07     .000
Taq*Ta                30.87                     1                    2.43      .119
Taq*marker            2318                      6                    30.37     .000
Ta*marker             6687                      6                    87.62     .000
Taq*Ta*marker         1122                      6                    14.70     .000
MgCl2*Taq*Ta*marker   4218                      19                   17.45     .000
Error (a)             28370                     2231
Corrected total (b)   71550                     2286

Dependent variable: peak height of the shortest allele. R² = 0.603. Asterisks indicate the interaction between the factors (interaction term). A significant P-value for an interaction term means that the factors interact with each other.
(a) The variation left unexplained after the model has been considered.
(b) The total variation in the dependent variable, corrected for the mean.

Results

Experiment 1

The mean peak heights of the different markers for experiment 1 were between 5600 and 42 000. Genotyping success was 92.7%, and the effects of homozygosity on amplitude were negligible. The results of ANOVA are shown in Table 4. Factors that did not influence the peak heights (quantity of PCR product) were the concentrations of DNA, dNTPs, primers and the number of PCR cycles. The factors MgCl2, buffer, Taq, Ta and the marker did significantly affect the signal height (for all the mentioned effects P<0.001, and with a high F value). In general, an increased concentration of MgCl2 and Taq, a decreased concentration of buffer and a lower Ta value gave higher signals (Table 5). An increase of Taq concentration by 20% resulted in signal heights that were almost twice as high. It was not possible to identify the best combination of conditions, since the factor MgCl2 was confounded with other factors, due to an error in the plate setup. In experiment 2, we therefore analysed the best concentrations and temperatures for the interactions between the factors having the strongest main effects as seen in experiment 1.
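The direction and size of the main effects summarized above can be read directly from the level means in Table 5. The short sketch below computes, for each factor, the difference between the mean peak height at the high level and at the low level, together with the corresponding ratio. The dictionary layout and function names are illustrative choices and not part of the original analysis.

```python
# Mean peak heights from Table 5, as (low level, high level) per factor.
# Low/high are -10%/+10% for concentrations and -2 C/+2 C for Ta.
TABLE5_MEANS = {
    "MgCl2": (12639.2, 18602.3),
    "Ta": (17603.1, 13638.4),
    "Taq": (11699.9, 19541.6),
    "Buffer": (18980.4, 12452.6),
}


def main_effect(low_mean, high_mean):
    """Change in mean peak height when a factor moves from low to high."""
    return high_mean - low_mean


def fold_change(low_mean, high_mean):
    """Ratio of the high-level mean to the low-level mean."""
    return high_mean / low_mean


effects = {f: main_effect(lo, hi) for f, (lo, hi) in TABLE5_MEANS.items()}
ratios = {f: fold_change(lo, hi) for f, (lo, hi) in TABLE5_MEANS.items()}

# Factors ordered by the absolute size of their main effect.
ranking = sorted(effects, key=lambda f: abs(effects[f]), reverse=True)

for f in ranking:
    print(f"{f}: effect = {effects[f]:+.1f}, ratio = {ratios[f]:.2f}")
```

Positive effects (MgCl2, Taq) indicate that the higher level gave more product, and negative effects (Ta, buffer) indicate that the lower level did, matching the direction of the effects described in the text.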
Experiment 2

For experiment 2, the mean peak heights were between 24 000 and 45 000, and genotyping success was 97.8%. About 60% of the overall variation in amplitude (R² = 0.603) was explained by all the experimental variations combined, which means that the variation in amplitude is largely controlled by the experimental conditions. The effects of homozygosity on amplitude were negligible. The results of ANOVA are shown in Table 6; the main factors of experiment 1 (MgCl2, Taq and Ta) remained important (for all the mentioned effects P<0.001) and the marker also influenced the signal height (P<0.001). The significant interaction between the factors MgCl2, Taq, Ta and the marker (P<0.001) indicates that the signal at different combinations of conditions is more than additive and differs for each marker. The combination of conditions that produced, overall, the highest signals was +10% MgCl2, +10% Taq and a lower Ta value, which is consistent with the main effects.

A stable condition is defined as one at which the signal varies minimally when one of the conditions changes. Table 7 shows stable conditions for three randomly chosen markers. For marker HL002, the peak heights were not influenced by the experimental variations. The most stable condition for the markers D6S2701 and HL003 occurred at +10% MgCl2, +10% Taq and Ta = −2 °C. When the two factors MgCl2 and Taq were reduced together, average signal height decreased from 43 035 to 26 072. Conditions for the markers D6S2704, D6S2705 and D6S510 were particularly stable at the lower Ta. Table 7 also shows results for the marker D6S2702; this marker was most stable under the condition −10% MgCl2, −10% Taq and Ta = −2 °C or +2 °C. When the concentrations of both MgCl2 and Taq were increased, peak height decreased from 40 682 to 28 011.
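The notion of a stable condition can be made concrete with a small calculation on the mean peak heights of Table 7. The sketch below is one possible operationalization and not the procedure used in the paper. It assumes that the sensitivity of a condition is the largest absolute change in mean peak height when a single pipetted factor (Taq or MgCl2) flips to its other level, with Ta held fixed. The -1/+1 level coding and all function names are illustrative.

```python
# Mean peak heights from Table 7, keyed by (Ta, Taq, MgCl2);
# -1 is the low level and +1 the high level of each factor.
TABLE7 = {
    "D6S2701": {
        (-1, -1, -1): 26072, (-1, -1, +1): 42663,
        (-1, +1, -1): 40402, (-1, +1, +1): 43035,
        (+1, -1, -1): 16833, (+1, -1, +1): 41223,
        (+1, +1, -1): 21369, (+1, +1, +1): 39539,
    },
    "D6S2705": {
        (-1, -1, -1): 53231, (-1, -1, +1): 53583,
        (-1, +1, -1): 51686, (-1, +1, +1): 56034,
        (+1, -1, -1): 30762, (+1, -1, +1): 8800,
        (+1, +1, -1): 29730, (+1, +1, +1): 33335,
    },
    "D6S2702": {
        (-1, -1, -1): 40682, (-1, -1, +1): 39894,
        (-1, +1, -1): 35926, (-1, +1, +1): 28011,
        (+1, -1, -1): 40366, (+1, -1, +1): 36707,
        (+1, +1, -1): 36283, (+1, +1, +1): 29919,
    },
}


def sensitivity(heights, condition, perturbed=(1, 2)):
    """Largest absolute change in peak height when one perturbed factor flips.

    `perturbed` holds the tuple positions that may flip; the default (1, 2)
    covers Taq and MgCl2 (an assumption: Ta is treated as well controlled).
    """
    worst = 0
    for i in perturbed:
        neighbour = list(condition)
        neighbour[i] = -neighbour[i]
        change = abs(heights[condition] - heights[tuple(neighbour)])
        worst = max(worst, change)
    return worst


def most_stable(heights, perturbed=(1, 2)):
    """Condition with the smallest single-factor sensitivity."""
    return min(heights, key=lambda c: sensitivity(heights, c, perturbed))


for marker, heights in TABLE7.items():
    best = most_stable(heights)
    print(marker, best, heights[best], sensitivity(heights, best))
```

Under this assumed metric, the condition selected for D6S2701 is the lower Ta with both Taq and MgCl2 at +10%, and the one selected for D6S2702 has both concentrations at −10%, in line with the conditions singled out in the text. Passing `perturbed=(0, 1, 2)` would also count a change in Ta, which gives a stricter definition of stability.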
Discussion

With this experiment, we have shown that a factorial experiment can be used without affecting the genotyping success, since success rates were still 92.7 and 97.8% for experiments 1 and 2 respectively. Experiment 1 showed that, in general, an increased concentration of MgCl2 and Taq, a decreased concentration of buffer and a lower Ta value resulted in an increased quantity of PCR product. In experiment 2, these significant factors were varied, except for the buffer, since the buffer contains MgCl2 and therefore interacts with the factor of additional MgCl2. An increase of the MgCl2 concentration would lead to a decrease of MgCl2 in the buffer, and the effect shown by the increase of MgCl2 is also caused by an increase of MgCl2 in the buffer. The results for the main factors tested in experiment 2 were the same as those for experiment 1, so an increase of MgCl2 did indeed increase the amount of PCR product. With too little Mg2+, the polymerase will have poor activity but, with too much Mg2+ [2] and a low Ta value, non-specific amplification could become a problem. However, non-specific amplification products were either not evident or present at concentrations too low to affect the results.

The condition is not universal for all markers, because the effect of the microsatellite marker is large in both experiments.
This could be explained by marker-specific effects, since the efficiency of amplification is also influenced by the specific sequence of the target site and primers [8]. Nevertheless, the condition producing the highest amount of PCR product over all markers on average might be a good starting point for optimizing new microsatellite markers.

Table 7: Most stable peak heights for the markers D6S2701, D6S2705 and D6S2702

Marker    Ta      Taq    MgCl2   Mean peak height
D6S2701   −2 °C   −10%   −10%    26 072
D6S2701   −2 °C   −10%   +10%    42 663
D6S2701   −2 °C   +10%   −10%    40 402
D6S2701   −2 °C   +10%   +10%    43 035*
D6S2701   +2 °C   −10%   −10%    16 833
D6S2701   +2 °C   −10%   +10%    41 223
D6S2701   +2 °C   +10%   −10%    21 369
D6S2701   +2 °C   +10%   +10%    39 539
D6S2705   −2 °C   −10%   −10%    53 231*
D6S2705   −2 °C   −10%   +10%    53 583*
D6S2705   −2 °C   +10%   −10%    51 686*
D6S2705   −2 °C   +10%   +10%    56 034*
D6S2705   +2 °C   −10%   −10%    30 762
D6S2705   +2 °C   −10%   +10%    8 800
D6S2705   +2 °C   +10%   −10%    29 730
D6S2705   +2 °C   +10%   +10%    33 335
D6S2702   −2 °C   −10%   −10%    40 682*
D6S2702   −2 °C   −10%   +10%    39 894
D6S2702   −2 °C   +10%   −10%    35 926
D6S2702   −2 °C   +10%   +10%    28 011
D6S2702   +2 °C   −10%   −10%    40 366*
D6S2702   +2 °C   −10%   +10%    36 707
D6S2702   +2 °C   +10%   −10%    36 283
D6S2702   +2 °C   +10%   +10%    29 919

Variations in the concentrations of DNA, dNTPs and primers and in the number of PCR cycles did not affect the quantity of PCR product. It is therefore likely that the concentrations of these factors were higher than necessary; these concentrations and the number of cycles might be reduced to achieve a possibly more optimal condition and to lower the costs. This experiment shows that our current working conditions were not optimal.

In the second experiment, we showed that there was a significant interaction between the factors MgCl2, Taq and Ta. The combination of the main effects gave the same result as indicated by the interaction. The condition with the highest quantity of PCR product was not always the best operational condition, since the second experiment demonstrated the presence of a robust set of conditions with good signals. The use of a stable condition is recommended because, under unstable conditions, a single small variation in MgCl2 or Taq concentration strongly reduced peak heights.
Variations in Ta also had a strong effect on the quantity of PCR product, but generally this factor is well controlled during the PCR process. We recommend choosing a stable condition as the working point, accepting a marginally lower signal, rather than the condition with the maximum signal. Although, apparently, a single robust and optimal set of amplification conditions for all markers does not exist, a robust condition for each marker with little sacrifice in signal was easily identified.

For high-throughput genotyping, it is essential to check continuously whether operating conditions are maintained, because small differences due to instrument drift or dilution errors may result in quite large differences in signal strength and genotyping success. A factorial design of the type that we used is very sensitive and can be routinely applied and operated with smaller experimental changes than we applied, of the order of a few per cent, to verify that conditions are still optimal. The high proportion of explained variance (60%) with regard to signal strength indicates that we now have control over a large part of the quality-defining operating conditions, given the substantial size of random uncontrolled variations. By use of this design, many high-throughput techniques, using different factors, can be checked for the optimality of their working conditions. With current robotics, the necessary systematic variations can be produced routinely without much effort. This implies that important process information can be obtained at little cost.

Acknowledgements

We thank Dr G. van der Steege (Department of Medical Biology, University Medical Center Groningen) and technicians for marker design and DNA isolation respectively. We also thank Dr E. Vellenga, Dr G. W. van Imhoff (Department of Hematology) and Dr S.
Poppema (Department of Pathology) of the University Medical Centre Groningen, The Netherlands, for co-designing the Hodgkin study, which we used for our experiments. This research work was supported by a grant from the Dutch Cancer Society (KWFNKB 99-1878) and by Genizon Biosciences (Montreal, QC, Canada).

References

[1] Moretti T, Koons B, Budowle B. Enhancement of PCR amplification yield and specificity using AmpliTaq Gold DNA polymerase. Biotechniques. 1998; 25: 716-22.
[2] Cobb BD, Clarkson JM. A simple procedure for optimising the polymerase chain reaction (PCR) using modified Taguchi methods. Nucleic Acids Res. 1994; 22: 3801-5.
[3] Spijker GT, Bruinenberg M, te Meerman GJ. Efficiency control in large-scale genotyping using analysis of variance. Appl Biochem Biotechnol. 2005; 120: 29-36.
[4] Siouffi AM, Phan-Tan-Luu R. Optimization methods in chromatography and capillary electrophoresis. J Chromatogr. 2001; A892: 75-106.
[5] Benčina M. Optimisation of multiple PCR using a combination of full factorial design and three-dimensional simplex optimisation method. Biotechnol Lett. 2002; 24: 489-95.
[6] Diepstra A, Niens M, Vellenga E, van Imhoff GW, et al. Association with HLA class I in Epstein-Barr-virus-positive and with HLA class III in Epstein-Barr-virus-negative Hodgkin's lymphoma. Lancet. 2005; 365: 2216-24.
[7] Sambrook J, Fritsch EF, Maniatis T. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, Plainview, NY. 1989.
[8] Rochelle PA, De Leon R, Stewart MH, Wolfe RL. Comparison of primers and optimization of PCR conditions for detection of Cryptosporidium parvum and Giardia lamblia in water. Appl Environ Microbiol. 1997; 63: 106-14.