The Pharmacogenomics Journal (2003) 3, 77–96 & 2003 Nature Publishing Group All rights reserved 1470-269X/03 $25.00 www.nature.com/tpj REVIEW Single nucleotide polymorphism genotyping: biochemistry, protocol, cost and throughput X Chen PF Sullivan Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA Correspondence: Dr X Chen, Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, 800 E. Leigh Street, Richmond, VA 23298-0424, USA Tel: +1 804 828 8124 Fax: +1 804 828 3223 E-mail: [email protected] ABSTRACT The large number of single nucleotide polymorphism (SNP) markers available in the public databases makes studies of association and fine mapping of disease loci very practical. To provide information for researchers who do not follow SNP genotyping technologies but need to use them for their research, we review here recent developments in the fields. We start with a general description of SNP typing protocols and follow this with a summary of current methods for each step of the protocol and point out the unique features and weaknesses of these techniques as well as comparing the cost and throughput structures of the technologies. Finally, we describe some popular techniques and the applications that are suitable for these techniques. The Pharmacogenomics Journal (2003) 3, 77–96. doi:10.1038/sj.tpj.6500167 Keywords: SNP; genotyping; allele discrimination; cost; throughput Received: 19 November 2002 Revised: 7 February 2003 Accepted: 10 February 2003 INTRODUCTION Single nucleotide polymorphisms (SNPs) have emerged as genetic markers of choice because of their high-density and relatively even distribution in the human genomes1–3 and have been used by many groups for fine mapping disease loci and for candidate gene association studies. When a disease gene has been mapped to a chromosomal region, a high-density SNP mapping or candidate gene association studies are logical steps to follow. These efforts normally require a large amount of genotyping. To perform such large-scale SNP genotyping many SNP markers are required along with high throughput technologies. Efforts funded by the government and private companies have produced several millions of nonredundant SNPs2,3 and they are readily available for such studies. Genotyping technologies have become a significant bottleneck for these applications despite rapid progress in the field. Many new technologies for SNP genotyping have been developed in the last few years. Rapid technological progress makes it difficult to choose appropriate methods for a given application. In this article we review current technologies, focusing on the underlying biochemistry and we compare and contrast the strengths, weaknesses, throughputs and cost structures of these technologies. As a result of space limitations, we intend to cover only the most popular methods. More comprehensive reviews were available elsewhere.4–7 We hope to assist readers in the selection of methods that are most appropriate for their applications. A genotyping protocol normally has two parts—biochemical reactions to form allele-specific products and detection procedures to identify the products. Most technologies separate these two processes to enhance resolution, accuracy and throughput although they can be performed in parallel under certain SNP genotyping X Chen and PF Sullivan 78 circumstances. In the sections that follow, we will first outline a general genotyping protocol as a base for our discussion and then review (i) biochemistries for allelic discrimination, (ii) mechanisms for product detection and identification, (iii) cost structures, and (iv) throughputs. Finally, we will discuss several popular methods in more details. GENOTYPING PROTOCOLS Typically, genotyping protocols start with target amplification and follow with allelic discrimination and product detection/identification (Figure 1). Depending on the mechanisms for allele discrimination and the means of product detection/identification, some or all of these steps could be combined and processed in parallel. Clearly, the fewer the steps involved in a procedure, the greater the ease for operation and automation, which are of critical importance for large-scale operations. However, fewer steps may not be the best way to attain optimal data quality, cost and throughput. The key is to balance the gain from throughput and cost-saving vs the labor and cost of extra procedures. When the gain outweighs the labor and cost, more complex procedures can be justified. Target DNA Amplifications Human genome has about 3 billion basepairs. The target of genotyping is usually a few hundreds basepairs. The challenge of assaying such a tiny fraction of the genome is a key reason why most genotyping techniques begin with target amplification. Almost all techniques use the poly- Figure 1 A workflow chart for SNP genotyping. Typical protocols consist of three components: target amplification, allelic discrimination and product detection and identification. The component reactions can be performed sequentially or in parallel. The style of their execution reflects differences in design philosophy, throughput expectation and cost-saving measurement. The Pharmacogenomics Journal merase chain reaction (PCR) for amplification, albeit other strategies could be used.8,9 In brief, PCR works by using heat to separate a duplex DNA molecule into two single-stranded molecules and then copying each of the two single-stranded molecules with thermostable DNA polymerases and an oligonucleotide primer. When the DNA polymerases finish copying the single stranded molecules, each becomes a new duplex DNA molecule. This process can be repeated 30–40 times, after each round of amplification the amount of targeted DNA fragment doubles. By the end of the process, the amount of the targeted DNA fragment has increased 109 times or more. To our knowledge, the Invader technology10 is the only technique that claims to be able to genotype directly from genomic DNA, although the amount of genomic DNA required is about 10 times greater than that for a PCR. For genotyping purposes target DNA amplification has to be specific—an amplicon amplifies one and only one locus in the genome. Primers that amplify multiple loci can lead to serious genotype errors. Pseudogenes, conserved sequences within gene families and repetitive sequences are common features of our genome and they can cause nonspecific amplifications. To avoid this problem known repeat sequences should be filtered out before designing PCR primers, and PCR products should be tested before genotyping. Since most genotyping techniques rely on PCR for target amplification, its efficiency and cost become a critical issue for large-scale applications. PCR performs best when it is used to amplify a single target, with a success rate of 85% or better. However amplifying one target at a time is not efficient. It would be ideal to be able to amplify multiple targets simultaneously. Unfortunately, multiplexing PCR has serious problems, the success rate drops to 50–70% for markers and not all targets are amplified equally. Furthermore, the scoring rate (the percentage of samples that can be genotyped) also goes down dramatically, from better than 95% for single marker PCR to 60–80% for 5–10 multiplexing PCR. PCR is also expensive for large-scale applications (about $0.25–0.35 per reaction). If 10 multiplex can be performed, the cost could be shared by 10 genotypes. For clinical applications, PCR license fees are also a significant factor and this makes non-PCR-based methods appealing. For all these reasons, PCR becomes the first significant barrier for large-scale applications. Following PCR, there is normally a clean-up step to remove excess dNTPs and PCR primers left over from the reaction. This clean-up is necessary for all primer extensionbased methods (see below) because the excess dNTPs and PCR primers interfere with primer extension. The clean-up can be done enzymatically with shrimp alkaline phosphatase (SAP) and E. coli exonuclease I (Exo I), which inactivate dNTPs and PCR primers respectively, or physically via gel filtrations. This step may or may not be necessary for hybridization- and ligation-based methods. In addition, other special procedures to prepare single-stranded PCR products are also performed here in conjunction with the clean-up.11,12 SNP genotyping X Chen and PF Sullivan 79 Allele Discrimination Reactions Allelic discriminations are the core of genotyping. Mechanistically, the discriminating power originates from DNA polymerases and DNA ligases. The difference in thermodynamics between matched and mismatched DNA duplexes is also exploited. These allele discriminating reactions produce allele-specific products with high specificity and accuracy. The mechanism used for allele discrimination has substantial impact on the protocol, cost and throughput. From protocol perspective if a mechanism selected does not interfere with target amplification, the two can be combined and executed in parallel. For example, hybridization, ligation and the 50 nuclease activity of DNA polymerases do not interfere with PCR process, it is the combination of these processes with PCR that created the Molecular Beacons,13 the TaqMan assay14 and the FRET-DOL15 assay. These single-step, closed-tube assays greatly simplify protocol and minimize user input so that they are readily amenable to automation. The mechanism used for allele discrimination largely determines the specificity and accuracy of the methods. Typically, DNA ligases have the best specificity, whereas different activities of DNA polymerases vary in specificity. The endonuclease activity used for the Invader assay is highly specific. The DNA synthesis activity, depending on the way being used, can be very specific (single-base extension (SBE)) or marginal (allele-specific PCR). The discriminating power of allele-specific hybridization is relatively lower. It is important to understand the discrimination power of these mechanisms, it is equally important, if not more, to understand how a mechanism is devised and used in a particular method. The overall accuracy and specificity of a method depend on its amplification strategy, allele discrimination and detection format. Under optimal conditions all mechanisms can have acceptable (499%) accuracy. could lead to spurious results. All homogeneous methods use indexes to infer genotypes. Solid-phase-mediated methods can detect allele-specific products directly or indirectly. The speed of product detection contributes to the throughput capacity of a method. While the measurement of fluorescence intensities, absorbance,11 electric charge,28 or mass spectrometry takes only seconds or less per sample, electrophoresis can take several hours. The preparation procedures that required before the detection vary greatly among techniques, and they are important factors when considering a technique. While some of the procedures are common to several techniques others are specific to a particular method. It is often the case that the efficiency of a protocol makes it stand out. Most detection systems have built-in mechanisms to perform repetitive work for many samples, such as 96- and 384-well plates or microarrays. In addition, most systems dedicated for SNP genotyping have algorithms for automatic genotype scoring. The protocols and their capacity for multiplexing, therefore, are the most important factors in determining the overall efficiency of cost and throughput. Detection and Identification of Allele-specific Products The next step is the detection and identification of allelespecific products. Depending on the method of detection, an additional clean-up or purification may be necessary. Most solid-phase-mediated methods require a clean-up because the substrates of the allele-specific reaction can produce false-positive signals and complicate genotype scoring. Examples are mass spectrometry,12,16–21 zip-code technology,22,23 Illumina color beads,24,25 Orchid SNP-IT technology11 and all hybridization based methods.26,27 Product detection and identification can be performed in many different ways. Mass spectrometry detects and identifies the allele-specific products by molecular weight; DNA sequencing separates products by size and fluorescence labels. Other methods uses indexes to infer products, such as fluorescence resonance energy transfer (FRET), fluorescence polarization (FP), luminescence, absorbance and melting temperature. When indexes are used, it is critical that the indexes are directly linked to or correlated with the products, any factor that alters the linkage and correlation Allele Discrimination based on DNA Polymerases DNA polymerases are versatile enzymes that have multiple functions. Their major function is to replicate DNA during cell division. To ensure the fidelity of genetic information across generations, DNA polymerases must be highly accurate and precise. This requires, during DNA synthesis, the extending end of a DNA strand and the incoming base match the template exactly. When one of them does not match, the extension would stop, awaiting other functions of the polymerase or other enzymes to correct the error. This DNA synthesis activity has been exploited for genotyping in the form of SBE, allele-specific extension (ASE) and allelespecific PCR (AS-PCR). DNA polymerases also have 50 and 30 exonuclease activities in order to repair errors and remove RNA primers used in DNA replication. These activities form the basis of a mutation detection system (pyrophosphorolysis-activated polymerization29,30) and the TaqMan assay. The Invader assay was developed from a special structure specific endonuclease activity of some archael bacteria. BIOCHEMISTRY OF ALLELE DISCRIMINATION Genotyping is a laboratory procedure that identifies the alleles presented in a given sample. To accomplish this goal, genotyping biochemistry must be highly specific. A biochemical reaction identifies one and only one allele at a time. Since multiple reactions can occur simultaneously at multiple templates and target loci of a sample, collectively the same biochemical reaction can identify multiple alleles and multiple loci. Popular SNP genotyping technologies currently available are based on one or more properties of these enzymes and processes: (i) DNA polymerases; (ii) DNA ligases; and (iii) hybridization. www.nature.com/tpj SNP genotyping X Chen and PF Sullivan 80 Single-base extension SBE or minisequencing derives from the DNA synthesis activity directly. DNA polymerases extend a DNA strand from 50 to 30 and require a template strand to guide the incoming nucleotides; the extending 30 base must be basepaired with the template. If the 30 extending base is not base-paired to its template DNA polymerases would not add new bases to the strand. If the 30 extending base does not have the 3-hydroxyl group (–OH), the strand would not be able to extend any further.31 These features form the basis of SBE.32,33 SBE, literally speaking, is to extend a DNA strand or a primer by one base. To ensure all primers extend one and only one base the triphosphate nucleotides used in the reaction must be a special group, the terminators (ddNTPs), which do not have the 3-hydroxyl group. For SBE, extension primers are designed to anneal immediately upstream to the target polymorphic sites. In the presence of a DNA polymerase and candidate terminator nucleotides complementary to the polymorphic bases, primer extension occurs (Figure 2a). The polymorphic bases dictate which terminators are incorporated. The products of SBE are oligonucleotides that are one base longer than their extension primers. To facilitate detection the terminators can have specific labels, of which fluorophores are the most popular.6,34–37 By detecting the fluorescence labels or other moieties11 linked to the terminators, the genotype of a sample can be inferred. The major difference between SBE and DNA sequencing is that SBE identifies only one base (the polymorphic base) while sequencing can identify up to several hundreds bases and determine their relative order. For genotyping purpose, the information obtained by SBE is adequate although the surrounding sequence is desirable under some circumstances (see below). SBE produces highly specific products. Many studies have shown that the error rate of DNA synthesis for commonly used polymerases is 10-4 or lower. Combined with appropriate detection mechanisms, genotyping methods that use SBE produce highly accurate genotype data. These methods include mass spectrometry, microarray-based detection, FRET and FP detection. Allele-specific extension and allele-specific PCR ASE and AS-PCR make use of the difference in extension efficiency between primers with matched and mismatched 30 bases. DNA polymerases extend primers with a mismatched 30 base at much lower efficiency, from 100 to 10 000 folds less efficient than that with a perfectly matched 30 base. ASE can be performed by itself37 or with a matching primer to amplify a specific allele in a sample, that is, ASPCR.38,39 ASE requires two specially designed allele-specific primers that anneal to the target with their 30 bases matching the two alleles of an SNP (Figure 2b). In AS-PCR, a common reverse primer is needed to match with the two allele-specific ASE primers to amplify the target. Both ASE and AS-PCR produce products as long as their templates; whereas ASE amplifies the templates lineally, AS-PCR does it The Pharmacogenomics Journal exponentially. The allele-specific primers normally bear labeling tags38,39 to enable their identification. ASE can be performed in situ on a chip when the two allele-specific primers are anchored at separate addresses.37 The formation of extension products implies that the 30 allele-specific bases match the polymorphic bases in the templates. By detecting which primer forms the products, a sample’s genotype can be determined. Although SBE possesses superior discrimination in comparison to ASE, genotyping results obtained from ASE may be acceptable.37 As for AS-PCR, the results are largely dependent on the sequence surrounding the marker and the nature of polymorphism. It has been widely observed that while some AS-PCRs can be highly specific, many others produce ambiguous results. It appears that both the local sequence and the type of polymorphism affect the specificity of AS-PCR.40 Furthermore, the nature of exponential amplification of AS-PCR makes the decay of discriminating power much quicker than that of linear amplification as seen in both SBE and ASE. For example, if the polymerase makes 1% error in at the polymorphic base or a discriminating factor of 100 to 1 for a homozygous sample, the accumulated error in the amplified population would reach B24% for AS-PCR (1–0.9930) after 30 cycle amplification, but that for SBE and ASE remains at 1%. When the amplified population is assayed, if only 1% of the population contains error, it normally goes undetected, but if that fraction increases to 24%, it surely will be detected and the sample will be scored as heterozygous. Therefore, it actually requires higher discriminating power for AS-PCR to arrive at same error ratio in the amplified population. Synchronized DNA synthesis—pyrosequencing Pyrosequencing is a novel approach for DNA sequencing,41,42 which takes advantage of a by-product of the DNA polymerase reaction when it incorporates a nucleotide onto a primer. When a DNA polymerase incorporates a nucleotide onto the 30 end of a primer, it produces a pyrophosphate. On pyrosequencing, this pyrophosphate then triggers an enzymatic cascade (ATP sulfurylase, luciferase) to produce a detectable luminescence signal (Figure 2c). For this scheme to work, the synthesis of DNA strands must be synchronized, which can be accomplished by adding nucleotides to the reaction one at a time. The synchronization ensures that no reaction occurs when the incoming nucleotide does not match the template. When the incoming nucleotide matches the template, the DNA strand extends one base and a pyrophosphate is released. The pyrophosphate then initiates the enzyme cascade to produce light; if not, the nucleotide is degraded by yet another enzyme, an apyrase, to refresh the system. Repetition of this procedure produces a sequence complementary to that of the template. Like DNA sequencing, pyrosequencing provides sequence context surrounding polymorphic sites. Pyrosequencer detects the luminescence in real time and it can sequence a short stretch of DNA (10–15 bases) in about 20 min.43 SNP genotyping X Chen and PF Sullivan 81 Figure 2 The biochemistry of allelic discrimination. For all reactions, substrates are listed on the left side and products on the right. (a) SBE. When an extension primer anneals to the template (underlined) immediately upstream of the target polymorphic site (bold) DNA polymerases incorporate ddNTPs onto the 30 end of the primer. The bases incorporated are dictated by the polymorphic bases. The products of SBE are one base longer than the extension primer with an allele-specific label. (b) ASE and AS-PCR. ASE and AS-PCR use allelespecific primers, one for each allele. The 30 end of ASE primers anneal onto the polymorphic bases. If the last base matches the polymorphic base, the primer gets/is extended, otherwise there is no extension. The products of the extension are normally the same length as the templates. To facilitate the identification of the products the primers bear distinct labels. (c) Pyrosequencing. Primers for pyroseqeuncing anneal to templates a few bases upstream from the target polymorphic site. Deoxyribonucleotides are added to the reaction one by one. If the base added matches the template, the primer extends one base, and produces a pyrophosphate, which is turned into a chemiluminescence signal via an enzyme cascade. Otherwise the base is degraded by an apyrase provided in the system. (d) Structurespecific cleavage. Three probes, one invader probe (below the template) and two allele-specific probes (above the template), are used for the Invader assay. One allele-specific probe and the invader probe are needed to form a bifurcated structure with the template. The 30 base of the invader probe does not have to match the template. If the allele-specific probe matches the template at the target site, a bifurcated structure with one base overlap between the allele-specific probe and the invader probe would be formed. The one base overlap is necessary and sufficient to cause the cleavage of the allele-specific probe. The products of the reaction are the 50 portion of the allelespecific probes. (e) Ligation. Ligation joins two probes together. When the bases on both sides of the ligation site match the template, ligation products are generated, otherwise there are no products. (f) Hybridization. When an allele-specific probe anneals to the target perfectly it is more stable. If there is a mismatch with the template it would be less stable and can be washed away. The results of hybridization are that the matched probe remains hybridized and the mismatched is not. www.nature.com/tpj SNP genotyping X Chen and PF Sullivan 82 Like SBE, pyrosequencing uses the synthesis activity of DNA polymerases and it can generate highly accurate genotypes. Since dNTPs are introduced in the system one by one, each addition of a base to the primer produces an amount of pyrophosphate directly proportional to the amount of templates, which dictate what and how many bases are incorporated onto the primer. The signal producing steps (ATP sulfurylase and luciferase) are also quantitative. All these features render the method quite quantifiable when the template sequence does not contain a homo string of same bases. When it encounters a homo string of 4–5 bases such as AAAA or TTTTT, its signal becomes disproportional and the quantification becomes unreliable. Structure specific cleavage — the Invader assay Some DNA polymerases from archael bacteria have evolved with an endonuclease activity that requires specific substrate structure. The structure is a bifurcated duplex with a free 50 end.44 The key feature of the structure is that there must be at least one base overlapping between the strand that has a free 50 end and the strand that is annealed to the template. The enzyme cleaves the strand with a free 50 end when one or more overlapping bases are detected (Figure 2d). The cleavage site is dependent on the length of overlapping bases, eventually it would cut out all overlapping bases to form a structure that can be readily joined by DNA ligases (so it is believed to be part of the function to repair DNA damage45). There would be no cut if there is no overlap. The Invader assay design uses three oligonucleotide probes, two allele-specific ones and one common (the invasive probe), to create artificial bifurcated structures with the templates at the polymorphic site.10,46 When the appropriate target is presented, one or two of the allele-specific probes would be cleaved. The released 50 end of the allele-specific probes can then form still another bifurcated structure designed with an arbitrary sequence purely to amplify the signal for better detection sensitivity (Invader square). The Invader assay is highly specific and can genotype SNP directly from genomic DNA. It is a useful alternative to the methods that based on PCR amplification. For the Invader assay, the quality of the invasive probe is critical. As stated above, the cleavage site depends on where the 30 of invader probe ends. The 30 of the invader probe has to be on the polymorphic site, one base offset can result in false-positive and false-negative signals. The 50 exonuclease activity–the Taqman assay The TaqMan assay exploited the 50 nuclease activity of DNA polymerases. During DNA synthesis if the extending strand encounters a small piece of DNA that is perfectly matched with the template the polymerase cleaves it. In normal DNA synthesis, this activity is believed to be responsible for removing Okazaki fragments. If the small piece of DNA is not annealed to the template with sufficient stability (ie there are mismatches between the strands), the small piece would be pushed off the template but not cleaved.14 The The Pharmacogenomics Journal TaqMan assay exploits this feature and designs two dually labeled fluorescent probes annealed at the SNP site. Each of the two probes anneals to one allele to create a stable structure that would lead to its degradation by the moving DNA polymerase, but when it anneals to the other allele, it forms a structure that is less stable so that the probe gets pushed off the template without being cleaved. The cleavage of the dually labeled probes changes the status of FRET between the two fluorophores, providing a mechanism for its detection (FRET detection, see below). The TaqMan assay also exploited the difference in stability between the perfectly matched and single-base mismatched duplexes. It is an example where multiple mechanisms for allele discrimination are used. Since the difference between the perfect match and single-base mismatch is subtle, the TaqMan assay normally requires substantial optimization to make it work. Although the use of minor groove binding agents improves the performance of the assay,47,48 probe design remains largely empirical that poses substantial difficulties for new users. Allele Discrimination based on DNA ligases DNA ligases join the ends of two oligonucleotides annealed next to each other on a template. The ligation reaction occurs only if the oligonucleotides are perfectly matched with their templates at the ligation site. Mismatches at either end of the ligation site would result in no ligation products (Figure 2e). Taking advantage of this property, two oligonucleotides can be designed to anneal to both sides of an SNP. By detecting the formation of ligation products, genotypes of the target SNP can be inferred.15 To use DNA ligases to discriminate the two alleles in an SNP requires three oligonucleotides, one for each allele and one common to both alleles. Although DNA ligase can discriminate mismatches on both sides of the ligation site, it is more sensitive to the mismatches on the 30 end,49–51 so most designs use two allele-specific oligonucleotides on the 30 end and one common oligonucleotide on the 50 end. There was a recent report that DNA ligase could be used similarly for typing variations in RNA molecules.52 Ligation is a different biochemical reaction than PCR, this makes it possible to combine the two reactions together and to execute them in parallel. There are some recent developments that take use of DNA ligase to create allele-specific, circular-single-stranded probes (padlock probes), which are then amplified by rolling circle amplification.52–55 Although these methods are not yet available commercially, they have potentials to be developed into homogeneous and isothermal technologies that can overcome the barrier of PCR amplification. Allele Discrimination based on Hybridization DNA hybridization is based on the intrinsic property of DNA molecules that they form duplex structures if complementary strands are present. Under many conditions, the two strands forming a duplex structure do not have to be 100% complementary. This is very important as more mismatches between the two strands make the duplex structure less SNP genotyping X Chen and PF Sullivan 83 stable thermodynamically. A less stable duplex is easier to disrupt (denature) when it is heated. The temperature that disrupts 50% of duplex DNA molecules is defined as melting temperature (Tm) for that DNA molecule. SNP genotyping by hybridization exploits the subtle difference in thermodynamics between a perfectly matched and a single-base mismatched duplexes. This method designs two allelespecific probes to hybridize to the target; each probe forms a perfect duplex with one allele but forms a mismatched duplex with the other allele. The ways to differentiate the two duplexes form the basis of detection. This can be accomplished via steady-state or dynamic hybridization. DNA hybridization—steady-state measurement For SNP genotyping, DNA duplexes formed by the two alleles differ only by one base pair. The difference in thermodynamics between the two duplexes is subtle in most cases and its detection requires sophisticated design and instrumentation. One way to overcome this weakness is redundant design, which uses multiple probes for each allele.27,56 A consensus signal from those probes designed for the same allele is used to score the genotypes. The locked nucleic acid (LNA), a new analog of nucleotide, has been reported to increase the Tm of hybridization probes and to help discriminating matched and mismatched probes.57,58 For steady-state measurement, DNA hybridization is normally performed under optimal conditions that probes anneal only to their corresponding allele but not to the other allele. When the reaction reaches steady state, unhybridized probes are washed away, Figure 2f. Genotypes are scored based on the probes that remain hybridized. In traditional hybridization, targets are anchored on solid support and probes are labeled and put in solution. In order to assay a large number of SNPs with microarrays the process is now reversed, probes are anchored on specific addresses of a microarray and targets are labeled and put in solution. Labeling of target DNA is commonly done in conjunction with PCR amplification by using fluorescence-labeled dNTPs. Genotypes are scored by detecting fluorescent intensities at specific addresses that hold allele-specific probes for each SNP.26,27 probes cheaper. It relies on the use of duplex-dependent fluorescence, a property of some fluorescent dyes that emit only if they are intercalated into a DNA duplex. As temperature is gradually ramped up to denature the duplexes a steep decrease of fluorescent intensity would be observed when temperature reaches the Tm, indicating that the duplexes are melting into single-stranded form. Dynamic measurement is a real-time process that demands sophisticated temperature control coupled with data acquisition. This limits its utility for some applications. The allele-specific reactions described above can be performed in aqueous solution32,38,39,60–63 or anchored to a solid support.36,37 The choice of a particular reaction format depends on the characteristics of the technology and throughput expectations. MECHANISMS FOR PRODUCT DETECTION AND IDENTIFICATION Detection mechanisms for allele-specific products vary greatly, from fluorescence and luminescence reading to mass and electric charge measurement. The mechanisms roughly fall into two categories: homogeneous detection and solid-phase-mediated detection. Homogeneous detection is more amenable to automation as it does not require purification or separation after the allele discrimination reactions. However, because the products are not separated or purified, homogeneous methods have limited capacity for multiplexing. Another feature of all homogeneous methods is that they are collective measurements that depend on the relative amounts of reaction substrates and products. These methods detect the products only if the products become dominant. If the substrates remain dominant, these methods would not be sensitive enough to identify the products even if they are formed in the reactions. In other words these methods are more prone to false negatives. In contrast, solid-phase-mediated methods— because of their separation and purification steps—offer superior detection sensitivity and greater multiplexing capacity. Detection formats have significant impact on throughput and cost and are one of the important factors when considering SNP genotyping technologies. Dynamic allele-specific hybridization—DASH techniques In contrast to the end point measurement of steady state, dynamic hybridization measures denature kinetics of DNA duplexes formed by allele-specific probes and their targets.59 Since duplexes formed by the two alleles have different thermodynamics, they would have different Tm. In the process, the duplexes are heated gradually over a range of temperatures that cover the Tm of both alleles while multiple data points of fluorescence intensity are collected. The data create melting curves that are characteristic of each allele. Tm profiling characterizes the hybridization behavior of allele-specific probes better than a single-point measurement. The measurement of Tm does not require modification or fluorescence labeling of allele-specific probes, this makes the Homogenous Detection Mechanisms Fluorescence resonance energy transfer In recent years, there has been a great increase in the applications of fluorescence in the field of DNA testing or genotyping, of which FRET is one of the most popular approaches.62 FRET is a physical phenomenon that occurs between two fluorescent groups when they are in physical proximity and one fluorophore’s emission spectrum (the donor) overlaps the other’s (the acceptor) excitation spectrum. When FRET occurs the donor emission is quenched and the acceptor emission increases when the donor is excited (stoke shift), Figure 3a. Both the quenching of donor emission13,14,64,65 and the increasing of acceptor emission66–68 can be used to monitor FRET. www.nature.com/tpj SNP genotyping X Chen and PF Sullivan 84 For SNP genotyping, there are two different strategies that FRET can be used for detection. The first is based on the separation of two closely spaced fluorescent groups.13,14,38,69 The separation interrupts the energy transfer or greatly reduces its efficiency. The second is to stitch two fluorescent groups together from distant spaces to promote energy transfer between the two fluorophores.15,70 To apply FRET The Pharmacogenomics Journal principle for SNP typing, the reactions can be devised so that the formation of allele-specific products is correlated to the occurrence or disappearance of FRET phenomenon. Once a direct correlation is established between the reactions and FRET, monitoring of FRET during the allele discrimination reactions can be used to infer the formation of allele-specific products and thus to identify the genotypes of a sample. SNP genotyping X Chen and PF Sullivan 85 FRET has been successfully used to detect allele-specific products from SBE,63 ASE,38,71 DNA hybridization,13,14,72 DNA ligation15 and structure specific cleavage.10 It is the most popular mechanism for homogeneous detection. Several commercially available SNP genotyping technologies, such as the TaqMan 50 nuclease assay (http://www.appliedbiosystems.com), the Molecular Beacons (http:// www.molecular-beacons.org/), the Invader assay (http:// www.twt.com), the scorpion AS-PCR assay (http:// www.dxsgenotyping.com/technology_main.htm), and the AS-PCR assay (http://www.intergenco.com/snp.html) all use FRET as the detection method. Fluorescent polarization FP is another property that has been widely exploited for biological applications, although its application for genotyping was relatively recent event.35,73 When a fluorophore is excited by plane-polarized light, its emission remains polarized if the molecule is still and the angle between the exciting plane and the emitting plane is a function of the mass of the molecule when other parameters are kept constant. In reality, emissions are depolarized even if the molecules are excited with polarized light because the fluorophore molecules are in constant motion and molecular motion cannot be synchronized. What observed is a collective measurement of a population of molecules. The degree of depolarization is inversely proportional to the 3 Figure 3 Product detection by fluorescence. (a) FRET. FRET occurs when two fluorophores are in close proximity and one fluorophore’s (the donor) emission overlaps the other’s (the acceptor) excitation spectrum. When FRET occurs the donor’s emission decreases (quenched) whereas the acceptor’s emission increased (stock shift). FRET decays dramatically when the distance between the fluorophores increases, because it is inversely proportional to the sixth power (1/106) to the distance. Both the quenching of donor emission and the increasing of acceptor emission can be used to monitor FRET. There are many designs to use FRET for product detection, of which the TaqMan 50 nuclease assay, shown here, is one of them. Others are described in the text. (b) FP. FP is proportional to molecular volume when pressure, viscosity and other parameters are constant. If the size or molecular volume of the fluorophore changes the FP property of the fluorophore will change. In the SBE reaction shown here, the molecular volume of the dye terminators increases when the terminators are incorporated onto primers, this will lead to an increase of FP values of the fluorophores linked to the terminators. (c) Fluorescence intensity detection. Fluorescence intensities of allele-specific products can be measured directly if the products are separated or purified. Microarrays are one of the many commonly used media to sort and separate the products. A microarray, as shown, contains thousands or more addresses, each of them is anchored with a predefined DNA sequence. For each SNP, a sequence complementary to one of the predefined sequences is used to tag the SBE primer. SBE reactions for many SNPs can be pooled and products are hybridized to the microarray. Different SNPs are sorted by these anchored sequences to unique addresses. The measurement of fluorescence intensities at an address identifies genotypes for an SNP. mass of these molecules. Applying this principle to allele discrimination reactions, FP values (measurements of the degree of polarization) change as the reactions proceed as the substrates are turned into products. In the case of SBE, for example, the FP value of the fluorescent dye linked to a terminator increases (becomes less depolarized) as more dye molecules are converted from nucleotides into larger oligonucleotide molecules (20–30 times bigger for commonly used primers consisting of 20–30 bases), Figure 3b. Thus, the measurement of the FP value serves as an index for the allele discrimination reactions, which depend on the genotypes of DNA samples. Many factors can influence FP measurement, including temperature, viscosity and nonspecific interactions between the fluorescent dye and other buffer components or surfaces of the reaction vessel. So, it is important to maintain constant environments between experiments and samples. Deviation from these constants can lead to false-positive or false-negative results. The FP principle has been successfully applied to SBE,35 DNA hybridization,58,73 the Invader assay 61 and the TaqMan 50 -nuclease assay.74 Solid-phase-mediated Detection Methods The use of solid support in the detection step could potentially increase throughput and reduce cost significantly because it enables massive parallel detection and analysis. A common use of solid supporter is to provide a vehicle to sort and purify allele-specific products so that the detection of these products can be more sensitive and efficient. DNA microarrays and fluorescence coded microbeads are commonly used supporters and DNA hybridization is the mechanism to sort allele-specific products. When a product is purified and/or sorted to a supporter or an address, the measurement of fluorescence intensities and colors at the supporter or array address can be used for the identification of the product. Identification of products by mass spectrometry Mass spectrometry measures precise molecular weight of a molecule. Since allelic discrimination reactions always change the weight of a molecule, mass spectrometry can be used to identify products from a reaction.12,16–21,75 Among the four bases that form DNA molecules, the smallest difference is between the A and the T bases (about 9 Da) and discriminating an A/T polymorphism can be difficult. A modification of SBE using tagged ddNTPs could make the difference easier to detect.76 Another approach is to use a combination of ddNTPs and dNTPs for SBE reaction. The combination of the ddNTPs and dNTPs is made such that one allele stops at the polymorphic site, the other allele stops at one or few bases downstream. This makes the products different in size and, therefore, much easier to differentiate.77 MS detection is capable of multiplex. A simple way to multiplex is to use tagged extension primers. Extension primers with different tags shift their masses to different ranges, thus multiple SNPs can be detected simultaneously.21 Higher throughput is the selling point www.nature.com/tpj SNP genotyping X Chen and PF Sullivan 86 for mass spectrometry. To make the approach viable 3–5 multiplexing is expected otherwise its lengthy protocol would be inefficient. It is worth pointing out that MALDI-TOF MS is one of the best quantification methods for allele frequency estimation. As the throughput of individual sample genotyping imposes a serious limit for many studies, more and more people are looking for ways to reduce the number of genotyping. Genotyping pooled samples becomes attractive. For that purpose, evaluation of allele frequency estimation for different technologies becomes necessary. Multiple recent studies conclude that SBE combined with MS detection achieves highly accurate frequency estimates for pooled samples, the difference between the pooled and individually genotyped frequencies is 1–2%.78–82 Microarray as supporter for genotyping The most important feature of microarray is its high density. In a small area (1–1.5 cm2) hundreds of thousands of predefined sequences can be arrayed. These predefined sequences sort products from allele-specific reactions via DNA hybridization. When allele-specific reactions are coupled with fluorescence labeling, either on the primers or on the nucleotides, simple measurement of fluorescence intensities at addresses of a microarray identifies the products from the allele-specific reactions. Products from SBE, DNA ligation and structure-specific reactions can be engineered to use microarray supporter. A common practice is to attach a short sequence to the primers involved in the allele-specific reactions. The attached sequence is complementary to one of the sequences predefined in the array. For each marker there is a unique sequence. These marker-specific sequences or tags sort products from the allele-specific reactions to markerspecific addresses on the array. A leader in DNA chip manufacture, Affymetrix Inc. (http://www.affymetrix.com/ index.affx), has developed a universal DNA chip (GenFlext Tag Array) that features 2000 unique DNA sequences. To use this universal DNA chip one can, in principle, score 2000 SNPs with one hybridization given that one can do 2000 allele-specific reactions efficiently and effectively.23 Another way to use a microarray for genotyping is to array the entire target sequences on the chip using multiple overlapping oligonucleotides. Several large-scale SNP genotyping reports used this strategy. In essence, this approach resequences the target by hybridization.26,27,83 When the entire sequence of the target is resequenced, genotypes become clear. An advantage of this approach is that novel SNPs can be discovered and genotyped in the same process. The major limitation of this approach is its cost—all reports that use this approach came from high profile groups with industry sponsorship. The resequencing approach also suffers from low accuracy (80–90%) and it is normally not appropriate for linkage disequilibrium and association studies. The microarray approach is capable of high throughput, but it is limited to those applications that use relatively The Pharmacogenomics Journal small numbers of subjects and large numbers of SNPs. When a large number of subjects is used it is difficult to accommodate multiple subjects on a single array. Another practical barrier for using microarray for genotyping is how to generate efficiently a large number of allele-specific products before hybridizing to the microarray. Fluorescence coded microbeads as supporters Similar to DNA chips, well-characterized and unique DNA sequences could be attached to microbeads that feature unique fluorescence signatures.22,24,25,84 After allele-specific reactions products are hybridized to the microbeads. The hybridization links the products of an SNP to microbeads with a unique signature. The identification of the microbeads by their unique signatures could lead to the identification of allele-specific products hybridized to them, therefore, the identification of genotypes for the SNPs. Like microarray, color-coded microbeads can process a large number of SNP markers in parallel. In addition to that microbeads offer more flexibility that is not easily implemented in DNA chips. This is because each microbead is independent of other microbeads and can be manufactured and used separately. When 10 or 100 SNPs are processed, 10 or 100 types of microbeads are mixed with the products while 10 or 100 SNPs only use a small fraction of the features encoded in a DNA chip. The flexibility of color-coded microbeads offers more competitive cost than that of DNA chips. Separation of products by electrophoresis Electrophoresis has long been used to separate and identify DNA molecules in applications like DNA sequencing. In electrophoresis, DNA molecules are separated by size. For allele discrimination, the formation of allele-specific products always results in changes in probe sizes. When the products are labeled with fluorescence tags they can be readily separated and identified by electrophoresis.83,85 Tedious protocols and relatively high costs are main obstacles against the use of electrophoresis for genotyping. The implementation of capillary electrophoresis has reduced the labor required. However, until more SNPs can be packed into one lane/capillary the cost will remain a substantial barrier for broad use of electrophoresis for SNP genotyping. There are some commercial products, including the SNaPshot from Applied BioSystems and the SNuPe from Amershan Biosciences (http://www.amershambiosciences.com), which are developed primarily for SNP genotyping by using DNA sequencers. These products are convenient for those users who have access to sequencers. COST STRUCTURE OF SNP GENOTYPING The cost of genotyping has roughly three general components in addition to technician’s time: (i) Instrument and initial setup costs; (ii) fixed cost per marker and (iii) consumable costs proportional to the amount of usage. Technician time depends on the protocols. Protocols that SNP genotyping X Chen and PF Sullivan 87 are amenable to automation could potentially save greatly on labor cost, and most homogenous techniques have this potential. Instrumentation and initial setup costs are normally high for most technologies and the range is quite wide, somewhere from $25 000 to over $1 000 000. Since many techniques use fluorescence to label allele-specific products, they need instruments for fluorescence detection. This is by far the largest group of instruments used for SNP genotyping, including fluorescence microplate readers, microarray scanners, DNA sequencers and real-time fluorescence detection systems. The selection of instrument normally depends on the nature of the technology selected, which in turn depends on the specific application and throughput expectations. Dedicated systems usually are easier to learn and to use, and have better technical support and higher throughput. They may also possess automation features and include scoring algorithms/software. In contrast, a versatile system could potentially be shared by multiple groups and applications, but for each application the users have to spend substantial time and effort to make it work. The high cost in instrument investment makes it a significant factor when considering a new technique. Fixed costs for each SNP are specific to each technology. These include the number of oligonucleotides used for each SNP and special modification and handling of these oligonucleotides. For the number of oligonucleotides used, AS-PCR and primer extension methods are most efficient, using three oligonucleotides. AS-PCR uses all three oligonucleotides for PCR, while primer extension methods use two for PCR and one for extension. Hybridization methods normally use four oligonucleotides, one for each allele, and two for PCR. Some designs use more oligonucleotides for each allele to increase specificity and signal/noise ratio. Ligation based methods use three oligonucleotides in addition to two for PCR. Finally, the Invader technology uses 5–7 oligonucleotides depending on whether PCR is used or not. Of these oligonucleotides, three are specific for a given marker and are responsible for generating primary signal. Two oligonucleotides, one for each allele, are used for secondary signal amplification and these two oligonucleotides are universal for all markers. If PCR is used, two additional primers are needed. The more the oligonucleotides used for an SNP, the higher the fixed cost. However, if the number of samples to be genotyped for an SNP is sufficiently large, fixed cost for oligonucleotides becomes negligible. Another fixed cost item is the use of specialty reagents and processes, including special purification and/or modification of oligonucleotides. For example, mass spectrometry requires HPLC purification for extension primers; because of its high sensitivity, 1 and 2 oligonucleotides resulted from standard oligonucleotide synthesis could cause problems. Pyrosequencing uses a biotin-labeled primer to facilitate template preparation. The synthesis and purification of biotinylated oligonucleotides are much more expensive than regular oligonucleotide. The TaqMan 50 nuclease assay and the Molecular Beacons use two doubly-labeled fluorescence probes that are quite expensive ($200–400/ probe, compared to $20–25/probe for regular oligonucleotides). Many other methods use fluorescence labeling for oligonucleotides or terminators. Specialty reagents are also used to modify the oligonucleotide backbone to facilitate purification and separation of either PCR or extension products.12 These specialty reagents normally are specific to one or a few methods, and can increase cost remarkably. The requirement for specialty reagents is a potential weakness. Consumable costs cover enzymes and reagents used for the reactions, plastics for sample transfer and clean-up and microarrays and microbeads for sorting and purification, etc. Some of these items are specific to a method while others are common to many. For examples, the Invader technology and pyrosequencing use special enzymes for their allelic discrimination and detection reactions, but almost all SBE-based methods use dye-labeled ddNTPs and a special Taq DNA polymerase86 that incorporates the ddNTPs much more efficiently. Most vendors compare costs based on consumable costs proportional to the volume of use. However, such limited comparisons are inappropriate, and often misleading because consumables do not include fixed costs, the complexity of protocol, the hands-on time of technician and initial instrument investment. A fair comparison, we believe, would be the number of genotypes accomplished by an employee with a fixed budget and time. Applications and throughput structures should also be taken into account (see below). When these apply to users, they may have different options and need to consider existing instruments, the expertise and experience of their employees and find out if that are compatible with the new technique. As lack of expertise is a common cause seen for the failure to implement new technologies, appropriate training of employees should also be included in the overall cost. Understanding the protocols and cost structures of the technologies provides a base to compare and contrast the capacities and costs of different technologies, and to determine which technology might be most suitable for a given application. Generally speaking, prices for the technologies currently available on the market vary significantly, between $0.50–2.00/genotype for consumable reagents. At the low end of the price range, the template-directed dyeterminator incorporation assay with fluorescence detection (FP-TDI, see below) is roughly $0.50/genotype for reagents by list price. Some other methods, because of their requirement for special reagents and/or clean-up procedures, the price is over $2.00/genotype (pyrosequencing). MALDI-TOF mass spectrometry is about $0.75 without multiplexing. For the array-based systems, Affymetrix’s GenFlex costs about $500–650/chip, microbeads from Illumina (http://www.illumina.com/prod_genotyping.htm) need specialized instrument to read (the system costs more than $1 million), its contract service costs $0.15–0.2/ genotype with a starting job of $100 000. www.nature.com/tpj SNP genotyping X Chen and PF Sullivan 88 TECHNOLOGY, APPLICATION AND THROUGHPUT STRUCTURE Each technology, because of its protocol design and detection format, has certain constraints on throughput. From the throughput point of view, PCR amplification is a critical and limiting step. When PCR is used to amplify a single SNP, over 85% of SNPs will be successfully amplified and for each SNP 95% or more samples will be scored for genotypes. This seems acceptable for most applications. However, amplifying one SNP at a time would significantly limit the throughput and increase the cost. Some technologies hope to increase their throughput by multiplexing PCR. But there are problems. When multiple SNPs are amplified together, only 50–70% of the SNPs can be amplified successfully. Even for those SNPs that are amplified, the amount of products varies greatly, somewhere between 10 and 1000 folds. What makes the scenario even worse is that the calling rate for samples decreases dramatically—some SNPs that are scored for sample A may not be scored for sample B or C. As a result, many samples will not have genotype data. For applications that could not tolerate missing data, the strategy to use multiplex PCR to increase throughput would not work. The limitation imposed by PCR capacity applies to almost all technologies except the Invader technology. Even for the Invader technology, limited PCRs are still necessary for many SNPs. However the impacts of PCR limitation vary among different technologies. For most homogenous detection formats like FRET and FP, because their detections have very limited capacity of multiplex, they may not need to rely on multiplexing PCR to increase their throughput. Constraints imposed by multiplexing, therefore, are not important for these techniques. For other formats like mass spectrometry, electrophoresis and microarray hybridization, the capacity of PCR multiplex would greatly increase their throughput. In other words, these techniques are much more dependent on PCR multiplexing than others to reach their throughput potential. The cost structure has significant impact on throughput. For technologies with high fixed cost/SNP, the efficient way to increase throughput is to increase sample size such that the fixed cost can be shared by many samples. This requirement would limit the applications to which these technologies can apply. The TaqMan 50 nuclease assay, the Molecular Beacons and the Invader assay belong to this group. Microarrays have different cost structures that are conditioned on the effective usage of the features in the arrays. To reach the potential throughput, all addresses of an array should be used, otherwise it would be inefficient or too expensive. This approach requires relatively large and fixed number of SNPs because the physical addresses available of the arrays constrain flexibility of the number of SNPs. Different applications have different throughput structures. For example, in the clinical/diagnostic setting, relatively few SNP markers are to be genotyped, but the sample size is potentially very large. Under these conditions, the fixed cost and optimization for markers becomes trivial, because these costs could potentially be shared by many The Pharmacogenomics Journal samples. Methods like the TaqMan 50 nuclease assay,87–89 the Molecular Beacons90 and the Invader technology91 that have relatively high fixed cost and/or require extensive optimization fit the clinical/diagnostic setting very well. Other applications use large numbers of SNPs but relatively few samples. Here a microarray approach may be more reasonable.27,92 Finally, there are applications that require a relatively large number of both SNPs and samples (eg, both in the thousands or more). Examples include gene mapping studies by linkage or linkage disequilibrium for complex diseases and traits93 and pharmacogenetics.94. Neither of the above approaches is suited to these applications. This group of applications demands the most cost effective, high throughput and flexible technologies because for these applications the SNPs used, the number of SNPs and sample sizes vary greatly from study to study. Owing to the differences in requirements, the limitation of throughput is not exactly the same thing across applications. For some of these applications, a throughput of thousands of genotypes (samples markers) per day would be sufficient, and most techniques described in the section on popular methods are capable of delivering this throughput. Projects with genotyping needs from 104–105 genotypes would be realistic and could proceed with these technologies. Such applications include small association studies and fine mapping of chromosome loci for disease-susceptible genes. Projects that require 107 or more genotypes (ie whole genome linkage disequilibrium/association studies) would not be feasible without significant technological advances. CRITERIA FOR SELECTION OF TECHNOLOGIES With good understanding of protocols, allele discriminations, detection formats, cost and throughput structures, we would now like to discuss how to evaluate and select technologies that are appropriate for given applications. It should be kept in mind that high throughput SNP typing is a rapidly changing field and many competitive and compelling methods are under development and can be developed into mature and commercial products. The best technologies today are likely to become outdated in near future. General criteria for selecting a technology are accuracy/ sensitivity, success rates, flexibility, cost and throughput. Except for clinical applications, accuracy and sensitivity seldom are an issue, as the accuracy and sensitivity of all commercially available techniques have been tested and verified by many independent groups. For example, there are many reports for SBE with MS detection,12,16,17,20,76,95 SBE with microarray23 and microbeads96–99 detection, the FP-TDI,100–102 the TaqMan assay,64,88,103 pyrosequencing,104–106 the Invader assay46,107–109 and ligation-based methods.53,84,110,111 The issue becomes more salient when users intend to try out newly developed but not widely used methods. In that context, parallel and systematic tests with established methods should be performed to estimate the accuracy and sensitivity. Expected accuracy should be equal or better than 99%. For clinical applications each test, not just the technology, should go through clinical trials and be SNP genotyping X Chen and PF Sullivan 89 approved by appropriate authorities. The accuracy and sensitivity must meet the standards before a test is approved. Success rates are important indexes for high throughput applications. There are two aspects of success rates. One is the percentage of markers that can be genotyped by the technology. The acceptable rate is 80–85% or better. Factors that contribute to the success rate of markers are the nature of polymorphisms, allele discrimination and detection methods. Some mismatches are known to be difficult for certain allele discrimination, for example, G/T mismatch is difficult to be distinguished by hybridization and ASE,40 others may be difficult for a detection method, such the A/T polymorphism for MALDI-TOF MS detection.82,112 Local sequences where the target polymorphisms located could limit the choices of primer design this could result in nonspecific reactions or failures. AS-PCR, the Invader assay and the TaqMan 50 nuclease assay are more vulnerable than other methods because they all rely on the immediate local sequences to design allele-specific primers. Protocols that use two levels of specificity, namely one for target amplification and the other for allele discrimination, would minimize the risk of nonspecific reactions. The other success rate is the percentage of samples that can be genotyped, which is also referred to as calling rate or scoring rate. Despite the fact that this measurement is dependent on the quality of DNA samples, it relates to technologies for their tolerance of sample quality. It is an important criterion for high throughput applications because sample quality is likely to be different and in some cases it is difficult to control for large-scale, multigroup studies. If a method is too sensitive to sample quality, it can cause substantial missing data, which is a major concern for statistical analyses. A preferable rate would be 95% or better. From the user’s perspective, flexibility is an important issue. Users want a technology that can be used for multiple applications, and that is easy to learn, to design and to scaleup. Flexibility demands a method that is equally efficient and cost-effective whether it is applied to applications that require a small number of markers and a large number of samples or to applications that require a large number of markers and a small number of samples. Flexibility also means that the method can be scaled up with similar efficiency and cost and without further major investment. The importance of cost and throughput is obvious. For large-scale applications the cost affects the overall budget and throughput determines the time table. Only if both the cost and throughput are reasonable can these applications become viable. POPULAR METHODS In this section, we compare and contrast popular methods that have been used in both industry and academic laboratories. We do not intend to cover all techniques, more comprehensive coverage is available elsewhere.4–6 The methods listed here were selected based on their simplicity of principle and protocol, availability of commercial support and flexibility of setting up to small- and median-sized laboratories. At the end of this section, we summarize and compare the features of these methods in Table 1. We hope, through the discussion, that readers could have a balanced view of the techniques and make appropriate decisions for their applications. SBE with MS Detection SBE is a robust reaction and there are many platforms for products detection. The use of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDITOF MS) is one of the platforms promoted by several companies (Sequenome Inc., http://www.sequenom.com/ genotyping/overview/oview.html, MWG Biotech http:// www.mwgbiotech.com/, and Bruker Doltonics, http:// www.bdal.de/genolink_components.html). In regard to protocol, all begin with PCR amplification, follow with the clean-up step and the SBE reaction. After SBE reactions, the samples have to be purified again and to be mixed with special organic reagents (matrix) and spotted onto a supporter for MALDI-TOF MS analysis. The differences between these companies are the ways they prepare DNA templates and perform SBE reactions. The variations include modifications made to either or both PCR and extension primers, the use of tagged ddNTPs or combinations of ddNTPs and dNTPs, and proprietary matrices that help crystallization of products for mass detection. These modifications can facilitate product purifications and increase multiplex capacity. For example, the use of a thiol backbone in one PCR primer can help preparing single-stranded DNA templates for primer extension reaction,12 and the use of mass-code modification for extension primers21 or ddNTPs76 could shift the mass of products to a different range so multiple extension products can be resolved. SBE is flexible and can be adapted to many different applications. SBE has very high success rate (495%). MS detection is highly sensitive, quantitative and capable of multiplexing. SBE with MS detection could deliver 10 000– 50 000 genotypes/day, with a consumable cost of $0.15– 0.75/genotype. To achieve this level of throughput, the method relies on multiplex PCR which requires substantial optimization and remains largely empirical. Currently, the multiplex level is 3–5 . The higher throughput of MS approach makes it more attractive to industry. Another benefit of MS detection is its quantification. Among the technologies tested for allele frequency estimation MS detection appears to be one of the bests.78–82 This feature becomes more and more important when people realize that the expected throughput of individual genotyping may not be easily materialized in near future. In addition to genotyping, the same equipment can be used for other high throughput applications such as proteomics and metabolite screen.113 This feature is very attractive for those users who work in both fields of functional genomics and proteomics. MS detection is capable of high throughput. To reach the potential multiplex PCR is a necessity, but many users may find it a significant challenge. The relatively higher cost of initial equipment setup, additional sample handling after SBE reactions and requirement of special handling of PCR www.nature.com/tpj SNP genotyping X Chen and PF Sullivan 90 Table 1 A summary of popular methods Method Strength SBE with MS detection High success rate; highly accurate; quantitative; capable of multiplexing SBE with fluorescence intensity detection High success rate; capable of multiplexing The FP-TDI Relatively low instrument cost; simple detection; easy to automate and scale-up Simple protocol; real time or end point scoring; easy to automate and scale-up The TaqMan 5’ nuclease assay Pyrosequencing Providing sequence context surrounding the polymorphism; real-time scoring; quantitative The invader assay No PCR for genotyping; simple protocol; isothermal reaction Weakness Relatively higher instrument cost and longer protocol; problematic multiplex PCR Relatively higher instrument cost and longer protocol; problematic multiplex PCR Limited throughput Very high fixed cost for each SNP, limited throughput; requires optimizations for each SNP Lengthy protocol; relatively expensive instrument and consumable cost; limited throughput Relatively large amount of genomic DNA; requires optimizations for each SNP; the key enzyme not available for general research use and/or extension primers are potential weakness of MS methods. SBE with Fluorescence Detection—the SNP Stream Genotyping System The SNP stream system is a method based on SBE with solidphase-mediated detection marketed originally by Orchid BioSciences.114 At the time this article was written, the business was sold to Beckman Coulter (http://www.beckman.com/). While those systems sold are still supported by Orchid BioSciences, the system will be marketed by Beckman Coulter. It relies on multiplex PCR to amplify targets from several SNPs and after the clean-up of PCR products, uniquely tagged extension primers for each of the SNPs are used to perform SBE with fluorescence dye labeled terminators. The unique tag for each SNP serves as a sorting mechanism to associate the extension products from the SNP to a specific supporter. The supporter can be a well in a microplate, an address in a microarray, or a specific microbead. SBE reactions label the products with allelespecific fluorescence and they are now ready to hybridize to The Pharmacogenomics Journal Cost, throughput and application Reference $0.15–0.75/genotype; 10–50k genotypes/day; most applications (12, 16–20, 76, 78–82, 95, 119, 120, 129, 130) $0.2–0.8/genotype; 20–100k genotypes/day; most applications (99, 114) $0.4–0.6/genotype; 1–5k genotypes/day; applications with o1 million genotypes Cost highly dependent on sample size; 1–5k genotypes/day; suitable for applications with fixed SNPs and large sample sizes $1–2/genotype; 1–2k genotypes/day; suitable for applications with pooled samples (32, 35, 100, 125) No cost information for general research; 1–5k genotypes/day; clinical applications (46, 61, 107–109, 133) (14, 64, 74, 80) (43, 80, 103, 105, 116, 117, 122, 131, 132) the supporters. A washing step is necessary to remove nonspecific hybridization and unreacted terminators. Fluorescence signals can be read out by appropriate systems, which, depending on the supporters used, can be a fluorescence plate reader, a flow cytometry, a microarray scanner or other imaging systems. Genotypes are scored by the fluorescence intensities and their ratios. The key feature of the system is the tagged extension primers. The tags can be made as a universal set. Based on the use of the tags, Orchid BioSciences recently developed ‘arrays in array’ microplates where each well of the plate is a small array with 16 addresses. This feature enables it to score up to 12 SNPs in a well. The system comes with software to design primers and to score genotypes, and some models also provide automation tools. The system, which costs between $250 000 and 500 000, can deliver 20 000–100 000 genotypes/day, and like mass spectrometry detection, its throughput is largely dependent on the successful implementation of multiplex PCR. Running costs are slightly higher than that of the MS detection because of the use of fluorescence terminators and tag arrays. The SNP genotyping X Chen and PF Sullivan 91 implementation of multiplex PCR, therefore, is critical to throughput and cost, without which it is arguably inferior to FP-TDI method which offers simpler protocol and detection. Fluorescence Polarization detection for Template-directed Dye-terminator Incorporation Assay This method uses SBE for allelic discrimination and fluorescence polarization for products detection.32,35 The method is marketed by Perkin-Elmer corporation (http:// www.nen.com/licensing/genotype.htm).The protocol consists of three sequential reactions—PCR, enzymatic clean-up or SAP/exonuclease I digestion and SBE. The reaction is normally performed on a black, skirted PCR plate, in either 96- or 384-well format. When one step is finished, only a single addition of a reaction mixture to the previous reaction is needed. There is no sample transferring and no purification between reactions. When all reactions are completed, the reaction plate can be loaded directly onto a FP reader and genotypes are scored for all samples in the plate in a few minutes. The protocol can be easily automated with simple liquid handling equipment. Like SBE with MS detection and the SNP stream system, FP-TDI is flexible and has high success rate. It can be adapted for many applications that require moderate amount of genotyping for 102–103 markers and 102–103 samples. The main difference between FP-TDI and the other two methods mentioned above is that FP-TDI does not multiplex PCR and it scores genotypes directly after SBE reaction without any manipulation, the other two methods require multiplexing PCR and need further processing after SBE reaction. If multiplexing PCR is not successful, the two methods would have similar throughput as FP-TDI, which is about 1000– 5000 genotypes/day. For FP-TDI the entire protocol can be viewed as a module, the best way to increase throughput is to increase the number of modules, which is essentially the capacity of singular PCR. The consumable cost is about $0.40–0.60/genotype. Initial instrument cost is relatively low compared with other methods, starting at $25 000. The TaqMan 50 Nuclease Assay The 50 nuclease assay is an elegant assay. It is a closed tube, single-step assay, and can score genotypes in real time or at the end of reaction. It combines target DNA amplification with allele discrimination in a single reaction. The allele discrimination relies on the hybridization of two doubly labeled allele-specific fluorescence probes to the target polymorphisms.14 When the allele-specific probe matches the target perfectly, they form a stable duplex structure. During DNA synthesis (PCR), the moving DNA polymerase will cleave (digest) the doubly labeled allele-specific probe. The cleavage of the probe changes the FRET status between the two fluorophores. This change of FRET can be monitored in real time or as end point assay. If the probe has one base mismatch (at the polymorphic site) with the target, the duplex structure formed would not be as stable as duplex formed by perfectly matched probe. When the DNA polymerase encounters such easily disrupted duplex, it does not cleave the probe, instead, it pushes the probe off the template. This action does not change the FRET status between the two fluorophores so no signals would be observed during the reaction. Since the difference between the two alleles is determined by the single-base mismatch, the design of the allele-specific probes is critical for the assay. Despite that the vendor PE Applied Biosystems worked out some rules to guide the design, it is still largely empirical and each new SNP has to go through substantial testing and optimization. The use of a minor groove binding agent47 has greatly improved its performance, and makes its optimization much easier. The assay is currently marketed by PE Applied BioSystems using FRET detection format. An FP detection version was also reported.74 From protocol point of view, the TaqMan assay is the best among the popular methods: it has only one setup step and does not require any postreaction handling for genotype scoring. Like other homogeneous assays, the TaqMan assay has limited capacity for multiplexing and, therefore, throughput is a bottleneck. The instruments that can be used for the TaqMan assay are available from several manufacturers in either 96- or 384-well plate format. The cost of the instruments is from $50 000 to over $150 000. The throughput capacity is about 1000–3000 genotypes/ day/machine when the assay is run in 384-well format and in real-time mode. When the assay is run in end point mode, which is a little more problematic, the throughput is substantially higher, about 5 000–10000 genotypes/day. The major weakness of the method is its fixed cost. The use of two doubly-labeled fluorescence probes (FRET format) can be very expensive (somewhere between $200 and 400/ probe) when the sample volume is not large enough to absorb the fixed cost. The running cost is cheap, basically the cost of PCR, about $0.15–0.35/genotype. For that reason applications that use large sample volume and fixed SNP markers, like those in clinical and diagnostic settings, could take advantage of the simplicity of its protocol. Pyrosequencing Pyrosequencing detects DNA sequences by synchronized primer extension (Figure 2c). The method is marketed by Pyrosequencing AB (http://www.pyrosequencing.com). The method designs extension primer a few bases upstream from the polymorphic site. During primer extension dNTPs are added one by one as dictated by the target sequences. If the incoming base matches the template the base would be added to the extending primer and a pyrophosphate would be produced. The pyrophosphate then triggers the synthesis of ATP, which in turn is used by a luciferase to produce a chemiluminescence signal.41–43,115 If the base added does not match the template, the primer would not be extended and the dNTP is then degraded into dNMP by an apyrase. Since DNA polymerases do not use dNMP, it would not interfere with subsequent reactions. The clever design of pyrosequencing is its careful balance of the enzyme activities between the DNA polymerases and the apyrases. Under the assay conditions, DNA polymerases are faster www.nature.com/tpj SNP genotyping X Chen and PF Sullivan 92 than the apyrases, so the polymerases always process the incoming nucleotide first, but this process is conditioned on the sequences of the template. So if the nucleotide matches the template, it will be incorporated onto the primer by the polymerases, otherwise, it will be left to the apyrases, which are phosphodiesterases that do not discriminate among the four bases and do not produce pyrophosphate. From the description of the mechanism, it would be understood that the amount of dNTPs added and the time window between two consecutive additions are important, they have to be taken into account the balance of the enzymes that compete to use them. Pyrosequencing is a unique technique that provides local sequences surrounding the polymorphic sites. It is a method other than DNA sequencing that can provide sequence contexts of target sites. The design of probes is straightforward, and the measurements are quantitative.116,117 However, the assay is an enzymatic cascade involving multiple enzymes which increase its cost and complexity of the protocol. Because not all enzymes involved are thermostable, heat cannot be used to denature the templates so the assay is run at room temperature. This requires preparation of single-stranded templates, a process that is time consuming and tedious. The low reaction temperature also increases the chances of nonspecific priming of the primers and the templates. To facilitate single-stranded template preparation the current protocol uses a biotinylated PCR primer, which is significantly more expensive than regular primers. The throughput is limited by PCR capacity and template preparation. The pyrosequencer supports 96-well plate format, the real-time detection takes about 20–30 min for a 96-well plate. The running cost of the methods is about $1.0–2.0/genotype and the instrument costs about $100 000. Invader Technology This technology is marketed by the Third Wave Technologies, Inc. (http://www.twt.com) The Invader technology relies on a structure-specific endonuclease activity found in DNA polymerases of some archael bacteria,45,118 and it claims to be the only technology that does not require PCR amplification. This feature is attractive for clinical applications because PCR license fees for clinical application are quite expensive. The company markets the core technology, that is, the biochemistry. The detection comes with multiple formats, products can be detected by gel electrophoresis, FRET,91 FP61 and MALDI-TOF MS.119,120 The company does not market a detection platform, but any equipment that is capable of gel electrophoresis, FRET, FP or MS detection can be used to score the results. The protocol for the Invader assay is simple. After an optional PCR amplification, a primary reaction mixture containing the two allele-specific probes and the invader probe is added to PCR products or genomic DNA and the samples are denatured and reannealed. Then a secondary reaction mixture containing the DNA polymerase and the secondary, signal amplifying universal oligos is added and the samples are incubated at a constant temperature The Pharmacogenomics Journal (60–701C) for 2–4 h. After that the reactions would be ready for gel, FRET or FP detection. For MS detection, the samples have to be purified and mixed with a matrix, then spotted on a support for measurement. The capability to genotype directly from genomic DNA is attractive for some applications that do not want to use PCR. When the assay is combined with limited PCR, it can be very robust and powerful. A recent report,121 that combines multiplex PCR and the Invader assay accomplished impressive throughput, seems support this view. In the report, 100 SNPs were multiplexed in PCR and followed with the Invader assay. The authors reported that they could score 96–98 SNPs out of the 100 SNPs that were multiplexed. As calculated, this could reach a throughput of 300–400k/day. The authors did not elaborate the 100 multiplex PCR. Since the samples were used for the Invader assay, we should not read too much into the capability of PCR multiplex, rather we would consider it a supporting evidence of superior sensitivity of the Invader technology. This is because the Invader technique is capable of genotyping directly from the genomic level, a slightly increase of the targeted fragments over other genomic region, say 10–100s folds, would significantly improve the performance of the assay. A multiplex PCR that produces products with a million-fold difference in concentration would be considered a failure, but that is sufficient to improve the performance of the Invader assay. The company currently supports only clinical applications and limited customer assay development. This excludes the technology from general research use because the company does not sell the key enzyme—cleavase separately. The requirements of many oligonucleotides for each SNP and special quality of the invader probe increase its fixed cost and make it problematic for high throughput applications. DISCUSSION The technologies for SNP genotyping have been improved rapidly in the last few years. Merging new technologies pose challenge for researchers who are not directly involved in technology development but need to use the technologies for their researches. In this article, we reviewed current technologies, focusing on biochemistries for allelic discrimination, mechanisms for product identification and detection, and structures of cost and throughput. By considering the principles and the strength and weakness of the technologies, we hope to provide essential information for selection of technologies appropriate for given applications. Technologies are tools to serve applications. Since researchers have different objectives for their applications appropriate genotyping methods may differ. In essence, all technologies can genotype accurately and effectively, the more salient differences among the technologies are instrument investment, cost and throughput structure and time and effort needed to train new users. In considering new technologies and platforms, we should first focus on the SNP genotyping X Chen and PF Sullivan 93 requirements of applications. For applications that require a few millions of genotypes the methods currently available are very limited, these are SBE with MS and fluorescence detection. Both methods have flexible design and can accommodate applications with large number of samples or large number of markers. Both methods are accurate and have good success rate. As discussed above, although the methods are capable of delivering such throughput, the actual throughput accomplished will be determined by the capability to perform multiplex PCR, which varies greatly among different research groups. The use of microarrays could potentially deliver such throughput, but this is restricted to the applications that use relatively smaller sample size. For applications that require less than a million genotypes, in addition to the two methods mentioned above, the FP-TDI, the TaqMan assay and pyrosequencing are competent for the job. While all of the second tier methods have good tracking records in terms of accuracy and success rate each of them has its unique features that attract certain researchers. FP-TDI has a straightforward design, low instrument cost and fixed cost, but its protocol is not as elegant as the TaqMan assay and its quantification feathers are not as good as that of pyrosequencing. For the Taqman assay, its protocol could be considered the best, but at the expense of higher instrument and fixed costs. While pyrosequencing has a lengthy protocol and expensive running cost, it is one of the bests (the other one is MS detection) for allele frequency quantification,80,116,117 so it can be justified when pooled samples are used for genotyping. The quantification feature of the technologies merits some discussions. As more and more researchers evaluate pooled samples genotyping for association and linkage disequilibrium studies, the quantification feature of the technologies is likely to become an important factor for consideration of new technologies. Of the technologies available today, there are many studies that aim at evaluating the accuracy of allele frequency determination have been published. Examples, in addition to MS detection and pyrosequencing mentioned above, include AS-PCR,123 the TaqMan 50 nuclease assay,80 SBE with DHPLC analysis,122 and SBE with electrophoresis,80,124 and the FP-TDI assay.125 For most technologies, the allele frequencies estimated from pooled samples deviate 1–5% from that of individually typed samples. The implication of this error rate to a study depends on study design, the risk factor of the disease and pooling strategies. Some researchers are optimistic about genotyping pooled samples.126–128 In addition to instrument, cost and throughput, time and efforts needed to train new users should also be considered. From this point of view the experience and expertise of trainees and the technical support from the company are important. It is not uncommon for the same technique to have different outcomes when it is implemented by different researchers. Under some circumstances, hiring a new person with appropriate expertise is more efficient than training existing staff. Adequate and timely supports from the vendor (eg, hands-on demonstration and assistance with trouble-shooting protocols) can make the learning and implementation easier and more effective. In conclusion, understanding of the biochemistries and protocols of genotyping helps users to identify techniques that are appropriate for them. While there may be several techniques that can suit the applications, the selection of a technique has to weigh factors of instrument, cost, throughput, technical support and staff expertise. A successful implementation of a new technology needs combined efforts from the user and the vendor. DUALITY OF INTEREST XC is one of the inventors of the FP-TDI technique and has a small interest from sales of the technique by Perkin-Elmer Corporation. ABBREVIATIONS SNP SBE ASE AS-PCR FRET FP MALDI MS single nucleotide polymorphism single base extension allele-specific extension allele-specific PCR fluorescence resonance energy transfer fluorescence polarization matrix-assisted laser desorption/ionization mass spectrometry REFERENCES 1 Kruglyak L. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 1999; 22: 139–144. 2 Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001; 409: 928–933. 3 Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG et al. The sequence of the human genome. Science 2001; 291: 1304–1351. 4 Syvanen AC. Accessing genetic variation: genotyping single nucleotide polymorphisms. Nat Rev Genet 2001; 2: 930–942. 5 Kwok PY. Methods for genotyping single nucleotide polymorphisms. Annu Rev Genom Hum Genet 2001; 2: 235–258. 6 Shi MM. Enabling large-scale pharmacogenetic studies by highthroughput mutation detection and genotyping technologies. Clin Chem 2001; 47: 164–172. 7 Kirk BW, Feinsod M, Favis R, Kliman RM, Barany F. Single nucleotide polymorphism seeking long term association with complex disease. Nucleic Acids Res 2002; 30: 3295–3311. 8 Qi X, Bakht S, Devos KM, Gale MD, Osbourn A. L-RCA (ligation-rolling circle amplification): a general method for genotyping of single nucleotide polymorphisms (SNPs). Nucleic Acids Res 2001; 29: E116. 9 Wu DY, Wallace RB. The ligation amplification reaction (LAR)— amplification of specific DNA sequences using sequential rounds of template-dependent ligation. Genomics 1989; 4: 560–569. 10 Fors L, Lieder KW, Vavra SH, Kwiatkowski RW. Large-scale SNP scoring from unamplified genomic DNA. Pharmacogenomics 2000; 1: 219–229. 11 Nikiforov TT, Rendle RB, Goelet P, Rogers YH, Kotewicz ML, Anderson S et al. Genetic Bit Analysis: a solid phase method for typing single nucleotide polymorphisms. Nucleic Acids Res 1994; 22: 4167–4175. 12 Sauer S, Lechner D, Berlin K, Plancon C, Heuermann A, Lehrach H et al. Full flexibility genotyping of single nucleotide polymorphisms by the GOOD assay. Nucleic Acids Res 2000; 28: E100. 13 Tyagi S, Bratu DP, Kramer FR. Multicolor molecular beacons for allele discrimination. Nat Biotechnol 1998; 16: 49–53. 14 Livak KJ. Allelic discrimination using fluorogenic probes and the 50 nuclease assay. Genet Anal 1999; 14: 143–149. 15 Chen X, Livak KJ, Kwok PY. A homogeneous, ligase-mediated DNA diagnostic test. Genome Res 1998; 8: 549–556. www.nature.com/tpj SNP genotyping X Chen and PF Sullivan 94 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 Bray MS, Boerwinkle E, Doris PA. High-throughput multiplex SNP genotyping with MALDI-TOF mass spectrometry: practice, problems and promise. Hum Mutat 2001; 17: 296–304. Sun X, Ding H, Hung K, Guo B. A new MALDI-TOF based minisequencing assay for genotyping of SNPS. Nucleic Acids Res 2000; 28: E68. Abdi F, Bradbury EM, Doggett N, Chen X. Rapid characterization of DNA oligomers and genotyping of single nucleotide polymorphism using nucleotide-specific mass tags. Nucleic Acids Res 2001; 29: E61–E61. Chen X, Fei Z, Smith LM, Bradbury EM, Majidi V. Stable-isotopeassisted MALDI-TOF mass spectrometry for accurate determination of nucleotide compositions of PCR products. Anal Chem 1999; 71: 3118–3125. Griffin TJ, Smith LM. Single-nucleotide polymorphism analysis by MALDI-TOF mass spectrometry. Trends Biotechnol 2000; 18: 77–84. Ross P, Hall L, Smirnov I, Haff L. High level multiplex genotyping by MALDI-TOF mass spectrometry. Nat Biotechnol 1998; 16: 1347–1351. Ye F, Li MS, Taylor JD, Nguyen Q, Colton HM, Casey WM et al. Fluorescent microsphere-based readout technology for multiplexed human single nucleotide polymorphism analysis and bacterial identification. Hum Mutat 2001; 17: 305–316. Fan JB, Chen X, Halushka MK, Berno A, Huang X, Ryder T et al. Parallel genotyping of human SNPs using generic high-density oligonucleotide tag arrays. Genome Res 2000; 10: 853–860. Ferguson JA, Steemers FJ, Walt DR. High-density fiber-optic DNA random microsphere array. Anal Chem 2000; 72: 5618–5624. Walt DR. Techview: molecular biology. Bead-based fiber-optic arrays. Science 2000; 287: 451–452. Hacia JG, Fan JB, Ryder O, Jin L, Edgemon K, Ghandour G et al. Determination of ancestral alleles for human single-nucleotide polymorphisms using high-density oligonucleotide arrays. Nat Genet 1999; 22: 164–167. Wang DG, Fan JB, Siao CJ, Berno A, Young P, Sapolsky R et al. Largescale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 1998; 280: 1077 –1082. Gilles PN, Wu DJ, Foster CB, Dillon PJ, Chanock SJ. Single nucleotide polymorphic discrimination by an electronic dot blot assay on semiconductor microchips. Nat Biotechnol 1999; 17: 365–370. Liu Q, Sommer SS. Pyrophosphorolysis-activatable oligonucleotides may facilitate detection of rare alleles, mutation scanning and analysis of chromatin structures. Nucleic Acids Res 2002; 30: 598–604. Liu Q, Sommer SS. Pyrophosphorolysis-activated polymerization (PAP): application to allele-specific amplification. Biotechniques 2000; 29: 1072–1080. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 1977; 74: 5463–5467. Hsu TM, Chen X, Duan S, Miller RD, Kwok PY. Universal SNP genotyping assay with fluorescence polarization detection. Biotechniques 2001; 31: 560–568. Syvanen AC. Solid-phase minisequencing as a tool to detect DNA polymorphism. Methods Mol Biol 1998; 98: 291–298. Pastinen T, Partanen J, Syvanen AC. Multiplex, fluorescent, solid-phase minisequencing for efficient screening of DNA sequence variation. Clin Chem 1996; 42: 1391–1397. Chen X, Levine L, Kwok PY. Fluorescence polarization in homogeneous nucleic acid analysis. Genome Res 1999; 9: 492–498. Lindroos K, Liljedahl U, Raitio M, Syvanen AC. Minisequencing on oligonucleotide microarrays: comparison of immobilisation chemistries. Nucleic Acids Res 2001; 29: E69. Pastinen T, Raitio M, Lindroos K, Tainola P, Peltonen L, Syvanen AC. A system for specific, high-throughput genotyping by allelespecific primer extension on microarrays. Genome Res 2000; 10: 1031–1042. Myakishev MV, Khripin Y, Hu S, Hamer DH. High-throughput SNP genotyping by allele-specific PCR with universal energy-transferlabeled primers. Genome Res 2001; 11: 163–169. Germer S, Higuchi R. Single-tube genotyping without oligonucleotide probes. Genome Res 1999; 9: 72–78. The Pharmacogenomics Journal 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 Ayyadevara S, Thaden JJ, Shmookler Reis RJ. Discrimination of primer 30 -nucleotide mismatch by taq DNA polymerase during polymerase chain reaction. Anal Biochem 2000; 284: 11–18. Ronaghi M, Karamohamed S, Pettersson B, Uhlen M, Nyren P. Realtime DNA sequencing using detection of pyrophosphate release. Anal Biochem 1996; 242: 84–89. Ronaghi M, Uhlen M, Nyren P. A sequencing method based on realtime pyrophosphate. Science 1998; 281: 363–365. Ronaghi M. Pyrosequencing sheds light on DNA sequencing. Genome Res 2001; 11: 3–11. Dong F, Allawi HT, Anderson T, Neri BP, Lyamichev VI. Secondary structure prediction and structure-specific sequence analysis of singlestranded DNA. Nucleic Acids Res 2001; 29: 3248–3257. Kaiser MW, Lyamicheva N, Ma W, Miller C, Neri B, Fors L et al. A comparison of eubacterial and archaeal structure-specific 50 -exonucleases. J Biol Chem 1999; 274: 21387–21394. Mein CA, Barratt BJ, Dunn MG, Siegmund T , Smith AN, Esposito L et al. Evaluation of single nucleotide polymorphism typing with invader on PCR amplicons and its automation. Genome Res 2000; 10: 330–343. Walburger DK, Afonina IA, Wydro R. An improved real time PCR method for simultaneous detection of C282Y and H63D mutations in the HFE gene associated with hereditary hemochromatosis. Mutat Res 2001; 432: 69–78. de Kok JB, Wiegerinck ET, Giesendorf BA, Swinkels DW. Rapid genotyping of single nucleotide polymorphisms using novel minor groove binding DNA oligonucleotides (MGB probes). Hum Mutat 2002; 19: 554–559. Tong J, Barany F, Cao W. Ligation reaction specificities of an NAD(+)dependent DNA ligase from the hyperthermophile Aquifex aeolicus. Nucleic Acids Res 2000; 28: 1447–1454. Tong J, Cao W, Barany F. Biochemical properties of a high fidelity DNA ligase from Thermus species AK16D. Nucleic Acids Res 1999; 27: 788–794. Sriskanda V, Shuman S. Specificity and fidelity of strand joining by Chlorella virus DNA ligase. Nucleic Acids Res 1998; 26: 3536–3541. Nilsson M, Antson DO, Barbany G, Landegren U. RNA-templated DNA ligation for transcript analysis. Nucleic Acids Res 2001; 29: 578–581. Pickering J, Bamford A, Godbole V, Briggs J, Scozzafava G, Roe P et al. Integration of DNA ligation and rolling circle amplification for the homogeneous, end-point detection of single nucleotide polymorphisms. Nucleic Acids Res 2002; 30: E60. Thomas DC, Nardone GA, Randall SK. Amplification of padlock probes for DNA diagnostics by cascade rolling circle amplification or the polymerase chain reaction. Arch Pathol Lab Med 1999; 123: 1170–1176. Nilsson M, Barbany G, Antson DO, Gertow K, Landegren U. Enhanced detection and distinction of RNA by enzymatic probe ligation. Nat Biotechnol 2000; 18: 791–793. Cutler DJ, Zwick ME, Carrasquillo MM, Yohn CT, Tobin KP, Kashuk C et al. High-throughput variation detection and genotyping using microarrays. Genome Res 2001; 11: 1913–1925. Jacobsen N, Bentzen J, Meldgaard M, Jakobsen MH, Fenger M, Kauppinen S et al. LNA-enhanced detection of single nucleotide polymorphisms in the apolipoprotein E. Nucleic Acids Res 2002; 30: E100. Simeonov A, Nikiforov TT. Single nucleotide polymorphism genotyping using short, fluorescently labeled locked nucleic acid (LNA) probes and fluorescence polarization detection. Nucleic Acids Res 2002; 30: E91. Prince JA, Feuk L, Howell WM, Jobs M, Emahazion T, Blennow K et al. Robust and accurate single nucleotide polymorphism genotyping by dynamic allele-specific hybridization (DASH): design criteria and assay validation. Genome Res 2001; 11: 152–162. Chen X, Kwok PY. Homogeneous genotyping assays for single nucleotide polymorphisms with fluorescence resonance energy transfer detection. Genet Anal 1999; 14: 157–163. Hsu TM, Law SM, Duan S, Neri BP, Kwok PY. Genotyping singlenucleotide polymorphisms by the invader assay with dual-color fluorescence polarization detection. Clin Chem 2001; 47: 1373–1377. De Angelis DA. Why FRET over genomics? Physiol Genom 1999; 1: 93–99. SNP genotyping X Chen and PF Sullivan 95 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 Chen X, Zehnbauer B, Gnirke A, Kwok PY. Fluorescence energy transfer detection as a homogeneous DNA diagnostic method. Proc Natl Acad Sci USA 1997; 94: 10756–10761. Tapp I, Malmberg L, Rennel E, Wik M, Syvanen AC. Homogeneous scoring of single-nucleotide polymorphisms: comparison of the 50 nuclease TaqMan assay and Molecular Beacon probes. Biotechniques 2000; 28: 732–738. Dubertret B, Calame M, Libchaber AJ. Single-mismatch detection using gold-quenched fluorescent oligonucleotides. Nat Biotechnol 2001; 19: 365–370. Tong AK, Li Z, Jones GS, Russo JJ, Ju J. Combinatorial fluorescence energy transfer tags for multiplex biological assays. Nat Biotechnol 2001; 19: 756–759. Ju J, Glazer AN, Mathies RA. Energy transfer primers: a new fluorescence labeling paradigm for DNA sequencing and analysis. Nat Med 1996; 2: 246–249. Ju J, Ruan C, Fuller CW, Glazer AN, Mathies RA. Fluorescence energy transfer dye-labeled primers for DNA sequencing and analysis. Proc Natl Acad Sci USA 1995; 92: 4347–4351. Wilkins SP, Hall JG, Lyamichev V, Neri BP, Lu M, Wang L et al. Analysis of single nucleotide polymorphisms with solid phase invasive cleavage reactions. Nucleic Acids Res 2001; 29: E77. Chen X, Kwok PY. Template-directed dye-terminator incorporation (TDI) assay: a homogeneous DNA diagnostic method based on fluorescence resonance energy transfer. Nucleic Acids Res 1997; 25: 347–353. Whitcombe D, Theaker J, Guy SP, Brown T, Little S. Detection of PCR products using self-probing amplicons and fluorescence. Nat Biotechnol 1999; 17: 804–807. Solinas A, Brown LJ, McKeen C, Mellor JM,Nicol J, Thelwell N et al. Duplex Scorpion primers in SNP analysis and FRET applications. Nucleic Acids Res 2001; 29: E96. Gibson NJ, Gillard HL, Whitcombe D, Ferrie RM, Newton CR, Little S. A homogeneous method for genotyping with fluorescence polarization. Clin Chem 1997; 43: 1336–1341. Latif S, Bauer-Sardina I, Ranade K, Livak KJ, Kwok PY. Fluorescence polarization in homogeneous nucleic acid analysis II: 50 -nuclease assay. Genome Res 2001; 11: 436–440. Pusch W, Wurmbach JH, Thiele H, Kostrzewa M. MALDI-TOF mass spectrometry-based SNP genotyping. Pharmacogenomics 2002; 3: 537–548. Fei Z, Ono T, Smith LM. MALDI-TOF mass spectrometric typing of single nucleotide polymorphisms with mass-tagged ddNTPs. Nucleic Acids Res 1998; 26: 2827–2828. Pusch W, Kraeuter KO, Froehlich T, Stalgies Y, Kostrzewa M. Genotools SNP manager: a new software for automated high-throughput MALDITOF mass spectrometry SNP genotyping. Biotechniques 2001; 30: 210–215. Werner M, Sych M, Herbon N, Illig T, Konig IR, Wjst M. Large-scale determination of SNP allele frequencies in DNA pools using MALDITOF mass spectrometry. Hum Mutat 2002; 20: 57–64. Ross P, Hall L, Haff LA. Quantitative approach to single-nucleotide polymorphism analysis using MALDI-TOF mass spectrometry. Biotechniques 2000; 29: 620–629. Shifman S, Pisante-Shalom A, Yakir B, Darvasi A. Quantitative technologies for allele frequency estimation of SNPs in DNA pools. Mol Cell Probes 2002; 16: 429–434. Buetow KH, Edmonson M, MacDonald R, Clifford R, Yip P, Kelley J et al. High-throughput development and characterization of a genomewide collection of gene-based single nucleotide polymorphism markers by chip-based matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Proc Natl Acad Sci USA 2001; 98: 581–584. Mohlke KL, Erdos MR, Scott LJ, Fingerlin TE, Jackson AU, Silander K et al. High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools. Proc Natl Acad Sci USA 2002; 99: 16928–16933. Lindblad-Toh K, Winchester E, Daly MJ, Wang DG, Hirschhorn JN, Laviolette JP et al. Large-scale discovery and genotyping of singlenucleotide polymorphisms in the mouse. Nat Genet 2000; 24: 381–386. Iannone MA, Taylor JD, Chen J, Li MS, Rivers P, Slentz-Kesler KA et al. Multiplexed single nucleotide polymorphism genotyping by 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 oligonucleotide ligation and flow cytometry. Cytometry 2000; 39: 131–140. Medintz I, Wong WW, Berti L, Shiow L, Tom J, Scherer J et al. Highperformance multiplex SNP analysis of three hemochromatosis-related mutations with capillary array electrophoresis microplates. Genome Res 2001; 11: 413–421. Tabor S, Richardson CC. A single residue in DNA polymerases of the Escherichia coli DNA polymerase I family is critical for distinguishing between deoxy- and dideoxyribonucleotides. Proc Natl Acad Sci USA 1995; 92: 6339–6343. Ward CP, Fensom AH, Green PM. Biallelic discrimination assays for the three common Ashkenazi Jewish mutations and a common non-Jewish mutation, in Tay–Sachs disease, using fluorogenic TaqMan probes. Genet Test 2000; 4: 351–358. Restagno G, Gomez AM, Sbaiz L, De Gobbi M, Roetto A, Bertino E et al. A pilot C282Y hemochromatosis screening in Italian newborns by TaqMan technology. Genet Test 2000; 4: 177–181. Matsubara Y, Fujii K, Rinaldo P, Narisawa K. A fluorogenic allelespecific amplification method for DNA-based screening for inherited metabolic disorders. Acta Paediatr Suppl 1999; 88: 65–68. Giesendorf BA, Vet JA, Tyagi S, Mensink EJ, Trijbels FJ, Blom HJ. Molecular beacons: a new approach for semiautomated mutation analysis. Clin Chem 1998; 44: 482–486. Ledford M, Friedman KD, Hessner MJ, Moehlenkamp C, Williams TM, Larson RS. A multi-site study for detection of the factor V (Leiden) mutation from genomic DNA using a homogeneous invader microtiter plate fluorescence resonance energy transfer (FRET) assay. J Mol Diagn 2000; 2: 97–104. Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR et al. Blocks of limited haplotype diversity revealed by highresolution scanning of human chromosome 21. Science 2001; 294: 1719–1723. Kwok PY. GENOMICS: genetic association by whole-genome analysis? Science 2001; 294: 1669–1670. Kwok PY. High-throughput genotyping assay approaches. Pharmacogenomics 2000; 1: 95–100. Paracchini S, Arredi B, Chalk R, Tyler-Smith C. Hierarchical highthroughput SNP genotyping of the human Y chromosome using MALDI-TOF mass spectrometry. Nucleic Acids Res 2002; 30: E27. Taylor JD, Briley D, Nguyen Q, Long K, Iannone MA, Li MS et al. Flow cytometric platform for high-throughput single nucleotide polymorphism analysis. Biotechniques 2001; 30: 661–669. Cai H, White PS, Torney D, Deshpande A , Wang Z, Keller RA et al. Flow cytometry-based minisequencing: a new platform for highthroughput single-nucleotide polymorphism scoring. Genomics 2000; 66: 135–143. Chen J, Iannone MA, Li MS, Taylor JD, Rivers P, Nelsen AJ et al. A microsphere-based assay for multiplexed single nucleotide polymorphism analysis using single base chain extension. Genome Res 2000; 10: 549–557. Armstrong B, Stewart M, Mazumder A. Suspension arrays for high throughput, multiplexed single nucleotide polymorphism genotyping. Cytometry 2000; 40: 102–108. Akula N, Chen YS, Hennessy K, Schulze TG, Singh G, McMahon FJ. Utility and accuracy of template-directed dye-terminator incorporation with fluorescence-polarization detection for genotyping single nucleotide polymorphisms. Biotechniques 2002; 32: 1072–1076, 1078. Swan KA, Curtis DE, McKusick KB, Voinov AV, Mapa FA, Cancilla MR. High-throughput gene mapping in Caenorhabditis elegans. Genome Res 2002; 12: 1100–1105. Straub RE, Jiang Y, MacLean CJ, Ma Y, Webb BT, Myakishev MV et al. Genetic variation in the 6p22.3 gene DTNBP1, the human ortholog of the mouse dysbindin gene, is associated with schizophrenia. Am J Hum Genet 2002; 71: 337–348. Nordfors L, Jansson M, Sandberg G, Lavebratt C, Sengul S, Schalling M et al. Large-scale genotyping of single nucleotide polymorphisms by Pyrosequencingtrade mark and validation against the 50 nuclease (Taqman((R))) assay. Hum Mutat 2002; 19: 395–401. Ringquist S, Alexander AM, Rudert WA, Styche A, Trucco M. Pyrosequence-based typing of alleles of the HLA-DQB1 gene. Biotechniques 2002; 33: 166–175. www.nature.com/tpj SNP genotyping X Chen and PF Sullivan 96 105 106 107 108 109 110 111 112 113 114 115 116 117 118 Berg LM, Sanders R, Alderborn A. Pyrosequencing technology and the need for versatile solutions in molecular clinical research. Expert Rev Mol Diagn 2002; 2: 361–369. Gharizadeh B, Kalantari M, Garcia CA, Johansson B, Nyren P. Typing of human papillomavirus by pyrosequencing. Lab Invest 2001; 81: 673–679. Hessner MJ, Friedman KD, Voelkerding KV, Huber S, Ryan D, Nuccie B et al. Multisite study for genotyping of the factor II (prothrombin) G20210A mutation by the invader assay. Clin Chem 2001; 47: 2048–2050. Agarwal P, Oldenburg MC, Czarneski JE, Morse RM, Hameed MR, Cohen S et al. Comparison study for identifying promoter allelic polymorphism in interleukin 10 and tumor necrosis factor alpha genes. Diagn Mol Pathol 2000; 9: 158–164. Ryan D, Nuccie B, Arvan D. Non-PCR-dependent detection of the factor V Leiden mutation from genomic DNA using a homogeneous invader microtiter plate assay. Mol Diagn 1999; 4: 135–144. Baron H, Fung S, Aydin A, Bahring S, Jeschke E, Luft FC et al. Oligonucleotide ligation assay for detection of apolipoprotein E polymorphisms. Clin Chem 1997; 43: 1984–1986. Edelstein RE, Nickerson DA, Tobe VO, Manns-Arcuino LA, Frenkel LM. Oligonucleotide ligation assay for detecting mutations in the human immunodeficiency virus type 1 pol gene that are associated with resistance to zidovudine, didanosine, and lamivudine. J Clin Microbiol 1998; 36: 569–572. Shahgholi M, Garcia BA, Chiu NH, Heaney PJ, Tang K. Sugar additives for MALDI matrices improve signal allowing the smallest nucleotide change (A:T) in a DNA sequence to be resolved. Nucleic Acids Res 2001; 29: E91. Mann M, Hendrickson RC, Pandey A. Analysis of proteins and proteomes by mass spectrometry. Annu Rev Biochem 2001; 70: 437–473. Bell PA, Chaturvedi S, Gelfand CA, Huang CY, Kochersperger M, Kopla R et al. SNPstream UHT: ultra-high throughput SNP genotyping for pharmacogenomics and drug discovery. Biotechniques 2002; 3: 70–77. Nyren P, Karamohamed S, Ronaghi M. Detection of single-base changes using a bioluminometric primer extension assay. Anal Biochem 1997; 244: 367–373. Gruber JD, Colligan PB, Wolford JK. Estimation of single nucleotide polymorphism allele frequency in DNA pools by using pyrosequencing. Hum Genet 2002; 110: 395–401. Wasson J, Skolnick G, Love-Gregory L, Permutt MA. Assessing allele frequencies of single nucleotide polymorphisms in DNA pools by pyrosequencing technology. Biotechniques 2002; 32: 1144–1150. Lyamichev V, Brow MA, Varvel VE, Dahlberg JE. Comparison of the 50 nuclease activities of taq DNA polymerase and its isolated nuclease domain. Proc Natl Acad Sci USA 1999; 96: 6143–6148. The Pharmacogenomics Journal 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 Griffin TJ, Smith LM. Genetic identification by mass spectrometric analysis of single- nucleotide polymorphisms: ternary encoding of genotypes. Anal Chem 2000; 72: 3298–3302. Griffin TJ, Hall JG, Prudent JR, Smith LM. Direct genetic analysis by matrix-assisted laser desorption/ionization mass spectrometry. Proc Natl Acad Sci USA 1999; 96: 6301–6306. Ohnishi Y, Tanaka T, Ozaki K, Yamada R, Suzuki H, Nakamura Y. A high-throughput SNP typing system for genome-wide association studies. J Hum Genet 2001; 46: 471–477. Giordano M, Mellai M, Hoogendoorn B, Momigliano-Richiardi P. Determination of SNP allele frequencies in pooled DNAs by primer extension genotyping and denaturing high-performance liquid chromatography. J Biochem Biophys Methods 2001; 47: 101–110. Germer S, Holland MJ, Higuchi R. High-throughput SNP allelefrequency determination in pooled DNA samples by kinetic PCR. Genome Res 2000; 10: 258–266. Le Hellard S, Ballereau SJ, Visscher PM, Torrance HS, Pinson J, Morris SW et al. SNP genotyping on pooled DNAs: comparison of genotyping technologies and a semi automated method for data storage and analysis. Nucleic Acids Res 2002; 30: E74. Xiao M, Latif SM, Kwok PY. Kinetic FP-TDI assay for SNP allele frequency determination. Biotechniques 2003; 34: 190–197. Pfeiffer RM, Rutter JL, Gail MH, Struewing J, Gastwirth JL. Efficiency of DNA pooling to estimate joint allele frequencies and measure linkage disequilibrium. Genet Epidemiol 2002; 22: 94–102. Barratt BJ, Payne F, Rance HE, Nutland S, Todd JA, Clayton DG. Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design. Ann Hum Genet 2002; 66: 393–405. Bansal A, van den BD, Kammerer S, Honisch C, Adam G, Cantor CR et al. Association testing by DNA pooling: an effective initial screen. Proc Natl Acad Sci USA 2002; 99: 16871–16874. Fei Z, Smith LM. Analysis of single nucleotide polymorphisms by primer extension and matrix-assisted laser desorption/ionization timeof-flight mass spectrometry. Rapid Commun Mass Spectrom 2000; 14: 950–959. Stoerker J, Mayo JD, Tetzlaff CN, Sarracino DA, Schwope I, Richert C. Rapid genotyping by MALDI-monitored nuclease selection from probe libraries. Nat Biotechnol 2000; 18: 1213–1216. Nordstrom T, Alderborn A, Nyren P. Method for one-step preparation of double-stranded DNA template applicable for use with pyrosequencing technology. J Biochem Biophys Methods 2002; 52: 71–82. Nordstrom T, Nourizad K, Ronaghi M, Nyren P. Method enabling pyrosequencing on double-stranded DNA. Anal Biochem 2000; 282: 186–193. Kwiatkowski RW, Lyamichev V, de Arruda M, Neri B. Clinical, genetic, and pharmacogenetic applications of the Invader assay. Mol Diagn 1999; 4: 353–364.
© Copyright 2026 Paperzz