MAGICBOX - Automated primer assay design for haploid, diploid and polyploid species
Jonathan Curry, Markus Kietzmann, Steve Smith and Susan Kirby
P0367
LGC - Genomics Division, Hoddesdon UK; [email protected]

Introduction

With any PCR-based technique, sequence specificity is paramount to obtaining good, accurate and usable data. Accurate PCR primer design is key, and it depends on the state of the genomic reference sequence information available for the target species. This information is needed to select a target sequence that is either totally unique for the length of the primer, or different enough from other similar, non-target sequences that off-target primer binding is negligible.

Where PCR targets are highly similar to other regions of the genome and specificity is an issue, non-specific binding can be reduced further by optimising cycling conditions or buffer components such as Mg2+, or with additives such as DMSO. For large genotyping projects this is impractical.

The uniquely modified Taq polymerase used in our KASP™ genotyping chemistry has a pronounced 3’ complementarity requirement, providing an additional, integral specificity mechanism that adds accuracy to our genotyping reactions. This is the basis for our SNP allelic discrimination assays, where the two allele-specific primers differ only in the SNP allele present at the 3’ end of the primer. We have found that the principle is not confined to the allele-specific detection primers but can also be applied to the reverse common primer, giving an extra level of target specificity that is useful for highly homologous targets. In fact, genome location specificity is driven by the reverse common primer: only a single unique base is required to discriminate between two near-identical loci in any given genome. We call this primer ‘anchoring’.
Workflow – Batch import with BLAST
• Import sequence files containing the variant plus surrounding flanking sequence. These can be submitted in either our standard delimited text format or FASTA
• Select the BLAST database for the appropriate organism, or make a custom database
• Imported sequences are stored by name and submitted to BLAST by the sequence importer
• Reports are passed to the BLAST interpreter and matched to the sequence by name
• Each sequence is analysed by the BLAST interpreter as follows:
»» Compares the query with each returned hit and creates a histogram by scoring matching / mismatching bases along the sequence for each row
»» Preserves the order returned by BLAST and uses this to analyse each row, comparing scores with the query and the first returned hit
»» Screens each column for differences based on the histogram results
»» Alters the score based on differences, including gaps and inserted sequence strings
»» Once unique base(s) are found, the sequence is annotated with our standard chevron submission format <>
»» Multiple identical sequences are reported, as are results where no anchors are found
• The annotated sequence is passed to primer picker and assays are designed.
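The column screen performed by the BLAST interpreter can be illustrated with a minimal sketch. This is not the MAGICBOX implementation. It assumes a query-anchored alignment in which each hit row has one character per query base ("." for identity, "-" for a gap, a space where the hit does not align), that the first returned hit is the target locus itself, and that an anchor is marked by wrapping the unique base in chevrons. All function names are hypothetical, and inserted sequence strings in the hits are not modelled.

```python
from typing import List, Tuple

IDENTITY = "."   # hit base identical to the query
GAP = "-"        # base missing from the hit relative to the query
UNCOVERED = " "  # hit does not align at this column


def mismatch_histogram(query: str, rows: List[str]) -> List[int]:
    """Count, per query column, how many rows differ from the query."""
    counts = [0] * len(query)
    for row in rows:
        for i, ch in enumerate(row[:len(query)]):
            if ch in (IDENTITY, UNCOVERED):
                continue  # identical or unknown: not scored as a difference
            if ch == GAP or ch.upper() != query[i].upper():
                counts[i] += 1
    return counts


def find_anchor_columns(query: str, hits: List[str]) -> List[int]:
    """Return columns where every off-target hit differs from the query."""
    off_target = hits[1:]  # assumption: the first hit is the target itself
    if not off_target:
        return []
    hist = mismatch_histogram(query, off_target)
    return [i for i, n in enumerate(hist) if n == len(off_target)]


def annotate(query: str, anchors: List[int]) -> str:
    """Wrap each anchor base in chevrons (assumed annotation style)."""
    marked = set(anchors)
    return "".join(f"<{b}>" if i in marked else b for i, b in enumerate(query))


def interpret(query: str, hits: List[str]) -> Tuple[str, str]:
    """Classify one query and return (status, possibly annotated sequence)."""
    off_target = hits[1:]
    if not off_target:
        return "unique", query
    full = len(query)
    if any(len(r) >= full and set(r[:full]) == {IDENTITY} for r in off_target):
        return "identical", query  # an indistinguishable second locus exists
    anchors = find_anchor_columns(query, hits)
    if not anchors:
        return "no_anchor", query
    return "anchored", annotate(query, anchors)


if __name__ == "__main__":
    rows = ["........", "...C....", "...C..A.", ".-.G...."]
    print(interpret("ACGTACGT", rows))
```

In this toy alignment only the fourth column differs from the query in every off-target row, so that base is the one reported as an anchor. Positions a hit does not cover are deliberately not counted as differences, which is a conservative choice made for this sketch.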
Workflow – Batch import with prior BLAST submission
• BLAST analysis can be performed prior to sequence submission and design, using any available BLAST database:
»» NCBI
»» Online Resource
»» Output from CLOUD_BLAST implementation
»» Use the “-outfmt 1” BLAST option, or reformat output as query-anchored, showing identities, no gaps in query
»» Private resources: there is no need to disclose large genomic data sets, only the alignment files, which are easily stored and shared
• Files are combined and passed to the BLAST interpreter, and anchor base(s) are annotated
• Assays are designed using primer picker
• This is a simple way of allowing large processing jobs to be performed using off-site public or pay-for computing resources such as Amazon Elastic Compute Cloud (EC2)

Any Genome
• Sequences can be derived from any source and built into a database using the blastdb functions
• Databases are easily created and therefore easily updated
• Create databases with early-stage contig or scaffold assemblies, or map and GBS data, to improve assays
• Run on standard 32- or 64-bit personal computers, or connect to servers containing the data and BLAST executables
• On a 32-bit Windows PC with 4 GB of RAM, large projects can be screened for design prior to development: on average 15 seconds for BLAST analysis and design, or around 1 second for BLAST interpretation.

Plot panel title: Anchored - HCAR2 / rs590447

Figure 2. Example of closely related sequences in human, HCAR2/3. Presenting the correct target sequence to MAGICBOX allows the algorithm to detect the correct sequence and identify anchors. Plot A shows data from the unanchored assay, which detects alleles at both HCAR2 and HCAR3 and so gives data with an incorrect frequency. Plot B shows data when the HCAR3 assay is anchored to the A base (shown at the position below the pink chevrons), which gives the correct genotyping for rs55718746. Plot C shows data from an assay for HCAR2 anchored to the unique G base, which gives the correct genotyping for rs590447.
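The database creation and prior-BLAST steps listed in the workflow above could be scripted along the following lines. This is a sketch under stated assumptions, not the MAGICBOX importer: it assumes the NCBI BLAST+ executables `makeblastdb` and `blastn` are on the PATH, and the helper names and file names are hypothetical. Because the `blastn` executable runs the MegaBLAST task by default, `-task blastn` is passed explicitly to match the poster's stated preference for BLASTN, and `-outfmt 1` requests the query-anchored alignment with identities shown.

```python
import subprocess
from typing import List


def makeblastdb_command(fasta_path: str, db_name: str) -> List[str]:
    """Build the command that turns a nucleotide FASTA file into a BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]


def blastn_command(query_path: str, db_name: str, out_path: str) -> List[str]:
    """Build a BLASTN search writing query-anchored output (-outfmt 1)."""
    return [
        "blastn",
        "-task", "blastn",   # override the default megablast task
        "-query", query_path,
        "-db", db_name,
        "-outfmt", "1",      # query-anchored, showing identities
        "-out", out_path,
    ]


def run(command: List[str]) -> None:
    """Run one command and raise if BLAST+ reports an error."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file names; printed rather than run so the sketch
    # works without a BLAST+ installation.
    print(" ".join(makeblastdb_command("contigs.fasta", "my_genome")))
    print(" ".join(blastn_command("snps.fasta", "my_genome", "snps.aln")))
```

Only the resulting alignment file needs to be passed on for interpretation, so the search itself can run wherever the genomic data is held.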
In order to find unique bases for anchoring, a screen of available genomic sequence resources is required using BLASTN (rather than MegaBLAST (1) or BLAT (2), which are too stringent). The key is to find as much divergent but potentially similar sequence as possible, in order to find base(s) unique to the target locus and so enhance target amplification specificity. Performing this manually is hugely beneficial but time consuming, especially given the large numbers of markers we design for and the diversity of organisms we genotype. An automated software solution exists for hexaploid wheat, but it uses database files that are not easily updated (3). To overcome this we have created MAGICBOX, an automated pipeline extension for Kraken™, our LIMS assay design and genotyping software, to perform and interpret BLAST analysis for any available genome. It communicates directly with easily installed BLAST executables and databases, and can also accept publicly or privately obtained BLAST alignment files. Here we outline the MAGICBOX pipeline and how it is accessed, illustrating the technology with assay data from different organisms to highlight its effectiveness.

Plot panel titles: Unanchored; Anchored - HCAR3 / rs55718746 (Figure 2); Anchored - 2BS (Figure 3)

Figure 3. An example from hexaploid wheat showing homology on the same chromosome and across other genomes. The target SNP is BS00003365 on 2BS. Plot A shows data from the unanchored assay, which produces off-target amplification. Plot B shows data produced by the anchored assay after processing through MAGICBOX.

Conclusions

We designed and created a simple way of performing automated analysis for the design of target-specific KASP assays. Normally this would need to be done manually in order to ensure specificity when ordering assays for use by customers in their own lab, or for running in our genotyping service laboratories.
BLAST, with its flexibility, its familiarity to the whole genomics community, its ubiquity and its large, easily accessible data, made the choice very simple. The MAGICBOX system is part of Kraken but will be accessible using the universal currency of BLAST output and a sequence file (either text or FASTA). As publicly available BLAST resources can be used to improve assays, local installations of the software and database are not necessary to submit sequence for design. Updating the databases is simple; they can be created or downloaded as needed. Because the BLAST interpreter recognises the format rather than a specific file type, problems of software updating and obsolescence have been minimised.

The power of identifying and using unique bases to design and anchor primers was illustrated with before and after data. We chose examples from diploid human and hexaploid wheat to illustrate that the algorithm can be applied to any organism that has sequence information. The specificity of KASP is again highlighted by the selection of different loci in the human example, where the genotyping plots show the change in frequency for their different positions. For wheat, one of the most important aspects is being able to select the correct target locus and discriminate against highly similar homeologues. The above example is one of a set of 65 assays we assessed using MAGICBOX. To expand on the in silico to in vitro translation of MAGICBOX, we present further data in poster P0368 (to your right).

Figure 1: MAGICBOX example workflow (panels Ai, Aii, B and C)

Figure 1 illustrates a typical three-stage workflow, which includes an importer for either sequence alone, or sequence plus BLAST analysis output (Ai and Aii).
• BLAST importer collates all the data into the correct format and runs the BLAST command line scripts.
• These can be in any location, either locally on a PC or on a networked server.
• BLAST interpreter matches the sequence to the alignment, reads the alignment to identify any unique bases for anchoring, and annotates using our <> convention (B).
• The sequence is then annotated to identify the base(s) that are potential anchoring points (C). This is passed to our primer picker software in Kraken for assay design.

1. J. Curry, M. Kietzmann et al. MAGIC BOX – Automated Primer Assay Design for Haploid, Diploid and Polyploid Species. Poster P0367. PAG 2016.
2. Mol Breeding (2009) 23:13-22. doi: 10.1007/s11032-008-9209-z
3. Mol Genet Genomics. 2015 Apr;290(2):531-44. doi: 10.1007/s00438-014-0933-2

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording or any retrieval system, without the written permission of the copyright holder. © LGC Limited, 2016. All rights reserved. FSo/0116