Variant Detection Using Partek Genomics Suite™ 6.6

Overview

Partek Genomics Suite™ software can detect variations at single-base resolution in next generation sequencing data. The single nucleotide variation (SNV) detection tools are available for all of Partek Genomics Suite’s next generation sequencing (NGS) workflows, including RNA-Seq, ChIP-Seq, and DNA-Seq.

This tutorial will illustrate how to:
• Import NGS data sets
• Perform QA/QC
• Detect SNVs
• Filter the detected SNVs
• Create a custom annotation database
• Annotate detected SNVs with known SNPs
• Annotate detected SNVs with functional effects
• Visualize the SNVs

Note: The workflow described below is enabled in Partek Genomics Suite software version 6.6. Please contact the licensing team at [email protected] to request this version. The screenshots shown below may vary across platforms and across different versions of Partek Genomics Suite software.

Description of the Tutorial Data

In this tutorial, you will perform SNV analysis of an exome sequencing dataset using Partek Genomics Suite’s DNA-Seq workflow. The data used in this tutorial were downloaded from the NCBI SRA website (SRA Study ID: SRP007386). This was the first exome sequencing study aimed at understanding the genetic characteristics of well-differentiated papillary mesothelioma of the peritoneum (WDPMP). The study contains three samples: WDPMP tissue and blood from the same patient, and a WDPMP-derived cell line. Two samples were chosen for this tutorial: WDPMP tissue (SRR305173) and peripheral blood (SRR305174), both harvested from the same patient.

The data were downloaded from the NCBI SRA website in .sra format and converted to .fastq files. The reads were then aligned to hg19 using Partek® Flow™ with BWA as the aligner (default settings). For the purpose of this tutorial, we have extracted only the reads aligned to chromosome 20, but the principles discussed here apply to the whole genome as well.
Partek User’s Guide: Variant Detection

Note: Partek Genomics Suite software can import only aligned .bam/.sam files. If you have data in .fasta or .fastq format, you need to align your files first. You can contact us ([email protected]) for further information.

Importing NGS Data Sets

To start variant detection, use the workflow selector to invoke the DNA-Seq workflow, which provides a step-by-step guide through the analysis (Figure 1).

Figure 1: Overview of the DNA-Seq workflow in Partek Genomics Suite

To import the aligned reads in .bam/.sam format, select Import and manage samples under the Import section of the workflow (Figure 2).

Figure 2: Sequence Import dialog: browse to the files you would like to import

1. Files of type: select BAM Files (*.bam) from the drop-down list.
2. Browse… to the folder where you have stored the .bam files. Select the files to import by checking the box to the left of each data file. For this tutorial, select chr20-SRR305173.bam and chr20-SRR305174.bam.
3. Select OK and the next Sequence Import window will open as shown in Figure 3.

Figure 3: Sequence Import dialog: selecting Output file and directory, Species, and the genome build (Genome/Transcriptome reference used to align the reads)

4. Configure the Sequence Import dialog as follows:
   a. Output File: provide a name for the output spreadsheet file; by default the spreadsheet is named after the data being imported. Use the Browse… button if you want to change the output directory.
   b. Species: since the tutorial data is human, select Homo sapiens.
   c. Genome Build: select the genome build that your data is aligned to. For the tutorial data, choose hg19.
   d. Select OK; this will open the Bam Sample Manager dialog (Figure 4).

Figure 4: Bam Sample Manager dialog

The Bam Sample Manager window shows the files that are to be imported.
In this tutorial, the individual file names are SRA IDs; assigning shorter, more informative names will lead to clearer labels and legends later in the workflow.

5. To change the names, select Manage samples to invoke the Assign files to samples dialog (Figure 5). The path to each file is shown, and the Sample ID is the filename by default.
6. Change the first sample, chr20-SRR305173, to Tumour and the second, chr20-SRR305174, to Normal.

Figure 5: Assign files to samples dialog

If you have data from one experimental condition that is split into two or more .bam files, you can use Manage samples to assign the files to the same sample. You can also add or remove files as needed. If the files are being imported for the first time, they have to be sorted to enable quick visualization and data analysis.

7. Select OK to go back to the Bam Sample Manager dialog.
8. Select Close to close the Bam Sample Manager dialog and import the data.

The imported data will appear in a spreadsheet (Figure 6). Each imported sample is listed in a row, with the number of aligned reads displayed for that run.

Figure 6: Imported data in a spreadsheet

If the data set has replicates or sample groups, sample attributes can be added for grouping. To do so, select Add sample attributes from the workflow. Please refer to other online tutorials (such as the Down Syndrome Study tutorial) for more information on Add sample attributes.

The Sample ID column is particularly important when you want to integrate data from different workflows (i.e. different assays). The Sample ID column should contain a unique identifier that serves as a bridge between the datasets, and the identifiers must be identical in each spreadsheet. In this tutorial, choosing a Sample ID column is optional because we do not perform any genomic integration analysis.
Nonetheless, choosing a Sample ID column is still useful, because it tells the software that the entries in that column are to be used as the unique identifier for each sample.

9. From the workflow, select Choose sample ID column.
10. Use the drop-down list to select the column 1. Sample ID (Figure 7).

Figure 7: Choose Sample ID column dialog

Performing QA/QC

For quality assessment, select Alignments per read from the QA/QC section of the workflow. Alignments per read reports the number of reads that are unaligned (aligned to 0 locations), aligned to 1 location, aligned to 2 locations, and so on, depending on the options selected during the alignment. It also tells you whether your data is single-end or paired-end.

Figure 8: Alignment Counts spreadsheet

The column headers in Figure 8 show that the data is paired-end. For a paired-end data set, we would expect to see most of the reads in the category 2 Paired End Alignments Per Read. If you observe high counts in 0 Paired End Alignments Per Read and 1 Paired End Alignments Per Read, you should double-check your data at the alignment stage. If your data set is single-end, the columns will be labeled as follows: 0 Single End Alignments Per Read, 1 Single End Alignments Per Read, etc.

Detecting SNVs

The Detect single nucleotide variations option can be found in the Allele-Specific Analysis section of the workflow. Partek Genomics Suite supports two approaches for SNV detection: Detect SNVs among samples and Detect SNVs against the reference sequence (Figure 9).

Figure 9: Detect Single Nucleotide Variations (SNVs) dialog

The method for identifying SNVs is the genotype likelihood test. This test estimates the likelihood that a particular genomic location differs from the reference sequence or among the samples.
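To give a feel for what a genotype likelihood test computes, here is a minimal Python sketch. It is a generic textbook-style illustration, not Partek's implementation: the uniform per-base error rate, the ten-genotype diploid model, and the function names `genotype_log_likelihoods` and `log_odds_vs_reference` are all assumptions made for this example. The score it returns is the log10 ratio between the best non-reference genotype and the homozygous-reference genotype, which may differ from how the software defines its log-odds ratio.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def genotype_log_likelihoods(bases, error_rate=0.01):
    """Return {genotype: log10 likelihood} for the 10 diploid genotypes.

    Assumes every observed base is an independent draw from one of the two
    alleles (probability 0.5 each) and is miscalled with a fixed error rate.
    """
    logliks = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        total = 0.0
        for b in bases:
            p1 = 1 - error_rate if b == a1 else error_rate / 3
            p2 = 1 - error_rate if b == a2 else error_rate / 3
            total += math.log10(0.5 * p1 + 0.5 * p2)
        logliks[a1 + a2] = total
    return logliks


def log_odds_vs_reference(bases, ref_base, error_rate=0.01):
    """Return (best non-reference genotype, log10 odds versus hom-ref)."""
    logliks = genotype_log_likelihoods(bases, error_rate)
    hom_ref = ref_base * 2
    best_alt = max((g for g in logliks if g != hom_ref), key=logliks.get)
    return best_alt, logliks[best_alt] - logliks[hom_ref]


# Hypothetical pileup: 5 reads support the reference A, 5 reads support G.
genotype, score = log_odds_vs_reference("AAAAAGGGGG", "A")
print(genotype, round(score, 2))
```

A positive score means the pileup is better explained by a non-reference genotype; a threshold on such a score plays the role of the log-odds ratio threshold described in the sections that follow.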
For further details on the method used, please refer to the Detect SNVs in Sequencing Data using Partek Genomics Suite software white paper, which is available from Help > On-line Tutorials. Select the White Papers tab and look for the RNA-Seq section.

Detect SNVs Among Samples

This method compares each genomic location directly across the samples in the spreadsheet (but not to the reference genome). That strategy is useful for detecting somatic mutations, for instance.

To detect SNVs among samples, select Detect single nucleotide variations under the Allele-Specific Analysis section and then Detect SNVs among samples.

Figure 10: Detecting SNVs Across Samples dialog

The Detect Differential SNPs Across Samples dialog (Figure 10) provides options to set the log-odds ratio threshold and the name of the resulting file. A high log-odds score for a reported SNV indicates a strong chance that at least one of the samples has a different base call at that position. For this tutorial, use the default settings and press OK to proceed.

A .2bit file containing the reference genome is needed for this step. Partek Genomics Suite will automatically download the file if one is not already specified. Depending on the speed of your internet connection, this download may take some time, but it only needs to be done once. When the SNV detection is completed, a new spreadsheet will appear as shown in Figure 11.

Figure 11: Spreadsheet resulting from the “Detect SNVs among samples” tool (SNVsAcrossSamples)

The resulting spreadsheet (Figure 11) has the following columns:
1: position: the genomic location of the detected SNV.
2: log-odds ratio of different genotypes: the score given to the detected SNV. A higher score indicates a stronger discrepancy in base composition across the samples.
3: reference base: the base call of the reference genome (for example: hg19). If no reference genome is specified, N will be displayed.
4 to (X+3): “Sample genotype call”: the most likely genotype call for each of the samples at that location (X is the number of samples).
remaining columns: number of A, C, G, T and N calls for each of the samples at the given location. N refers to ambiguous (or unknown) base calls.

Detect SNVs against the reference sequence

This method compares each genomic location against the reference sequence (such as hg19), independently for each sample. It is the starting point for the subtraction method, which selects only the SNVs that appear in one particular sample but not in the other. The subsequent analysis in this tutorial is based on the output of this SNV detection method.

To detect SNVs against the reference sequence, go to the Allele-Specific Analysis section of the workflow, select Detect single nucleotide variations and then select Detect SNVs against the reference sequence.

Figure 12: Detect nucleotides that are different from the reference dialog

The Detect nucleotides that are different from the reference dialog (Figure 12) provides options to set the log-odds ratio threshold and the name of the resulting file. A high log-odds score for a reported SNV indicates a strong chance that the sample has a nucleotide that is different from the reference sequence at that position. Leave the parameters at their defaults and select OK to run the SNV detection. After the SNV detection is completed, you will see the results shown in Figure 13.

Figure 13: Spreadsheet resulting from “Detect SNVs against the reference sequence” tool

The results spreadsheet (Figure 13) has the following columns:
1: position: the genomic location of the detected SNV.
2: log-odds ratio of SNP against reference: the score given to the reported SNV. A higher score indicates a stronger discrepancy in base composition compared to the reference sequence.
3: sample ID: the sample that differed from the reference. If more than one sample differed from the reference at the same location, the samples will be displayed on separate lines.
4: reference base: the base call of the reference genome.
5: genotype call: the most likely genotype call of the sample listed in the sample ID column at that position.
6: total non-reference bases: total number of bases from the sample that do not match the reference (not including no-calls).
7: total coverage at locus: the total number of reads covering this position.
8: non-reference average base qualities: average Phred-scaled base quality score of bases that differ from the reference.
9: reference base qualities: average Phred-scaled base quality score of bases that match the reference.
10: non-reference average mapping qualities: average mapping quality score of reads containing the variant at the locus (0 = poor, 254 = good, 255 = unknown).
11: reference average mapping qualities: average mapping quality score of reads containing the reference call at the locus.
12 to 15: number of A, C, G, and T calls for the sample in the sample ID column at this position.

Filtering the detected SNVs

After running Detect SNVs against the reference sequence, we can create a list of SNVs of interest. There are multiple ways to do this, and the right choice depends on the question being asked. In this tutorial, the following filtering strategy will be applied:

1. Create a list of SNVs in the normal sample as well as in the tumour sample.
2. Use a Venn diagram to overlap the SNVs from the normal sample with those from the tumour sample, and obtain the SNVs present in the tumour only.
3. Filter the tumour SNVs based on:
   a. Coverage (≥50);
   b. Non-reference average base quality (≥20);
   c. Non-reference average mapping quality (≥20).
4. Filter out off-target SNVs.
5. Filter out SNVs that are present in a database of known SNPs, such as dbSNP.

Creating the List of SNVs in Tumour Sample Only

To create a list of SNVs found in the tumour sample only, we first create a list of SNVs for each sample, then use the Venn Diagram to overlap the two lists and select the section that belongs to the tumour sample only. We can use the Create region list function under the Allele-Specific Analysis section.

1. Starting with the reference-snps spreadsheet (i.e. the result of Detect SNVs against the reference sequence), select Create region list and you will see the List Creator dialog (Figure 14).

Figure 14: List Creator dialog

2. Select Specify New Criteria to open the Configure Criteria dialog (Figure 15).

Figure 15: Configure Criteria dialog

Select the sample ID column. You should see two bars, which represent Normal and Tumour. Right-click on the Normal bar to select the normal SNVs, set Name to Normal sample (Figure 16) and press OK. Then repeat step 2, but this time right-click on the Tumour bar to select the tumour SNVs, set Name to Tumour sample (Figure 17) and press OK.

Figure 16: Selecting SNVs in the normal sample

Figure 17: Select SNVs in the tumour sample

3. Go back to the List Creator and select both lists while holding the Ctrl key. Select Venn Diagram to overlap the two lists of SNVs (Figure 18).

Figure 18: Overlapping two lists

4. In the Venn Diagram dialog, you will notice that the selected section is marked with “*”. Click on the overlap region to deselect it, then select the Tumour sample section to keep the SNVs that belong to the tumour sample only (Figure 19).

Figure 19: Selecting Tumour sample only SNVs from Venn Diagram. The arrow indicates that the 5827 SNVs in Tumour sample only have been selected (as shown by the *)

5. Press OK and you will be asked to specify a name for this list of SNVs; label it Tumour-only SNVs (Figure 20).

Figure 20: Naming a list in the List Creator dialog

6. Tumour-only SNVs will now appear in the List Creator dialog. Select the Save button to save the list. You will be asked which spreadsheet to save. Ensure that the Tumour-only SNVs spreadsheet is checked and select OK to save it (Figure 21).

Figure 21: Selecting the criteria that will be saved as lists

7. Select Close to close the List Creator dialog box.

Filter the tumour-only SNVs based on several criteria

At this stage, we have the list of tumour-only SNVs. Several technical filters can be applied to the list, and this tutorial covers how to apply the following three:
a. Coverage (≥50)
b. Non-reference average base quality (≥20)
c. Non-reference average mapping quality (≥20)

To quickly filter the list, we can use the Interactive Filter function, which is available on the toolbar.

1. Select the Tumour-only SNVs spreadsheet and click on the Interactive Filter icon. Select the column 7. total coverage at locus. Then set Min to 50 and hit Enter (Figure 22). (Tip: The black bar at the right means that a filter is currently applied to the spreadsheet. To clear the filter, mouse over the black bar, right-click and choose Clear Filter. Alternatively, go to Filter > Filter Rows > Clear Row Filters.)

Figure 22: Filtering based on total coverage at locus. The interactive filter bar is highlighted

2. Repeat the filtering as described above on the columns 8. non-reference average base qualities (set Min to 20) and 10. non-reference average mapping qualities (set Min to 20). Remember that whenever you specify a filter value, you have to hit Enter to ensure that the filter is applied. After this, you should have only 16 SNVs left, as shown in Figure 23.
Figure 23: Filtered Tumour-only SNVs, containing 16 SNVs (see Rows: in the bottom left corner)

3. Save this list to replace (!) the original Tumour-only SNVs spreadsheet by pressing the Save icon and answering Yes when asked whether to save the filtered list only.

Creating a Custom Annotation Database

The sample exomes for this tutorial were captured using the Agilent SureSelect Human All Exon Kit v1.01. If you have the .bed file provided by the vendor, you can use it to create a custom annotation in Partek Genomics Suite software. As we do not have Agilent’s coordinates in .bed format, we have instead downloaded all CCDS exon coordinates from the University of California Santa Cruz site. The resulting hg19-ccds-exon.bed file is included in the data set provided for this exercise. To proceed, we first need to create a custom annotation file from that .bed file.

1. Go to Tools > Annotation Manager > Create Annotation. Then select BED file (.bed) (Figure 24).

Figure 24: Annotation Manager dialog

2. Browse… to the file hg19-ccds-exon.bed that comes with the tutorial data, and set Species to Homo sapiens and Genome build to hg19 (Figure 25).

Figure 25: Create Annotation dialog

3. Select OK to create the annotation file and select Close to close the Annotation Manager dialog.
4. From the DNA-Seq workflow, under the Allele-Specific Analysis section, select Annotate with known SNPs. Then select the Custom drop-down list and point to hg19-ccds-exon.bed (Figure 26).

Figure 26: Annotating with a custom file, hg19-ccds-exon.bed

5. Select OK to start the annotation and close the dialog.
6. The Tumour-only SNVs spreadsheet should now have two additional columns at the far right: Known SNPs and # Known SNPs. If an SNV overlaps a region in the .bed file, it will have a value greater than 0 in the # Known SNPs column.
The # Known SNPs column tells you how many regions overlap with the SNV (Figure 27). Figure 27 shows that only 1 SNV overlaps with hg19-ccds-exon.bed, meaning that the remaining 15 SNVs may lie in off-target regions.

Figure 27: After annotating with hg19-ccds-exon.bed

7. To keep only the on-target SNV, use the Interactive Filter again and filter on the column # Known SNPs, setting Min to 1 (Figure 28).

Figure 28: Using the column # Known SNPs to keep only on-target SNVs

8. Select the Save button to save the Tumour-only SNVs spreadsheet. Select Yes to save only the filtered spreadsheet; this leaves the single SNV that passes all filters.

Annotate detected SNVs with known SNPs

We now have one selected SNV and want to know whether it is already recorded in the public database dbSNP. To annotate the SNV with dbSNP, select Annotate with known SNPs again and choose dbSNP 135 as the dbSNP annotation database. The result shows that this SNV is not present in dbSNP (Figure 29): the two right-most columns, Known SNPs and # Known SNPs, are None and 0, respectively.

Figure 29: Annotating SNVs with dbSNP135. Columns #18 and #19 show that this SNV does not overlap with any of the SNVs in the dbSNP135 database

Since this is a novel SNV, no additional filtering will be done. We can also annotate with the COSMIC database. Select Annotate with known SNPs again; the COSMIC database is available under Genomic Variants Database. Choose the COSMIC database and you will find that this SNV is not described in COSMIC either. This step is left for you to explore by yourself.
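The filtering strategy applied in the preceding sections (tumour-only subtraction, quality thresholds, on-target overlap) can be summarised in a short Python sketch. This is only an illustration of the logic, not Partek Genomics Suite's API: the dictionary field names, the toy records and the target region are hypothetical. It also assumes that SNVs from the two samples are matched by chromosome and position, that SNV positions are 1-based, and that the target regions follow the BED convention of 0-based, half-open intervals.

```python
def tumour_only(snvs):
    """Keep Tumour SNVs whose site was not also called in the Normal sample."""
    normal_sites = {(r["chrom"], r["pos"]) for r in snvs if r["sample"] == "Normal"}
    return [
        r for r in snvs
        if r["sample"] == "Tumour" and (r["chrom"], r["pos"]) not in normal_sites
    ]


def passes_quality(row, min_cov=50, min_base_q=20, min_map_q=20):
    """Apply the three technical thresholds used in this tutorial."""
    return (
        row["coverage"] >= min_cov
        and row["nonref_base_q"] >= min_base_q
        and row["nonref_map_q"] >= min_map_q
    )


def count_overlaps(chrom, pos, regions):
    """Count BED-style regions (0-based, half-open) covering a 1-based position."""
    return sum(1 for c, start, end in regions if c == chrom and start < pos <= end)


def select_snvs(snvs, regions):
    """Chain the steps: tumour-only, quality filters, then on-target filter."""
    kept = [r for r in tumour_only(snvs) if passes_quality(r)]
    return [r for r in kept if count_overlaps(r["chrom"], r["pos"], regions) >= 1]


# Hypothetical toy data, not taken from the tutorial data set.
snvs = [
    {"sample": "Normal", "chrom": "chr20", "pos": 100, "coverage": 90, "nonref_base_q": 30, "nonref_map_q": 40},
    {"sample": "Tumour", "chrom": "chr20", "pos": 100, "coverage": 90, "nonref_base_q": 30, "nonref_map_q": 40},
    {"sample": "Tumour", "chrom": "chr20", "pos": 200, "coverage": 80, "nonref_base_q": 30, "nonref_map_q": 40},
    {"sample": "Tumour", "chrom": "chr20", "pos": 300, "coverage": 20, "nonref_base_q": 30, "nonref_map_q": 40},
    {"sample": "Tumour", "chrom": "chr20", "pos": 400, "coverage": 90, "nonref_base_q": 30, "nonref_map_q": 40},
]
regions = [("chr20", 150, 250)]

print([r["pos"] for r in select_snvs(snvs, regions)])
```

In this toy input, the site at position 100 is removed because it is also called in the Normal sample, position 300 fails the coverage threshold, and position 400 lies outside the target region, so only position 200 remains.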
Annotate detected SNVs with functional effects

Having arrived at the final list of detected SNVs (the Tumour-only SNVs spreadsheet), we would like to know where the SNV is located within the gene, and whether it is associated with any potentially deleterious functional effect.

Select the Tumour-only SNVs spreadsheet and go to Annotate functional effects under the Allele-Specific Analysis section of the workflow. You will be prompted to choose the transcript annotation database for the annotation. For this tutorial, choose RefSeq Transcripts – 2013-09-03. The resulting output file will be named annotate-snvs. Leave the other parameters at their default values (Figure 30).

Figure 30: Annotate SNVs with transcripts dialog

The resulting spreadsheet should have 12 columns (Figure 31).

Figure 31: Annotate-SNVs spreadsheet

The columns are:
1: Chromosome: chromosome coordinate.
2: Position: position of the SNV on the chromosome.
3: Reference: the reference genome call for that position.
4: Alt: the alternative base observed at this position based on the Tumour-only SNVs spreadsheet.
5: Sample ID: the sample that carries this SNV.
6: Gene Symbol: the gene in which this SNV is located.
7: Transcript: the ID of the transcript in which this SNV is located.
8: Strand: the strand of the transcript described in column 7.
9: Gene Section: the section of the gene in which this SNV is located. It can have one of the following values: Exon, Intron, Promoter, 5’-Splice Site, 3’-Splice Site.
10: Functional Effect: the effect that this SNV might cause. It can have one of the following values: Intronic, Missense, Nonsense, Synonymous, Splicing Site, ncRNA, Promoter, 5’-UTR, 3’-UTR.
11: Nucleotide Change: the nucleotide change with respect to the transcript. If the transcript is on the reverse (-) strand, the nucleotides shown will be the reverse complement of the Reference and Alt bases.
The position is based on the cDNA nomenclature.
12: Amino acid change: the amino acid change that could result from the base change. This is only applicable if the functional effect is Missense or Nonsense. The position is based on the protein position.

Visualize the SNVs

On any of the SNV spreadsheets, such as the reference-snps, SNVsAcrossSamples, or Tumour-only SNVs spreadsheet, we can right-click on a row header and select Browse to Location (Figure 32). The resulting chromosome view is shown in Figure 33. You are encouraged to refer to the Chromosome Viewer User Guide, which is available from Help > Online Tutorials, under the User Guides section.

Figure 32: Using context menu to Browse to Location

Figure 33: Chromosome View; tracks (from the top): chosen transcript model, SNP Proportion track (the box represents an SNV call, while the colors represent relative proportions of the short sequencing reads with the reference and the alternative base call), Bam profile track (one track per sample, showing short sequencing reads), reference genome.

End of Tutorial

For additional assistance, contact our technical support staff by phone at +1-314-878-2329 or by email [email protected].

Last revision: February 2014

Copyright 2014 by Partek Incorporated. All Rights Reserved. Reproduction of this material without express written consent from Partek Incorporated is strictly prohibited.