Script 1. NGS QC Toolkit (version 2.3.1) script to remove adaptors and filter by quality
IlluQC_PRLL.pl -se /path-to-input-directory/GambFemAnt1.fastq 6 A -l 50 -s 30 -c 23 -t 2 -o /path.to.output.directory/GambFemAnt1
Description:
-se -> Single-end input: the read file (FASTQ), followed by the primer/adaptor library and the
FASTQ variant
6 -> primer/adaptor library type (Multiplexing DNA Library)
A -> Automatic detection of FASTQ variant
-l 50 -> Cut-off for the percentage of the read length that must be of the given quality. In our
analysis, 50% of each read had to have a PHRED quality score higher than 30.
-s 30 -> Cut-off PHRED quality score for high-quality filtering. We used 30.
-c 23 -> Number of processors to be used
-t 2 -> Output format for statistics. 2 = tab delimited
-o -> Output will be stored in the given folder
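The combined effect of -l 50 and -s 30 can be illustrated with a short Python sketch. This is not the NGS QC Toolkit code: the function name passes_quality_filter is hypothetical, the quality string is assumed to be Phred+33 encoded, and treating both cut-offs as inclusive is an assumption (the description above says "higher than 30").

```python
def passes_quality_filter(quality_string, min_phred=30, min_percent=50.0, offset=33):
    """Return True if enough bases of a read reach the quality cut-off.

    quality_string : FASTQ quality line of one read (assumed Phred+33).
    min_phred      : per-base quality cut-off (the -s value).
    min_percent    : percentage of the read that must reach it (the -l value).
    """
    if not quality_string:
        return False  # an empty read cannot pass
    # Decode each ASCII character into a PHRED score.
    scores = [ord(ch) - offset for ch in quality_string]
    # Count bases at or above the cut-off (inclusive boundary is an assumption).
    high_quality = sum(1 for q in scores if q >= min_phred)
    return 100.0 * high_quality / len(scores) >= min_percent


if __name__ == "__main__":
    # 'I' decodes to Q40 and '!' to Q0 under Phred+33.
    print(passes_quality_filter("IIII!!!!"))  # half of the bases are high quality
    print(passes_quality_filter("I!!!"))      # only a quarter are high quality
```

With these settings a read is kept when at least half of its bases reach the cut-off, however the low-quality bases are distributed along the read.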
Script 2. Script used to run STAR (version 2.3.0)
cd /path-to-output-directory; mkdir GambFemAnt1 && cd GambFemAnt1;
echo $PWD && echo Starting mapping of GambFemAnt1;
/path-to-STAR-directory/STAR --genomeDir /path-to-genome-index-directory/Genome --readFilesIn /path-to-input-fastq-file/GambFemAnt1.fastq --runThreadN 23 --outFilterMismatchNmax 4 --outFilterMatchNminOverLread 0.75 --seedSearchLmax 30 --seedSearchStartLmax 30 --seedPerReadNmax 100000 --seedPerWindowNmax 100 --alignTranscriptsPerReadNmax 100000 --alignTranscriptsPerWindowNmax 10000;
cd /path-to-output-directory && echo $PWD && echo GambFemAnt1 mapping finished;
Description:
--runThreadN 23 -> number of processors used for alignment
--outFilterMismatchNmax 4 -> an alignment is output only if it has no more than 4 mismatches
--outFilterMatchNminOverLread 0.75 -> outFilterMatchNmin normalized to read length. Ensures
that the number of matched bases is at least 75% of the read length
--seedSearchLmax 30 -> defines the maximum length of the seeds
--seedSearchStartLmax 30 -> defines the search start point through the read - the read is split
into pieces no longer than this value
--seedPerReadNmax 100000 -> max number of seeds per read
--seedPerWindowNmax 100 -> max number of seeds per window
--alignTranscriptsPerReadNmax 100000 -> max number of different alignments per read to
consider
--alignTranscriptsPerWindowNmax 10000 -> max number of transcripts per window
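As a rough illustration of how the two output filters above combine, the following Python sketch checks a single alignment against both thresholds. It is a simplification, not STAR's implementation: the function name star_would_output is hypothetical, and STAR applies further filters that are not modelled here.

```python
def star_would_output(read_length, matched_bases, mismatches,
                      mismatch_nmax=4, match_nmin_over_lread=0.75):
    """Apply the two output filters described above to one alignment.

    mismatch_nmax         : value given to --outFilterMismatchNmax.
    match_nmin_over_lread : value given to --outFilterMatchNminOverLread.
    """
    # The alignment may carry at most mismatch_nmax mismatches.
    few_enough_mismatches = mismatches <= mismatch_nmax
    # The matched bases must cover at least the given fraction of the read.
    long_enough_match = matched_bases >= match_nmin_over_lread * read_length
    return few_enough_mismatches and long_enough_match


if __name__ == "__main__":
    # A 100-base read needs at least 75 matched bases and at most 4 mismatches.
    print(star_would_output(read_length=100, matched_bases=80, mismatches=4))
    print(star_would_output(read_length=100, matched_bases=74, mismatches=0))
```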
Script 3. Script used to run HTSeq (version 0.5.4p5)
./htseq-count --mode intersection-nonempty --strand no --type exon --idattr gene_id /path-to-input-file/Aligned.out.sam /path-to-gtf-file/Anopheles_gambiae.AgamP3.21.gtf > /path-to-output-directory/GambFemAnt1.csv;
Description:
--mode intersection-nonempty -> Mode to handle reads overlapping more than one feature. Let S be
the intersection of the non-empty sets of features overlapping each position of the read. If S
contains precisely one feature, the read is counted for this feature. If it contains more than one
feature, the read is counted as ambiguous (and not counted for any feature), and if S is empty,
the read (or read pair) is counted as no_feature.
--strand no -> the data are not strand-specific
--type exon -> feature type (3rd column in GFF file) to be used, all features of other type are
ignored (default, suitable for RNA-Seq analysis using an Ensembl GTF file: exon)
--idattr gene_id -> GFF attribute to be used as feature ID. The default, suitable for RNA-Seq
analysis using an Ensembl GTF file, is gene_id.
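The intersection-nonempty rule can be sketched in a few lines of Python. This is an illustration of the rule as described above, not the htseq-count source: the function name assign_read and the input layout (one set of feature IDs per read position) are assumptions made for the example.

```python
def assign_read(position_features):
    """Assign a read under the intersection-nonempty rule.

    position_features : list with one set of feature IDs per read position
                        (an empty set where no feature overlaps the position).
    Returns a feature ID, "ambiguous" or "no_feature".
    """
    # Positions that overlap no feature are ignored in this mode.
    non_empty = [set(s) for s in position_features if s]
    if not non_empty:
        return "no_feature"
    # S is the set of features shared by all remaining positions.
    shared = set.intersection(*non_empty)
    if len(shared) == 1:
        return next(iter(shared))
    if not shared:
        return "no_feature"
    return "ambiguous"


if __name__ == "__main__":
    # gene_A covers every annotated position, gene_B only some: gene_A is counted.
    print(assign_read([{"gene_A"}, {"gene_A", "gene_B"}, set()]))
    # Both genes cover every position: the read is ambiguous.
    print(assign_read([{"gene_A", "gene_B"}, {"gene_A", "gene_B"}]))
```

A read whose positions fall partly in one gene and partly in another, with no position shared by both, yields an empty S and is therefore reported as no_feature rather than ambiguous.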
