Script 1. NGS QC Toolkit (version 2.3.1) script to remove adaptors and filter by quality

IlluQC_PRLL.pl -se /path-to-input-directory/GambFemAnt1.fastq 6 A -l 50 -s 30 -c 23 -t 2 -o /path-to-output-directory/GambFemAnt1

Description:
-se -> Type of library: single-end read file (FASTQ), followed by the primer/adaptor library and the FASTQ variant
6 -> Adaptor type (Multiplexing DNA Library)
A -> Automatic detection of FASTQ variant
-l 50 -> Cut-off value for the percentage of read length that must be of the given quality. In our analysis, 50% of the read had to have a PHRED quality score higher than 30.
-s 30 -> Cut-off value for the PHRED quality score used in high-quality filtering. We used 30.
-c 23 -> Number of processors to be used
-t 2 -> Output format for statistics. 2 = tab delimited
-o -> Output will be stored in the given folder

Script 2. Script used to run STAR (version 2.3.0)

cd /path-to-output-directory;
mkdir GambFemAnt1 && cd GambFemAnt1;
echo $PWD && echo Starting mapping of GambFemAnt1;
/path-to-STAR-directory/STAR --genomeDir /path-to-genome-index-directory/Genome \
    --readFilesIn /path-to-input-fastq-file/GambFemAnt1.fastq \
    --runThreadN 23 \
    --outFilterMismatchNmax 4 \
    --outFilterMatchNminOverLread 0.75 \
    --seedSearchLmax 30 \
    --seedSearchStartLmax 30 \
    --seedPerReadNmax 100000 \
    --seedPerWindowNmax 100 \
    --alignTranscriptsPerReadNmax 100000 \
    --alignTranscriptsPerWindowNmax 10000;
cd /path-to-output-directory && echo $PWD && echo GambFemAnt1 mapping finished;

Description:
--runThreadN 23 -> Number of processors used for alignment
--outFilterMismatchNmax 4 -> Alignment will be output only if it has fewer mismatches than 4
--outFilterMatchNminOverLread 0.75 -> outFilterMatchNmin normalized to read length. Ensures that at least 75% of the read length is matched to the genome
--seedSearchLmax 30 -> Defines the maximum length of the seeds
--seedSearchStartLmax 30 -> Defines the search start point through the read; the read is split into pieces no longer than this value
--seedPerReadNmax 100000 -> Maximum number of seeds per read
--seedPerWindowNmax 100 -> Maximum number of seeds per window
--alignTranscriptsPerReadNmax 100000 -> Maximum number of different alignments per read to consider
--alignTranscriptsPerWindowNmax 10000 -> Maximum number of transcripts per window

Script 3. Script used to run HTSeq (version 0.5.4p5)

./htseq-count --mode intersection-nonempty --strand no --type exon --idattr gene_id /path-to-input-file/Aligned.out.sam /path-to-gtf-file/Anopheles_gambiae.AgamP3.21.gtf > /path-to-output-directory/GambFemAnt1.csv;

Description:
--mode intersection-nonempty -> Mode to handle reads overlapping more than one feature. Here S is the set of features obtained by intersecting the feature sets of all read positions that overlap at least one feature. If S contains precisely one feature, the read is counted for this feature. If it contains more than one feature, the read is counted as ambiguous (and not counted for any feature), and if S is empty, the read (or read pair) is counted as no_feature.
--strand no -> The data are not strand specific
--type exon -> Feature type (3rd column in GFF file) to be used; all features of other types are ignored (the default, suitable for RNA-Seq analysis using an Ensembl GTF file, is exon)
--idattr gene_id -> GFF attribute to be used as feature ID. The default, suitable for RNA-Seq analysis using an Ensembl GTF file, is gene_id.
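
The intersection-nonempty rule described for Script 3 can be illustrated with a minimal Python sketch. This is not HTSeq's own code: the function name assign_read and the gene IDs geneA and geneB are hypothetical, and the sketch assumes that the set of features overlapping each aligned position of a read has already been looked up in the GTF file.

```python
from typing import Iterable, Set


def assign_read(position_feature_sets: Iterable[Set[str]]) -> str:
    """Classify one read under the intersection-nonempty rule.

    position_feature_sets holds, for each aligned position of the read,
    the set of feature IDs (e.g. gene_id values) overlapping that position.
    The function name and input layout are illustrative assumptions.
    """
    # Ignore positions that overlap no feature at all.
    non_empty = [set(s) for s in position_feature_sets if s]
    if not non_empty:
        return "no_feature"
    # S is the intersection of all non-empty per-position feature sets.
    s = set.intersection(*non_empty)
    if len(s) == 1:
        return next(iter(s))   # counted for exactly this feature
    if not s:
        return "no_feature"    # the non-empty sets share no feature
    return "ambiguous"         # more than one feature remains


# Hypothetical gene IDs, used only for illustration.
print(assign_read([{"geneA"}, {"geneA", "geneB"}, set()]))   # geneA
print(assign_read([{"geneA", "geneB"}, {"geneA", "geneB"}])) # ambiguous
print(assign_read([set(), set()]))                           # no_feature
```

In the first call the read partly overlaps a region shared by two genes and partly hangs outside any exon, yet it is still assigned to geneA because the empty position is skipped before intersecting.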