Using RNA Sequence Tools

How to Use RNA Sequence Analysis Tools on HPC
Written by Yiran Zhang
1. Using PuTTY and WinSCP
• PuTTY is a client program for the SSH, Telnet and Rlogin network protocols. These protocols are all used to run a remote session on a computer over a network. In other words, PuTTY connects your PC to the server.
• WinSCP is a free, open-source SFTP and FTP client for Windows. Its main function is secure file transfer between a local and a remote computer; it is often used much like Dropbox, to move files between your PC and the server.
• Log in with PuTTY and WinSCP. See the figures below.
Figure 1. Interface of WinSCP
Figure 2. Interface of PuTTY
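2. How to submit a job
• Edit a PBS file with vi in PuTTY, or with Notepad++ on your PC (if you use Notepad++, save the file with Unix (LF) line endings).
• Save the PBS file in your folder on the server, for example by dragging it into WinSCP.
• Submit the job.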
2.1 Edit the PBS file with some useful commands
• Use the command vi filename to open and edit the file.
• The PBS file (see figure) contains the job information: job title, wall time, resources, commands and comments.
• Press "i" or the "Insert" key to enter insert mode and edit the file.
• Press "Esc" to leave insert mode.
• Enter ":w filename.pbs" to save the current file as filename.pbs (":x" saves and quits in one step).
• Enter ":q!" to quit the editor without saving further changes.
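As a sketch of what such a file contains, the commands below write a minimal PBS file straight from the PuTTY prompt, as an alternative to typing it in vi. The directive syntax is standard Torque/PBS; the job name, wall time, core count and the echo command are placeholder assumptions, so check your cluster's limits before reusing them.

```bash
# Write a minimal PBS job file from the command line (alternative to vi).
# Job name, wall time, core count and the command are placeholders.
cat > filename.pbs <<'EOF'
#!/bin/bash
# Job title
#PBS -N test_job
# Work time limit (hh:mm:ss)
#PBS -l walltime=01:00:00
# Resources: 1 node, 1 core
#PBS -l nodes=1:ppn=1
# Merge standard output and standard error into one log file
#PBS -j oe

# Jobs usually start in your home directory;
# move to the folder the job was submitted from.
cd "$PBS_O_WORKDIR"

# Command(s) to run
echo "Job running on $(hostname)"
EOF
```

The resulting file is submitted with qsub filename.pbs, as in 2.2.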
2.2 Submit the job and check its status
• Go to the folder that contains your PBS file: use cd foldername to enter a folder and cd .. to go back up one level.
• Use qsub filename.pbs to submit the job (see figure).
• Use showq to check the queue on the server.
• Use qstat to check the current status of your job.
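The submit-and-check steps can be wrapped in a small shell function. This is a hedged sketch: submit_and_check is a hypothetical helper, and it assumes a Torque/PBS system where qsub prints the new job ID. showq comes from the Moab/Maui scheduler, so it is only called if it is installed.

```bash
# Hypothetical helper: submit a PBS file and show its status.
# Assumes a Torque/PBS system where qsub prints the new job ID.
submit_and_check() {
  local pbs_file="$1"
  local job_id

  if ! command -v qsub >/dev/null 2>&1; then
    echo "qsub not found; run this on the cluster login node" >&2
    return 1
  fi

  # On success qsub prints the job ID on standard output
  job_id=$(qsub "$pbs_file") || return 1
  echo "Submitted $pbs_file as job $job_id"

  qstat "$job_id"      # status of this job (Q = queued, R = running)
  qstat -u "$USER"     # all of your jobs

  # showq belongs to Moab/Maui; skip it if it is not installed
  if command -v showq >/dev/null 2>&1; then
    showq -u "$USER"
  fi
}
```

Usage: submit_and_check filename.pbs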
3. Unzip files and run FastQC
• To extract compressed files (tar -zxvf for a .tar.gz archive, gunzip for a single .gz file):
$ tar -zxvf filename.tar.gz
$ gunzip filename.fq.gz
• Use these commands in order:
• (1) cd fastqc_v0.11.4
//open the folder fastqc_v0.11.4
• (2) cd FastQC
//open the folder FastQC
• (3) chmod 755 fastqc
//give permission to run the tool
• (4) export PATH=$PATH:/home/zhangy/fastqc_v0.11.4/FastQC
//add the FastQC folder to your PATH
• (5) Enter ./fastqc --outdir=/home/zhangy/ --noextract -f fastq /home/zhangy/testfile/testseqs.fastq
//run FastQC on the test file
• Note:
• //--outdir= sets the output folder.
• //-f fastq sets the format of the input file (FASTQ); the input file itself is the last argument.
• //--noextract keeps the report as a zip file instead of unzipping it.
• Then you will see the window below.
Figure. FastQC command example
FastQC writes its report (an .html file and a .zip archive) to your output folder, as in the figures below.
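The FastQC steps above can also be run as a batch job instead of on the login node. A minimal sketch, assuming the same folder layout as in the example ($HOME standing in for /home/zhangy) and a hypothetical output folder fastqc_out:

```bash
# Write a PBS file that runs FastQC as a batch job.
# Paths follow the example above, with $HOME in place of /home/zhangy;
# fastqc_out is a hypothetical output folder.
cat > fastqc.pbs <<'EOF'
#!/bin/bash
#PBS -N fastqc
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -j oe

# Make the FastQC folder visible to the shell (step 4 above).
# FastQC is a Java program, so Java must be available on the compute node.
export PATH="$PATH:$HOME/fastqc_v0.11.4/FastQC"

# FastQC expects the output folder to exist already.
mkdir -p "$HOME/fastqc_out"

# --noextract: keep the zipped report; -f fastq: input format is FASTQ
fastqc --outdir="$HOME/fastqc_out" --noextract -f fastq \
    "$HOME/testfile/testseqs.fastq"
EOF
```

Step (3) (chmod 755 fastqc) only needs to be done once beforehand; then submit with qsub fastqc.pbs.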
4. Load the module files you need
• $ module load bio/bowtie2/2.2.7
• $ module load bio/tophat/2.0.13
• $ module load bio/cufflinks/2.2.1
• $ module load bio/IGV/2.3.40
Note: 1. You do not need to load all of these modules together; load only the ones for the programs you want to use. 2. You can also put the loading commands into your PBS file. See figure below.
Figure. Loading in the pbs file.
5. Create an index for TopHat
• Use the program bowtie2 to create the genome index needed for the later steps. First link the annotation and genome files into your working folder:
• $ ln -s /path/genes.gtf
• $ ln -s /path/genome*
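If the linked genome files do not already include Bowtie2 index files (names ending in .bt2), the index can be built with bowtie2-build <reference FASTA> <index base name>. A sketch as a PBS job, assuming the linked FASTA is called genome.fa (a hypothetical name) and using the bowtie2 module from section 4; the wall time is a placeholder.

```bash
# Write a PBS file that builds a Bowtie2 index.
# genome.fa is a hypothetical name for the linked genome FASTA.
cat > index.pbs <<'EOF'
#!/bin/bash
#PBS -N bowtie2_index
#PBS -l nodes=1:ppn=1
#PBS -l walltime=04:00:00
#PBS -j oe

cd "$PBS_O_WORKDIR"
module load bio/bowtie2/2.2.7

# Produces genome.1.bt2 ... genome.4.bt2, genome.rev.1.bt2 and
# genome.rev.2.bt2; "genome" is the index base name passed to TopHat.
bowtie2-build genome.fa genome
EOF
```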
6. Map the RNA-seq reads to the genome using the Tophat.pbs script
• Use qsub Tophat.pbs to submit the job. (The PBS file I used is shown below.)
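As a hedged sketch (not the author's file), a Tophat.pbs along these lines maps each sample. It assumes single-end reads in four hypothetical files (C1_R1.fq, C1_R2.fq, C2_R1.fq, C2_R2.fq, i.e. two conditions with two replicates each), the index base genome and the annotation genes.gtf from section 5; thread count and wall time are placeholders.

```bash
# Write a Tophat.pbs that maps each sample to the genome.
# Sample names, thread count and wall time are hypothetical.
cat > Tophat.pbs <<'EOF'
#!/bin/bash
#PBS -N tophat
#PBS -l nodes=1:ppn=8
#PBS -l walltime=24:00:00
#PBS -j oe

cd "$PBS_O_WORKDIR"
module load bio/bowtie2/2.2.7
module load bio/tophat/2.0.13

# One TopHat run per sample.
# -p: threads, -G: known annotation, -o: output folder;
# "genome" is the Bowtie2 index base name.
for sample in C1_R1 C1_R2 C2_R1 C2_R2; do
    tophat -p 8 -G genes.gtf -o "tophat_${sample}" genome "${sample}.fq"
done
EOF
```

Each tophat_<sample> folder then contains accepted_hits.bam, the alignment file used by Cufflinks.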
7. Assemble transcripts for each sample using the cufflinks.pbs script
• Use qsub cufflinks.pbs to submit the job. (The PBS file I used is shown below.)
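A hedged sketch of a cufflinks.pbs (not the author's file), assuming the hypothetical sample names and tophat_<sample> output folders of a per-sample TopHat run; thread count and wall time are placeholders.

```bash
# Write a cufflinks.pbs that assembles transcripts for each sample.
# Sample names and folder names are hypothetical.
cat > cufflinks.pbs <<'EOF'
#!/bin/bash
#PBS -N cufflinks
#PBS -l nodes=1:ppn=8
#PBS -l walltime=24:00:00
#PBS -j oe

cd "$PBS_O_WORKDIR"
module load bio/cufflinks/2.2.1

# Assemble each sample's TopHat alignments.
# -p: threads, -o: output folder (each gets a transcripts.gtf file)
for sample in C1_R1 C1_R2 C2_R1 C2_R2; do
    cufflinks -p 8 -o "cufflinks_${sample}" "tophat_${sample}/accepted_hits.bam"
done
EOF
```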
8. Run cuffmerge.pbs on all your assemblies to create a single merged transcriptome annotation
• Use qsub cuffmerge.pbs to submit the job. (The PBS file I used is shown below.)
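Cuffmerge takes a text file listing one transcripts.gtf per line. A hedged sketch of a cuffmerge.pbs (not the author's file), assuming per-sample Cufflinks output in folders named cufflinks_<sample> (hypothetical) and the genes.gtf and genome.fa files from section 5:

```bash
# Write a cuffmerge.pbs that merges all per-sample assemblies.
# Folder and file names are hypothetical.
cat > cuffmerge.pbs <<'EOF'
#!/bin/bash
#PBS -N cuffmerge
#PBS -l nodes=1:ppn=8
#PBS -l walltime=04:00:00
#PBS -j oe

cd "$PBS_O_WORKDIR"
module load bio/cufflinks/2.2.1

# List every per-sample assembly, one path per line.
printf '%s\n' cufflinks_*/transcripts.gtf > assemblies.txt

# -g: reference annotation, -s: genome FASTA, -p: threads.
# The merged annotation is written to merged_asm/merged.gtf.
cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt
EOF
```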
9. Perform differential expression analysis using cuffdiff.pbs
• Use qsub cuffdiff.pbs to submit the job. (The PBS file I used is shown below.)
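A hedged sketch of a cuffdiff.pbs (not the author's file). It assumes two conditions labelled C1 and C2 with two replicates each (hypothetical names), the merged annotation from Cuffmerge, and writes to diff_out, the folder that readCufflinks('diff_out') reads in section 10.

```bash
# Write a cuffdiff.pbs comparing two conditions.
# Condition labels and sample folders are hypothetical.
cat > cuffdiff.pbs <<'EOF'
#!/bin/bash
#PBS -N cuffdiff
#PBS -l nodes=1:ppn=8
#PBS -l walltime=24:00:00
#PBS -j oe

cd "$PBS_O_WORKDIR"
module load bio/cufflinks/2.2.1

# -o: output folder, -b: fragment bias correction using genome.fa,
# -p: threads, -L: condition labels, -u: multi-read correction.
# Replicates of one condition are comma-separated;
# conditions are separated by spaces.
cuffdiff -o diff_out -b genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf \
    tophat_C1_R1/accepted_hits.bam,tophat_C1_R2/accepted_hits.bam \
    tophat_C2_R1/accepted_hits.bam,tophat_C2_R2/accepted_hits.bam
EOF
```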
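10. Explore the differential analysis results with cummeRbund (further)
• 10.1 Load the cummeRbund package into the R environment
• source("http://bioconductor.org/biocLite.R")
• biocLite("cummeRbund")
• library(cummeRbund)
• On Bioconductor 3.8 and later, biocLite is no longer available; the equivalent installation is install.packages("BiocManager") followed by BiocManager::install("cummeRbund").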
• 10.2 Create a CummeRbund database from Cuffdiff output
• cuff_data <- readCufflinks('diff_out')
• 10.3 Plot the distribution of expression levels for each sample
• 10.4 Compare the expression of each gene in two conditions with a scatter plot
• 10.5 ……….
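Steps 10.3 and 10.4 can be run as a batch job with the cummeRbund plotting functions csDensity() and csScatter(). A hedged sketch, assuming R with cummeRbund installed is available on the compute node (how R is loaded is cluster-specific and not shown) and that Cuffdiff was run with the condition labels C1 and C2 (hypothetical). The plots go to a PDF file because a batch job has no screen.

```bash
# Write a PBS file that runs the cummeRbund plots in R.
# Condition labels C1/C2 and the PDF name are hypothetical.
cat > cummerbund.pbs <<'EOF'
#!/bin/bash
#PBS -N cummerbund
#PBS -l nodes=1:ppn=1
#PBS -l walltime=02:00:00
#PBS -j oe

cd "$PBS_O_WORKDIR"

# Feed the R commands to R on standard input.
R --vanilla --quiet <<'RCODE'
library(cummeRbund)
cuff_data <- readCufflinks('diff_out')
pdf('cummerbund_plots.pdf')
# 10.3: distribution of expression levels for each sample
print(csDensity(genes(cuff_data)))
# 10.4: gene expression in condition C1 versus C2
print(csScatter(genes(cuff_data), 'C1', 'C2'))
dev.off()
RCODE
EOF
```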