GeneLooper 3.0 - GeneHarbor Inc

GeneLooper 3.0
User’s Manual
Revised in 2014
© 2014 GeneHarbor, Inc.
GeneLooper User’s Manual
Copyrights
© 2002 GeneHarbor, Inc. All rights reserved.
Revised in 2014
GeneHarbor and GeneLooper are trademarks of GeneHarbor, Inc.
All other trademarks and registered trademarks are property of their respective owners.
License Agreement
GeneHarbor, Inc. grants a license to use the accompanying software and printed material to you,
the original purchaser. This is a binding Agreement between you and GeneHarbor, Inc. Use of
the software shall constitute your acceptance of this Agreement. The copying of the software is
strictly prohibited and adherence to this requirement is your sole responsibility. GeneHarbor, Inc.
reserves the right to modify and update the software and printed material without obligation to
notify you, the original owner, of any change in the software and printed material.
Limited Warranty
If the performance of the software does not meet the standard described in the documentation,
GeneHarbor, Inc. will replace the software if notified within 30 days of purchase. In the event of
a replacement agreed by GeneHarbor, Inc., the original software, accessories, and documentation
must be received by GeneHarbor, Inc. in order for a replacement to be sent to you. Under no
circumstances shall GeneHarbor, Inc. and its officers or its distributors be liable for any indirect,
incidental, consequential or exemplary damages arising from the use, or the inability to use the
software even if they were aware of the possibility of such damages.
GeneHarbor, Inc.
[email protected]
www.geneharbor.com
Contents
GeneLooper 3.0
Copyrights
License Agreement
CD-ROM Installation
Sequence Formatting
Entrez Information Extraction
Sequence Collection
Sequence Separation
Sequence Retrieving
Open Reading Frame Detection
Multi-Sequence Alignment
Restriction Site Search
Translation and Reverse Complement
Batch Oligo Design
Sequence Viewing and Editing
Restriction Analysis
Sequence Alignment
Local Sequence Similarity Search
Primer Design
Oligo Database Search
ePCR
Sequence Formatting
Document Viewer
Introduction
In the new era of molecular genetics, scientists frequently work with multiple genes in their
research, for example when studying the function of a gene family or a pathway. They therefore
need to analyze hundreds of DNA sequences. Currently, analyzing multiple sequences relies on
bioinformaticians. Some scientists would like to analyze multiple sequences themselves because
they want first-hand data and lower costs.
To our knowledge, there are no efficient, low-cost tools that scientists can use themselves
to analyze multiple sequences. Most multi-sequence analysis solutions are server-based,
high-cost, and often require the involvement of professionals who understand Unix and
programming languages such as Perl. GeneHarbor, Inc. is a company dedicated to providing
innovative, versatile, user-friendly, and low-cost bioinformatic tools that directly serve field
scientists. GeneLooper 3.0 was developed based upon these principles.
GeneLooper 3.0 is a Windows-based sequence analysis package that can be installed on a
personal computer, yet it is powerful enough to handle as much as the entire transcriptome of
any organism. Its high capacity enables users to analyze thousands of sequences at a time.
GeneLooper 3.0 is multi-functional: it performs routine bioinformatics operations such as
sequence retrieval, open reading frame detection, multi-sequence similarity search, translation,
sequence alignment, and much more. From data preparation to data processing to data reporting,
user-friendliness is built into the entire design. It has no complex database structure, so it is
flexible and easy to maintain. Operation is as easy as clicking a mouse button, and data output
is instant and requires no programming or scripting.
In addition to its high-throughput capability, GeneLooper 3.0 is also meticulously
designed to analyze individual sequences in detail. It offers many innovative and unique
features needed by molecular biologists. The authors of GeneLooper 3.0 purposely streamlined
the analysis process to provide the most straightforward interfaces. They also aimed to
provide results that make biological sense and to give users more power to adjust parameters to
accommodate a variety of conditions.
GeneLooper 3.0 is a total package for multi-sequence analyses. The best way to appreciate its
power is to use it in your daily work. GeneHarbor, Inc. will continue to improve its products and
services. Your input and feedback are greatly appreciated.
GeneHarbor, Inc.
November 11, 2014
Installation
System Requirements
The recommended system and configuration for GeneLooper 3.0:
Component          Minimum Requirement
Processor          Intel Pentium III or compatible, 1000 MHz or greater
RAM                1000 MB or greater
Display            800 x 600 resolution, small fonts setting, and 256 colors or greater
Operating System   Windows XP, Windows 7, Windows 10
Drives             CD-ROM drive
CD-ROM Installation
If Autoplay is turned on, your computer will automatically run the CD-ROM interface;
otherwise, follow these directions:
1. Insert the CD-ROM into your CD-ROM drive.
2. From the Windows desktop, double-click the My Computer icon.
3. Double-click the CD-ROM icon.
4. Double-click the setup icon to start the installation interface.
Follow the instructions at each step of the installation process. GeneLooper 3.0 will be
installed into a default folder assigned by the program. If you do not want to use the default
folder, you may install it to another location by choosing the custom installation method. All
components are required by the program and should be kept in the designated folder at all times.
After installing GeneLooper 3.0, the interface will automatically continue to install the HASP
device driver required to run GeneLooper 3.0. Follow the instructions to finish the process. An
icon for GeneLooper 3.0 will be placed on the desktop of your computer.
After the installation, attach the HASP key to a USB port of your computer. Double-click
the GeneLooper 3.0 icon on the desktop to run GeneLooper 3.0.
Part I
Multi-Sequence Utilities
Chapter 1
Sequence Formatting
Objective: Format sequences for use with the utilities in GeneLooper 3.0.
To avoid using a complex database structure for storing and retrieving sequence data,
GeneLooper 3.0 uses a modified fasta format for its routine operations. Fasta is a
simple and frequently used format for storing, transferring, and viewing multiple DNA or
protein sequences. Basically, a fasta file contains many sequences. The information for each
sequence has two parts: a brief description line beginning with a “>” sign, followed by
the sequence itself. The description portion usually includes a Genbank identification number
(GI), an accession number (ACCN), and a brief description of the sequence. These items are
separated by vertical bars (“|”). A sample of a fasta file is shown below:
fasta format
>gi|11056007|ref|NM_021634.1| Homo sapiens leucine-rich repeat-containing G protein-coupled receptor 7 (LGR7), mRNA
ATGACATCTGGTTCTGTCTTCTTCTACATCTTAATTTTTGGAAAATATTTTTCTCATGGGGGTGGACAGG
ATGTCAAGTGCTCCCTTGGCTATTTCCCCTGTGGGAACATCACAAAGTGCTTGCCTCAGCTCCTGCACTG
TAACGGTGTGGACGACTGCGGGAATCAGGCCGATGAGGACAACTGTGGAGACAACAATGGATGGTCCATG
CAATTTGACAAATATTTTGCCAGTTACTACAAAATGACTTCCCAATATCCTTTTGAG…..
>gi|21489938|ref|NM_145066.1| Mus musculus G protein-coupled receptor 85 (Gpr85) mRNA
ATGGCGAACTATAGCCATGCAGCCGACAACATTTTGCAAAATCTCTCGCCTCTAACAGCCTTTCTGAAAC
TGACTTCCCTGGGTTTCATAATAGGAGTCAGCGTTGTGGGCAACCTTCTGATCTCCATTTTGCTAGTGAA
AGATAAGACCTTGCATAGAGCTCCTTACTACTTCCTGCTGGATCTGTGCTGCTCAGACATCCTCAGATCT
GCAATTTGTTTTCCATTTGTATTCAACTCTG…..
The major advantage of the fasta format is its simplicity. It can be opened, edited, and
saved with many text editors, including Microsoft Word and Notepad. GeneLooper 3.0 uses a
modified fasta format in which both the description and the sequence are enclosed in double
quotes; hereafter we call it Q-fasta. The main reasons for the change are to speed up operations
and to avoid errors. A sample of a Q-fasta file is shown below:
Q-fasta format
">gi|17986271|ref|NM_000795.2| Homo sapiens dopamine receptor D2 (DRD2) transcript variant 1 mRNA"
"GGCAGCCGTCCGGGGCCGCCACTCTCCTCGGCCGGTCCCTGGCTCCC
GGAGGCGGCCGCGCGTGGATGCGGCGGGAGCTGGAAGCCTCAAGCAG
CCGGCGCCGTCTCTGCCCCGGGGCGCCCTATGGCTTGAAGAGCCTGGC
CACCCAGTGGCTCCACCGCCCTGATGGATCCACTGAATCTGTCCTGGT
ATGATGGAACTCCTTGGCCTCGAGAGCCCCTGGGGCCTAGACTCTGTA
ACATCACTATCCATGCACCAAACTAATAAAACTTTGACGAGTCACCTT
CCAGGACCCCTGGGTAAAAAAAAAAAAAAA…………"
">gi|4504136|ref|NM_000839.1| Homo sapiens glutamate receptor metabotropic 2 (GRM2) mRNA"
"CCATGGGATCGCTGCTTGCGCTCCTGGCACTGCTGCCGCTGTGGGGTGCTGTGGCTGAGGGCCCAGCCAAGAAGG
TGCTGACCCTGGAGGGAGACTTGGTGGCGGCTCCGTGGTGCTTGGCTGCCTCTTTGCGCCCAAGCTGCACATCATC
CTCTTCCAGCCGCAGAAGAACGTGGTTAGCCACCGGGCACCCACCAGCCGCTTTGGCAGTGCTGCTGCCAGGGCC
AGCTCCAGCCTTGGCCAAGGGTCTGGCTCCCAGTTTGTCCCCACTGTTTGCAATGGCCGTGAGGTGGTGGACTCGA
CAACGTCATCGCTTTGA…………"
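The difference between the two formats can be illustrated with a short Python sketch. It is based only on the samples shown above and is not GeneLooper's own conversion routine; the function name fasta_to_qfasta, the choice to write each sequence on a single line, and the removal of embedded double quotes are assumptions made for illustration.

```python
def fasta_to_qfasta(fasta_text):
    """Wrap every fasta description line and every sequence in double quotes."""
    records = []          # list of (description, sequence) pairs
    description = None
    chunks = []           # sequence lines of the record being read
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line
            chunks = []
        elif description is not None:
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))

    out = []
    for description, sequence in records:
        # Assumption: a double quote inside a description would break the
        # quoting, so it is dropped.
        out.append('"' + description.replace('"', "") + '"')
        out.append('"' + sequence + '"')
    return "".join(line + "\n" for line in out)
```

Within GeneLooper 3.0 itself, the conversion is performed with the Sequence Formatting utility described next.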
Preparation of a Q-fasta file:
Converting a fasta file to a Q-fasta file is an easy but critical process, because Q-fasta files
are used throughout the utilities in GeneLooper 3.0. Any non-Q-fasta file will cause errors in
GeneLooper 3.0.
Many public web sites allow free downloading of sequences in fasta format. We will use the
curated human mRNAs (accession numbers starting with “NM_”, the so-called refseq) as an
example to demonstrate the download and conversion processes.
1. To download the curated human mRNAs in fasta format from the National Center for
Biotechnology Information (NCBI), use your Internet Explorer or Netscape browser to go
to the FTP download site of the NCBI human genome project, find the file named
hs.fna.gz: ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Protein/hs.fna.gz and save the
file to your local drive. This file is about 13 MB and takes a few minutes to download.
2. The downloaded file is usually compressed or “zipped” and must be decompressed before
use. This can be done with WinZip or an equivalent utility. If WinZip is not installed on
your PC, you can download an evaluation version from http://www.Winzip.com. Unzip the
downloaded file “hs.fna.gz” to generate a new file called hs.fna. The unzipped file contains
about 16,000 human mRNA sequences and is about 45 MB in size.
3. Start GeneLooper 3.0 and then click on the Sequence Formatting button. The Sequence
Formatting and Appending form will appear (Fig. 1). Use the browse button to locate
hs.fna, which you have just created, and then click on the New file button. The new file
name, “hs_fmt.txt”, will automatically appear in the recipient text box. If you already
have a Q-fasta file and want to append the downloaded sequences to it, use the
Existing File button to locate that file.
4. There are two filter options. The first removes non-mRNA sequences, such as
accessions beginning with “NG_” or “NC_”. The truncation option shortens very long
transcripts, such as the transcripts of the titin gene*. These options allow you to remove
or truncate the few very long sequences that add computing time and slow down the
other high-throughput utilities.
Figure 1. Sample of the working interface for creating a Q-fasta file.
5. Click on the Start button to begin the formatting process. It usually takes a few minutes
to finish. Wait until it is complete and then locate the formatted sequence file following the
on-screen instructions. You can open the Q-fasta file, hs_fmt.txt, using the Document
Viewer in GeneLooper 3.0 to see the formatted sequences. The Q-fasta file is now ready
for use by GeneLooper 3.0.
Note*:
1. Both nucleotide and protein fasta files can be used for formatting.
2. Very long sequences sometimes freeze the program because of the limited processing
capacity of a PC. Fortunately, so far only a few titin mRNAs (80 kb) have been found to
be longer than 30,000 bp. In this file, NCBI also includes some non-mRNA sequences,
such as accessions beginning with “NC_” (mitochondrial genome) or “NG_”
(special genomic sequences and pseudogenes). These sequences are very long and
may not be needed by users.
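The effect of the two filter options can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's actual code: the helper names accession_of and filter_records are hypothetical, the accession is assumed to follow the “ref” field as in the NCBI descriptions shown earlier, and the 30,000 bp default is simply the length mentioned in the note above, since the manual does not state the length to which the program truncates.

```python
def accession_of(description):
    """Return the accession from a description such as
    '>gi|11056007|ref|NM_021634.1| Homo sapiens ...', or '' if none is found."""
    fields = description.lstrip(">").split("|")
    # Assumption: the accession is the field that follows "ref".
    for i, field in enumerate(fields[:-1]):
        if field.strip() == "ref":
            return fields[i + 1].strip()
    return ""


def filter_records(records, skip_prefixes=("NG_", "NC_"), max_length=30000):
    """Drop records whose accession starts with a skipped prefix and
    truncate the remaining sequences to max_length bases."""
    kept = []
    for description, sequence in records:
        if accession_of(description).startswith(skip_prefixes):
            continue                      # non-mRNA record: removed
        kept.append((description, sequence[:max_length]))
    return kept
```

Here records is a list of (description, sequence) pairs; both filters are applied in a single pass, so a record is either dropped or kept in its truncated form.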
Sources of sequence files:
Many sets of mRNA sequences from NCBI are good collections, such as all refseq
(accessions beginning with “NM_” or “XM_”) and sequences predicted by GenomeScan
(GS_mRNA.fsa.gz). Mammalian Gene Collection (MGC) clone sequences are also excellent
for collection. Considering the low cost of hard drive space, you should have no problem storing
all mRNA and protein sequences from many organisms on your PC. It is a good habit to keep
your sequence files well organized, for example by giving them meaningful names, dating them
clearly, and putting them in a few sub-folders.
Internet Links for downloading fasta sequences:
1. Refseqs from NCBI: ftp://ftp.ncbi.nih.gov/refseq
2. Genbank FTP download site: ftp://ftp.ncbi.nih.gov/genbank/
3. MGC clone sequences from the Mammalian Gene Collection: http://mgc.nci.nih.gov/
4. Download from the NCBI Entrez site by using a keyword search. Nucleotide sequences:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Limits&DB=Nucleotide
Protein sequences:
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?CMD=Limits&DB=protein. Be sure
to use the limit function to define your search and select the fasta format to save the data
to your local drive.
Chapter 2
Entrez Information Extraction
Objective: Extract useful information from Entrez pages prepared by NCBI and tabulate
extracted data into spreadsheets.
Public databases contain a tremendous amount of information about sequences. The Entrez
page is one of the most important information sources. A typical Entrez page is shown in Fig. 1.
It contains an annotated sequence with detailed information. In general, an Entrez page is
designed for viewing one sequence at a time; however, it is sometimes necessary to examine a
large number of sequences, such as the sequences of a gene family. Entrez Information Extraction
in GeneLooper allows you to extract important information from multiple Entrez pages and put
it into spreadsheets. With the tabulated data, you can classify genes into groups by sorting the
data based on the items extracted.
Figure 1. A sample of an Entrez page from NCBI
LOCUS       NP_000471    776 aa    linear    PRI 31-OCT-2000
DEFINITION  adenosine monophosphate deaminase (isoform E) [Homo sapiens].
ACCESSION   NP_000471
PID         g4502079
VERSION     NP_000471.1 GI:4502079
DBSOURCE    REFSEQ: accession NM_000480.1
KEYWORDS    .
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (residues 1 to 776)
  AUTHORS   Mahnke-Zizelman,D.K. and Sabina,R.L.
  TITLE     Cloning of human AMP deaminase isoform E cDNAs. Evidence for a
            third AMPD gene exhibiting alternatively spliced 5'-exons
  JOURNAL   J. Biol. Chem. 267 (29), 20866-20877 (1992)
  MEDLINE   93015995
  PUBMED    1400401
REFERENCE   2 (residues 1 to 776)
  AUTHORS   Yamada Y, Goto H and Ogasawara N.
  TITLE     Cloning and nucleotide sequence of the cDNA encoding human
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from U29926.1.
FEATURES             Location/Qualifiers
     source          1..776
                     /organism=Homo sapiens
                     /db_xref=taxon:9606
                     /chromosome=11
                     /map=11p15
                     /cell_type=T-lymphocyte, cytotoxic
                     /clone_lib=RPMI 8402, lambda 2001 library of R. Baer
     Protein         1..776
                     /product=adenosine monophosphate deaminase (isoform E)
                     /EC_number=3.5.4.6
     Region          313..401
                     /region_name=Adenosine/AMP deaminase
                     /db_xref=CDD:pfam00962
                     /note=A_deaminase
     Region          471..716
                     /region_name=Adenosine/AMP deaminase
                     /db_xref=CDD:pfam00962
                     /note=A_deaminase
     CDS             1..776
                     /gene=AMPD3
                     /db_xref=LocusID:272
                     /db_xref=MIM:102772
                     /coded_by=NM_000480.1:345..2675
Procedure:
I. File preparation: There are several ways to download a large number of Entrez pages.
1. Download the original Entrez data from the NCBI FTP site. For example, the curated
human nucleotide and protein sequence pages can be downloaded from
ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Protein/hs.gbff.gz
and
ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Protein/hs.gnp.gz. Unzip the downloaded
files with WinZip (see Chapter 1 on how to use WinZip).
2. Selectively download the original Entrez data from the NCBI Entrez batch download site
by submitting a list of GI or accession numbers:
http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide . After the list is
submitted, the site returns brief descriptions of the retrieved sequences. Select
Genbank from the pull-down menu and click on the save button. Save the file to your
local drive.
3. Selectively download the original Entrez data from the NCBI Entrez site by using a
keyword search. Nucleotide and protein Entrez pages can be downloaded from
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Limits&DB=Nucleotide
and
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?CMD=Limits&DB=protein. Be sure
to use the limit function to define your search. Save the file to your local drive as
described in step 2.
II. Extraction:
1. Start GeneLooper 3.0 and then click on the Entrez Information Extraction button to
show the Entrez Information Extraction form (Fig. 2).
2. Use the browse button to locate the original Entrez data file.
3. Select nucleotide or protein based on the type of sequences in the file. The default setting
is nucleotide. There are twelve predefined fields for both nucleotide and protein files.
When the protein option is selected, the fields change to those for protein.
4. Check the check box of each field that you need to extract.
5. Check the Turn on MS Excel button if you need to monitor the process. The results are
also saved in the same folder.
6. Click on the OK button to start. After the process is complete, locate the saved data
following the on-screen instructions.
Notes:
1. Twelve fields are predefined for both nucleotide and protein pages. The fields
selected for nucleotide differ from those for protein. The selected fields contain
the most important information about the sequence. If a particular field is not in the list,
please contact GeneHarbor, Inc. for a customized version.
2. For a nucleotide page, only one row of records is generated per page.
For a protein page, however, multiple rows of records are allowed to accommodate
proteins with more than one domain (region). Samples of the results from both
nucleotide and protein pages are shown in Fig. 3 and Fig. 4.
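The idea behind the extraction can be sketched in Python. This is a minimal illustration, not the program's implementation: the function names are hypothetical, only six of the fields are covered, CSV output stands in for the Excel spreadsheet, and mapping the “Chromosome” and “Cytogenetics” columns to the /chromosome and /map qualifiers is an assumption. Only the first line of a multi-line DEFINITION is captured.

```python
import csv
import io
import re

FIELDS = ["Accession", "Definition", "Version", "Organism",
          "Chromosome", "Cytogenetics"]


def extract_record(record_text):
    """Pull a few fields out of one GenBank-style flat-file record."""
    def first(pattern):
        match = re.search(pattern, record_text, flags=re.MULTILINE)
        return match.group(1).strip().strip('"') if match else ""

    return {
        "Accession": first(r"^ACCESSION[ \t]+(\S+)"),
        "Definition": first(r"^DEFINITION[ \t]+(.+)$"),
        "Version": first(r"^VERSION[ \t]+(\S+)"),
        "Organism": first(r"^[ \t]*ORGANISM[ \t]+(.+)$"),
        "Chromosome": first(r"^[ \t]*/chromosome=(.+)$"),
        "Cytogenetics": first(r"^[ \t]*/map=(.+)$"),
    }


def extract_to_csv(flat_file_text):
    """Tabulate every record of a flat file, one row per record."""
    # GenBank flat files end each record with a line containing only "//".
    parts = re.split(r"^//[ \t]*$", flat_file_text, flags=re.MULTILINE)
    records = [part for part in parts if part.strip()]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(extract_record(record))
    return buffer.getvalue()
```

The resulting text can be saved with a “.csv” extension and opened in a spreadsheet program for sorting.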
Figure 2A. Entrez information extraction form for nucleotides.
Figure 2B. Entrez information extraction form for proteins.
Figure 3A. Samples of tabulated information from nucleotide pages.
Figure 3B. continued from A.
Column Legend:
A. Accession
B. Definition
C. Version
D. Organism
E. Tissue
F. Chromosome
G. Cytogenetics
H. Morbid ID
I. Gene Size
J. CDS
K. Protein ID
L. Vector
Figure 4A. Samples of tabulated information from protein pages.
Figure 4B. continued from A.
Column Legend:
1
Accession
Definition
Version
Organism
Tissue
Cell Type
Chromosome
Cytogenetics
Morbid ID
Coded by
Signal Peptide
Region
Domain Name
Domain ID
Domain Code
Chapter 3
Sequence Collection
Objective: Collect all “.seq” files in a folder and put them into a Q-fasta file or combine all
small Q-fasta files in a folder into a large Q-fasta file.
You have probably worked on many genes, and the related sequences you saved are scattered.
Finding a particular sequence takes time and requires a good memory, and it becomes even more
difficult when the number of “.seq” files reaches the thousands. The Sequence Collection function
of GeneLooper 3.0 can help you put all your “.seq” files into a Q-fasta file. Each sequence
can then be opened and studied with the Sequence Viewing and Editing function by using its ID,
and sequences can also be batch-retrieved with the Sequence Retrieving function.
Procedure:
1. Put the individual “.seq” files or small Q-fasta files to be collected or combined into one
file folder. Start GeneLooper 3.0 and then click on the
Sequence Collection button. The Sequence Collection form will appear (Fig. 1).
2. Use the drive selection menu to find the drive and use the folder navigator to locate the
folder containing the sequence files to be collected. When you double-click on the folder
name, the names of all files in the folder appear in the file list box on the right side of
the form.
3. You have three options for collecting the sequence files: 1) “.seq” files, 2) small Q-fasta
files, and 3) all “.seq” and Q-fasta files. Select “.seq” from the file-type option menu
located above the file list box to collect only the “.seq” files, select “.txt” to collect all
Q-fasta files, and select “all” to collect both. Make sure that no other types of files are in
the folder.
4. The collected sequences can be saved to an existing Q-fasta file or as a new Q-fasta file.
In the first case, locate the existing Q-fasta file using the “Browse…” button. In the
second case, use the same button to locate the folder in which you want the new file to be
stored and then type a file name ending in “.txt”, or use the default path and name:
“combined.txt” in the same folder as the sequence files.
5. Click on the Collect File button. All of the sequence files will be quickly put into a
Q-fasta file in the alphabetical order of their file names. You can examine the
collected sequences in the saved file using Document Viewer in GeneLooper 3.0 or
another text editor. A sample of the Q-fasta sequences created is shown in Fig. 2.
Notes:
1. The file name of each collected “.seq” file serves as its “Accession” number or
identification number (ID) in the descriptive section of the Q-fasta sequence and is
flanked by two “|” signs. When small Q-fasta files are collected, the order and
content of the sequences in the files are not changed.
2. The above procedure only shows how to collect sequence files in one folder. To collect
all the “.seq” files on your personal computer into a Q-fasta file, simply put all of
them into one folder. To do so, first find all your “.seq” files with the Windows search
function by typing “*.seq” in the search box and performing the search. Second, select
and copy all the “.seq” files and paste them into a new folder. Using the method described
above, you can then collect all of your “.seq” files and convert them to a Q-fasta file.
3. The Sequence Collection function is also very effective for collecting “.seq” files created
by many sequencing machines and converting them into a Q-fasta file.
Figure 1. Sequence Collection Interface.
Figure 2. A sample of collected sequences in Q-fasta.
"> |NM_000795| description"
"GGCAGCCGTCCGGGGCCGCCACTCTCCTCGGCCGGTCCCTGGCTCCCGGAGGCGGCCGCGCGTGGATGCGGCG
GGAGCTGGAAGCCTCAAGCAGCCGGCGCCGTCTCTGCCCCGGGGCGCCCTATGGCTTGACCCTGAGGAAGGAG
GGGAAGCTGCAGCTTGGGAGAGCCCCTGGGGCCTAGACTCTGTAACATCACTATCCATGCACCAAACTAATAA
AACTTTGACGAGTCACCTTCCAGGACCCCTGGGTAAAAAAAAAAAAAAA"
"> |NM_000839| description"
"CCATGGGATCGCTGCTTGCGCTCCTGGCACTGCTGCCGCTGTGGGGTGCTGTGGCTGAGGGCCCAGCCAAGAA
GGTGCTGACCCTGGAGGGAGACTTGGTGCTGGGTGGGCTGTTCCCAGTGCACCAGAAGGGTGGCAGTGCTGCT
GCCAGGGCCAGCTCCAGCCTTGGCCAAGGGTCTGGCTCCCAGTTTGTCCCCACTGTTTGCAATGGCCGTGAGGT
GGTGGACTCGACAACGTCATCGCTTTGA"
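The collection step can be sketched in Python as follows. The sketch rests on assumptions that the manual does not confirm: a “.seq” file is taken to be a plain-text file holding one sequence, the function name collect_seq_files is hypothetical, and the description text written after the ID is a placeholder.

```python
from pathlib import Path


def collect_seq_files(folder, output_name="combined.txt"):
    """Write every ".seq" file in folder into one Q-fasta file.

    Files are processed in alphabetical order of their names, and each
    file name (without extension) becomes the ID between two "|" signs.
    Returns the number of files collected.
    """
    folder = Path(folder)
    seq_files = sorted(folder.glob("*.seq"), key=lambda p: p.name.lower())
    with open(folder / output_name, "w", encoding="utf-8") as out:
        for path in seq_files:
            # Assumption: the file holds only sequence characters, possibly
            # broken over several lines; all whitespace is removed.
            sequence = "".join(path.read_text(encoding="utf-8").split())
            out.write(f'"> |{path.stem}| collected from {path.name}"\n')
            out.write(f'"{sequence}"\n')
    return len(seq_files)
```

Calling collect_seq_files on a folder of “.seq” files produces a “combined.txt” file in that folder with the layout shown in Fig. 2.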
Chapter 4
Sequence Separation
Objective: Create individual sequence files (.seq) from a Q-fasta file. This process is the
opposite of Sequence Collection (Chapter 3).
Large-scale sequence analysis often requires the creation of many individual
sequence files, the so-called dot seq (.seq) files, which are accepted by many
sequence analysis tools. One of the sequence sources used to generate these files is Q-fasta files.
The Sequence Separation function in GeneLooper 3.0 provides efficient and flexible ways
to create “.seq” files. The sequence files can be created from any portion of a Q-fasta file, stored
in a common folder or in their own folders, and readily used by GeneLooper 3.0 or by sequence
analysis tools from many other vendors.
Procedure:
1. Start GeneLooper 3.0 and then click on the Sequence Separation button. The
Sequence Separation form will appear (Fig. 1).
2. Use the Browse button to locate the Q-fasta file whose sequences are to be
separated. It takes a few seconds to load the file if it contains
thousands of sequences. The program automatically detects the number of sequences
in the loaded file.
3. If you want to separate all sequences in the loaded Q-fasta file, skip the selection box
labeled “Define a portion…”. Otherwise, indicate the portion of the file to be separated
by using the pair of pull-down menus to select the starting sequence and the ending
sequence.
4. Choose the save file option to save the resulting “.seq” files in a common folder (see
Fig. 2) or to save each “.seq” file in its own folder (see Fig. 3).
5. Click on the Separate Sequences button to begin.
6. After the process ends, locate the newly created sequence files following the on-screen
instructions. Sample files saved in a common folder are shown in Fig. 2, while
sequence files saved in their own folders are shown in Fig. 3.
Figure 1. Sample of the Sequence Separation interface.
Notes:
1. All sequences are stored in a newly created folder named after the input file plus a
suffix that combines the number of the last separated sequence and “_sep”. If you
previously created a folder with the same name, be sure to remove it before the
operation to avoid overwriting files and other errors.
2. The accession number or ID of a sequence is used as its folder and file name.
3. It is a good idea not to store too many separated sequence files in a single folder,
because locating a file in a folder that holds too many files is quite slow. You can avoid
this by storing them in a few different folders.
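The separation logic can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the way the ID is taken from the description (the field after “ref”, or the field between the first two “|” signs) is an assumption based on the samples in this manual, and the output folder is passed in by the caller rather than named automatically as described in note 1.

```python
import re
from pathlib import Path


def read_qfasta(text):
    """Return (description, sequence) pairs from Q-fasta text."""
    quoted = re.findall(r'"([^"]*)"', text)
    # Quoted items alternate: description, sequence, description, ...
    return [(description, "".join(sequence.split()))
            for description, sequence in zip(quoted[0::2], quoted[1::2])]


def record_id(description):
    """Pick the accession or ID used as the file name."""
    fields = [f.strip() for f in description.lstrip(">").split("|")]
    if "ref" in fields[:-1]:
        return fields[fields.index("ref") + 1]
    # Collected files look like '> |ID| description'.
    return fields[1] if len(fields) > 2 else fields[0]


def separate(qfasta_text, out_dir, start=1, end=None, own_folders=False):
    """Write sequences start..end (1-based, inclusive) as ".seq" files."""
    records = read_qfasta(qfasta_text)
    end = len(records) if end is None else end
    out_dir = Path(out_dir)
    written = []
    for description, sequence in records[start - 1:end]:
        name = record_id(description)
        target_dir = out_dir / name if own_folders else out_dir
        target_dir.mkdir(parents=True, exist_ok=True)
        path = target_dir / f"{name}.seq"
        path.write_text(sequence + "\n", encoding="utf-8")
        written.append(path)
    return written
```

The start and end arguments play the role of the two pull-down menus in step 3, and own_folders corresponds to the save option in step 4.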
Figure 2. Sample files saved in a common folder.
Figure 3. Sample files saved in their own folders.
Chapter 5
Sequence Retrieving
Objective: Selectively batch-retrieve sequences from a Q-fasta file using a list of Accession
(ACCN), Genbank identification (GI) numbers or your own sequence IDs.
The ability to batch-retrieve sequences from your own sequence collections gives you more
power to deal with a large number of sequences and saves time compared with retrieving
sequences online or downloading them from other resources. The Sequence Retrieving utility in
GeneLooper 3.0 provides a fast and easy way to batch-retrieve sequences from your sequence
collections, including those created by you or your organization, as long as you have the
sequences in a Q-fasta file and a list of IDs (Fig. 1).
Figure 1. Diagram of the sequence retrieving process
Procedure:
1. Prepare a text file (.txt) containing a list of accession numbers, GI numbers, or the IDs of
your own sequences. To generate the list file, you can use Microsoft Excel: put the IDs in a
single column (column A only) and save the file as a “.txt” file. For example:
NM_014165
NM_014164
NM_014163
NM_014162
NM_014161
NM_014160
2. Start GeneLooper 3.0 and then click on the Sequence Retrieving button. The Sequence
Retrieving form will appear (Fig. 2).
Figure 2. A sample of the Sequence Retrieving interface.
3. Use the upper browse button to locate the Q-fasta file that you expect to contain the
sequences to be retrieved. Use the lower browse button to locate the list file containing the
ACCNs, GIs, or IDs. (Use the Sequence Formatting utility to create a Q-fasta file, as described
in Chapter 1.)
4. Select an ID type from the file type options. Click on ACCN if you have a list of accession
numbers or your own IDs; select GI if you have Genbank identification numbers.
5. Click on the Retrieve button to start. It usually takes a few minutes to retrieve a few
thousand sequences. Once the process is complete, the form displays the file path of a
Q-fasta file containing the retrieved sequences and, if any IDs could not be found, the path of
a text file containing the missed IDs.
Notes:
1. To ensure successful retrieval, both the sequence file and the ID list file must be in
their correct formats.
2. A single Q-fasta file may not contain all of the sequences you want. If some IDs are
missed, you can use the missed ID list file directly to retrieve them from
other Q-fasta files. Repeat the process until you have exhausted all of your collections.
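The retrieval logic, including the list of missed IDs, can be sketched in Python. This is a sketch under assumptions rather than the program's method: the function names are hypothetical, accessions are compared without their version suffix (so that “NM_014165” matches “NM_014165.1”, which would also shorten a user ID that contains a dot), and the GI is assumed to follow the “gi” field of the description.

```python
import re


def read_qfasta(text):
    """Return (description, sequence) pairs from Q-fasta text."""
    quoted = re.findall(r'"([^"]*)"', text)
    return [(description, "".join(sequence.split()))
            for description, sequence in zip(quoted[0::2], quoted[1::2])]


def ids_of(description):
    """Return (accession without version, GI); missing parts are ''."""
    fields = [f.strip() for f in description.lstrip(">").split("|")]
    gi = accession = ""
    for key, value in zip(fields, fields[1:]):
        if key == "gi":
            gi = value
        elif key == "ref":
            accession = value
    if not accession and len(fields) > 2:
        accession = fields[1]          # collected files: '> |ID| description'
    return accession.split(".")[0], gi


def retrieve(qfasta_text, wanted_ids, id_type="ACCN"):
    """Return (found records, missed IDs) for a list of wanted IDs."""
    index = {}
    for description, sequence in read_qfasta(qfasta_text):
        accession, gi = ids_of(description)
        key = gi if id_type == "GI" else accession
        if key:
            index.setdefault(key, (description, sequence))
    found, missed = [], []
    for wanted in wanted_ids:
        wanted = wanted.strip()
        if not wanted:
            continue                   # skip blank lines in the list file
        key = wanted if id_type == "GI" else wanted.split(".")[0]
        if key in index:
            found.append(index[key])
        else:
            missed.append(wanted)
    return found, missed
```

As note 2 suggests, the missed list returned from one Q-fasta file can be passed as wanted_ids for the next file until no IDs remain or all collections have been searched.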
Chapter 6
Open Reading Frame Detection
Objective: Detect the longest open reading frames (ORFs) of multiple sequences in a Q-fasta
file and tabulate the ORF data along with other general information.
An ORF is an important piece of information about an mRNA sequence. From an ORF
sequence, a predicted protein sequence can be derived and subsequently used for studying its
functional domains. In general, for a full-length mRNA, the longest ORF is the region coding for
its protein. The sequence upstream of the first ATG of the longest ORF is called the 5’ untranslated
region (5’UTR), and the sequence downstream of the stop codon in the same frame is called the 3’
untranslated region (3’UTR) (Fig. 1).
Figure 1. The ORF map of an mRNA for demonstrating the terms used in the program.
[Diagram: three reading frames (Frame 1, Frame 2, Frame 3) labeled with Pre_Stop, 5’ UTR,
First ATG, Longest ORF, Stop and 3’ UTR.]
The ORF data illustrated in Fig. 1 represents a perfect case of a full-length mRNA. It has a
long region without any termination codon, and has at least one in-frame termination
codon before the first ATG. Therefore, one can be confident that this sequence represents a
full-length mRNA. It must be pointed out, however, that not all real full-length mRNAs have an
in-frame stop codon upstream of the first ATG. Most of the mRNA sequences in the databases are
derived from cDNA sequences, which are frequently truncated at the 5’ end, and it requires a great
deal of effort to determine whether a cDNA sequence is full-length. Hunting for new genes often
requires working with raw data or incomplete sequences such as EST sequences, which are normally
a mixture of coding regions, UTRs, partially spliced transcripts and genomic DNA
contamination. These sequences are about 300 to 800 bp in length. A high-throughput way to
detect partial ORFs helps researchers prioritize their effort in the gene discovery
process.
The ORF Detection function in GeneLooper 3.0 is a high-throughput program. It allows users to
analyze either full-length or partial mRNA sequences and presents the ORF data in an
informative and straightforward table. Fig. 2 demonstrates the methods used in detecting ORFs
and the nomenclature used for data reporting.
Figure 2. Diagram demonstrating the two methods used in detecting the longest ORF.
[Diagram labels.
1. Complete ORF method: Pre_length, ATG-pos, ORF_length, Stop-pos and mRNA_length, drawn for
two cases, Pre_stop: yes and Pre_stop: no.
2. Incomplete ORF method: ORF_length and mRNA_length, drawn for four cases:
5_stop: yes, In-frame ATG: yes, 3_stop: yes
5_stop: no, In-frame ATG: yes, 3_stop: no
5_stop: no, In-frame ATG: no, 3_stop: no
5_stop: yes, In-frame ATG: yes, 3_stop: no]
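The two methods in Fig. 2 can be illustrated with a short Python sketch. It is a simplified reading of the definitions in this chapter, not the program's actual algorithm. The assumptions are: only the three forward frames are scanned, positions are 1-based, the stop position is the first base of the stop codon, and the reported length excludes the stop codon. The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq):
    """Longest ATG...stop ORF over the three forward frames.
    Returns (atg_pos, stop_pos, orf_length) or None if no complete ORF."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # index of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i
            elif codon in STOPS:
                if start is not None:
                    length = i - start
                    if best is None or length > best[2]:
                        best = (start + 1, i + 1, length)
                start = None
    return best


def longest_incomplete_orf(seq):
    """Longest stop-free stretch in any forward frame, with or without
    an ATG. Returns (begin_pos, end_pos, length), positions inclusive."""
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        seg_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                if i - seg_start > best[2]:
                    best = (seg_start + 1, i, i - seg_start)
                seg_start = i + 3
        # Stretch still open at the 3' end: count whole codons only.
        end = frame + ((len(seq) - frame) // 3) * 3
        if end - seg_start > best[2]:
            best = (seg_start + 1, end, end - seg_start)
    return best
```

For a short EST-like fragment the two functions can disagree sharply: a fragment with no in-frame stop has no complete ORF at all, while the incomplete method still reports its longest open stretch.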
Procedure:
1. Start GeneLooper 3.0 and then click on the Open Reading Frame Detection button.
The ORF detection form will appear (Fig. 3).
2. Use the browse button to locate the Q-fasta file to be analyzed. (Use the Sequence
Format utility to create a Q-fasta file as described in its procedure.)
3. Select a method for detecting ORFs in the ORF option panel. There are two options:
the complete ORF method and the incomplete ORF method. A complete ORF is one that
has an ATG at the 5’ end and a stop at the 3’ end. This option is suitable for
full-length mRNA’s. An incomplete ORF is a region without any stop codon,
regardless of whether it has an ATG and/or a termination codon at its end. This option is
designed for raw sequence data or partial sequences such as EST’s, for an initial
screen of potential coding sequences. In both methods, the longest ORF for each
sequence will be reported (see Fig. 2 for details).
4. Check the Turn on MS Excel box if you want to monitor the detection process. A copy
of the results will be automatically saved in the same folder as the input file, under the
same file name with an appendix of “_Lorf.txt” (complete ORF) or “_Porf.txt”
(incomplete ORF), depending on the method used. If the Turn on MS Excel box is not
checked, the saved data file will appear in a spreadsheet of the Data Viewer after the
process is completed.
5. Click on the Start button to start. Samples of ORF data are shown in Fig. 4 and 5.
Figure 3. A sample of the Open Reading Frame Detection interface.
Notes:
1. An ORF reported here is the longest ORF of a sequence. It may differ from its coding
sequence (CDS) described in the NCBI records because NCBI may not use the first ATG
as the initiation codon and may not use the longest ORF as the coding sequence. To
compare your obtained data with NCBI’s data, you can see chapter 12 on how to extract
information from the Entrez records.
2. The ORF detection function also extracts other information from a Q-fasta file, such as
accession, GI and description. Therefore it is useful when you want to get a list of GI or
ACCN from a Q-fasta file.
3. Detecting the ORF’s of all available full-length human mRNA sequences takes
about an hour, and the returned data is highly informative. From this
tabulated data and a few calculation steps, you can answer some fundamental
biological questions, such as the average size and size distribution of human transcripts,
and the average sizes of the coding regions and 5’ untranslated regions. The incomplete ORF
detection method allows you to study the coding regions of partial cDNA’s, for example,
to screen for those with relatively long incomplete ORF’s, because they are more likely to be
genuine transcripts coding for proteins. The resulting table can also be used as a general
summary of your gene collections.
Figure 4. Sample of the data file obtained by using the complete ORF method:
Figure 5. Sample of the data file obtained by using the incomplete ORF method:
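The "few calculation steps" mentioned in Note 3 can be done in a spreadsheet or with a short script. The Python sketch below is only an illustration. It assumes the saved result file is tab-delimited with a header row, and that the columns are named mRNA_length, ORF_length and ATG-pos after the labels in Fig. 2; the real column names in your file may differ and should be checked first.

```python
import csv
import io
from statistics import mean


def summarize(table_text):
    """Average transcript, ORF and 5' UTR sizes from a tab-delimited table.
    Assumed columns (hypothetical names): mRNA_length, ORF_length, ATG-pos."""
    rows = list(csv.DictReader(io.StringIO(table_text), delimiter="\t"))
    return {
        "n": len(rows),
        "mean_mRNA_length": mean(int(r["mRNA_length"]) for r in rows),
        "mean_ORF_length": mean(int(r["ORF_length"]) for r in rows),
        # The 5' UTR is everything upstream of the first base of the ATG.
        "mean_5UTR_length": mean(int(r["ATG-pos"]) - 1 for r in rows),
    }
```

To use it on a saved file, read the file contents into a string and pass the string to summarize.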
Chapter 7
Multi-Sequence Alignment
Objective: Perform multi-sequence to multi-sequence similarity searches. Provide both a tabulated
summary of similarity data and detailed whole-sequence alignments of
homologous sequences in document files.
Many public and private organizations provide free, web-based sequence search services
using the BLAST program (Basic Local Alignment Search Tool). BLAST has proven to be
very effective in searching for sequence homology, especially against a large data set. However,
most of these service sites only allow a single-sequence search and provide a standard alignment
report. Setting up a high-throughput BLAST facility requires a great deal of equipment and human
resources. Furthermore, tabulating the results of multiple-sequence searches into a summary is
quite challenging.
The Multi-Sequence Similarity Search utility in GeneLooper 3.0 is designed to provide a search
that is flexible in the selection of data sets, high-throughput and low-cost, with a
better data presentation. It is very effective when performing multiple sequence similarity
searches against a relatively small data set such as the whole transcripts of all human genes
(35,000 sequences).
Features:
1. Search sequences in flexible data sets with adjustable search parameters.
2. High-throughput: searching thousands of sequences at once.
3. Instant data report: the tabulated data contains useful information including accessions,
lengths, ORF, and proximal homologous locations of the two sequences.
4. Complete sequence alignments instead of fragmented alignments.
5. Query and returned sequences saved in a common folder for further study.
6. A search process can be interrupted and resumed easily.
Procedure:
1. Start GeneLooper 3.0 and then click on the Multi-Sequence Blast button. The
Multi-Sequence Similarity Search interface will appear (Fig. 1).
2. Use the browse button in the top box to locate the sequence database file in Q-fasta.
(Use the Sequence Format utility to create a Q-fasta file as described in its procedure.)
3. Use the browse button in the lower box to locate the query sequence file in Q-fasta.
4. Select search stringency. The default setting is 30 for reporting low homologous
sequences. See the notes below for more details.
5. Select a strand for a similarity comparison. The default setting is the plus strand. To
compare sequences with both strands, you can perform two separate searches, one with the
plus strand and the other with the reverse complement strand. To reduce computing time,
this program searches only one strand against the sequences of a database.
6. Select the Save sequence option if you need the individual sequence files to do further
homologous sequence analysis.
7. Select the Save alignment option if you need a detailed, print-out-ready sequence
alignment to review later. The alignment of each queried sequence and its homologues
will be saved in individual document files. A search with this selection will take slightly
longer than one without it.
8. Check the Turn ON MS Excel button if you would like to monitor the search process.
9. Click on the Start button to begin. Locate the saved data following the on-screen
instructions after the process is finished. Samples of returned data are shown in Fig. 2-5.
Figure 1. Multi-Sequence Similarity Search interface.
Notes:
1. This program can be used for both nucleotide and protein sequences, but is not suitable
for large DNA sequences such as genomic DNA (>60,000 bp).
2. The search speed depends on the speed of the PC, the search stringency and the
size of the database. It will take about 10 hours to search one thousand sequences against
twenty thousand sequences using a regular PC (1.5 GHz, 500 RAM).
3. A lower setting on the search stringency (<50) reports sequences with low similarities
and usually slows down the program, while a higher setting (>100) only reports those with
higher similarities. You can find a proper setting for your needs by testing some sequences
you know.
4. The scoring system used in the resulting report is quite different from that used by the
Blast 2 program. The higher the score, the higher the similarity.
Figure 2. The tabulated similarity report. (A)
(B) continued from A.
Column specification:
A. Query_ID: sequence search order.
B. Query_Accn: the query accession number.
C. Query_Size: the length of query sequence.
D. Mtch_ACCN: the accession of returned sequence.
E. Mtch_Name: the description of returned sequence.
F. Mtch_Seq_Size: the length of returned sequence.
G. Mtch_ORF: the ORF length of returned sequence.
H. Mtch_atgpos: the initiation codon (ATG) position of the returned sequence.
I. Mtch_stoppos: the termination codon (stop) position of the returned sequence.
J. Q_beginpos: the proximal location of matching from 5’ end of the query sequence.
K. M_beginpos: the proximal location of matching from 5’ end of the returned sequence.
L. Score: the similarity value used by GeneLooper 3.0.
Figure 3. Sample of saved sequences.
Figure 4. Sample of saved alignment files.
Figure 5. A sample of the saved whole sequence alignments opened with Document Viewer.
Chapter 8
Restriction Site Search
Objective: Search for restriction sites of multiple sequences in a Q-fasta file.
Working with a large number of sequences sometimes requires analyzing the
restriction sites of many nucleotide sequences. The restriction information of DNA sequences is
important for designing experiments such as subcloning. Traditionally, restriction sites are
searched one sequence at a time, which is inefficient when multiple sequences are
involved in an experiment. The Restriction Search function of GeneLooper 3.0 provides a simple
solution for this purpose.
Procedure:
1. Start GeneLooper 3.0 and then click on the Restriction Search button. The Restriction
Site Search form will appear (Fig. 1).
Figure 1. The Restriction Site Search interface.
2. Use the browse button to locate the sequence file in Q-fasta format. (Use the Sequence
Formatting utility to create a Q-fasta file as described in its procedure.)
3. Select the restriction enzymes you want in the frequently used enzyme panel. If a desired
enzyme is not in the panel, it can be selected as a custom site from the
pull-down menu. If the enzyme is not in the pull-down menu, it can be entered manually
by typing the enzyme name and its recognition site into the name and site boxes
respectively. Only one custom enzyme is allowed per search. If multiple
custom enzyme sites are needed, you can run multiple searches and combine all
results in one.
4. Select the Turn on MS Excel check box if you wish to monitor the search process. The
results of a search are always saved in the input file folder using the input file name with
an appendix of “_res”.
5. Click on the Detect sites button to start. The results will update in an Excel sheet
immediately (Fig. 2) if the function is selected. Otherwise, after the process is complete,
the results will be shown in the Data Viewer sheet.
Figure 2. Sample of the restriction search data.
Notes:
1. The data from each selected enzyme will occupy a column in the resulting spreadsheet,
and the location of each site in a sequence is put in a pair of brackets in the corresponding
columns.
2. The ORF information is presented because the ATG and Stop positions are often taken
into account when deciding whether a particular enzyme site is usable. How the
restriction data is used depends on the user’s needs. However, simple sorting of an enzyme
column is an effective way to group DNA sequences based on a restriction
enzyme site.
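The table layout described in Note 1 (one column per enzyme, each site position in a pair of brackets) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program's own code. The reported position is taken to be the 1-based start of the recognition sequence, only the given strand is scanned (sufficient for the palindromic sites used here), and the function names are hypothetical.

```python
# Recognition sequences of three common enzymes (all palindromic).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq, site):
    """1-based start positions of every match, overlapping ones included."""
    seq, site = seq.upper(), site.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + 1)
        i = seq.find(site, i + 1)
    return positions


def site_table(records, enzymes=ENZYMES):
    """One row per sequence; each enzyme column holds bracketed positions."""
    table = []
    for name, seq in records:
        row = {"ID": name}
        for enzyme, site in enzymes.items():
            row[enzyme] = "".join(f"[{p}]" for p in find_sites(seq, site))
        table.append(row)
    return table
```

Sorting the resulting rows on one enzyme column puts all sequences without that site (empty cells) together, which mirrors the grouping suggested in Note 2.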
Chapter 9
Translation and Reverse Complement
Objective: 1. Translate mRNA sequences into protein sequences.
2. Generate reverse complement strand.
Determining and translating the coding regions of mRNA sequences are important steps
in gene annotation and gene mining, because protein sequences are rich in information that
gives clues about the biological functions of a protein. For example, conserved domains are
derived from protein sequences. The ability to directly translate multiple DNA sequences into
proteins using a PC gives researchers an advantage in studying a large number of genes.
Features:
1. Translate thousands of sequences in a short period of time.
2. Multiple ways to translate: direct translation, translation of the longest ORF or translation
of all three frames.
3. Instant data report including protein length and molecular weight.
4. Translated protein sequences are saved in a Q-fasta file.
Procedure:
1. Start GeneLooper 3.0 and click on the Translation and Reverse Complement button. The
Translation and Reverse Complementing form will appear (Fig. 1).
2. Select a method to translate the sequences. There are three options: 1) to translate the longest
ORF, which tells the program to automatically detect the longest ORF and then translate it;
2) to translate the sequence from the beginning of each sequence, and 3) to translate all three
frames. With the last option, Frame 1 is to start translation at the position of base 1, frame 2 is
to start at the base position 2 and frame 3 at the base position 3.
3. Select the Turn on MS Excel check box if you wish to monitor the translation process. Once
it starts, a summary of the results of a translation is immediately displayed in an Excel sheet
(Fig. 2).
4. Click on the Translation button to start the program.
5. When the program is complete, follow the on-screen instructions to locate the translated
protein sequences in a Q-fasta file stored in the input file folder. The file name is the DNA
file name with an appendix of “Lorf_pep”. In addition to the protein sequence file, a
summary file is also saved in the input file folder and is also named after the input file with
an appendix of “_pep_info”. The summary file can be opened in the Data Viewer of
GeneLooper. A sample of the translated protein is shown in Fig. 3A and B.
Figure 1. The Translation and Reverse Complement interface.
Figure 2. Sample of the summary of a translation process.
Notes:
1. The descriptive line of a translated protein in the Q-fasta file is the same as that of its
DNA sequence except that the word “protein” is added at the end of each description.
2. When direct translation or three-frame translation is selected, internal terminations may
occur. Each termination codon is represented by a “*”.
3. The names of the three proteins translated from a DNA sequence by the three-frame
method are the same except that a plus number (+1, +2 or +3) is added after the “>” sign to
indicate the frame in which the protein is encoded. See Fig. 3B.
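Direct and three-frame translation, as described in the procedure and notes above, can be sketched in Python. The sketch uses the standard genetic code and marks each termination codon with "*", as in Note 2. It is an illustration, not the program's implementation; rendering codons that contain ambiguous bases as "X" is an assumption of this sketch, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=1):
    """Translate starting at base position `frame` (1, 2 or 3).
    Stops become '*'; codons with non-ACGT bases become 'X' (assumption)."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame - 1, len(seq) - 2, 3)
    )


def three_frame(seq):
    """Translations of all three forward frames, keyed '+1', '+2', '+3'."""
    return {f"+{n}": translate(seq, n) for n in (1, 2, 3)}
```

Translating only the longest ORF amounts to locating the ORF first and then calling translate on that slice of the sequence.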
Figure 3A. Sample of protein sequences translated by the ORF method.
">gi|4758283|ref|NM_004441.1| Homo sapiens EphB1 (EPHB1) mRNA"
"MALDYLLLLLLASAVAAMEETLMDTRTATAELGWTANPASGWEEVSGYDENLNTIRTYQVCNVFEPNQ
NNWLLTTFINRRGAHRIYTEMRFTVRDCSSLPNVPGSCKETFNLYYYETDSVIATKKSAFWSEAPYLKVDT
IAADESFSQVDFGGRLMKVNTEVRSFGPLTRNGFYLAFQDYGACMSLLSVRVFFKKCPSIVQNFAVFPETM
TGAESTSLVIARGTCIPNAEEVDVPIKLYCNGDGEWMVPIGRCTCKPGYEPENSVACKACPAGTFKASQEA
EGCSHCPSNSRSPAEASPI……"
">gi|4503680|ref|NM_003890.1| Homo sapiens IgG Fc binding protein (FC(GAMMA)BP) mRNA"
"MGALWSWWILWAGATLLWGLTQEASVDLKNTGREEFLTAFLQNYQLAYSKAYPRLLISSLSESPASVSIL
SQADNTSKKVTVRPGESVMVNISAKAEMIGSKIFQHAVVIHSDYAISVQALNAK….."
Figure 3B. Sample of protein sequences translated by the three-frame method
">+1|4758283|ref|NM_004441.1| Homo sapiens EphB1 (EPHB1) mRNA (protein)"
"EFHMHTHTHARPHRPTRTHSCPRPRSAPGSPVRARARKDTEKPPAESAAAPWDAALSRRCCLGLVSACG
PSAGDGPGLSTTAPPGIRSGCDGRNVNGHQNGYCRAGLDGQSCVRVGRSQWLR*KPEHHPHLP…… "
">+2|4758283|ref|NM_004441.1| Homo sapiens EphB1 (EPHB1) mRNA (protein)"
"NSTCTPTPTRARTAPRAHTPAHAHAALREVRSGRERERIPRSHPRRAQRRPGTRRSPGAAASAWSRPAGR
RPAMALDYLLLLLLASAVAAMEETLMDTRTATAELG….. "
">+3|4758283|ref|NM_004441.1| Homo sapiens EphB1 (EPHB1) mRNA (protein)"
"IPHAHPHPRAPAPPHAHTLLPTPTQRSGKSGPGESAKGYREATRGERSGALGRGALPALLPRLGLGLRAV
GRRWPWIIYYCSSWHPQWLRWKKR*….. "
">+1|4503680|ref|NM_003890.1| Homo sapiens IgG Fc binding protein (FC(GAMMA)BP) mRNA (protein"
"LQPWVPYGAGGYSGLEQPSCGD*PRRLQWTSRTLAERNSSQPSCRTISWPTARPTPASLSPVCQRAPLQSP
SSARQTTPQRRSQ*GPGSRSWSTSVPRLR**AARSSSMRW*SILTMPSLCRH*MPSLTQRS*HCCGPSRP*
Reverse Complement
Occasionally, one needs to study the reverse complement strand, for example in sequence
alignment or antisense work. The function to generate a reverse complement strand is provided
together with the translation function.
To derive a reverse complement strand, select Reverse complement from the option
panel. This activates the Reverse complement button and deactivates the translation-related
buttons. Use the browse button to input the sequence file in Q-fasta format, and then click
on the Reverse complement button. Once the process is complete, the reverse complement
sequence file in Q-fasta will be saved in the input file folder and named after the input file name
with the appendix of “_rev.txt”. The descriptive line for each reverse complement strand is the
same as that of its input sequence, with the additional words “reverse complement strand” at the
end.
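For readers who want to check a result by hand, the operation itself is small enough to show in Python. This is a minimal sketch, not the program's code. It handles the bases A, C, G, T, U and N in either case and leaves any other character unchanged, which is an assumption; rc_record is a hypothetical helper that also appends the words described above to the descriptive line.

```python
# U is complemented to A; N stays N. Case is preserved.
COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq):
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def rc_record(header, seq):
    """Return the record for the reverse complement strand."""
    return header + " reverse complement strand", reverse_complement(seq)
```

Applying reverse_complement twice to a DNA sequence returns the original sequence, which is a quick sanity check.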
Chapter 10
Batch Oligo Design
Objective: Design optimal oligos for multiple sequences.
When studying genes in large numbers, it is not unusual to design many oligos. For
instance, subcloning cDNA fragments from the members of a gene family into a particular vector,
or generating multiple cDNA fragments for the production of a DNA array, depends on a
large number of oligos for use in the polymerase chain reaction (PCR). Automating and optimizing
the design process can save time and reduce cost.
Features:
1. Design optimal oligos for multiple sequences with flexible parameter settings.
2. Innovative algorithm: selects the best pair of oligos from predefined regions.
3. Adjustable designing regions based on the ORF information.
4. Automatically add restriction sites and protection bases if needed.
5. Instant data report.
Procedure:
1. Start GeneLooper 3.0 and then click on the Batch Oligo Design button to show the design
form (Fig. 1).
2. Use the browse button to locate the input DNA sequence file in Q-fasta format. (Use the
Sequence Formatting utility to create a Q-fasta file as described in its procedure.)
3. Check either Forward or Reverse Oligo if only one oligo is needed for a sequence. Check
both if a pair of oligos is needed. The Set Parameter buttons underneath become enabled after
the selection.
4. Click on the Set parameter button for the forward oligo to define the region in which the
forward oligo is located. Click on the Set parameter button for the reverse oligo to define the
region in which the reverse oligo is located. The parameter setting forms will appear as
shown in Fig. 2 A and B.
5. Set the parameters for a forward oligo:
5.1. Define the left boundary. There are four reference points in a typical mRNA sequence: 5’
end, ATG, Stop and 3’ end. You can select any of these as the reference point. For
example, select the ATG as the reference point and set the left boundary 100 bases
upstream of it by typing -100. The minus sign indicates a position upstream of the ATG.
See Fig. 3 for the terms used in the parameter forms.
5.2. Define the right boundary. Again use the ATG as the reference point and type 1
in the input box. With these settings, the program will design an oligo in the region from
100 bases upstream of the ATG to the letter A of the ATG in every sequence.
Figure 1. The Batch Oligo Design interface.
Figure 2A. Forward Oligo Parameter form.
Figure 2B. Reverse Oligo Parameter form.
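Figure 3. Diagram referencing location- and direction-terms used in the parameter settings.
[Diagram: an mRNA drawn from the 5’ end through the ATG and the Stop to the 3’ end, with the
minus (-) direction pointing toward the 5’ end and the plus (+) direction toward the 3’ end.]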
Note: There are four frequently used points in a typical mRNA sequence: the 5’ end, ATG, Stop and 3’ end. Any of
the four can be used as a reference point. Bases counted from the reference point toward the 5’ end are given as
minus values, and bases counted toward the 3’ end as plus values.
5.3. Add a restriction site from the pull-down menu if needed and type the protection bases
if it is necessary. Click the OK button to close the forward parameter setting form.
6. If a reverse oligo is needed, click on the Set parameter button to define the region in which
the reverse oligo is located. The reverse parameter setting form will appear. (Fig. 2 B).
6.1. First, as for the forward parameters, set the left boundary. For example, use the Stop as
the reference point and type “1” in the box next to the Stop label.
6.2. Second, define the right boundary. Again use the Stop and type 200 in the box next
to the “Stop” label. With these settings, the oligo will be designed in the region between
the stop and 200 bases downstream of the stop.
6.3. Add a restriction site from the pull-down menu if needed and type the protection bases if
it is necessary. Click on the OK button to close the reverse parameter setting form.
7. Set the annealing temperature (Tm) of the oligos from the pull-down menu in the batch oligo
design form. The annealing temperature will be used for every oligo.
8. Select the Turn on MS Excel button if you need to monitor the designing process.
Regardless of this selection, the results are always saved in the input file folder under
the input file name with an appendix of “_oligo.txt”.
9. Click on the Design Oligo button to start. After the process has finished, locate the data by
following the on-screen instructions. A sample of the designed oligos in a resulting file is
shown in Fig. 4.
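The boundary settings in steps 5 and 6 can be made concrete with a small Python sketch that converts a reference point and a signed offset into positions on one sequence. It follows the examples in the text: an offset of 1 is the reference base itself (the letter A of the ATG) and -100 is 100 bases upstream. Everything else is an assumption of this sketch: positions are 1-based, the Stop reference is the first base of the stop codon, an offset of 0 is rejected, and regions are clipped to the ends of the sequence. The function names are hypothetical.

```python
def resolve(ref_pos, offset):
    """Map a 1-based reference position and a signed offset to a position.
    Offset 1 is the reference base itself; negative offsets are upstream."""
    if offset == 0:
        raise ValueError("offset 0 is undefined in this sketch")
    return ref_pos + offset - 1 if offset > 0 else ref_pos + offset


def design_region(seq_len, refs, left, right):
    """left and right are (reference_name, offset) pairs.
    Returns (start, end), clipped to the sequence, or None if empty."""
    start = max(1, resolve(refs[left[0]], left[1]))
    end = min(seq_len, resolve(refs[right[0]], right[1]))
    if start > end:
        return None  # no usable region on this sequence
    return start, end


# Example: a 2000-base mRNA with the ATG at 151 and the stop codon at 1351.
refs = {"5' end": 1, "ATG": 151, "Stop": 1351, "3' end": 2000}
forward = design_region(2000, refs, ("ATG", -100), ("ATG", 1))
reverse = design_region(2000, refs, ("Stop", 1), ("Stop", 200))
```

With these example values the forward region runs from base 51 to base 151 and the reverse region from base 1351 to base 1550. If the ATG lies fewer than 100 bases from the 5’ end, the sketch clips the forward region so that it starts at base 1.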
Figure 4A. Sample oligos designed by the program.
Figure 4B. Continued from A.
Column Legends:
A. Number: the accession of the input sequence.
B. Oligo1Name: the accession and “_F” for the forward oligo name. If the
region is too short to design an oligo, it will add the word “impossible” in the
cell.
C. Oligo1Seq: the sequence of the forward oligo.
D. Score1: the penalty score for the forward oligo. The lower the score, the
better the oligo.
E. Fpos: the position (5’ end ) of the forward oligo in the input sequence.
F. Oligo2Name: use the accession and “_R” for the reverse oligo name.
G. Oligo2Seq: the sequence of the reverse oligo.
H. Score2: the penalty score for the reverse oligo.
I. Rpos: the position (5’ end) of the reverse oligo in the input sequence.
J. FragmentSize: the result of Rpos minus Fpos.
Part II
Single Sequence Utilities
Chapter 11
Sequence Viewing and Editing
Objective: View, edit and manipulate nucleotide and protein sequences.
Procedure: Start GeneLooper 3.0 and then click on the Sequence Viewing and Editing
button to show the form (Fig. 1).
Figure 1. Sequence Viewing and Editing Interface.
1. Load sequence: There are three ways to load sequences to the sequence pages. Click on the
File menu to reveal the sub-menus (Fig. 1). The Sequence Viewer can open sequence files
created by most sequencing machines and software and allows the loading of multiple
sequences. You may load up to 50 individual sequences into 50 viewer sheets at once.
A. Option 1: Click on Open to load a single file, which can be a sequence file with a “.seq”
extension or a Q-fasta file. The first 50 sequences in the Q-fasta file can be loaded on 50
viewer sheets.
B. Option 2: If you want to work on many individual sequence files located in a folder, click
on Open a folder of sequences on the File menu and use the folder navigator to locate the
sequence folder to open. Then click on OK. Up to 50 “.seq” files in this folder can be
loaded (Fig. 2).
C. Option 3: If you want to choose sequences in a Q-fasta database file to view, click on
Open from a local database to open the dialog box shown in Fig. 3. Locate the database
file (Q-fasta) after clicking on the browse button. Click on the Get all IDs button and all ID’s
(ACCN) of the sequences in the file will be loaded into the pull-down menu. You may then
select a sequence from the pull-down menu to view, or type the ID of the sequence in
the Input sequence ID box. Click Ok to load the sequence onto a viewer sheet. The
typed-in ID can also be the GI of a sequence.
Figure 3.
After opening a sequence on the viewer sheets, define the sequence type by selecting the
Nucleotide or Peptide Sequence option. The default setting is nucleotide. The screen above the
sequence box dynamically shows sequence information: the size, the GC content for a
nucleotide sequence or the molecular weight for a peptide, and the selected location. A sample
of sequences loaded on the viewer is shown in Fig. 4.
Figure 4.
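As an illustration of the nucleotide figures shown above the sequence box, the following Python sketch computes the size and GC content of a sequence. It is a sketch under one stated assumption: GC content is expressed as a percentage of the unambiguous bases (A, C, G, T), so that N's and other symbols do not dilute the value. How GeneLooper treats ambiguous bases is not stated in this manual, and the function names are hypothetical.

```python
def gc_content(seq):
    """GC percentage over the unambiguous bases A, C, G and T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def sequence_info(seq):
    """Size and GC content of a nucleotide sequence."""
    return {"size": len(seq), "gc_percent": gc_content(seq)}
```

For example, sequence_info applied to a selected stretch of the sequence gives the figures for that selection only.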
2. Sequence Editing: Use the sub-menus under the Edit menu or the icon shortcuts to
perform standard sequence editing, such as Cut, Paste, Copy, Select all and Change cases.
Under the Edit menu, the Filter function removes any characters other than the 26 alphabetic
letters. If the filter option on the form is checked, a sequence will be auto-filtered when
pasted from the clipboard.
3. Detection: Click on the Detect menu to show the detection functions (Fig. 5).
A. First Open Reading Frame: detects the first ORF after the cursor position.
B. Longest ORF: detects the longest open reading frame of the entire sequence.
C. Longest Incomplete ORF: detects the longest region without a termination codon. See
Chapter 6 for details about the ORF definitions. This method is useful for examining the 5’
UTR and raw data.
D. ORF Map: detects all ATG’s and termination codons in all three frames and presents
them in a map (Fig. 6).
Note: Use the shortcuts to run each function.
Figure 5. The sub-menus under the Detect menu.
Figure 6. A sample of an Open Reading Frame (ORF) map. Each of the long vertical lines
represents a termination codon and the short ones represent ATG locations. The adjacent
nucleotides of the sequence can be navigated at the bottom when moving the cursor on each
of the sequence lines.
E. Short Sequence search: Click on the Search menu to show the search box shown in Fig.
7. Type in the sequence to be searched and then click on the OK button. The match is
highlighted on the viewer sheet. The search is case-sensitive and proceeds from the
cursor position.
Figure 7. The Search Input box.
F. Restriction Site: Click on the menu to export the current sequence to the Restriction
Analysis form. See Chapter 12 for details.
G. Oligo Location: Click on the menu to export the current sequence to the Oligo Database
Search form. See Chapter 15 for details.
4. Function:
Figure 8. Sub-menus under the Function menu.
A. Reverse Complement Strand: makes a reverse complement sequence to replace the selected
sequence, or the whole sequence if nothing is selected.
B. Protein Sequence: translates the selected sequence, or the whole sequence if nothing is
selected, to protein and places it in a new form.
C. Sequence Similarity Search: exports the current sequence to the Local Sequence Similarity
Search form. See Chapter 16 for details.
D. Align Sequence: exports all opened sequences to the Sequence Alignment form.
E. Oligo Design: exports the current sequence to the Oligo Design form.
F. Hydropath Plot: generates a hydropath plot of the current protein sequence. The
calculation is based on the Kyte and Doolittle method (Fig. 9, see Chapter 10 for more
information). Move the cursor along the curve on the left panel to view a zoomed-in picture
on the right panel.
Figure 9. A sample Hydrophobicity Plot of a protein sequence.
G. Journal Format: exports the current sequence to the Sequence Journal Format form.
See Chapter 16 for details.
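The hydropath plot in item F is based on the Kyte and Doolittle method, which averages a per-residue hydropathy value over a sliding window. The Python sketch below shows that calculation with the published Kyte and Doolittle scale. The window size of 9 is an assumption chosen for illustration, as this manual does not state the window used by the program, and residues outside the 20 standard amino acids are simply skipped.

```python
# Kyte and Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy(protein, window=9):
    """Mean hydropathy of every full window along the protein.
    The window size is an assumption of this sketch."""
    scores = [KD[aa] for aa in protein.upper() if aa in KD]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Plotting the returned values against the window position gives the curve; stretches with high positive values are hydrophobic and stretches with strongly negative values are hydrophilic.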
Chapter 12
Restriction Analysis
Objective: Restriction analysis of DNA sequence, virtual enzyme digestion, gel
electrophoresis, and vector map drawing.
Restriction digests and agarose gel electrophoresis are routine experiments in a molecular
biology laboratory. Easy-to-use and informative software can help researchers design a
successful experiment. The Restriction Analysis function is one of the innovative designs in
GeneLooper 3.0. It has an all-in-one interface on which users can see all enzyme sites grouped by
the number of cuts in the DNA sequence and test any enzyme digestion with a single click. It also
provides the open reading frame (ORF) information of a sequence, which is an important factor in
deciding which enzymes can be used. A typical restriction form is shown in Fig. 1.
Figure 1. The Restriction Analysis interface.
Procedure:
I. Load a DNA sequence:
Click on the Browse button and then select a DNA sequence file with a “.seq” extension to
open. Any characters other than the 26 alphabetic letters in the sequence will be filtered
automatically. A sequence can also be imported from the Sequence Viewing and Editing
form you are using. The restriction search is case-insensitive.
Once a sequence is loaded to the sequence box, the program automatically detects the longest
ORF and draws the positions of the ORF in the picture box. It also searches all restriction sites in
the sequence and groups the enzymes into four classes based on the number of sites found in the
DNA sequence. The names of the enzymes are stored in the four pull-down menus, namely no
site, 1 site, 2 site and 3 and more. The sequence in the box can be cut and pasted, and any
change in the sequence will trigger a new round of ORF detection and restriction search process.
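The loading steps above (filter the input, search every enzyme, group the enzymes by the number of sites) can be sketched in Python as follows. The group names match the four pull-down menus. This is an illustration rather than the program's implementation: the enzyme list is passed in by the caller, only the given strand is searched, and the function names are hypothetical.

```python
import re


def filter_sequence(raw):
    """Drop every character that is not one of the 26 letters."""
    return re.sub(r"[^A-Za-z]", "", raw)


def count_sites(seq, site):
    """Number of matches, overlapping ones included, ignoring case."""
    pattern = "(?=" + re.escape(site.upper()) + ")"
    return len(re.findall(pattern, seq.upper()))


def group_by_cuts(raw_seq, enzymes):
    """Sort enzyme names into the four classes used by the pull-down menus."""
    seq = filter_sequence(raw_seq)
    groups = {"no site": [], "1 site": [], "2 site": [], "3 and more": []}
    labels = ["no site", "1 site", "2 site"]
    for name, site in enzymes.items():
        n = count_sites(seq, site)
        groups[labels[n] if n < 3 else "3 and more"].append(name)
    return groups
```

Because the sequence is filtered before the search, digits, spaces and line breaks copied along with a sequence do not affect the result.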
II. Enzyme Selection:
1. Select any enzyme from one or more of the three pull-down menus (1 site, 2 site and 3
and more). The names and locations of the selected enzyme will appear both in the lower
text box and the picture box. If the enzyme has multiple cutting sites it will appear
multiple times at the distinct locations.
2. If you want to choose all enzymes in a pull-down menu, click on the Show All button
under the menu (optional). The names and locations of all enzymes that can digest the
sequence will be shown in both the text box and the picture box.
3. If the name labels in the picture box are overlapping each other, use your mouse to
separate them by holding down the left mouse button and dragging a label up or down.
4. Click on the Undo Pick button to unload all selected enzymes.
5. Click on the Hide ORF button to hide ORF information.
6. Click on the Copy map button to copy the restriction map so that it can be pasted into other
document programs such as MS PowerPoint.
III. Virtual Gel:
1. Select any combination of restriction enzymes from the pull-down menus.
2. Choose the Linear or Circle option based on the nature of your DNA.
3. Click on the Virtual Gel button to see a computer-generated gel picture similar to the
one shown in Fig. 2. Separate any overlapping band-labels by holding down the left
mouse button and dragging a label up or down.
4. Click on the Reset button to perform another digest by repeating steps 1 to 3.
Note: Only the 1 kb ladder (Courtesy of Invitrogen) is used as the DNA Marker. The virtual gel
picture can be copied and pasted into other document programs.
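The difference between the Linear and Circle options comes down to how the cut positions are turned into fragment lengths. The following Python sketch shows that arithmetic only; the function name fragment_sizes and the convention that a cut position is the 1-based base after which the strand is cut are assumptions made for this example, not a description of GeneLooper's internals.

```python
def fragment_sizes(seq_length, cut_positions, circular=False):
    """Return the fragment lengths (bp), largest first, for a virtual digest.

    cut_positions are 1-based positions after which the strand is cut;
    positions outside 1..seq_length-1 are ignored in this sketch.
    """
    cuts = sorted(set(p for p in cut_positions if 0 < p < seq_length))
    if not cuts:
        return [seq_length]  # uncut molecule
    # Fragments lying between consecutive cuts.
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    if circular:
        # One fragment spans the origin of the circle.
        sizes.append(seq_length - cuts[-1] + cuts[0])
    else:
        # Linear DNA contributes two end fragments.
        sizes.append(cuts[0])
        sizes.append(seq_length - cuts[-1])
    return sorted(sizes, reverse=True)
```

With n distinct cuts, a circular molecule yields n fragments and a linear molecule yields n + 1, which is why the same enzyme combination can give different band patterns for the two options.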
Figure 2. Sample Virtual Agarose Gel electrophoresis.
Draw a plasmid map:
1. Select the restriction enzymes with which you wish to label the plasmid map.
2. Click on the Show Circle Map button. The plasmid map drawing form will appear.
3. Use the line width menu to adjust the line width of the circle.
4. To change the circle size, check the Change the circle size box, move your cursor to the
circle, hold down the left mouse button and move it. Moving it toward the center will
reduce the circle size, whereas moving it away from the center will increase the circle
size. Once the size is settled, uncheck the Change the circle size box immediately to fix the
drawing.
5. Separate any overlapping labels by holding down the left mouse button and then moving
the label to the desired location.
6. To add a new label, click on the Add label button. The label entry box will appear.
Select font and font size, type the label and hit the Enter key. Use the mouse pointer to
move the label to a desired location as described above.
7. Click on the Undo label button to remove all newly added labels if needed.
8. The picture can be copied and pasted to other programs such as MS PowerPoint.
Figure 3. The Vector Map Interface
Chapter 13
Sequence Alignment
Objective: Perform one-on-one sequence alignment and present alignment data in multiple
ways.
Sequence alignment is one of the most frequently used operations in DNA and protein
sequence analysis. It can give detailed information about the sequence relationships between two
DNA or protein sequences. Therefore, from the alignment results, it is possible to identify
conserved regions that may suggest their similarities in biological functions. The Sequence
Alignment function in GeneLooper 3.0 is innovative, sophisticated and informative.
Features:
1. Easy to input sequences: a pool of sequences can be loaded simultaneously.
2. Sophisticated alignment algorithm: searches for the best alignment between two
sequences.
3. Protein sequences can be translated directly from the loaded DNA sequences and
aligned to immediately test the consequence of a difference in nucleotide sequence.
4. Align the reverse complement strands derived from loaded DNA sequences.
5. Adjustable and detailed alignment diagram.
6. Editable whole sequence alignment report.
7. Tabulated base-to-base comparison: ideal for SNP analysis.
Figure 1. The Sequence Alignment interface.
Procedure:
1. Input sequences:
A. Click on the Import Seq button to import a single sequence.
B. Click on the Import Folder button to import all sequences in a folder.
C. Import from the Sequence Viewing and Editing form.
Note: The sequences can be proteins or nucleotides. All names of imported sequences are stored
in the pull-down menu and will be kept available until the Clear All button is clicked or the
form is closed.
2. Load two sequences for alignment:
A. Select the first sequence name from the pull-down menu as SeqI; its
sequence will be loaded into the SeqI row in the sequence box.
B. Select the second sequence name from the pull-down menu as SeqII, and the sequence
will go to the SeqII row in the sequence box.
Note: The proteins encoded by the two DNA sequences can be aligned after translating the DNA
into protein. There are four methods in the pull-down menu on the Translation panel to
choose from for the translation process. Use the Rev complement function to generate the
reverse complement of a loaded sequence.
3. Click on the Align button to align the two sequences. Identical bases in the two
sequences will be connected by a “|”. The alignment box can show only a portion of the
alignment. To see other regions of the alignment, move the cursor left or right within the
lower picture box to scroll through the entire aligned sequences (see below). To stop the sequence
from moving, click on the left mouse button. To resume the movement, click
on the left mouse button again.
4. Click on the Draw Diagram button to create an alignment diagram in the lower picture box (Fig. 1).
In a diagram, the thin horizontal line indicates a dissimilar region between the aligned
sequences, while the blue-colored bar represents the homologous region. The location of a
homologous region is labeled precisely, and the entire drawing is in proportion to the lengths
of the sequences.
Note: The criterion for a significant homology displayed in the sequence diagram can be adjusted
by changing the setting in the Draw parameters panel. The default setting is 50 bp long and
100 percent identity. Any region fulfilling the requirements will be represented by a blue bar. In the
case of 100 percent identity, the labels at the junctions of blue bars (match) and the thin line
(mismatch) indicate the exact transition positions. If this parameter is set to less than 100
percent, they indicate only approximate positions of the transitions. The default setting is
very useful for identifying the junctions of exons when comparing two splice variants. The
percent identity is defined as the number of matched bases of SeqI divided by the
length of SeqI within the region defined by the two guidelines.
5. To separate any overlapping labels, point the cursor to a label, hold down the left mouse
button and move it. Click on the Show guide labels button to see two guidelines for
dynamically calculating the identities of the region flanked by them. The guidelines can be
moved by moving the cursor while holding down the left mouse button. The two guidelines
can be hidden by clicking on the Hide guide labels button and revealed by clicking the same button
again.
6. Click on the Copy Diagram button to copy the diagram to the clipboard and paste it to other
document programs such as MS PowerPoint.
7. Click on the Page View button to view the whole sequence alignment (Fig. 2). You can set the row
width by changing the number of bp/row in the pull-down menu above the Page View button.
Figure 2. Sample of whole sequence alignment in a page view.
Note: The alignment is shown in a document editor so that it can be modified as a document file.
The identity is defined as the number of matched bases in SeqI divided by the total
length of SeqI. SeqI and SeqII can be exchanged when loading the sequences
(step 2).
8. Click on the Table View button to view a tabulated base-to-base alignment (Fig. 3). All bases
of the two sequences will be aligned vertically in two columns, and each base-position
relative to its 5’ end is precisely marked in the adjacent columns. The Table View function is
very useful for SNP study.
A. Click on the Show Mismatch button to show all mismatched bases.
B. Click on the Show Gap button to show all bases in the gap regions.
C. Click on the Show Match button to show all matched bases.
Figure 3. A sample of base-base alignment in a table view.
9. Click on the Reset button to clear all current alignment data and repeat the above steps to
perform a new alignment for another pair of loaded sequences. Click on the Clear All button
to unload all the loaded sequences.
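The “|” connector line, the identity definition and the Table View classification used in this chapter can be illustrated with a short Python sketch. It assumes that the alignment has already been computed and is available as two gapped strings of equal length with “-” marking a gap; it does not reproduce GeneLooper's alignment algorithm, and the function names are hypothetical.

```python
def match_line(aln1, aln2):
    """Build the connector line: '|' under identical residues, ' ' elsewhere."""
    return "".join("|" if a == b and a != "-" else " "
                   for a, b in zip(aln1, aln2))


def percent_identity(aln1, aln2):
    """Matched bases of SeqI divided by the length of SeqI (gaps excluded)."""
    matches = sum(1 for a, b in zip(aln1, aln2) if a == b and a != "-")
    seq1_len = sum(1 for a in aln1 if a != "-")
    return 100.0 * matches / seq1_len if seq1_len else 0.0


def table_view(aln1, aln2):
    """Return (pos1, base1, base2, pos2, status) rows for a base-to-base table.

    Positions are 1-based from the 5' end; a gap has position None.
    """
    pos1 = pos2 = 0
    rows = []
    for a, b in zip(aln1, aln2):
        pos1 += a != "-"  # advance only when SeqI contributes a base
        pos2 += b != "-"  # advance only when SeqII contributes a base
        if a == "-" or b == "-":
            status = "gap"
        elif a == b:
            status = "match"
        else:
            status = "mismatch"
        rows.append((pos1 if a != "-" else None, a, b,
                     pos2 if b != "-" else None, status))
    return rows
```

Filtering the rows on the status field corresponds to the Show Mismatch, Show Gap and Show Match buttons. Because the identity is divided by the length of SeqI, exchanging SeqI and SeqII can change the reported value when the two sequences differ in length.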
Chapter 14
Local Sequence Similarity Search
Objective: Perform a sequence similarity search against sequences stored in the hard drive of
your personal computer.
Sequence similarity search is usually done through public services such as NCBI or through your
organization’s facility. NCBI provides a free-access service and does a great job. However, because
NCBI’s workload is huge, retrieving data may take a long time. There are many
cases in which you only want a quick similarity search against a relatively small data set. The
Local Sequence Similarity Search in GeneLooper can do just this on your personal computer and
has some unique features. It is extremely useful for searching one member of a gene family against all
other members of the family saved in a common folder, or for searching for similarities among the members
of a cluster saved in a folder created by the Sequence Clustering utility (see Sequence Clustering
for details).
Features:
1. Quickly run a similarity search for a selected sequence against your sequence
databases that are saved in Q-fasta files and/or “.seq” files on your PC. The
searchable database can consist of many “.seq” files, optionally mixed with Q-fasta files, as
long as the files are all in a common folder.
2. Provide a whole sequence alignment instead of fragmented alignments for every
positive sequence.
3. Adjustable search stringency.
4. Reliable: it uses your own computer so you can count on it.
Figure 1. A sample of the local sequence blast.
Procedure:
1. Start GeneLooper 3.0 and then click on the Local Sequence Blast button to open the
search form (Fig. 1).
2. Loading database files: click on the Browse button in the Select database panel to open
the file explorer form (Fig. 2). Use the file explorer to select the folder containing the
sequence files that will be used as the search database. The files can be “.seq”, Q-fasta or
a mixture of the two. Files in any other format must be removed from the folder to avoid
operating errors. Click on the OK button to close the form; all file names from the
folder will then appear in the pull-down menu on the Select Database panel as well as the
pull-down menu in the Query Sequence box (Fig. 1).
Figure 2. The file explorer for loading database files with (.seq) and (.txt) extensions.
3. Input a query sequence in one of four ways:
A. Select a sequence name from the pull-down menu. This option is designed for
searching sequence similarity among sequences in the chosen folder.
B. Open a sequence from a saved sequence file by using the file open browser.
C. Import from the Sequence Viewing and Editing form.
D. Copy and paste a sequence directly from an opened sequence document.
4. Select a database name in the pull-down menu loaded in step 2. You can select any one of
the files or all files by selecting ALL.
5. Select forward, reverse or both strands for a similarity search.
6. Select search stringency from the pull-down menu in the panel. A higher number
corresponds to a higher stringency and a faster search.
7. Click on the Search button to begin. It takes a few seconds to a few minutes to finish,
depending on the speed of your PC, the size of the data set and the search stringency.
Note: The names of matched sequences will appear in the lower text box in order from
the highest score to the lowest score. The scoring system is used for comparing
homologues within each search and is different from that used in Blast 2 (a third party
program). The stringency setting will affect the search score.
8. Click on the Page View button to see the details of the whole sequence alignments that
are kept in a single document file following each sequence search (Fig. 3).
9. Click on the Export to Align button to export all related sequences to the alignment form
for further alignment analysis (see Chapter 13 for details).
Note: The local sequence similarity search function is an alternative to the BLAST program
provided by NCBI, but it won’t replace the latter. Because the search capability in
GeneLooper depends largely on the features of the PC, such as its processing speed
and memory, a sufficiently capable PC will be able to search all mRNA sequences from an organism. It can be
used to compare sequences from a gene family or your own collections by querying one
against the rest. The process is quick, and the data are very informative. It is not
recommended to analyze or search extremely long sequences, such as genomic DNA, with the
current program.
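The manual does not describe the scoring system used by the Local Sequence Similarity Search, so the following Python sketch substitutes a simple, well-known technique: ranking database sequences by the number of exact k-letter words they share with the query. The word size k plays the role of a stringency setting in this sketch only (a larger k demands longer exact matches). All function names, the default k and the plain-text reading of “.seq” files are assumptions for illustration.

```python
from pathlib import Path


def kmers(seq, k):
    """Return the set of all length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def rank_hits(query, database, k=8, both_strands=True):
    """Rank database entries by the number of k-letter words shared with query.

    database maps a sequence name to its sequence. Entries with no shared
    word are dropped; the rest are sorted from highest to lowest score.
    """
    query = query.upper()
    words = kmers(query, k)
    if both_strands:
        words |= kmers(revcomp(query), k)
    scores = {name: len(words & kmers(seq.upper(), k))
              for name, seq in database.items()}
    return sorted(((n, s) for n, s in scores.items() if s > 0),
                  key=lambda item: item[1], reverse=True)


def load_seq_folder(folder):
    """Read every '.seq' file in a folder into a {file name: sequence} dict."""
    return {p.name: "".join(c for c in p.read_text() if c.isalpha())
            for p in sorted(Path(folder).glob("*.seq"))}
```

A typical use would be rank_hits(query, load_seq_folder(folder)), which mirrors querying one member of a gene family against the other members saved in a common folder. The scores are only comparable within one search, as noted above for GeneLooper's own scores.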
Figure 3A. Sample of the alignment in page view (part A)
Figure 3B. Sample of the alignment in page view (part B)
Chapter 15
Primer Design
Objective: Design PCR primers from a nucleotide sequence.
A successful DNA amplification by PCR is partly dependent upon the pair of oligos
used. Optimized oligos can increase the yield of the amplified DNA and reduce the
background caused by non-specific reactions. The Oligo Design function in GeneLooper
3.0 uses a reliable algorithm to ensure selection of the best pair of oligos from defined
regions in a DNA sequence.
Features:
1. Easy to set up: all-in-one design interface.
2. Diagram with ORF information, sliding guidelines and pull-down menus for precisely
defining a region for the selection of an oligo.
3. Automatically detect internal restriction sites to guide enzyme site selection for addition
to the 5’ ends of the oligos.
4. One step addition of restriction site and protection bases.
5. Return multiple pairs of oligos in a spreadsheet.
Figure 1. A sample of the Primer Design interface
Procedure:
1. Load a sequence into the sequence box using the Browse button. You can also paste
a sequence from the clipboard or import a sequence from the Sequence Viewing and
Editing form.
2. Define the boundaries of the forward oligo and reverse oligo. The program automatically
detects the ORF and displays it in the diagram. Use the two pairs of guidelines to set the
boundaries or directly set precise boundaries to a single base using the four pull-down
menus.
3. Set the annealing temperature for the oligos from the Tm pull-down menu.
4. As an option, you may select enzyme sites from the enzyme selection pull-down menu
(RE site) for the forward or reverse oligos. The enzymes listed in the menu do not have any
internal sites between the 5’ end of the forward boundary and the 3’ end of the reverse
boundary, and the list is updated dynamically following any change of the two guidelines.
Add a few bases to the 5’ end of the restriction site to ensure a complete digestion of the
PCR product with the selected enzyme.
5. Select the number of oligo pairs to be designed on the Oligo Returned pull-down menu.
6. Click on the Design Oligo button to begin. It takes a few seconds for an oligo pair to be
displayed in the resulting spreadsheet. The best pairs are listed at the top of the table.
7. Clicking on the oligo sequence in the table will highlight the oligo sequence in the
sequence box so that you can verify the oligo sequence and examine the adjacent bases.
8. Highlight the oligos you want to copy and then click on the Copy Oligo button. The
copied oligos can be pasted to other document programs such as MS Excel.
9. Click on the Print Oligo button to print the spreadsheet containing the oligo information.
Notes:
1. Any change in the sequence will trigger an update of the ORF information and the
restriction enzymes in the pull-down menu.
2. The program uses an arbitrary scoring system to evaluate the oligos. The lower the
penalty score, the better the oligo.
3. There are sixteen pre-selected, frequently used enzymes for the 5’-addition:
BamH I/GGATCC      Sac I/GAGCTC
Bgl II/AGATCT      Sac II/CCGCGG
EcoR I/GAATTC      Sal I/GTCGAC
Hind III/AAGCTT    Sca I/AGTACT
Kpn I/GGTACC       Sma I/CCCGGG
Not I/GCGGCCGC     Spe I/ACTAGT
Pst I/CTGCAG       Xba I/TCTAGA
Pvu II/CAGCTG      Xho I/CTCGAG
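The enzyme filtering and the 5’-addition described in step 4 can be sketched in Python as follows. This is a minimal illustration, not the program's own code: the function names are hypothetical, only four of the sixteen enzymes are included, and the default protection bases “GC” are an arbitrary placeholder because the number of extra bases needed for complete digestion depends on the enzyme. The four sites shown are palindromic, so checking one strand is sufficient for them.

```python
# Recognition sequences copied from the table above (subset shown).
SITES = {"BamH I": "GGATCC", "EcoR I": "GAATTC", "Hind III": "AAGCTT",
         "Xho I": "CTCGAG"}


def usable_enzymes(region, sites=SITES):
    """Names of enzymes with no internal site in the region to be amplified."""
    region = region.upper()
    return sorted(name for name, site in sites.items() if site not in region)


def add_five_prime_site(oligo, enzyme, protection="GC", sites=SITES):
    """Prepend protection bases and a restriction site to the 5' end.

    The default protection bases are an assumption for this example.
    """
    return protection + sites[enzyme] + oligo.upper()
```

For example, a region containing internal EcoR I and Hind III sites leaves only BamH I and Xho I as candidates, and the chosen site is then written 5’ of the gene-specific part of the oligo.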
Chapter 16
Oligo Database Search
Objective: Search all your existing oligos based on a nucleotide sequence.
When studying multiple genes, it is not unusual to use many oligos. Managing the
oligo data you have accumulated can sometimes be a daunting task. GeneLooper 3.0
provides an easy and efficient way to organize your own oligos, or
even those of your organization, into a searchable database. This function reduces
the time spent looking for an existing oligo and avoids redundant oligo orders.
Procedure:
1. Create a searchable oligo database.
GeneLooper 3.0 uses a simple data structure to store your oligo information. The file
can hold about 60,000 oligos. It is a spreadsheet with the following three
columns (Fig. 1):
Column A is for the oligo names,
Column B is for the oligo sequences (5’ to 3’) and
Column C is for general information about the oligos such as owners and locations.
Figure 1. Sample of oligo database file
Save the worksheet as a .csv file using the name “oligo_db.csv” in the same folder
as GeneLooper 3.0.exe (Fig. 2), which is usually the “C:\program file\GeneLooper\”
folder. Note that the name, the type and the location of the oligo file are critical
for the search to work. Please follow the instructions precisely. If a file
template with the same name is already in the folder, simply replace it with your oligo
file.
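If your oligo records already exist in another form, the three-column file can also be written by a script instead of being saved from a spreadsheet. The Python sketch below uses the standard csv module. The function name write_oligo_db is hypothetical, and the sketch assumes that the file has no header row, which the manual does not state; check the template file supplied with the program before relying on this.

```python
import csv


def write_oligo_db(rows, path="oligo_db.csv"):
    """Write (name, sequence, info) rows as a three-column CSV file.

    Column A: oligo name, column B: sequence (5' to 3'), column C: general
    information. No header row is written (an assumption of this sketch).
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for name, sequence, info in rows:
            writer.writerow([name, sequence.upper(), info])
```

The csv module quotes any field that contains a comma, so free-text information such as “Lab A, box 2” stays in column C when the file is opened again.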
Figure 2. Save the file as “oligo_db.csv” using the Excel Save as dialog box.
2. Oligo Search:
A. Start GeneLooper 3.0 and then click on the Oligo Database Search button. The
Oligo Database Search form will appear (Fig. 3).
B. Use the Browse button to load the sequence, or paste the sequence into the sequence
box from the clipboard. The sequence can also be imported from the Sequence Viewing and
Editing form.
C. The program searches for an exact match between the query sequence and the oligos, but
it can allow for mismatches at the 5’ end of an oligo in order to accommodate oligos
with linker or adaptor sequences at the 5’ end. The default setting is for a whole
sequence match. Adjust the number of bases from the pull-down menu to
perform a 5’ end mismatch search.
D. Click on the Search Oligo button to start. It takes a few seconds to finish. The
matched oligo information is tabulated in a spreadsheet. For each oligo matched, a
localized line with its location label is shown in the diagram in the picture box.
Move the cursor to separate any overlapping labels.
E. Click on an oligo sequence to highlight the oligo sequence in the query sequence
box.
F. Click on the Print form, Print Oligo, Copy Oligo or Copy picture button to
print the entire form, print the spreadsheet, copy the highlighted oligos or copy the diagram
in the picture box, respectively.
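The exact-match search with a 5’ end allowance described in step C can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program's implementation: the function names are hypothetical, the sketch searches both strands of the query (the manual does not say whether GeneLooper does), and only the first occurrence on each strand is reported.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_oligos(query, oligos, skip_5prime=0):
    """Return (name, strand, start) for each oligo that matches the query.

    oligos is an iterable of (name, sequence) pairs written 5' to 3'.
    The first skip_5prime bases of each oligo are ignored, so that a linker
    or adaptor at the 5' end does not prevent a match. start is the 1-based
    position in the query of the leftmost matched base.
    """
    query = query.upper()
    hits = []
    for name, seq in oligos:
        core = seq.upper()[skip_5prime:]
        if not core:
            continue  # nothing left to match after trimming
        pos = query.find(core)
        if pos >= 0:
            hits.append((name, "+", pos + 1))
        pos = query.find(revcomp(core))
        if pos >= 0:
            hits.append((name, "-", pos + 1))
    return hits
```

With skip_5prime=0 the whole oligo must match, which corresponds to the default whole sequence match. Setting it to the length of an added restriction site and protection bases lets a tailed cloning primer be found from its gene-specific 3’ portion.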
Figure 3. A sample of the Primer Database Search interface.
Chapter 17
ePCR
Objective: To check whether a PCR design is correct and to generate the sequence of a PCR
amplicon.
Procedure:
1. Click on the ePCR button to open the ePCR form (Figure 1). Load a sequence
through File Open or by pasting. The sequence will appear in the sequence box.
2. Paste a forward primer sequence into the box next to “Paste forward Primer Seq”.
A short line is drawn above the long line (DNA template) if there is a match
between the forward primer and the DNA template sequence. Otherwise, no line
is drawn and a popup message indicates that the two do not match.
3. Paste a reverse primer sequence into the box next to “Paste reverse Primer Seq”.
A short line is drawn below the long line (DNA template) if there is a match
between the reverse primer and the template sequence. Otherwise, no line is
drawn and a popup message indicates that the two do not match.
4. If both primers match, click on the RUN PCR button. A line indicating the PCR amplicon will be
drawn below.
5. Click on the Export PCR Seq button. The sequence will be exported to the Sequence
view.
6. You can then check the sequence, including any restriction sites and adaptors you added at
the 5’ ends of the PCR primers.
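The logic of the steps above (match the forward primer, match the reverse primer on the opposite strand, then extract the sequence between them) can be sketched in Python. This is a simplified illustration with hypothetical function names. It assumes that both primers match the template exactly along their full length, so primers carrying 5’ restriction sites or adaptors that are absent from the template are not handled here, and it requires the reverse primer site to lie downstream of the forward primer without overlapping it.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def epcr(template, forward, reverse):
    """Return the amplicon sequence, or None if either primer fails to match.

    Both primers are written 5' to 3'. The forward primer must occur in the
    template; the reverse primer must occur in its reverse-complement form,
    downstream of the forward primer.
    """
    template, forward, reverse = (s.upper() for s in (template, forward, reverse))
    start = template.find(forward)
    if start < 0:
        return None  # forward primer does not match the template
    rc = revcomp(reverse)
    end = template.find(rc, start + len(forward))
    if end < 0:
        return None  # reverse primer does not match downstream
    return template[start:end + len(rc)]
```

A return value of None corresponds to the popup message in steps 2 and 3, and the returned string corresponds to the sequence exported in step 5.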
Figure 1. Sample of ePCR Interface
Chapter 19
Sequence Formatting
Objective: Format sequence for publication or presentation.
We often need to create a standard sequence format (Fig. 1) for publication or presentation
since it is one of the most informative ways to present an mRNA sequence. GeneLooper 3.0
provides a very flexible solution to generate this format.
Features:
1. Multiple options for selecting sequence to translate into protein.
2. Adjustable row width.
3. Single or triple letter code for amino acid residues.
4. Standard document editing environment for further editing.
Figure 1. A sample of working interface
Procedure:
1. Start GeneLooper 3.0 and then click on the Sequence Journal Format button. The
formatting form will appear (Fig. 1).
2. Use the Open icon button to load the sequence, import it from the Sequence Viewing and
Editing form, or paste it directly from the clipboard.
3. Select an option to translate your mRNA sequence. See Chapter 4 for details about the
definitions of different ORFs.
4. Set the formatting parameters including font size, font name, number of amino acid
residues per row and single letter or triple letter code for representing the amino acid
residues.
5. Click on the Get format button to generate the format and click on the Undo button to
start again.
6. Modify the formatted sequence using the functions provided by the document editor. The
format can be saved and copied. Note that added modifications (color, underline and
so on) may be lost when copying the file onto the clipboard. To transfer the sequence
with all modifications intact, save the file and re-open it using Microsoft Word.
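The idea behind the journal format (codons on one line, the encoded residues beneath them, a chosen number of residues per row, and a single or triple letter code) can be sketched in Python. The exact layout produced by GeneLooper is shown in Fig. 1 and is not reproduced here; the layout below, the function name journal_format and the use of “***” for a stop codon are assumptions. The sketch expects an in-frame coding sequence and uses the standard genetic code.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "***",
}


def journal_format(orf, residues_per_row=10, triple=False):
    """Format an in-frame coding sequence as numbered codon/residue rows."""
    orf = orf.upper()
    # Split into complete codons; trailing bases that do not fill a codon
    # are ignored in this sketch.
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    lines = []
    for row_start in range(0, len(codons), residues_per_row):
        row = codons[row_start:row_start + residues_per_row]
        aas = [CODON_TABLE.get(c, "X") for c in row]  # X = unknown codon
        if triple:
            cells = [THREE_LETTER.get(a, "Xaa") for a in aas]
        else:
            cells = [" " + a + " " for a in aas]  # centre under the codon
        last_residue = row_start + len(row)
        lines.append(" ".join(row) + "  " + str(3 * last_residue))
        lines.append(" ".join(cells) + "  " + str(last_residue))
    return "\n".join(lines)
```

Each pair of output lines ends with the number of the last base and the last residue in that row, which is a common convention for published sequences.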
Chapter 20
Document Viewer
I. Document Viewer
The Document Viewer in GeneLooper 3.0 is a text editor for viewing and editing files
used for or created by GeneLooper 3.0.
It is recommended that the following files be opened by the Document Viewer:
• Fasta files.
• Q-fasta files.
• Downloaded Genbank files (Entrez Pages).
• Alignment reports created by the Multi-Sequence Similarity Search utility.
• Saved alignment reports by the Sequence Alignment and Local Sequence Similarity
  Search.
• Formatted sequences saved in the Sequence Journal Formatting utility.
The following are supported operations:
• Editing: Cut, Copy, Paste, Print, and Save.
• Find: Find, Replace, Replace all. (The Find function is case sensitive.)
Figure 1. A sample of fasta file opened by the Document Viewer
II. Data Viewer
The Data Viewer in GeneLooper 3.0 is designed for viewing the data created by
the high-throughput utilities in GeneLooper 3.0.
It is recommended that the following files be opened by the Data Viewer:
Utility                          Suffix of data file
Open Reading Frame Detection     _Lorf.txt (complete ORF), _Porf.txt (incomplete ORF)
Sequence Clustering              _clust.txt
Multi-Sequence Blast             _mtch.txt
Restriction Site Search          res_site.txt
Translation                      _pep_info.txt
Hydrophobic Domain detection     _tmd.txt
Batch Oligo Design               _oligo.txt
Entrez Information Extraction    _entz_info.txt
The following are supported operations: Print, Copy, Sort.
Figure 1. A sample of data file opened by the Data Viewer
Note: An alternative to the Data Viewer is using MS Excel to open these data files. When using
MS Excel, select the delimited option and Comma as the delimiter.
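Because the data files are comma-delimited text, they can also be read and sorted by a short script. The Python sketch below uses the standard csv module. The function name load_sorted is hypothetical, and the column layout of each data file is not documented here, so the column index must be chosen after inspecting the file; the sketch also assumes that the file has no header row.

```python
import csv


def load_sorted(path, column=0, numeric=False, reverse=False):
    """Read a comma-delimited data file and return its rows sorted by a column.

    Set numeric=True to sort the chosen column as numbers instead of text.
    Blank lines are skipped; no header row is assumed.
    """
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    if numeric:
        key = lambda r: float(r[column])
    else:
        key = lambda r: r[column]
    return sorted(rows, key=key, reverse=reverse)
```

Sorting as text places “10” before “2”, so numeric=True is the safer choice for columns that hold lengths, positions or scores.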