Idiographica: a general-purpose web application

BIOINFORMATICS APPLICATIONS NOTE
Vol. 23 no. 21 2007, pages 2945–2946
doi:10.1093/bioinformatics/btm455
Genome analysis
Idiographica: a general-purpose web application to build
idiograms on-demand for human, mouse and rat
Taishin Kin1,* and Yukiteru Ono2
1Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan and
2Information and Mathematical Science Laboratory, Inc., 1-5-21 Meikei Building, Otsuka, Bunkyo-ku, Tokyo 112-0012, Japan
Received on May 22, 2007; revised on August 6, 2007; accepted on August 29, 2007
Advance Access publication September 24, 2007
Associate Editor: Chris Stoeckert
ABSTRACT
Summary: We have launched a web server that serves as a general-purpose idiogram rendering service and allows users to generate high-quality idiograms with custom annotation from their own genome-wide mapping/annotation data through an easy-to-use interface. The generated idiograms are suitable not only for visualizing summaries of genome-wide analyses but also for many types of presentation material, including web pages, conference posters and oral presentations.
Availability: Idiographica is freely available at http://www.ncrna.org/idiographica/
Contact: [email protected]
1 INTRODUCTION
An idiogram is a diagram of the chromosomes showing cytogenetic bands such as C (centromere), G (Giemsa), R (reverse), Q (quinacrine) and N (nucleolus organizer region) bands. Idiograms are, however, not limited to representing these cytogenetic bands. They appear regularly in journal articles and especially in web interfaces that visualize diverse genome-wide information, such as the genomic distributions of genes and their associated elements (Hubbard et al., 2007; Imanishi et al., 2004; Kuhn et al., 2007; Wheeler et al., 2007). As these examples indicate, the application of idiograms has diversified beyond its origin.
In the era of genome-wide analysis, researchers routinely need an effective way to visualize genome-wide information, and idiograms meet this need. However, there has been neither software nor a web server that allows users to build their own custom idiograms. We therefore developed Idiographica, a web server that lets users build customized, high-quality idiograms from their own data by uploading a description file. The generated idiogram is a high-quality image suitable for poster presentations, projector-screen presentations and journal publications. Users may use the generated idiograms without any obligation or restriction.
2 RENDERING SCHEME
The rendering convention for chromosomes follows the scheme used in the UCSC Genome Browser (Kuhn et al., 2007): telomeres and their proximal regions are rendered as curved shapes, and centromeres are rendered as wedge shapes. The surface of a chromosome can be mildly shaded according to the user's preference. Chromosomes are aligned from left to right. G-band annotation data and centromere/telomere information for human, mouse and rat are obtained from the UCSC Genome Browser database. Idiographica scales chromosomes to fit variable page sizes. On the current web server, the largest admissible page size is B0 (1456 × 1030 mm) at a fixed resolution of 200 dpi, which gives a page of 11 464 × 8110 pixels. At this size, human chromosome 1 (245 522 847 bp) is rendered at 235 × 6114 pixels, so one pixel represents about 40 kb. At this resolution, the whole human genome (3 076 781 887 bp for hg17) is represented by 76 920 pixels in total. The number of annotations is limited to one per 4 pixels, i.e. one per 160 kb; therefore, the maximum number of annotations for the entire human genome is 17 764. We use the Cairo graphics library (http://cairographics.org/) as our rendering engine. Rendering takes up to 3.5 min for a B0 page of the human genome with the maximum amount of mapping information and a known-gene density background.
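The B0 scale figures quoted for the rendering scheme can be checked with a few lines of arithmetic. The Python sketch below is not the server's code: it assumes JIS B-series sheet sizes (the quoted 1456 × 1030 mm matches JIS B0), assumes the fixed 200 dpi applies to every page size, and truncates to whole pixels, which reproduces the 11 464 × 8110 pixel page and the roughly 40 kb-per-pixel scale for chromosome 1. The helper names are hypothetical. Dividing the 76 920 genome pixels by four gives 19 230, more than the 17 764 annotations reported for hg17, so the server's exact counting rule is not reproduced here.

```python
# Minimal sketch (not the server's code) of the page and scale arithmetic.
# Assumptions: JIS B-series sheet sizes, a fixed 200 dpi for every size,
# and truncation to whole pixels.
MM_PER_INCH = 25.4
DPI = 200

# JIS B-series sheet sizes in millimetres: (long side, short side).
JIS_B_MM = {
    "B0": (1456, 1030),
    "B1": (1030, 728),
    "B2": (728, 515),
    "B3": (515, 364),
    "B4": (364, 257),
    "B5": (257, 182),
}


def page_pixels(size, dpi=DPI):
    """Pixel dimensions (long, short) of a sheet, truncated to whole pixels."""
    long_mm, short_mm = JIS_B_MM[size]
    return int(long_mm / MM_PER_INCH * dpi), int(short_mm / MM_PER_INCH * dpi)


def bp_per_pixel(chrom_bp, chrom_px):
    """Genomic length represented by one pixel of a rendered chromosome."""
    return chrom_bp / chrom_px


def max_annotations(total_px, px_per_annotation=4):
    """Simple upper bound on annotations when each needs px_per_annotation pixels."""
    return total_px // px_per_annotation


if __name__ == "__main__":
    print(page_pixels("B0"))                       # (11464, 8110)
    print(round(bp_per_pixel(245_522_847, 6114)))  # 40157, i.e. about 40 kb
    genome_px = round(3_076_781_887 / 40_000)      # hg17 at exactly 40 kb/pixel
    print(genome_px)                               # 76920
    print(max_annotations(genome_px))              # 19230
```

Truncating rather than rounding matters for the long side: 1456 mm at 200 dpi is 11 464.57 pixels, and only truncation yields the 11 464 stated above.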
*To whom correspondence should be addressed.
3 IDIOGRAPHICA SERVER
The Idiographica server is available at http://www.ncrna.org/idiographica/. To request an idiogram, a user fills out the web form on the Idiographica page and clicks the submit button. The only mandatory item is an email address. The other items, Species, Chromosome, Background, Annotation and Figure Configuration, are optional and each has a default value, so no further input is needed to generate a simple idiogram. Users can easily adjust these options to enrich their custom idiograms.
The options are as follows. Species selects the organism on which the custom idiogram is based; currently, human (hg17 and hg18), mouse (mm8) and rat (rn4) are available. Chromosome lets a user choose which chromosomes to render. Background selects the band type to render: G-band, GC content, repeat element density, known gene density, mRNA density or EST density. These data are taken from several database tables of the UCSC Genome Browser. G-band data come from the cytoBand table. Repeat element density is the number of repeat elements (Repetitive Sequence Region track) per 1 Mb. Two types of density are defined for known genes, mRNAs and ESTs: the first is the number of items per 1 Mb, and the second is the fraction of genomic bases per 1 Mb that belong to known genes, mRNAs or ESTs. The data sources for known genes, mRNAs and ESTs are the Known Genes track, the Human/Mouse/Rat mRNAs track and the Human/Mouse/Rat ESTs track, respectively.
Users who want to place their own genomic annotation on the idiogram can upload a description file (a tab-delimited text file) in the Annotation field. The description file can contain a title (an arbitrary string that appears at the top of the idiogram), a legend (text relating a color or symbol to a category) and mapping information (a set of lines, each relating a genomic position to an annotation, with cosmetic preferences such as font size and text color). The description file should follow the simple format described on the website. Figure Configuration specifies the size, format, orientation and annotation of the generated image: the size option ranges from B5 (smallest) to B0 (largest); the format option allows PNG or PDF; the orientation option selects a vertical or horizontal page; the annotation option turns the name labels of the mapping information on or off; and the 3D shading option turns the shading of the rendered chromosomes on or off.
Fig. 1. A sample idiogram generated with our Idiographica server, which shows the genome-wide distribution of G-protein coupled receptors in the human genome. The data are imported from the SEVENS database (Ono et al., 2005).
© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
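The two density definitions used for the known gene, mRNA and EST backgrounds can be made concrete with a short sketch. The Python below is an illustration, not the server's implementation, and its function names are hypothetical. It assumes half-open, 0-based intervals (the UCSC table convention), assigns each item to the 1 Mb window containing its start for the count density, merges overlapping items so that shared bases are counted once for the coverage density, and normalizes a shorter final window by its actual length.

```python
WINDOW = 1_000_000  # 1 Mb windows, as in the density definitions


def count_density(features, chrom_len, window=WINDOW):
    """Density type 1: number of items per window (assumed: by start position)."""
    n_windows = (chrom_len + window - 1) // window
    counts = [0] * n_windows
    for start, _end in features:
        counts[start // window] += 1
    return counts


def coverage_density(features, chrom_len, window=WINDOW):
    """Density type 2: fraction of bases in each window covered by any item."""
    n_windows = (chrom_len + window - 1) // window
    covered = [0] * n_windows
    # Merge overlapping half-open intervals so shared bases are counted once.
    merged = []
    for start, end in sorted(features):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Split each merged interval at window boundaries and add its bases.
    for start, end in merged:
        pos = start
        while pos < end:
            w = pos // window
            w_end = min(end, (w + 1) * window)
            covered[w] += w_end - pos
            pos = w_end
    # Normalize by window length; the last window may be shorter than `window`.
    return [
        covered[w] / (min(chrom_len, (w + 1) * window) - w * window)
        for w in range(n_windows)
    ]
```

For example, with features [(0, 500_000), (250_000, 750_000), (1_200_000, 1_300_000)] on a 1.5 Mb sequence, count_density returns [2, 1] and coverage_density returns [0.75, 0.2]: the first two items overlap and together cover 750 kb of the first window, while the second window is only 500 kb long.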
The Idiographica server places a request in its job queue as soon as a remote user submits it. After the request has been processed and the idiogram generated, the server notifies the user of completion by email. The email contains a URL for accessing the custom idiogram on the server. The generated idiogram is deleted 24 h after its creation, so users need to download it to their local computer before then. A sample idiogram generated with the Idiographica server is shown in Figure 1.
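The request lifecycle described above (queue, render, notify by email, delete after 24 h) can be sketched as a small model. This is a hypothetical Python illustration, not the Idiographica implementation: the class, method and option names are invented, and rendering and email delivery are passed in as stub callables.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

RETENTION = timedelta(hours=24)  # idiograms are deleted 24 h after creation


@dataclass
class Job:
    email: str                          # the only mandatory form field
    options: dict                       # optional form fields (hypothetical keys)
    url: Optional[str] = None           # set once the idiogram has been rendered
    created: Optional[datetime] = None  # creation time of the idiogram

    def expires(self) -> datetime:
        return self.created + RETENTION


class JobQueue:
    def __init__(self, render: Callable[[dict], str],
                 notify: Callable[[str, str], None]):
        self._pending = deque()
        self._done = []
        self._render = render  # render(options) -> URL of the generated idiogram
        self._notify = notify  # notify(email, url) sends the completion email

    def submit(self, job: Job) -> None:
        # Requests are queued as soon as the user submits the form.
        self._pending.append(job)

    def run_once(self, now: datetime) -> Job:
        # Render the oldest pending request, then email its URL to the user.
        job = self._pending.popleft()
        job.url = self._render(job.options)
        job.created = now
        self._notify(job.email, job.url)
        self._done.append(job)
        return job

    def purge(self, now: datetime) -> list:
        # Remove idiograms whose retention period has elapsed; return them.
        expired = [j for j in self._done if now >= j.expires()]
        self._done = [j for j in self._done if now < j.expires()]
        return expired


if __name__ == "__main__":
    sent = []
    q = JobQueue(render=lambda opts: "https://example.org/idiogram.pdf",
                 notify=lambda email, url: sent.append((email, url)))
    q.submit(Job("user@example.org", {"species": "hg18"}))
    t0 = datetime(2007, 9, 24, 12, 0)
    q.run_once(t0)
    print(sent)  # [('user@example.org', 'https://example.org/idiogram.pdf')]
    print(q.purge(t0 + timedelta(hours=23)))       # []
    print(len(q.purge(t0 + timedelta(hours=24))))  # 1
```

Keeping the queue first-in, first-out mirrors the description of requests being processed in submission order; a real server would run the worker and the purge step on separate schedules.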
ACKNOWLEDGEMENTS
We thank Dr Martin Frith for his generous help in improving our manuscript. This work was partially supported by the Functional RNA Project, funded by the New Energy and Industrial Technology Development Organization (NEDO).
Conflict of Interest: none declared.
REFERENCES
Hubbard,T.J.P. et al. (2007) Ensembl 2007. Nucleic Acids Res., 35, D610–D617.
Imanishi,T. et al. (2004) Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol., 2, 856–875.
Kuhn,R.M. et al. (2007) The UCSC genome browser database: update 2007. Nucleic Acids Res., 35, D668–D673.
Ono,Y. et al. (2005) Automatic gene collection system for genome-scale overview of G-protein coupled receptors in eukaryotes. Gene, 364, 63–73.
Wheeler,D.L. et al. (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 35, D5–D12.