BIOINFORMATICS APPLICATIONS NOTE
Vol. 25 no. 4 2009, pages 548–549
doi:10.1093/bioinformatics/btp012

Gene expression

The Flannotator—a gene and protein expression annotation tool for Drosophila melanogaster

E. Ryder1,∗, H. Spriggs1, E. Drummond1, D. St Johnston2 and S. Russell1
1 Department of Genetics, University of Cambridge, Cambridge CB2 3EH and 2 The Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK

Received and revised on November 25, 2008; accepted on January 1, 2009
Advance Access publication January 6, 2009
Associate Editor: Joaquin Dopazo

ABSTRACT

Summary: Understanding developmental processes and building towards integrative systems biology require detailed knowledge of the spatio-temporal expression of genes and proteins. We have developed a software package for collecting, storing and searching the annotation of protein or gene expression patterns in Drosophila melanogaster. Using standard Drosophila anatomy and Gene Ontologies, the system can readily capture expression patterns at any stage of development and in all recognized tissue types as well as details of sub-cellular localization. The web-based system allows multiple groups to work in collaboration and share images and annotation.

Availability: http://www.flannotator.org.uk/

Contact: [email protected]

1 INTRODUCTION

The ability to generate in vivo tagged proteins has tremendous potential for furthering our understanding of developmental processes by allowing the characterization of sub-cellular protein localization and facilitating the isolation of multi-protein complexes. We have recently embarked upon a large-scale genetic screen, using a transposon-based strategy, to generate and characterize YFP-tagged (yellow fluorescent protein) protein trap lines in the model organism Drosophila melanogaster.
Part of this project involves a collaboration to annotate the expression patterns of the new lines during fly development and includes over 30 UK-based laboratories. To aid in the analysis of YFP-trap lines, we have written software, The Flannotator, that facilitates the annotation of protein trap expression during all stages of development and in all tissue types (including sub-cellular location and spatial descriptors) using standard FlyBase Drosophila anatomy (http://www.flybase.org) and Gene Ontology (GO) (The Gene Ontology Consortium, 2000). Our web-based system allows multiple groups to work in collaboration and share images and annotation easily, whilst still protecting the original data.

The Flannotator is written in a mixture of PHP and JavaScript and is available as a VMware image (http://www.vmware.com) that includes the Ubuntu Linux 8.04 operating system (http://www.ubuntu.com), web server, database and all software required. Such a design considerably simplifies installation and guards against potential incompatibilities between Linux versions or distributions. Although we designed Flannotator using Drosophila ontologies, the system can be readily modified to utilize anatomy ontologies from other species. Both full and demonstration versions are freely available from http://www.flannotator.org.uk.

∗ To whom correspondence should be addressed.

2 METHODS

2.1 The Flannotator web engine

Because of the complexity of the Flannotator, the system requires a modern web browser for full functionality: Firefox (all platforms), Google Chrome and Internet Explorer 6+ support full access to the annotation interface, but older browsers can be used for general site viewing and searching. The Flannotator web site is split into various sections that capture experiment descriptions, allow image uploading and annotation and facilitate browsing or searching for specific genes or expression patterns.
Individual users are given unique usernames and passwords that provide access to the site at different levels (annotators, administrators, etc.). Access to specific data fields such as mapping data or other users’ annotations can be restricted if required, and experiments are defined so that one user cannot edit the annotations of another.

2.2 Stock and sequence management

Our screen generates new protein trap insertions at random, and consequently each new line must be mapped to determine the identity of the trapped gene. The Flannotator includes a full stock management system with a recorded history to easily track changes. Mapping data from DNA sequencing is imported and the software automatically determines whether an insertion is within a gene and in the correct frame to produce a functional YFP fusion, based on the genome release loaded into the system. The GBrowse genome viewer (Stein et al., 2002) is included to provide a graphical overview of the transposon insertion sites and gene models. The Flannotator currently includes D. melanogaster release 5.3 annotation (FlyBase) for determining gene fusions, although different versions or genomes from other species can also be loaded if required.

2.3 Defining experiments and uploading images

An annotation experiment is defined by the user ID, tissue category, method of sample preparation and stock name. After defining a few baseline parameters (e.g. developmental stage being annotated and microscope used), images are uploaded to the stage chooser page, which consists of an unsorted bin and other bins based on the particular developmental stages selected during the setup phase. Initially images are stored in the unsorted bin, and are then dragged and dropped into the correct stage bins, where they can be viewed and processed in the main annotation window.

© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

Fig. 1. Annotation window.
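The in-gene and in-frame test mentioned in Section 2.2 is not described in detail in this note. The following is only a minimal sketch of one way such a check could work. It assumes a simplified protein trap model in which the YFP exon must land in a coding intron, in the same orientation as the gene, and with a construct phase equal to the intron phase. All class names, fields and the `classify_insertion` function are hypothetical and are not taken from the Flannotator code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Intron:
    start: int  # 1-based inclusive genomic coordinate
    end: int
    phase: int  # 0, 1 or 2 (assumed to be supplied by the genome annotation)


@dataclass(frozen=True)
class Gene:
    name: str
    strand: str  # '+' or '-'
    start: int
    end: int
    coding_introns: List[Intron]


@dataclass(frozen=True)
class Insertion:
    position: int
    strand: str
    construct_phase: int  # phase of the YFP exon in the trap construct (assumed)


def classify_insertion(ins: Insertion, gene: Gene) -> str:
    """Classify an insertion relative to one gene model (simplified assumption)."""
    if not (gene.start <= ins.position <= gene.end):
        return "outside gene"
    if ins.strand != gene.strand:
        return "in gene, wrong orientation"
    for intron in gene.coding_introns:
        if intron.start <= ins.position <= intron.end:
            # The fusion is treated as functional only when the phases match.
            if intron.phase == ins.construct_phase:
                return "in frame"
            return "in gene, out of frame"
    return "in gene, not in a coding intron"


# Hypothetical gene model used purely for illustration.
example_gene = Gene("geneA", "+", 1000, 5000,
                    [Intron(1500, 2000, 0), Intron(2500, 3000, 2)])
print(classify_insertion(Insertion(1700, "+", 0), example_gene))
```

A real implementation would have to consider every transcript of a gene and the details of the particular trap construct. The sketch only shows how an insertion coordinate can be compared against the gene models of a loaded genome release.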
2.4 Annotation using controlled ontologies

Expression patterns are described using fully controlled vocabularies from the FlyBase anatomy, GO sub-cellular location ontology and defined FlyBase descriptors. The current FlyBase anatomy ontology consists of over 6000 terms based on a tree structure. While such a comprehensive ontology is essential to capture a complete description of expression patterns, the majority of ontology terms will not be relevant to an individual annotator focused on a particular tissue or developmental stage. To facilitate the user’s navigation through this complex ontology, terms are placed into ‘tissue type’ groups that are themselves higher level terms (e.g. central nervous system). Selecting a tissue group populates a menu with only the terms relevant for a particular annotation. Individuals can select a subset of ontology terms for a particular annotation task, with information about each term (place in the ontology tree, descriptions, etc.) easily accessed. A reverse lookup system is available if a user knows the term they want to use but is unsure which tissue group(s) it belongs to. Custom restricted lists can be constructed and saved by individuals, for example including particular developmental stages or types of microscopy employed.

There is a similar problem with the large number of GO Cellular_Component ontology terms since, in this case, many terms are irrelevant to Drosophila biology, and for this particular project we use the GO_slim ontology subset to provide a useful core set of relevant terms. The full ontology may be used if required.

During the annotation process (Fig. 1), all terms are selected using menus to prevent problems associated with free text entry, and tree construction is restricted to prevent impossible annotations; for example, a sub-cellular term cannot have anatomy terms as children. The NOT qualifier is also available to allow explicit observations of where there is no protein expression.
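The tissue-group menus and the reverse lookup described above can be pictured with a small sketch. It assumes the ontology is held as simple child-to-parent edges of a tree. The handful of terms, the `ovary` parent and all function names are illustrative assumptions, not an extract of the FlyBase anatomy ontology or of the Flannotator source.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Toy child -> parent edges (illustrative only).
PARENT_OF: Dict[str, str] = {
    "central nervous system": "nervous system",
    "brain": "central nervous system",
    "ventral nerve cord": "central nervous system",
    "egg chamber": "ovary",
    "oocyte": "egg chamber",
    "follicle cell": "egg chamber",
    "nurse cell": "egg chamber",
}

# Higher-level terms offered as 'tissue type' groups (assumed selection).
TISSUE_GROUPS: List[str] = ["central nervous system", "egg chamber", "ovary"]


def build_children(parent_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the child -> parent map into parent -> children."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return children


def descendants(term: str, children: Dict[str, List[str]]) -> Set[str]:
    """Collect every term below `term` in the tree."""
    found: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def menu_for_group(group: str, children: Dict[str, List[str]]) -> List[str]:
    """Terms shown in the menu once a tissue group has been selected."""
    return sorted({group} | descendants(group, children))


def groups_containing(term: str, groups: List[str],
                      children: Dict[str, List[str]]) -> List[str]:
    """Reverse lookup: which tissue groups include the given term?"""
    return [g for g in groups if term == g or term in descendants(g, children)]


children = build_children(PARENT_OF)
print(menu_for_group("egg chamber", children))
print(groups_containing("oocyte", TISSUE_GROUPS, children))
```

Because a group is itself an ontology term, the same descendant walk serves both purposes: it fills the restricted menu for a selected group and it answers the reverse lookup for a known term.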
Specific comments or observations about experiments or individual images are allowed as free text entry, and moveable arrows and floating comment boxes can be used to highlight areas of interest in the image. To aid in complex annotations, sub-trees (e.g. sub-cellular terms and descriptors) from one anatomy term can be copied and pasted to another, greatly reducing the amount of time needed. Terms that are used frequently are put into a ‘top 10’ list, making them immediately available for future annotations.

2.5 Browsing and querying expression annotation

Summarized information about each protein trap insertion is displayed on the stock report page, which consists of the genetic and sequence data along with images and annotations from all users (Fig. 2). To give a summary view of the collected annotation data, annotation keyword clouds based on simple anatomy term analysis are provided with links to lines showing similar clouds. To facilitate subsequent data mining, various query tools have been developed. In addition to simple gene queries, complex queries can also be constructed using a tree system similar to the annotation process, based on AND and OR operators. Controlled vocabulary terms further down the tree from the original search term are automatically included in the search (e.g. searching for egg chamber will also include oocyte, follicle cell and nurse cell). Sequencing and annotation data can be exported in XML format for downstream processing or for use in other pipelines.

Fig. 2. Section of an annotation report showing keyword cloud.

3 RESULTS

The FlyProt database currently consists of over 10 000 pieces of annotation and over 11 000 images from more than 25 annotators on approximately 300 protein trap lines, demonstrating the robustness and expandability of the Flannotator system for large projects.

ACKNOWLEDGEMENTS

We would like to acknowledge all of the members of the FlyProt team: John Roote, Dr Nick Lowe, Dr Kathryn Lilley, Dr Jo Rees, Ingrid Wesley, Laura Harris, Jane Webster, Glynnis Johnson, Pam Fletcher, Svenja Hester and Julie Howard. We also thank our annotators, in particular Helen White-Cooper, who helped drive the Flannotator through its early versions. Images reproduced with the kind permission of Roger Guy Phillips, Andrea Brand, Helen White-Cooper, Alex Gould and Andrew Bailey.

Funding: Wellcome Trust as part of the FlyProt project (#076739).

Conflict of Interest: none declared.

REFERENCES

Stein,L.D. et al. (2002) The generic genome browser: a building block for a model organism system database. Genome Res., 12, 1599–1610.
The Gene Ontology Consortium (2000) Gene Ontology: tool for the unification of biology. Nat. Genet., 25, 25–29.