Creation of a new versatile database for linking molecular and phenotypic information in perennial crops: The HiDRAS ‘AppleBreed Database’
A. Antofie1*, M. Lateur1, R. Oger1, A. Patocchi2, C.E. Durel3, W.E. Van de Weg4
*Corresponding author: [email protected]
This work is part of the HiDRAS (High-Quality Disease Resistant Apples for a Sustainable Agriculture) EU project. HiDRAS aims to identify genetic factors controlling quality and disease resistance of apple fruit in order to support breeders in developing new cultivars that meet consumers' preferences. Website: http://www.hidras.unimi.it/
Introduction
This database is specially designed to support genetic studies in perennial crops, especially studies on marker-trait associations, candidate gene validation and allele mining. It takes into account the particularities of perennials such as apple: (1) vegetative propagation, which allows the same genotype to be present at various localities, (2) a long juvenile phase, (3) multi-annual cropping, (4) a long economic lifetime, and (5) the simultaneous availability of successive generations in the same plot of breeding programs, experimental stations and gene banks. These aims and particularities determined the general structure of the database. It stores very large numbers of genotypic and phenotypic data from multiple pedigreed plant populations (progenies, cultivars, breeding selections) and gives easy access to them. It ensures high traceability of the data flow over generations and years, includes validation procedures for phenotypic and marker data to certify data quality, and presents basic statistical overviews. Finally, it can be loaded and explored via the web.
Database design
Figure 1 shows the five main structures of the database. Genotype is the core structure; it has two sub-structures at the physical level, Tree and DNA-sample, and one sub-structure at the meta level, Pedigree. This approach allows any phenotypic data to be linked with any molecular data through the genotype.
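The genotype-centred design can be sketched as a relational schema. This is a minimal sketch only, assuming SQLite in place of the MySQL server and using hypothetical table and column names (genotype, pedigree, tree, dna_sample) and invented example records; it shows how trees and DNA samples both point to the same genotype, so that one genotype can be represented by several trees at different localities.

```python
import sqlite3

# Hypothetical, simplified schema: table and column names are illustrative only.
# SQLite stands in here for the MySQL server used by AppleBreed.
SCHEMA = """
CREATE TABLE genotype (                  -- core structure
    genotype_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE     -- cultivar / selection / seedling
);
CREATE TABLE pedigree (                  -- meta level
    genotype_id INTEGER PRIMARY KEY REFERENCES genotype(genotype_id),
    mother_id   INTEGER REFERENCES genotype(genotype_id),
    father_id   INTEGER REFERENCES genotype(genotype_id)
);
CREATE TABLE tree (                      -- physical level
    tree_accession TEXT PRIMARY KEY,
    genotype_id    INTEGER NOT NULL REFERENCES genotype(genotype_id),
    locality       TEXT NOT NULL
);
CREATE TABLE dna_sample (                -- physical level
    dna_accession TEXT PRIMARY KEY,
    genotype_id   INTEGER NOT NULL REFERENCES genotype(genotype_id),
    source_tree   TEXT REFERENCES tree(tree_accession)
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding invented example records."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO genotype VALUES (1, 'Cultivar A')")
    # One genotype propagated vegetatively and planted at two localities.
    con.executemany("INSERT INTO tree VALUES (?, ?, ?)",
                    [("T-001", 1, "Gembloux"), ("T-002", 1, "Angers")])
    con.execute("INSERT INTO dna_sample VALUES ('D-001', 1, 'T-001')")
    return con


def trees_and_samples(con: sqlite3.Connection, genotype_name: str):
    """Return (number of trees, number of DNA samples) for a genotype,
    or None if the genotype name is unknown."""
    return con.execute(
        """SELECT
             (SELECT COUNT(*) FROM tree t WHERE t.genotype_id = g.genotype_id),
             (SELECT COUNT(*) FROM dna_sample d WHERE d.genotype_id = g.genotype_id)
           FROM genotype g WHERE g.name = ?""",
        (genotype_name,),
    ).fetchone()


print(trees_and_samples(build_db(), "Cultivar A"))  # (2, 1)
```

Keeping Tree and DNA-sample as separate tables that reference the genotype, rather than attaching observations to the genotype directly, lets replicated trees of one cultivar share a single identity while each tree keeps its own location.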
Tree and DNA-sample hold the identity descriptors for each individual tree and DNA sample, including a unique accession code for each tree and DNA sample and a genotype name (cultivar/selection/seedling). Tree also includes descriptors for location (institute, plot, row, position in row) and origin (origin of bud wood; year of sowing, planting and grafting; rootstock). DNA-sample also includes the origin of each sample (tree from which the sample was derived, date of isolation, position on microtitre plates during successive steps, person who performed the isolation, etc.).
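The traceability descriptors lend themselves to simple automatic checks. As an illustration only, the sketch below models Tree and DNA-sample records as Python dataclasses (the field names are hypothetical, chosen from the descriptors listed above, and the records are invented) and flags DNA samples whose source tree is unknown or carries a different genotype name than the sample.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Tree:
    accession: str      # unique accession code of the tree
    genotype: str       # cultivar / selection / seedling name
    institute: str
    plot: str
    row: int
    position: int       # position in row
    rootstock: str
    planting_year: int


@dataclass(frozen=True)
class DnaSample:
    accession: str        # unique accession code of the DNA sample
    genotype: str
    source_tree: str      # accession of the tree the sample was derived from
    isolation_date: date
    plate: str            # microtitre plate identifier (hypothetical format)
    well: str             # well on the plate, e.g. "A01"
    isolated_by: str


def untraceable_samples(samples: List[DnaSample], trees: List[Tree]) -> List[str]:
    """Return accessions of samples whose source tree is unknown or whose
    genotype name differs from that of the source tree."""
    by_accession = {t.accession: t for t in trees}
    flagged = []
    for s in samples:
        tree = by_accession.get(s.source_tree)
        if tree is None or tree.genotype != s.genotype:
            flagged.append(s.accession)
    return flagged


trees = [Tree("T-001", "Cultivar A", "CRA-W", "P1", 3, 12, "M9", 1998)]
samples = [
    DnaSample("D-001", "Cultivar A", "T-001", date(2004, 3, 1), "PL01", "A01", "operator 1"),
    DnaSample("D-002", "Cultivar B", "T-001", date(2004, 3, 1), "PL01", "A02", "operator 1"),
    DnaSample("D-003", "Cultivar A", "T-404", date(2004, 3, 2), "PL01", "A03", "operator 2"),
]
print(untraceable_samples(samples, trees))  # ['D-002', 'D-003']
```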
Pedigree describes the pedigree of each accession up to the founder level, thus allowing ‘Pedigree Genotyping’, a new pedigree-based approach to QTL identification and allele mining in pedigreed populations (Van de Weg et al., 2004).
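Tracing a pedigree up to the founder level amounts to walking the parent links until genotypes without recorded parents are reached. A minimal sketch, assuming the pedigree is held as a hypothetical mapping from genotype name to a (mother, father) pair, with None for an unknown parent, and treating a genotype as a founder when neither parent is recorded:

```python
from typing import Dict, Optional, Set, Tuple

Parents = Tuple[Optional[str], Optional[str]]


def founders(genotype: str, pedigree: Dict[str, Parents]) -> Set[str]:
    """Return the founders of `genotype`: ancestors (or the genotype itself)
    with no recorded parents. Genotypes absent from `pedigree` count as
    having no recorded parents."""
    result: Set[str] = set()
    seen: Set[str] = set()
    stack = [genotype]
    while stack:
        g = stack.pop()
        if g in seen:          # each ancestor is visited once, even if it
            continue           # appears on both the maternal and paternal side
        seen.add(g)
        mother, father = pedigree.get(g, (None, None))
        known = [p for p in (mother, father) if p is not None]
        if not known:
            result.add(g)
        stack.extend(known)
    return result


# Invented example pedigree.
pedigree = {
    "Seedling-1": ("Selection-X", "Cultivar C"),
    "Selection-X": ("Cultivar A", "Cultivar B"),
    "Cultivar A": (None, None),
}
print(sorted(founders("Seedling-1", pedigree)))
# ['Cultivar A', 'Cultivar B', 'Cultivar C']
```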
Molecular data contains data and descriptors on expression profiles and PCR markers, including scores for each genotype and marker descriptors (primers, locus, map position, developer, DNA sample, position of the sample on microtitre plates for PCR and sequencing reactions, tester).
Phenotype data contains the phenotypic assessments for each individual tree
and various descriptors: sample, date, equipment used, observer etc.
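Because phenotypes are recorded per tree while marker scores refer to a genotype, a marker-trait comparison needs a join through the genotype. The sketch below assumes a hypothetical, flattened SQLite schema and invented values (trait "scab", marker "M1", genotypes G1 and G2); it computes the mean trait value for each marker score, with the two trees of genotype G1 contributing as replicates.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical, flattened schema and invented data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tree (tree_accession TEXT PRIMARY KEY, genotype TEXT NOT NULL);
        CREATE TABLE phenotype (tree_accession TEXT REFERENCES tree(tree_accession),
                                year INTEGER, trait TEXT, value REAL);
        CREATE TABLE marker_score (genotype TEXT, marker TEXT, score TEXT);
    """)
    # T1 and T2 are two trees of the same genotype G1.
    con.executemany("INSERT INTO tree VALUES (?, ?)",
                    [("T1", "G1"), ("T2", "G1"), ("T3", "G2")])
    con.executemany("INSERT INTO phenotype VALUES (?, ?, ?, ?)",
                    [("T1", 2004, "scab", 2.0), ("T2", 2004, "scab", 4.0),
                     ("T3", 2004, "scab", 7.0)])
    con.executemany("INSERT INTO marker_score VALUES (?, ?, ?)",
                    [("G1", "M1", "A/b"), ("G2", "M1", "b/b")])
    return con


def mean_by_marker_score(con: sqlite3.Connection, trait: str, marker: str):
    """Mean tree-level trait value per marker score, joined via the genotype."""
    return con.execute(
        """SELECT m.score, AVG(p.value), COUNT(*)
           FROM phenotype AS p
           JOIN tree AS t ON t.tree_accession = p.tree_accession
           JOIN marker_score AS m ON m.genotype = t.genotype
           WHERE p.trait = ? AND m.marker = ?
           GROUP BY m.score
           ORDER BY m.score""",
        (trait, marker),
    ).fetchall()


print(mean_by_marker_score(build_demo_db(), "scab", "M1"))
# [('A/b', 3.0, 2), ('b/b', 7.0, 1)]
```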
Locality describes general characteristics of each locality, including temperature, rainfall, soil composition, orientation and slope of the plot, and altitude.
Organization describes the participating partners (people, localities).
Web interface and applications
Users can easily obtain an overview of the data for a specific genotype or series of genotypes through a data extraction module, which allows them to build their own queries dynamically (see Figure 3). Further interfaces allow results for the same cultivar to be compared across different localities within the same time period (see Figure 4), or across different time periods within a single locality.
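A dynamic query module of this kind can be sketched as a builder that turns a user's filter choices into a parameterized SQL statement. This is a hypothetical Python/SQLite illustration, not the PHP code of the AppleBreed interface; the observation table, the filter names and the data are assumptions. Only whitelisted filters and groupings are accepted, and values are passed as parameters rather than pasted into the SQL text, which guards against SQL injection.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Hypothetical whitelist mapping user-selectable filters to SQL predicates.
FILTERS = {
    "genotype": "genotype = ?",
    "locality": "locality = ?",
    "trait": "trait = ?",
    "year_from": "year >= ?",
    "year_to": "year <= ?",
}
GROUPINGS = ("locality", "year")


def build_query(filters: Dict[str, Any], group_by: str) -> Tuple[str, List[Any]]:
    """Build a parameterized aggregate query from the chosen filters."""
    if group_by not in GROUPINGS:
        raise ValueError(f"unsupported grouping: {group_by}")
    clauses, params = [], []
    for key, value in filters.items():
        if key not in FILTERS:
            raise ValueError(f"unsupported filter: {key}")
        clauses.append(FILTERS[key])
        params.append(value)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (f"SELECT {group_by}, AVG(value), COUNT(*) FROM observation{where} "
           f"GROUP BY {group_by} ORDER BY {group_by}")
    return sql, params


def compare(con: sqlite3.Connection, filters: Dict[str, Any], group_by: str = "locality"):
    sql, params = build_query(filters, group_by)
    return con.execute(sql, params).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE observation "
            "(genotype TEXT, locality TEXT, year INTEGER, trait TEXT, value REAL)")
con.executemany("INSERT INTO observation VALUES (?, ?, ?, ?, ?)", [
    ("Cultivar A", "Gembloux", 2004, "firmness", 7.0),
    ("Cultivar A", "Gembloux", 2005, "firmness", 8.0),
    ("Cultivar A", "Angers", 2004, "firmness", 6.0),
    ("Cultivar A", "Angers", 2006, "firmness", 9.0),
    ("Cultivar B", "Angers", 2004, "firmness", 5.0),
])

# Same cultivar and period, different localities.
by_locality = compare(con, {"genotype": "Cultivar A", "trait": "firmness",
                            "year_from": 2004, "year_to": 2005})
# Same cultivar, single locality, different years.
by_year = compare(con, {"genotype": "Cultivar A", "locality": "Angers"}, group_by="year")
print(by_locality)  # [('Angers', 6.0, 1), ('Gembloux', 7.5, 2)]
print(by_year)      # [(2004, 6.0, 1), (2006, 9.0, 1)]
```

Grouping by locality reproduces the first kind of comparison described above; grouping by year with a locality filter reproduces the second.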
These web interfaces were developed in PHP. The AppleBreed database runs on a centralized MySQL database management system in a Linux environment.
Figure 3 - Data extraction module used to build queries dynamically
Figure 1 - Conceptual Data Model of AppleBreed DB. [GENOTYPE (physical: TREE, DNA-sample; meta: PEDIGREE, SYNONYMS) is linked by Genotype_identifier to MOLECULAR DATA (MAPS, ALLELES, MOLECULAR MARKERS) and PHENOTYPE DATA (FRUIT QUALITY, DISEASE RESISTANCE). The location of each genotype is registered by Location_identifier in LOCALITY DATA (SITE, TRIAL PLOT); locations are supervised by an institution, linked by Organisation_identifier to ORGANIZATION DATA (LABORATORY, INSTITUTION).]
Figure 4 - Graphical display used to compare results from different locations (partners).
Data management
Figure 2 illustrates the data management procedure for submission and validation of data. First, users send their data to the database manager using specific, standardized templates. The database manager checks the data for consistency with dedicated software. Suspicious data are sent back to the user for validation. After re-submission, users can visualise and upload the validated data through a web interface.
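The consistency check performed by the database manager can be illustrated with a small validator for a CSV submission template. The template columns, trait names and value ranges below are hypothetical; the actual AppleBreed templates and validation software are not described on this poster. Rows that fail a check are collected with their line number so that they can be sent back to the user.

```python
import csv
import io
from typing import Dict, List, Set, Tuple

# Hypothetical trait scales (minimum, maximum); not the real AppleBreed rules.
TRAIT_RANGES: Dict[str, Tuple[float, float]] = {"scab": (1.0, 9.0), "firmness": (0.0, 15.0)}


def validate(template_text: str, known_trees: Set[str]):
    """Split template rows into accepted records and (line number, reason)
    pairs for suspicious rows. Assumes one record per physical line."""
    accepted: List[dict] = []
    suspicious: List[Tuple[int, str]] = []
    reader = csv.DictReader(io.StringIO(template_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        tree = (row.get("tree_accession") or "").strip()
        trait = (row.get("trait") or "").strip()
        raw = (row.get("value") or "").strip()
        if tree not in known_trees:
            suspicious.append((line_no, f"unknown tree accession {tree!r}"))
            continue
        if trait not in TRAIT_RANGES:
            suspicious.append((line_no, f"unknown trait {trait!r}"))
            continue
        try:
            value = float(raw)
        except ValueError:
            suspicious.append((line_no, f"non-numeric value {raw!r}"))
            continue
        low, high = TRAIT_RANGES[trait]
        if not low <= value <= high:
            suspicious.append((line_no, f"{trait} value {value} outside [{low}, {high}]"))
            continue
        accepted.append({"tree_accession": tree, "trait": trait, "value": value})
    return accepted, suspicious


TEMPLATE = ("tree_accession,trait,value\n"
            "T-001,scab,3\n"
            "T-999,scab,4\n"
            "T-001,scab,12\n"
            "T-001,firmness,n/a\n")
accepted, suspicious = validate(TEMPLATE, {"T-001"})
for line_no, reason in suspicious:
    print(line_no, reason)
```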
Figure 2 - Schematic overview of the data flow during submission, validation and visualisation/extraction. [Users 1 … n produce data files (genotype, molecular, locality and location data), which are sent to the server for validation and transfer. The DB manager validates them: rejected data (NO) are returned to the user; validated data (YES) enter the AppleBreed database. Users explore the data over the Internet via the web server, through the results visualisation interface and the data extraction interface.]
Conclusions
The AppleBreed database model provides a unique tool for geneticists and breeders who work on perennial crops such as apple and aim to combine phenotypic and molecular marker data. It supports pedigree-based analysis of the data, including ‘Pedigree Genotyping’ (Van de Weg et al., 2004).
This database may be useful in intercontinental collaborations on marker-trait associations, validation of candidate genes and functional allelic diversity. It can be applied directly to apple, while its structure provides a firm foundation on which other users can build their own applications.
References
Van de Weg, W.E., Voorrips, R.E., Finkers, R., Kodde, L.P., Jansen, J. and Bink, M.C.A.M. 2004. Pedigree genotyping: a new pedigree-based approach of QTL identification and allele mining. Acta Hort. (ISHS) 663:45-50.
Disclaimer: "This project is carried out with the financial support from the Commission of the European Communities (contract N° QLK5-CT-2002-01492), Directorate-General Research - Quality of Life and Management of Living Resources Programme." "This poster does not necessarily reflect the Commission's views and in no way anticipates its future policy in this area. Its content is the sole responsibility of the publishers."
1 - Walloon Agricultural Research Centre (CRA-W), Gembloux (Belgium)
2 - Plant Pathology, Institute of Integrative Biology (IBZ), ETH Zurich, Zurich (Switzerland)
3 - National Institute for Agricultural Research (INRA), Angers (France)
4 - Plant Research International (PRI), Wageningen (The Netherlands)