
Plant Genomics and
Mathematical modelling:
a recipe for Systems Biology
Martin Kuiper
Computational Biology Division
Department of Plant Systems Biology
VIB/UGent
http://www.psb.ugent.be/cbd
“to exploit the revolution in plant genomics by
understanding the function of all genes of a reference
species within their cellular, organismal and evolutionary
context by the year 2010.”
Arabidopsis thaliana
Nuclear genome: 125 Mb
c. 29,000 genes
20% experimental data about function
48% predictable function
32% unknown function
Need to:
- speed up gene function discovery
- conduct genome-scale analyses
- develop tools and resources
→ Study gene network systems rather than single genes
Why do we need Systems Biology?

We are pretty good at identifying the ‘parts’ of an organism:
– Genomics, Functional Genomics – genes, gene products

An organism is more than the sum of its parts: It is not the
primary sequence that gives rise to biological forms and
functions, but the dynamical behaviour of these parts.

We need to record the dynamical behaviour of the parts

We need to do this by systematically perturbing a biological
system, and recording characteristic changes of the parts

Mathematical modelling should reconcile an in silico gene
interaction model with the observed dynamics
Understanding the dynamics of a biological system
[Figure: Plant Systems Biology. Labels: Biology (100), Computational Systems Biology (20), Biology (10), Functional Genomics, Bioinformatics (30)]
Topics:
• Plant Functional Genomics (at PSB)
– CATMA, CAGE, AGRIKOLA
• Modelling:
– Analysis of ‘Compendium’ data
– SIM-plex mathematical modeller
CATMA
Complete Arabidopsis Transcriptome MicroArray
Goal: Construction of a collection of Gene-specific Sequence
Tags (GSTs) representing most Arabidopsis genes
Key resource for large-scale
gene function studies
• Microarray transcript profiling
• RNA interference
Both are based on sequence-specific hybridisation
between complementary nucleic acid strands.
robust, versatile, shareable, open
CATMA covers a segment of
clone-based Functional Genomics
[Figure: gene model with Promoter, ATG, ORF, STOP and GST marked. Approaches shown: reporter fusion, transactivation, molecular interaction, ChIP-chip, ORFeome, protein interaction, fluorescence tagging, HTP biochemistry, activation screening, complementation, transcript profiling, RNAi-based gene silencing]
CATMA
Gene Specific Tag (GST) Design
BLASTn
primer3
SPADS
3’
5’
gene models:
• Eugene
• TIGR3.0, 5.0
GST collection today: 24,576
On CATMA v2 array: 22,366
v3 under construction
Currently 6000 new GSTs
Gene Specific Tag
• 150-500 bp
• < 70% identity with any other sequence
http://www.catma.org
Thareau et al (2003) Bioinformatics 19: 2191-2198
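The two GST criteria listed above (length and identity) lend themselves to a simple filter. The sketch below is illustrative only: `is_valid_gst` and the candidate records are hypothetical, and the best-hit identity is assumed to come from a separate similarity search such as the BLASTn step named on the slide.

```python
def is_valid_gst(length_bp, best_hit_identity_pct):
    """Check the two GST criteria from the slide.

    length_bp: candidate amplicon length in base pairs.
    best_hit_identity_pct: highest percent identity to any other sequence
        (assumed to be computed elsewhere, e.g. from a BLASTn search).
    """
    return 150 <= length_bp <= 500 and best_hit_identity_pct < 70.0


# Hypothetical candidates: (name, length in bp, best-hit identity in %).
candidates = [("gst_a", 320, 55.0), ("gst_b", 120, 40.0), ("gst_c", 400, 82.5)]
accepted = [name for name, length, ident in candidates
            if is_valid_gst(length, ident)]
print(accepted)
```

Only the first candidate passes here: the second is too short and the third is too similar to another sequence.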
CATMA Benchmarking
Dose-response curves
CAGE
Compendium of Arabidopsis Gene Expression
EU FP5 Demonstration Project
Started: 1 November 2002
Aims:
• Exploit CATMA v2, v3 arrays for Arabidopsis transcriptome analysis
• Process a total of 2000 samples on 4000 arrays
• Implement common standards for sample growth and preparation, data recording and processing, across laboratories
• Deliver a prototype Compendium reference database in ArrayExpress
• Supplement Compendium data with precomputed results (gene-specific significance, clustering results, etc.)
CAGE Standards
samples
Large redundancy in samples
Types of samples
Type        Samples   Share
Ecotypes    560       28%
Stress      378       19%
Mutants     482       24%
Research    580       29%
Total       2000
Today: ~ 30% done (hybridised, pre-processed and uploaded)
http://www.cagecompendium.org
CATMA Consortium
Complete Arabidopsis Transcriptome MicroArray
Department of Plant Systems Biology
Ghent University - VIB, Belgium
Pierre Hilson, Pierre Rouzé, Marc Zabeau
Unité de Recherche en Génomique Végétale (Génoplante)
INRA/CNRS - Evry, France
Jean-Pierre Renou
Michel Caboche
VIB Microarray Facility
Leuven, Belgium
Paul Van Hummelen
Max-Planck-Institut für molekulare Genetik (GABI)
Berlin, Germany
Wilfried Nietfeld
Hans Lehrach
Genomic Arabidopsis Resource Network (GARNET)
United Kingdom
Jim Beynon, Mark Crow, Martin Trick
NWO Program “Functional Genomics of A. thaliana”
University of Utrecht - The Netherlands
Peter Weisbeek
Microarray Core Facility
University of Lausanne - Switzerland
Philippe Reymond
Ed Farmer
Departamento de Genética Molecular de Plantas
Centro Nacional de Biotecnología - Madrid, Spain
Javier Paz-Ares
Umeå Plant Science Center
Umeå – Sweden
Rishi Bhalerao
Göran Sandberg
http://www.catma.org
AGRIKOLA
Arabidopsis genomic RNAi knock-out line analysis
[Figure labels: GST, introns, GST; promoter: constitutive (35S) or inducible]
at least 20,000 genes
Transform Arabidopsis with 4,000 of these plasmids
http://www.agrikola.org
Targeted gene silencing using RNAi
• only a few transformants per gene required
• phenotypes can easily be studied in different ecotypes/genotypes
• plants with a range of phenotypes can be obtained
• silencing of essential genes can be studied using conditional promoters
[Figure labels: hpRNA, RNAi]
Preliminary results
Over 20,000 hairpin RNA expression vectors were
produced via Gateway (Invitrogen) recombinational
cloning technology
pAGRIKOLA/GST-induced phenotypes:
• can copy known knockout mutants
• can be obtained for essential genes
• can give insight into the functions of unstudied genes
Hilson et al (2004) Genome Research 14, 2176-2189
Magdalena Weingartner, Karin Köhl, Melanie Lück,
Thomas Altmann; Universität Potsdam, Institut für
Biochemie und Biologie, -Genetik-, c/o Max-Planck-Institut
für molekulare Pflanzenphysiologie, Am Mühlenberg 1,
14476 Golm, Germany
Rebecca De Clercq, Ryan Whitford, Mansour
Karimi, Caroline Buysschaert, Rudy Vanderhaeghen,
Raimundo Villarroel, Pierre Hilson; Department of
Plant Systems Biology, VIB, Ghent, Belgium
Alexandra Tabrett, Jennie Rowley, Sharon Hall, Jim
Beynon; Warwick HRI, Wellesbourne, Warwick CV35 9EF,
UK
Vasil Chardakov, Wendy Byrne, Mark Bennet,
Murray Grant; Department of Agricultural Science,
Imperial College London, Wye Campus, Ashford TN25
5AH, UK
Andéol Falcon de Longevialle, Alexandra Avon,
Beate Hoffmann, Céline Léon, Anne Marmagne,
Fanny Marquer, Claire Lurin, Ian Small; UMR
Génomique Végétale (INRA/CNRS/UEVE), Evry, France
Antonio Leyva, Maria Dolores Segura, Yolanda
Fernandez, Javier Paz-Ares; Department of Plant
Molecular Genetics, Centro Nacional de Biotecnología,
28049-Madrid, Spain
With special thanks to: the CATMA consortium; Chris
Helliwell, Peter Waterhouse (CSIRO, Canberra); Ian
Moore (Oxford University)
AGRIKOLA is funded by the FP5 grant QLRT-2001-01741
The Cycle of Systems Biology
+ Questions
Top-down and bottom-up modelling
[Figure labels: top-down; bottom-up; Biological Process; Genome-scale functional genomics data; Predictive mathematical model; Statistics; Mining; Knowledge; Mathematics]
Gene network components
Computational Biology
• Top-down modelling
• Bottom-up modelling
Yeast Microarray Data Compendium
[Figure: matrix of genes by experiments. The profiles of Gene A and Gene B are discretised to up/down/undecided (based on ratios or p-values) and their similarity is scored with a combinatorial statistic]
Similarity between profiles can be measured either by the Pearson
correlation coefficient or treated as a combinatorial problem:
what is the probability that a partial match between two patterns
arises by chance?
p-values
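A minimal sketch of the two similarity measures, in Python. The slides do not give the exact combinatorial statistic, so the binomial tail used here is a stand-in assumption: it treats conditions as independent and estimates the chance of a match from each gene's own up/down frequencies. The threshold of 1.0 on log2 ratios and all function names are likewise hypothetical.

```python
import math


def discretise(log_ratios, threshold=1.0):
    """Map log2 ratios to +1 (up), -1 (down) or 0 (undecided)."""
    return [1 if r >= threshold else -1 if r <= -threshold else 0
            for r in log_ratios]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def match_p_value(a, b):
    """Binomial tail probability of at least the observed number of matches.

    A match is a condition where both genes are 'up' or both are 'down'.
    Assumption: conditions are independent (not the talk's exact statistic).
    """
    n = len(a)
    matches = sum(1 for x, y in zip(a, b) if x != 0 and x == y)
    p_match = sum((a.count(s) / n) * (b.count(s) / n) for s in (1, -1))
    return sum(math.comb(n, k) * p_match ** k * (1 - p_match) ** (n - k)
               for k in range(matches, n + 1))


# Hypothetical log2 ratios for two genes over eight experiments.
gene_a = discretise([2.1, 1.4, -1.8, 0.2, 1.1, -2.5, 0.4, -0.3])
gene_b = discretise([1.9, 1.2, -1.1, 0.1, 1.6, -1.3, -0.2, 0.5])
print(gene_a, match_p_value(gene_a, gene_b))
```

Unlike a correlation over all conditions, the match count only uses conditions where both genes respond, which fits the idea of a partial match between patterns.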
Clustering strategy
Genes
• correlation over subset of conditions
• p-values
• overlapping clusters
• networks, hubs
• natural visualisation
Gene profiles → Comb. p-value (corrected) < 0.01 → graph based clustering → GO labeling & visualization
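The thresholding and graph steps can be sketched as follows. Two simplifications are assumed here and are not taken from the slides: the correction is a plain Bonferroni correction, and clusters are taken as connected components, which do not overlap. The gene names and p-values are invented.

```python
from collections import defaultdict


def build_graph(pair_p_values, alpha=0.01):
    """Link two genes when their Bonferroni-corrected p-value is below alpha."""
    n_tests = len(pair_p_values)
    graph = defaultdict(set)
    for (gene_a, gene_b), p in pair_p_values.items():
        if min(1.0, p * n_tests) < alpha:
            graph[gene_a].add(gene_b)
            graph[gene_b].add(gene_a)
    return graph


def connected_components(graph):
    """Return the clusters of the graph as a list of gene sets."""
    seen, clusters = set(), []
    for node in list(graph):
        if node in seen:
            continue
        stack, cluster = [node], set()
        while stack:
            current = stack.pop()
            if current in cluster:
                continue
            cluster.add(current)
            stack.extend(graph[current] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


# Hypothetical pairwise p-values between gene profiles.
pairs = {("A", "B"): 1e-5, ("B", "C"): 1e-4, ("C", "D"): 0.5, ("E", "F"): 1e-6}
graph = build_graph(pairs)
clusters = connected_components(graph)
hub = max(graph, key=lambda gene: len(graph[gene]))  # most connected gene
print(sorted(sorted(c) for c in clusters), hub)
```

Keeping the result as a graph, rather than a flat partition, is what makes hubs visible: the degree of each node is available directly.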
CS - responsive genes
[Figure: gene clusters labelled with GO categories using BiNGO (Maere et al., 2005): response to stimulus, mating; ergosterol biosynthesis; cell wall biosynthesis; vitamin metabolism; carbohydrate metabolism; amino acid metabolism; (ion) transport]
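GO labelling of a cluster rests on an over-representation test, and the hypergeometric test is one of the tests BiNGO offers. The sketch below shows that test in isolation; the function name and the example numbers are hypothetical, and the multiple-testing correction is left out.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a cluster.

    k: annotated genes in the cluster
    n: cluster size
    K: annotated genes in the whole reference set
    N: size of the reference set
    """
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(n, K) + 1))
    return favourable / comb(N, n)


# Hypothetical example: 8 of 20 cluster genes carry a GO term that
# annotates 100 of 6000 genes overall.
print(hypergeom_tail(8, 20, 100, 6000))
```

A small tail probability means the category is found in the cluster more often than random sampling from the reference set would explain.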
Some examples

Top-down modelling

Bottom-up modelling
http://www.psb.ugent.be/cbd/papers/sim-plex/
Mathematical model
Approximation: gene activation is simplified to a step function
[Figure: gene activation (y-axis) against activator amount (x-axis), stepping up at the activation threshold θ]
Piecewise Linear Differential Equation (PLDE)
= summation of step-ups & step-downs (plus 1 degradation term)
...
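As a rough illustration of the PLDE idea, the rate of change of one gene product x can be written as a sum of weighted step terms minus a degradation term, dx/dt = Σ k_j · s(x_j, θ_j) − γ · x. The Python sketch below assumes this common textbook form; the weights, thresholds, the degradation constant `gamma` and the regulator names are made up, and this is not SIM-plex's own implementation.

```python
def step_up(x, theta):
    """1 when the regulator amount exceeds the threshold, else 0."""
    return 1.0 if x > theta else 0.0


def step_down(x, theta):
    """Mirror image of step_up: 1 at or below the threshold, else 0."""
    return 1.0 - step_up(x, theta)


def rate(x, regulators, terms, gamma):
    """dx/dt = sum of weighted step terms minus first-order degradation.

    terms: list of (weight, step_function, regulator_name, threshold).
    """
    production = sum(weight * step(regulators[name], theta)
                     for weight, step, name, theta in terms)
    return production - gamma * x


def integrate(x0, regulators, terms, gamma, t_end, dt=0.01):
    """Forward-Euler integration with regulator amounts held constant."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += rate(x, regulators, terms, gamma) * dt
    return x


# Hypothetical target gene: produced while activator A is above 4,
# and additionally while repressor R stays at or below 8.
terms = [(1.0, step_up, "A", 4.0), (0.5, step_down, "R", 8.0)]
print(integrate(0.0, {"A": 5.0, "R": 9.0}, terms, gamma=0.1, t_end=200.0))
```

Between threshold crossings the production term is constant, so x relaxes towards production / gamma; that is what makes the equation piecewise linear.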
Instead of differential equations, use SIM-plex’s easy if-then statements.
[Diagram labels: A, B, C, P]
www.psb.ugent.be/cbd/papers/sim-plex
Mathematical model of the cell
division cycle of fission yeast
Novak, Pataki, Ciliberto, Tyson (2000), Chaos
KRP2: transition from mitosis to endocycle
Showcase: a study of KRP2 involvement in the
transition from mitotic division to endocycle.
[Diagram labels: CDKB1;1 activity; KRP2 protein; P; CDKA;1 activity; mitotic division]
Wild-type Arabidopsis
[Plot: CDKB1;1, KRP2, CDKA;1]
fixedcomp CDKB11 0 0, 11 0, 13 10, 15 0, 31.1 0, 33 9, 34.9 0, 51.3 0, 53 7, 54.7 0
comp KRP2 6 0.02
comp CDKA1 14
comp KRP2ph
timepoints 0 to 80
if true then KRP2 0.5
if CDKB11 > 4 then transform KRP2 to KRP2ph 3
if true then CDKA1 1
if KRP2 > 8 then CDKA1 -0.9
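To see when the rule `if CDKB11 > 4` would fire, the fixed CDKB11 profile can be examined on its own. The Python sketch below rests on one assumption that the slides do not state: the number pairs on the `fixedcomp` line are (time, amount) points joined by straight lines. The function name is hypothetical, and the rest of the model is deliberately not simulated.

```python
# (time, amount) points copied from the fixedcomp line of the listing.
CDKB11_POINTS = [(0, 0), (11, 0), (13, 10), (15, 0), (31.1, 0), (33, 9),
                 (34.9, 0), (51.3, 0), (53, 7), (54.7, 0)]


def windows_above(points, threshold):
    """Return (start, end) intervals where the piecewise-linear curve
    through the points lies above the threshold."""
    windows, start = [], None
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        above0, above1 = v0 > threshold, v1 > threshold
        if above0 and start is None:
            start = t0  # curve already above threshold at its first point
        if above0 != above1:
            # A linear segment crosses the threshold exactly once.
            t_cross = t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
            if above1:
                start = t_cross
            else:
                windows.append((start, t_cross))
                start = None
    if start is not None:
        windows.append((start, points[-1][0]))
    return windows


windows = windows_above(CDKB11_POINTS, 4)
for start, end in windows:
    print(f"CDKB11 > 4 from t = {start:.2f} to t = {end:.2f}")
```

Under that reading there are three windows, one per CDKB11 peak, and each is shorter than the last because the peak heights fall from 10 to 9 to 7.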
Dominant negative CDKB1;1
[Plot: CDKB1;1, KRP2, CDKA;1]
KRP2 overexpression
[Plot: CDKB1;1, KRP2, CDKA;1]
Verkest et al., Plant Cell 17 2005
Conclusions:
• Functional Genomics data and resources are essential for systems biology
• Information extraction and integration need to be further facilitated
• Mathematical modelling doesn’t have to be rigorously accurate; it is already great if it can extend the capability to hypothesize
Acknowledgements:
• Computational Biology Division: Steven Maere, Steven Vercruysse, Gert Sclep
• Functional Genomics: Pierre Hilson
• Cell Cycle: Lieven De Veylder, Dirk Inzé
• Leaf Growth and Development: Gerrit Beemster
• ESAT – KU Leuven: Joke Allemeersch, Steffen Durinck