Introduction to Plant Genome and Gene Structure

Dr. Jonaliza L. Siangliw
Rice Gene Discovery Unit, BIOTEC
Defining genome
• Coined by Hans Winkler (1920) as “genom(e)”, by joining gene and chromosome
• Lederberg and McCray (2001) define the genome as the complete gene complement, or the total DNA amount per haploid chromosome set
RGDU at “Community of Practices”
CARDI June 18-21, 2007 (GCP_BIOTEC)
Brief history of genome size study in plants
(Estimates of DNA amounts)
• Early studies were based on analysis of isolated nuclei or cell suspensions
• Led to the use of the term C-value (Swift 1950b)
• These studies dealt only with relative DNA contents and did not provide estimates of absolute DNA mass
• The first estimate of the absolute amount of DNA in the nuclear genome of a plant was made for Lilium species
• Genome size – the mass (in picograms, pg) of DNA per haploid nucleus
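Genome sizes reported as a mass are often converted to base pairs. The lines below are a minimal sketch of that conversion. The factor of roughly 978 Mbp per picogram is a commonly used value and is an assumption here (it is not given on the slides), and the function names are illustrative only.

```python
# Assumption: 1 pg of double-stranded DNA corresponds to about 978 Mbp.
# This conversion factor is not stated on the slides.
MBP_PER_PG = 978.0


def pg_to_mbp(picograms: float) -> float:
    """Convert a DNA mass in picograms to megabase pairs."""
    return picograms * MBP_PER_PG


def mbp_to_pg(megabases: float) -> float:
    """Convert a genome size in megabase pairs to picograms."""
    return megabases / MBP_PER_PG


# A 1C value of 0.3 pg expressed in Mbp.
print(f"0.3 pg is about {pg_to_mbp(0.3):.0f} Mbp")
```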
Brief history of genome size study in plants
(Estimates of DNA amounts)
• (A) Zingeria biebersteiniana – a monocot species with a chromosome number of 2n = 4
• (B) Voanioala gerardii – a rainforest palm from Madagascar with a chromosome count of 2n = 600
Source: Michael D. Bennett, Proc. Natl. Acad. Sci. USA, Vol. 95, pp. 2011–2016, March 1998
Brief history of genome size study in plants
(Estimates of DNA amounts)
• Examples of DNA amounts and chromosome sizes:
  (A) Brachyscome dichromosomatica, 2n = 4, 1C = 1.1 pg
  (B) Myriophyllum spicatum, 2n = 14, 1C = 0.3 pg
  (C) Fritillaria sp., 2n = 14, 1C = 65 pg
  (D) Selaginella kraussiana, 2n = 40, 1C = 0.36 pg
  (E) Equisetum variegatum, 2n = 216, 1C = 30.4 pg
Source: T. Ryan Gregory, The Evolution of the Genome (2005)
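The caption values show that chromosome number and DNA amount vary independently. As a rough sketch, the average DNA per chromosome can be computed from 2n and 1C. The calculation assumes the 1C nucleus holds n = 2n/2 chromosomes, and the function name is illustrative, not taken from the source.

```python
# (species, 2n, 1C in pg), values as given in the figure caption.
examples = [
    ("Myriophyllum spicatum", 14, 0.3),
    ("Fritillaria sp.", 14, 65.0),
    ("Selaginella kraussiana", 40, 0.36),
    ("Equisetum variegatum", 216, 30.4),
]


def mean_pg_per_chromosome(two_n: int, one_c_pg: float) -> float:
    """Average DNA per chromosome, assuming the 1C nucleus holds n = 2n/2 chromosomes."""
    return one_c_pg / (two_n / 2)


for species, two_n, one_c in examples:
    per_chrom = mean_pg_per_chromosome(two_n, one_c)
    print(f"{species}: 2n = {two_n}, about {per_chrom:.3f} pg per chromosome")
```

Fritillaria and Myriophyllum share the same chromosome number here, yet their average DNA per chromosome differs by more than two orders of magnitude.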
Brief history of genome size study in plants
(Main areas of focus in early genome size studies)
• Developing methods for estimating plant genome size, and testing and proving their accuracy.
• Exploring the ranges in genome size in different groups and at various taxonomic levels.
• Investigating genome size variation through (a) the mechanisms responsible, (b) rates of change, and (c) evolutionary significance, to resolve the so-called C-value paradox.
Brief history of genome size study in plants
(Main areas of focus in early genome size studies)
• Constancy and the origin of the “C-value”
  • DNA constancy hypothesis
  • Swift (1950a) referred to classes of DNA as:
    • Class I – the common diploid value
    • Class II – the “Class 1C” value, representing the haploid DNA content
  • C-value paradox – the amount of DNA per cell does not correspond to the total gene content of the organism
[Figure. Source: T. Ryan Gregory, Paleobiology, 30(2), 2004, pp. 179–202]
Brief history of genome size study in plants
(Impact of molecular revolution on genome size research)
• Molecular work on DNA sequences gave insight into the structure and content of individual genomes, but at the same time had an inhibitory effect on genome size research:
  • The strong emphasis on DNA C-values per se began to fade, and in the 1980s it was almost impossible to obtain grant funding to estimate genome size.
  • The discovery of repetitive DNA sequences, believed to be capable of changing in copy number, led to reports of substantial intraspecific variation (violating the rule of DNA constancy), such as variation related to developmental, environmental and geographical factors.
  • This made a second wave of careful measurements necessary. Showing that reported intraspecific variation is due to technical artifacts is one of the challenges posed to genome size researchers (www.rbgkew.org.uk/cval/workshopreport.html).
Brief history of genome size study in plants
(Genome size studies in the post-genomic era)
• Large-scale genome sequencing programs
  • Study of the molecular basis of genome evolution in plants
  • Allow investigation and comparison of different taxa (e.g. subspecies indica and japonica)
Brief history of genome size study in plants
(Genome size studies in the post-genomic era)
• Large-scale genome sequencing programs
  • Allow investigation and comparison of different species within families (Oryza, Sorghum, Zea)
Brief history of genome size study in plants
(Genome size studies in the post-genomic era)
• Large-scale genome sequencing programs
  • Allow investigation and comparison of differences between families (Poaceae and Brassicaceae)
  • Reveal the key molecular mechanisms involved in the gain and/or loss of DNA resulting in changes in genome size
Patterns in plant genome size evolution
(The extent of variation in plant taxa)
How do plant genome sizes evolve?
(Sequences responsible for the range of genome sizes
encountered in plants)
• Repetitive DNA in plants is composed of transposable elements (TEs)
• Class I – RNA-mediated mode of transposition
  • Retrotransposons – characterized by long terminal repeats (LTRs)
  • Retroposons – lack terminal repeats (non-LTR retroelements) and use reverse transcriptase to transpose through an RNA intermediate
How do plant genome sizes evolve?
(Sequences responsible for the range of genome sizes
encountered in plants)
• Repetitive DNA in plants is composed of transposable elements (TEs)
• Class II – DNA-mediated mode of transposition
  • Helitrons
  • Mutator-like elements
  • Miniature inverted repeat transposable elements (MITEs)
How do plant genome sizes evolve?
(What triggers the spread of transposable elements?)
• Transcriptional activation can be induced experimentally by various biotic and abiotic stresses, such as:
  • Wounding, tissue culture and disease attack
  • Adaptation to water availability in Hordeum (Vicient et al. 1999a), rice (Jiang et al. 2003)
How do plant genome sizes evolve?
(What triggers the spread of transposable elements?)
• Polyploidization and interspecific hybridization may trigger TE amplification in Nicotiana (Comai, 2000)
How do plant genome sizes evolve?
(Satellite DNA)
• Satellite DNA
  • Tandemly arranged repeats of identical or similar sequences
  • Variable in size, but the most common monomeric units are 150-180 bp and 320-380 bp
  • Two smaller units of satellite DNA:
    • Minisatellites (10-40 bp repeats)
    • Microsatellites (2-6 bp repeats)
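The definition of a microsatellite (a 2-6 bp unit repeated in tandem) can be illustrated with a short regular-expression search. This is only a sketch: the minimum of four tandem copies and the function name are assumptions chosen for the example, not thresholds from the slides.

```python
import re


def find_microsatellites(seq: str, min_repeats: int = 4):
    """Return (start, motif, copies) for each tandem run of a 2-6 bp motif.

    Assumption: a run counts as a microsatellite once the motif occurs at
    least `min_repeats` times in a row. Runs of a single base are skipped.
    """
    # Group 1 is the motif; the backreference \1 requires further exact copies.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # homopolymer run, not a 2-6 bp repeat unit
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


toy_sequence = "CCGT" + "AG" * 6 + "TTC"  # made-up sequence for illustration
print(find_microsatellites(toy_sequence))
```

The search only finds perfect repeats. Real satellite arrays usually contain diverged copies ("identical or similar sequences"), which would need an approximate-matching tool.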
How do plant genome sizes evolve?
(Genome size increase by polyploidy)
• Polyploidy – results from combining three or more basic chromosome sets or genomes in one nucleus
• Prominent mode of speciation
• In polyploids the C-value and the basic genome size are not equivalent, so the basic genome size must be indicated as the 1Cx-value (Greilhuber et al. 2005)
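The difference between the two quantities can be shown with a small calculation. The sketch below assumes the 1C-value is half of the 2C-value and the 1Cx-value is the 2C-value divided by the ploidy level. The tetraploid and its 2C-value of 8.0 pg are hypothetical numbers used only for illustration.

```python
def one_c(two_c_pg: float) -> float:
    """1C-value: DNA content of the unreplicated reduced (n) chromosome complement."""
    return two_c_pg / 2


def one_cx(two_c_pg: float, ploidy: int) -> float:
    """1Cx-value: DNA content of one basic (x) chromosome set."""
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return two_c_pg / ploidy


# Hypothetical tetraploid (4x) with a measured 2C-value of 8.0 pg.
two_c = 8.0
print(f"1C  = {one_c(two_c):.1f} pg")
print(f"1Cx = {one_cx(two_c, ploidy=4):.1f} pg")
```

For a diploid the two values coincide. They diverge only when more than two basic chromosome sets are present.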
How do plant genome sizes evolve?
(Mechanisms of genome size decrease)
• Unequal intrastrand homologous recombination
  • Occurs between the long terminal repeats of LTR-retrotransposons and leads to the deletion of the internal DNA segment
• Illegitimate recombination
  • Recombination that does not require the participation of a RecA protein or large (>50 bp) stretches of sequence homology
• Loss of DNA during the repair of double-stranded breaks
  • Often accompanied by DNA deletions
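The first mechanism can be pictured with a toy string model: recombination between the two LTRs of one element removes the internal segment together with one LTR copy, leaving a single LTR behind. The sequences and the function name below are made up for illustration, and the model ignores everything except the net deletion.

```python
def recombine_ltrs(seq: str, ltr: str) -> str:
    """Delete the segment between the first two copies of `ltr`, keeping one copy."""
    first = seq.find(ltr)
    second = seq.find(ltr, first + len(ltr)) if first != -1 else -1
    if second == -1:
        return seq  # fewer than two LTR copies, nothing to recombine
    return seq[: first + len(ltr)] + seq[second + len(ltr):]


# Toy element: flank + LTR + internal region + LTR + flank (all hypothetical).
ltr = "TGTTAACA"
internal = "c" * 20
element = "ggg" + ltr + internal + ltr + "ttt"

after = recombine_ltrs(element, ltr)
print(after)
print("bases lost:", len(element) - len(after))
```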
Intraspecific variation in genome size
(Intraspecific variation and speciation)
• Speciation may occur without any change in C-value; likewise, variation in DNA amount can precede reproductive isolation and morphological diversification.
• Speciation was thought to depend mainly on changes in informational genes. Comparative genomics revealed constancy in this part of the genome, while non-coding sequences determine diversity and are suggested to play a major role in plant speciation.
Methodology for estimating genome size in plants
(Complete genome sequencing)
• Arabidopsis thaliana – a dicot plant that was sequenced through the Arabidopsis Genome Initiative in 1997 and whose complete sequence was made public in 2000
• The genome size was estimated as 125 megabases (Mb), based on the size of the sequenced regions (115.4 Mb) plus roughly 10 Mb for the unsequenced centromeric regions.
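The estimate is a simple sum, sketched below with the two figures quoted on the slide. The variable names are illustrative, and the 10 Mb term is itself only a rough figure.

```python
sequenced_mb = 115.4    # sequenced regions, from the slide
unsequenced_mb = 10.0   # rough allowance for unsequenced regions, from the slide

estimated_genome_mb = sequenced_mb + unsequenced_mb
sequenced_fraction = sequenced_mb / estimated_genome_mb

print(f"Estimated genome size: about {estimated_genome_mb:.0f} Mb")
print(f"Fraction covered by sequence: {sequenced_fraction:.1%}")
```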
From DNA sequence to gene discovery
DNA
• DNA is a molecule that encodes genetic information.
• It is a long, coiled, double-stranded chain of interlocking base pairs called a double helix.
• There are four types of bases in DNA: A (adenine), T (thymine), G (guanine), and C (cytosine).
• The order of the bases in a DNA strand, called the sequence, creates a code for information: the DNA code 'ATC' has a different meaning than the code 'TCA', and so on.
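To make the point about base order concrete, the sketch below reads DNA three bases at a time and looks each triplet up in the standard genetic code. It assumes the sequence is the coding strand read from its first base, and the table construction and function name are illustrative rather than part of the lecture.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence codon by codon ('*' marks a stop)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print("ATC ->", translate("ATC"))  # isoleucine (I)
print("TCA ->", translate("TCA"))  # serine (S)
```

The same three letters in a different order specify a different amino acid, which is what "a different meaning" amounts to at the protein level.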
DNA Structure
Central Dogma of Molecular Biology
Gene
• A gene is a section of the DNA strand that carries the instructions for a specific function.
  • For example, the 'globin' genes contain instructions for making the hemoglobin protein, which allows our blood to carry oxygen throughout the body.
  • Humans have roughly 20,000 protein-coding genes, which work together in complex ways to control much of what our bodies do.
  • While we all have the same genes, there are different versions of many genes, called alleles. For example, while most people have genes that give them pigmented (coloured) eyes, there are multiple alleles for specific eye colors. Each person has a particular combination of alleles for eye color, for hair color, etc., which makes him or her genetically unique.
Gene structure
Eukaryotic Gene Expression
Eukaryotic Gene Complexity
• Enhancers - DNA elements that strongly stimulate transcription of a gene or genes. Usually found upstream from the genes they influence.
Eukaryotic Gene Complexity
• Promoters - DNA sequences to which RNA polymerase binds prior to initiation of transcription. Usually found just upstream from the transcription start site of a gene.
Eukaryotic Gene Complexity
• 5' UTRs - A region of a gene which is transcribed into mRNA, becoming the 5' end of the message, but which does not contain protein coding sequence.
  • The 5' untranslated region is the portion of the DNA starting from the cap site and extending to the base just before the ATG translation initiation codon. While not itself translated, this region may have sequences which alter the translation efficiency of the mRNA, or which affect the stability of the mRNA.
Eukaryotic Gene Complexity
• Introns - intervening sequences in the gene that are removed in the formation of the functional mRNA. They usually consist of non-coding sequence, but there are instances of alternative processing where sequences can be both introns and exons.
Eukaryotic Gene Complexity
• Exons - sequences in the gene that are found in the functional mRNA. They include coding sequence but may also include non-coding sequence.
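The relationship between exons, introns and the functional mRNA can be sketched as a string operation: keep the exon segments in order and drop whatever lies between them. The toy gene, the coordinate convention (0-based, end-exclusive) and the function names are assumptions for illustration only.

```python
def splice(gene: str, exons):
    """Join exon segments in order; `exons` holds (start, end) pairs, end-exclusive."""
    return "".join(gene[start:end] for start, end in sorted(exons))


def intron_coordinates(exons):
    """Introns are the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


# Toy gene: exons in upper case, the single intron in lower case (made-up sequence).
exon1, intron1, exon2 = "ATGGCC", "gtaagtttcag", "GAATAA"
gene = exon1 + intron1 + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron1), len(gene))]

mature = splice(gene, exons)
print(mature)
print(intron_coordinates(exons))
```

Alternative processing, mentioned on the intron slide, would correspond to calling `splice` with a different set of exon coordinates for the same gene.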
Eukaryotic Gene Complexity
• 3' UTR - A region of the DNA which is transcribed into mRNA and becomes the 3' end of the message, but which does not contain protein coding sequence. Everything between the stop codon and the poly(A) tail is considered to be 3' untranslated.
  • The 3' untranslated region may affect the translation efficiency of the mRNA or the stability of the mRNA. It also has sequences which are required for the addition of the poly(A) tail to the message (including one known as the "hexanucleotide", AAUAAA).
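The UTR definitions above can be combined into a small sketch that splits a message into 5' UTR, coding sequence and 3' UTR, then looks for the AAUAAA hexanucleotide. It makes simplifying assumptions that are not from the slides: translation starts at the first AUG, the input excludes the poly(A) tail, and the example mRNA is invented.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str):
    """Return (5' UTR, CDS, 3' UTR), or None if no complete reading frame is found.

    Assumption: translation starts at the first AUG in the message.
    """
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Walk codon by codon from the start codon to the first in-frame stop.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    return None


# Invented message: 5' UTR + coding sequence + 3' UTR.
message = "GGCACC" + "AUGGCUUAA" + "CCUAAUAAAGCU"
utr5, cds, utr3 = split_mrna(message)
print("5' UTR:", utr5)
print("CDS:   ", cds)
print("3' UTR:", utr3)
print("AAUAAA found at offset", utr3.find("AAUAAA"), "of the 3' UTR")
```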
Mutation
• Mutation - A mutation is a permanent change in the DNA sequence of a gene. Mutations in a gene's DNA sequence can alter the amino acid sequence of the protein encoded by the gene.
Nature of mutations
• Substitution mutations - convert one type of base pair into another. G-C to A-T and A-T to G-C changes are referred to as transition mutations (replacement of a purine-pyrimidine base pair by another purine-pyrimidine base pair). G-C to C-G, G-C to T-A, A-T to T-A, and A-T to C-G changes are called transversions (replacement of a purine-pyrimidine base pair by a pyrimidine-purine base pair).
  • Although transitions are more common than transversions, both kinds of mutations occur as a consequence of replication errors, both can result from chemical damage to DNA, and both have been implicated as causative factors in inherited genetic disease and cancer.
  • Single nucleotide changes can change the codon to that of another amino acid, thus altering the protein. In addition, such changes can also create a stop codon.
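The transition/transversion distinction reduces to whether the original and replacement bases on one strand belong to the same chemical class. A minimal sketch, with an illustrative function name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change on one strand as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    # Purine <-> purine or pyrimidine <-> pyrimidine keeps the base-pair orientation.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("G", "A"), ("C", "T"), ("G", "C"), ("A", "T")]:
    print(f"{ref} -> {alt}: {classify_substitution(ref, alt)}")
```

A G-C to A-T base-pair change appears here as G -> A on one strand (or C -> T on the other), so both are reported as transitions.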
Nature of mutations
• Small insertions/deletions comprise a second relatively common class of mutation.
  • Genetic changes of this sort involve insertion or loss of a small number of contiguous base pairs (one to several hundred).
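A small insertion or deletion can be sketched as a string edit. The example also checks whether the net length change is a multiple of three, on the assumption (not made on the slide) that the change falls inside a coding sequence, where other lengths shift the reading frame. The sequence and function names are invented for illustration.

```python
def apply_indel(seq: str, pos: int, deleted: int = 0, inserted: str = "") -> str:
    """Remove `deleted` bases starting at `pos`, then insert `inserted` there."""
    return seq[:pos] + inserted + seq[pos + deleted:]


def keeps_reading_frame(deleted: int = 0, inserted: str = "") -> bool:
    """True when the net length change is a multiple of three bases."""
    return (len(inserted) - deleted) % 3 == 0


cds = "ATGGCCGAATAA"  # invented coding sequence

one_base_deletion = apply_indel(cds, 3, deleted=1)
three_base_insertion = apply_indel(cds, 3, inserted="AAA")

print(one_base_deletion, "in frame:", keeps_reading_frame(deleted=1))
print(three_base_insertion, "in frame:", keeps_reading_frame(inserted="AAA"))
```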