DNA microarrays for studies of higher plants and other

trends in plant science
update
DNA microarrays
for studies of
higher plants
and other
photosynthetic
organisms
In the four years since the complete sequence
of a genome was first reported1, there has been
a dramatic increase in DNA sequence information for plants. Notably, the Synechocystis
PCC6803 genome has been sequenced2; a
second photosynthetic prokaryote genome is
.40% complete (see Box 1); over one-third of
the Arabidopsis genome sequence is complete
(see Box 1); and a genome sequencing project
has been initiated for rice3. In addition, approximately 25 000 non-redundant Arabidopsis
cDNA sequences are currently available4 and
EST projects for a wide range of agricultural
crops are being contemplated.
How will we make use of this wealth
of sequence information? One of the most
powerful tools that has recently been developed to bridge the gap between sequence
information and functional genomics is DNA
microarray technology. Here, we briefly outline the salient features of this technology and
discuss some of its potential applications to
plant biology.
An overview of DNA microarray
technology
Thus far, two general types of microarrays
have been developed: DNA fragment-based
microarrays5, and oligonucleotide-based microarrays6. Here we focus primarily on the DNA
fragment-based microarrays and reference
to oligonucleotide-based microarrays is solely
to provide contrasting examples. Detailed descriptions of the technical aspects of generating and using DNA microarrays are available7,8,
and several groups provide descriptions of the
instrumentation, methods and sample microarrays via the Internet (Box 1).
DNA fragment-based microarrays
Microarrays are formed by robotically depositing specific fragments of DNA at indexed
locations on microscope slides (Fig.1). The
fragments can originate from a variety of
sources including anonymous cDNA clones,
EST clones, anonymous genomic clones, or
DNA amplified from open reading frames
(ORFs) found in sequenced genomes. With
38
January 1999, Vol. 4, No. 1
currently available technology, it is feasible to
array up to 10 000 spots/3.24 cm2. Thus, in
theory, all 20 000–25 000 Arabidopsis genes
can be displayed on a standard microscope
slide.
Once produced, the microarrays are hybridized with fluorescently-labeled mRNAderived probes (Fig. 1). After removing the
unbound probe, the fluorescent probe that has
hybridized to DNA fragments on the microarray is excited by light. The fluorescent signal emitted from each spot is a reflection of
the abundance of the corresponding sequence
in the original probe. Because the detection is
fluorescence based, the signal output is very
sensitive, and individual mRNA species can
be detected at a threshold of 1 part in 100 000
(Ref. 9) to 1 part in 500 000 (Ref. 10). Also, the
signal output has a broad dynamic range, so
both weak and strong signals can be monitored
on the same microarray. Thus, quantitative
information about mRNA levels for a large
number of genes can be collected concurrently.
Microarray technology is ideally suited for
making pair-wise comparisons of samples5,8.
Two fluorescent tags, with different excitation
and emission optima, can be used to label two
distinct probes (e.g. two mRNA populations
from physiologically or genetically distinct
samples). The two probes are mixed and allowed to hybridize to the same microarray,
thereby eliminating any differences associated
with the hybridization process (Fig. 1). For
each DNA fragment in the microarray, the ratio
of fluorescence emission at the two wavelengths reflects the ratio of the abundance of
that sequence in the two probes. A ratio of two
or greater is generally accepted as significant10,11. Improved statistical methods to assess
the limits of resolution may refine this estimate.
Advantages and disadvantages to DNA
fragment-based microarrays
Oligonucleotide-based microarrays, in particular those produced by photolithography
methods, have very high densities (250 000
oligonucleotides/cm2) and are inherently more
consistent from array to array in comparison
to DNA fragment-based microarrays7,8. The
inherent potential of misplacing clones when
handling the thousands of clones required to
construct DNA fragment-based microarrays can
also be avoided with oligonucleotide-based
microarrays, because the oligonucleotides are
Box 1. Internet resources for DNA microarrays
Sequencing projects
Web site for the Rhodobacter capsulatus sequencing project at the University of Chicago,
Dept of Molecular Genetics and Cell Biology, directed by Randal Cox: http://capsulapedia.
uchicago.edu
Web site for the Arabidopsis thaliana Database: Arabidopsis Genome Initiative (AGI)
Totals. The AGI was established in 1996 to coordinate large-scale sequencing efforts:
http://genome-www3.stanford.edu/cgi–bin/webdriver?Mlval5atdb_agi_total
Web site listing sequenced organelle genomes at the Entrez browser provided by the
National Center for Biotechnology Information, Bethesda, MD, USA (which also builds,
maintains and distributes the GenBank Sequence Database): URL: http://www.ncbi.nlm.
nih.gov/Entrez/Genome/euk_o.html
DNA microarrays at public institutionsa
Web site for Pat Brown’s Microarray group at Stanford University. This site provides detailed protocols and descriptions of the parts required to construct an arrayer and a scanner.
Examples of yeast microarray data are also included: http://cmgm.stanford.edu/pbrown/
array.html
Web site from the Technology Development Group of the Stanford Genome Center,
directed by Ron Davis:
http://sequence-www.stanford.edu/group/techdev/index.html
Web site for the Microarray group within the National Human Genome Research Institute.
Detailed protocols, descriptions of arrayer and scanner design, and analysis software are
given. Also, links to commercial vendors are provided:
http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/
Web site for the Microarray group in Lee Hood’s group at the University of Washington,
Dept of Molecular Oncology and Development. Descriptions of microarray based
approaches under development are given including ink-jet-based methods of in situ oligonucleotide synthesis:
http://chroma.mbt.washington.edu/mod_www/
a
For additional listings of resources for DNA microarrays, including commercial suppliers
of instruments see Ref. 21.
1360 - 1385/99/$ – see front matter © 1999 Elsevier Science. All rights reserved. PII: S1360-1385(98)01354-5
trends in plant science
update
Preparation of target DNA
False colour image of
scanned slide
Gene amplification
Print DNA
Hybridize
Preparation of fluorescent probe
mRNA 1
AAAAAAAAAAAA
TTTTTTTTT
mRNA 2
AAAAAAAAAAAA
TTTTTTTTT
Fluorescent cDNA synthesis
Fig. 1. The principal steps in producing and utilizing DNA microarrays. Target DNAs to be printed on microscope slides are prepared by PCRamplifying inserts from clones or ORFs from genomic DNA. The PCR products are spotted onto a microscope slide at indexed locations using
quills. At present, 10 000 distinct DNA samples (genes) can be spotted on an area of 3.24 cm2. Probes from two independent samples are prepared by labeling with fluorescent nucleotides (e.g. Cy3-dCTP and Cy5-dCTP). The two probes are mixed together and hybridized to a microarray in a small volume under a coverslip. After washing off the unbound probe, the fluorescent signal associated with each element of the
microarray is ‘read’ with a dual laser scanning device and images are captured with a photomultiplier tube (PMT). Quantitative values for the
signals for each probe and the ratio of signals for the two probes can be calculated using various software programs. Abundant mRNA species
are represented by spots on the microarrays that are highly fluorescent and rare mRNA species by spots that exhibit weak fluorescence. The
result is often displayed as a false-color image, as depicted in the diagram, in which yellow indicates the two signals which were similar in
intensity (i.e. the gene was expressed at similar levels in the two mRNA populations), red indicates that the ‘red’ probe signal was stronger and
green indicates that the ‘green’ probe signal was stronger.
synthesized in situ on the basis of gene sequences obtained directly from sequence data
bases. Furthermore, because the sequence that
hybridizes in oligonucleotide-based microarrays is very short (i.e. 20–25 nucleotides),
the hybridization reactions are very sensitive
to single nucleotide mismatches, unlike DNA
fragment-based microarrays. This makes oligonucleotide-based microarrays particularly
well-suited to genotyping or resequencing
applications12. The major disadvantage of
oligonucleotide-based microarrays is that their
construction depends on the availability of
accurate sequence data. For this reason, the
only photosynthetic organisms for which we
can contemplate building oligonucleotidebased microarrays at the present time are
Synechocystis PCC6803, Rhodobacter capsulatus and Arabidopsis. Conversely, sequence
information is not required for the construction of DNA fragment microarrays, and this
type of microarray is more appropriate for the
majority of plant species. In comparison to
dot blots of clones arrayed on membrane filters, microarrays are more sensitive, exhibit a
broader dynamic range and permit simultaneous analysis of two complex probes (compare
Refs 9 and 13). Thus, although dot blots may
be appropriate for small scale, speciality arrays, only microarrays can provide genomescale analyses of gene expression patterns.
Shared data management and analysis
One significant implication of DNA microarray technology is that many identical arrays can be produced to serve as a common
experimental platform among researchers. If
coupled to the development of centralized data
bases, researchers will be able to search across
multiple data sets for interesting expression
patterns. Just as the nucleic acid and protein
sequence data bases depend on input from
many groups, public microarray expression
data bases will similarly stimulate a level of
analysis that is not possible with narrowly
defined data sets. The development of software tools to identify groups of co-regulated
genes will lead to the discovery of shared promoter motifs, signal transduction pathways
and transcription factors11. For example, transcription factors are key determinants of
developmental and physiological processes,
and exploitation of gene expression data generated with DNA microarrays will help in
identifying targets for genetic manipulation
and crop improvement.
Transcript accumulation studies in
higher plants
DNA microarray technology allows an integrative analysis of gene expression patterns,
so that, for the first time, scientists will be able
to assess the impact of a specific treatment,
environmental factor or developmental stage
on all aspects of plant biology. Because the
output is quantitative, subtle changes in gene
expression can be detected, in addition to the
more dramatic changes observed with such
techniques as subtractive hybridization and
differential display. Moreover, DNA microarray analysis can be used to identify genes
that are unaffected, as well as those that show
reduced transcript accumulation (Fig. 2). Such
broad-based views of genome-wide gene expression patterns are likely to transform our
understanding of basic physiological processes in plants and to provide new avenues
for studying intractable, complex traits, such
as heterosis.
Although DNA microarrays offer significant advantages for gene expression studies,
they do have limitations. The current methods of producing probes for eukaryotes rely
on significant amounts of poly(A)1RNA.
Improvements in protocols that would permit
January 1999, Vol. 4, No. 1
39
trends in plant science
update
Finally, DNA microarrays can be used
to screen populations of plants for polymorphic ‘expression fingerprints’ which, together with the rich genetic resources in many
plant species (e.g. natural accessions; transgenic lines; mutants; near-isogenic lines; and
substitution lines) can be used to identify
genes or groups of genes involved in specific
plant processes. Once a distinct expression
fingerprint has been correlated with a complex process, such as drought tolerance, this
expression fingerprint can potentially be used
as a tool to evaluate new genetic materials
for the trait in conjunction with phenotypic
tests.
The first DNA microarrays were constructed for Arabidopsis5,9, but DNA microarray analysis can also be applied to species
for which we have no EST or genomic sequence data7. Microarrays can easily be prepared using random clones from normalized
cDNA libraries. Clones exhibiting interesting
expression patterns can be sequenced and further characterized. This is a promising strategy for gene discovery in exotic species with
novel properties.
Fig. 2. Scatter plot comparing the mean hybridization signal of 673 genes for two
Arabidopsis accessions, Columbia (n53) and Kas-1 (n54). Probes were prepared from
poly(A)1RNA from rosette leaves. The hybridization signals are plotted on a log-log scale.
Although the majority of genes are expressed at similar levels in the two accessions, a small
number of genes exhibit variation in expression levels between Columbia and Kas-1. Two
such genes are highlighted in red. CXc750 is a pathogen-inducible gene that has been
reported previously to be expressed in uninfected Columbia22. LRR is a gene with sequence
similarity to leucine-rich repeat containing genes.
the use of poly(A)1RNA from small tissue
samples or even from single cells would extend the usefulness of the technique. As with
northern blots, DNA microarrays provide
information about transcript levels, and additional studies are required to ascertain
whether altered transcript levels reflect
changes in synthesis or turnover. As with
other hybridization-based methods, crosshybridization among closely related gene family members may occur, so confounding the
analysis of gene expression of individual family members. Also, not all genes that participate in a process exhibit changes in transcript
levels. For example, in some signal transduction pathways, regulation occurs at the level
of phosphorylation or dephosphorylation of
the gene product. Developments in the complementary field of proteomics will provide
the tools needed for global analyses of changes
in polypeptides14.
Potentially, the DNA microarray technology is also a powerful tool for new gene discovery. One or a few genes are typically used
as marker genes for any given plant process.
However, with the ability to survey large
40
January 1999, Vol. 4, No. 1
numbers of genes, more specific marker genes
are likely to be identified. As an example, new
heat-shock responsive genes were identified in
a survey of 1000 random cDNA clones, suggesting that even for well-studied processes
new gene candidates can be discovered10.
Marker genes may prove useful in studies to
monitor plant responses and in screens for
mutants that exhibit altered regulation or
expression patterns of marker genes.
One challenge facing plant biologists is in
assigning functions to the large numbers of
genes being identified in the various sequencing projects. Many genes that are currently
classified by sequence similarity to known
genes remain poorly defined in terms of their
specific roles in plant growth and development. By determining expression patterns of
genes across a wide range of developmental
and environmental conditions, it may be possible to develop hypotheses about the specific
functions of these genes. These hypotheses
can then be tested by over- or under-expressing the gene in transgenic plants and monitoring changes in morphology or physiological
competence.
Transcript accumulation studies in
photosynthetic prokaryotes and
chloroplasts
Photosynthetic prokaryotes are excellent
model systems for DNA microarray analyses
because of their relatively small genome size
and the wide range of regulatory mutants
available. The Synechocystis PCC6803 genome contains 3168 deduced protein coding
sequences2 and the Rhodobacter capsulatus
genome is only three quarters the size of the
E. coli genome15. Thus, the construction of
DNA fragment microarrays containing entire genomic complements of ORFs should
be straightforward. These microarrays will be
particularly useful for global analyses of gene
expression under various environmental conditions. Such studies can be conducted using
both wild type and known regulatory mutants
to discern what portion of any global response is controlled by a specific regulatory
factor. Furthermore, microarray-based gene
expression analyses are not limited to organisms with sequenced genomes. The small
genome size of photosynthetic prokaryotes
allows indexed genomic libraries containing
3–5 kbp inserts to be arrayed; 5000–10 000
spots will typically be sufficient for complete coverage of most genomes of this
type. Some redundancy will exist in such
microarrays, and clones corresponding to
spots of interest will need to be sequenced
after being identified. However, this is an
attractive approach for researchers studying
organisms with unique physiological responses but whose genomes have not been
sequenced.
trends in plant science
update
To date, sequencing projects have been
completed for 13 higher plant chloroplast
genomes16, and three plant and two Chlamydomonas spp. mitochondrial genomes (see
Box 1). The small number of ORFs that
such genomes typically contain can easily be
examined on DNA fragment microarrays.
Many key photosynthetic complexes consist
of components that are encoded by both
the chloroplast and nuclear genomes. DNA
microarrays would allow transcripts from
each chloroplast ORF to be analysed relative
to the expression patterns of their nuclearencoded counterpart. The regulation of gene
expression in developing and mature chloroplasts is known to occur at both transcriptional
and post-transcriptional levels17 and microarrays will be useful in determining transcript levels of chloroplast genes under a
variety of developmental stages and environmental conditions. Similar kinds of experiments could be envisioned for mitochondrial
genes. In addition, several chloroplast and
mitochondrial mRNAs undergo complex
splicing events, including trans-splicing18.
The nature and relative frequency of these
events could potentially be monitored with
DNA microarrays19.
Speculative applications
A range of applications of DNA fragmentbased microarrays have been described8. The
technique can also be used in a ‘reverse
Southern’ procedure to identify gross DNA
polymorphisms in various genetic materials.
For example, segments altered in deletion
or substitution lines can be determined using
comprehensive DNA microarrays. Such an
approach would be helpful in characterizing
collections of mutants and assessing more
global changes in genome structure in natural populations. One caveat is that this application is likely to be limited to organisms
with smaller genomes20. In addition, the insertion sites of transposable elements or TDNAs can potentially be ascertained using
DNA microarrays7. Finally, microarrays
constructed of genomic fragments might
be used in south-western experiments with
fluorescently-labeled DNA binding proteins
to determine candidate target genes for the
binding proteins.
Conclusion
At present, researchers often have only a
limited knowledge of areas of plant biology
beyond their immediate field of study. However, growth and development is an integrative process that reflects the genotype and
life history of a plant. We anticipate that DNA
microarray technology will revolutionize our
understanding of plant processes by providing
a global perspective of responses to environmental and developmental changes.
Most applications outlined in this article are
based on determining transcript accumulation
patterns. However, it is likely that DNA
microarray technology and its uses will evolve
rapidly over the coming years as better instruments for preparing microarrays and better
software programs for analysing data from
microarrays are developed.
Acknowledgements
We would like to thank our colleagues
at Stanford for providing us with access
to their equipment and for their advice. In
addition, we would like to thank the many
plant researchers with whom we have
discussed the DNA microarray technology.
Their questions, concerns and suggestions
have framed the arguments presented in this
article. Carnegie Institution of Washington
publication #1400. This work was supported in part by the Carnegie Institution
of Washington, NSF, US-DOE and the
USDA.
David M. Kehoe
Dept of Biology, Indiana University,
Bloomington, IN 47405, USA
Per Villand and Shauna Somerville*
Dept of Plant Biology, Carnegie Institution of
Washington, Stanford, CA 94305, USA
*Author for correspondence
(e-mail [email protected])
References
1 Fleischmann, R.D. et al. (1995) Whole-genome
random sequencing and assembly of
Haemophilus influenzae RD, Science 269,
496–512
2 Nakamura, Y. et al. (1998) Cyanobase,
a WWW database containing the complete
nucleotide sequence of the genome of
Synechocystis sp. strain PCC6803, Nucleic Acids
Res. 26, 63–67
3 Sasaki, T. (1998) The rice genome project in
Japan, Proc. Natl. Acad. Sci. U. S . A. 95,
2027–2028
4 Rounsley, S.D. et al. (1996) The construction
of Arabidopsis expressed sequence
tag assemblies, Plant Physiol. 112,
1177–1183
5 Schena, M. et al. (1995) Quantitative monitoring
of gene expression patterns with a
complementary DNA microarray, Science 270,
467–470
6 Lipshutz, R.J. et al. (1995) Using oligonucleotide
probe arrays to access genetic diversity,
BioTechniques 19, 442–447
7 Lemieux, B., Aharoni, A. and Schena, M.
(1998) DNA chip technology, Mol. Breed. 4,
277–289
8 Eisen, M.B. and Brown, P.O. DNA arrays for
analysis of gene expression, Methods Enzymol.
(in press)
9 Ruan, Y., Gilmore, J. and Conner, T. (1998)
Towards Arabidopsis genome analysis:
monitoring expression profiles of 1400
genes using cDNA microarrays, Plant J. 15,
821–833
10 Schena, M. et al. (1996) Parallel human
genome analysis: microarray-based
expression monitoring of 1000 genes,
Proc. Natl. Acad. Sci. U. S. A. 93,
10614–10619
11 DeRisi, J.L., Iyer, V.R. and Brown, P.O.
(1997) Exploring the metabolic and genetic
control of gene expression on a genomic scale,
Science 278, 680–686
12 Wang, D.G. et al. (1998) Large-scale
identification, mapping and genotyping
of single-nucleotide polymorphisms in
the human genome, Science 280,
1077–1082
13 Desprez, T. et al. (1998) Differential
gene expression in Arabidopsis
monitored using cDNA arrays, Plant J. 14,
643–652
14 Pennington, S.R. et al. (1997) Proteome
analysis: from protein characterization to
biological function, Trends Cell Biol. 7,
186–173
15 Vlc̆ek, C. et al. (1997) Sequence of
a 189-kb segment of the chromosome
of Rhodobacter capsulatus SB1003,
Proc. Natl. Acad. Sci. U. S. A. 94,
9384–9388
16 Korab-Laskowska, M. et al. (1998)
The organelle genome database project
(GOBASE), Nucleic Acids Res. 26,
138–144
17 Gruissem, W. (1989) Chloroplast gene
expression: how plants turn their plastids on,
Cell 56, 161–170
18 Wissinger, B., Brennicke, A. and Schuster, W.
(1992) Regenerating good sense: RNA editing
and trans splicing in plant mitochondria,
Trends Genet. 8, 322–328
19 Giege, P. et al. (1998) An ordered
Arabidopsis thaliana mitochondrial
cDNA library on high-density filters allows
rapid systematic analysis of plant gene
expression: a pilot study, Plant J. 15,
721–728
20 Shalon, D., Smith, S.J. and Brown, P.O. (1996)
A DNA microarray system for analysing
complex DNA samples using two-color
fluorescent probe hybridization, Genome Res. 6,
639–645
21 Marshall, A. and Hodgson, J. (1998) DNA chips:
an array of possibilities, Nat. Biotechol. 16,
27–31
22 Aufsatz, W. and Grimm, C. (1994) A new,
pathogen-inducible gene of Arabidopsis is
expressed in an ecotype-specific manner, Plant
Mol. Biol. 25, 229–239
January 1999, Vol. 4, No. 1
41