A physical map of the highly heterozygous Populus genome

The Plant Journal (2007) 50, 1063–1078
doi: 10.1111/j.1365-313X.2007.03112.x
A physical map of the highly heterozygous Populus genome:
integration with the genome sequence and genetic map and
analysis of haplotype variation
Colin T. Kelleher1, Readman Chiu2, Heesun Shin2, Ian E. Bosdet2, Martin I. Krzywinski2, Chris D. Fjell2, Jennifer Wilkin1,
TongMing Yin3, Stephen P. DiFazio3,†, Johar Ali2, Jennifer K. Asano2, Susanna Chan2, Alison Cloutier2, Noreen Girn2, Stephen
Leach2, Darlene Lee2, Carrie A. Mathewson2, Teika Olson2, Katie O’Connor2, Anna-Liisa Prabhu2, Duane E. Smailus2, Jeffery M.
Stott2, Miranda Tsai2, Natasja H. Wye2, George S. Yang2, Jun Zhuang1, Robert A. Holt2, Nicholas H. Putnam4, Julia Vrebalov5,
James J. Giovannoni5, Jane Grimwood6, Jeremy Schmutz6, Daniel Rokhsar4, Steven J.M. Jones2, Marco A. Marra2, Gerald A.
Tuskan3, Jörg Bohlmann1,7,8, Brian E. Ellis1, Kermit Ritland7, Carl J. Douglas8,* and Jacqueline E. Schein2
1Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z3, Canada,
2Genome Sciences Centre, 100-570 West 7th Avenue, Vancouver, BC V5Z 4S6, Canada,
3Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6422, USA,
4US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA,
5Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY 14853, USA,
6Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305-5329, USA,
7Department of Forest Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada, and
8Department of Botany, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
Received 1 October 2006; revised 9 February 2007; accepted 23 February 2007.
*For correspondence (fax +1 604 822 6089).
†Present address: Department of Biology, West Virginia University, Morgantown, WV 26506, USA.
As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging
model tree, we constructed a physical map of the Populus genome, representing one of the few such maps of
an undomesticated, highly heterozygous plant species. The physical map, consisting of 2802 contigs, was
constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the Populus genome, which has been estimated from the genome sequence
assembly to be 485 ± 10 Mb in size. BAC ends were sequenced to assist long-range assembly of whole-genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple
sequence repeat-based markers were derived from the end sequences and used to initiate integration of the
BAC and genetic maps. A total of 2411 physical map contigs, representing 97% of all clones assigned to contigs,
were aligned to the sequence assembly (JGI Populus trichocarpa, version 1.0). These alignments represent a
total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group
sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was
the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing
of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning
contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of
these haplotype sequence differences.
Keywords: Populus trichocarpa, physical map, genome integration, BAC end sequences, poplar genomics,
haplotype diversity.
© 2007 The Authors
Journal compilation © 2007 Blackwell Publishing Ltd
Black cottonwood (Populus trichocarpa Torr. & Gray) is a
genetically highly variable outbreeding tree species that
primarily inhabits floodplains and river margins. It is wind-pollinated
and propagates sexually through minute wind-borne seeds, often dispersed long distances along river
corridors (Braatne et al., 1996; Farmer, 1996). The species
extends through approximately 30 degrees of latitude along
western North America, and ranges inland to the Rocky
Mountains (Burns and Honkala, 1990). Populus (poplar)
species are economically important crops in temperate climates throughout the world for a variety of purposes,
including wood pulp, paper and biomass, and for use in
phyto-remediation and waste water treatment.
Because of its relatively small genome size (485 Mb), the
availability of genetic and genomic resources, and ease of
propagation and genetic manipulation, Populus provides a
useful model system to study a number of biological
processes of importance to woody perennial plants, such
as dormancy, secondary xylem (wood) development, metabolism and responses to environmental stress (Strauss and
Martin, 2004; Taylor, 2002). As commercial species, poplars
are of most interest with respect to wood production, and
Populus has thus been the focus of numerous studies
examining the molecular biology of wood and secondary
wall formation (Plomion et al., 2001; Schrader et al., 2004;
Sterky et al., 1998, 2004). Poplar hybrids, such as P. trichocarpa × P. deltoides L., grow much faster than either parental species alone, and are widely used in plantations as a
fast-growing source of wood and fiber. In terms of its
ecological adaptation, poplar is also a valuable study
subject. For example, due to its size and longevity, poplar
provides an ideal system to study spatial and temporal
patterns of local and systemic defenses against herbivores
(Arimura et al., 2004). A large number of genetic adaptations
are likely to explain its ecological success over broad
geographical and climatic ranges (Cronk, 2005). It forms
hybrid zones with multiple sympatric species from the
Tacamahaca and Aigeiros sections of the genus (Eckenwalder, 1996), making it useful for studying factors involved in
species distinction and the biological species concept
(Rieseberg et al., 1999).
To further expand the use of Populus as a model woody
perennial species, the development of genomic tools and
resources is essential. Primary among those recently made
available is the full Populus genome sequence, derived from
a wild P. trichocarpa individual (named Nisqually-1), and the
accompanying genome annotation (Tuskan et al., 2006). A
multitude of additional resources, including controlled cross
populations, cross-species molecular markers, EST collections and full-length cDNAs, have been developed and
employed to further poplar genomics (Ralph et al., 2006;
Strauss and Martin, 2004; Tuskan et al., 2006). Populus has
been used extensively in experimental and population
biology studies, and dense genetic maps are available for
a number of species within the genus (Cervera et al., 2001;
Yin et al., 2004). With the growth in available resources,
Populus is becoming increasingly attractive as a model
organism for tree biology (Tuskan et al., 2004a).
Clone-based physical maps have been shown to be useful
in providing a framework to aid in the generation and
validation of genome sequence assemblies and as a valuable resource for map-based cloning (Chen et al., 2002;
Gregory et al., 2002; Krzywinski et al., 2004; McPherson
et al., 2001; Mozo et al., 1999; Nelson et al., 2005; Wallis
et al., 2004). To enhance the resources available for poplar
genomics and to assist assembly of the poplar whole-genome shotgun sequence and its integration with the
genetic map, we undertook the generation of a poplar
physical map by large-scale fingerprinting of a bacterial
artificial chromosome (BAC) library. The library was constructed using DNA from the P. trichocarpa Nisqually-1
individual and so has exact correspondence with the DNA
used for the genome sequence assembly. In physical map
construction, similarities in large insert clone fingerprint
patterns are used to identify clones derived from overlapping regions of the genome, and this information is used to
create a series of ordered, overlapping clones representing
contiguous genomic regions (contigs). End sequence reads
from physical map clones [e.g. BAC end sequence (BES)
reads] can be used to align physical map contigs to
sequence assemblies, thereby integrating physical maps with
the genome sequence. The integration of genetic and
physical maps has also been shown to be a useful genomic
resource for map-based cloning (Chen et al., 2002).
Poplar presents a particular challenge for physical mapping and genome sequencing efforts due to its high level of
heterozygosity and its gene and genome duplications
(Sterck et al., 2005; Tuskan et al., 2006). These two phenomena (heterozygosity and duplication) could confound both
sequence and fingerprint contig assembly. A high level of
heterozygosity could lead to independent assembly of
haplotypes in hyper-variable genomic regions. Likewise,
duplicated regions could lead to mis-assembly due to the
presence of genomic regions of high sequence similarity at
multiple locations within the genome. Other plant species
subject to physical mapping efforts to date include Arabidopsis (Mozo et al., 1999), maize (Fang et al., 2003; Nelson
et al., 2005), rice (Chen et al., 2002) and soybean (Wu et al.,
2004). Maize, rice and soybean have undergone considerable domestication and inbreeding, which has led to a more
homogenized genetic complement (Buckler et al., 2001;
Wang et al., 1999), and Arabidopsis is an inbreeding species
with consequent low heterozygosity (Abbott and Gomes,
1989; Bustamante et al., 2002). In contrast, Populus is an
obligate dioecious outcrosser, with high levels of gene flow
due to its wind-pollinated habit. Analysis of 4.2 million
phred 20 shotgun sequence end reads used for the P. trichocarpa Nisqually-1 genome assembly supports the high level
of heterozygosity and haplotype diversity in the Nisqually-1
individual, with an overall rate of approximately 2.6 polymorphisms (SNPs and insertion/deletions) per kb (Tuskan
et al., 2006).
In this paper, we report generation of the poplar BAC
fingerprint physical map, anchoring of the physical map to
the sequence assembly through BES alignments, and
integration of the genetic and physical maps using markers
common to both. Unexpectedly, alignment of physical map
contigs onto the poplar genome sequence assembly
revealed a consistent pattern of co-aligning BAC contigs.
These were presumed to represent haplotypes, a finding
confirmed by genetic analysis of representative examples.
Targeted sequencing of representative BAC clones uncovered the details of extensive indel and SNP haplotype
polymorphisms within this P. trichocarpa individual, but
the haplotype sequences were otherwise co-linear, and we
found no evidence of haplotype-specific gene complements,
as has been found in maize. The physical map and its
integration with the poplar genome and genetic map are
core resources important in establishing poplar as a model
system for tree biology.
BAC library and fingerprint map assembly
A BAC library of 48 384 clones created from the Nisqually-1
individual was used to construct the physical map. The BACs
were fingerprinted with HindIII, and successful fingerprints
were obtained for 46 025 (95%) of the clones. On average,
fingerprints contained 31 restriction fragments within the
range of 600–30 000 bp. The average insert size of the
clones, based on the fingerprint data, was 100 kb, thus the
fingerprinted clones represented approximately 9.4-fold
coverage of the 485 ± 10 Mb Populus genome, as estimated
from the genome sequence (Tuskan et al., 2006).
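As a rough check on the coverage figure, fold coverage is the number of clones multiplied by the mean insert size and divided by the genome size. The sketch below uses the rounded values quoted above (the function names are illustrative); with these inputs the product is about 9.5-fold, so the reported 9.4-fold presumably reflects unrounded insert sizes or the clone filtering described below, which this paragraph does not detail. The Clarke–Carbon term is a standard idealization that assumes clones are placed uniformly at random, an assumption that cloning bias violates.

```python
import math

def fold_coverage(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Library redundancy: total cloned bases divided by genome size."""
    return n_clones * mean_insert_bp / genome_bp

def expected_fraction_covered(coverage: float) -> float:
    """Clarke-Carbon estimate of the genome fraction present in at least one
    clone, assuming uniformly random clone placement (Poisson model)."""
    return 1.0 - math.exp(-coverage)

# Rounded values quoted in the text: 46 025 fingerprinted clones,
# ~100 kb mean insert, 485 Mb genome.
cov = fold_coverage(46_025, 100_000, 485_000_000)
print(f"{cov:.2f}-fold coverage")
print(f"expected fraction covered: {expected_fraction_covered(cov):.5f}")
```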
An initial automated assembly of the clone fingerprints
was performed using FPC (fingerprinted contigs) software
(Soderlund et al., 1997, 2000). This assembly was performed
at relatively high stringency to avoid binning together clones
from unrelated regions of the genome. All contigs containing >10 clones (representing approximately 40% of all clones
in contigs) were manually edited to refine the clone order
derived by FPC, using clone and contig editing tools
available within the FPC software. During this process, a
total of 515 clones were identified as having fingerprints
resulting from either cross-well contamination or partial
enzyme digestion, and these were removed from the
available clone set. Contig merges were performed manually where supported by the fingerprint data. Contigs with
≤10 clones had their clone order refined using our automated contig ordering application, CORAL (Flibotte et al., 2004),
which became available after manual editing had been
partially completed. During the manual review phase, we
identified small subsets of overlapping clones internal to
some contigs that, while otherwise highly similar in their
restriction fragment patterns to their closest neighbors in the
contig, contained restriction fragment pattern irregularities
in the form of missing and extra fragments. These irregular
fragments were commonly shared within the small subset of
clones but not with the larger group of clones in the contig,
suggesting underlying, biologically relevant DNA differences. These clones potentially represented restriction fragment differences resulting from sequence polymorphisms
between the two parental haplotypes (i.e. the same genomic
region), or clones containing duplicated or repetitive genomic sequences (i.e. similar sequence but from different
regions of the genome). In the absence of any orthogonal
evidence suggesting the underlying nature of the observed
differences, and wishing to avoid potentially collapsing
independent regions of the genome into the same map
contig, we removed these clone subsets from the larger
contigs and placed them into independent contigs.
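FPC judges fingerprint similarity with the Sulston score: the probability that two unrelated clones would share at least the observed number of bands by coincidence, given a band-matching tolerance and a gel length. The sketch below is a simplified illustration rather than FPC's own code; the greedy band matcher, the tolerance of 7 units, the gel length of 4000 units and the toy clones are assumptions, not the settings used to build this map. Smaller scores mean stronger evidence of overlap, so a "higher stringency" assembly corresponds to a smaller score cutoff.

```python
from math import comb

def count_shared_bands(bands_a, bands_b, tol):
    """Greedy one-to-one matching of sorted band positions within +/- tol units."""
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared

def sulston_score(n_a, n_b, shared, tol, gel_len):
    """Probability of >= `shared` coincidental matches between unrelated clones."""
    n_l, n_h = min(n_a, n_b), max(n_a, n_b)
    # Chance that one band of the smaller clone matches none of the larger clone's bands.
    p = (1 - 2 * tol / gel_len) ** n_h
    return sum(comb(n_l, j) * (1 - p) ** j * p ** (n_l - j)
               for j in range(shared, n_l + 1))

# Hypothetical clones with 31 bands each (the average reported above),
# positions in arbitrary gel units; clone_b differs in its last five bands.
clone_a = list(range(100, 3200, 100))
clone_b = clone_a[:26] + [x + 50 for x in clone_a[26:]]
shared = count_shared_bands(clone_a, clone_b, tol=7)
print(shared, sulston_score(len(clone_a), len(clone_b), shared, tol=7, gel_len=4000))
```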
Following completion of manual editing of the largest
contigs and automated clone ordering for the remaining
contigs, automated scripts were employed to compare
clones at contig ends to identify additional contig merges.
This was performed at a reduced stringency from that of the
initial fingerprint assembly. Seven rounds of automated
merging were performed with varying parameters for
required fingerprint similarity at the merge point (see
Experimental procedures). This included two rounds in
which singleton clones (those that did not assemble into
contigs with the parameters used for the initial automated
assembly) were assessed for their potential to bridge contigs
that otherwise would have insufficient similarity between
their end clones to permit a merge, and one round in which
contigs with ≤3 clones were inserted internally into larger
contigs where supported by fingerprint similarity. Following
each round of merging, a subset of the merges was
manually reviewed to ensure the parameters were sufficiently stringent to prevent incorrect merges being performed. At the end of this process, the map consisted of 3471
contigs. On average, contigs contained 11 clones, ranging
from 2 to 128 clones per contig (excluding a single exceptional contig containing >1000 clones, described below).
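The round-based merging can be pictured as applying candidate joins in order of decreasing stringency while tracking which contigs have already become one. The sketch below uses a union-find structure for that bookkeeping; the candidate scores, cutoffs and names are hypothetical, and unlike a real map merge it ignores which contig ends are joined and how clones are reordered afterwards. The actual merge parameters are those given in Experimental procedures.

```python
class DisjointSet:
    """Union-find over contig identifiers, with path halving."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True

def merge_rounds(contigs, candidates, cutoffs):
    """Apply merge candidates in rounds of decreasing stringency.

    candidates: (contig_a, contig_b, score) for the best end-clone comparison
    between two contigs; lower score = stronger similarity.
    cutoffs: per-round score cutoffs, most stringent first.
    """
    ds = DisjointSet(contigs)
    merges_per_round = []
    for cutoff in cutoffs:
        merged = []
        for a, b, score in sorted(candidates, key=lambda c: c[2]):
            if score <= cutoff and ds.union(a, b):
                merged.append((a, b))
        merges_per_round.append(merged)  # in practice a subset is reviewed manually
    n_contigs = len({ds.find(c) for c in contigs})
    return n_contigs, merges_per_round

n, rounds = merge_rounds(
    ["c1", "c2", "c3", "c4"],
    [("c1", "c2", 1e-15), ("c2", "c3", 1e-10), ("c3", "c4", 1e-6)],
    cutoffs=[1e-12, 1e-9],
)
print(n, rounds)
```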
When the sequence assembly became available (September
2004, JGI Populus trichocarpa, version 1.0), additional contig merges were performed based on supporting evidence
from the BES alignments to the sequence scaffolds (see
below and Experimental procedures).
BAC end sequencing
To facilitate integration of the BAC map with both the Populus sequence assembly and genetic maps, end reads were
obtained from the clones in the BAC library. A total of 81 904
BES reads passed quality filters (see Experimental procedures), with an average phred 20 (Ewing and Green, 1998;
Ewing et al., 1998) read length of 504 bp, corresponding to
more than 41 Mb of poplar genomic DNA sequence, roughly
8.5% of the estimated 485 Mb genome. These reads represented
44 422 BAC clones, 37 482 (84%) of which had reads from
both ends. The BESs provided a clone-linked sequence resource for use in aligning BACs to the genomic sequence
and for identification of simple sequence repeats (SSRs) to
be used in genetic mapping experiments. In addition, the
BESs were employed during assembly of the Populus genome sequence, providing a powerful aid to long-range
contiguity of the assembled scaffolds (Tuskan et al., 2006).
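The BES totals quoted above follow from simple products and ratios of the reported counts. The helper below (a hypothetical name) reproduces them; it uses the mean phred 20 read length, so the total counts high-quality bases only, and the genome fraction assumes the 485 Mb estimate.

```python
def bes_summary(n_reads, mean_read_bp, genome_bp, clones_with_reads, clones_both_ends):
    """Summary figures for a BAC end sequencing run."""
    total_bp = n_reads * mean_read_bp
    return {
        "total_bp": total_bp,
        "genome_fraction": total_bp / genome_bp,
        "paired_fraction": clones_both_ends / clones_with_reads,
    }

s = bes_summary(81_904, 504, 485_000_000, 44_422, 37_482)
print(f"{s['total_bp'] / 1e6:.1f} Mb, "
      f"{100 * s['genome_fraction']:.1f}% of genome, "
      f"{100 * s['paired_fraction']:.0f}% of clones with both ends")
```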
The availability of the BESs also allowed us to decipher
the nature of one unusual contig in the physical map. This
contig contains 1271 clones, approximately 10 times more
than the 128 clones in the next-most populated contig, and
well above the map average of 11 clones per contig. The vast
majority of these clones are entirely redundant, suggesting
that they potentially represent a duplicated sequence within
the genome or cloning bias resulting in the over-representation of a small genomic region. Analysis of BESs derived
from clones in this contig showed that they had high
similarity to Arabidopsis and poplar chloroplast DNA, suggesting that the genomic DNA used in the construction of
the BAC library was contaminated with chloroplast DNA,
and that BAC clones derived from the chloroplast DNA had
assembled into a deeply redundant contig in the map. This
phenomenon was also encountered during genome sequencing efforts, where it was revealed that, in some of the DNA
libraries used, the poplar total genomic DNA was contaminated with chloroplast genome DNA, leading to separate
assembly of the poplar chloroplast genome at a very high
level of sequence depth (Tuskan et al., 2006).
Integration of genetic and physical maps
To anchor the physical map to poplar genetic maps, we used a P. trichocarpa Nisqually-1 genetic map based on simple sequence repeat (SSR) and AFLP (amplified fragment length polymorphism) markers in a pedigree (family 545) in which Nisqually-1 was the female parent. The SSR markers were primarily developed from the BES reads. The remaining ten SSR markers, not derived from BESs, have been used in other poplar genetic map studies (Cervera et al., 2001; van der Schoot et al., 2000; Tuskan et al., 2004b; Yin et al., 2004) or were designed from assembled shotgun sequence. In total, 122 BES-derived SSRs were used for construction of the Nisqually-1 genetic map. In addition, 123 dominant AFLP markers (Vos et al., 1995) were added to the genetic mapping analysis. This map, and a consensus Populus map derived from merges with maps from other Populus pedigrees, including P. trichocarpa family 13 (Yin et al., 2004), will be described in detail elsewhere (T. Yin et al., unpublished).
We evaluated the use of the in silico identified SSRs derived from the BESs as genetic markers for integrating the physical map contigs onto the family 545 genetic map. This evaluation was based solely on markers from the Nisqually-1 pedigree. SSRs were selected based on contig size, so as to map large physical map contigs to the genetic map. From the 122 BES-derived SSRs on the genetic map, 119 of the corresponding contigs were mapped to the Nisqually-1 pedigree (family 545). An example of this physical–genetic map integration is shown in Figure 1 for LG X. Table 1 summarizes the total number of contigs mapped onto family 545 using BES-derived SSRs and the percentage of each LG covered by the contigs. The average contig coverage of the LGs using these SSRs alone was 22%, a substantial proportion given the relatively small number of SSRs tested. Given the success of this approach, we next evaluated a much larger pool of 3506 potential SSR markers, derived from in silico analysis of BESs. Of these, 1769 passed the BLAST criteria, based on primer sequence alignment to the LG sequence assembly, and 352 were mapped in family 13 while 392 failed mapping. These data will be presented in detail elsewhere (T. Yin et al., unpublished). Given the success of mapping physical map contigs to LGs using the BES-derived SSRs (Figure 1 and Table 1), and the approximately 50% success rate of mapping additional BES-derived SSRs, it would be possible to anchor a substantial fraction of physical map contigs to the genetic map using the larger set of BES-derived SSR markers. However, because of parallel work on aligning the physical map and the genome sequence assembly, we decided to concentrate on a more high-throughput and automated approach to linking the physical map and the genome assembly, namely alignment through BESs.
Figure 1. The position of physical map contigs on the LG X genetic map.
Individual contigs are represented as different colored sections along the linkage group. Contigs were mapped to the LGs using BES-derived SSRs (markers beginning with G). Approximately 29% of the LG is covered by the
Table 1 Summary of the integration of the genetic and physical maps through BAC end sequence derived SSR markers
LG | No. contigs mapped to the Nisqually-1 pedigree | Estimated % coverage of LG¹
Average | | 22
The data presented are for those markers mapped in the Nisqually-1 pedigree and used to evaluate the integration of the BAC physical map and the genetic map.
¹The % LG coverage values given are only for those physical map contigs mapped in Nisqually-1. The percentage coverage of the LG by contigs was calculated a posteriori from the poplar sequence assembly and is based on the contig size estimates.
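This section relies on SSRs identified in silico from BESs but does not state the search tool or thresholds. The sketch below shows one simple way to find perfect di- to hexanucleotide tandem repeats with a regular expression; the minimum copy numbers in `MIN_COPIES` are illustrative assumptions, and a real marker pipeline would go on to design primers and filter them (for example by the BLAST alignment criteria mentioned above). Motifs that are themselves repeats of a shorter unit (such as `ATAT` or `GG`) are skipped so that each repeat is reported once, at its shortest unit.

```python
import re

# Hypothetical thresholds: minimum number of motif copies for each motif length.
MIN_COPIES = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def is_primitive(motif):
    """False if the motif is a shorter unit repeated (e.g. 'ATAT' = 'AT' * 2)."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)

def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_COPIES.items():
        # A k-mer followed by at least (min_copies - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

bes_read = "GATC" + "CA" * 7 + "GTTC" + "AAG" * 6 + "CTGA"  # toy end read
print(find_ssrs(bes_read))
```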
Alignment of the BAC map to the genome sequence
Alignment of the BESs to version 1.0 of the JGI Populus
trichocarpa assembly (http://genome.jgi-psf.org/Poptr1/Poptr1.home.html)
enabled large-scale integration of the
physical map with the genome sequence. A total of 73 374
end sequence reads derived from 42 809 unique clones
passed the alignment filters (see Experimental procedures).
A total of 34 770 clones (76% of all map clones) had
informative BES alignments, of which 22 526 (65% of clones
with informative reads) had paired-end alignments. Using
these clone alignments, we positioned the contigs on the
genome sequence assembly. Examination of the order and
orientation of the map contigs with respect to the sequence
assembly identified adjacent map contigs with closely juxtaposed assembly coordinates, indicating candidate contig
merges that had not met the stringency requirements for
automated merges based on fingerprint similarities alone
(probably due to insufficient overlap between the contig
ends). With the supporting sequence coordinate evidence,
and where substantiated by the fingerprint data, these
merges were performed, producing a final map with 2802
contigs and 5746 singletons. The average contig size is
466 kb, with some contigs larger than 1 Mb. The distribution
of fingerprint contig sizes is shown in Figure 2.
Figure 2. A summary of contig size distribution within the physical map.
The number of clones per contig varied from 2 to 189, with an average of 14.
A total of 2226 of the 2802 contigs aligned to unique
regions in the genome assembly. An additional 185 contigs
mapped to multiple sequence regions, with the majority of
these (87%) mapping to two regions. The remaining 391
contigs could not be positioned on the sequence assembly
using our criteria. Thus, 86% of the physical map contigs
were aligned to the genome sequence assembly, and these
contigs contained 97% of all clones assigned to map contigs.
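The alignment criteria are described in Experimental procedures and are not reproduced here. As an illustration of how aligned end reads can place a contig, the sketch below groups each contig's BES alignment positions by scaffold, splits them wherever consecutive positions jump by more than a hypothetical `max_gap`, and then tallies contigs placed in one region, in several regions, or not at all (the three categories reported above). The input tuples, function names and the 300 kb gap are assumptions.

```python
from collections import defaultdict

def place_contigs(alignments, max_gap=300_000):
    """alignments: (contig, scaffold, position) for aligned end reads.
    Returns contig -> sorted list of (scaffold, start, end) regions."""
    by_contig = defaultdict(lambda: defaultdict(list))
    for contig, scaffold, pos in alignments:
        by_contig[contig][scaffold].append(pos)
    placements = {}
    for contig, scaffolds in by_contig.items():
        regions = []
        for scaffold, positions in scaffolds.items():
            positions.sort()
            start = prev = positions[0]
            for p in positions[1:]:
                if p - prev > max_gap:  # large jump: begin a new region
                    regions.append((scaffold, start, prev))
                    start = p
                prev = p
            regions.append((scaffold, start, prev))
        placements[contig] = sorted(regions)
    return placements

def classify(placements, all_contigs):
    """Count contigs aligned to one region, several regions, or none."""
    counts = {"unique": 0, "multiple": 0, "unplaced": 0}
    for contig in all_contigs:
        n = len(placements.get(contig, []))
        if n == 0:
            counts["unplaced"] += 1
        elif n == 1:
            counts["unique"] += 1
        else:
            counts["multiple"] += 1
    return counts
```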
The scaffolds in the version 1.0 Populus genome assembly (Tuskan et al., 2006; Yin et al., unpublished) contain in
total 485 Mb of genomic DNA, of which 308 Mb are
anchored to the 19 Populus LG assemblies. BAC contigs
aligned to the genome sequence represent 384 Mb (79%) of
the entire assembled poplar sequence and 295 Mb (96%) of
the LG assemblies. As shown in Table 2, all LG assemblies
with the exception of LG XIX have >90% coverage in the BAC
map, based on the contig alignments. A graphical display of
the contig alignment results for all 19 LGs is provided in
Figure S1. An example of this display, showing LG X, is
provided in Figure 3. In a small number of cases, contigs
were mapped to two genomic regions within a linkage group
(represented by internal arcs linking contigs in Figure 3 and
Figure S1). These could potentially indicate regions of
repetitive DNA or genome duplication. They may also arise
from fingerprints derived from mixed DNAs, resulting from
cross-well contamination in the library plates, which may
cause a mis-assembly in the fingerprint map (contigs joined
in error).
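A coverage percentage such as those in Table 2 requires the union of the aligned intervals rather than their sum: where several contigs align to the same sequence region, summing their lengths would count that region more than once. The sketch below assumes each aligned clone or contig is represented by a half-open interval of assembly coordinates; it illustrates the calculation and is not the authors' script.

```python
def covered_length(intervals):
    """Length of the union of half-open [start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)   # overlapping or abutting: extend the block
            continue
        if cur_end is not None:
            total += cur_end - cur_start  # close the previous block
        cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def percent_coverage(intervals, lg_length):
    """Percentage of a linkage-group assembly covered by the intervals."""
    return 100 * covered_length(intervals) / lg_length
```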
Notable in Figure 3 are the complex contig alignment patterns in many regions, with multiple contigs aligning to the same sequence region. These alignment patterns are also observed in the other LG alignments (Figure S1). These patterns suggest two possibilities. First, the genome sequence may be mis-assembled in these locations, for example by artifactual collapse of duplicated but physically distinct regions during the genome assembly process. However, the large-scale co-linearity of the genetic map and sequence assembly (Tuskan et al., 2006) suggests that mis-assembly at such a large scale is unlikely. A second possibility is that the clones in overlapping BAC contigs represent individual Nisqually-1 haplotypes that were assembled independently into distinct haplotype-specific contigs in the physical map because of haplotype-specific sequence diversity. This hypothesis was investigated further, as described below.
Table 2 Summary of integration of the sequence assembly linkage groups (LGs) with the physical map
LG | No. aligned map contigs | No. aligned BACs in contigs | Estimated % coverage of LG¹
¹The % coverage was calculated using the distances between aligned clones, obtained from the sequence assembly.
Figure 3. Fingerprinted BAC clone and contig layout on the LG X sequence assembly.
Clone placement is based on BES alignments to the genome sequence. The ideogram of LG X is drawn circularly (outermost ring), with 1 Mb spans colored in alternating black and white strips. The innermost histogram track (black) illustrates the depth of aligned BAC clone coverage, with each concentric circle representing a fivefold clone depth. The next outer histogram track (red) shows the coverage provided by aligned BAC map clones not assigned to contigs (singletons). The next track shows the extent of anchored contigs, coded with an alternating color scheme. The final track inside the ideogram circle shows the sequence alignment position of individual aligned clones in each contig, colored by map contig assignment. Fingerprint contigs aligning to two different regions of the sequence assembly are linked by arcs. Genetic markers (SSRs) derived from BESs are indicated by triangles on the sequence track. Green triangles indicate SSRs mapped in the Nisqually-1 pedigree, blue triangles indicate those mapped in another P. trichocarpa pedigree (family 13), pink triangles indicate those for which mapping failed, and gray triangles indicate SSRs that have not yet been tested, illustrating that a large number of well-spaced SSRs remain available for future integration of the genetic and physical maps. The outer track of triangles shows those SSRs used for preliminary mapping of contigs to LGs. The diagram was prepared using Circos (http://mkweb.bcgsc.ca/circos/).
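Candidate haplotype pairs of this kind can be flagged automatically by looking for different contigs whose alignment spans overlap on the same sequence. The sweep below is a minimal sketch; the span tuples, the `min_overlap` threshold and the example coordinates are hypothetical, with only the contig sizes chosen to echo contigs 8 and 846 discussed in the next section.

```python
from collections import defaultdict

def co_aligning_pairs(spans, min_overlap=1):
    """spans: (contig, scaffold, start, end) with half-open coordinates.
    Returns (earlier_contig, later_contig, scaffold, overlap_bp) tuples."""
    by_scaffold = defaultdict(list)
    for contig, scaffold, start, end in spans:
        by_scaffold[scaffold].append((start, end, contig))
    pairs = []
    for scaffold, items in by_scaffold.items():
        items.sort()
        active = []  # earlier spans that may still overlap later ones
        for start, end, contig in items:
            # Spans sorted by start: anything ending too early can be dropped for good.
            active = [a for a in active if a[1] - start >= min_overlap]
            for _a_start, a_end, a_contig in active:
                overlap = min(a_end, end) - start
                if overlap >= min_overlap and a_contig != contig:
                    pairs.append((a_contig, contig, scaffold, overlap))
            active.append((start, end, contig))
    return pairs

spans = [("ctg8", "LG_I", 0, 1_263_000),
         ("ctg846", "LG_I", 400_000, 505_000),
         ("ctg12", "LG_I", 1_300_000, 1_400_000)]
print(co_aligning_pairs(spans, min_overlap=10_000))
```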
Haplotype sequence diversity
To assess the possibility that distinct physical map contigs
aligning to the same locations in the sequence assembly
represented the two Nisqually-1 haplotypes, we selected
and sequenced eight BAC clones representing four examples of putative haplotype differences, based on fingerprint
pattern discrepancies and presence in contigs co-aligning to
the genome. The BAC clones represent genomic regions on
LGs I and XIV. We compared the sequences of the BACs in
detail to ascertain the nature of the differences and to identify nucleotide variations that could be genetically mapped
to test for allelism. We describe below representative results
for two of the four examples. Table S1 contains a summary
of the polymorphisms found in all four pairs of sequenced
BAC clones.
On LG I, contig 846, a small contig consisting of seven
clones, has sequence assembly coordinates spanning
105 kb of this LG (http://genome.jgi-psf.org/Poptr1/Poptr1.home.html). These coordinates are contained entirely within
the LG I alignment coordinates of contig 8, a larger contig
containing 127 clones and spanning 1263 kb (Figure 4a). The
contig 8 clones that lie within the region of co-alignment
share consistent HindIII fingerprint patterns, suggesting that
they represent a single haplotype. The corresponding BAC
clones in contig 846 have HindIII restriction patterns highly
similar to, but distinct from, those in contig 8, suggesting
that the clones in contig 846 potentially represent the
alternative haplotype for this region. The LG I sequence
assembly alignment coordinates for clone T0048O04 from
contig 846 overlap by 89 kb with the LG I alignment coordinates for clone T0068B19 from contig 8 (Figure 4a), which is
98% of the alignment length for T0068B19. However, the two
clones differ in >20 restriction fingerprint fragments (Figure 4b). Comparison of the clone sequences revealed they
were substantially co-linear, sharing a high degree of
identity (84%) in the overlap region, consisting of segments
of complete identity interspersed with localized differences
in the form of indels and nucleotide substitutions, some of
which affect HindIII recognition sites. Figure 5 illustrates the
differences between the sequences relative to the resulting
restriction fingerprint patterns of the two BAC clones. One
large (11 kb) and two small (1244 bp and 189 bp) indels were
identified, in addition to five nucleotide differences, that
impact HindIII recognition sequences such that these recognition sites are present in one clone but absent in the other.
As shown in Figure 5, these five SNPs, together with the
three indels noted above, account for the anomalous HindIII
fingerprint patterns that resulted in these BAC clones
assembling into distinct contigs. Additional analysis of the
sequences revealed a further 888 single nucleotide differences between the clones, and 217 smaller indels, ranging from 1 to 275 bp. In total, 14.8% of the clone overlap region contains alignment gaps. Sequence analysis of two additional BAC clone pairs identified within contig 8, and displaying restriction fragment differences also suggestive of haplotype differences, revealed a similar pattern of large regions of sequence identity interrupted by several indels ranging in size from 10 to 1000 bp and SNPs at HindIII recognition sites.
Figure 4. Illustration of two of the four putative haplotype-specific contigs analyzed.
(a) Schematic representation of the relationship between contigs 8 and 846 on LG I and contigs 160 and 162 on LG XIV. Sequence assembly coordinates are from version 1.0 of the P. trichocarpa genome assembly (Tuskan et al., 2006). The approximate locations of BAC clones T0068B19 (contig 8), T0048O04 (contig 846), T0021J18 (contig 160) and T0033M07 (contig 162) are shown.
(b) HindIII fingerprint images of the BAC clone pairs aligning to LGs I and XIV. Thin lines drawn through the DNA bands indicate restriction fragments identified by BandLeader software; red lines indicate fragments that are unique to each potential haplotype.
Figure 5. Depiction of the sequence differences between clone T0068B19 (contig 8) and clone T0048O04 (contig 846) affecting the HindIII fragment patterns.
Individual restriction fragments for each clone are represented by colored boxes, and numbered underneath from left to right using the sequence of clone T0068B19 as the reference. Matching fragments are represented in light green, and are assigned a consensus fragment number in black text. Unmatched fragments in T0048O04 are assigned alphabetical identifiers and colored differentially based on the nature of the sequence difference. The various types of sequence differences are classified in the key on the lower left. The region from fragments 10 to 15 of T0068B19 is expanded in the center for a more detailed depiction of the alignment.
A similar analysis was carried out on representative BAC
clones T0021J18 and T0033M07 from contigs 160 and 162,
respectively, two small contigs that have overlapping alignments on LG XIV (Figure 4a). Sequence comparison of these
two BAC clones revealed that they were co-linear, but
differed by 440 single nucleotide changes and 83 indels,
ranging in size from 1 to 129 bp. However, only 0.5% of the
clone overlap region contains alignment gaps. Two of the
SNPs and one 20 bp indel affect HindIII recognition sites.
The observed HindIII fingerprint pattern differences in the
clones correlated with the fragment sizes predicted by the
BAC sequences (data not shown).
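The link between sequence differences and fingerprint differences can be illustrated with a simple in silico digest. The sketch below is illustrative only and is not the software used in this study: it assumes a linear insert sequence (ignoring the vector), scans for the HindIII recognition sequence AAGCTT (cut between the first A and AGCTT), and reports fragment sizes found in one haplotype but not the other. The function names and the toy sequences are hypothetical.

```python
import re
from collections import Counter

HINDIII_SITE = "AAGCTT"  # palindromic, so scanning one strand finds every site
CUT_OFFSET = 1           # HindIII cuts A^AGCTT

def hindiii_fragments(seq: str) -> list[int]:
    """Fragment lengths (bp), left to right, from a complete digest of a linear sequence."""
    seq = seq.upper()
    # Zero-width lookahead; harmless here because AAGCTT cannot overlap itself.
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(f"(?={HINDIII_SITE})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]

def fragment_differences(seq_a: str, seq_b: str) -> tuple[list[int], list[int]]:
    """Fragment sizes present only in seq_a and only in seq_b (multiset difference)."""
    frags_a = Counter(hindiii_fragments(seq_a))
    frags_b = Counter(hindiii_fragments(seq_b))
    only_a = sorted((frags_a - frags_b).elements())
    only_b = sorted((frags_b - frags_a).elements())
    return only_a, only_b

# Toy example: a single G->C change destroys the second HindIII site.
hap1 = "GGGAAGCTTCCCCAAGCTTTT"
hap2 = "GGGAAGCTTCCCCAACCTTTT"
print(hindiii_fragments(hap1))           # [4, 10, 7]
print(hindiii_fragments(hap2))           # [4, 17]
print(fragment_differences(hap1, hap2))  # ([7, 10], [17])
```

Real fingerprint comparison matches bands by gel mobility within a tolerance rather than by exact size, so this exact-size comparison is only a simplification.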
The results of these BAC sequence comparisons were
compatible with haplotypic variation as the cause of the
observed restriction fingerprint pattern variation. To test the
hypothesis that LG I clones T0068B19 and T0048O04 represent two Nisqually-1 haplotypes, and thus that the contigs
into which they were placed are haplotype-specific, one SNP
site in a HindIII recognition sequence in T0068B19 was
mapped in parallel with the corresponding alternative SNP
from T0048O04 in family 545. As shown in Figure 6, the
alternative SNPs mapped to the same location on LG I, but in
reverse phase. This illustrates that they are alleles of the
same locus, consistent with the interpretation that BAC
clones T0068B19 and T0048O04 are derived from distinct
Nisqually-1 haplotypes. Similarly, genetic mapping in family
545 of putative alternative alleles at a HindIII SNP in clones
T0021J18 and T0033M07 showed that they map to the same
location on LG XIV. These results indicate that considerable
haplotype sequence variation exists in the Nisqually-1
genome, including numerous small to large indels and
SNPs, and that in regions where the variation sufficiently
perturbs the fingerprint patterns, this resulted in the creation
of haplotype-specific contigs during the map assembly
process, as indicated in Figure 3 and Figure S1.
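The kinds of haplotype differences counted above (single nucleotide differences, indels and the proportion of the overlap lying in alignment gaps) can be tallied from a gapped pairwise alignment. This is a minimal sketch, assuming the two clone sequences have already been aligned over their overlap with '-' marking gaps and that no column is gapped in both rows. Treating each run of consecutive gapped columns as one indel, and reporting gapped columns as a fraction of all alignment columns, are simplifying assumptions rather than the definitions used in the study.

```python
def summarize_alignment(row_a: str, row_b: str) -> dict:
    """Count mismatches, indel runs and gapped columns in a two-row alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    snps, gapped_columns, run = 0, 0, 0
    indel_sizes = []
    for x, y in zip(row_a.upper(), row_b.upper()):
        if x == "-" or y == "-":
            gapped_columns += 1
            run += 1              # extend the current indel
            continue
        if run:                   # an ungapped column closes the previous indel
            indel_sizes.append(run)
            run = 0
        if x != y:
            snps += 1             # substitution (single nucleotide difference)
    if run:
        indel_sizes.append(run)
    fraction = gapped_columns / len(row_a) if row_a else 0.0
    return {"snps": snps, "indel_sizes": indel_sizes, "gapped_fraction": fraction}

summary = summarize_alignment("ACGTACGT--ACGT", "ACGAACGTTTAC-T")
print(summary["snps"], summary["indel_sizes"], round(summary["gapped_fraction"], 3))
# 1 [2, 1] 0.214
```

A gap run that switches from one row to the other without an intervening ungapped column is counted here as a single indel.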
To examine the manner in which these haplotype-specific
sequences were assembled into the genome sequence,
which is derived from reads generated from both haplotypes, we compared the sequences of the BAC clones to
those of the corresponding region from the version 1.1
poplar genome assembly (http://genome.jgi-psf.org/
Poptr1_1/Poptr1_1.home.html). An example of this analysis
is shown in Table 3, which illustrates the sequence alignment of BAC T0068B19 (contig 8) to the corresponding
region of the assembly. The genome assembly in this region
is a mosaic of the sequences of T0068B19 and the corresponding alternative haplotype BAC T0048O04, as recognized
by diagnostic indels specific to each clone sequence.
Notably, the 11 kb region deleted in T0048O04 relative to
T0068B19 is also absent in the genome assembly. Analysis
of this haplotype-specific 11 kb sequence revealed numerous short open reading frames (ORFs), and two large ORFs
of 229 and 154 amino acids. However, no matches of the
ORFs to poplar ESTs were found, and none had significant
similarity to predicted proteins in the non-redundant
sequence database.
Sequence analysis of all four pairs of haplotype-specific
BAC clones showed that, in each case, the haplotype-specific
sequences in regions of clone overlap were co-linear, but
contained extensive indel polymorphisms (Figures 4 and 5,
Table 3 and Table S1). This led to alignment gaps affecting
between 0.5 and 14.8% of the overlapping sequences
(Table S1). Despite large-scale haplotype sequence co-linearity, these results raised the possibility that, in addition to
ORFs in the large 11 kb haplotype-specific sequence described above, gene content and/or order could be distinct in
the two haplotypes, as has been described for several
regions of the maize genome (Brunner et al., 2005; Song and
Messing, 2003; Wang and Dooner, 2006).
Figure 6. Genetic maps of LG I, showing mapping of contig 8/846 SNPs.
Positions of LG I genetic markers are shown on the right, with distances in
centimorgans (cM) on the left. The mapped locations of two putative
alternative SNP alleles (SNP2_1 and SNP2_2) polymorphic between clones
T0068B19 and T0048O04 are shown.
Table 3 Comparison of haplotype content in the poplar genome
assembly with respect to the haplotype-specific restriction fragments represented in BACs T0068B19 and T0048O04
HindIII restriction fragment size (bp)
Fragments shared in common between the two haplotypes are
indicated in green; fragments specific to the haplotype represented
by clone T0048O04 are indicated in blue; fragments specific to the
haplotype represented by clone T0068B19 are in red. The assembly
coordinates were determined using BES alignment coordinates for
clone T0068B19. The analysis indicates that the assembled sequence
in this region is a mosaic of the two haplotypes embodied in T0048O04
and T0068B19. Note that the 1403 bp fragment in the assembly,
labeled in grey, is a combination of the two haplotypes. It lacks the
HindIII restriction site present in T0068B19 (thus reflecting the
haplotype represented in T0048O04) but contains a 24 bp insertion
missing in T0048O04 (thus reflecting the haplotype represented in
T0068B19).
To determine whether indel polymorphisms could affect
gene content and/or order in a haplotype-specific manner,
we examined in detail the locations and potential effects of
31 haplotype-specific indels ranging in size from 19 to
1244 bp, across a total of >320 kb of sequence from each
haplotype, as summarized in Table 4. We first examined the
genome assembly within the regions represented by the
sequenced BAC clones (http://genome.jgi-psf.org/Poptr1_1/
Poptr1_1.home.html), and found a total of 25 annotated
genes (Table 4). When the locations of the selected indels
were mapped relative to the genome assembly and annotated genes, we found that in 30 of 31 cases, the indel fell
either within an intergenic region lacking an ORF (29 cases)
or within an intron (two cases). In one case, a 24 bp
haplotype-specific indel (absent in the genome assembly)
resulted in a predicted haplotype-specific eight amino acid
insertion in a predicted gene product (eugene3.00141429).
However, as this short gene appears to encode only a
fragment of an ammonium transporter protein, and lacks
EST expression support (http://genome.jgi-psf.org/Poptr1_1/
Poptr1_1.home.html), it is questionable whether it is functional. In summary, our data show that extensive haplotype-specific polymorphism exists in poplar, ranging from SNPs
to indels of variable size, up to >10 kb. However, haplotype-specific sequences were largely co-linear, and we found no
evidence for differences in gene content or order between
the two haplotypes in the regions we examined.
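Placing each indel as intergenic, intronic or coding amounts to an interval lookup against the annotated gene models. The sketch below is hypothetical: it assumes gene models given as lists of half-open coding-exon intervals in assembly coordinates and locates each indel by its start position only. It also shows the in-frame arithmetic behind the eight amino acid insertion (24 bp / 3 = 8 codons).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GeneModel:
    name: str
    exons: list[tuple[int, int]]  # half-open (start, end) coding-exon intervals

def classify_indel(position: int, genes: list[GeneModel]) -> str:
    """Label an indel start position as coding, intronic or intergenic."""
    for gene in genes:
        gene_start = min(start for start, _ in gene.exons)
        gene_end = max(end for _, end in gene.exons)
        if gene_start <= position < gene_end:
            if any(start <= position < end for start, end in gene.exons):
                return f"coding:{gene.name}"
            return f"intron:{gene.name}"
    return "intergenic"

def in_frame_residues(indel_length: int) -> Optional[int]:
    """Residues added or removed by a coding indel, or None if it shifts the frame."""
    return indel_length // 3 if indel_length % 3 == 0 else None

genes = [GeneModel("gene_A", [(100, 200), (300, 400)])]  # hypothetical gene model
print(classify_indel(50, genes))   # intergenic
print(classify_indel(250, genes))  # intron:gene_A
print(classify_indel(150, genes))  # coding:gene_A
print(in_frame_residues(24))       # 8
```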
The Populus trichocarpa genome represents that of an undomesticated and highly heterozygous plant species. We
have aligned the physical map contigs to the sequence
assembly, and integrated the physical and genetic maps
through SSR markers. This work illustrates the power of an
integrated approach to assembling a physical map that is
anchored to both a whole-genome assembly and a genetic
map. These represent complementary resources, each having the ability to inform the others, and their integration
provides added utility to the research community. For
example, the BESs provided an essential link between the
physical map and the genome sequence. Through these, the
physical map aided long-range contiguity of the sequence
assembly and resolution of repetitive regions (Tuskan et al.,
2006). In turn, alignment of the physical map contigs to the
linkage groups and sequence scaffolds provided information useful for improvement of the physical map assembly.
Table 4 Summary of haplotype-specific indels examined for effects on predicted gene content
BAC pair               Contig(s)
T0068B19 / T0048O04    Ctg 8 / Ctg 846
T0021J18 / T0033M07    Ctg 160 / Ctg 162
T0053A03 / T0011N15    Ctg 836
T0017N13 / T0065A01    Ctg 1158
Remaining columns: indel location; no. predicted genes; no. indels; indel size; indel effect on genes.
Sequenced haplotype-specific BAC clones, as described in text and Table S1.
BAC fingerprint contig number or numbers.
Overlap between the pair of haplotype-specific BAC clones.
Genes predicted in the corresponding region of the Populus genome assembly (http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html).
Size range of the indels investigated, in bp.
Number of genes with changes in structure or location that could be affected by an indel.
Eight amino acid insertion into predicted gene eugene3.00141429, annotated as an incomplete coding sequence and without EST expression
support (http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html).
Physical map coverage and genome representation
The sum of the estimated sizes of the BAC contigs is
approximately 577 Mb. This is approximately 20% larger
than the current estimated genome size of 485 Mb, derived
from the sequence assembly. The difference in the map-based estimate is probably due at least in part to unrecognized overlap between the ends of map contigs, as has been
reported in the soybean physical map study (Wu et al.,
2004), which would inflate the size estimate to some
extent. However, a large contributing factor to the difference in genome size estimates is the existence of haplotype-specific contigs in the BAC map. If we consider only a single
haplotype contig for each of the co-aligning contigs anchored to the genome sequence, then the overall genome
size estimate represented by the map is reduced to 478 Mb,
which is in very good agreement with the 485 Mb estimate
derived from the sequence assembly. However, it is also
possible, given the duplicated nature of the genome (Sterck
et al., 2005; Tuskan et al., 2006), that in some cases dispersed, duplicated sequences have been collapsed within
the sequence assembly. At present, 187 BAC contigs map to
multiple regions of the genome based on the BES alignments. The discrepancies between the physical map and the
sequence assembly will require further investigation to
elucidate the underlying reasons for the assembly differences, providing an excellent opportunity to understand the
structure of sequence duplications and the nature of haplotypic differences in this species.
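Correcting the map-based size for haplotype-specific contigs amounts to counting each group of co-aligning contigs once. This is a minimal sketch, assuming the groups of co-aligning contigs are already known and keeping the largest contig of each group (the text does not say which contig was retained). The contig names and sizes in the example are invented. The last line reproduces the size difference quoted above: (577 − 485)/485 ≈ 19%, i.e. approximately 20%.

```python
def map_size_mb(contig_sizes_mb: dict[str, float], haplotype_groups: list[set[str]]) -> float:
    """Sum contig sizes, counting each group of co-aligning (haplotype) contigs once."""
    grouped = set().union(*haplotype_groups)
    total = sum(size for name, size in contig_sizes_mb.items() if name not in grouped)
    for group in haplotype_groups:
        # Assumption: keep the largest contig of each co-aligning group.
        total += max(contig_sizes_mb[name] for name in group)
    return total

# Hypothetical contigs: ctg_a and ctg_b cover the same locus on different haplotypes.
sizes = {"ctg_a": 1.5, "ctg_b": 0.5, "ctg_c": 2.0}
print(map_size_mb(sizes, [{"ctg_a", "ctg_b"}]))  # 3.5
print(round(100 * (577 - 485) / 485, 1))        # 19.0 (% excess of map-based estimate)
```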
The physical map provides good coverage of the
sequence assembly. Approximately 384 Mb (79%) of the
entire poplar sequence assembly and 295 Mb (96%) of
linkage group assemblies were anchored to the physical
map (Table 2). This is probably an underestimate as it is
based only on contiguous regions of aligned BESs. It does
not take into account the presence of any BAC clones lacking
sequence alignments that flank the aligned regions within
contigs, or map contigs that could not be anchored to the
sequence assembly. The number of anchoring clones per
aligned contig averaged 10 and ranged from 2 to 138. These
contigs varied in size, with the majority being above 200 kb
and some extending to over 1 Mb (Figure 2). These larger
contigs are extremely useful for long-range sequence integration, providing a framework on which to orientate
sequence scaffolds. However, it is important to note that
the physical map was derived from a single BAC library
constructed by HindIII partial digestion, and thus would not
contain regions of the genome where HindIII sites are
separated by distances larger than can be typically cloned
into a BAC vector. Based on size analysis of P. trichocarpa
genomic DNA digested to completion with HindIII (data not
shown), we estimate that as much as 10% of the genome
could be missing from the BAC library, and thus the physical
map would lack coverage for these regions of the genome.
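The reasoning behind this coverage estimate can be sketched computationally: in a HindIII-based library, sequence lying in complete-digest fragments longer than the largest clonable insert cannot be captured. The sketch below is a simplified, hypothetical in silico analog of the gel-based size analysis described above, not the method used here. The fragment sizes and the maximum insert size in the example are invented for illustration.

```python
def unclonable_fraction(fragment_sizes_bp: list[int], max_insert_bp: int) -> float:
    """Fraction of total sequence in complete-digest fragments too large to clone."""
    total = sum(fragment_sizes_bp)
    too_large = sum(size for size in fragment_sizes_bp if size > max_insert_bp)
    return too_large / total if total else 0.0

# Toy complete-digest fragment sizes and a hypothetical insert-size ceiling.
fragments = [20_000] * 45 + [100_000]
print(unclonable_fraction(fragments, max_insert_bp=50_000))  # 0.1
```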
In comparison to other plant physical maps, the number
of contigs in the poplar map is still large and requires further
resolution. The physical map of rice contains 458 contigs
representing approximately 90% of the 430 Mb genome at
an estimated 20-fold coverage (Chen et al., 2002). For
Arabidopsis, a physical map consisting of 27 contigs
covered the majority of the 125 Mb genome (Mozo et al.,
1999). Rice and Arabidopsis have more homogenized
genomes due to domestication in the case of rice (Buckler
et al., 2001) and inbreeding in the case of Arabidopsis
(Bustamante et al., 2002). In contrast to this, the physical
mapping efforts in soybean are more comparable with those
in poplar, in terms of gross genome size and chromosome
number. The soybean genome is more complex than that of
rice or Arabidopsis, having a larger chromosome number
(soybean n = 20, poplar n = 19, rice n = 12, Arabidopsis
n = 5) and larger genome size (approximately 1115 Mb), and
being tetraploid (Wu et al., 2004). However, it is autogamous
and has been domesticated for approximately 3000 years.
These factors have combined to result in reduced haplotypic
diversity (Zhu et al., 2003). The soybean BAC- and BIBAC
(binary BAC)-based physical map consisted of 2905 contigs,
representing a 9.6-fold redundancy (Wu et al., 2004). However, this physical map did not include a sequence comparison, which was possible with the poplar genome and which
facilitated an approximate 12% decrease in the contig
number. Considering the reduced diversity within soybean,
the poplar map compares well, given the larger haplotypic
diversity inherent in poplar as an obligate outbreeding species.
Options for improving the P. trichocarpa physical map
include creation of additional BAC libraries using different
enzymes, as it has been shown that two-enzyme methods
outperform single-enzyme methods in simulations (Xu
et al., 2004), and optimization of library construction protocols to obtain BACs with larger inserts.
Haplotype sequence diversity and effect on gene content
An unanticipated outcome of the alignment of the physical
map with the poplar genome sequence was the detection of
haplotype-specific map contigs, which resulted from high
levels of haplotype variation in some regions of the genome.
Analysis of the sequence differences between haplotype-specific BAC clones suggests that haplotypes are characterized by numerous small to large indel polymorphisms, in
addition to SNPs, raising the possibility of differences in the
repertoire of genes between haplotypes. Such differences
would not be apparent from the genome assembly because,
as illustrated in Table 3, the genome assembly represents a
mosaic of the two haplotypes.
In maize, detailed analysis of haplotype-specific DNA
sequences has revealed striking examples of non-co-linearity in DNA sequence between haplotypes and, in some
cases, haplotype-specific gene complements. For example,
near the bz locus, Wang and Dooner (2006) observed
extensive DNA sequence non-co-linearity in eight maize
haplotypes as the result of massive differences in the
insertion sites and numbers of mobile DNA elements
surrounding and within eight genes in the region examined.
In another example, at a locus containing multiple z1C
genes, haplotype-specific differences in z1C gene number
and order were found in two haplotypes (Song and Messing,
2003). Finally, Brunner et al. (2005) observed extensive
breakdown of sequence co-linearity between two maize
haplotypes at four loci on different maize chromosomes.
This lack of co-linearity is largely caused by differential
insertion of long-terminal repeat retrotransposons in a
haplotype-specific manner, but, surprisingly, there were
also a number of haplotype-specific genes at three of the
four loci (i.e. genes that are present in one haplotype but
absent in the other).
Our data from the sequences of four pairs of BAC clones
representing over 320 kb of haplotype-specific DNA sequence in poplar allowed us to compare the extent and
consequences of haplotype-specific DNA polymorphisms in
this species. In contrast to maize, the pairs of poplar
haplotype sequences were largely co-linear, punctuated by
an assortment of small to large indels, indicating a lack of
large-scale sequence rearrangement in the haplotypes relative to each other. Our analysis of an 11 kb region specific
to one haplotype on LG I, but absent in the genome
assembly, failed to find support for expressed genes in this
region. Moreover, almost all haplotype-specific indels were
in intergenic regions where they have no impact on gene
content or order (Table 4). While small indels in coding
regions such as that observed in one gene in haplotype-specific
contigs 160 and 162 (Table 4) may be relatively common in
the poplar genome (Tuskan et al., 2006), many of the larger
indels appeared in gene-poor regions (data not shown).
These data do not exclude the possibility of haplotype-specific differences in gene content in poplar, but the fact
that no such differences were found in a total of 320 kb of
haplotype-specific sequence containing 25 annotated genes
at four loci suggests such differences may be relatively rare
or confined to certain regions of the poplar genome. Indeed,
extensive haplotype diversity, including haplotype-specific
differences in gene content, appears to be present on LG XIX
(T.-M. Yin, G.A. Tuskan and S.P. DiFazio, unpublished data),
an interesting LG with relatively poor genome sequence
assembly (Tuskan et al., 2006) and physical map coverage
(Figure S1). However, based on our sampling of the poplar
genome, the extensive and widespread haplotype-specific
genome organization and gene content found in maize do
not appear to be general phenomena in angiosperms, and
may be related to extremely active families of mobile
elements in that species. Further analysis of haplotype
variation in poplar and other plants will provide more
definitive data on whether the variation we observed in
poplar is more typical of angiosperms. The large numbers of
apparently haplotype-specific BAC contigs revealed by the
physical map present an opportunity for more detailed
analysis of the nature and functional consequences of
haplotype sequence diversity in Populus.
Integration of genetic and physical maps
A total of 122 SSRs developed from the BESs were mapped
onto the poplar genome assembly using the Nisqually-1
pedigree (Tuskan et al., 2006; T. Yin et al., unpublished). The
markers are spread across the linkage groups, and those
mapped initially were chosen to span each linkage group
relatively evenly (approximately 20–30 cM spacing) and also
to represent large physical map contigs. These SSRs were
used to map contigs onto genetic positions (Figure 1). Using
122 BES-derived SSR markers, a total of 22% of the LG
assemblies were covered by 119 contigs mapped to the
genetic map (Table 1). These markers enable a direct
association of genetic loci with physical map BAC clones.
They also provide a resource for comparative genomics in
the genus Populus and related species. Additional microsatellites and SNP markers are being designed based on
observed polymorphisms in the genome sequence, and
these will be used to complete the map-based genome
assembly (T. Yin et al., unpublished). Integration of the
physical and genetic maps, by mapping BES-derived SSRs
and genome assembly, also provides a genome-wide data
set of comparative genetic and physical distances across the
19 linkage groups, from which potential differences in
recombination rates can be inferred. These data will be
presented in detail elsewhere (T. Yin et al., unpublished).
Map utility
Together, the physical map, the sequence assembly
and the genetic markers provide a substantial collection of
resources for poplar research, and the current map already
provides a considerable resource for genomic
interrogation. The integration of physical map,
genetic map and genome sequence will be of use in detailed
studies on QTLs for traits of interest in tree biology, such as
wood quality, biomass production, responses to environmental cues, and responses to biotic and abiotic stresses
(Frewen et al., 2000; Tagu et al., 2005). Once QTLs of interest
have been mapped to intervals, use of the BES-derived
markers will allow identification of specific BACs in these
intervals, providing a source of cloned Populus genomic
DNA of known sequence location for functional studies on
selected candidate genes. As an example, we have used a
QTL map of poplar wood quality traits to identify markers in
regions of interest, and subsequently used the physical map
BAC clones to target these regions for further characterization (C.T. Kelleher et al., unpublished data).
The Nisqually-1 physical map BAC clones also provide a
reference point for Populus genome organization. The
genus Populus contains 29 species, distributed among six
sections (Eckenwalder, 1996), and many of these species
have unique ecological adaptations (Cronk, 2005). While all
species contain 19 chromosomes, the extent to which small-scale genome rearrangements or insertions/deletions occur
between species, perhaps contributing to changes in gene
complement and adaptation, is unknown. The Nisqually-1
physical map, combined with the genome sequence, will
serve as a reference for comparative studies on gene
synteny and genome structure within the genus, using
BAC libraries prepared from other Populus or related
species, by comparative BAC mapping and hybridization
strategies to the Nisqually-1 BAC contigs.
In summary, the physical map and other resources
available for poplar genomics should significantly aid the
advance of research into the biology of woody perennials,
and help establish poplar as a model system for tree
biology. The complete fingerprint map is available for
download in FPC format from the Genome Sciences Centre
website (http://www.bcgsc.bc.ca/lab/mapping/data). The
map may also be viewed using Internet Contig Explorer
(iCE) (Fjell et al., 2003), a Java-based application that allows
viewing of FPC-based maps (http://www.bcgsc.ca/ice/), and
copies of the BAC library containing all clones may be
obtained by contacting the corresponding author (CJD).
Experimental procedures
BAC clone fingerprinting and map construction
The BAC library was constructed from Nisqually-1 genomic DNA
partially digested with HindIII, and consisted of 48 384 BAC clones.
The procedure for BAC library construction has been described
elsewhere (Stirling et al., 2001). BAC clones were fingerprinted by
HindIII digestion and fragment separation on agarose gels (Marra
et al., 1997; McPherson et al., 2001; Schein et al., 2004). Restriction
fragment identification, fragment mobility and size determination
were performed automatically using BandLeader software (Fuhrmann et al., 2003). Automated fingerprint map assembly was performed using FPC version 5.0.1 (Soderlund et al., 1997, 2000), with
an initial assembly performed using default parameters and a
Sulston cut-off score of 1e-15 (Sulston et al., 1988).
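The Sulston score is the probability that the observed number of shared bands between two fingerprints would arise by chance, so lower values indicate stronger evidence of overlap. Below is a minimal sketch of the binomial formulation as it is usually described for FPC: with n_L ≤ n_H bands, tolerance t and gel length G, b = 2t/G, p = (1 − b)^n_H, and the score is the binomial tail for at least the observed number of matches. The tolerance and gel length in the example are hypothetical, not the parameters of this study.

```python
from math import comb

def sulston_score(n_bands_a: int, n_bands_b: int, n_shared: int,
                  tolerance: float, gel_length: float) -> float:
    """Chance probability of observing at least n_shared matching bands."""
    n_low, n_high = sorted((n_bands_a, n_bands_b))
    b = 2.0 * tolerance / gel_length  # chance one band lies within tolerance of a given band
    p = (1.0 - b) ** n_high           # chance a band of the smaller clone matches no band of the larger
    return sum(comb(n_low, j) * (1.0 - p) ** j * p ** (n_low - j)
               for j in range(n_shared, n_low + 1))

# Hypothetical fingerprints: 30 bands each, tolerance 7 units on a 3300-unit gel.
strong = sulston_score(30, 30, 25, tolerance=7, gel_length=3300)
weak = sulston_score(30, 30, 5, tolerance=7, gel_length=3300)
print(strong < 1e-15, weak < 1e-15)  # True False
```

With 25 of 30 bands shared the score falls far below the 1e-15 cut-off, whereas 5 shared bands are entirely consistent with chance.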
Additional processing of the map contigs was achieved by a
combination of manual review using tools within FPC and external
automated tools. CORAL (Flibotte et al., 2004) is an automated
application for improving clone order within FPC-assembled contigs, and was applied to contigs containing ≤10 clones. The majority
of contig merging was achieved by the use of automated scripts.
Multiple rounds of analysis were performed, with varying parameters used to identify valid merges. Clone fingerprint comparisons
were performed only between clones at the ends of contigs, or
between singleton clones (clones that did not assemble into
contigs) and clones at the ends of contigs. Fingerprint similarities
were first calculated using the Sulston score (Sulston et al., 1988),
and those falling below the set cut-off score for the round were
identified as candidate merges. The candidate merges were further
interrogated for the number of unconfirmed fragments across the
merge point, where an unconfirmed fragment is one that is present
in a clone at a contig end, but is not present elsewhere in that contig
end nor in the clones at the end of the candidate merging contig.
Those candidate merges not exceeding the allowed number of
unconfirmed fragments for the round were considered valid merges
and the contigs were joined. The Sulston score cut-off and the
permitted number of unconfirmed fragments were varied for each
round, with the parameters balanced in order to avoid promiscuous
merges. Sulston score cut-offs varied from 9e-10 to 9e-6, and
permitted unconfirmed fragments varied from 0 to 3, with the latter
parameter allowing for minor errors in BandLeader fragment
identification. A single exceptional round of merging was performed in which very small contigs were merged internally to larger
contigs only if all clones in the smaller contig matched a group of
neighboring clones in a larger contig, with a maximum Sulston
score cut-off of 9.99e-10 and a maximum of two unconfirmed
fragments. A copy of the BAC library containing all clones in the
physical map may be obtained by contacting the corresponding
author (CJD).
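The two tests applied to each candidate end-to-end merge (a Sulston score below the round's cut-off, and no more than the permitted number of unconfirmed fragments) can be sketched as follows. This is a hypothetical reimplementation, not the study's scripts. It assumes that fingerprints are lists of band mobilities, that bands match within a tolerance of 7 mobility units, and that unconfirmed fragments are counted at both contig ends.

```python
Fingerprint = list[float]  # band mobilities of one clone

def count_unconfirmed(end_a: list[Fingerprint], end_b: list[Fingerprint],
                      tol: float = 7.0) -> int:
    """Bands in an end clone matching nothing else at its own contig end or the other contig's end."""
    unconfirmed = 0
    for own_end, other_end in ((end_a, end_b), (end_b, end_a)):
        for i, clone in enumerate(own_end):
            pool = [band for j, c in enumerate(own_end) if j != i for band in c]
            pool += [band for c in other_end for band in c]
            unconfirmed += sum(1 for band in clone
                               if not any(abs(band - other) <= tol for other in pool))
    return unconfirmed

def is_valid_merge(sulston: float, unconfirmed: int,
                   cutoff: float, max_unconfirmed: int) -> bool:
    """Accept a merge only if both criteria for the current round are met."""
    return sulston < cutoff and unconfirmed <= max_unconfirmed

end_a = [[100.0, 200.0, 300.0], [100.0, 200.0]]
end_b = [[300.0, 400.0], [400.0]]
print(count_unconfirmed(end_a, end_b))        # 0
print(is_valid_merge(1e-12, 1, 9e-10, 3))     # True
```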
Contig size estimation
To estimate contig sizes based on fingerprint data, an automated
algorithm was used to compare the restriction fragments of overlapping clone pairs in the canonical clone set for each contig.
Canonical clones are the set of non-redundant overlapping clones
spanning a contig that each represent a unique complement of
restriction fragments in their fingerprint, such that the remaining
non-canonical clones within the contig are subsumed by the
canonical clones (i.e. all the restriction fragments in the fingerprint
of a non-canonical clone are completely represented in one of the
canonical clones). The unique fragments for each canonical clone
were identified, and their sizes were summed to estimate the overall
size of the contigs. Specifically, the algorithm performed the following procedure for each contig: (i) sum the sizes of all the fragments in the left-most canonical clone in the contig to create a
cumulative size estimate; (ii) identify the next canonical clone
immediately to the right and identify its unique fragments (any
fragments not shared with the two previous canonical clones to the
left or the next two canonical clones to the right), then add the sizes
of these unique fragments to the cumulative size estimate; (iii) repeat step (ii) until all unique fragments in the canonical clones have
been identified and summed to give a total size estimate for the
contig. Two fragments were considered the same if their calculated
standard mobilities were within 10 mobility units of each other.
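The contig size procedure can be written as a short function. This is a minimal sketch that follows the description above literally. It assumes the canonical clones are already ordered left to right and that each fragment is a (standard mobility, size in bp) pair. The names and example values are hypothetical.

```python
Fragment = tuple[float, int]  # (standard mobility, size in bp)

def estimate_contig_size(canonical: list[list[Fragment]], tol: float = 10.0) -> int:
    """Sum fragment sizes across left-to-right ordered canonical clones."""
    if not canonical:
        return 0
    # Step (i): every fragment of the left-most canonical clone.
    total = sum(size for _, size in canonical[0])
    # Steps (ii)-(iii): add fragments of each later clone that are not shared with
    # up to two canonical neighbours on the left or two on the right.
    for k in range(1, len(canonical)):
        neighbours = canonical[max(0, k - 2):k] + canonical[k + 1:k + 3]
        mobilities = [mob for clone in neighbours for mob, _ in clone]
        for mob, size in canonical[k]:
            if not any(abs(mob - other) <= tol for other in mobilities):
                total += size
    return total

# Hypothetical contig of three canonical clones.
contig = [
    [(100.0, 5000), (200.0, 3000)],
    [(204.0, 3000), (300.0, 2000)],
    [(296.0, 2000), (400.0, 1000)],
]
print(estimate_contig_size(contig))  # 9000
```

Under this literal reading, a fragment shared only by two adjacent non-left-most canonical clones (the 2000 bp fragment above) is counted at neither clone, so the sketch may underestimate contig size.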
Genetic and physical map integration
A genetic map of the Nisqually-1 pedigree was constructed through
the collaborative effort of Oak Ridge National Laboratory, Tennessee, USA, and the Treenomix group at the University of British
Columbia, Canada (T. Yin et al., unpublished). The pedigree was
family 545, an inter-specific F1 population obtained from a cross
between P. trichocarpa (Nisqually-1 as mother) and P. deltoides L.
(as father) (Stirling et al., 2001). The data obtained from the genetic
map were merged with another P. trichocarpa pedigree map, based
on family 13 (Yin et al., 2004). Young buds were collected and the
DNA was isolated using Qiagen DNeasy plant mini kits (http://www.
qiagen.com/). A total of 94 individuals were used for the AFLP
analysis and 87 for the SSR analysis. The markers from the genetic
map were used to integrate the physical map with the genetic map.
SSR markers primarily developed from the BESs (http://
www.ornl.gov/sci/ipgc/ssr_resource.htm) and those used in other
Populus mapping studies (Cervera et al., 2001; van der Schoot et al.,
2000; Yin et al., 2004) were used by both laboratories. At the
University of British Columbia, additional dominant AFLP markers
(Vos et al., 1995) were analyzed for the genetic mapping, and both
SSRs and AFLPs were visualized by addition of an M13 sequence on
the forward primer and subsequent addition of M13 IRD-labeled
primer (for details see Oetting et al., 1995).
For PCR amplification of SSR loci, 20 ng of total genomic DNA
was added to a 10 µl reaction volume of 1× Roche PCR buffer,
250 µM dNTPs, 0.2 µM forward and reverse primers, 0.05 µM M13
IRD-labeled primer and 1 U of AmpliTaq DNA polymerase (Roche;
http://www.rochecanada.com). Reactions were carried out on an
MJ Research PT-100 thermal cycler (http://www.bio-rad.com) with
the following program: 95°C for 4 min, followed by 30 cycles of 95°C
for 1 min, 60°C for 30 sec and 72°C for 1 min, then a final extension
for 4 min at 72°C. AFLP reactions involved a restriction digestion, a
pre-selective amplification and a final selective amplification step
(Vos et al., 1995). A 30 µl restriction–ligation reaction was incubated
at 37°C for 4 h. The reaction contained 250 ng total genomic DNA in
a reaction mix of buffer (giving final concentrations of 10 mM Tris-HCl,
10 mM MgAc, 50 mM KAc and 5 mM DTT), 12 U EcoRI or PstI
and 8 U Tru9I, 2.5 pmol EcoRI/PstI adaptor, 25 pmol Tru9I adaptor,
0.15 mM ATP and 0.25 U of T4 DNA ligase (Invitrogen; http://
www.invitrogen.com/). The restriction–ligation reactions were diluted 1:10, and 5 µl of this reaction was used in a pre-selective PCR
amplification together with 1× Roche PCR buffer, 200 µM dNTPs,
0.15 µM EcoRI/PstI/Tru9I pre-selective primers and 1 U of AmpliTaq
DNA polymerase (Roche). PCR amplification was performed in an
MJ Research PT-100 thermal cycler with the following cycles: 94°C
for 1 min, followed by 30 cycles of 94°C for 30 sec, 65°C for 30 sec,
72°C for 1 min, and a final extension for 5 min at 72°C. Pre-selective
PCR product was diluted 1:40, and 5 µl were used in a 20 µl reaction
comprising 1× Roche PCR buffer, 400 µM dNTPs, 0.3 µM EcoRI/PstI/
Tru9I selective primers and 1 U AmpliTaq DNA polymerase (Roche).
Selective amplification reactions were the same as for the pre-selective amplification, except that M13 IRD-labeled primer was
added to a concentration of 0.05 µM and a drop-down annealing
temperature was used, starting at 65°C and decreasing by 0.7°C for
each cycle until a final set of 22 cycles at an annealing temperature
of 56°C.
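The drop-down (touchdown) annealing program used for selective amplification can be tabulated cycle by cycle. This is a minimal sketch, assuming the touchdown continues only while the annealing temperature stays above the final 56°C, because the text does not give the number of touchdown cycles. Temperatures are handled in tenths of a degree to avoid floating-point drift.

```python
def touchdown_schedule(start_c: float = 65.0, step_c: float = 0.7,
                       final_c: float = 56.0, final_cycles: int = 22) -> list[float]:
    """Annealing temperature (degrees C) for each cycle of a drop-down program."""
    # Work in tenths of a degree so repeated subtraction stays exact.
    start, step, final = round(start_c * 10), round(step_c * 10), round(final_c * 10)
    temps = []
    t = start
    while t > final:  # assumption: touchdown steps only while above the final temperature
        temps.append(t / 10)
        t -= step
    temps.extend([final / 10] * final_cycles)
    return temps

schedule = touchdown_schedule()
print(len(schedule), schedule[0], schedule[12], schedule[13])  # 35 65.0 56.6 56.0
```

Under this assumption the program has 13 touchdown cycles (65.0 to 56.6°C) followed by 22 cycles at 56°C.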
SSRs and AFLPs were analyzed on LI-COR 4300 DNA analyzers
(http://www.licor.com) with 6% polyacrylamide gels, and on an
ABI3730 capillary sequencer (Applied Biosystems; http://www.
appliedbiosystems.com). Images were processed using SagaMX
AFLP, SagaGT (LI-COR Biosciences; http://www.licor.com) and
analyzed using JoinMap (Van Ooijen and Voorrips, 2001) and
MAPMAKER to determine linkage groups (Lander et al., 1987).
Details on the genetic mapping analysis will be presented elsewhere (T. Yin et al., unpublished). MapChart 2.1 was used to draw
linkage group diagrams (Voorrips, 2002).
A total of 122 BES-derived SSRs were used to position physical map
contigs on the genetic map. In addition, the remaining SSRs proved
useful for integrating other LG in silico onto the physical map, using
BLAST hits of the SSR primer sequences against the BESs. BLAST hits
were retained if they had e-values below 1.0 and sizes between 50 and
300 bp. Hits that passed these criteria were used to illustrate the
integration of the physical and genetic maps.
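The BLAST screen for SSR primers can be sketched as a filter over primer hits. This is a minimal Python illustration built on stated assumptions. The `PrimerHit` record and the `place_ssrs` function are hypothetical. The text does not spell out what the 50–300 bp "size" criterion measures, so the sketch reads it as the span between a forward and a reverse primer hit on the same BES, i.e. the predicted product length.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PrimerHit:
    """One BLAST hit of an SSR primer against a BAC end sequence (BES).

    Hypothetical record layout for this sketch, not BLAST's output format.
    """
    ssr: str       # SSR marker name
    primer: str    # "F" (forward) or "R" (reverse)
    bes: str       # BES read identifier
    start: int     # alignment start on the BES (1-based)
    end: int       # alignment end on the BES (1-based)
    strand: int    # +1 or -1 relative to the BES
    evalue: float


def place_ssrs(hits, max_evalue=1.0, min_size=50, max_size=300):
    """Return {ssr: [(bes, size), ...]} for primer pairs passing the screen.

    Assumption: 'size' is the span covered by a forward and a reverse
    primer hit on the same BES (the predicted product length).
    """
    groups = {}
    for h in hits:
        if h.evalue < max_evalue:  # e-value screen from the text
            groups.setdefault((h.ssr, h.bes, h.primer), []).append(h)

    placed = {}
    for (ssr, bes, primer), fwd_hits in groups.items():
        if primer != "F":
            continue
        rev_hits = groups.get((ssr, bes, "R"), [])
        for f, r in product(fwd_hits, rev_hits):
            if f.strand == r.strand:
                continue  # a usable primer pair anneals to opposite strands
            plus, minus = (f, r) if f.strand == 1 else (r, f)
            if min(plus.start, plus.end) > min(minus.start, minus.end):
                continue  # primers point away from each other
            lo = min(f.start, f.end, r.start, r.end)
            hi = max(f.start, f.end, r.start, r.end)
            size = hi - lo + 1
            if min_size <= size <= max_size:  # size screen from the text
                placed.setdefault(ssr, []).append((bes, size))
    return placed
```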
BAC end sequencing and alignment of map contigs to the sequence assembly
The BAC DNA isolated for fingerprinting was also used to generate
end sequence data for the clones. The protocol for BAC end
ª 2007 The Authors
Journal compilation ª 2007 Blackwell Publishing Ltd, The Plant Journal, (2007), 50, 1063–1078
sequencing reactions was provided by S. Zhao of the Institute for
Genomic Research (Rockville, MD, USA). The primers used were
−21 M13 forward (TGTAAAACGACGGCCAGT) and M13 reverse.
The data were collected on ABI Prism 3700 DNA analyzer
sequencing instruments. The trace data were processed by the
program phred (Ewing and Green, 1998; Ewing et al., 1998) with
default parameters, and the sequences were trimmed to remove
low-quality and vector sequence. Reads that contained fewer than 15 bp
of sequence after processing were removed from the data set. Average
read lengths were calculated from the quality length reported by phred
for each read. The BES traces are available from the trace archives at the
National Center for Biotechnology Information (NCBI) (Ti numbers
1439871865–1439912628 and 1439111083–1439151202, or query with
species_code = 'POPULUS TRICHOCARPA').
Comparisons of BESs to the whole-genome shotgun assembly
scaffolds (JGI Populus trichocarpa genome assembly, version 1.0)
were performed using BLAST (Altschul et al., 1990). Alignments
satisfying either (a) >99% identity and an e-value < 10⁻⁵⁰, or
(b) >95% identity over >95% of the read length with an alignment
length >50 bp, were used to anchor fingerprint
map contigs to the sequence assembly. Where alignments for both
end sequences of a clone were available, the paired reads were
required to have alignments with opposite orientation. Groups of
two or more clones with overlapping end sequence alignments
were used to map the contigs to the sequence. In cases where
contigs mapped to multiple sequence regions, these were filtered as
follows. The region with the most clones aligning to it was accepted.
The region with the next largest number of alignments was required
to have at least three aligned clones, the next four, and so on. In this
way, some small contig loci that passed the initial two-clone
minimum were rejected because the presence of larger loci resulted
in an increased minimum clone cut-off.
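The anchoring criteria, the orientation check and the rising clone cut-off can each be written as a small predicate. The Python sketch below is illustrative only. `BesAlignment`, `passes_anchor_criteria`, `ends_opposite` and `filter_contig_loci` are hypothetical names. The phrase ">95% identity for >95% of the read length" is interpreted here as an alignment covering more than 95% of the trimmed read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BesAlignment:
    """Best BLAST alignment of one BAC end sequence (BES) to the assembly.

    Field names are illustrative only.
    """
    identity: float   # percent identity (0-100)
    evalue: float
    align_len: int    # aligned length (bp)
    read_len: int     # trimmed read length (bp)
    strand: int       # +1 or -1 on the scaffold


def passes_anchor_criteria(a: BesAlignment) -> bool:
    """(a) >99% identity and e-value < 1e-50, or
    (b) >95% identity over >95% of the read, with an alignment >50 bp.
    'Over >95% of the read' is taken to mean align_len > 0.95 * read_len."""
    crit_a = a.identity > 99.0 and a.evalue < 1e-50
    crit_b = (a.identity > 95.0
              and a.align_len > 0.95 * a.read_len
              and a.align_len > 50)
    return crit_a or crit_b


def ends_opposite(fwd: BesAlignment, rev: BesAlignment) -> bool:
    """Paired end reads of one clone must align in opposite orientations."""
    return fwd.strand == -rev.strand


def filter_contig_loci(clone_counts: dict[str, int], base_min: int = 2) -> list[str]:
    """Apply the rising clone cut-off to the candidate loci of one contig.

    clone_counts maps a sequence region to the number of the contig's
    clones aligned there. The largest locus needs base_min clones (the
    initial two-clone minimum), the next base_min + 1, then base_min + 2...
    """
    ranked = sorted(clone_counts.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    for rank, (region, n_clones) in enumerate(ranked):
        if n_clones < base_min + rank:
            break  # counts never rise while the cut-off does, so stop here
        kept.append(region)
    return kept
```

For example, take a contig with 10, 3 and 2 clones at three regions. The first two regions are kept, because the second meets the three-clone cut-off. The third region is rejected, because it falls short of four clones.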
BAC insert sequencing, assembly and analysis
BAC clones T0021J18 and T0033M07 were sequenced using a random in vitro transposon insertion approach. BAC DNA was prepared as per the fingerprinting protocol described by Schein et al.
(2004), and transposon libraries were generated using the Template
Generation System I Kit (Finnzymes; http://www.finnzymes.fr),
following the manufacturer’s recommended protocol for BAC
clones, and the Kan(R) Entranceposon (Finnzymes). BACs with
inserted transposons were cultured and DNA-purified in a 96-well
format (Schein et al., 2004). Sequencing reactions were assembled
in 384-well clear optical reaction plates (Applied Biosystems; http://
www.appliedbiosystems.com/) using a Biomek FX workstation
(Beckman-Coulter; http://www.beckmancoulter.com) (Yang et al.,
2005). Each 8 µl reaction (total volume) contained 5 µl of purified BAC
DNA, 0.7 µl of sequencing primer (5 pmol/µl; Invitrogen), 0.3 µl of
Ultrapure water (Gibco; http://www.invitrogen.com) and 2 µl of BigDye
v3.1 ready reaction mix (Applied Biosystems). Transposed BAC clones were
sequenced using primers SeqA2 (5′-GAATTCTCTAGATGATCAGCGGC-3′) and
SeqB2 (5′-CGAACTTTATTCGGTCGAAAAGG-3′). Cycling was performed on PTC-225
thermal cyclers (MJ Research) with the following parameters: 95°C for
2 sec, followed by 85 cycles of 96°C for 30 sec, 56.6°C (SeqA2 primer)
or 56.0°C (SeqB2 primer) for 5 sec, and 60°C for 3 min, followed by
incubation at 4°C. Reaction products were precipitated by adding 2 µl of
125 mM EDTA (pH 8) and 18 µl of 95% ethanol per well, followed by
centrifugation at 2750 × g for 30 min at 4°C. The supernatant was
decanted by inverting the plate and firmly shaking the liquid from the
wells. Plates were left to air-dry for 15 min. Samples in each well
were then resuspended in 10 µl of Ultrapure water and analyzed
using a 3730XL DNA analyzer (Applied Biosystems).
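The reaction recipe can be checked with simple arithmetic. The Python sketch below confirms that the listed components add up to the stated 8 µl. It also derives three quantities the text does not give: the amount of primer per well (0.7 µl at 5 pmol/µl), and the approximate ethanol and EDTA concentrations during precipitation. These derived figures assume that volumes simply add, which is only approximately true for ethanol–water mixtures.

```python
# Per-well volumes (µl) for the transposon-directed sequencing reactions,
# taken from the text above.
reaction_ul = {
    "purified BAC DNA": 5.0,
    "sequencing primer (5 pmol/µl)": 0.7,
    "Ultrapure water": 0.3,
    "BigDye v3.1 ready reaction mix": 2.0,
}
total_ul = sum(reaction_ul.values())          # should equal the stated 8 µl
primer_pmol = reaction_ul["sequencing primer (5 pmol/µl)"] * 5.0

# Precipitation additions per well (µl), also from the text.
edta_ul, edta_mM = 2.0, 125.0
ethanol_ul, ethanol_fraction = 18.0, 0.95
well_ul = total_ul + edta_ul + ethanol_ul     # assumes volumes simply add
final_ethanol_pct = 100 * ethanol_ul * ethanol_fraction / well_ul
final_edta_mM = edta_mM * edta_ul / well_ul

if __name__ == "__main__":
    print(f"reaction volume: {total_ul:.1f} µl, primer per well: {primer_pmol:.1f} pmol")
    print(f"during precipitation: ~{final_ethanol_pct:.0f}% ethanol, ~{final_edta_mM:.1f} mM EDTA")
```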
Transposon-directed sequenced reads were base-called using
phred (Ewing and Green, 1998; Ewing et al., 1998). The base-called
reads were imported and checked for contamination against
Escherichia coli, vector and transposon sequences. BAC vector
and inserted Mu transposon sequences were removed, and the
remaining sequences were assembled together using PHRAP
(http://www.phrap.org/). After initial assemblies, CONSED (Gordon
et al., 1998) was used to view the data to check for any possible
errors and to make appropriate corrections. CONSED navigation
tools were used to check for low-quality consensus sequence (phred
quality below 30) and for high-quality discrepancies (mismatches
between individual reads involving bases of phred quality 20 or
above) in the assembled reads. All repeat regions were assembled
manually using single-base-pair mismatches and read-pair information.
Mononucleotide and dinucleotide runs were resolved by making 4–5 bp
overlapping joins between the read pairs. Where read-pair information
was insufficient to establish a tiling path across a run, the contigs
were joined using the minimum appropriate joins. All finished
assemblies were re-examined, and any misplaced high-quality read pairs
that were found were corrected. Final confirmation of the finished
assemblies was made by comparing their in silico HindIII digests with
the respective experimental restriction digests; any deviation was
examined and corrected manually.
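Both CONSED checks rest on the phred scale. A quality Q corresponds to an error probability of 10^(−Q/10), so Q20 means a 1% chance of a wrong base call and Q30 a 0.1% chance. The Python sketch below applies the two checks to toy data. It is not CONSED's implementation, and the function names and the per-column `(base, quality)` layout are hypothetical.

```python
import math


def phred_error_prob(q: float) -> float:
    """Phred quality Q encodes an error probability P = 10 ** (-Q / 10)."""
    return math.pow(10.0, -q / 10.0)


def low_quality_positions(consensus_quals, threshold=30):
    """Indices of consensus bases whose phred quality is below the threshold."""
    return [i for i, q in enumerate(consensus_quals) if q < threshold]


def high_quality_discrepancies(columns, min_q=20):
    """Indices of alignment columns where reads disagree at high quality.

    Each column is a list of (base, phred_quality) calls from the reads that
    cover one consensus position (a hypothetical layout; CONSED keeps its
    own internal representation). A column is flagged when at least two
    different bases are called with quality >= min_q.
    """
    flagged = []
    for pos, calls in enumerate(columns):
        hq_bases = {base for base, q in calls if q >= min_q}
        if len(hq_bases) > 1:
            flagged.append(pos)
    return flagged
```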
To correlate the sequence differences with the resulting fingerprint
differences, each pair of BAC sequences was first digested in silico at
HindIII recognition sites, and the resulting fragments were mapped to
the corresponding fragments in their experimental fingerprints using a
size tolerance of 10 bp. The two clone sequences were then aligned
against each other and against the corresponding region of the genome
assembly (based on BES alignments) using Dotter (Sonnhammer and
Durbin, 1995). The sources of sequence variation were identified and
checked for their presence in the corresponding region of the genome
assembly. Potential effects of haplotype-specific indel polymorphisms
on gene order and content were investigated by comparing indels
and surrounding sequences to the poplar genome assembly and
annotation (http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html).
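The comparison between sequence and fingerprint can be sketched in two steps. First, the sequence is digested in silico with HindIII. Second, the resulting fragment sizes are matched against the observed sizes using the 10 bp tolerance given in the text. In this minimal Python sketch the function names are hypothetical and matching is a simple greedy pass over sorted sizes. Real fingerprints also contain vector-derived bands and co-migrating fragments, which the sketch ignores.

```python
import re

HINDIII_SITE = "AAGCTT"  # HindIII recognition site; the enzyme cuts A^AGCTT
CUT_OFFSET = 1           # cut position relative to the start of the site


def hindiii_fragments(seq: str) -> list[int]:
    """Fragment lengths (bp) of a linear sequence digested in silico with HindIII.

    The site is palindromic, so scanning one strand finds every site.
    """
    seq = seq.upper()
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(HINDIII_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def match_fragments(predicted, observed, tol=10):
    """Pair predicted and observed fragment sizes that agree within +/- tol bp.

    Walks both sorted lists once (a greedy two-pointer match chosen for this
    sketch). Returns (matches, unmatched_predicted, unmatched_observed).
    """
    pred, obs = sorted(predicted), sorted(observed)
    matches, extra_pred, extra_obs = [], [], []
    i = j = 0
    while i < len(pred) and j < len(obs):
        if abs(pred[i] - obs[j]) <= tol:
            matches.append((pred[i], obs[j]))
            i += 1
            j += 1
        elif pred[i] < obs[j]:
            extra_pred.append(pred[i])
            i += 1
        else:
            extra_obs.append(obs[j])
            j += 1
    extra_pred.extend(pred[i:])
    extra_obs.extend(obs[j:])
    return matches, extra_pred, extra_obs
```

Unmatched sizes on either side are the candidates that would then be examined at the sequence level with Dotter.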
This project was supported by Genome Canada, Genome British
Columbia and the Province of British Columbia (Treenomix project)
with funds to C.J.D., B.E.E., J.B. and K.R., and by a Natural Sciences
and Engineering Research Council of Canada (NSERC) Discovery
Grant to C.J.D. We thank Scott Paper Inc. for maintenance of the
family 545 material used for the Nisqually-1 genetic map. Funding
for the Oak Ridge National Laboratory portion of this research was
provided by the US Department of Energy, Office of Science,
Biological and Environmental Research Carbon Sequestration
Program, the Basic Energy Sciences Program, and National Science
Foundation grant 0421743 to G.A.T. Oak Ridge National Laboratory
is managed by UT-Battelle, for the US Department of Energy under
contract number DE-AC05-00OR22725. M.A.M., S.J.M.J. and R.A.H.
are scholars of the Michael Smith Foundation for Health Research;
J.B. is a Steacie Fellow of the Natural Sciences and Engineering
Research Council of Canada.
Supplementary Material
The following supplementary material is available for this article:
Figure S1. Fingerprinted BAC clone and contig layout on the
sequence assemblies of each of the 19 Populus trichocarpa linkage
groups (LG), based on BES alignments to the genome sequence.
Table S1. Summary of haplotype-specific DNA polymorphisms
based on BAC sequence comparisons.
This material is available as part of the online article from http://
Abbott, R.J. and Gomes, M.F. (1989) Population genetic-structure
and outcrossing rate of Arabidopsis thaliana (L) Heynh. Heredity,
62, 411–418.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J.
(1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–
Arimura, G., Huber, D.P.W. and Bohlmann, J. (2004) Forest tent
caterpillars (Malacosoma disstria) induce local and systemic
diurnal emissions of terpenoid volatiles in hybrid poplar (Populus trichocarpa × deltoides): cDNA cloning, functional characterization, and patterns of gene expression of (−)-germacrene D
synthase, PtdTPS1. Plant J. 37, 603–616.
Braatne, J.H., Rood, S.B. and Heilman, P.E. (1996) Life history,
ecology, and conservation of riparian cottonwoods in North
America. In Biology of Populus and its Implications for Management and Conservation (Stettler, R.F., Bradshaw, Jr, H.D., Heilman, P.E. and Hinckley, T.M., eds). Ottawa, Canada: NRC
Research Press, pp. 57–80.
Brunner, S., Fengler, K., Morgante, M., Tingey, S. and Rafalski, A.
(2005) Evolution of DNA sequence nonhomologies among maize
inbreds. Plant Cell, 17, 343–360.
Buckler, E.S., Thornsberry, J.M. and Kresovich, S. (2001) Molecular
diversity, structure and domestication of grasses. Genet. Res. 77,
Burns, R.M. and Honkala, B.H. (1990) Silvics of North America:
1. Conifers; 2. Hardwoods. Agriculture Handbook 654.
Washington DC: US Department of Agriculture, Forest
Bustamante, C.D., Nielsen, R., Sawyer, S.A., Olsen, K.M.,
Purugganan, M.D. and Hartl, D.L. (2002) The cost of inbreeding in
Arabidopsis. Nature, 416, 531–534.
Cervera, M., Storme, V., Ivens, B., Gusmao, J., Liu, B., Hostyn, V.,
Van Slycken, J., Van Montagu, M. and Boerjan, W. (2001) Dense
genetic linkage maps of three Populus species (Populus deltoides, P. nigra and P. trichocarpa) based on AFLP and microsatellite markers. Genetics, 158, 787–809.
Chen, M.S., Presting, G., Barbazuk, W.B. et al. (2002) An integrated
physical and genetic map of the rice genome. Plant Cell, 14, 537–
Cronk, Q.C.B. (2005) Plant eco-devo: the potential of poplar as a
model organism. New Phytol. 166, 39–48.
Eckenwalder, J.E. (1996) Systematics and evolution of Populus. In
Biology of Populus and its Implications for Management and
Conservation (Stettler, R.F., Bradshaw, Jr, H.D., Heilman, P.E. and
Hinckley, T.M., eds). Ottawa, Canada: NRC Research Press, pp. 7–
Ewing, B. and Green, P. (1998) Base-calling of automated sequencer
traces using phred. II. Error probabilities. Genome Res. 8, 186–
Ewing, B., Hillier, L., Wendl, M.C. and Green, P. (1998) Base-calling
of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185.
Fang, Z., Cone, K., Sanchez-Villeda, H. et al. (2003) iMap: a database-driven utility to integrate and access the genetic and physical maps of maize. Bioinformatics, 19, 2105–2111.
Farmer, J. R. E. (1996) The genecology of Populus. In Biology of
Populus and its Implications for Management and Conservation
(Stettler, R.F., Bradshaw, Jr, H.D., Heilman, P.E. and Hinckley,
T.M., eds). Ottawa, Canada: NRC Research Press, pp. 33–50.
Fjell, C.D., Bosdet, I., Schein, J.E., Jones, S.J.M. and Marra, M.A.
(2003) Internet Contig Explorer (iCE) – a tool for visualizing clone
fingerprint maps. Genome Res. 13, 1244–1249.
Flibotte, S., Chiu, R., Fjell, C., Krzywinski, M., Schein, J.E., Shin, H.
and Marra, M.A. (2004) Automated ordering of fingerprinted
clones. Bioinformatics, 20, 1264–1271.
Frewen, B.E., Chen, T.H.H., Howe, G.T., Davis, J., Rohde, A.,
Boerjan, W. and Bradshaw, H.D. (2000) Quantitative trait loci and
candidate gene mapping of bud set and bud flush in Populus.
Genetics, 154, 837–845.
Fuhrmann, D.R., Krzywinski, M.I., Chiu, R. et al. (2003) Software for
automated analysis of DNA fingerprinting gels. Genome Res. 13,
Gordon, D., Abajian, C. and Green, P. (1998) Consed: a graphical tool
for sequence finishing. Genome Res. 8, 195–202.
Gregory, S.G., Sekhon, M., Schein, J. et al. (2002) A physical map of
the mouse genome. Nature, 418, 743–750.
Krzywinski, M., Wallis, J., Gosele, C. et al. (2004) Integrated and
sequence-ordered BAC and YAC-based physical maps for the rat
genome. Genome Res. 14, 766–779.
Lander, E., Green, P., Abrahamson, J., Barlow, A., Daly, M., Lincoln,
S. and Newburg, L. (1987) MAPMAKER: an interactive computer
package for constructing primary genetic linkage maps of
experimental and natural populations. Genomics, 1, 174–181.
Marra, M.A., Kucaba, T.A., Dietrich, N.L., Green, E.D., Brownstein,
B., Wilson, R.K., McDonald, K.M., Hillier, L.W., McPherson, J.D.
and Waterston, R.H. (1997) High throughput fingerprint analysis
of large-insert clones. Genome Res. 7, 1072–1084.
McPherson, J.D., Marra, M., Hillier, L. et al. (2001) A physical map of
the human genome. Nature, 409, 934–941.
Mozo, T., Dewar, K., Dunn, P. et al. (1999) A complete BAC-based
physical map of the Arabidopsis thaliana genome. Nat. Genet. 22,
Nelson, W.M., Bharti, A.K., Butler, E., Wei, F., Fuks, G., Kim, H.,
Wing, R.A., Messing, J. and Soderlund, C. (2005) Whole-genome
validation of high-information-content fingerprinting. Plant
Physiol. 139, 27–38.
Oetting, W.S., Lee, H.K., Flanders, D.J., Wiesner, G.L., Sellers, T.A.
and King, R.A. (1995) Linkage analysis with multiplexed short
tandem repeat polymorphisms using infrared fluorescence and
M13 tailed primers. Genomics, 30, 450–458.
Plomion, C., Leprovost, G. and Stokes, A. (2001) Wood formation in
trees. Plant Physiol. 127, 1513–1523.
Ralph, S., Oddy, C., Cooper, D. et al. (2006) Genomics of hybrid
poplar (Populus trichocarpa × deltoides) interacting with forest
tent caterpillars (Malacosoma disstria): normalized and full-length cDNA libraries, expressed sequence tags, and a cDNA
microarray for the study of insect-induced defences in poplar.
Mol. Ecol. 15, 1275–1297.
Rieseberg, L.H., Whitton, J. and Gardner, K. (1999) Hybrid zones and
the genetic architecture of a barrier to gene flow between two
sunflower species. Genetics, 152, 713–727.
Schein, J., Kucaba, T., Sekhon, M., Smailus, D., Waterston, R. and
Marra, M. (2004) High-throughput BAC fingerprinting. In Bacterial
Artificial Chromosomes. Volume 1: Library Construction, Physical
Mapping, and Sequencing (Zhao, S. and Stodolsky, M., eds).
Totowa, NJ: Humana Press, pp. 143–156.
van der Schoot, J., Pospiskova, M., Vosman, B. and Smulders, M.
(2000) Development and characterization of microsatellite markers in black poplar (Populus nigra L.). Theor. Appl. Genet. 101,
Schrader, J., Nilsson, J., Mellerowicz, E., Berglund, A., Nilsson, P.,
Hertzberg, M. and Sandberg, G. (2004) A high-resolution transcript profile across the wood-forming meristem of poplar identifies potential regulators of cambial stem cell identity. Plant Cell,
16, 2278–2292.
Soderlund, C., Longden, I. and Mott, R. (1997) FPC: a system for
building contigs from restriction fingerprinted clones. Comput.
Appl. Biosci. 13, 523–535.
Soderlund, C., Humphray, S., Dunham, A. and French, L. (2000)
Contigs built with fingerprints, markers, and FPC V4.7. Genome
Res. 10, 1772–1787.
Song, R. and Messing, J. (2003) Gene expression of a gene family in
maize based on noncollinear haplotypes. Proc. Natl Acad. Sci.
USA, 100, 9055–9060.
Sonnhammer, E.L.L. and Durbin, R. (1995) A dot-matrix program
with dynamic threshold control suited for genomic DNA and
protein sequence analysis. Gene, 167, GC1–10.
Sterck, L., Rombauts, S., Jansson, S., Sterky, F., Rouze, P. and Van
de Peer, Y. (2005) EST data suggest that poplar is an ancient
polyploid. New Phytol. 167, 165–170.
Sterky, F., Regan, S., Karlsson, J. et al. (1998) Gene discovery in the
wood-forming tissues of poplar: analysis of 5,692 expressed
sequence tags. Proc. Natl Acad. Sci. USA, 95, 13330–13335.
Sterky, F., Bhalerao, R.R., Unneberg, P. et al. (2004) A Populus EST
resource for plant functional genomics. Proc. Natl Acad. Sci. USA,
101, 13951–13956.
Stirling, B., Newcombe, G., Vrebalov, J., Bosdet, I. and Bradshaw,
H.D. (2001) Suppressed recombination around the MXC3 locus, a
major gene for resistance to poplar leaf rust. Theor. Appl. Genet.
103, 1129–1137.
Strauss, S.H. and Martin, F.M. (2004) Poplar genomics comes of
age. New Phytol. 164, 1–4.
Sulston, J., Mallett, F., Staden, R., Durbin, R., Horsnell, T. and
Coulson, A. (1988) Software for genome mapping by fingerprinting techniques. Comput. Appl. Biosci. 4, 125–132.
Tagu, D., Bastien, C., Faivre-Rampant, P., Garbaye, J., Vion, P.,
Villar, M. and Martin, F. (2005) Genetic analysis of phenotypic
variation for ectomycorrhiza formation in an interspecific F1
poplar full-sib family. Mycorrhiza, 15, 87–91.
Taylor, G. (2002) Populus: Arabidopsis for forestry. Do we need a
model tree? Ann. Bot. 90, 681–689.
Tuskan, G.A., DiFazio, S.P. and Teichmann, T. (2004a) Poplar
genomics is getting popular: the impact of the poplar genome
project on tree research. Plant Biol. 6, 2–4.
Tuskan, G.A., Gunter, L.E., Yang, Z.M.K., Yin, T.M., Sewell, M.M.
and DiFazio, S.P. (2004b) Characterization of microsatellites revealed by genomic sequencing of Populus trichocarpa. Can. J.
For. Res. 34, 85–93.
Tuskan, G.A., DiFazio, S., Jansson, S. et al. (2006) The genome of
black cottonwood, Populus trichocarpa (Torr. & Gray). Science,
313, 1596–1604.
Van Ooijen, J. and Voorrips, R. (2001) JoinMap 3.0, Software for
the Calculation of Genetic Linkage Maps. Wageningen, The
Netherlands: Plant Research International.
Voorrips, R.E. (2002) MapChart: software for the graphical presentation of linkage maps and QTLs. J. Hered. 93, 77–78.
Vos, P., Hogers, R., Bleeker, M. et al. (1995) AFLP – a new technique
for DNA-fingerprinting. Nucleic Acids Res. 23, 4407–4414.
Wallis, J.W., Aerts, J., Groenen, M.A.M. et al. (2004) A physical map
of the chicken genome. Nature, 432, 761–764.
Wang, Q. and Dooner, H.K. (2006) Remarkable variation in maize
genome structure inferred from haplotype diversity at the bz locus. Proc. Natl Acad. Sci. USA, 103, 17644–17649.
Wang, R.-L., Stec, A., Hey, J., Lukens, L. and Doebley, J. (1999) The
limits of selection during maize domestication. Nature, 398, 236–
Wu, C.C., Sun, S.K., Nimmakayala, P., Santos, F.A., Meksem, K.,
Springman, R., Ding, K., Lightfoot, D.A. and Zhang, H.B. (2004) A
BAC and BIBAC-based physical map of the soybean genome.
Genome Res. 14, 319–326.
Xu, Z.Y., Sun, S.K., Covaleda, L., Ding, K., Zhang, A.M., Wu, C.C.,
Scheuring, C. and Zhang, H.B. (2004) Genome physical mapping
with large-insert bacterial clones by fingerprint analysis: methodologies, source clone genome coverage, and contig map
quality. Genomics, 84, 941–951.
Yang, G.S., Stott, J.M., Smailus, D., Barber, S.A., Balasundaram, M.,
Marra, M.A. and Holt, R.A. (2005) High-throughput sequencing: a
failure mode analysis. BMC Genomics, 6, 2.
Yin, T.M., DiFazio, S.P., Gunter, L.E., Riemenschneider, D. and
Tuskan, G.A. (2004) Large-scale heterospecific segregation distortion in Populus revealed by a dense genetic map. Theor. Appl.
Genet. 109, 451–463.
Zhu, Y.L., Song, Q.J., Hyten, D.L., Van Tassell, C.P., Matukumalli,
L.K., Grimm, D.R., Hyatt, S.M., Fickus, E.W., Young, N.D. and
Cregan, P.B. (2003) Single-nucleotide polymorphisms in soybean.
Genetics, 163, 1123–1134.