Insights into the evolution of hydroxyproline rich

Plant Physiology Preview. Published on April 26, 2017, as DOI:10.1104/pp.17.00295
1
Short Title: Evolution of the plant HRGP superfamily
2
3
Corresponding Author
4
Carolyn J. Schultz, School of Agriculture, Food and Wine, The University of Adelaide, Waite
5
Research Institute, PMB1 Glen Osmond, SA, 5064, Australia. Ph: 61 8 448 909 881. Email address:
6
[email protected]
7
8
Insights into the evolution of hydroxyproline rich glycoproteins from 1000 plant transcriptomes
9
10
Kim L. Johnson1*, Andrew M. Cassin1*, Andrew Lonsdale1, Gane Ka-Shu Wong2, Douglas E. Soltis3,
11
Nicholas W. Miles3, Michael Melkonian4, Barbara Surek4, Michael K. Deyholos5, James Leebens-
12
Mack6, Carl J. Rothfels7, Dennis W. Stevenson8, Sean W. Graham9, Xumin Wang10, Shuangxiu Wu10,
13
J. Chris Pires11, Patrick P. Edger12, Eric J. Carpenter2, Antony Bacic1, Monika S. Doblin1^ and Carolyn
14
J. Schultz13^
15
1
16
Parkville, Vic, 3010, Australia
17
2
18
Canada, and BGI-Shenzhen, Bei Shan Industrial Zone, Yantian District, Shenzhen, China
19
3
20
FL32611, USA
21
4
Botanical Institute, Cologne Biocenter, University of Cologne, D50674, Germany
22
5
Department of Biology, University of British Columbia, Kelowna BC, V1V 1V7, Canada
23
6
Department of Plant Biology, University of Georgie, Athens, GA3062, USA
ARC Centre of Excellence in Plant Cell Walls, School of BioSciences, The University of Melbourne,
Departments of Biological Sciences and Medicine, University of Alberta, Edmonton, Alberta,
Florida Museum of Natural History, Department of Biology, University of Florida, Gainsville,
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
Copyright 2017 by the American Society of Plant Biologists
1
24
7
25
CA 94720, USA
26
8
New York Botanical Garden, 2900 Southern Blvd, Bronx, NY 10458, USA
27
9
Department of Botany, University of British Columbia, Vancouver BC, V6T1Z4, Canada
28
10
29
Academy of Sciences, Beijing 100101, China
30
11
31
MO 65211, USA
32
12
Department of Horticulture, Michigan State University, East Lansing, MI 48823, USA
33
13
School of Agriculture, Food and Wine, The University of Adelaide, Waite Research Institute, PMB1
34
Glen Osmond, SA, 5064, Australia
35
* co-first contributors, ^ co-senior authors.
University Herbarium and Department of Integrative Biology, University of California, Berkeley,
CAS Key Laboratory of Genome Science and Information, Beijing Institute of Genomics, Chinese
Division of Biological Sciences and Bond Life Sciences Center, University of Missouri, Columbia,
36
37
One-sentence summary
38
Evaluation of hydroxyproline-rich glycoproteins (HRGPs) identified in 1KP transcriptomic datasets to
39
gain insight into their evolutionary history from chlorophyte algae to flowering plants.
40
41
List of Author Contributions
42
D.E.S., N.W.M., M.M., B.S., M.K.D., J.L.M., C.J.R., D.W.S., S.W.G., J.C.P., P.P.E., X.W. and S.W.
43
selected species for analysis and collected plant and algal samples and M.M., B.S., M.K.D., J.C.P.,
44
C.J.R., D.W.S., S.W., extracted RNA for sequencing. J.L.M. organized transcripts into gene families
45
and S.W. screened RNA-seq data for contamination. E.J.C. and G.K.S.W. generated the sequence
46
data, E.J.C. setup and maintained 1KP web-resources used to communicate data and G.K.S.W.
47
oversaw the 1KP project. A.M.C. generated the HRGP website and K.L.J., A.M.C., M.S.D., C.J.S.
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
2
48
and A.B. designed and oversaw the research. A.L. provided bioinformatics support and assisted in
49
maintenance of the HRGP website. K.J.L., A.M.C., M.S.D. and C.J.S. analyzed the data and together
50
with A.B. wrote the manuscript.
51
52
Funding Information
53
The authors wish to acknowledge the 1000 plants (1KP) initiative, which is led by GKSW and funded
54
by the Alberta Ministry of Enterprise and Advanced Education, Alberta Innovates Technology Futures
55
(AITF), Innovates Centre of Research Excellence (iCORE), and Musea Ventures. KLJ, AMC, AL,
56
MSD & AB acknowledge the support of a grant from the Australia Research Council to the ARC
57
Centre of Excellence in Plant Cell Walls (CE1101007) that is partly funded by the Australian
58
Government’s National Collaborative Research Infrastructure Program (NCRIS) administered through
59
Bioplatforms Australia (BPA) Ltd.
60
Computation Initiative (VLSCI) grant number VR0191 on its Peak Computing Facility at the
61
University of Melbourne, an initiative of the Victorian Government, Australia.
This research was supported by a Victorian Life Sciences
62
63
Corresponding Author
64
Email address: [email protected]
65
66
Keywords
67
Hydroxyproline-rich glycoproteins (HRGPs), arabinogalactan-proteins (AGPs), extensins (EXTs),
68
proline-rich proteins (PRPs), transcriptomics, plant evolution, 1000 plants (1KP),
69
glycosylphosphatidylinositol (GPI)-anchors, intrinsically disordered proteins (IDP)
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
3
70
Abstract
71
The carbohydrate-rich cell walls of land plants and algae have been the focus of much interest given
72
the value of cell wall based products to our current and future economies. Hydroxyproline-rich
73
glycoproteins (HRGPs), a major group of wall glycoproteins, play important roles in plant growth and
74
development, yet little is known about how they have evolved in parallel with the polysaccharide
75
components of walls. We investigate the origins and evolution of the HRGP superfamily, which is
76
commonly divided into three major multigene families: the arabinogalactan-proteins (AGPs),
77
extensins (EXTs) and proline-rich proteins (PRPs).
78
Using MAAB, a newly developed bioinformatics pipeline, we identified HRGPs in sequences from
79
the 1000 plants (1KP) transcriptome project (www.onekp.com). Our analyses provide new insights
80
into the evolution of HRGPs across major evolutionary milestones, including the transition to land and
81
early radiation of angiosperms. Significantly, data mining reveals the origin of GPI-anchored AGPs in
82
green algae and a 3-4 fold increase in GPI-AGPs in liverworts and mosses. The first detection of
83
cross-linking (CL)-EXTs is observed in bryophytes, which suggest that CL-EXTs arose though the
84
juxtaposition of pre-existing SPn EXT glycomotifs with refined Y-based motifs. We also detected the
85
loss of CL-EXT in a few lineages, including the grass family (Poaceae), that have a cell wall
86
composition distinct from other monocots and eudicots. A key challenge in HRGP research is tracking
87
individual HRGPs throughout evolution. Using the 1KP output we were able to find putative orthologs
88
of Arabidopsis pollen-specific GPI-AGPs in basal eudicots.
89
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
4
90
Introduction
91
Cell walls of plants and algae are widely used for food, textiles, paper and timber, yet our
92
understanding of their assembly and dynamic re-modelling in response to growth, development and
93
environmental stresses (abiotic and biotic) remains rudimentary (Doblin et al., 2010; Doblin et al.,
94
2014). There are two contrasting types of walls/extracellular matrices in the green plant lineage,
95
protein-rich walls of some green algae and the cellulose-rich walls of embryophytes (land plants).
96
Recent studies of cell wall evolution suggest that the origins of many wall components occurred in the
97
streptophyte green algae prior to the evolution of embryophytes (Sørensen et al., 2010; Popper et al.,
98
2011; Domozych et al., 2012) with elaboration of a pre-existing set of polysaccharides rather than an
99
entirely new polymer framework. Although both the ultrastructure of plant walls and the fine structure
100
of their component polymers varies widely, they all typically comprise a fibrillar network of cellulose
101
microfibrils that is embedded within a gel-like matrix of non-cellulosic polysaccharides, pectins and
102
(glyco)proteins; the latter being both structural and enzymatic (Somerville et al., 2004; Doblin et al.,
103
2010).
104
105
Research to date has largely focused on the evolution of the polysaccharides, cellulose and the non-
106
cellulosic polysaccharides (hemicelluloses) and pectins, primarily through the availability of
107
polysaccharide epitope-specific antibodies coupled with immuno-fluorescence and/or transmission
108
electron microscopy and arraying techniques such as comprehensive microarray polymer profiling,
109
(CoMPP), and plate-based arrays (Moller et al., 2007; Pattathil et al., 2010; Moore et al., 2014). In
110
contrast, relatively little is known about the origin and evolution of the hydroxyproline-rich
111
glycoproteins (HRGPs), a major group of wall glycoproteins, despite their widespread occurrence
112
(Supplemental Table S1) and importance in plant growth and development (reviewed in Fincher et
113
al., 1983; Kieliszewski and Lamport, 1994; Majewska-Sawka and Nothnagel, 2000; Ellis et al., 2010;
114
Lamport et al., 2011; Draeger et al., 2015; Velasquez et al., 2015; Showalter and Basu, 2016).
115
Examination of these glycoproteins in a wider range of plant species should provide valuable insights
116
into how they have evolved in parallel with the polysaccharide components in plant walls.
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
5
117
118
HRGPs are commonly divided into three major multigene families, the highly glycosylated
119
arabinogalactan-proteins (AGPs), the moderately glycosylated extensins (EXTs) and minimally
120
glycosylated proline-rich proteins (PRPs). The protein backbones of these HRGPs are distinguished
121
by differences in amino acid bias and the repeat motifs characteristic to each family, and include the
122
post-translational modification of proline (Pro, P) to hydroxyproline (Hyp, O) (see Johnson et al.,
123
companion paper). Each of these families is further elaborated by chimeric and hybrid HRGPs,
124
complicating not only the classification of members but also elucidation of their functions.
125
126
Classical AGPs consist of up to 99% carbohydrate due to the addition of large type II arabino-3,6-
127
galactan (AG) polysaccharides (degree of polymerization, DP, 30-120) on to the minor (1-10%)
128
protein backbone. The complexity of this family is immense and lies both in the diversity of protein
129
backbones containing AGP glycomotifs and the incredible heterogeneity of glycan structures and
130
sugars decorating these proteins. The specific binding of AGPs to the dye β-glucosyl Yariv reagent
131
(β-Glc Yariv) and the generation of AGP-glycan specific antibodies have provided valuable insight
132
into their function, structure and evolution (Supplemental Table S1) (Jermyn and Yeow, 1975; van
133
Holst and Clarke, 1985; Kitazawa et al., 2013; Paulsen et al., 2014). These studies suggest AGPs are
134
evolutionarily ancient. O-glycosylation occurs in bryophytes, some chlorophytes and brown algae as
135
shown by binding of β-Glc Yariv and/or isolation of Hyp-containing glycoproteins (Godl et al., 1997;
136
Ligrone et al., 2002; Lee et al., 2005; Shibaya and Sugawara, 2009; Herve et al., 2016) (Johnson et al.,
137
companion paper). Classical AGPs are likely to exist in the chlorophyte algae as one of the first-
138
cloned HRGP genes (NCBI AAB23258.2) from Chlamydomonas reinhardtii, CrZSP1 (Woessner and
139
Goodenough, 1989), is predicted to be a GPI-AGP according to our criteria (Johnson et al.,
140
companion paper). It is likely that AGPs in the chlorophyte algae have distinct functions from
141
streptophytes, as no large AG polysaccharides have been reported (Ferris et al., 2001). Short,
142
branched (Ferris et al., 2001) and linear Ara- and galactose (Gal)-containing oligosaccharides (Bollig
143
et al., 2007) have been isolated from C. reinhardtii walls.
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
6
144
145
The highly glycosylated AGPs are soluble and have been proposed to act as mobile signals,
146
potentially involved in cell-cell communication (Schultz et al., 1998; Motose et al., 2004; Pereira et
147
al., 2014). Linking of AGPs to the plasma membrane by a GPI-anchor is significant, as it points
148
towards their increased organization within the cell wall-plasma membrane interface. The finding of
149
GPI-AGPs in lipid ‘rafts/nano-domains’ in eudicots is intriguing as these have been proposed to
150
stabilize and increase the activity of proteins and protein complexes at the plasma membrane involved
151
in cell-cell communication, signal transduction, immune responses, and transport (Borner et al., 2005;
152
Grennan, 2007; Tapken and Murphy, 2015). It is tempting to speculate on a role for GPI-AGPs in
153
similar processes in all species in which they occur, although experimental evidence is largely
154
lacking.
155
156
AGPs have been most studied in embryophytes and have been isolated from a wide range of tissues,
157
including seeds, roots, stems, leaves, inflorescences, and are particularly abundant in the stems of
158
gymnosperms and angiosperms (Gaspar et al., 2001). The co-occurrence of specific glycan epitopes
159
with developmental stages, the expression pattern of AGP genes, and mutant studies has implicated
160
AGPs in many processes. These include roles in plant growth and development, cell expansion and
161
division, hormone signaling, embryogenesis of somatic cells, differentiation of xylem and responses
162
to abiotic stress. The heterogeneity of the AGP family and vast number of family members suggests
163
they are multifunctional, similar to what is found in mammalian proteoglycan/glycoproteins and
164
intrinsically disordered proteins (IDPs) (Filmus et al., 2008; Schaefer and Schaefer, 2010; Tan et al.,
165
2012). Given that diverse members could have the same glycosylation and conversely that the same
166
backbone could have alternate glycosylated forms, determining the function of any single AGP and
167
the establishment of precise AGP function is a major challenge. Despite the magnitude of this task,
168
enormous progress has been made. For example, two well characterized AGPs are the pollen-specific
169
AtAGP6 and AtAGP11 that play important roles in pollen development and female tissue
170
interactions, and are suggested to be involved in multiple, integrated signaling pathways (Nguema-
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
7
171
Ona et al., 2012). Investigation of when these AGPs evolved would aid us to ascertain their specific
172
functions in pollen development.
173
174
The EXTs are an equally diverse family of HRGPs whose members contain repetitive SO3-5 motifs
175
that direct moderate glycosylation with Ara-oligosaccharides (arabinosides) (DP of 1 to 4). Classical
176
EXTs play important structural roles through cross-linking into the wall, facilitated by Tyr (Y)-based
177
motifs such as VXY and YXY (Tierney and Varner, 1987; Lamport et al., 2011). This cross-linking
178
into the wall can have essential roles in plant development, for example the requirement for
179
Arabdiopsis AtEXT3 during embryogenesis for cell plate formation (Cannon et al., 2008) and
180
AtEXT18 in pollen cell wall integrity (Choudhary et al., 2015). Cross-linking of EXTs into the wall is
181
proposed to result in the cessation of cell expansion, and yet some EXTs are important for cell
182
elongation in root hairs (Velasquez et al., 2011; Velasquez et al., 2015).
183
glycosylation on EXT-like motifs in the moss Physcomitrella patens has recently been shown through
184
mutants in Hyp O-arabinosyltransferases (HPATs) (MacAlister et al., 2016). Reduced levels of cell
185
wall-associated Hyp-arabinosides (as found in EXTs) in P. patens resulted in increased elongation of
186
protonemal tip cells. Therefore, different classes of EXTs have evolved different biological roles and
187
these functions are governed by both their protein backbone and glycosylation sequences and patterns.
The importance of
188
189
The first EXTs were isolated from tomato (Lamport, 1977) and proteins with blocks of SO3-5 motifs
190
have been extracted from other angiosperms such as tobacco, carrot, sugar beet and alfalfa (Chen and
191
and Varner, 1985; Smith et al., 1986; Li et al., 1990) as well as other green plant lineages, including
192
gymnosperms (Fong et al., 1992), the likely sister group to angiosperms; and chlorophycean alga
193
(Woessner and Goodenough, 1989), representing the sister to the remaining green plants. EXTs from
194
these other green plant lineages can have distinct features from embryophyte EXTs, and we use the
195
term cross-linking (CL)-EXTs to distinguish EXTs with alternating SOn and Y-based motifs, which
196
predominantly occur in land plants, from the HRGPs that have Y residues in a different domain from
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
8
197
the SOn-rich domains, as occurs in volvocine algae (see Fig. 1, Johnson et al., companion paper).
198
Exceptions may exist, for example no ‘classical’ EXTs were detected in the Hyp-poor cell walls of
199
maize, which also do not have detectable isodityrosine (IDT), an intramolecular cross-link that occurs
200
between Y-residues in EXTs (Kieliszewski et al., 1990; Kieliszewski and Lamport, 1994).
201
202
The PRPs are the most difficult family of HRGPs to define, as very few have been studied to date.
203
Investigation of PRPs has largely been restricted to legumes and Arabidopsis where they have been
204
shown to be minimally glycosylated, if at all (Averyhart-Fullard et al., 1988; Datta et al., 1989; Kleis-
205
San Francisco and Tierney, 1990; Lindstrom and Vodkin, 1991; Fowler et al., 1999). The majority of
206
PRPs are chimeric as they contain an Ole PFAM domain, and they have been implicated in roles in
207
root development (Bernhardt and Tierney, 2000), stress responses (Li et al., 2016), fiber development
208
(Xu et al., 2013) and nodule formation (Sherrier et al., 2005). Despite these important functions,
209
information on the origins and evolution of PRPs remains fundamentally unexplored.
210
211
Collectively, the functional characterization of HRGPs points towards their involvement in many
212
processes within the wall, including signaling and cell wall integrity pathways (Costa et al., 2013;
213
Pereira et al., 2015; Pereira et al., 2016; Voxeur and Hofte, 2016). As plants evolved multicellularity
214
and transitioned to land it is likely that greater complexity in signaling networks was required, a role
215
that could seemingly be fulfilled by HRGPs due to their distinct protein and glycan characteristics. It
216
is currently unknown if the occurrence of specific HRGPs is associated with evolutionary milestones
217
and specialization of plant cell walls. Greater insight into the origins, evolution and diversity of
218
HRGPs will enable researchers to address these key issues.
219
220
To study HRGP function throughout plant evolution, we developed a robust method for their detection
221
that exploits the amino acid bias of the predicted protein backbone and characteristic glycomotifs (see
222
Johnson et al., companion paper). The motif and amino acid bias (MAAB) pipeline classifies HRGP
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
9
223
sequences into one of 24 pre-defined descriptive sub-classes (23 HRGP classes and one MAAB class)
224
and is optimized for recovery of GPI-AGPs, non-GPI-AGPs and cross-linking (CL)-EXTs. Here, we
225
use this automated bioinformatics pipeline to identify and classify HRGPs within the extensive 1000
226
plant (1KP) transcriptome data.
227
228
The 1KP project (www.onekp.com) has generated large-scale transcriptomic data for over 1000 plant
229
and algal species, thereby providing deeper sampling of key plant taxa beyond those with sequenced
230
genomes (Johnson et al., 2012; Matasci et al., 2014; Wickett et al., 2014). Species selected include
231
mostly non-model land plants, many of which are important for medicine, agriculture, forestry and
232
biodiversity/conservation, as well as red algae (Rhodophyta), green algae (chlorophytes and algal
233
streptophytes) and taxa from a small group of freshwater unicellular algae known as glaucophytes -
234
collectively termed the Archaeplastida (Plantae). Analysis of the 1KP data is already helping resolve
235
long-standing questions, such as from which streptophyte green algal lineage the progenitors of land
236
plants likely arose (Wickett et al., 2014).
237
unprecedented opportunity to investigate HRGP superfamily evolution and determine if changes
238
coincide with major evolutionary transitions in the Archaeplastida.
The 1KP transcriptome dataset therefore offers an
239
240
A tandem repeat annotation library (TRAL) is also used to identify repeats in MAAB sequences and
241
support findings of selective loss and/or gain of CL-EXTs in bryophytes and commelinid monocots.
242
A combination of BLAST and profile hidden Markov models was used to search for putative
243
orthologs of pollen-specific AtAGP6/11 in angiosperms (flowering plants) with hits validated by
244
phylogenetic analysis. These vignettes demonstrate the broad utility of the bioinformatics pipeline on
245
the 1KP data, providing evolutionary insights into the complexity of the HRGP superfamily.
246
247
Results and Discussion
248
MAAB pipeline for GPI-AGPs and CL-EXTs
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
10
249
Using the newly developed bioinformatics tool, the motif and amino acid bias (MAAB) pipeline (see
250
Johnson et al., companion paper), we evaluated the 1KP dataset for HRGPs. We set parameters in
251
MAAB to best detect the classical, non-chimeric GPI-AGPs, CL-EXTs, PRPs and non-GPI-AGPs
252
(classes 1-4, respectively) as well as the hybrid HRGPs (classes 5-23) using features of the well
253
characterized Arabidopsis sequences. MAAB was validated using 15 proteomes from Phytozome and
254
shown to reliably detect and accurately classify HRGPs into 23 descriptive sub-classes (see Johnson
255
et al., companion paper). MAAB represents a significant advance of previous methods to detect
256
HRGPs in being able to identify all classes in one pipeline, without manual curation.
257
258
MAAB did not identify any false positives in Phytozome datasets and evaluation of a subset of the
259
1KP datasets also suggests a very low rate of false positives (see Johnson et al., companion paper). In
260
Ranunculales, a well-sampled basal eudicot order (39 samples derived from 20 species), we analyzed
261
143 non-redundant GPI-AGP sequences (five Ranunculaceae and six Papaveraceae datasets;
262
Supplemental Fig. S1A) and observed 98.6% true positives, with only two sequences not being clear
263
GPI-AGPs (class 1) and four other sequences flagged as likely contaminants based on their
264
appearance in multiple unrelated datasets (see legend, Supplemental Fig. S1). The rate of false
265
positives for CL-EXTs (class 2) was estimated based on analysis of 45 non-redundant bryophyte
266
sequences (Supplemental Fig. S2) derived from the larger set of 67 identified within the 1KP data
267
(Fig. 1). Shading of motifs reveals that 42/45 sequences (93.3%) have the expected CL-EXT motifs
268
throughout the entire backbone. The three remaining sequences could be considered CL-EXTs, or
269
hybrid CL-EXTs-AGPs or AGPs as they have more AGP motifs than SPn motifs, but both motifs
270
types are present (Supplemental Fig. S2). In either case, they are definitive HRGPs suggesting the
271
true positive rate is 100% (45/45), although we cannot rule out that the sequences are partial chimeric
272
HRGPs until full length sequences become available.
273
274
Using the MAAB pipeline to identify and classify HRGPs in transcriptomic datasets spanning
275
large evolutionary timescales
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
11
276
Across the entire 1KP data, 45304 HRGPs (class 1 to 23) were identified (Fig. 1) using a multiple k-
277
mer approach (see Johnson et al., companion paper). The majority of sequences detected were in
278
HRGP classes 1 to 4 (38051) and MAAB class 24 (39933), with a grand total of 85237 sequences
279
across all 24 MAAB classes. We compared HRGPs identified for each of the 1KP taxonomic groups
280
(hereafter
281
http://www.onekp.com/samples/list.php), by calculating the percentage of sequences in the classical
282
(class 1-4) and minor HRGP classes (class 5-23) and the final MAAB class (class 24) out of the total
283
sequences classified by MAAB (Fig. 2A). The majority of 1KP groups had a similar proportion of
284
sequences in classes 1-4 (avg. ~50%), 5-23 (avg. ~10%) and 24 (avg. ~40%). However, the algal
285
groups including those well represented in the 1KP dataset (green algae, comprising chlorophytes and
286
algal streptophytes, and red algae and Chromista) exhibited a trend towards lower percentages of
287
HRGP sequences (classes 1-4 and 5-23) and a higher percentage of sequences in MAAB class 24 (Fig.
288
2A). The relatively low number of GPI-AGPs (class 1) and higher number of class 4 sequences most
289
likely reflects underlying functional differences in algal AGPs. The high number of class 24
290
sequences, likely either non-HRGPs or unknown HRGPs, suggests that algae have a more diverse
291
array of IDPs than land plants. The possibility that HRGP transcripts have not been detected within k-
292
mer assemblies is also possible and could be due to sequence quality and/or sampling issues (lower
293
than optimal number of datasets and tissue/s, see below).
1KP
groups,
which
includes
clades
and
grades,
see
294
295
As a measure of the depth of taxon sampling, the number of taxonomic orders represented and the
296
number of samples within each 1KP group is reported in Fig. 2B. The green alga group are the best
297
represented of the algae with 34 orders and 152 samples included in our analyses and hence provide
298
the most comprehensive dataset for comparison. The average number of sequences for all MAAB
299
classes (1-24) is summarized by 1KP group (Fig. 2C). As each group represents a variable number of
300
species, and not all species in a group have the same HRGPs, we have also summarized the detection
301
rate as a heat map by 1KP group (Fig. 2C) and by taxonomic order (Supplemental Table S2). This
302
allowed us to assess whether detection of a particular HRGP class is either real or spurious. If ≥50% of
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
12
303
species in a given taxonomic order have ≥1 sequence for a given HRGP class, we report a >50%
304
detection rate, and assume this HRGP class is widespread throughout the taxonomic order. The few
305
class 2 (CL-EXTs) sequences identified in green algae and Chromista are likely sporadic contaminants
306
based on the low mean number of CL-EXTs (0.1 for green algae, 0.2 for Chromista) (Fig. 2C), low
307
detection rates (Fig. 2C; Supplemental Table S2) and BLASTp analysis revealing high sequence
308
similarity to vascular plant CL-EXTs (data not shown).
309
310
Here we present a preliminary analysis of the minor HRGP classes (5 to 23), and will then focus on
311
the major HRGP classes (1 to 4). Classes 5 to 23 include ‘hybrid’ type HRGPs with motifs associated
312
with two or more classes of HRGP (see Fig 4, Johnson et al., companion paper). For example, class 5
313
HRGPs contain AGP-bias with CL-EXT motifs (both SP3-5- and Y-based), and are most highly
314
represented in gymnosperms (conifers, Gnetales, Ginkgo and Cycadales) (Fig. 2C). In general, the
315
detection of classes 5 to 23 is low and sporadic for most clades with the exception of conifers and core
316
eudicots. In these clades a high detection rate (>50%) of sequences occurs for most HRGP classes
317
(Fig. 2C). This suggests that more hybrid/unusual HRGPs occur in these clades, although the low
318
mean numbers indicate that these sequences are found in a few, rather than all datasets within core
319
eudicot and conifer orders. Datasets from the 1KP green algae group have a relatively high number of
320
sequences in class 6 (1.4 ± 4, AGP-bias, EXT glycomotifs (SPn) but not Y-based cross-linking motifs)
321
(Fig. 2C), which is consistent with our current knowledge of HRGPs isolated from Chlamydomonas
322
and Volvox (Adair et al., 1983; Woessner and Goodenough, 1989; Woessner et al., 1994; Ferris et al.,
323
2005). The majority of CL-EXTs identified in Arabidopsis are not predicted to be GPI-anchored (see
324
Table I; Supplemental Table S2; Johnson et al., companion paper) and consistent with this, only a few
325
GPI-anchored EXTs (class 9) were observed in Phytozome and 1KP data. Only one class, GPI-
326
anchored PRPs (class 14), had no representatives from any 1KP group, which is consistent with our
327
current, albeit limited, knowledge of PRPs (Averyhart-Fullard et al., 1988; Datta et al., 1989; Fowler
328
et al., 1999).
329
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
13
330
The mean number of GPI-AGPs and CL-EXTs in the 1KP data was lower than expected, particularly
331
in the core eudicots/rosids (~4 and 7 respectively, Fig. 2C), compared to the numbers in the majority
332
of eudicot and commelinid genomic datasets (see Table I, Johnson et al., companion paper). We
333
suspected that this might be due to the restricted number of tissue types used for sequencing and so we
334
investigated a subset of 1KP datasets in the Ranunculales (39 samples derived from 20 species). For
335
each dataset, we manually removed likely redundant sequences (see Methods), reducing the mean
336
number of GPI-AGP sequences from 7.3 to 4.8 (Supplemental Fig. S1A) and CL-EXTs from 5.1 to
337
2.9 (Supplemental Fig. S1B).
338
positioned deletions (data not shown). The majority of the 1KP angiosperm datasets are from leaf
339
tissue, yet this tissue has the lowest average number of GPI-AGPs (4.0), with 4.4 in roots, 5.0 in
340
flowers, and 6.0 in developing fruits (Supplemental Fig. S1A). Leaf tissue datasets also had a low
341
number of CL-EXTs (2.6) compared to roots (3.8) and fruits (3.3) (Supplemental Fig. S1B). When
342
all unique sequences are tallied across the four Papaver spp, 12 GPI-AGPs (Supplemental Fig. S1C)
343
and six CL-EXTs were identified (Supplemental Fig. S1D). Thus, the total numbers of GPI-AGPs
344
and CL-EXTs observed in most 1KP datasets are under-estimates due to limited tissue sampling,
345
despite the likely sequence redundancy that was not routinely removed by MAAB pipeline processing.
Most of the redundant sequences contained variably sized and
346
347
Significantly, the greater species sampling within 1KP mostly corroborated our Phytozome
348
observations and allow us to draw some major conclusions: (1) AGPs are widespread, both GPI-AGPs
349
and non-GPI-AGPs being convincingly found in the green algae group as well as Chromista and
350
Glaucophyta; (2) CL-EXTs occur in most groups of extant land plants including bryophytes (non-
351
vascular plants); and (3) non-chimeric PRPs (class 3) are uncommon and largely confined to eudicots
352
(Fig. 2C).
353
354
GPI-AGPs evolved early in the green plant lineage
355
Our data show that GPI-AGPs are present in all algal groups sampled in the 1KP project with the
356
exception of the Rhodophyta. Absence of GPI-AGPs in red algae is consistent with several other
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
14
357
independent studies. A detailed analysis of the completed genome of two unicellular red algae,
358
Cyanidioschyzon merolae (Hashimoto et al., 2009; Ulvskov et al., 2013) and Galdieria sulphuraria
359
(Ulvskov et al., 2013) revealed that they lack the key enzymes for the synthesis of GPI-anchors, and
360
therefore GPI-AGPs would not be expected to be present in these algae. To our knowledge there are
361
no reports in the literature of red algal AGPs based on either AGP glycan epitopes or β-Glc Yariv
362
staining (Supplemental Table S1). AGPs are not well known in the Chromista, which include brown
363
algae, although there are reports of low levels of Hyp (<0.04%) in the soluble fraction of cell walls
364
from a variety of brown algal species (Gotelli and Cleland, 1968). Also, in a recent study, O-linked
365
glycosylation typical of AGPs has been detected in five species of brown algae and analysis of the
366
Ectocarpus proteome show that proteins containing AGP-like glycomotifs are present (Herve et al.,
367
2016). A similar distribution of AP, SP and TP motifs is seen in all 1KP groups from Chromista to
368
eudicots (see Supplemental Fig. S5, Johnson et al., companion paper), suggesting that AGP-type
369
glycomotifs are evolutionarily ancient.
370
371
A consistent trend in genomic and transcriptomic datasets was the decrease in the number of non-GPI-
372
AGPs from the green algae group to vascular plants and the opposite trend (an increase) for GPI-
373
AGPs, including a 3-4-fold increase in GPI-AGPs in liverworts and mosses (approximately 4
374
sequences per species), compared to the green algae group and hornworts (Figs. 1 and 2). The green
375
algae (chlorophytes and algal streptophytes) are well represented in the 1KP data, including several
376
species from Volvocaceae (2 species, 3 samples) and Chlamydomonaceae (5 species, 5 samples). The
377
average number of GPI- and non-GPI-AGPs in the 1KP sample of Chlamydomonas spp is 0.8 and
378
43.6, respectively, whereas in Volvox spp it is 1 and 78.3, respectively (Data file 6). It is possible that
379
some of the non-GPI-AGPs could be partial chimeric HRGPs rather than true non-GPI-AGPs; further
380
work is needed to resolve this issue.
381
382
Functional specification of selected AGPs may have been established early in green plant evolution.
383
For example, AGPs have been shown to be involved in apical growth in the moss Physcomitrella
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
15
384
patens (Lee et al., 2005) and some vascular plants (Nguema-Ona et al., 2012). β-Glc Yariv staining in
385
eight liverwort species revealed ubiquitous tissue staining, with darker staining in apices of some
386
species, suggesting that apical targeting of AGPs may be a feature that arose in bryophytes (hornworts,
387
mosses and liverworts) (Basile and Basile, 1987). AGPs have also been implicated in cell plate
388
formation in liverworts (Shibaya and Sugawara, 2009). Since apical targeting as well as AGP delivery
389
to phragmoplasts (a scaffold for cell plate assembly) exists in streptophyte algae (Charales,
390
Coleochaetales and Zygnematales), it is possible that these two functions of AGPs evolved earlier than
391
the emergence of bryophytes (Leliaert et al., 2012; Bowman, 2013).
392
393
It should be noted, however, that β-Glc Yariv staining and glycan epitope labelling can provide only
394
general information about presence/absence of AGP-specific glycans, as these tools are not able to
395
distinguish the different sub-classes of AGPs. The information revealed by MAAB in the 1KP data
396
detailing the suite of AGPs present in bryophytes in addition to the increasing use of the bryophyte
397
models P. patens (Kofuji and Hasebe, 2014) and Marchantia polymorpha (Ishizaki et al., 2016) will
398
facilitate molecular approaches to investigate the function of specific AGPs through, for example,
399
mutant studies.
400
401
Origin and selective loss of CL-EXTs
402
Previous studies of HRGPs show that EXT-like Hyp-arabinosides are evolutionarily ancient and are
403
found in chlorophyte algae, such as Chlorella vulgaris (Lamport and Miller, 1971) and C. reinhardtii
404
(Ferris et al., 2001; Bollig et al., 2007). Our data suggest that CL-EXTs arose in bryophytes, as no
405
CL-EXTs were detected in the two volvocine (chlorophyte green algal) genomes (see Table I, Johnson
406
et al., companion paper) and only contaminating sequences were detected in algal 1KP datasets (Fig.
407
2). EXT epitopes have been identified in some chlorophyte algal species (Domozych et al., 2009;
408
Sørensen et al., 2011) (Supplemental Table S1) but this likely reflects the presence of chimeric
409
and/or hybrid EXTs rather than CL-EXTs. Similar to the Phytozome analysis, hybrid HRGPs with
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
16
410
both AGP and EXT glycomotifs were found in the 1KP green algae group (e.g. RYJX_Locus_857
411
(Pandorina morum) and VALZ_Locus_7817 (C. noctigama), both class 19). Where Y residues are
412
present, these are found in a separate domain to the P-rich domain (see Fig. 1, Johnson et al.,
413
companion paper). This result suggests that CL-EXTs arose though the juxtaposition of pre-existing
414
SPn EXT glycomotifs with refined Y-based motifs (e.g. YXY and VXY), rather than via the
415
simultaneous evolution of both protein motifs.
416
417
To gain a better understanding of the occurrence of CL-EXTs in bryophytes, the phylogenetic
418
distribution of CL-EXTs present in the 1KP data was assessed in more detail (Fig. 3). CL-EXTs were
419
detected by the MAAB pipeline in all five hornwort species representing two taxonomic orders
420
(Supplemental Fig. S2) whereas they were identified in only 5 of 16 orders of mosses (Hypnales,
421
Pottiales, Diphysciales, Buxbaumiales, and Sphagnales) and 2 of 8 orders of liverworts (Marchantiales
422
and Sphaerocarpales) that were sampled by 1KP (Supplemental Table S2). This relatively low
423
detection rate in mosses and liverworts is unlikely to be due to low species representation as it is
424
comparable to the sampling depth in hornworts (Fig. 3). The data therefore suggest that while CL-
425
EXTs are widespread in hornworts, they have a more limited distribution within the moss and
426
liverwort lineages. If hornworts are the sister group of other land plants (Wickett et al., 2014), then
427
this may imply that CL-EXTs were lost from some lineages of liverworts and mosses. An alternative
428
explanation is that CL-EXT genes are present in moss and liverworts but are expressed at very low
429
levels and were therefore not detected.
430
431
To exclude the possibility that MAAB missed partial CL-EXT sequences due to a strict criterion for an
432
ER signal sequence (see Johnson et al., companion paper), a tandem repeat annotation library (TRAL)
433
was generated to identify CL-EXT repeat motifs (Schaper et al., 2015). The TRAL library identified
434
all repeats in class 2 sequences that were longer than 10 amino acids, present in ≥4 copies and
435
significantly different from random sequence evolution (p-value < 0.05; see Methods). A library for
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
17
436
MAAB class 24 sequences was also generated as a control. Each repeat was used to construct a
437
circular sequence profile hidden Markov model (cphmm) and these models were used to search the
438
MAAB input data (see Data file 2, Johnson et al., companion paper) to identify all sequences that
439
contain these repeats. A simple tool for browsing the results of TRAL is available at
440
http://services.plantcell.unimelb.edu.au/hrgp/index.html and this tool was used to search all the
441
bryophyte species for TRAL hits.
442
443
There is a good correlation between the number of species with CL-EXTs from MAAB and those with
444
SPn-/Y-based repeats identified by TRAL (Fig. 3; Data file 6). Most importantly, there was no
445
increase in the number of bryophyte species containing TRAL repeats (in the MAAB input data; see
446
Data file 2, Johnson et al., companion paper) compared to MAAB class 2 sequences, suggesting that
447
only a small number of partial CL-EXTs are excluded by the MAAB pipeline.
448
449
TRAL repeats from class 24 were also searched as these could identify both novel HRGP motifs and
450
other repeats of interest that may have been excluded by the MAAB-imposed filters. The majority of
451
bryophyte species had hits to TRAL repeats generated from class 24 sequences; however, none
452
contained both SPn and Y repeats and only a few have repeats containing Y, suggesting that class 24
453
contains few, if any CL-EXTs. Combined, TRAL and MAAB outputs indicate that there are a number
454
of moss and liverwort species that lack CL-EXTs. This leads us to suggest that there has been either
455
selective loss (e.g. Jungermanniales and most species from Hypnales) or multiple independent gains
456
(e.g. Sphagnales and Marchantiales) of CL-EXTs in mosses and liverworts. Confirming this result
457
using genome sequencing of selected species and investigation of the morphology and development of
458
mosses and liverworts with and without CL-EXTs would be an interesting avenue for future research.
459
These non-vascular plants may also help uncover the role of CL-EXTs with different sequence motifs
460
and repeat lengths.
461
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
18
462
CL-EXTs have been lost in many grass species
463
Evaluation of Phytozome data shows CL-EXTs containing both SPn≥3 and Y-based motifs are found in
464
almost all eudicot species, but are surprisingly absent from Eucalyptus grandis (eudicot/rosid),
465
Aquilegia coerulea (basal eudicot), and the four species of grasses (monocot/commelinid; Poaceae).
466
No CL-EXTs were found in either P. patens or the two volvocine algal species (see Table I, Johnson
467
et al., companion paper). The low number of HRGPs in some of these species is suspected to be due
468
to annotation and assembly issues, based on our detection of CL-EXT-like open reading frames in
469
these genomes using 1KP-identified CL-EXTs as query sequences (see Methods). In contrast, this
470
approach was not successful for genomes from the commelinid monocots ((APG IV, 2016);
471
designated as ‘monocots/commelinids’ in the 1KP group nomenclature). This absence is convincing
472
because there are 17 different grass species (21 datasets) in the 1KP data and one species, Eleusine
473
coracana, is represented by four different datasets (leaves, flowers, un/stressed roots) (see Data file 3,
474
Johnson et al., companion paper). As an additional check we employed BUSCO (Benchmarking
475
Universal Single-Copy Orthologs), an analysis tool that quantitatively assesses transcriptome
476
completeness based on the presence of near-universal single-copy orthologous genes selected from
477
OrthoDB (Simao et al., 2015). Between 65-92% (excluding three outliers) of 956 single-copy genes
478
were identified in the monocots/commelinids datasets (Fig. 4B). The value for the Poaceae was at the
479
high end of this range (median 88.4%, mean 83.8%), suggesting that the grass transcriptome
480
assemblies are high quality and that CL-EXTs should have been detected if they were present. Our
481
findings are also supported by a recent bioinformatics study of Z. mays and O. sativa genomes, where
482
no CL-EXTs were identified (Liu et al., 2016).
483
484
Primary cell walls of commelinid monocots have a composition distinct from eudicots such as
485
Arabidopsis, and gymnosperms (Bacic et al., 1988; Carpita and Gibeaut, 1993; Doblin et al., 2010).
486
We therefore investigated whether an absence of CL-EXTs was also observed in other commelinid
487
families represented by 1KP. As for Poaceae, CL-EXTs were not detected in the sedge family
488
Cyperaceae (total of three species), and three other families within Poales, although they were
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
19
489
identified in Bromeliaceae and Restionaceae (1 species each) (Fig. 4A). CL-EXTs were also not
490
detected in two families within the Zingiberales (Zingiberaceae and Marantaceae). The commelinid
491
monocots are a major clade within the larger monocot group (APG IV, 2016). As CL-EXTs are
492
absent from at least four grass genomes, including the Sanger-sequenced rice genome (see Table I,
493
Johnson et al., companion paper), and occur widely in the 1KP monocot group (which excludes the
494
commelinid monocots) (Fig. 2; Supplemental Table S2; Data file 6), we suggest that there has been
495
selective loss of CL-EXTs within the commelinid monocots. Further studies are required to confirm
496
their absence from specific commelinid species.
497
498
It is likely that the cross-linking function of CL-EXTs would need to be replaced by something else in
499
commelinid monocots to provide the necessary scaffold for cell wall development.
500
candidates are monomeric phenolic acids such as the hydroxycinnamic acids (ferulic and p-coumaric)
501
of commelinid monocot walls, absent in the non-commelinid monocot, eudicots (with the exception of
502
the Caryophyllales) and gymnosperm walls, which form covalent cross-links between arabinoxylan
503
chains (reviewed in Bacic et al., 1988; Harris, 2005). Additional experimentation is needed to test this
504
hypothesis.
Obvious
505
506
Orthologs of GPI-AGPs and CL-EXTs in Brassicales
507
The broad sampling of species in the 1KP data provides an opportunity to investigate how far
508
orthologous relationships of selected sequences can be traced in plant lineages (Wickett et al., 2014).
509
Given IDPs are usually poorly conserved throughout evolution we wanted to test if putative HRGP
510
orthologs could be detected. As Arabidopsis HRGP sequences are the most well characterized, we
511
undertook phylogenetic analysis of Arabidopsis and GPI-AGPs and CL-EXTs identified by MAAB in
512
the Brassicales (see Methods) (Fig. 5). The maximum likelihood (ML) tree of the GPI-AGPs shows
513
that sequences from Brassicaceae datasets group most closely with the Arabidopsis sequences,
514
generally with good (>70%) bootstrap values. Sequences from other Brassicales families cluster
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
20
515
further away according to their evolutionary relatedness and with lower bootstrap support (Fig. 5A).
516
Although showing low bootstrap support, AtAGP59, the additional GPI-AGP identified by MAAB
517
analysis, clusters with AGP6 and AGP11 (Fig. 5A).
518
519
The phylogenetic analysis of Brassicales CL-EXTs revealed a similar hierarchical pattern to the GPI-
520
AGPs, with Brassicaceae sequences generally clustering nearer to the Arabidopsis sequences, but with
521
variable bootstrap support (Fig. 5B).
522
Arabidopsis sequence, making prediction of orthologous relationships difficult, even within the
523
Brassicaceae. This is perhaps not surprising since tandem repeat proteins often evolve more rapidly
524
than other proteins (Moesa et al., 2012; van der Lee et al., 2014). The clustering pattern of GPI-AGPs
525
in the Brassicales tree (Fig. 5A) largely reflects that observed for the Arabidopsis sequences (see Fig.
526
2C, Johnson et al., companion paper). This suggests that despite the low bootstrap values it may be
527
possible to identify orthologous sequences using phylogenetic analysis over larger evolutionary
528
distances, such as within angiosperms (flowering plants).
Some CL-EXT 1KP sequences did not group with an
529
530
Detection of orthologs in angiosperms: AtAGP6/11 as an example
531
One pair of Arabidopsis GPI-AGPs, AtAGP6 and 11 (see sub-clade AGP-h, Fig. 2C, Johnson et al.,
532
companion paper), have been well studied as they are involved in pollen development and pollen tube
533
growth and therefore impact reproductive capacity (Levitin et al., 2008; Coimbra et al., 2009; Coimbra
534
et al., 2010). Orthologs of these genes could therefore be expected to be found in a broader subset of
535
eudicots and potentially early angiosperms, depending on gene conservation. Although a recent
536
attempt to find putative orthologs of AtAGP6/11 in Quercus suber (cork oak; core eudicots/rosids,
537
Fagales) pollen EST data was unsuccessful (Costa et al., 2015), we predicted that it should be possible
538
to identify orthologs in the 1KP data, particularly within datasets that include floral tissue. A
539
combination of BLAST (Phytozome and NCBI) and profile hidden Markov (HMMER) models (see
540
Methods) were used to search for putative AtAGP6/11 orthologs in the 1KP multiple k-mer data
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
21
541
(MAAB input and output, see Data files 2, 3, 4, Johnson et al., companion paper). In some cases,
542
additional sequences were identified by BLAST from other sources (for example, the Amborella
543
trichopoda genome) for use as full-length evolutionary markers (see Methods). The HMMER models
544
were optimized by varying the input sequences and the length of the aligned region (see Methods).
545
The best results were obtained using full-length AtAGP6/11-type proteins (including the N-terminal
546
ER and C-terminal GPI-anchor signal sequences) from the Brassicaceae family (model 1;
547
Supplemental Table S3). This was measured by a higher ‘true’ positive and lower false positive rate,
548
as assessed by manual curation (Supplemental Table S4; see Methods).
549
550
The resulting sequences were subjected to phylogenetic analysis and the ML tree displayed in Fig. 6
551
shows the relatedness of a large sub-set of AtAGP6/11-like sequences derived from 1KP data and
552
sequenced genomes. Almost all the eudicot sequences group together with AtAGP6/11. The monocot
553
and monocot commelinid sequences form a separate sub-clade with AtAGP58 and 59 (Fig. 6), albeit
554
with low bootstrap support.
555
samples from Fagales, the order that includes Q. suber, including those containing floral tissue
556
(Supplemental Table S4) despite two similar sequences being identified in the Q. robor genome
557
(Qurob_scaffold_362 and 554). Only two sequences identified by HMMER model 1 were apparent
558
false positives (OHAE_12357 and 4810; Polygala lutea, Fabales), positioned in a sub-clade with
559
Arabidopsis AtAGP1/2/4/5 and OsAGP1 from rice (Fig. 6). The relatively clear separation of putative
560
AtAGP6/11 orthologs from the other GPI-AGPs suggests that phylogenetic analysis may provide a
561
reasonable prediction of gene orthology for these IDPs. In this case, potential AtAGP6/11 orthologs
562
could be confidently detected in both eudicot and monocot lineages, suggesting the existence of an
563
ancestral gene in a common ancestor prior to their divergence ~140-150 Myr ago (Chaw et al., 2004).
564
However, no potential AtAGP6/11 ortholog was identified in Amborella trichopoda, which is the
565
sister group of all other angiosperms. Unfortunately, all eight 1KP datasets from ‘basal’ angiosperms
566
(members of Amborellaceae, Nymphaeales, Austrobaileyales) were derived from either leaf or shoot
567
tissue, not floral tissue, and A. trichopoda is the only basal angiosperm genome that is publically
Notably, no AtAGP6/11-like sequences were detected within 1KP
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
22
568
available. Our demonstrated success at finding more HRGPs using multiple k-mer assemblies (see
569
Johnson et al., companion paper) suggests that using this approach to analyze existing and future
570
basal-most angiosperm floral transcriptome datasets would be advantageous before concluding an
571
absence of AtAGP6/11 orthologs amongst this group.
572
573
To explore the robustness of the AtAGP6/11 sub-clade further, all GPI-AGPs from the 1KP datasets
574
reported in Fig. 6 were used to produce a larger angiosperm GPI-AGP tree (Supplemental Fig. S3).
575
As expected the bootstrap values were low, however, most of the additional 290 GPI-AGP sequences
576
grouped into the sub-clades observed in Fig. 2C of our companion paper (see Johnson et al.). These
577
reflect putative Arabidopsis orthologs for AtAGP17/18, AtAGP9, AtAGP4/7/10, AtAGP2/3,
578
AtAGP58, AtAGP25/26/27, AtAGP1 and AtAGP6/11/59. Importantly, despite the low bootstrap
579
support, the majority of 1KP sequences in the AtAGP6/11 sub-clade in Fig. 6 also group with
580
AtAGP6/11 in the angiosperm GPI-AGP tree (Supplemental Fig. S3), suggesting that these
581
sequences are indeed the most similar and therefore likely to be AtAGP6/11 orthologs within the
582
analyzed transcriptomes.
583
584
Investigation of the GPI-AGP sequence alignment and further analysis of each individual GPI-AGP
585
sub-clade revealed differences in the presence and distribution of specific amino acids (e.g. K, M, Q,
586
and D/E) among sub-clade members (Fig. 7, Supplemental Fig. S4). For example, putative orthologs
587
of AtAGP6/11 from both eudicots and some monocots share scattered K residues, concentrated in the
588
first half of the protein followed by a region containing several acidic residues (Fig. 7; Supplemental
589
Fig. S4K).
590
591
The scattered K residues are noticeably absent from the putative orthologs from Q. robur
592
(Supplemental Fig. S4A,K). Further analysis is therefore required to confirm these sequences, and
593
other putative AtAG6/11 genes in order to determine the ‘true’ AtAGP6/11 orthologs. With additional
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
23
594
taxonomic and appropriate tissue sampling, it should be possible to fill the evolutionary gaps and trace
595
the origin of AtAGP6/11 genes, as well as other HRGPs. Complementation of angiosperm mutant
596
lines, for example, the agp6 agp11 double mutant of Arabidopsis that displays pollen phenotypes
597
(Levitin et al., 2008; Coimbra et al., 2009; Coimbra et al., 2010), with progenitor AtAGP6/11 genes
598
will be essential to test their functional orthology and may reveal information about these motifs, and
599
possibly the specific glycosylation required for HRGP function in particular tissue types.
600
601
Conclusions
602
HRGPs: remaining challenges and future perspectives
603
Despite being minor components of the wall, the HRGPs make a significant contribution to wall
604
properties and to cellular identity (Knox et al., 1991). With access to the 1KP data we have made
605
significant progress towards understanding the evolutionary timeframe of when this large gene family
606
of diverse and complex proteins likely evolved. We have provided a platform to guide experimental
607
approaches to confirm our data and answer questions as to how HRGPs evolved, as well as
608
investigating the losses of major classes of HRGPs through evolution (e.g. loss of CL-EXTs in
609
mosses, liverworts and commelinid monocots).
610
611
Tracking individual HRGPs throughout evolution is a major challenge. Our investigation of
612
AtAGP6/11 orthologues shows that it is possible to trace HRGPs with specific characteristics and
613
determine their evolutionary origins. However, the CL-EXTs in particular, as a consequence of their
614
variable tandem repeat motifs and diverse sequence length, cannot be aligned in a meaningful way
615
with the tools developed for folded proteins. Alternate approaches are needed, such as investigating
616
length variation and indels rather than pairwise alignment. An approach that could be useful is to
617
perform global multiple sequence alignments using a graph-based method that takes into account the
618
observation that indels can occur anywhere in a repeat (Szalkowski and Anisimova, 2013).
619
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
24
620
The biggest challenge will be to understand the individual and coordinated roles of HRGPs within the
621
context of either a single cell or tissue type. Our data provide a valuable resource to initiate system
622
wide analyses to identify spatio-termporal expression changes of HRGPs throughout development.
623
Even in the less-complex non-vascular plants, GPI-AGPs and CL-EXTs are multigene families, and
624
understanding their individual function will provide exciting challenges for researchers for many years
625
to come. Defining the role of individual AGPs and EXTs is confounded by their interactions with
626
other cell wall polymers; cross-linking of AGPs (Tan et al., 2013) and EXTs to other polymers such as
627
pectins and heteroxylans has been experimentally verified (reviewed in Velasquez et al., 2012)).
628
Understanding how widespread cross-linking of HRGPs to other cell wall polymers will be
629
particularly challenging. The emerging availability of tractable genetic systems for non-model land
630
plants will allow researchers to use the information provided by MAAB to explore these questions in a
631
range of species.
632
633
Glycosylation is a characteristic feature of HRGPs and defines the interactive molecular surface.
634
Predicting the post-translational modification of HRGPs including the sites of P-hydroxylation and
635
types of glycosylation will require detailed structural analysis of many members of each multigene
636
family, from key transitions throughout the plant lineage. Even within a single species, glycosylation
637
is tissue-dependent, and directed by the context of amino acids surrounding the glycomotif (Tan et al.,
638
2003; Shimizu et al., 2005; Kurotani and Sakurai, 2015). For many HRGPs, glycosylation often
639
completely encases and hence masks the protein backbone, as is the case for GPI-AGPs, AG-peptides
640
and CL-EXTs (see Fig. 1, Johnson et al., companion paper). Furthermore, it also drives the three
641
dimensional conformation of the protein backbone, and therefore determines the presentation of
642
epitopes on the molecules surface. Given that plant HRGPs have the same structural complexity as
643
animal mucins, it is expected that the core protein sequence will influence the extent and type of
644
glycosylation, which can also vary in a tissue-/cell-specific manner (Gerken et al., 2013). It is known
645
that HRGP glycosylation differs between volvocine algae and embryophytes, but information on
646
streptophyte green algae is lacking. Introducing the same HRGP backbone into multiple species and
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
25
647
tissue types throughout evolution would be an interesting approach to investigate context-dependent
648
glycosylation patterns and/or function. A complementary investigation of the glycoysltransferases
649
(GTs) that act to glycosylate HRGPs would also greatly assist our understanding of this complex
650
process. An exciting future of HRGP research beckons to determine the biological and functional
651
significance of this fascinating superfamily of secreted glycoproteins.
652
653
654
Methods
655
Phylogenetic Analysis
656
Phylogenetic analyses were performed using MEGA 6.06 (Tamura et al., 2013). Sequences
657
(untrimmed) were first aligned using MUSCLE (Edgar, 2004) using the default settings, except for
658
Supplemental Fig. S3, gap = -2 (default gap = -2.9). The best model was identified (find best protein
659
model (ML) option) and used for each alignment to generate a maximum likelihood tree (with 100
660
bootstrap replications), using all sites in the alignment. Sequences used for phylogenetic analysis
661
(fasta format), and the best model, are in Supplemental Fig. S5.
662
663
Datasets
664
A
665
http://services.plantcell.unimelb.edu.au/hrgp/index.html.
666
Data analysis was based on 1282 samples downloaded from the official 1KP mirror
667
(onekp.westgrid.ca; as of March 2014) (see Johnson et al., companion paper). Five additional datasets
668
were analyzed (July 2015) to improve coverage of hornworts (UCRN-Megaceros_tosanus, FAJB-
669
Paraphymatoceros_hallii,
670
Phaeoceros_carolinianus-gametophyte) and to include the single sample from clade Ginkgoales
671
(SGTW-Ginkgo biloba). These additional five datasets are included in the MAAB input and output
summary
of
datasets
and
methods
is
also
RXRQ-Phaeoceros_carolinianus-sporophyte,
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
available
at
WCZB-
26
672
files (see Data file 2, 3, 4, Johnson et al., companion paper) and most of the analysis with these
673
samples is presented, with the exception of Supplemental Table S2.
674
675
MAAB summary analysis
676
The sequences and associated annotations from MAAB classification, including the assembly in which
677
they were identified, are provided in Data files 3, 4 (of Johnson et al., companion paper). Data from
678
individual samples (hereafter referred to as datasets) that were identified by the 1KP consortium as
679
containing contaminants were removed from analyses (unless otherwise noted) and can be found in
680
Data file 5 (of Johnson et al., companion paper). The number of sequences in each MAAB class for
681
each individual 1KP sample is provided in Data file 6. Mean number of HRGPs by MAAB class (for
682
each 1KP group (Fig. 2C) or commelinid monocot familiy (Fig. 4A and C)) was calculated by
683
averaging the appropriate data in Data file 6 (that reports the number of sequences per HRGP class
684
(columns) for each 1KP dataset (rows)). Total numbers of HRGPs by 1KP group (Fig. 1) were also
685
generated from Data file 6. Graphs (Fig. 4A and C) were generated using R/Bioconductor
686
(Gentleman et al., 2004). DNA sequences of GPI-AGPs (class 1) and CL-EXTs (class 2) are provided
687
in Data files 7, 8 respectively.
688
689
Reducing redundancy in 1KP datasets
690
For phylogenetic analyses, redundant sequences were eliminated from datasets. The most probable
691
sequence was identified based on existing knowledge, as follows: MAAB 1KP output (see Data file 3,
692
4, Johnson et al., companion paper) was sorted by 1KP locus. Sequences for each dataset (1KP locus)
693
were then sorted by HRGP class and the sequences from the desired HRGP class pasted into a new
694
Microsoft Excel sheet. Sequences were sorted alphabetically by sequence (to group sequences with
695
similar N-termini) and converted to tfasta format. Multiple sequence alignments were performed with
696
MUSCLE (http://www.ebi.ac.uk/Tools/msa/muscle/) to identify redundant sequences. Where two or
697
more sequences had large regions of identity, the longer sequence was retained unless there were clear
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
27
698
artefacts at either the N- or C-terminus. Retained sequences were subjected to further rounds of
699
alignments to ensure only unique sequences remained. When redundancy was unclear, both sequences
700
were retained in small datasets but only one for larger datasets.
701
702
Removing redundancies from the 1KP sequence data was necessary because like other short-read
703
sequence data, it offers a wealth of information, but redundancy can occur from multiple sources
704
including splice variants, sequencing errors, recent duplications, chimeric transcripts and polyploidy
705
(Gruenheit et al., 2012; Yang and Smith, 2013; Xie et al., 2014). Sequence redundancies were obvious
706
in our analysis (Supplemental Fig. S1A,B) and two relatively common occurrences were read-
707
through of GC-rich hairpins in the cDNA during reverse transcription and poor-quality sequence at
708
one end of the paired-read. cDNA synthesis for all 1KP datasets was performed at 42°C (Chen Li,
709
BGI-Shenzhen, personal communication).
710
711
Tandem repeat annotation library (TRAL) construction
712
TRAL v0.3.5 (Schaper et al., 2015) was used to identify and evaluate tandem repeats present in 1KP
713
sequences from HRGP class 2 (CL-EXTs), class 3 (PRPs) and class 24 (<15% known motifs) as these
714
MAAB classes are the most likely to include tandem repeats. Tandem repeats were identified using T-
715
REKS (Jorda and Kajava, 2009) and HHrepID (Biegert and Soding, 2008) algorithms. Tandem repeats
716
(TR) were accepted only if they passed all the following criteria: 1) TR unit length of at least 10 amino
717
acids, 2) at least four TR unit copies and 3) p-value < 0.05 using phylo_gap01 model and 4) repeat
718
identified by T-REKS and/or HHrepID. A total of 5139 repeats were identified and accepted from
719
MAAB sequences in classes 2, 3 and 24. Each repeat was used to construct a cphmm model (Schaper
720
et al., 2015) and then searched with HMMER v3.1b1 hmmsearch (evalue cutoff 1e-5)
721
(http://hmmer.org/) to identify repeats in MAAB input data (Data file 2 of companion paper). The
722
resulting 4070 models had two or more hits above threshold to the input data. These hits were used to
723
construct alignments (max. 300 sequences per alignment, randomly chosen after 95% per-sample
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
28
724
clustering using USEARCH (Edgar, 2010) and aligned with MUSCLE v3.81 (Edgar, 2004) for each
725
model.
726
http://services.plantcell.unimelb.edu.au/hrgp/index.html, and was used to determine the number of
727
species with class 2 and class 24 TRAL repeats, using keyword searches by order and/or species (Fig.
728
3).
A
tool
for
browsing
and
searching
TRAL
results
is
available
at
729
730
BUSCO (Benchmarking Universal Single-Copy Orthologs) analysis
731
1KP transcriptomes were assessed for completeness using the BUSCO analysis tool (Simao et al.,
732
2015). The BUSCO plant dataset and software were downloaded from http://buscos.ezlab.org/.
733
734
Profile hidden Markov (HMMER) models for finding putative AtAGP6/11 orthologs
735
Protein sequences were aligned with MUSCLE v3.81 (Edgar, 2004) using default parameter settings.
736
Sequence alignments were then used as input into HMMER v3.1b1 hmmbuild (http://hmmer.org/) to
737
construct a profile HMM. Subsequently, hmmsearch was used (with default thresholds) to search
738
against MAAB 1KP input sequences (see Data file 2, Johnson et al., companion paper). Seven
739
different models were evaluated (Supplemental Table S3) and the results of the best model (see Data
740
file 4 Johnson et al., companion paper) are summarized in Supplemental Table S4. For each positive
741
1KP dataset a single ‘best’ sequence was selected manually based on presence of N-terminal ER
742
signal, full or partial C-terminal GPI-signal sequences and then checked by iterative multiple sequence
743
alignment and phylogenetic analysis.
744
potential partial GPI-AGPs, proteins needed to have the following GPI-anchor signal characteristics: a
745
potential omega (ω), ω+1 and ω+2 sites and a basic residue and/or a hydrophobic domain) (Eisenhaber
746
et al., 2003), unless otherwise noted (e.g. ZHMB_16956, XMVD_14518), as summarized in
747
Supplemental Table S4.
To distinguish between true class 4 non-GPI-AGPs and
748
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
29
749
Additional AtAGP6/11-like sequences were obtained from other sources (outlined below) for use as
750
full-length controls and to increase the sequence representation in various 1KP groups. TBLASTn
751
was used to search Phytozome and/or the nucleotide collection (NR/NT at NCBI) using word size 2,
752
no filtering, and restricting organisms to same species, genera or family as appropriate, with either
753
AtAGP6/11 or OsAGP7/10 as query sequences. If unsuccessful, a suitable 1KP sequence was selected
754
(coding sequence only, Data file 7) and BLASTn performed (word size 7 and no filtering). When
755
positive hits were identified, these were used as query sequences to find additional sequences
756
(maximum of two per species used). Q. robur AtAGP6/11 putative orthologs were identified in
757
TBLASTn searches of Q. robur assembly v1 scaffolds available at https://urgi.versailles.inra.fr/blast/.
758
For the basal angiosperm Amborella trichopoda (not present in Phytozome v9) GPI-AGPs were
759
identified amongst annotated proteins accessible at ftp://ftp.ensemblgenomes.org/pub/plants/release-
760
31/fasta/amborella_trichopoda/ using 50% PAST as outlined in Schultz et al. (2002). A similar
761
approach was used to find evidence for genomic sequences of CL-EXTs from selected genomes using
762
1KP-identified CL-EXTs as query sequences (Data file 8). For example, 1KP sequence
763
AYMT_Locus_34756
identified
764
36791844..36792375,
and
765
1845754..1846286).
partial
CL-EXT
VGHH_Locus_4377
sequence
identified
E.
A.
grandis
coerulea
scaffold_6:
scaffold_34:
766
767
Data access & Datasets
768
Additional datasets for this paper. Numbering continues from Johnson et al., companion paper, and a
769
full list of data files is provided there.
770
Data file 6. Mean number of HRGPs in each HRGP class (columns) for each 1KP dataset (rows),
771
calculated from MAAB output files (Data files 3, 4). Filename: SA003_MAAB-hits-summary-by-
772
class-and-sample.xls.
773
Data file 7. DNA sequences for 1KP GPI-AGPs (class 1) detected by MAAB. Locus ID (Data files
774
3, 4) of class 1 GPI-AGPs was used to extract the DNA sequence from the appropriate multiple k-mer
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
30
775
assembly. Where more than one locus ID is reported for a single protein (e.g. with different k-mers),
776
only one DNA sequence is reported. Filename: 1kp_agp_incl_dnas_9-12-2015.xls.
777
Data file 8. DNA sequences for 1KP CL-EXTs (class 2) detected by MAAB. Locus ID (Data files 3,
778
4) of class 2 CL-EXTs was used to extract the DNA sequence from the appropriate multiple k-mer
779
assembly. Where more than one locus ID is reported for a single protein (e.g. with different k-mers),
780
only one DNA sequence is reported. Filename: class2_incl_dnas_9-12-2015.xls.
781
Data file 9. Sequences identified by HMMER model 1 (putative AtAGP6/11 orthologs). Filename:
782
agp6_model1_hmm_hits_20150922.xls.
783
784
Supplemental Data
785
786
Supplemental Table S1. Summary of selected AGP and extensin epitopes described in the literature.
787
788
Supplemental Table S2. HRGP detection rate for each 1KP taxonomic order.
789
790
Supplemental Table S3. Summary of HMMER models used to identify putative orthologs of
791
AtAGP6 and AtAGP11.
792
793
Supplemental Table S4. Summary of HMMER model performance by 1KP sample and analysis of
794
sequences selected for phylogenetic analysis.
795
796
Supplemental Fig. S1. Number of GPI-AGPs and CL-EXTs by tissue in Ranunculaceae and
797
Papavaceae samples represented in 1KP.
798
799
Supplemental Fig. S2. Bryophyte CL-EXTs sequences.
800
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
31
801
Supplemental Fig. S3. Phylogenetic analysis (ML-tree) of GPI-AGPs (all sub-clades) from selected
802
angiosperms.
803
804
Supplemental Fig. S4. Conserved features for putative orthologs of selected Arabidopsis GPI-AGPs.
805
Supplemental Fig. S5. Sequences used for phylogenetic analysis (in fasta format).
806
807
Acknowledgements
808
The authors thank Mark Chase from the Kew Royal Botanic Gardens, Markus Ruhsam and Philip
809
Thomas at the Royal Botanic Garden Edinburgh, and Steven Joya from the University of British
810
Columbia for supplying plant samples to 1KP. We also thank Drs Birgit and Frank Eisenhaber,
811
Bioinformatics Institute A*Star Singapore for providing us with a standalone version of big-PI plant
812
predictor.
813
814
Figure Legends
815
Figure 1. Number of HRGP sequences identified by MAAB in each HRGP class by 1KP group.
816
The number of datasets per clade is shown in brackets after the 1KP group name. All MAAB classes
817
are reported for multiple k-mers (multk: 39,49,59,69) (unshaded columns). No sequences were
818
detected for HRGP class 14. Selected data is reported for assemblies with k-mer = 25 (k25, shaded
819
column). Red text indicates that the sequences detected within this class and 1KP group are likely to
820
be contaminants (see Results/Discussion).
821
822
Figure 2. Summary of HRGP sequences identified using the MAAB pipeline. A, Percentage of
823
total HRGP sequences (by 1KP group) that are found in the classical HRGP classes (1 to 4), hybrid
824
HRGP classes (5 to 23) and non-HRGPs (class 24). B, Number of orders analyzed and number of
825
samples per order for each 1KP group. The monocots group excludes the commelinid monocots. C,
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
32
826
Overview of the mean number of HRGPs for each MAAB class in the 1KP dataset (by 1KP group).
827
Shading of boxes represents the detection rate, indicated as the % of orders with hits (see
828
Supplemental Table S1). A shaded box showing detection of HRGPs with a value of zero indicates
829
an average number of sequences between 0-0.1. Red text indicates that the sequences detected within
830
this HRGP class and 1KP group are likely to be contaminants (see Results/Discussion).
831
832
Figure 3. Distribution and number of CL-EXTs (class 2) in the bryophytes. MAAB detected CL-
833
EXTs in all hornwort species in the 1KP data whereas the distribution in mosses and liverworts shows
834
either loss of, or multiple independent gains of, CL-EXTs. Tandem Repeat Annotation Libraries
835
(TRAL) of repeats identified in class 2 and class 24 (control) sequences were used to search the
836
MAAB input sequences and identify repeats present in the bryophyte lineages.
837
correlation between the species with CL-EXTs identified by MAAB and those with SPn/Y-based
838
repeats identified by TRAL using class 2 repeats. No CL-EXTs were identified using TRAL repeats
839
from class 24 sequences. The phylogenetic tree is based on Cole and Hilger (2013) and was selected
840
because it includes all bryophyte orders.
There is good
841
842
Figure 4. Mean number of MAAB class 2 CL-EXT sequences (A) identified using the MAAB
843
pipeline within each family of monocots/commelinids. Many families lack CL-EXTs but this is not
844
due to poor quality transcriptomes as BUSCO analysis (B) shows the percentage of 956 single-copy
845
orthologous genes found in the monocots/commelinids families is high (avg. ~80%) Family data (k-
846
mer 39 only) are summarized in a box-and-whisker plot after removal of outliers (2 Poaceae sp.: 59.4,
847
65.7; 1 Aracaeae sp.: 18.1). The mean number of class 1 GPI-AGPs (C) was evaluated and these are
848
present in all families that lack CL-EXTs. Only Cannaceae and Restionaceae have both GPI-AGPs
849
and CL-EXTs, although for most families the number of datasets is low (1 to 3). The total number of
850
datasets per family is indicated in parentheses.
851
852
Figure 5. ML tree of Brassicales 1KP and Arabidopsis GPI-AGPs (A), and CL-EXTs (B). (A,B)
853
ML trees generally show strong support for sub-clades with Arabidopsis sequences and 1KP
854
sequences from family Brassicaceae. Putative GPI-AGP sub-clades (A) are separated by a horizontal
855
dotted line and the Arabidopsis ortholog(s) are indicated by boxes (right of tree). Sequences from the
856
Brassicaceae family are in green with Arabidopsis sequences in larger, bold font. AtEXT3 was
857
included with CL-EXTs (B) even though it was classified as class 20 because of its shared bias (<2%
858
difference (Δ)) between % PSKY (74.2%) and % PVYK (73.1%); see Johnson et al., companion
859
paper). Numbers on the nodes represent support with 100 bootstrap replicates (≥70 green, 60-69
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
33
860
orange, 40-59 black). 1KP datasets (1KP locus ID and family name) are shown on the Brassicales tree
861
(inset) (Stevens (2001), Version 12, July 2012 http://www.mobot.org/mobot/research/apweb/). Scale
862
bars for branch length measures the number of substitutions per site. Sequences: GPI-AGPs
863
Brassicales (Supplemental Fig. S5A) and CL-EXTs Brassicales (Supplemental Fig. S5B).
864
865
Figure 6. Phylogenetic analysis of class 1 GPI-AGPs to identify AtAGP6/11 orthologs. ML tree
866
(MEGA) was constructed using putative AtAGP6/11 orthologs identified predominantly from 1KP
867
transcriptomes and HMMER model 1 (see Supplemental Table S3), with a few sequences from other
868
sources to increase the breadth of sampling (see Methods; Supplemental Table S4). The tree also
869
includes at least one sequence from each Arabidopsis GPI-AGP sub-clade, representative rice GPI-
870
AGPs identified by MAAB (see Fig. 2, Johnson et al., companion paper) and four GPI-AGPs from
871
Amborella
872
Amtri_ERN06202). Sequence names are coloured by 1KP group: basal angiosperms (grey), non-
873
commelinid monocots (purple), commelinid monocots (pink), basal eudicots (aqua), core eudicots
874
(green), asterids (red) and rosids (orange). Numbers on the nodes represent support with 100 bootstrap
875
replicates (≥70 green, 60-69 orange, 40-59 black). 1KP sequences are identified by Group_Order_1KP
876
identifier_sequence locus. Genomic sequences and other sequences from NCBI follow a similar
877
format, but include a five letter abbreviation of genus and species (see Methods). Symbols next to
878
sequence names are used to indicate the source and MAAB class of sequences and other relevant
879
information. An asterisk (*) after the sequence name indicates additional information is provided in
880
Supplemental Table S4, for example, the YNXR (Poales) dataset is contaminated with Asparagales,
881
however, the ML tree suggests these sequences are indeed Poales. Scale bar for branch length
882
measures the number of substitutions per site.
trichopoda
(Amtri_ERM96654,
Amtri_ERM95342,
Amtri_ERN01113
and
883
884
Figure 7.
Schematic representation of the GPI-AGP sub-clades showing the presence and
885
distribution of specific amino acids in the PAST-rich protein backbone. Order of GPI-AGPs is
886
based on clades in angiosperm GPI-AGP tree (Supplemental Fig. S3); figure part numbers (A to K)
887
are based on alignments (Supplemental Fig. S4) and the AtAGP5/10 (G) subclade is included for
888
comparison (based on AGP-c subclade (see Fig. 2C, Johnson et al., companion paper)). A, Putative
889
orthologs of Lys-rich AtAGP17/18 have scattered lysine (K) residues throughout the sequence and
890
contain a short K-rich domain near the C-terminus. The glycomotifs are predominantly XP1-2. B, The
891
AGP9 sub-clade also has a short Lys-rich region with ≥3 K residues and scattered K residues,
892
however, the glycomotifs are distinct from AtAGP17/18, being predominantly [S/T]P3. AGP4/7/10
893
(C), AGP2/3 (D), AGP5/10 (G), AGP5 (H) and AGP1(I) sub-clades have a ‘classical’ PAST-rich
894
backbone with no obvious bias towards other amino acids. E, Most of the putative orthologs of
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
34
895
AtAGP58 are relatively methionine (M)-rich, with scattered M residues between AGP glycomotifs
896
and contain a Q residue at the presumed N-terminus. This Q residue at the presumed mature N-
897
terminus is also found in putative orthologs of Lys-rich AtAGP17/18 (A), AtAGP9 (B),
898
AtAGP1/2/3/4/7/5/10 (C,D,G,H,I), and AtAGP58 (E), but not AtAGP25/26/27 (F), or AGP6/11/59
899
(J/K). J/K, Putative orthologs of AtAGP6/11/59 share scattered K residues, concentrated in the first
900
half of the protein and also a small cluster of acidic residues, either aspartic acid (D) or glutamic acid
901
(E).
902
903
904
905
906
Literature Cited
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
Adair WS, Hwang C, Goodenough UW (1983) Identification and visualization of the sexual agglutinin
from the mating-type plus flagellar membrane of Chlamydomonas. Cell 33: 183-193
APG IV (2016) An update of the Angiosperm Phylogeny Group classification for the orders and
families of flowering plants: APG IV. Botanical Journal of the Linnean Society 181: 1-20
Averyhart-Fullard V, Datta K, Marcus A (1988) A hydroxyproline-rich protein in the soybean cell
wall. Proc. Natl. Acad. Sci. USA 85: 1082-1085
Bacic A, Harris PJ, Stone BA (1988) Structure and function of plant cell walls. In J Priess, ed, The
Biochemistry of Plants, Vol 14. Academic Press, New York, pp 297-371
Basile DV, Basile MR (1987) The Occurrence of Cell Wall-Associated Arabinogalactan Proteins in the
Hepaticae. Bryologist 90: 401-404
Bernhardt C, Tierney ML (2000) Expression of AtPRP3, a proline-rich structural cell wall protein from
Arabidopsis, is regulated by cell-type-specific developmental pathways involved in root hair
formation. Plant Physiology 122: 705-714
Biegert A, Soding J (2008) De novo identification of highly diverged protein repeats by probabilistic
consistency. Bioinformatics 24: 807-814
Bollig K, Lamshoft M, Schweimer K, Marner FJ, Budzikiewicz H, Waffenschmidt S (2007) Structural
analysis of linear hydroxyproline-bound O-glycans of Chlamydomonas reinhardtii-conservation of the inner core in Chlamydomonas and land plants. Carbohydr Res 342: 25572566
Borner GHH, Sherrier DJ, Weimar T, Michaelson LV, Hawkins ND, MacAskill A, Napier JA, Beale
MH, Lilley KS, Dupree P (2005) Analysis of detergent-resistant membranes in Arabidopsis.
Evidence for plasma membrane lipids rafts. Plant Physiology 137: 104-116
Bowman JL (2013) Walkabout on the long branches of plant evolution. Current Opinion in Plant
Biology 16: 70-77
Cannon MC, Terneus K, Hall Q, Tan L, Wang Y, Wegenhart BL, Chen L, Lamport DTA, Chen Y,
Kieliszewski MJ (2008) Self-assembly of the plant cell wall requires an extensin scaffold.
Proceedings of the National Academy of Sciences, USA 105: 2226-2231
Carpita NC, Gibeaut DM (1993) Structural models of primary cell walls in flowering plants:
consistency of molecular structure with the physical properties of the walls during growth.
Plant Journal 3: 1-30
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
35
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
Chaw SM, Chang CC, Chen HL, Li WH (2004) Dating the monocot-dicot divergence and the origin of
core eudicots using whole chloroplast genomes. Journal of Molecular Evolution 58: 424-441
Chen J, and Varner JE (1985) Isolation and characterization of cDNA clones for carrot extensin and a
proline-rich 33-kDa protein. Proc Natl Acad Sci USA 82: 4399-4403
Choudhary P, Saha P, Ray T, Tang Y, Yang D, Cannon MC (2015) EXTENSIN18 is required for full male
fertility as well as normal vegetative growth in Arabidopsis. Front Plant Sci 6: 553
Coimbra S, Costa M, Jones B, Mendes MA, Pereira LG (2009) Pollen grain development is
compromised in Arabidopsis agp6 agp11 null mutants. J Exp Bot 60: 3133-3142
Coimbra S, Costa M, Mendes MA, Pereira AM, Pinto J, Pereira LG (2010) Early germination of
Arabidopsis pollen in a double null mutant for the arabinogalactan protein genes AGP6 and
AGP11. Sexual Plant Reproduction 23: 199-205
Cole TCH, Hilger HH (2013) Bryophyte phylogeny, viewed 07/09/2015, http://www2.biologie.fuberlin.de/sysbot/poster/BPP-E.pdf.
Costa M, Nobre MS, Becker JD, Masiero S, Amorim MI, Pereira LG, Coimbra S (2013) Expressionbased and co-localization detection of arabinogalactan protein 6 and arabinogalactan
protein 11 interactors in Arabidopsis pollen and pollen tubes. BMC Plant Biol 13: 7
Costa ML, Sobral R, Ribeiro Costa MM, Amorim MI, Coimbra S (2015) Evaluation of the presence of
arabinogalactan proteins and pectins during Quercus suber male gametogenesis. Ann Bot
115: 81-92
Datta K, Schmidt A, Marcus A (1989) Characterization of two soybean repetitive proline-rich
proteins and a cognate cDNA from germinated axes. Plant Cell 1: 945-952
Doblin MS, Johnson KL, Humphries J, Newbigin EJ, Bacic A (2014) Are designer plant cell walls a
realistic aspiration or will the plasticity of the plant's metabolism win out? Current Opinion
in Biotechnology 26: 108-114
Doblin MS, Pettolino F, Bacic A (2010) Plant cell walls: the skeleton of the plant world. Functional
Plant Biology 37: 357-381
Domozych DS, Ciancia M, Fangel JU, Mikkelsen MD, Ulvskov P, Willats WG (2012) The Cell Walls of
Green Algae: A Journey through Evolution and Diversity. Front Plant Sci 3: 82
Domozych DS, Wilson R, Domozych CR (2009) Photosynthetic eukaryotes of freshwater wetland
biofilms: adaptations and structural characteristics of the extracellular matrix in the green
alga, Cosmarium reniforme (Zygnematophyceae, Streptophyta). Journal of Eukaryotic
Microbiology 56: 314-322
Draeger C, Ndinyanka Fabrice T, Gineau E, Mouille G, Kuhn BM, Moller I, Abdou MT, Frey B, Pauly
M, Bacic A, Ringli C (2015) Arabidopsis leucine-rich repeat extensin (LRX) proteins modify
cell wall composition and influence plant growth. BMC Plant Biol 15: 155
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Nucleic Acids Research 32: 1792-1797
Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:
2460-2461
Eisenhaber B, Wildpaner M, Schultz CJ, Borner GHH, Dupree P, Eisenhaber F (2003)
Glycosylphosphatidylinositol lipid anchoring of plant proteins. Sensitive prediction from
sequence- and genome-wide studies for Arabidopsis and rice. Plant Physiology 133: 16911701
Ellis M, Egelund J, Schultz CJ, Bacic A (2010) Arabinogalactan-proteins: key regulators at the cell
surface? Plant Physiology 153: 403-419
Ferris PJ, Waffenschmidt S, Umen JG, Lin H, Lee JH, Ishida K, Kubo T, Lau J, Goodenough UW (2005)
Plus and minus sexual agglutinins from Chlamydomonas reinhardtii. Plant Cell 17: 597-615
Ferris PJ, Woessner JP, Waffenschmidt S, Kilz S, Drees J, and Goodenough UW (2001) Glycosylated
polyproline II rods with kinks as a structural motif in plant hydroxyproline-rich glycoproteins.
Biochemistry 40: 2978-2987
Filmus J, Capurro M, Rast J (2008) Glypicans. Genome Biol. 9: 224
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
36
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
Fincher GB, Stone BA, Clarke AE (1983) Arabinogalactan-proteins: structure, biosynthesis and
function. Annual Review of Plant Physiology 34: 47-70
Fong C, Kieliszewski MJ, Zacks Rd, Leykam JF, Lamport DTA (1992) A gymnosperm extensin contains
the serine-tetrahydroxyproline motif. Plant Physiology 99: 548-552
Fowler TJ, Bernhardt C, Tierney ML (1999) Characterization and expression of four proline-rich cell
wall protein genes in Arabidopsis encoding two distinct subsets of multiple domain proteins.
Plant Physiology 121: 1081-1091
Gaspar Y, Johnson KL, McKenna JA, Bacic A, Schultz CJ (2001) The complex structures of
arabinogalactan-proteins and the journey towards understanding function. Plant Mol Biol.
47: 161-176
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge YC,
Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M,
Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang JH (2004)
Bioconductor: open software development for computational biology and bioinformatics.
Genome Biology 5
Gerken TA, Revoredo L, Thome JJ, Tabak LA, Vester-Christensen MB, Clausen H, Gahlay GK, Jarvis
DL, Johnson RW, Moniz HA, Moremen K (2013) The lectin domain of the polypeptide
GalNAc transferase family of glycosyltransferases (ppGalNAc Ts) acts as a switch directing
glycopeptide substrate glycosylation in an N- or C-terminal direction, further controlling
mucin type O-glycosylation. J Biol Chem 288: 19900-19914
Godl K, Hallmann A, Wenzl S, Sumper M (1997) Differential targeting of closely related ECM
glycoproteins: The pherophorin family from Volvox. EMBO Journal 16: 25-34
Gotelli IB, Cleland R (1968) Differences in the occurrence and distribution of hydroxyprolineproteins among the algae. Am J Bot 55: 907-914
Grennan AK (2007) Lipid rafts in plants. Plant Physiol 143: 1083-1085
Gruenheit N, Deusch O, Esser C, Becker M, Voelckel C, Lockhart P (2012) Cutoffs and k-mers:
implications from a transcriptome study in allopolyploid plants. Bmc Genomics 13
Harris PJ (2005) Diversity in plant cell walls In RJ Henry, ed, Plant Diversity and Evolution: Genotypic
and Phenotypic Variation in Higher Plants. CAB International Publishing, Wallingford, pp
201–227
Hashimoto K, Tokimatsu T, Kawano S, Yoshizawa AC, Okuda S, Goto S, Kanehisa M (2009)
Comprehensive analysis of glycosyltransferases in eukaryotic genomes for structural and
functional characterization of glycans. Carbohydrate Research 344: 881-887
Herve C, Simeon A, Jam M, Cassin A, Johnson KL, Salmean AA, Willats WG, Doblin MS, Bacic A,
Kloareg B (2016) Arabinogalactan proteins have deep roots in eukaryotes: identification of
genes and epitopes in brown algae and their role in Fucus serratus embryo development.
New Phytol 209: 1428-1441
Ishizaki K, Nishihama R, Yamato KT, Kohchi T (2016) Molecular Genetic Tools and Techniques for
Marchantia polymorpha Research. Plant Cell Physiol 57: 262-270
Jermyn MA, Yeow YM (1975) A class of lectins present in the tissues of seed plants. Australian
Journal of Plant Physiology 2: 501–531
Johnson MTJ, Carpenter EJ, Tian Z, Bruskiewich R, Burris JN, Carrigan CT, Chase MW, Clarke ND,
Covshoff S, dePamphilis CW, Edger PP, Goh F, Graham S, Greiner S, Hibberd JM, JordonThaden I, Kutchan TM, Leebens-Mack J, Melkonian M, Miles N, Myburg H, Patterson J,
Pires JC, Ralph P, Rolf M, Sage RF, Soltis D, Soltis P, Stevenson D, Stewart CN, Jr., Surek B,
Thomsen CJM, Villarreal JC, Wu X, Zhang Y, Deyholos MK, Wong GK-S (2012) Evaluating
methods for isolating total RNA and predicting the success of sequencing phylogenetically
diverse plant transcriptomes. Plos One 7: e50226
Jorda J, Kajava AV (2009) T-REKS: identification of Tandem REpeats in sequences with a K-meanS
based algorithm. Bioinformatics 25: 2632-2638
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
37
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
Kieliszewski M, Leykam JF, Lamport DTA (1990) Structure of the threonine-rich extensin from Zea
mays. Plant Physiology 85: 823-827
Kieliszewski MJ, Lamport DTA (1994) Extensin: repetitive motifs, functional sites, post-translational
codes, and phylogeny. Plant Journal 5: 157-172
Kitazawa K, Tryfona T, Yoshimi Y, Hayashi Y, Kawauchi S, Antonov L, Tanaka H, Takahashi T,
Kaneko S, Dupree P, Tsumuraya Y, Kotake T (2013) beta-galactosyl Yariv reagent binds to
the beta-1,3-galactan of arabinogalactan proteins. Plant Physiol 161: 1117-1126
Kleis-San Francisco SM, Tierney ML (1990) Isolation and characterization of a proline-rich cell wall
protein from soybean seedlings. Plant Physiology 94: 1897-1902
Knox JP, Linstead PJ, Peart JM, Cooper C, Roberts K (1991) Developmentally regulated epitopes of
cell surface arabinogalactan proteins and their relation to root tissue pattern formation.
Plant Journal 1: 317-326
Kofuji R, Hasebe M (2014) Eight types of stem cells in the life cycle of the moss Physcomitrella
patens. Curr Opin Plant Biol 17: 13-21
Kurotani A, Sakurai T (2015) In Silico Analysis of Correlations between Protein Disorder and PostTranslational Modifications in Algae. Int J Mol Sci 16: 19812-19835
Lamport DTA (1977) Structure, biosynthesis and significance of cell wall glycoproteins. Recent
Advances in Phytochemistry 11: 79-115
Lamport DTA, Kieliszewski MJ, Chen Y, Cannon MC (2011) Role of the extensin superfamily in
primary cell wall architecture. Plant Physiology 156: 11-19
Lamport DTA, Miller DH (1971) Hydroxyproline arabinosides in the plant kingdom. Plant Physiol. 48:
454-456
Lee KJD, Sakata Y, Mau SL, Pettolino F, Bacic A, Quatrano RS, Knight CD, Knox JP (2005)
Arabinogalactan proteins are required for apical cell extension in the moss Physcomitrella
patens. Plant Cell 17: 3051-3065
Leliaert F, Smith DR, Moreau H, Herron MD, Verbruggen H, Delwiche CF, De Clerck O (2012)
Phylogeny and molecular evolution of the green algae. Critical Reviews in Plant Sciences 31:
1-46
Levitin B, Richter D, Markovich I, Zik M (2008) Arabinogalactan proteins 6 and 11 are required for
stamen and pollen function in Arabidopsis. Plant J 56: 351-363
Li J, Ouyang B, Wang T, Luo Z, Yang C, Li H, Sima W, Zhang J, Ye Z (2016) HyPRP1 Gene Suppressed
by Multiple Stresses Plays a Negative Role in Abiotic Stress Tolerance in Tomato. Front Plant
Sci 7: 967
Li X-B, Kieliszewski M, Lamport DTA (1990) A chenopod extensin lacks repetitive
tetrahydroxyproline blocks. Plant Physiology 92: 327-333
Ligrone R, Vaughn KC, Renzaglia KS, Knox JP, Duckett JG (2002) Diversity in the distribution of
polysaccharide and glycoprotein epitopes in the cell walls of bryophytes: new evidence for
the multiple evolution of water-conducting cells. New Phytologist 156: 491-508
Lindstrom JT, Vodkin LO (1991) A soybean cell wall protein is affected by seed color genotype. Plant
Cell 3: 561-571
Liu X, Wolfe R, Welch LR, Domozych DS, Popper ZA, Showalter AM (2016) Bioinformatic
Identification and Analysis of Extensins in the Plant Kingdom. PLoS One 11: e0150177
MacAlister CA, Ortiz-Ramírez C, Becker JD, Feijó JA, Lippman ZB (2016) Hydroxyproline Oarabinosyltransferase mutants oppositely alter tip growth in Arabidopsis thaliana and
Physcomitrella patens. Plant Journal 85: 193-208
Majewska-Sawka A, Nothnagel EA (2000) The multiple roles of arabinogalactan proteins in plant
development. Plant Physiology 122: 3-9
Matasci N, Hung LH, Yan Z, Carpenter EJ, Wickett NJ, Mirarab S, Nguyen N, Warnow T,
Ayyampalayam S, Barker M, Burleigh JG, Gitzendanner MA, Wafula E, Der JP, DePamphilis
CW, Roure B, Philippe H, Ruhfel BR, Miles NW, Graham SW, Mathews S, Surek B,
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
38
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
Melkonian M, Soltis DE (2014) Data access for the 1,000 plants (1KP) project. Gigascience 3:
1
Moesa HA, Wakabayashi S, Nakai K, Patil A (2012) Chemical composition is maintained in poorly
conserved intrinsically disordered regions and suggests a means for their classification.
Molecular Biosystems 8: 3262-3273
Moller I, Sørensen I, Bernal AJ, Blaukopf C, Lee K, Øbro J, Pettolino F, Roberts A, Mikkelsen JD,
Knox JP, Bacic A, Willats WGT (2007) High-throughput mapping of cell-wall polymers within
and between plants using novel microarrays. Plant Journal 50: 1118-1128
Moore JP, Nguema-Ona E, Fangel JU, Willats WG, Hugo A, Vivier MA (2014) Profiling the main cell
wall polysaccharides of grapevine leaves using high-throughput and fractionation methods.
Carbohydr Polym 99: 190-198
Motose H, Sugiyama M, Fukuda H (2004) A proteoglycan mediates inductive interaction during
plant vascular development. Nature 429: 873-878
Nguema-Ona E, Coimbra S, Vicré-Gibouin M, Mollet J-C, Driouich A (2012) Arabinogalactan proteins
in root and pollen-tube cells: distribution and functional aspects. Annals of Botany 110: 383404
Pattathil S, Avci U, Baldwin D, Swennes AG, McGill JA, Popper Z, Bootten T, Albert A, Davis RH,
Chennareddy C, Dong R, O'Shea B, Rossi R, Leoff C, Freshour G, Narra R, O'Neil M, York WS,
Hahn MG (2010) A comprehensive toolkit of plant cell wall glycan-directed monoclonal
antibodies. Plant Physiology 153: 514-525
Paulsen BS, Craik DJ, Dunstan DE, Stone BA, Bacic A (2014) The Yariv reagent: behaviour in different
solvents and interaction with a gum arabic arabinogalactan-protein. Carbohydr Polym 106:
460-468
Pereira AM, Lopes AL, Coimbra S (2016) JAGGER, an AGP essential for persistent synergid
degeneration and polytubey block in Arabidopsis. Plant Signal Behav 11: e1209616
Pereira AM, Masiero S, Nobre MS, Costa ML, Solis MT, Testillano PS, Sprunck S, Coimbra S (2014)
Differential expression patterns of arabinogalactan proteins in Arabidopsis thaliana
reproductive tissues. J Exp Bot 65: 5459-5471
Pereira AM, Pereira LG, Coimbra S (2015) Arabinogalactan proteins: rising attention from plant
biologists. Plant Reprod 28: 1-15
Popper ZA, Michel G, Herve C, Domozych DS, Willats WG, Tuohy MG, Kloareg B, Stengel DB (2011)
Evolution and diversity of plant cell walls: from algae to flowering plants. Annu Rev Plant Biol
62: 567-590
Prochnik SE, Umen J, Nedelcu AM, Hallmann A, Miller SM, Nishii I, Ferris P, Kuo A, Mitros T, FritzLaylin LK, Hellsten U, Chapman J, Simakov O, Rensing SA, Terry A, Pangilinan J, Kapitonov
V, Jurka J, Salamov A, Shapiro H, Schmutz J, Grimwood J, Lindquist E, Lucas S, Grigoriev IV,
Schmitt R, Kirk D, Rokhsar DS (2010) Genomic analysis of organismal complexity in the
multicellular green alga Volvox carteri. Science 329: 223-226
Schaefer L, Schaefer RM (2010) Proteoglycans: from structural compounds to signaling molecules.
Cell Tissue Res 339: 237-246
Schaper E, Korsunsky A, Pecerska J, Messina A, Murri R, Stockinger H, Zoller S, Xenarios I,
Anisimova M (2015) TRAL: tandem repeat annotation library. Bioinformatics 31: 3051-3053
Schultz CJ, Gilson P, Oxley D, Youl JJ, Bacic A (1998) GPI-anchors on arabinogalactan-proteins:
implications for signalling in plants. Trends in Plant Science 3: 426-431
Schultz CJ, Rumsewicz MP, Johnson KL, Jones BJ, Gaspar YM, Bacic A (2002) Using genomic
resources to guide research directions. The arabinogalactan protein gene family as a test
case. Plant Physiology 129: 1448-1463
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
39
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
Sherrier DJ, Taylor GS, Silverstein KA, Gonzales MB, VandenBosch KA (2005) Accumulation of
extracellular proteins bearing unique proline-rich motifs in intercellular spaces of the legume
nodule parenchyma. Protoplasma 225: 43-55
Shibaya T, Sugawara Y (2009) Induction of multinucleation by β-glucosyl Yariv reagent in
regenerated cells from Marchantia polymorpha protoplasts and involvement of
arabinogalactan proteins in cell plate formation. Planta 230: 581-588
Shimizu M, Igasaki T, Yamada M, Yuasa K, Hasegawa J, Kato T, Tsukagoshi H, Nakamura K, Fukuda
H, Matsuoka K (2005) Experimental determination of proline hydroxylation and
hydroxyproline arabinogalactosylation motifs in secretory proteins. Plant Journal 42: 877889
Showalter AM, Basu D (2016) Extensin and arabinogalactan-protein biosynthesis:
glycosyltransferases, research challenges, and biosensors. Frontiers in Plant Science 7: 814
Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing
genome assembly and annotation completeness with single-copy orthologs. Bioinformatics
31: 3210-3212
Smith JJ, Muldoon EP, Willard JJ, Lamport DTA (1986) Tomato Extensin Precursors P1 and P2 Are
Highly Periodic Structures. Phytochemistry 25: 1021-1030
Somerville C, Bauer S, Brininstool G, Facette M, Hamann T, Milne J, Osborne E, Paredez A, Persson
S, Raab T, Vorwerk S, Youngs H (2004) Toward a systems approach to understanding plant
cell walls. Science 306: 2206-2211
Sørensen I, Domozych D, Willats WGT (2010) How have plant cell walls evolved? Plant Physiology
153: 366-372
Sørensen I, Pettolino FA, Bacic A, Ralph J, Lu F, O'Neill MA, Fei Z, Rose JKC, Domozych DS, Willats
WGT (2011) The charophycean green algae provide insights into the early origins of plant
cell walls. Plant Journal 68: 201-211
Szalkowski AM, Anisimova M (2013) Graph-based modeling of tandem repeats improves global
multiple sequence alignment. Nucleic Acids Res 41: e162
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular Evolutionary
Genetics Analysis version 6.0. Mol Biol Evol 30: 2725-2729
Tan L, Eberhard S, Pattathil S, Warder C, Glushka J, Yuan C, Hao Z, Zhu X, Avci U, Miller JS, Baldwin
D, Pham C, Orlando R, Darvill A, Hahn MG, Kieliszewski MJ, Mohnen D (2013) An
Arabidopsis cell wall proteoglycan consists of pectin and arabinoxylan covalently linked to an
arabinogalactan protein. Plant Cell 25: 270-287
Tan L, Leykam JF, Kieliszewski MJ (2003) Glycosylation motifs that direct arabinogalactan addition to
arabinogalactan-proteins. Plant Physiology 132: 1362-1369
Tan L, Showalter AM, Egelund J, Hernandez-Sanchez A, Doblin MS, Bacic A (2012) Arabinogalactanproteinsandtheresearchchallengesfortheseenigmaticplantcellsurfaceproteoglycans. Front.
Plant Sc 3: 1-10
Tapken W, Murphy AS (2015) Membrane nanodomains in plants: capturing form, function, and
movement. J Exp Bot 66: 1573-1586
Tierney ML, Varner JE (1987) The extensins. Plant Physiol 84: 1-2
Ulvskov P, Paiva DS, Domozych D, Harholt J (2013) Classification, naming and evolutionary history
of gycosyltransferases from sequenced green and red algal genomes. Plos One 8
van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J,
Gsponer J, Jones DT, Kim PM, Kriwacki RW, Oldfield CJ, Pappu RV, Tompa P, Uversky VN,
Wright PE, Babu MM (2014) Classification of intrinsically disordered regions and proteins.
Chemical Reviews 114: 6589-6631
van Holst G-J, Clarke AE (1985) Quantification of arabinogalactan-protein in plant extracts by single
radial gel diffusion. Anal. Biochem. 148: 446-450
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
40
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
Velasquez M, Salgado Salter J, Gloazzo Dorosz J, Petersen BL, Estevez JM (2012) Recent advances
on the posttranslational modifications of EXTs and their roles in plant cell walls. Frontiers in
Plant Science 3
Velasquez SM, Marzol E, Borassi C, Pol-Fachin L, Ricardi MM, Mangano S, Juarez SP, Salter JD,
Dorosz JG, Marcus SE, Knox JP, Dinneny JR, Iusem ND, Verli H, Estevez JM (2015) Low Sugar
Is Not Always Good: Impact of Specific O-Glycan Defects on Tip Growth in Arabidopsis. Plant
Physiol 168: 808-813
Velasquez SM, Ricardi MM, Dorosz JG, Fernandez PV, Nadra AD, Pol-Fachin L, Egelund J, Gille S,
Harholt J, Ciancia M, Verli H, Pauly M, Bacic A, Olsen CE, Ulvskov P, Petersen BL, Somerville
C, Iusem ND, Estevez JM (2011) O-glycosylated cell wall proteins are essential in root hair
growth. Science 332: 1401-1403
Voxeur A, Hofte H (2016) Cell wall integrity signaling in plants: "To grow or not to grow that's the
question". Glycobiology 26: 950-960
Wickett NJ, Mirarab S, Nam N, Warnow T, Carpenter E, Matasci N, Ayyampalayam S, Barker MS,
Burleigh JG, Gitzendanner MA, Ruhfel BR, Wafula E, Der JP, Graham SW, Mathews S,
Melkonian M, Soltis DE, Soltis PS, Miles NW, Rothfels CJ, Pokorny L, Shaw AJ, DeGironimo
L, Stevenson DW, Surek B, Villarreal JC, Roure B, Philippe H, dePamphilis CW, Chen T,
Deyholos MK, Baucom RS, Kutchan TM, Augustin MM, Wang J, Zhang Y, Tian Z, Yan Z, Wu
X, Sun X, Wong GK-S, Leebens-Mack J (2014) Phylotranscriptomic analysis of the origin and
early diversification of land plants. Proceedings of the National Academy of Sciences of the
United States of America 111: E4859-E4868
Woessner JP, Goodenough UW (1989) Molecular characterization of a zygote wall protein: An
extensin-like molecule in Chlamydomonas reinhardtii. Plant Cell 1: 901-911
Woessner JP, Molendijk AJ, Egmond Pv, Klis FM, Goodenough UW, Haring MA (1994) Domain
conservation in several volvocalean cell wall proteins. Plant Mol. Biol. 26: 947-960
Xie Y, Wu G, Tang J, Luo R, Patterson J, Liu S, Huang W, He G, Gu S, Li S, Zhou X, Lam T-W, Li Y, Xu
X, Wong GK-S, Wang J (2014) SOAPdenovo-Trans: de novo transcriptome assembly with
short RNA-Seq reads. Bioinformatics 30: 1660-1666
Xu WL, Zhang DJ, Wu YF, Qin LX, Huang GQ, Li J, Li L, Li XB (2013) Cotton PRP5 gene encoding a
proline-rich protein is involved in fiber development. Plant Mol Biol 82: 353-365
Yang Y, Smith SA (2013) Optimizing de novo assembly of short-read RNA-seq data for
phylogenomics. BMC Genomics 14: 328
1222
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
41
1KP group
Class
GPI-AGP
1
multk
CL-EXT
Non-GPIAGP
PRP
1
2
2
3
3
4
k 25 multk k 25 multk k 25 multk
4
k 25
GPIEXT
AGP Bias
5
6
7
8
9
EXT Bias
10
11
12
PRP Bias
13 15 16 17
<15%
motif
Shared Bias
18
19
20
21
22 23
Total
Total
Grand
Class 1-4
Class 5-23
Total
24
multk
k 25
multk
multk
k 25
Vascular
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
Core eudicots/Rosids (227)
1629 411
801
159
167
19
4271 1853 186 164 49 1 33
15
24
26
13
8
5 206 111 210
452
84
17 16 5281 6868 2442
1620
16211 5798
Core eudicots/Asterids (224)
1517 461
767
177
140
16
3547 2236 128 184 51 2 48
20
37
46
12
5
3 157 47
154
505
98
32 15 4286 5971 2890
1544
14691 5022
Core eudicots (101)
343
83
288
47
24
6
1437
769
54
40
11
18
15
14
38
32
3
5 119 23
113
138
17
17
Basal eudicots (50)
342
81
223
39
39
803
368
55
65
1
5
12
45
13
11
2
4
62
141
44
11
1
2
2
2
3
4
Magnoliids (25)
134
43
90
19
9
385
226
16
4
Monocots/Commelinid (42)
85
25
24
1
1
932
628
11
8
4
Monocots (63)
226
53
165
28
3
1313
706
33
20
6
Basalmost angiosperms (8)
21
10
22
4
1
91
46
3
4
Conifers (70)
587
158
367
88
50
5
997
562
Gnetales (3)
17
4
3
1
25
12
Chloranthales (1)
1
Ginkgoales (1)
1
2
1
2
2
3
Cycadales (4)
16
7
17
3
29
15
Eusporangiate monilophytes (8)
25
7
32
7
Leptosporangiate monilophytes (54)
154
27
216
44
Lycophytes (16)
45
12
96
24
11
25
18
7
1
122 13
1
2
3
4
1
4
1
2
7
1
2
6
11
17
1
15
12
16
1
1
2
4
1
5
15
11
14 14 60
1
1
1
2
196
126
1608
679
31
66
3
319
187
2
10
1171
609
1
35
5
709
515
2
22
1
144
18
2
5
1
3
1
3
12
17
23
40
10
5
23
10
38
13
2
11
19
10
81
75
22
3
4
2
5
13
1
2
7
10
68
159
13
1
1
1
16
4
16
5
58
54
38
70
7
3
5
10
20
67
5
15
3
2
7
2
3
64
14
1
14
1
18
3
1
1
29
50
39
23
4
5
5
12
32
4
26
6
58
13
4
7
25
9
3
5
2
3
2
12
2
3
71
6
105
41
34
28
905
664
5542 2166
694
1407
488
514
3103 1001
7
6
8
1
22
13
595
618
288
141
1642
662
1670 1042
655
155
3522 1600
4764 1745
1951 1707
787
319
1
138
135
60
43
376
16
909
2001
813
530
4253 1498
28
45
17
9
1
2
1881 2092
4
99
131
36
1
4
7
0
12
15
49
62
25
24
160
69
174
279
253
140
66
633
1798 1989
750
497
5034 1705
350
460
223
89
1122
1235 1357
697
246
3535 1555
1076
867
584
99
2626 1403
227
175
19
45
466
500
Non-vascular
Mosses (38)
170
81
15
6
Liverworts (24)
135
62
23
7
Hornworts (6)
2
1
29
11
1
1
2
2
3
1
52
Algae
Green algae (152)
242
156
Red algae (20)
5
2
Chromista (algae) (25)
50
21
7
2
Glaucophyta (algae) (5)
TOTAL
5
6
4
1
5754 1711 3193 663
455
1
8887 6900
604
348
935
703
241
139
45 220
7
1
3
1
5
1
4
6
1
2
2
1
3
2
2
14758 9146 7062
622
31588 18834
831
609
350
7
1797
1783
989
724
13
3509 2099
211
248
142
50 28649 17652 699 864 142 6 107 186 323 336 168 37 98 691 307 1030 1702 359 136 62 39933 38051 20076
5
7253
606
853
290
85237 47326
Figure 1. Number of HRGP sequences identified by MAAB in each HRGP class by 1KP group. The number of datasets per clade is shown in brackets after the 1KP group name. All MAAB classes are reported for
multiple k -mers (multk : 39,49,59,69) (unshaded columns). No sequences were detected for HRGP class 14. Selected data is reported for assemblies with k -mer = 25 (k 25, shaded column). Red text indicates that the
sequences detected within this class and 1KP group are likely to be contaminants (see Results/Discussion).
A
B
C
% of total HRGPs/group
MAAB Class
Number
1−4 5−23
24 orders samples
Average no. of proteins / MAAB class
1KP group
1
2
3
4
5
6
7
8
9
10 11 12 13 15 16 17 18 19 20 21 22 23 24
50.6
13.1
36.3
12
227
Core eudicots/Asterids
6.7 3.4 0.6 15.6 0.6 0.8 0.2
0 0.2 0.1 0.2 0.2 0.1
0
0
49.9
11.8
38.4
16
224
Core eudicots/Rosids
7.3 3.6 0.7 19.1 0.8 0.7 0.2
0 0.1 0.1 0.1 0.1 0.1
0
0 0.9 0.5 0.9 2
45.1
14.3
40.6
3
101
Core eudicots
3.4 2.9 0.2 14.2 0.5 0.4 0.1
0.2 0.1 0.1 0.4 0.3 0
53.8
19.7
26.5
8
50
Basal eudicots
6.8 4.5 0.8 16.1 1.1 1.3 0
0.1 0.2 0.9 0.3 0.2
42.9
7.1
50
1
1
Chloranthales
Magnoliids
1
2
5
25
36.3
5.4
58.2
3
42
42.9
8
49.1
6
63
42.7
13.6
43.7
4
8
58.2
15.4
26.4
1
70
Conifers
8.4 5.2 0.7 14.2 1.7 0.2 0
54.9
11
34.1
2
3
Gnetales
5.7
80
0
20
1
1
Ginkgoales
1
1
2
45.9
17.8
36.3
1
4
Cycadales
4
4.2
7.2
51.3
13.4
35.3
4
8
Eusp. monilophytes
3.1
4
24.5
46.4
11.6
42
8
54
Lept. monilophytes
2.9
4
0.2 29.8 0.6 1.2 0.1
51.2
9.9
38.9
3
16
Lycophytes
2.8
6
19.9 0.1 0.6
47.4
9.4
43.2
18
38
Mosses
42.4
4.9
52.7
8
24
Liverworts
5.6
39.1
10.1
50.8
3
6
Hornworts
0.3 4.8
37.3
2.5
60.2
34
152
Green algae
1.6 0.1
42.1
0.5
57.4
10
20
35.5
0.5
64
10
25
Chromista (algae)
53.4
1.1
45.5
2
5
Glaucophyta (algae)
22.2
0.5
77.4
1
1
Dinophyceae
53.4
0
46.6
1
1
Euglenozoa
5.4 3.6 0.4 15.4 0.6 0.2
0.6
0 22.2 0.3 0.2 0.1
3.6 2.6
0 20.8 0.5 0.3 0.1
1
0.2
1.4
0.1 0.3 0.4
0
0
0.2 0.2 0.3
0
0.1
0.5 0.7 0.9 1.6 0.4 0.2
0.5 0.2 0.9 0.3
0
0 0.3 0.2 1.3 1.2 0.3
0.1 0.2 0.5 0.1
31
0.2 13
9.3
0.3 0.3
1
0.2
0.5 0.2
0.4
0
2
0.5
1.1
1
0.7 1.3
0
0
0.4 0.1
0.5 0.9 0.7 0.4 0.5 0.1 33.3
0.1 0.2 0.3
0.2
21.9
0
0 0.5
0
0
0 32.5
1 0.4 0.1 0.1
0.8 0.3 0.5 0.3 2
0
0.1
21.8
0.7 0.2 1.5 0.3 0.6 0.1
0.3 1.2 0.3
0
0
0.2 0.3 0.3 0.8 2
0.2 0.6
24 0.3 0.8 0.2
0.2 12.2
2.2 0.4 0.1
0.3 0.5 1.8 0.1
29.5 0.1 0.9 0
3.5
0.2
0.6
0.4 0.2 0.3
0.9 0.1
0 58.5 0.3 1.4 0
2
0
97.1
0.1
41.5
0 0.1
0.4 0.4 0.2
48.2
44.8
37.8
0.3
0.7 0.3 0.2
0.2
0.2
47
4
0
1
0 30.8 0
37.4
39.8
1 2.3 0.2
0.3 0.3
30.2
0.2
0.3
0.5 0.2 0.6 1.6 0.1 0.2 0.1 17.2
0.1 0.2 0.2 0.2 0.2 0.9 0.1 0.1
0.3
0
0 23.8
1
71.3
42.2
164
75
69
% orders
with hits
25
50
75
100
Figure 2. Summary of HRGP sequences identified using the MAAB pipeline. A, Percentage of total HRGP
sequences (by 1KP group) that are found in the classical HRGP classes (1 to 4), hybrid HRGP classes (5 to 23)
and non-HRGPs (class 24). B, Number of orders analyzed and number of samples per order for each 1KP group.
The monocots group excludes the commelinid monocots. C, Overview of the mean number of HRGPs for each
MAAB class in the 1KP dataset (by 1KP group). Shading of boxes represents the detection rate, indicated as the
% of orders with hits (see Supplemental Table S1). A shaded box showing detection of HRGPs with a value of
zero indicates an average number of sequences between 0-0.1. Red text indicates that the sequences detected
within this HRGP class and 1KP group are likely to be contaminants (see Results/Discussion).
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
Algae
2
0
7
Non-vascular
Red algae
4.5 0.4
8.3 1.3
3.6
0.1 0.3
0
Basalmost angiosperms 2.6 2.8 0.1 11.4 0.4 0.5
1
13.9
Vascular
43.9
Monocots
0.1 0.5 0.4 1.2 2.8 0.9 0.2
1
10.4
2
0.4 0.1 0.1 23.6
0 1.2 0.2 1.1 1.4 0.2 0.2 0.1 18.6
3
45.6
Monocots/Commelinids
0
0.7 0.2 0.7 2.2 0.4 0.1 0.1 18.9
Samples
Families
Species
MAAB (class 2)
TRAL (class 2)
TRAL (class 24)
no. species
+ CL-EXTs
- CL-EXTs
Liverworts
Mosses
Hornworts
(class 2) in the bryophytes. MAAB detected
CL-EXTs in all hornwort species in the 1KP data
whereas the distribution in mosses and liverworts
shows either loss of, or multiple independent
Treubiales
Haplomitriales
Blasiales
Neohodgsoniales
Sphaerocarpales
Marchantiales
Ricciales
Pelliales
Fossombroniales
Pallaviciniales
Pleuroziales
Metzgeriales
Ptilidiales
Porellales
Jungermanniales
Sphagnales
Takakiales
Andreaeaeales
Andreaeobryales
Tetraphidales
Polytrichales
Buxbaumiales
Diphysciales
Gigaspermales
Funariales
Timmiales
Dicranales
Pottiales
Grimmiales
Hedwigiales
Bartramiales
Splachnales
Bryales
Orthotrichales
Orthodontiales
Rhizogoniales
Aulacomniales
Hypnodendrales
Ptychomniales
Hookeriales
Hypnales
Figure 3. Distribution and number of CL-EXTs
1
1
1
0
0
1
gains of, CL-EXTs. Tandem Repeat Annotation
1
8
1
5
1
7
1
7
1
6
1
7
Libraries (TRAL) of repeats identified in class 2
2
1
2
0
0
2
and class 24 (control) sequences were used to
1
1
1
0
0
1
search the MAAB input sequences and identify
1
1
1
0
0
1
repeats present in the bryophyte lineages. There
1
8
3
1
1
1
7
1
1
1
2
7
3
1
1
0
0
3
0
0
0
0
3
0
0
1
8
2
0
1
2
1
1
2
1
1
2
1
1
0
1
1
0
0
1
1
0
1
2
1
4
1
3
1
1
2
1
3
1
2
1
1
2
1
4
1
3
1
1
0
0
0
1
0
0
0
0
0
0
1
0
0
0
1
2
4
1
2
1
0
3
1
3
1
3
1
0
0
0
0
1
1
11 11 11
1
0
9
2
2
3
3
3
3
3
1
Leiosporocerotales
Anthocerotales
3 1
Notothyladales
Phymatocerotales
3 3
Dendrocerotales
is good correlation between the species with CLEXTs identified by MAAB and those with SPn/Ybased repeats identified by TRAL using class 2
repeats.
No CL-EXTs were identified using
TRAL repeats from class 24 sequences. The
phylogenetic tree is based on Cole and Hilger
(2013) and was selected because it includes all
bryophyte orders.
Vascular Plants
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
A
Figure 4. Mean number of MAAB class 2 CL-EXT
4
Mean proteins by MAAB
CL-EXTs
3
% of BUSCO single-copy genes
pipeline
within
using
each
the
family
MAAB
of
monocots/commelinids.
Many families lack CL-
EXTs
due
but
this
is
not
to
poor
quality
percentage of 956 single-copy orthologous genes
found in the monocots/commelinids families is high
(avg. ~80%)
Family data (k-mer 39 only) are
summarized in a box-and-whisker plot after removal
90
of outliers (2 Poaceae sp.: 59.4, 65.7; 1 Aracaeae
sp.: 18.1). The mean number of class 1 GPI-AGPs
80
(C) was evaluated and these are present in all
70
families that lack CL-EXTs. Only Cannaceae and
Restionaceae have both GPI-AGPs and CL-EXTs,
60
although for most families the number of datasets is
8
GPI-AGPs
low (1 to 3). The total number of datasets per family
is indicated in parentheses.
6
4
2
Arecaceae (5)
Bromeliaceae (1)
Cyperaceae (3)
Joinvilleaceae (1)
Juncaceae (1)
Poaceae (21)
Restionaceae (1)
Typhaceae (2)
Cannaceae (1)
Heliconiaceae (2)
Marantaceae (1)
Strelitziaceae (1)
0
Zingiberaceae (2)
Mean proteins by MAAB
identified
transcriptomes as BUSCO analysis (B) shows the
1
100
C
(A)
2
0
B
sequences
Zingiberales
Poales
Arecales
Monocots/Commelinid family (as assigned by 1KP)
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
A
90 VMNH_Locus_18822
52 IPWB_Locus_30545
LAPO_Locus_128094
78
AtAGP1_At5G64310.1|PACid:19673006
52
VMNH_Locus_14415
UAXP_Locus_13472
SWPE_Locus_6427
62
DZTK_Locus_3934
41
HYZL_Locus_2834
99
HYZL_Locus_9103
HYZL_Locus_1464
42
HYZL_Locus_544
51
CZPV_Locus_36180
HYZL_Locus_244
100
CZPV_Locus_9612
AtAGP59_At3G27416.1|PACid:19660750
AtAGP11_At3G01700.1|PACid:19663150
45
VMNH_Locus_9336
83
VMNH_Locus_10786
74
VMNH_Locus_329
43
77 AtAGP6_At5G14380.1|PACid:19665343
MYZV_Locus_476
64
66
MYZV_Locus_9204
53
HYZL_Locus_5884
CZPV_Locus_47861
UAXP_Locus_29413
74
83
UAXP_Locus_4754
98 UVQL_Locus_476
LAPO_Locus_2422
58 100
AtAGP5_At1G35230.1|PACid:19657214
98 HABV_Locus_1044
54
UVQL_Locus_1887
AtAGP10_At4G09030.1|PACid:19644227
100
IPWB_Locus_34587
90
91 VMNH_Locus_19171
DZTK_Locus_3667
MYZV_Locus_5182
HYZL_Locus_12251
MYZV_Locus_4182
91
CRNC_Locus_20076
IPWB_Locus_2586
67
92
AtAGP3_At4G40090.1|PACid:19648772
AtAGP2_At2G22470.1|PACid:19638226
DZTK_Locus_800
JOIS_Locus_5719
100 TZWR_Locus_6853
43
AtAGP4_At5G10430.1|PACid:19665913
LAPO_Locus_16533
99
TZWR_Locus_4310
97
AtAGP7_At5G65390.1|PACid:19672144
64
HYZL_Locus_6813
83
CZPV_Locus_24329
CRNC_Locus_4364
95 VMNH_Locus_2993
76 IPWB_Locus_377
IPWB_Locus_13012
100
TZWR_Locus_973
65
89 50 AtAGP9_At2G14890.1|PACid:19640833
UAXP_Locus_11432
HYZL_Locus_1046
56
CZPV_Locus_36036
62
46
57
77
80
AtAGP1
AtAGP6/11
AtAGP59
56
100 IPWB_Locus_7995
LAPO_Locus_140
DZTK_Locus_278
AtEXT3_AT1G21310_Class20_manual_add
66
CSUV_Locus_1111
VMNH_Locus_21622
92
HABV_Locus_8657
AtEXT1_At1G76930.1|PACid:19657443
79
CRNC_Locus_5130
HYZL_Locus_6673
MYZV_Locus_17462
48
HYZL_Locus_11975
CZPV_Locus_41301
45
JOIS_Locus_247
JOIS_Locus_9845
JOIS_Locus_502
UAXP_Locus_12039
UAXP_Locus_16149
SWPE_Locus_8953
AtEXT21_At1G26250.1|PACid:19658113
43
tandem duplication
AtEXT20_At1G26240.1|PACid:19655134
98
AtEXT22_At4G08370.1|PACid:19648501
AtEXT17_At4G08380.1|PACid:19645658
41
AtAGP5/10
SWPE_Locus_1416
AtEXT18_At4G13390.1|PACid:19648172
TZWR_Locus_6475
AtEXT15_At5G35190.1|PACid:19670300
AtEXT11_At4G08400.1|PACid:19645018
89 AtEXT7_At2G24980.1|PACid:19638555
71 AtEXT13_At5G06630.1|PACid:19667515
AtEXT2_At3G54590.1|PACid:19665143
56
tandem duplication
88 AtEXT10_At3G54580.1|PACid:19664950
AtEXT9_At3G28550.1|PACid:19661711
AtEXT35_At3G20850.1|PACid:19664633
HYZL_Locus_199
42
MYZV_Locus_22840
HYZL_Locus_10140
69
JOIS_Locus_1958
51
CRNC_Locus_1155
DZTK_Locus_3778
UAXP_Locus_18193
81
IPWB_Locus_10603
85
CSUV_Locus_11630
97
AtEXT8_At2G43150.1|PACid:19639362
UVQL_Locus_10839
49
HABV_Locus_1095
59
LAPO_Locus_26253
100 HYZL_Locus_41339
HYZL_Locus_25529
MYZV_Locus_4163
96
92 HABV_Locus_341
96
LAPO_Locus_6067
TZWR_Locus_943
64
0.5
IPWB_Locus_15342
69
49 HABV_Locus_9830
81
92
AtAGP2/3
AtAGP4/7
AtAGP9
JOIS_Locus_1119
AtAGP
HYZL_Locus_1648
17/18
CZPV_Locus_16268
MYZV_Locus_8965
DZTK_Locus_3085
67
IPWB_Locus_8564
AtAGP17_At2G23130.1|PACid:19639138
AtAGP18_At4G37450.1|PACid:19645067
80
100 CSUV_Locus_9603_T2/2
89
CSUV_Locus_9603_T1/2
IPWB_Locus_633
100 VMNH_Locus_2336
VMNH_Locus_27554
MYZV_Locus_28246
MYZV_Locus_23577
98
HYZL_Locus_1144
61
VMNH_Locus_7551
AtAGP
75
88
AtAGP25_At5G18690.1|PACid:19666952
25/26/27
AtAGP27_At3G06360.1|PACid:19664968
VMNH_Locus_17896
AtAGP26_At2G47930.1|PACid:19640209
99
TZWR_Locus_23380
CRNC_Locus_15389
MYZV_Locus_3750
HYZL_Locus_1766
77
AtAGP58
40
CZPV_Locus_22718
CRNC_Locus_98
MYZV_Locus_1492
CSUV_Locus_2765
67
VMNH_Locus_1191
89
AtAGP58_At4G16980.1|PACid:19647627
54
TZWR_Locus_39
0.5
69
LAPO_Locus_1150
69
67
56
B
Brassicales
HYZL Akaniaceae
MYZV Tropaeolaceae
CZPV Moringaceae
CRNC Limnanthaceae
JOIS Koeberliniaceae
DZTK Bataceae
SWPE Resedaceae
UAXP Gyrostemonaceae
Arabidopsis Brassicaceae
CSUV Brassicaceae
HABV Brassicaceae
IPWB Brassicaceae
LAPO Brassicaceae
TZWR Brassicaceae
UVQL Brassicaceae
VMNH Brassicaceae
Figure 5. ML tree of Brassicales 1KP and Arabidopsis GPI-AGPs (A), and CL-EXTs
(B). (A,B) ML trees generally show strong support for sub-clades with Arabidopsis
sequences and 1KP sequences from family Brassicaceae. Putative GPI-AGP sub-clades
(A) are separated by a horizontal dotted line and the Arabidopsis ortholog(s) are indicated
by boxes (right of tree). Sequences from the Brassicaceae family are in green with
Arabidopsis sequences in larger, bold font. AtEXT3 was included with CL-EXTs (B) even
though it was classified as class 20 because of its shared bias (<2% difference (Δ))
between % PSKY (74.2%) and % PVYK (73.1%); see Johnson et al., companion paper).
Numbers on the nodes represent support with 100 bootstrap replicates (≥70 green, 60-69
orange, 40-59 black). 1KP datasets (1KP locus ID and family name) are shown on the
Downloaded from on June 17, 2017 Brassicales
- Published bytree
www.plantphysiol.org
(inset)
(Stevens
(2001),
Version
12,
July
2012
Copyright © 2017 American Society of
Plant Biologists. All rights reserved.
http://www.mobot.org/mobot/research/apweb/).
Scale bars for branch length measures the
number of substitutions per site. Sequences: GPI-AGPs Brassicales (Supplemental Fig.
S5A) and CL-EXTs Brassicales (Supplemental Fig. S5B).
Rosid_Brassicales_AtAGP11
Rosid_Brassicales_VMNH_10291
85
Rosid_Brassicales_AtAGP6
Rosid_Brassicales_Brrap_XP_009121671
97
Rosid_Brassicales_VMNH_329
100
Rosid_Brassicales_Brrap_XP_009131402
Rosid_Fabales_Medtr6g021940
P
Rosid_Fabales_RKFX_3135
Asterid_Apiales_CWYJ_8974
Asterid_Dipsacales_CAQZ_225
Asterid_Solanales_Solyc_XP_010314526.1*
88
MonocotComm_Poales_ROEI_15291*
P
Asterid_Asterales_DESP_2262*
P
100 Asterid_Asterales_TEZA_272*
Asterid_Ericales_SERM_1392
100
Asterid_Ericales_SERM_4447
Asterid_Gentianales_ECTD_9536
Asterid_Asterales_DESP_3601*
BasalEudicots_Proteales_Nelnu_XP_010262888_1
BasalEudicots_Proteales_Nelnu_XP_010262887.1
64
Rosid_Myrtales_Eucgr_XP_010034647
71
Asterid_Gentianales_UFQC_11993*
P
Rosid_Vitales_Vivin_XP_010660830_1
Rosid_Rosales_KYAD_1380
Rosid_Brassicales_AtAGP50*
Rosid_Malpighiales_CKDK_252*
42
P
Rosid_Malpighiales_CKDK_16890
BasalEudicot_Ranunculales_Aquca_scaffold_11:4769931..4770658*
Rosid_Malpighiales_Potri_001G339700
Rosid_Malpighiales_OODC_940
Rosid_Malpighiales_OODC_6813
90
Rosid_Fagales_Qurob_scaffold_362
CoreEudicot_Caryophyllales_MRKX_37906 P
41
CoreEudicot_Caryophyllales_BJKT_5* P
BasalEudicot_Ranunculales_XHKT_2993*
BasalEudicot_Ranunculales_CCHG_13874
P
BasalEudicot_Ranunculales_NMGG_7583
BasalEudicot_Ranunculales_NMGG_682
Rosid_Zygophyllales_ZHMB_16956
P
BasalEudicot_Ranunculales_UDHA_3261
BasalEudicot_Ranunculales_CCHG_5462*
P
BasalEudicot_Ranunculales_ERXG_35249
P
BasalEudicot_Ranunculales_XMVD_14518*
BasalEudicot_Ranunculales_ERXG_12856
Rosid_Malpighiales_VVPY_1668
Rosid_Fagales_Qurob_scaffold_554*
Monocot_Arecales_Phdac_XP_008788907_1
100
Monocot_Arecales_Elgui_XP_010943292_1
P
Monocot_Asparagales_XZME_884*
Monocot_Asparagales_EMJJ_7786*
Rosid_Brassicales_AtAGP58
Rosid_Brassicales_AtAGP59
MonocotComm_Poales_Seita_4G125000_1*
100
80
MonocotComm_Poales_BXAY_2382
96
MonocotComm_PoalesAsparagales_YXNR_30* C
MonocotComm_Poales_OsAGP7_LOC_Os06g17450_1
Monocot_Liliales_OOSO_6521
MonocotComm_PoalesAsparagales_YXNR_16252* C P
MonocotComm_Poales_OsAGP10_LOC_Os08g38250_1
96
MonocotComm_Poales_Seita_6G192500_1*
64
MonocotComm_Poales_BXAY_427
94
MonocotComm_Poales_OsAGP1_LOC_Os08g37630_1
Rosid_Fabales_OHAE_12357
58
Rosid_Fabales_OHAE_4810
Rosid_Brassicales_AtAGP2
62
Rosid_Brassicales_AtAGP4
Rosid_Brassicales_AtAGP5
Rosid_Brassicales_AtAGP1
BasalAngiosperm_Amborellales_Amtri_ERM95342*
BasalAngiosperm_Amborellales_Amtri_ERN06202*
Rosid_Brassicales_AtAGP17
MonocotComm_Poales_LOC_Os04g41480_1_HisRichAGP
MonocotComm_Poales_LOC_Os11g10890_1
Rosid_Brassicales_AtAGP25
Rosid_Brassicales_AtAGP27
68
Rosid_Brassicales_AtAGP26
MonocotComm_Poales_LOC_Os01g71170_1
MonocotComm_Poales_LOC_Os11g05935_1
Rosid_Brassicales_AtAGP9
BasalAngiosperm_Amborellales_Amtri_ERM96654*
BasalAngiosperm_Amborellales_Amtri_ERN01113*
72
52
Figure 6. Phylogenetic analysis of class 1 GPI-AGPs
to identify AtAGP6/11 orthologs. ML tree (MEGA) was
constructed
using
putative
AtAGP6/11
orthologs
identified predominantly from 1KP transcriptomes and
HMMER model 1 (see Supplemental Table S3), with a
few sequences from other sources to increase the
breadth of sampling (see Methods; Supplemental Table
S4). The tree also includes at least one sequence from
each Arabidopsis GPI-AGP sub-clade, representative rice
GPI-AGPs identified by MAAB (see Fig. 2, Johnson et al.,
companion paper) and four GPI-AGPs from Amborella
trichopoda
(Amtri_ERM96654,
Amtri_ERN01113
and
Amtri_ERM95342,
Amtri_ERN06202).
Sequence
names are coloured by 1KP group: basal angiosperms
(grey), non-commelinid monocots (purple), commelinid
monocots (pink), basal eudicots (aqua), core eudicots
(green), asterids (red) and rosids (orange). Numbers on
the
nodes
represent
support
with
100
bootstrap
replicates (≥70 green, 60-69 orange, 40-59 black). 1KP
sequences
are
identified
by
Group_Order_1KP
identifier_sequence locus. Genomic sequences and other
sequences from NCBI follow a similar format, but include
a five letter abbreviation of genus and species (see
Methods). Symbols next to sequence names are used to
indicate the source and MAAB class of sequences and
other relevant information. An asterisk (*) after the
0.5
C
P
Phytozome v9, GPI-AGP (MAAB class 1)
NCBI or Phytozome v11, GPI-AGP
1KP multK, GPI-AGP (class 1)
1KP multK, GPI-AGP (class 1), contaminant, same order
1KP multK, class 4, partial GPI-AGP or big-PI false neg
1KP sequence used as query at Phytozome/NCBI
AtAGP6 or OsAGP7 used as query at Phytozome/NCBI
Sequence used to generate AGP6/11 HMMER model
AtAGP6 ortholog identified by HMMER model
False positive, HMMER model
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
A.
AGP17/18 subclade, XP1-2 dominant
M
Q
scattered K
K-rich
Figure 7. Schematic representation of the GPI-AGP subclades showing the presence and distribution of specific
amino acids in the PAST-rich protein backbone. Order of
GPI-AGPs is based on clades in angiosperm GPI-AGP tree
B.
AGP9 subclade, [S/T]P3 dominant
M
Q
(Supplemental Fig. S3); figure part numbers (A to K) are
scattered K
based on alignments (Supplemental Fig. S4) and the
AtAGP5/10 (G) subclade is included for comparison (based
C. AGP4/7/10 D. AGP2/3 G/H. AGP5/10 I. AGP1
M
subclade(s)
Q
on AGP-c subclade (see Fig. 2C, Johnson et al., companion
paper). A, Putative orthologs of Lys-rich AtAGP17/18 have
scattered lysine (K) residues throughout the sequence and
E.
AGP58 subclade
Q
M
contain a short K-rich domain near the C-terminus. The
scattered M
glycomotifs are predominantly XP1-2. B, The AGP9 subclade also has a short Lys-rich region with ≥3 K residues
F.
and scattered K residues, however, the glycomotifs are
AGP25/26/27 subclade
M
distinct from AtAGP17/18, being predominantly [S/T]P3.
AGP4/7/10 (C), AGP2/3 (D), AGP5/10 (G), AGP5 (H) and
AGP1(I) sub-clades have a ‘classical’ PAST-rich backbone
J/K.
AGP6/11/59 subclade
M
with no obvious bias towards other amino acids. E, Most of
scattered K scattered D/E
the putative orthologs of AtAGP58 are relatively methionine
(M)-rich,
with
scattered
M
residues
between
AGP
glycomotifs and contain a Q residue at the presumed NER signal
PAST-rich backbone
GPI signal
terminus. This Q residue at the presumed mature N-
ER or GPI
cleavage site
terminus is also found in putative orthologs of Lys-rich
AtAGP17/18
(A),
AtAGP9
(B),
AtAGP1/2/3/4/7/5/10
(C,D,G,H,I), and AtAGP58 (E), but not AtAGP25/26/27 (F),
or
AGP6/11/59
(J/K).
J/K,
Putative
orthologs
of
AtAGP6/11/59 share scattered K residues, concentrated in
the first half of the protein and also a small cluster of acidic
residues, either aspartic acid (D) or glutamic acid (E).
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
Parsed Citations
Adair WS, Hwang C, Goodenough UW (1983) Identification and visualization of the sexual agglutinin from the mating-type plus
flagellar membrane of Chlamydomonas. Cell 33: 183-193
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
APG IV (2016) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV.
Botanical Journal of the Linnean Society 181: 1-20
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Averyhart-Fullard V, Datta K, Marcus A (1988) A hydroxyproline-rich protein in the soybean cell wall. Proc. Natl. Acad. Sci. USA 85:
1082-1085
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Bacic A, Harris PJ, Stone BA (1988) Structure and function of plant cell walls. In J Priess, ed, The Biochemistry of Plants, Vol 14.
Academic Press, New York, pp 297-371
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Basile DV, Basile MR (1987) The Occurrence of Cell Wall-Associated Arabinogalactan Proteins in the Hepaticae. Bryologist 90: 401404
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Bernhardt C, Tierney ML (2000) Expression of AtPRP3, a proline-rich structural cell wall protein from Arabidopsis, is regulated by
cell-type-specific developmental pathways involved in root hair formation. Plant Physiology 122: 705-714
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Biegert A, Soding J (2008) De novo identification of highly diverged protein repeats by probabilistic consistency. Bioinformatics
24: 807-814
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Bollig K, Lamshoft M, Schweimer K, Marner FJ, Budzikiewicz H, Waffenschmidt S (2007) Structural analysis of linear
hydroxyproline-bound O-glycans of Chlamydomonas reinhardtii--conservation of the inner core in Chlamydomonas and land
plants. Carbohydr Res 342: 2557-2566
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Borner GHH, Sherrier DJ, Weimar T, Michaelson LV, Hawkins ND, MacAskill A, Napier JA, Beale MH, Lilley KS, Dupree P (2005)
Analysis of detergent-resistant membranes in Arabidopsis. Evidence for plasma membrane lipids rafts. Plant Physiology 137: 104116
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Bowman JL (2013) Walkabout on the long branches of plant evolution. Current Opinion in Plant Biology 16: 70-77
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Cannon MC, Terneus K, Hall Q, Tan L, Wang Y, Wegenhart BL, Chen L, Lamport DTA, Chen Y, Kieliszewski MJ (2008) Selfassembly of the plant cell wall requires an extensin scaffold. Proceedings of the National Academy of Sciences, USA 105: 22262231
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Carpita NC, Gibeaut DM (1993) Structural models of primary cell walls in flowering plants: consistency of molecular structure with
the physical properties of the walls during growth. Plant Journal 3: 1-30
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Chaw SM, Chang CC, Chen HL, Li WH (2004) Dating the monocot-dicot divergence and the origin of core eudicots using whole
chloroplast genomes. Journal of Molecular Evolution 58: 424-441
Pubmed: Author and Title
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Chen J, and Varner JE (1985) Isolation and characterization of cDNA clones for carrot extensin and a proline-rich 33-kDa protein.
Proc Natl Acad Sci USA 82: 4399-4403
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Choudhary P, Saha P, Ray T, Tang Y, Yang D, Cannon MC (2015) EXTENSIN18 is required for full male fertility as well as normal
vegetative growth in Arabidopsis. Front Plant Sci 6: 553
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Coimbra S, Costa M, Jones B, Mendes MA, Pereira LG (2009) Pollen grain development is compromised in Arabidopsis agp6 agp11
null mutants. J Exp Bot 60: 3133-3142
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Coimbra S, Costa M, Mendes MA, Pereira AM, Pinto J, Pereira LG (2010) Early germination of Arabidopsis pollen in a double null
mutant for the arabinogalactan protein genes AGP6 and AGP11. Sexual Plant Reproduction 23: 199-205
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Cole TCH, Hilger HH (2013) Bryophyte phylogeny, viewed 07/09/2015, http://www2.biologie.fu-berlin.de/sysbot/poster/BPP-E.pdf.
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Costa M, Nobre MS, Becker JD, Masiero S, Amorim MI, Pereira LG, Coimbra S (2013) Expression-based and co-localization
detection of arabinogalactan protein 6 and arabinogalactan protein 11 interactors in Arabidopsis pollen and pollen tubes. BMC
Plant Biol 13: 7
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Costa ML, Sobral R, Ribeiro Costa MM, Amorim MI, Coimbra S (2015) Evaluation of the presence of arabinogalactan proteins and
pectins during Quercus suber male gametogenesis. Ann Bot 115: 81-92
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Datta K, Schmidt A, Marcus A (1989) Characterization of two soybean repetitive proline-rich proteins and a cognate cDNA from
germinated axes. Plant Cell 1: 945-952
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Doblin MS, Johnson KL, Humphries J, Newbigin EJ, Bacic A (2014) Are designer plant cell walls a realistic aspiration or will the
plasticity of the plant's metabolism win out? Current Opinion in Biotechnology 26: 108-114
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Doblin MS, Pettolino F, Bacic A (2010) Plant cell walls: the skeleton of the plant world. Functional Plant Biology 37: 357-381
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Domozych DS, Ciancia M, Fangel JU, Mikkelsen MD, Ulvskov P, Willats WG (2012) The Cell Walls of Green Algae: A Journey
through Evolution and Diversity. Front Plant Sci 3: 82
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Domozych DS, Wilson R, Domozych CR (2009) Photosynthetic eukaryotes of freshwater wetland biofilms: adaptations and
structural characteristics of the extracellular matrix in the green alga, Cosmarium reniforme (Zygnematophyceae, Streptophyta).
Journal of Eukaryotic Microbiology 56: 314-322
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Draeger C, Ndinyanka Fabrice T, Gineau E, Mouille G, Kuhn BM, Moller I, Abdou MT, Frey B, Pauly M, Bacic A, Ringli C (2015)
Arabidopsis leucine-rich repeat extensin (LRX) proteins modify cell wall composition and influence plant growth. BMC Plant Biol
15: 155
Pubmed: Author and Title
CrossRef: Author and Title
Downloaded
Google Scholar: Author Only Title Only Author
and Title from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 17921797
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460-2461
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Eisenhaber B, Wildpaner M, Schultz CJ, Borner GHH, Dupree P, Eisenhaber F (2003) Glycosylphosphatidylinositol lipid anchoring
of plant proteins. Sensitive prediction from sequence- and genome-wide studies for Arabidopsis and rice. Plant Physiology 133:
1691-1701
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Ellis M, Egelund J, Schultz CJ, Bacic A (2010) Arabinogalactan-proteins: key regulators at the cell surface? Plant Physiology 153:
403-419
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Ferris PJ, Waffenschmidt S, Umen JG, Lin H, Lee JH, Ishida K, Kubo T, Lau J, Goodenough UW (2005) Plus and minus sexual
agglutinins from Chlamydomonas reinhardtii. Plant Cell 17: 597-615
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Ferris PJ, Woessner JP, Waffenschmidt S, Kilz S, Drees J, and Goodenough UW (2001) Glycosylated polyproline II rods with kinks
as a structural motif in plant hydroxyproline-rich glycoproteins. Biochemistry 40: 2978-2987
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Filmus J, Capurro M, Rast J (2008) Glypicans. Genome Biol. 9: 224
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Fincher GB, Stone BA, Clarke AE (1983) Arabinogalactan-proteins: structure, biosynthesis and function. Annual Review of Plant
Physiology 34: 47-70
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Fong C, Kieliszewski MJ, Zacks Rd, Leykam JF, Lamport DTA (1992) A gymnosperm extensin contains the serinetetrahydroxyproline motif. Plant Physiology 99: 548-552
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Fowler TJ, Bernhardt C, Tierney ML (1999) Characterization and expression of four proline-rich cell wall protein genes in
Arabidopsis encoding two distinct subsets of multiple domain proteins. Plant Physiology 121: 1081-1091
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Gaspar Y, Johnson KL, McKenna JA, Bacic A, Schultz CJ (2001) The complex structures of arabinogalactan-proteins and the
journey towards understanding function. Plant Mol Biol. 47: 161-176
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge YC, Gentry J, Hornik K, Hothorn T,
Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang JH
(2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 5
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Gerken TA, Revoredo L, Thome JJ, Tabak LA, Vester-Christensen MB, Clausen H, Gahlay GK, Jarvis DL, Johnson RW, Moniz HA,
Moremen K (2013) The lectin domain of the polypeptide GalNAc transferase family of glycosyltransferases (ppGalNAc Ts) acts as a
switch directing glycopeptide substrate glycosylation in an N- or C-terminal direction, further controlling mucin type Oglycosylation. J Biol Chem 288: 19900-19914
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
Godl K, Hallmann A, Wenzl S, Sumper M (1997) Differential targeting of closely related ECM glycoproteins: The pherophorin family
from Volvox. EMBO Journal 16: 25-34
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Gotelli IB, Cleland R (1968) Differences in the occurrence and distribution of hydroxyproline-proteins among the algae. Am J Bot
55: 907-914
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Grennan AK (2007) Lipid rafts in plants. Plant Physiol 143: 1083-1085
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Gruenheit N, Deusch O, Esser C, Becker M, Voelckel C, Lockhart P (2012) Cutoffs and k-mers: implications from a transcriptome
study in allopolyploid plants. Bmc Genomics 13
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Harris PJ (2005) Diversity in plant cell walls In RJ Henry, ed, Plant Diversity and Evolution: Genotypic and Phenotypic Variation in
Higher Plants. CAB International Publishing, Wallingford, pp 201-227
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Hashimoto K, Tokimatsu T, Kawano S, Yoshizawa AC, Okuda S, Goto S, Kanehisa M (2009) Comprehensive analysis of
glycosyltransferases in eukaryotic genomes for structural and functional characterization of glycans. Carbohydrate Research 344:
881-887
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Herve C, Simeon A, Jam M, Cassin A, Johnson KL, Salmean AA, Willats WG, Doblin MS, Bacic A, Kloareg B (2016) Arabinogalactan
proteins have deep roots in eukaryotes: identification of genes and epitopes in brown algae and their role in Fucus serratus
embryo development. New Phytol 209: 1428-1441
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Ishizaki K, Nishihama R, Yamato KT, Kohchi T (2016) Molecular Genetic Tools and Techniques for Marchantia polymorpha
Research. Plant Cell Physiol 57: 262-270
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Jermyn MA, Yeow YM (1975) A class of lectins present in the tissues of seed plants. Australian Journal of Plant Physiology 2: 501531
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Johnson MTJ, Carpenter EJ, Tian Z, Bruskiewich R, Burris JN, Carrigan CT, Chase MW, Clarke ND, Covshoff S, dePamphilis CW,
Edger PP, Goh F, Graham S, Greiner S, Hibberd JM, Jordon-Thaden I, Kutchan TM, Leebens-Mack J, Melkonian M, Miles N,
Myburg H, Patterson J, Pires JC, Ralph P, Rolf M, Sage RF, Soltis D, Soltis P, Stevenson D, Stewart CN, Jr., Surek B, Thomsen
CJM, Villarreal JC, Wu X, Zhang Y, Deyholos MK, Wong GK-S (2012) Evaluating methods for isolating total RNA and predicting the
success of sequencing phylogenetically diverse plant transcriptomes. Plos One 7: e50226
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Jorda J, Kajava AV (2009) T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm.
Bioinformatics 25: 2632-2638
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Kieliszewski M, Leykam JF, Lamport DTA (1990) Structure of the threonine-rich extensin from Zea mays. Plant Physiology 85: 823827
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Kieliszewski MJ, Lamport DTA (1994) Extensin: repetitive motifs, functional sites, post-translational codes, and phylogeny. Plant
Journal 5: 157-172
Pubmed: Author and Title
CrossRef: Author and Title
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
Google Scholar: Author Only Title Only Author and Title
Kitazawa K, Tryfona T, Yoshimi Y, Hayashi Y, Kawauchi S, Antonov L, Tanaka H, Takahashi T, Kaneko S, Dupree P, Tsumuraya Y,
Kotake T (2013) beta-galactosyl Yariv reagent binds to the beta-1,3-galactan of arabinogalactan proteins. Plant Physiol 161: 11171126
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Kleis-San Francisco SM, Tierney ML (1990) Isolation and characterization of a proline-rich cell wall protein from soybean
seedlings. Plant Physiology 94: 1897-1902
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Knox JP, Linstead PJ, Peart JM, Cooper C, Roberts K (1991) Developmentally regulated epitopes of cell surface arabinogalactan
proteins and their relation to root tissue pattern formation. Plant Journal 1: 317-326
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Kofuji R, Hasebe M (2014) Eight types of stem cells in the life cycle of the moss Physcomitrella patens. Curr Opin Plant Biol 17: 1321
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Kurotani A, Sakurai T (2015) In Silico Analysis of Correlations between Protein Disorder and Post-Translational Modifications in
Algae. Int J Mol Sci 16: 19812-19835
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Lamport DTA (1977) Structure, biosynthesis and significance of cell wall glycoproteins. Recent Advances in Phytochemistry 11: 79115
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Lamport DTA, Kieliszewski MJ, Chen Y, Cannon MC (2011) Role of the extensin superfamily in primary cell wall architecture. Plant
Physiology 156: 11-19
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Lamport DTA, Miller DH (1971) Hydroxyproline arabinosides in the plant kingdom. Plant Physiol. 48: 454-456
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Lee KJD, Sakata Y, Mau SL, Pettolino F, Bacic A, Quatrano RS, Knight CD, Knox JP (2005) Arabinogalactan proteins are required
for apical cell extension in the moss Physcomitrella patens. Plant Cell 17: 3051-3065
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Leliaert F, Smith DR, Moreau H, Herron MD, Verbruggen H, Delwiche CF, De Clerck O (2012) Phylogeny and molecular evolution of
the green algae. Critical Reviews in Plant Sciences 31: 1-46
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Levitin B, Richter D, Markovich I, Zik M (2008) Arabinogalactan proteins 6 and 11 are required for stamen and pollen function in
Arabidopsis. Plant J 56: 351-363
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Li J, Ouyang B, Wang T, Luo Z, Yang C, Li H, Sima W, Zhang J, Ye Z (2016) HyPRP1 Gene Suppressed by Multiple Stresses Plays a
Negative Role in Abiotic Stress Tolerance in Tomato. Front Plant Sci 7: 967
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Li X-B, Kieliszewski M, Lamport DTA (1990) A chenopod extensin lacks repetitive tetrahydroxyproline blocks. Plant Physiology 92:
327-333
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Downloaded
on June
2017Diversity
- Published
Ligrone R, Vaughn KC, Renzaglia KS,
Knox JP,from
Duckett
JG 17,
(2002)
in by
thewww.plantphysiol.org
distribution of polysaccharide and glycoprotein
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
epitopes in the cell walls of bryophytes: new evidence for the multiple evolution of water-conducting cells. New Phytologist 156:
491-508
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Lindstrom JT, Vodkin LO (1991) A soybean cell wall protein is affected by seed color genotype. Plant Cell 3: 561-571
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Liu X, Wolfe R, Welch LR, Domozych DS, Popper ZA, Showalter AM (2016) Bioinformatic Identification and Analysis of Extensins in
the Plant Kingdom. PLoS One 11: e0150177
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
MacAlister CA, Ortiz-Ramírez C, Becker JD, Feijó JA, Lippman ZB (2016) Hydroxyproline O-arabinosyltransferase mutants
oppositely alter tip growth in Arabidopsis thaliana and Physcomitrella patens. Plant Journal 85: 193-208
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Majewska-Sawka A, Nothnagel EA (2000) The multiple roles of arabinogalactan proteins in plant development. Plant Physiology
122: 3-9
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Matasci N, Hung LH, Yan Z, Carpenter EJ, Wickett NJ, Mirarab S, Nguyen N, Warnow T, Ayyampalayam S, Barker M, Burleigh JG,
Gitzendanner MA, Wafula E, Der JP, DePamphilis CW, Roure B, Philippe H, Ruhfel BR, Miles NW, Graham SW, Mathews S, Surek
B, Melkonian M, Soltis DE (2014) Data access for the 1,000 plants (1KP) project. Gigascience 3: 1
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Moesa HA, Wakabayashi S, Nakai K, Patil A (2012) Chemical composition is maintained in poorly conserved intrinsically disordered
regions and suggests a means for their classification. Molecular Biosystems 8: 3262-3273
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Moller I, Sørensen I, Bernal AJ, Blaukopf C, Lee K, Øbro J, Pettolino F, Roberts A, Mikkelsen JD, Knox JP, Bacic A, Willats WGT
(2007) High-throughput mapping of cell-wall polymers within and between plants using novel microarrays. Plant Journal 50: 11181128
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Moore JP, Nguema-Ona E, Fangel JU, Willats WG, Hugo A, Vivier MA (2014) Profiling the main cell wall polysaccharides of
grapevine leaves using high-throughput and fractionation methods. Carbohydr Polym 99: 190-198
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Motose H, Sugiyama M, Fukuda H (2004) A proteoglycan mediates inductive interaction during plant vascular development. Nature
429: 873-878
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Nguema-Ona E, Coimbra S, Vicré-Gibouin M, Mollet J-C, Driouich A (2012) Arabinogalactan proteins in root and pollen-tube cells:
distribution and functional aspects. Annals of Botany 110: 383-404
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Pattathil S, Avci U, Baldwin D, Swennes AG, McGill JA, Popper Z, Bootten T, Albert A, Davis RH, Chennareddy C, Dong R, O'Shea
B, Rossi R, Leoff C, Freshour G, Narra R, O'Neil M, York WS, Hahn MG (2010) A comprehensive toolkit of plant cell wall glycandirected monoclonal antibodies. Plant Physiology 153: 514-525
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Paulsen BS, Craik DJ, Dunstan DE, Stone BA, Bacic A (2014) The Yariv reagent: behaviour in different solvents and interaction
with a gum arabic arabinogalactan-protein. Carbohydr Polym 106: 460-468
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
Pereira AM, Lopes AL, Coimbra S (2016) JAGGER, an AGP essential for persistent synergid degeneration and polytubey block in
Arabidopsis. Plant Signal Behav 11: e1209616
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Pereira AM, Masiero S, Nobre MS, Costa ML, Solis MT, Testillano PS, Sprunck S, Coimbra S (2014) Differential expression
patterns of arabinogalactan proteins in Arabidopsis thaliana reproductive tissues. J Exp Bot 65: 5459-5471
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Pereira AM, Pereira LG, Coimbra S (2015) Arabinogalactan proteins: rising attention from plant biologists. Plant Reprod 28: 1-15
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Popper ZA, Michel G, Herve C, Domozych DS, Willats WG, Tuohy MG, Kloareg B, Stengel DB (2011) Evolution and diversity of
plant cell walls: from algae to flowering plants. Annu Rev Plant Biol 62: 567-590
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Prochnik SE, Umen J, Nedelcu AM, Hallmann A, Miller SM, Nishii I, Ferris P, Kuo A, Mitros T, Fritz-Laylin LK, Hellsten U, Chapman
J, Simakov O, Rensing SA, Terry A, Pangilinan J, Kapitonov V, Jurka J, Salamov A, Shapiro H, Schmutz J, Grimwood J, Lindquist E,
Lucas S, Grigoriev IV, Schmitt R, Kirk D, Rokhsar DS (2010) Genomic analysis of organismal complexity in the multicellular green
alga Volvox carteri. Science 329: 223-226
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Schaefer L, Schaefer RM (2010) Proteoglycans: from structural compounds to signaling molecules. Cell Tissue Res 339: 237-246
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Schaper E, Korsunsky A, Pecerska J, Messina A, Murri R, Stockinger H, Zoller S, Xenarios I, Anisimova M (2015) TRAL: tandem
repeat annotation library. Bioinformatics 31: 3051-3053
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Schultz CJ, Gilson P, Oxley D, Youl JJ, Bacic A (1998) GPI-anchors on arabinogalactan-proteins: implications for signalling in
plants. Trends in Plant Science 3: 426-431
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Schultz CJ, Rumsewicz MP, Johnson KL, Jones BJ, Gaspar YM, Bacic A (2002) Using genomic resources to guide research
directions. The arabinogalactan protein gene family as a test case. Plant Physiology 129: 1448-1463
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Sherrier DJ, Taylor GS, Silverstein KA, Gonzales MB, VandenBosch KA (2005) Accumulation of extracellular proteins bearing
unique proline-rich motifs in intercellular spaces of the legume nodule parenchyma. Protoplasma 225: 43-55
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Shibaya T, Sugawara Y (2009) Induction of multinucleation by ß-glucosyl Yariv reagent in regenerated cells from Marchantia
polymorpha protoplasts and involvement of arabinogalactan proteins in cell plate formation. Planta 230: 581-588
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Shimizu M, Igasaki T, Yamada M, Yuasa K, Hasegawa J, Kato T, Tsukagoshi H, Nakamura K, Fukuda H, Matsuoka K (2005)
Experimental determination of proline hydroxylation and hydroxyproline arabinogalactosylation motifs in secretory proteins. Plant
Journal 42: 877-889
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Showalter AM, Basu D (2016) Extensin and arabinogalactan-protein biosynthesis: glycosyltransferases, research challenges, and
biosensors. Frontiers in Plant Science 7: 814
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
completeness with single-copy orthologs. Bioinformatics 31: 3210-3212
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Smith JJ, Muldoon EP, Willard JJ, Lamport DTA (1986) Tomato Extensin Precursors P1 and P2 Are Highly Periodic Structures.
Phytochemistry 25: 1021-1030
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Somerville C, Bauer S, Brininstool G, Facette M, Hamann T, Milne J, Osborne E, Paredez A, Persson S, Raab T, Vorwerk S,
Youngs H (2004) Toward a systems approach to understanding plant cell walls. Science 306: 2206-2211
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Sørensen I, Domozych D, Willats WGT (2010) How have plant cell walls evolved? Plant Physiology 153: 366-372
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Sørensen I, Pettolino FA, Bacic A, Ralph J, Lu F, O'Neill MA, Fei Z, Rose JKC, Domozych DS, Willats WGT (2011) The charophycean
green algae provide insights into the early origins of plant cell walls. Plant Journal 68: 201-211
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Szalkowski AM, Anisimova M (2013) Graph-based modeling of tandem repeats improves global multiple sequence alignment.
Nucleic Acids Res 41: e162
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol
Biol Evol 30: 2725-2729
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Tan L, Eberhard S, Pattathil S, Warder C, Glushka J, Yuan C, Hao Z, Zhu X, Avci U, Miller JS, Baldwin D, Pham C, Orlando R, Darvill
A, Hahn MG, Kieliszewski MJ, Mohnen D (2013) An Arabidopsis cell wall proteoglycan consists of pectin and arabinoxylan
covalently linked to an arabinogalactan protein. Plant Cell 25: 270-287
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Tan L, Leykam JF, Kieliszewski MJ (2003) Glycosylation motifs that direct arabinogalactan addition to arabinogalactan-proteins.
Plant Physiology 132: 1362-1369
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Tan L, Showalter AM, Egelund J, Hernandez-Sanchez A, Doblin MS, Bacic A (2012) Arabinogalactanproteinsandtheresearchchallengesfortheseenigmaticplantcellsurfaceproteoglycans. Front. Plant Sc 3: 1-10
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Tapken W, Murphy AS (2015) Membrane nanodomains in plants: capturing form, function, and movement. J Exp Bot 66: 1573-1586
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Tierney ML, Varner JE (1987) The extensins. Plant Physiol 84: 1-2
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Ulvskov P, Paiva DS, Domozych D, Harholt J (2013) Classification, naming and evolutionary history of gycosyltransferases from
sequenced green and red algal genomes. Plos One 8
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones DT, Kim PM,
Kriwacki RW, Oldfield CJ, Pappu RV, Tompa P, Uversky VN, Wright PE, Babu MM (2014) Classification of intrinsically disordered
regions and proteins. Chemical Reviews 114: 6589-6631
Pubmed: Author and Title
CrossRef: Author and Title
Downloaded
Google Scholar: Author Only Title Only Author
and Title from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.
van Holst G-J, Clarke AE (1985) Quantification of arabinogalactan-protein in plant extracts by single radial gel diffusion. Anal.
Biochem. 148: 446-450
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Velasquez M, Salgado Salter J, Gloazzo Dorosz J, Petersen BL, Estevez JM (2012) Recent advances on the posttranslational
modifications of EXTs and their roles in plant cell walls. Frontiers in Plant Science 3
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Velasquez SM, Marzol E, Borassi C, Pol-Fachin L, Ricardi MM, Mangano S, Juarez SP, Salter JD, Dorosz JG, Marcus SE, Knox JP,
Dinneny JR, Iusem ND, Verli H, Estevez JM (2015) Low Sugar Is Not Always Good: Impact of Specific O-Glycan Defects on Tip
Growth in Arabidopsis. Plant Physiol 168: 808-813
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Velasquez SM, Ricardi MM, Dorosz JG, Fernandez PV, Nadra AD, Pol-Fachin L, Egelund J, Gille S, Harholt J, Ciancia M, Verli H,
Pauly M, Bacic A, Olsen CE, Ulvskov P, Petersen BL, Somerville C, Iusem ND, Estevez JM (2011) O-glycosylated cell wall proteins
are essential in root hair growth. Science 332: 1401-1403
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Voxeur A, Hofte H (2016) Cell wall integrity signaling in plants: "To grow or not to grow that's the question". Glycobiology 26: 950960
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Wickett NJ, Mirarab S, Nam N, Warnow T, Carpenter E, Matasci N, Ayyampalayam S, Barker MS, Burleigh JG, Gitzendanner MA,
Ruhfel BR, Wafula E, Der JP, Graham SW, Mathews S, Melkonian M, Soltis DE, Soltis PS, Miles NW, Rothfels CJ, Pokorny L, Shaw
AJ, DeGironimo L, Stevenson DW, Surek B, Villarreal JC, Roure B, Philippe H, dePamphilis CW, Chen T, Deyholos MK, Baucom
RS, Kutchan TM, Augustin MM, Wang J, Zhang Y, Tian Z, Yan Z, Wu X, Sun X, Wong GK-S, Leebens-Mack J (2014)
Phylotranscriptomic analysis of the origin and early diversification of land plants. Proceedings of the National Academy of
Sciences of the United States of America 111: E4859-E4868
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Woessner JP, Goodenough UW (1989) Molecular characterization of a zygote wall protein: An extensin-like molecule in
Chlamydomonas reinhardtii. Plant Cell 1: 901-911
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Woessner JP, Molendijk AJ, Egmond Pv, Klis FM, Goodenough UW, Haring MA (1994) Domain conservation in several volvocalean
cell wall proteins. Plant Mol. Biol. 26: 947-960
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Xie Y, Wu G, Tang J, Luo R, Patterson J, Liu S, Huang W, He G, Gu S, Li S, Zhou X, Lam T-W, Li Y, Xu X, Wong GK-S, Wang J (2014)
SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30: 1660-1666
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Xu WL, Zhang DJ, Wu YF, Qin LX, Huang GQ, Li J, Li L, Li XB (2013) Cotton PRP5 gene encoding a proline-rich protein is involved
in fiber development. Plant Mol Biol 82: 353-365
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Yang Y, Smith SA (2013) Optimizing de novo assembly of short-read RNA-seq data for phylogenomics. BMC Genomics 14: 328
Pubmed: Author and Title
CrossRef: Author and Title
Google Scholar: Author Only Title Only Author and Title
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2017 American Society of Plant Biologists. All rights reserved.