Plant Physiology Preview. Published on April 26, 2017, as DOI:10.1104/pp.17.00295 1 Short Title: Evolution of the plant HRGP superfamily 2 3 Corresponding Author 4 Carolyn J. Schultz, School of Agriculture, Food and Wine, The University of Adelaide, Waite 5 Research Institute, PMB1 Glen Osmond, SA, 5064, Australia. Ph: 61 8 448 909 881. Email address: 6 [email protected] 7 8 Insights into the evolution of hydroxyproline rich glycoproteins from 1000 plant transcriptomes 9 10 Kim L. Johnson1*, Andrew M. Cassin1*, Andrew Lonsdale1, Gane Ka-Shu Wong2, Douglas E. Soltis3, 11 Nicholas W. Miles3, Michael Melkonian4, Barbara Surek4, Michael K. Deyholos5, James Leebens- 12 Mack6, Carl J. Rothfels7, Dennis W. Stevenson8, Sean W. Graham9, Xumin Wang10, Shuangxiu Wu10, 13 J. Chris Pires11, Patrick P. Edger12, Eric J. Carpenter2, Antony Bacic1, Monika S. Doblin1^ and Carolyn 14 J. Schultz13^ 15 1 16 Parkville, Vic, 3010, Australia 17 2 18 Canada, and BGI-Shenzhen, Bei Shan Industrial Zone, Yantian District, Shenzhen, China 19 3 20 FL32611, USA 21 4 Botanical Institute, Cologne Biocenter, University of Cologne, D50674, Germany 22 5 Department of Biology, University of British Columbia, Kelowna BC, V1V 1V7, Canada 23 6 Department of Plant Biology, University of Georgie, Athens, GA3062, USA ARC Centre of Excellence in Plant Cell Walls, School of BioSciences, The University of Melbourne, Departments of Biological Sciences and Medicine, University of Alberta, Edmonton, Alberta, Florida Museum of Natural History, Department of Biology, University of Florida, Gainsville, Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. Copyright 2017 by the American Society of Plant Biologists 1 24 7 25 CA 94720, USA 26 8 New York Botanical Garden, 2900 Southern Blvd, Bronx, NY 10458, USA 27 9 Department of Botany, University of British Columbia, Vancouver BC, V6T1Z4, Canada 28 10 29 Academy of Sciences, Beijing 100101, China 30 11 31 MO 65211, USA 32 12 Department of Horticulture, Michigan State University, East Lansing, MI 48823, USA 33 13 School of Agriculture, Food and Wine, The University of Adelaide, Waite Research Institute, PMB1 34 Glen Osmond, SA, 5064, Australia 35 * co-first contributors, ^ co-senior authors. University Herbarium and Department of Integrative Biology, University of California, Berkeley, CAS Key Laboratory of Genome Science and Information, Beijing Institute of Genomics, Chinese Division of Biological Sciences and Bond Life Sciences Center, University of Missouri, Columbia, 36 37 One-sentence summary 38 Evaluation of hydroxyproline-rich glycoproteins (HRGPs) identified in 1KP transcriptomic datasets to 39 gain insight into their evolutionary history from chlorophyte algae to flowering plants. 40 41 List of Author Contributions 42 D.E.S., N.W.M., M.M., B.S., M.K.D., J.L.M., C.J.R., D.W.S., S.W.G., J.C.P., P.P.E., X.W. and S.W. 43 selected species for analysis and collected plant and algal samples and M.M., B.S., M.K.D., J.C.P., 44 C.J.R., D.W.S., S.W., extracted RNA for sequencing. J.L.M. organized transcripts into gene families 45 and S.W. screened RNA-seq data for contamination. E.J.C. and G.K.S.W. generated the sequence 46 data, E.J.C. setup and maintained 1KP web-resources used to communicate data and G.K.S.W. 47 oversaw the 1KP project. A.M.C. generated the HRGP website and K.L.J., A.M.C., M.S.D., C.J.S. Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 2 48 and A.B. designed and oversaw the research. A.L. provided bioinformatics support and assisted in 49 maintenance of the HRGP website. K.J.L., A.M.C., M.S.D. and C.J.S. analyzed the data and together 50 with A.B. wrote the manuscript. 51 52 Funding Information 53 The authors wish to acknowledge the 1000 plants (1KP) initiative, which is led by GKSW and funded 54 by the Alberta Ministry of Enterprise and Advanced Education, Alberta Innovates Technology Futures 55 (AITF), Innovates Centre of Research Excellence (iCORE), and Musea Ventures. KLJ, AMC, AL, 56 MSD & AB acknowledge the support of a grant from the Australia Research Council to the ARC 57 Centre of Excellence in Plant Cell Walls (CE1101007) that is partly funded by the Australian 58 Government’s National Collaborative Research Infrastructure Program (NCRIS) administered through 59 Bioplatforms Australia (BPA) Ltd. 60 Computation Initiative (VLSCI) grant number VR0191 on its Peak Computing Facility at the 61 University of Melbourne, an initiative of the Victorian Government, Australia. This research was supported by a Victorian Life Sciences 62 63 Corresponding Author 64 Email address: [email protected] 65 66 Keywords 67 Hydroxyproline-rich glycoproteins (HRGPs), arabinogalactan-proteins (AGPs), extensins (EXTs), 68 proline-rich proteins (PRPs), transcriptomics, plant evolution, 1000 plants (1KP), 69 glycosylphosphatidylinositol (GPI)-anchors, intrinsically disordered proteins (IDP) Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 3 70 Abstract 71 The carbohydrate-rich cell walls of land plants and algae have been the focus of much interest given 72 the value of cell wall based products to our current and future economies. Hydroxyproline-rich 73 glycoproteins (HRGPs), a major group of wall glycoproteins, play important roles in plant growth and 74 development, yet little is known about how they have evolved in parallel with the polysaccharide 75 components of walls. We investigate the origins and evolution of the HRGP superfamily, which is 76 commonly divided into three major multigene families: the arabinogalactan-proteins (AGPs), 77 extensins (EXTs) and proline-rich proteins (PRPs). 78 Using MAAB, a newly developed bioinformatics pipeline, we identified HRGPs in sequences from 79 the 1000 plants (1KP) transcriptome project (www.onekp.com). Our analyses provide new insights 80 into the evolution of HRGPs across major evolutionary milestones, including the transition to land and 81 early radiation of angiosperms. Significantly, data mining reveals the origin of GPI-anchored AGPs in 82 green algae and a 3-4 fold increase in GPI-AGPs in liverworts and mosses. The first detection of 83 cross-linking (CL)-EXTs is observed in bryophytes, which suggest that CL-EXTs arose though the 84 juxtaposition of pre-existing SPn EXT glycomotifs with refined Y-based motifs. We also detected the 85 loss of CL-EXT in a few lineages, including the grass family (Poaceae), that have a cell wall 86 composition distinct from other monocots and eudicots. A key challenge in HRGP research is tracking 87 individual HRGPs throughout evolution. Using the 1KP output we were able to find putative orthologs 88 of Arabidopsis pollen-specific GPI-AGPs in basal eudicots. 89 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 4 90 Introduction 91 Cell walls of plants and algae are widely used for food, textiles, paper and timber, yet our 92 understanding of their assembly and dynamic re-modelling in response to growth, development and 93 environmental stresses (abiotic and biotic) remains rudimentary (Doblin et al., 2010; Doblin et al., 94 2014). There are two contrasting types of walls/extracellular matrices in the green plant lineage, 95 protein-rich walls of some green algae and the cellulose-rich walls of embryophytes (land plants). 96 Recent studies of cell wall evolution suggest that the origins of many wall components occurred in the 97 streptophyte green algae prior to the evolution of embryophytes (Sørensen et al., 2010; Popper et al., 98 2011; Domozych et al., 2012) with elaboration of a pre-existing set of polysaccharides rather than an 99 entirely new polymer framework. Although both the ultrastructure of plant walls and the fine structure 100 of their component polymers varies widely, they all typically comprise a fibrillar network of cellulose 101 microfibrils that is embedded within a gel-like matrix of non-cellulosic polysaccharides, pectins and 102 (glyco)proteins; the latter being both structural and enzymatic (Somerville et al., 2004; Doblin et al., 103 2010). 104 105 Research to date has largely focused on the evolution of the polysaccharides, cellulose and the non- 106 cellulosic polysaccharides (hemicelluloses) and pectins, primarily through the availability of 107 polysaccharide epitope-specific antibodies coupled with immuno-fluorescence and/or transmission 108 electron microscopy and arraying techniques such as comprehensive microarray polymer profiling, 109 (CoMPP), and plate-based arrays (Moller et al., 2007; Pattathil et al., 2010; Moore et al., 2014). In 110 contrast, relatively little is known about the origin and evolution of the hydroxyproline-rich 111 glycoproteins (HRGPs), a major group of wall glycoproteins, despite their widespread occurrence 112 (Supplemental Table S1) and importance in plant growth and development (reviewed in Fincher et 113 al., 1983; Kieliszewski and Lamport, 1994; Majewska-Sawka and Nothnagel, 2000; Ellis et al., 2010; 114 Lamport et al., 2011; Draeger et al., 2015; Velasquez et al., 2015; Showalter and Basu, 2016). 115 Examination of these glycoproteins in a wider range of plant species should provide valuable insights 116 into how they have evolved in parallel with the polysaccharide components in plant walls. Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 5 117 118 HRGPs are commonly divided into three major multigene families, the highly glycosylated 119 arabinogalactan-proteins (AGPs), the moderately glycosylated extensins (EXTs) and minimally 120 glycosylated proline-rich proteins (PRPs). The protein backbones of these HRGPs are distinguished 121 by differences in amino acid bias and the repeat motifs characteristic to each family, and include the 122 post-translational modification of proline (Pro, P) to hydroxyproline (Hyp, O) (see Johnson et al., 123 companion paper). Each of these families is further elaborated by chimeric and hybrid HRGPs, 124 complicating not only the classification of members but also elucidation of their functions. 125 126 Classical AGPs consist of up to 99% carbohydrate due to the addition of large type II arabino-3,6- 127 galactan (AG) polysaccharides (degree of polymerization, DP, 30-120) on to the minor (1-10%) 128 protein backbone. The complexity of this family is immense and lies both in the diversity of protein 129 backbones containing AGP glycomotifs and the incredible heterogeneity of glycan structures and 130 sugars decorating these proteins. The specific binding of AGPs to the dye β-glucosyl Yariv reagent 131 (β-Glc Yariv) and the generation of AGP-glycan specific antibodies have provided valuable insight 132 into their function, structure and evolution (Supplemental Table S1) (Jermyn and Yeow, 1975; van 133 Holst and Clarke, 1985; Kitazawa et al., 2013; Paulsen et al., 2014). These studies suggest AGPs are 134 evolutionarily ancient. O-glycosylation occurs in bryophytes, some chlorophytes and brown algae as 135 shown by binding of β-Glc Yariv and/or isolation of Hyp-containing glycoproteins (Godl et al., 1997; 136 Ligrone et al., 2002; Lee et al., 2005; Shibaya and Sugawara, 2009; Herve et al., 2016) (Johnson et al., 137 companion paper). Classical AGPs are likely to exist in the chlorophyte algae as one of the first- 138 cloned HRGP genes (NCBI AAB23258.2) from Chlamydomonas reinhardtii, CrZSP1 (Woessner and 139 Goodenough, 1989), is predicted to be a GPI-AGP according to our criteria (Johnson et al., 140 companion paper). It is likely that AGPs in the chlorophyte algae have distinct functions from 141 streptophytes, as no large AG polysaccharides have been reported (Ferris et al., 2001). Short, 142 branched (Ferris et al., 2001) and linear Ara- and galactose (Gal)-containing oligosaccharides (Bollig 143 et al., 2007) have been isolated from C. reinhardtii walls. Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 6 144 145 The highly glycosylated AGPs are soluble and have been proposed to act as mobile signals, 146 potentially involved in cell-cell communication (Schultz et al., 1998; Motose et al., 2004; Pereira et 147 al., 2014). Linking of AGPs to the plasma membrane by a GPI-anchor is significant, as it points 148 towards their increased organization within the cell wall-plasma membrane interface. The finding of 149 GPI-AGPs in lipid ‘rafts/nano-domains’ in eudicots is intriguing as these have been proposed to 150 stabilize and increase the activity of proteins and protein complexes at the plasma membrane involved 151 in cell-cell communication, signal transduction, immune responses, and transport (Borner et al., 2005; 152 Grennan, 2007; Tapken and Murphy, 2015). It is tempting to speculate on a role for GPI-AGPs in 153 similar processes in all species in which they occur, although experimental evidence is largely 154 lacking. 155 156 AGPs have been most studied in embryophytes and have been isolated from a wide range of tissues, 157 including seeds, roots, stems, leaves, inflorescences, and are particularly abundant in the stems of 158 gymnosperms and angiosperms (Gaspar et al., 2001). The co-occurrence of specific glycan epitopes 159 with developmental stages, the expression pattern of AGP genes, and mutant studies has implicated 160 AGPs in many processes. These include roles in plant growth and development, cell expansion and 161 division, hormone signaling, embryogenesis of somatic cells, differentiation of xylem and responses 162 to abiotic stress. The heterogeneity of the AGP family and vast number of family members suggests 163 they are multifunctional, similar to what is found in mammalian proteoglycan/glycoproteins and 164 intrinsically disordered proteins (IDPs) (Filmus et al., 2008; Schaefer and Schaefer, 2010; Tan et al., 165 2012). Given that diverse members could have the same glycosylation and conversely that the same 166 backbone could have alternate glycosylated forms, determining the function of any single AGP and 167 the establishment of precise AGP function is a major challenge. Despite the magnitude of this task, 168 enormous progress has been made. For example, two well characterized AGPs are the pollen-specific 169 AtAGP6 and AtAGP11 that play important roles in pollen development and female tissue 170 interactions, and are suggested to be involved in multiple, integrated signaling pathways (Nguema- Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 7 171 Ona et al., 2012). Investigation of when these AGPs evolved would aid us to ascertain their specific 172 functions in pollen development. 173 174 The EXTs are an equally diverse family of HRGPs whose members contain repetitive SO3-5 motifs 175 that direct moderate glycosylation with Ara-oligosaccharides (arabinosides) (DP of 1 to 4). Classical 176 EXTs play important structural roles through cross-linking into the wall, facilitated by Tyr (Y)-based 177 motifs such as VXY and YXY (Tierney and Varner, 1987; Lamport et al., 2011). This cross-linking 178 into the wall can have essential roles in plant development, for example the requirement for 179 Arabdiopsis AtEXT3 during embryogenesis for cell plate formation (Cannon et al., 2008) and 180 AtEXT18 in pollen cell wall integrity (Choudhary et al., 2015). Cross-linking of EXTs into the wall is 181 proposed to result in the cessation of cell expansion, and yet some EXTs are important for cell 182 elongation in root hairs (Velasquez et al., 2011; Velasquez et al., 2015). 183 glycosylation on EXT-like motifs in the moss Physcomitrella patens has recently been shown through 184 mutants in Hyp O-arabinosyltransferases (HPATs) (MacAlister et al., 2016). Reduced levels of cell 185 wall-associated Hyp-arabinosides (as found in EXTs) in P. patens resulted in increased elongation of 186 protonemal tip cells. Therefore, different classes of EXTs have evolved different biological roles and 187 these functions are governed by both their protein backbone and glycosylation sequences and patterns. The importance of 188 189 The first EXTs were isolated from tomato (Lamport, 1977) and proteins with blocks of SO3-5 motifs 190 have been extracted from other angiosperms such as tobacco, carrot, sugar beet and alfalfa (Chen and 191 and Varner, 1985; Smith et al., 1986; Li et al., 1990) as well as other green plant lineages, including 192 gymnosperms (Fong et al., 1992), the likely sister group to angiosperms; and chlorophycean alga 193 (Woessner and Goodenough, 1989), representing the sister to the remaining green plants. EXTs from 194 these other green plant lineages can have distinct features from embryophyte EXTs, and we use the 195 term cross-linking (CL)-EXTs to distinguish EXTs with alternating SOn and Y-based motifs, which 196 predominantly occur in land plants, from the HRGPs that have Y residues in a different domain from Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 8 197 the SOn-rich domains, as occurs in volvocine algae (see Fig. 1, Johnson et al., companion paper). 198 Exceptions may exist, for example no ‘classical’ EXTs were detected in the Hyp-poor cell walls of 199 maize, which also do not have detectable isodityrosine (IDT), an intramolecular cross-link that occurs 200 between Y-residues in EXTs (Kieliszewski et al., 1990; Kieliszewski and Lamport, 1994). 201 202 The PRPs are the most difficult family of HRGPs to define, as very few have been studied to date. 203 Investigation of PRPs has largely been restricted to legumes and Arabidopsis where they have been 204 shown to be minimally glycosylated, if at all (Averyhart-Fullard et al., 1988; Datta et al., 1989; Kleis- 205 San Francisco and Tierney, 1990; Lindstrom and Vodkin, 1991; Fowler et al., 1999). The majority of 206 PRPs are chimeric as they contain an Ole PFAM domain, and they have been implicated in roles in 207 root development (Bernhardt and Tierney, 2000), stress responses (Li et al., 2016), fiber development 208 (Xu et al., 2013) and nodule formation (Sherrier et al., 2005). Despite these important functions, 209 information on the origins and evolution of PRPs remains fundamentally unexplored. 210 211 Collectively, the functional characterization of HRGPs points towards their involvement in many 212 processes within the wall, including signaling and cell wall integrity pathways (Costa et al., 2013; 213 Pereira et al., 2015; Pereira et al., 2016; Voxeur and Hofte, 2016). As plants evolved multicellularity 214 and transitioned to land it is likely that greater complexity in signaling networks was required, a role 215 that could seemingly be fulfilled by HRGPs due to their distinct protein and glycan characteristics. It 216 is currently unknown if the occurrence of specific HRGPs is associated with evolutionary milestones 217 and specialization of plant cell walls. Greater insight into the origins, evolution and diversity of 218 HRGPs will enable researchers to address these key issues. 219 220 To study HRGP function throughout plant evolution, we developed a robust method for their detection 221 that exploits the amino acid bias of the predicted protein backbone and characteristic glycomotifs (see 222 Johnson et al., companion paper). The motif and amino acid bias (MAAB) pipeline classifies HRGP Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 9 223 sequences into one of 24 pre-defined descriptive sub-classes (23 HRGP classes and one MAAB class) 224 and is optimized for recovery of GPI-AGPs, non-GPI-AGPs and cross-linking (CL)-EXTs. Here, we 225 use this automated bioinformatics pipeline to identify and classify HRGPs within the extensive 1000 226 plant (1KP) transcriptome data. 227 228 The 1KP project (www.onekp.com) has generated large-scale transcriptomic data for over 1000 plant 229 and algal species, thereby providing deeper sampling of key plant taxa beyond those with sequenced 230 genomes (Johnson et al., 2012; Matasci et al., 2014; Wickett et al., 2014). Species selected include 231 mostly non-model land plants, many of which are important for medicine, agriculture, forestry and 232 biodiversity/conservation, as well as red algae (Rhodophyta), green algae (chlorophytes and algal 233 streptophytes) and taxa from a small group of freshwater unicellular algae known as glaucophytes - 234 collectively termed the Archaeplastida (Plantae). Analysis of the 1KP data is already helping resolve 235 long-standing questions, such as from which streptophyte green algal lineage the progenitors of land 236 plants likely arose (Wickett et al., 2014). 237 unprecedented opportunity to investigate HRGP superfamily evolution and determine if changes 238 coincide with major evolutionary transitions in the Archaeplastida. The 1KP transcriptome dataset therefore offers an 239 240 A tandem repeat annotation library (TRAL) is also used to identify repeats in MAAB sequences and 241 support findings of selective loss and/or gain of CL-EXTs in bryophytes and commelinid monocots. 242 A combination of BLAST and profile hidden Markov models was used to search for putative 243 orthologs of pollen-specific AtAGP6/11 in angiosperms (flowering plants) with hits validated by 244 phylogenetic analysis. These vignettes demonstrate the broad utility of the bioinformatics pipeline on 245 the 1KP data, providing evolutionary insights into the complexity of the HRGP superfamily. 246 247 Results and Discussion 248 MAAB pipeline for GPI-AGPs and CL-EXTs Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 10 249 Using the newly developed bioinformatics tool, the motif and amino acid bias (MAAB) pipeline (see 250 Johnson et al., companion paper), we evaluated the 1KP dataset for HRGPs. We set parameters in 251 MAAB to best detect the classical, non-chimeric GPI-AGPs, CL-EXTs, PRPs and non-GPI-AGPs 252 (classes 1-4, respectively) as well as the hybrid HRGPs (classes 5-23) using features of the well 253 characterized Arabidopsis sequences. MAAB was validated using 15 proteomes from Phytozome and 254 shown to reliably detect and accurately classify HRGPs into 23 descriptive sub-classes (see Johnson 255 et al., companion paper). MAAB represents a significant advance of previous methods to detect 256 HRGPs in being able to identify all classes in one pipeline, without manual curation. 257 258 MAAB did not identify any false positives in Phytozome datasets and evaluation of a subset of the 259 1KP datasets also suggests a very low rate of false positives (see Johnson et al., companion paper). In 260 Ranunculales, a well-sampled basal eudicot order (39 samples derived from 20 species), we analyzed 261 143 non-redundant GPI-AGP sequences (five Ranunculaceae and six Papaveraceae datasets; 262 Supplemental Fig. S1A) and observed 98.6% true positives, with only two sequences not being clear 263 GPI-AGPs (class 1) and four other sequences flagged as likely contaminants based on their 264 appearance in multiple unrelated datasets (see legend, Supplemental Fig. S1). The rate of false 265 positives for CL-EXTs (class 2) was estimated based on analysis of 45 non-redundant bryophyte 266 sequences (Supplemental Fig. S2) derived from the larger set of 67 identified within the 1KP data 267 (Fig. 1). Shading of motifs reveals that 42/45 sequences (93.3%) have the expected CL-EXT motifs 268 throughout the entire backbone. The three remaining sequences could be considered CL-EXTs, or 269 hybrid CL-EXTs-AGPs or AGPs as they have more AGP motifs than SPn motifs, but both motifs 270 types are present (Supplemental Fig. S2). In either case, they are definitive HRGPs suggesting the 271 true positive rate is 100% (45/45), although we cannot rule out that the sequences are partial chimeric 272 HRGPs until full length sequences become available. 273 274 Using the MAAB pipeline to identify and classify HRGPs in transcriptomic datasets spanning 275 large evolutionary timescales Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 11 276 Across the entire 1KP data, 45304 HRGPs (class 1 to 23) were identified (Fig. 1) using a multiple k- 277 mer approach (see Johnson et al., companion paper). The majority of sequences detected were in 278 HRGP classes 1 to 4 (38051) and MAAB class 24 (39933), with a grand total of 85237 sequences 279 across all 24 MAAB classes. We compared HRGPs identified for each of the 1KP taxonomic groups 280 (hereafter 281 http://www.onekp.com/samples/list.php), by calculating the percentage of sequences in the classical 282 (class 1-4) and minor HRGP classes (class 5-23) and the final MAAB class (class 24) out of the total 283 sequences classified by MAAB (Fig. 2A). The majority of 1KP groups had a similar proportion of 284 sequences in classes 1-4 (avg. ~50%), 5-23 (avg. ~10%) and 24 (avg. ~40%). However, the algal 285 groups including those well represented in the 1KP dataset (green algae, comprising chlorophytes and 286 algal streptophytes, and red algae and Chromista) exhibited a trend towards lower percentages of 287 HRGP sequences (classes 1-4 and 5-23) and a higher percentage of sequences in MAAB class 24 (Fig. 288 2A). The relatively low number of GPI-AGPs (class 1) and higher number of class 4 sequences most 289 likely reflects underlying functional differences in algal AGPs. The high number of class 24 290 sequences, likely either non-HRGPs or unknown HRGPs, suggests that algae have a more diverse 291 array of IDPs than land plants. The possibility that HRGP transcripts have not been detected within k- 292 mer assemblies is also possible and could be due to sequence quality and/or sampling issues (lower 293 than optimal number of datasets and tissue/s, see below). 1KP groups, which includes clades and grades, see 294 295 As a measure of the depth of taxon sampling, the number of taxonomic orders represented and the 296 number of samples within each 1KP group is reported in Fig. 2B. The green alga group are the best 297 represented of the algae with 34 orders and 152 samples included in our analyses and hence provide 298 the most comprehensive dataset for comparison. The average number of sequences for all MAAB 299 classes (1-24) is summarized by 1KP group (Fig. 2C). As each group represents a variable number of 300 species, and not all species in a group have the same HRGPs, we have also summarized the detection 301 rate as a heat map by 1KP group (Fig. 2C) and by taxonomic order (Supplemental Table S2). This 302 allowed us to assess whether detection of a particular HRGP class is either real or spurious. If ≥50% of Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 12 303 species in a given taxonomic order have ≥1 sequence for a given HRGP class, we report a >50% 304 detection rate, and assume this HRGP class is widespread throughout the taxonomic order. The few 305 class 2 (CL-EXTs) sequences identified in green algae and Chromista are likely sporadic contaminants 306 based on the low mean number of CL-EXTs (0.1 for green algae, 0.2 for Chromista) (Fig. 2C), low 307 detection rates (Fig. 2C; Supplemental Table S2) and BLASTp analysis revealing high sequence 308 similarity to vascular plant CL-EXTs (data not shown). 309 310 Here we present a preliminary analysis of the minor HRGP classes (5 to 23), and will then focus on 311 the major HRGP classes (1 to 4). Classes 5 to 23 include ‘hybrid’ type HRGPs with motifs associated 312 with two or more classes of HRGP (see Fig 4, Johnson et al., companion paper). For example, class 5 313 HRGPs contain AGP-bias with CL-EXT motifs (both SP3-5- and Y-based), and are most highly 314 represented in gymnosperms (conifers, Gnetales, Ginkgo and Cycadales) (Fig. 2C). In general, the 315 detection of classes 5 to 23 is low and sporadic for most clades with the exception of conifers and core 316 eudicots. In these clades a high detection rate (>50%) of sequences occurs for most HRGP classes 317 (Fig. 2C). This suggests that more hybrid/unusual HRGPs occur in these clades, although the low 318 mean numbers indicate that these sequences are found in a few, rather than all datasets within core 319 eudicot and conifer orders. Datasets from the 1KP green algae group have a relatively high number of 320 sequences in class 6 (1.4 ± 4, AGP-bias, EXT glycomotifs (SPn) but not Y-based cross-linking motifs) 321 (Fig. 2C), which is consistent with our current knowledge of HRGPs isolated from Chlamydomonas 322 and Volvox (Adair et al., 1983; Woessner and Goodenough, 1989; Woessner et al., 1994; Ferris et al., 323 2005). The majority of CL-EXTs identified in Arabidopsis are not predicted to be GPI-anchored (see 324 Table I; Supplemental Table S2; Johnson et al., companion paper) and consistent with this, only a few 325 GPI-anchored EXTs (class 9) were observed in Phytozome and 1KP data. Only one class, GPI- 326 anchored PRPs (class 14), had no representatives from any 1KP group, which is consistent with our 327 current, albeit limited, knowledge of PRPs (Averyhart-Fullard et al., 1988; Datta et al., 1989; Fowler 328 et al., 1999). 329 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 13 330 The mean number of GPI-AGPs and CL-EXTs in the 1KP data was lower than expected, particularly 331 in the core eudicots/rosids (~4 and 7 respectively, Fig. 2C), compared to the numbers in the majority 332 of eudicot and commelinid genomic datasets (see Table I, Johnson et al., companion paper). We 333 suspected that this might be due to the restricted number of tissue types used for sequencing and so we 334 investigated a subset of 1KP datasets in the Ranunculales (39 samples derived from 20 species). For 335 each dataset, we manually removed likely redundant sequences (see Methods), reducing the mean 336 number of GPI-AGP sequences from 7.3 to 4.8 (Supplemental Fig. S1A) and CL-EXTs from 5.1 to 337 2.9 (Supplemental Fig. S1B). 338 positioned deletions (data not shown). The majority of the 1KP angiosperm datasets are from leaf 339 tissue, yet this tissue has the lowest average number of GPI-AGPs (4.0), with 4.4 in roots, 5.0 in 340 flowers, and 6.0 in developing fruits (Supplemental Fig. S1A). Leaf tissue datasets also had a low 341 number of CL-EXTs (2.6) compared to roots (3.8) and fruits (3.3) (Supplemental Fig. S1B). When 342 all unique sequences are tallied across the four Papaver spp, 12 GPI-AGPs (Supplemental Fig. S1C) 343 and six CL-EXTs were identified (Supplemental Fig. S1D). Thus, the total numbers of GPI-AGPs 344 and CL-EXTs observed in most 1KP datasets are under-estimates due to limited tissue sampling, 345 despite the likely sequence redundancy that was not routinely removed by MAAB pipeline processing. Most of the redundant sequences contained variably sized and 346 347 Significantly, the greater species sampling within 1KP mostly corroborated our Phytozome 348 observations and allow us to draw some major conclusions: (1) AGPs are widespread, both GPI-AGPs 349 and non-GPI-AGPs being convincingly found in the green algae group as well as Chromista and 350 Glaucophyta; (2) CL-EXTs occur in most groups of extant land plants including bryophytes (non- 351 vascular plants); and (3) non-chimeric PRPs (class 3) are uncommon and largely confined to eudicots 352 (Fig. 2C). 353 354 GPI-AGPs evolved early in the green plant lineage 355 Our data show that GPI-AGPs are present in all algal groups sampled in the 1KP project with the 356 exception of the Rhodophyta. Absence of GPI-AGPs in red algae is consistent with several other Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 14 357 independent studies. A detailed analysis of the completed genome of two unicellular red algae, 358 Cyanidioschyzon merolae (Hashimoto et al., 2009; Ulvskov et al., 2013) and Galdieria sulphuraria 359 (Ulvskov et al., 2013) revealed that they lack the key enzymes for the synthesis of GPI-anchors, and 360 therefore GPI-AGPs would not be expected to be present in these algae. To our knowledge there are 361 no reports in the literature of red algal AGPs based on either AGP glycan epitopes or β-Glc Yariv 362 staining (Supplemental Table S1). AGPs are not well known in the Chromista, which include brown 363 algae, although there are reports of low levels of Hyp (<0.04%) in the soluble fraction of cell walls 364 from a variety of brown algal species (Gotelli and Cleland, 1968). Also, in a recent study, O-linked 365 glycosylation typical of AGPs has been detected in five species of brown algae and analysis of the 366 Ectocarpus proteome show that proteins containing AGP-like glycomotifs are present (Herve et al., 367 2016). A similar distribution of AP, SP and TP motifs is seen in all 1KP groups from Chromista to 368 eudicots (see Supplemental Fig. S5, Johnson et al., companion paper), suggesting that AGP-type 369 glycomotifs are evolutionarily ancient. 370 371 A consistent trend in genomic and transcriptomic datasets was the decrease in the number of non-GPI- 372 AGPs from the green algae group to vascular plants and the opposite trend (an increase) for GPI- 373 AGPs, including a 3-4-fold increase in GPI-AGPs in liverworts and mosses (approximately 4 374 sequences per species), compared to the green algae group and hornworts (Figs. 1 and 2). The green 375 algae (chlorophytes and algal streptophytes) are well represented in the 1KP data, including several 376 species from Volvocaceae (2 species, 3 samples) and Chlamydomonaceae (5 species, 5 samples). The 377 average number of GPI- and non-GPI-AGPs in the 1KP sample of Chlamydomonas spp is 0.8 and 378 43.6, respectively, whereas in Volvox spp it is 1 and 78.3, respectively (Data file 6). It is possible that 379 some of the non-GPI-AGPs could be partial chimeric HRGPs rather than true non-GPI-AGPs; further 380 work is needed to resolve this issue. 381 382 Functional specification of selected AGPs may have been established early in green plant evolution. 383 For example, AGPs have been shown to be involved in apical growth in the moss Physcomitrella Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 15 384 patens (Lee et al., 2005) and some vascular plants (Nguema-Ona et al., 2012). β-Glc Yariv staining in 385 eight liverwort species revealed ubiquitous tissue staining, with darker staining in apices of some 386 species, suggesting that apical targeting of AGPs may be a feature that arose in bryophytes (hornworts, 387 mosses and liverworts) (Basile and Basile, 1987). AGPs have also been implicated in cell plate 388 formation in liverworts (Shibaya and Sugawara, 2009). Since apical targeting as well as AGP delivery 389 to phragmoplasts (a scaffold for cell plate assembly) exists in streptophyte algae (Charales, 390 Coleochaetales and Zygnematales), it is possible that these two functions of AGPs evolved earlier than 391 the emergence of bryophytes (Leliaert et al., 2012; Bowman, 2013). 392 393 It should be noted, however, that β-Glc Yariv staining and glycan epitope labelling can provide only 394 general information about presence/absence of AGP-specific glycans, as these tools are not able to 395 distinguish the different sub-classes of AGPs. The information revealed by MAAB in the 1KP data 396 detailing the suite of AGPs present in bryophytes in addition to the increasing use of the bryophyte 397 models P. patens (Kofuji and Hasebe, 2014) and Marchantia polymorpha (Ishizaki et al., 2016) will 398 facilitate molecular approaches to investigate the function of specific AGPs through, for example, 399 mutant studies. 400 401 Origin and selective loss of CL-EXTs 402 Previous studies of HRGPs show that EXT-like Hyp-arabinosides are evolutionarily ancient and are 403 found in chlorophyte algae, such as Chlorella vulgaris (Lamport and Miller, 1971) and C. reinhardtii 404 (Ferris et al., 2001; Bollig et al., 2007). Our data suggest that CL-EXTs arose in bryophytes, as no 405 CL-EXTs were detected in the two volvocine (chlorophyte green algal) genomes (see Table I, Johnson 406 et al., companion paper) and only contaminating sequences were detected in algal 1KP datasets (Fig. 407 2). EXT epitopes have been identified in some chlorophyte algal species (Domozych et al., 2009; 408 Sørensen et al., 2011) (Supplemental Table S1) but this likely reflects the presence of chimeric 409 and/or hybrid EXTs rather than CL-EXTs. Similar to the Phytozome analysis, hybrid HRGPs with Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 16 410 both AGP and EXT glycomotifs were found in the 1KP green algae group (e.g. RYJX_Locus_857 411 (Pandorina morum) and VALZ_Locus_7817 (C. noctigama), both class 19). Where Y residues are 412 present, these are found in a separate domain to the P-rich domain (see Fig. 1, Johnson et al., 413 companion paper). This result suggests that CL-EXTs arose though the juxtaposition of pre-existing 414 SPn EXT glycomotifs with refined Y-based motifs (e.g. YXY and VXY), rather than via the 415 simultaneous evolution of both protein motifs. 416 417 To gain a better understanding of the occurrence of CL-EXTs in bryophytes, the phylogenetic 418 distribution of CL-EXTs present in the 1KP data was assessed in more detail (Fig. 3). CL-EXTs were 419 detected by the MAAB pipeline in all five hornwort species representing two taxonomic orders 420 (Supplemental Fig. S2) whereas they were identified in only 5 of 16 orders of mosses (Hypnales, 421 Pottiales, Diphysciales, Buxbaumiales, and Sphagnales) and 2 of 8 orders of liverworts (Marchantiales 422 and Sphaerocarpales) that were sampled by 1KP (Supplemental Table S2). This relatively low 423 detection rate in mosses and liverworts is unlikely to be due to low species representation as it is 424 comparable to the sampling depth in hornworts (Fig. 3). The data therefore suggest that while CL- 425 EXTs are widespread in hornworts, they have a more limited distribution within the moss and 426 liverwort lineages. If hornworts are the sister group of other land plants (Wickett et al., 2014), then 427 this may imply that CL-EXTs were lost from some lineages of liverworts and mosses. An alternative 428 explanation is that CL-EXT genes are present in moss and liverworts but are expressed at very low 429 levels and were therefore not detected. 430 431 To exclude the possibility that MAAB missed partial CL-EXT sequences due to a strict criterion for an 432 ER signal sequence (see Johnson et al., companion paper), a tandem repeat annotation library (TRAL) 433 was generated to identify CL-EXT repeat motifs (Schaper et al., 2015). The TRAL library identified 434 all repeats in class 2 sequences that were longer than 10 amino acids, present in ≥4 copies and 435 significantly different from random sequence evolution (p-value < 0.05; see Methods). A library for Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 17 436 MAAB class 24 sequences was also generated as a control. Each repeat was used to construct a 437 circular sequence profile hidden Markov model (cphmm) and these models were used to search the 438 MAAB input data (see Data file 2, Johnson et al., companion paper) to identify all sequences that 439 contain these repeats. A simple tool for browsing the results of TRAL is available at 440 http://services.plantcell.unimelb.edu.au/hrgp/index.html and this tool was used to search all the 441 bryophyte species for TRAL hits. 442 443 There is a good correlation between the number of species with CL-EXTs from MAAB and those with 444 SPn-/Y-based repeats identified by TRAL (Fig. 3; Data file 6). Most importantly, there was no 445 increase in the number of bryophyte species containing TRAL repeats (in the MAAB input data; see 446 Data file 2, Johnson et al., companion paper) compared to MAAB class 2 sequences, suggesting that 447 only a small number of partial CL-EXTs are excluded by the MAAB pipeline. 448 449 TRAL repeats from class 24 were also searched as these could identify both novel HRGP motifs and 450 other repeats of interest that may have been excluded by the MAAB-imposed filters. The majority of 451 bryophyte species had hits to TRAL repeats generated from class 24 sequences; however, none 452 contained both SPn and Y repeats and only a few have repeats containing Y, suggesting that class 24 453 contains few, if any CL-EXTs. Combined, TRAL and MAAB outputs indicate that there are a number 454 of moss and liverwort species that lack CL-EXTs. This leads us to suggest that there has been either 455 selective loss (e.g. Jungermanniales and most species from Hypnales) or multiple independent gains 456 (e.g. Sphagnales and Marchantiales) of CL-EXTs in mosses and liverworts. Confirming this result 457 using genome sequencing of selected species and investigation of the morphology and development of 458 mosses and liverworts with and without CL-EXTs would be an interesting avenue for future research. 459 These non-vascular plants may also help uncover the role of CL-EXTs with different sequence motifs 460 and repeat lengths. 461 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 18 462 CL-EXTs have been lost in many grass species 463 Evaluation of Phytozome data shows CL-EXTs containing both SPn≥3 and Y-based motifs are found in 464 almost all eudicot species, but are surprisingly absent from Eucalyptus grandis (eudicot/rosid), 465 Aquilegia coerulea (basal eudicot), and the four species of grasses (monocot/commelinid; Poaceae). 466 No CL-EXTs were found in either P. patens or the two volvocine algal species (see Table I, Johnson 467 et al., companion paper). The low number of HRGPs in some of these species is suspected to be due 468 to annotation and assembly issues, based on our detection of CL-EXT-like open reading frames in 469 these genomes using 1KP-identified CL-EXTs as query sequences (see Methods). In contrast, this 470 approach was not successful for genomes from the commelinid monocots ((APG IV, 2016); 471 designated as ‘monocots/commelinids’ in the 1KP group nomenclature). This absence is convincing 472 because there are 17 different grass species (21 datasets) in the 1KP data and one species, Eleusine 473 coracana, is represented by four different datasets (leaves, flowers, un/stressed roots) (see Data file 3, 474 Johnson et al., companion paper). As an additional check we employed BUSCO (Benchmarking 475 Universal Single-Copy Orthologs), an analysis tool that quantitatively assesses transcriptome 476 completeness based on the presence of near-universal single-copy orthologous genes selected from 477 OrthoDB (Simao et al., 2015). Between 65-92% (excluding three outliers) of 956 single-copy genes 478 were identified in the monocots/commelinids datasets (Fig. 4B). The value for the Poaceae was at the 479 high end of this range (median 88.4%, mean 83.8%), suggesting that the grass transcriptome 480 assemblies are high quality and that CL-EXTs should have been detected if they were present. Our 481 findings are also supported by a recent bioinformatics study of Z. mays and O. sativa genomes, where 482 no CL-EXTs were identified (Liu et al., 2016). 483 484 Primary cell walls of commelinid monocots have a composition distinct from eudicots such as 485 Arabidopsis, and gymnosperms (Bacic et al., 1988; Carpita and Gibeaut, 1993; Doblin et al., 2010). 486 We therefore investigated whether an absence of CL-EXTs was also observed in other commelinid 487 families represented by 1KP. As for Poaceae, CL-EXTs were not detected in the sedge family 488 Cyperaceae (total of three species), and three other families within Poales, although they were Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 19 489 identified in Bromeliaceae and Restionaceae (1 species each) (Fig. 4A). CL-EXTs were also not 490 detected in two families within the Zingiberales (Zingiberaceae and Marantaceae). The commelinid 491 monocots are a major clade within the larger monocot group (APG IV, 2016). As CL-EXTs are 492 absent from at least four grass genomes, including the Sanger-sequenced rice genome (see Table I, 493 Johnson et al., companion paper), and occur widely in the 1KP monocot group (which excludes the 494 commelinid monocots) (Fig. 2; Supplemental Table S2; Data file 6), we suggest that there has been 495 selective loss of CL-EXTs within the commelinid monocots. Further studies are required to confirm 496 their absence from specific commelinid species. 497 498 It is likely that the cross-linking function of CL-EXTs would need to be replaced by something else in 499 commelinid monocots to provide the necessary scaffold for cell wall development. 500 candidates are monomeric phenolic acids such as the hydroxycinnamic acids (ferulic and p-coumaric) 501 of commelinid monocot walls, absent in the non-commelinid monocot, eudicots (with the exception of 502 the Caryophyllales) and gymnosperm walls, which form covalent cross-links between arabinoxylan 503 chains (reviewed in Bacic et al., 1988; Harris, 2005). Additional experimentation is needed to test this 504 hypothesis. Obvious 505 506 Orthologs of GPI-AGPs and CL-EXTs in Brassicales 507 The broad sampling of species in the 1KP data provides an opportunity to investigate how far 508 orthologous relationships of selected sequences can be traced in plant lineages (Wickett et al., 2014). 509 Given IDPs are usually poorly conserved throughout evolution we wanted to test if putative HRGP 510 orthologs could be detected. As Arabidopsis HRGP sequences are the most well characterized, we 511 undertook phylogenetic analysis of Arabidopsis and GPI-AGPs and CL-EXTs identified by MAAB in 512 the Brassicales (see Methods) (Fig. 5). The maximum likelihood (ML) tree of the GPI-AGPs shows 513 that sequences from Brassicaceae datasets group most closely with the Arabidopsis sequences, 514 generally with good (>70%) bootstrap values. Sequences from other Brassicales families cluster Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 20 515 further away according to their evolutionary relatedness and with lower bootstrap support (Fig. 5A). 516 Although showing low bootstrap support, AtAGP59, the additional GPI-AGP identified by MAAB 517 analysis, clusters with AGP6 and AGP11 (Fig. 5A). 518 519 The phylogenetic analysis of Brassicales CL-EXTs revealed a similar hierarchical pattern to the GPI- 520 AGPs, with Brassicaceae sequences generally clustering nearer to the Arabidopsis sequences, but with 521 variable bootstrap support (Fig. 5B). 522 Arabidopsis sequence, making prediction of orthologous relationships difficult, even within the 523 Brassicaceae. This is perhaps not surprising since tandem repeat proteins often evolve more rapidly 524 than other proteins (Moesa et al., 2012; van der Lee et al., 2014). The clustering pattern of GPI-AGPs 525 in the Brassicales tree (Fig. 5A) largely reflects that observed for the Arabidopsis sequences (see Fig. 526 2C, Johnson et al., companion paper). This suggests that despite the low bootstrap values it may be 527 possible to identify orthologous sequences using phylogenetic analysis over larger evolutionary 528 distances, such as within angiosperms (flowering plants). Some CL-EXT 1KP sequences did not group with an 529 530 Detection of orthologs in angiosperms: AtAGP6/11 as an example 531 One pair of Arabidopsis GPI-AGPs, AtAGP6 and 11 (see sub-clade AGP-h, Fig. 2C, Johnson et al., 532 companion paper), have been well studied as they are involved in pollen development and pollen tube 533 growth and therefore impact reproductive capacity (Levitin et al., 2008; Coimbra et al., 2009; Coimbra 534 et al., 2010). Orthologs of these genes could therefore be expected to be found in a broader subset of 535 eudicots and potentially early angiosperms, depending on gene conservation. Although a recent 536 attempt to find putative orthologs of AtAGP6/11 in Quercus suber (cork oak; core eudicots/rosids, 537 Fagales) pollen EST data was unsuccessful (Costa et al., 2015), we predicted that it should be possible 538 to identify orthologs in the 1KP data, particularly within datasets that include floral tissue. A 539 combination of BLAST (Phytozome and NCBI) and profile hidden Markov (HMMER) models (see 540 Methods) were used to search for putative AtAGP6/11 orthologs in the 1KP multiple k-mer data Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 21 541 (MAAB input and output, see Data files 2, 3, 4, Johnson et al., companion paper). In some cases, 542 additional sequences were identified by BLAST from other sources (for example, the Amborella 543 trichopoda genome) for use as full-length evolutionary markers (see Methods). The HMMER models 544 were optimized by varying the input sequences and the length of the aligned region (see Methods). 545 The best results were obtained using full-length AtAGP6/11-type proteins (including the N-terminal 546 ER and C-terminal GPI-anchor signal sequences) from the Brassicaceae family (model 1; 547 Supplemental Table S3). This was measured by a higher ‘true’ positive and lower false positive rate, 548 as assessed by manual curation (Supplemental Table S4; see Methods). 549 550 The resulting sequences were subjected to phylogenetic analysis and the ML tree displayed in Fig. 6 551 shows the relatedness of a large sub-set of AtAGP6/11-like sequences derived from 1KP data and 552 sequenced genomes. Almost all the eudicot sequences group together with AtAGP6/11. The monocot 553 and monocot commelinid sequences form a separate sub-clade with AtAGP58 and 59 (Fig. 6), albeit 554 with low bootstrap support. 555 samples from Fagales, the order that includes Q. suber, including those containing floral tissue 556 (Supplemental Table S4) despite two similar sequences being identified in the Q. robor genome 557 (Qurob_scaffold_362 and 554). Only two sequences identified by HMMER model 1 were apparent 558 false positives (OHAE_12357 and 4810; Polygala lutea, Fabales), positioned in a sub-clade with 559 Arabidopsis AtAGP1/2/4/5 and OsAGP1 from rice (Fig. 6). The relatively clear separation of putative 560 AtAGP6/11 orthologs from the other GPI-AGPs suggests that phylogenetic analysis may provide a 561 reasonable prediction of gene orthology for these IDPs. In this case, potential AtAGP6/11 orthologs 562 could be confidently detected in both eudicot and monocot lineages, suggesting the existence of an 563 ancestral gene in a common ancestor prior to their divergence ~140-150 Myr ago (Chaw et al., 2004). 564 However, no potential AtAGP6/11 ortholog was identified in Amborella trichopoda, which is the 565 sister group of all other angiosperms. Unfortunately, all eight 1KP datasets from ‘basal’ angiosperms 566 (members of Amborellaceae, Nymphaeales, Austrobaileyales) were derived from either leaf or shoot 567 tissue, not floral tissue, and A. trichopoda is the only basal angiosperm genome that is publically Notably, no AtAGP6/11-like sequences were detected within 1KP Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 22 568 available. Our demonstrated success at finding more HRGPs using multiple k-mer assemblies (see 569 Johnson et al., companion paper) suggests that using this approach to analyze existing and future 570 basal-most angiosperm floral transcriptome datasets would be advantageous before concluding an 571 absence of AtAGP6/11 orthologs amongst this group. 572 573 To explore the robustness of the AtAGP6/11 sub-clade further, all GPI-AGPs from the 1KP datasets 574 reported in Fig. 6 were used to produce a larger angiosperm GPI-AGP tree (Supplemental Fig. S3). 575 As expected the bootstrap values were low, however, most of the additional 290 GPI-AGP sequences 576 grouped into the sub-clades observed in Fig. 2C of our companion paper (see Johnson et al.). These 577 reflect putative Arabidopsis orthologs for AtAGP17/18, AtAGP9, AtAGP4/7/10, AtAGP2/3, 578 AtAGP58, AtAGP25/26/27, AtAGP1 and AtAGP6/11/59. Importantly, despite the low bootstrap 579 support, the majority of 1KP sequences in the AtAGP6/11 sub-clade in Fig. 6 also group with 580 AtAGP6/11 in the angiosperm GPI-AGP tree (Supplemental Fig. S3), suggesting that these 581 sequences are indeed the most similar and therefore likely to be AtAGP6/11 orthologs within the 582 analyzed transcriptomes. 583 584 Investigation of the GPI-AGP sequence alignment and further analysis of each individual GPI-AGP 585 sub-clade revealed differences in the presence and distribution of specific amino acids (e.g. K, M, Q, 586 and D/E) among sub-clade members (Fig. 7, Supplemental Fig. S4). For example, putative orthologs 587 of AtAGP6/11 from both eudicots and some monocots share scattered K residues, concentrated in the 588 first half of the protein followed by a region containing several acidic residues (Fig. 7; Supplemental 589 Fig. S4K). 590 591 The scattered K residues are noticeably absent from the putative orthologs from Q. robur 592 (Supplemental Fig. S4A,K). Further analysis is therefore required to confirm these sequences, and 593 other putative AtAG6/11 genes in order to determine the ‘true’ AtAGP6/11 orthologs. With additional Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 23 594 taxonomic and appropriate tissue sampling, it should be possible to fill the evolutionary gaps and trace 595 the origin of AtAGP6/11 genes, as well as other HRGPs. Complementation of angiosperm mutant 596 lines, for example, the agp6 agp11 double mutant of Arabidopsis that displays pollen phenotypes 597 (Levitin et al., 2008; Coimbra et al., 2009; Coimbra et al., 2010), with progenitor AtAGP6/11 genes 598 will be essential to test their functional orthology and may reveal information about these motifs, and 599 possibly the specific glycosylation required for HRGP function in particular tissue types. 600 601 Conclusions 602 HRGPs: remaining challenges and future perspectives 603 Despite being minor components of the wall, the HRGPs make a significant contribution to wall 604 properties and to cellular identity (Knox et al., 1991). With access to the 1KP data we have made 605 significant progress towards understanding the evolutionary timeframe of when this large gene family 606 of diverse and complex proteins likely evolved. We have provided a platform to guide experimental 607 approaches to confirm our data and answer questions as to how HRGPs evolved, as well as 608 investigating the losses of major classes of HRGPs through evolution (e.g. loss of CL-EXTs in 609 mosses, liverworts and commelinid monocots). 610 611 Tracking individual HRGPs throughout evolution is a major challenge. Our investigation of 612 AtAGP6/11 orthologues shows that it is possible to trace HRGPs with specific characteristics and 613 determine their evolutionary origins. However, the CL-EXTs in particular, as a consequence of their 614 variable tandem repeat motifs and diverse sequence length, cannot be aligned in a meaningful way 615 with the tools developed for folded proteins. Alternate approaches are needed, such as investigating 616 length variation and indels rather than pairwise alignment. An approach that could be useful is to 617 perform global multiple sequence alignments using a graph-based method that takes into account the 618 observation that indels can occur anywhere in a repeat (Szalkowski and Anisimova, 2013). 619 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 24 620 The biggest challenge will be to understand the individual and coordinated roles of HRGPs within the 621 context of either a single cell or tissue type. Our data provide a valuable resource to initiate system 622 wide analyses to identify spatio-termporal expression changes of HRGPs throughout development. 623 Even in the less-complex non-vascular plants, GPI-AGPs and CL-EXTs are multigene families, and 624 understanding their individual function will provide exciting challenges for researchers for many years 625 to come. Defining the role of individual AGPs and EXTs is confounded by their interactions with 626 other cell wall polymers; cross-linking of AGPs (Tan et al., 2013) and EXTs to other polymers such as 627 pectins and heteroxylans has been experimentally verified (reviewed in Velasquez et al., 2012)). 628 Understanding how widespread cross-linking of HRGPs to other cell wall polymers will be 629 particularly challenging. The emerging availability of tractable genetic systems for non-model land 630 plants will allow researchers to use the information provided by MAAB to explore these questions in a 631 range of species. 632 633 Glycosylation is a characteristic feature of HRGPs and defines the interactive molecular surface. 634 Predicting the post-translational modification of HRGPs including the sites of P-hydroxylation and 635 types of glycosylation will require detailed structural analysis of many members of each multigene 636 family, from key transitions throughout the plant lineage. Even within a single species, glycosylation 637 is tissue-dependent, and directed by the context of amino acids surrounding the glycomotif (Tan et al., 638 2003; Shimizu et al., 2005; Kurotani and Sakurai, 2015). For many HRGPs, glycosylation often 639 completely encases and hence masks the protein backbone, as is the case for GPI-AGPs, AG-peptides 640 and CL-EXTs (see Fig. 1, Johnson et al., companion paper). Furthermore, it also drives the three 641 dimensional conformation of the protein backbone, and therefore determines the presentation of 642 epitopes on the molecules surface. Given that plant HRGPs have the same structural complexity as 643 animal mucins, it is expected that the core protein sequence will influence the extent and type of 644 glycosylation, which can also vary in a tissue-/cell-specific manner (Gerken et al., 2013). It is known 645 that HRGP glycosylation differs between volvocine algae and embryophytes, but information on 646 streptophyte green algae is lacking. Introducing the same HRGP backbone into multiple species and Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 25 647 tissue types throughout evolution would be an interesting approach to investigate context-dependent 648 glycosylation patterns and/or function. A complementary investigation of the glycoysltransferases 649 (GTs) that act to glycosylate HRGPs would also greatly assist our understanding of this complex 650 process. An exciting future of HRGP research beckons to determine the biological and functional 651 significance of this fascinating superfamily of secreted glycoproteins. 652 653 654 Methods 655 Phylogenetic Analysis 656 Phylogenetic analyses were performed using MEGA 6.06 (Tamura et al., 2013). Sequences 657 (untrimmed) were first aligned using MUSCLE (Edgar, 2004) using the default settings, except for 658 Supplemental Fig. S3, gap = -2 (default gap = -2.9). The best model was identified (find best protein 659 model (ML) option) and used for each alignment to generate a maximum likelihood tree (with 100 660 bootstrap replications), using all sites in the alignment. Sequences used for phylogenetic analysis 661 (fasta format), and the best model, are in Supplemental Fig. S5. 662 663 Datasets 664 A 665 http://services.plantcell.unimelb.edu.au/hrgp/index.html. 666 Data analysis was based on 1282 samples downloaded from the official 1KP mirror 667 (onekp.westgrid.ca; as of March 2014) (see Johnson et al., companion paper). Five additional datasets 668 were analyzed (July 2015) to improve coverage of hornworts (UCRN-Megaceros_tosanus, FAJB- 669 Paraphymatoceros_hallii, 670 Phaeoceros_carolinianus-gametophyte) and to include the single sample from clade Ginkgoales 671 (SGTW-Ginkgo biloba). These additional five datasets are included in the MAAB input and output summary of datasets and methods is also RXRQ-Phaeoceros_carolinianus-sporophyte, Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. available at WCZB- 26 672 files (see Data file 2, 3, 4, Johnson et al., companion paper) and most of the analysis with these 673 samples is presented, with the exception of Supplemental Table S2. 674 675 MAAB summary analysis 676 The sequences and associated annotations from MAAB classification, including the assembly in which 677 they were identified, are provided in Data files 3, 4 (of Johnson et al., companion paper). Data from 678 individual samples (hereafter referred to as datasets) that were identified by the 1KP consortium as 679 containing contaminants were removed from analyses (unless otherwise noted) and can be found in 680 Data file 5 (of Johnson et al., companion paper). The number of sequences in each MAAB class for 681 each individual 1KP sample is provided in Data file 6. Mean number of HRGPs by MAAB class (for 682 each 1KP group (Fig. 2C) or commelinid monocot familiy (Fig. 4A and C)) was calculated by 683 averaging the appropriate data in Data file 6 (that reports the number of sequences per HRGP class 684 (columns) for each 1KP dataset (rows)). Total numbers of HRGPs by 1KP group (Fig. 1) were also 685 generated from Data file 6. Graphs (Fig. 4A and C) were generated using R/Bioconductor 686 (Gentleman et al., 2004). DNA sequences of GPI-AGPs (class 1) and CL-EXTs (class 2) are provided 687 in Data files 7, 8 respectively. 688 689 Reducing redundancy in 1KP datasets 690 For phylogenetic analyses, redundant sequences were eliminated from datasets. The most probable 691 sequence was identified based on existing knowledge, as follows: MAAB 1KP output (see Data file 3, 692 4, Johnson et al., companion paper) was sorted by 1KP locus. Sequences for each dataset (1KP locus) 693 were then sorted by HRGP class and the sequences from the desired HRGP class pasted into a new 694 Microsoft Excel sheet. Sequences were sorted alphabetically by sequence (to group sequences with 695 similar N-termini) and converted to tfasta format. Multiple sequence alignments were performed with 696 MUSCLE (http://www.ebi.ac.uk/Tools/msa/muscle/) to identify redundant sequences. Where two or 697 more sequences had large regions of identity, the longer sequence was retained unless there were clear Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 27 698 artefacts at either the N- or C-terminus. Retained sequences were subjected to further rounds of 699 alignments to ensure only unique sequences remained. When redundancy was unclear, both sequences 700 were retained in small datasets but only one for larger datasets. 701 702 Removing redundancies from the 1KP sequence data was necessary because like other short-read 703 sequence data, it offers a wealth of information, but redundancy can occur from multiple sources 704 including splice variants, sequencing errors, recent duplications, chimeric transcripts and polyploidy 705 (Gruenheit et al., 2012; Yang and Smith, 2013; Xie et al., 2014). Sequence redundancies were obvious 706 in our analysis (Supplemental Fig. S1A,B) and two relatively common occurrences were read- 707 through of GC-rich hairpins in the cDNA during reverse transcription and poor-quality sequence at 708 one end of the paired-read. cDNA synthesis for all 1KP datasets was performed at 42°C (Chen Li, 709 BGI-Shenzhen, personal communication). 710 711 Tandem repeat annotation library (TRAL) construction 712 TRAL v0.3.5 (Schaper et al., 2015) was used to identify and evaluate tandem repeats present in 1KP 713 sequences from HRGP class 2 (CL-EXTs), class 3 (PRPs) and class 24 (<15% known motifs) as these 714 MAAB classes are the most likely to include tandem repeats. Tandem repeats were identified using T- 715 REKS (Jorda and Kajava, 2009) and HHrepID (Biegert and Soding, 2008) algorithms. Tandem repeats 716 (TR) were accepted only if they passed all the following criteria: 1) TR unit length of at least 10 amino 717 acids, 2) at least four TR unit copies and 3) p-value < 0.05 using phylo_gap01 model and 4) repeat 718 identified by T-REKS and/or HHrepID. A total of 5139 repeats were identified and accepted from 719 MAAB sequences in classes 2, 3 and 24. Each repeat was used to construct a cphmm model (Schaper 720 et al., 2015) and then searched with HMMER v3.1b1 hmmsearch (evalue cutoff 1e-5) 721 (http://hmmer.org/) to identify repeats in MAAB input data (Data file 2 of companion paper). The 722 resulting 4070 models had two or more hits above threshold to the input data. These hits were used to 723 construct alignments (max. 300 sequences per alignment, randomly chosen after 95% per-sample Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 28 724 clustering using USEARCH (Edgar, 2010) and aligned with MUSCLE v3.81 (Edgar, 2004) for each 725 model. 726 http://services.plantcell.unimelb.edu.au/hrgp/index.html, and was used to determine the number of 727 species with class 2 and class 24 TRAL repeats, using keyword searches by order and/or species (Fig. 728 3). A tool for browsing and searching TRAL results is available at 729 730 BUSCO (Benchmarking Universal Single-Copy Orthologs) analysis 731 1KP transcriptomes were assessed for completeness using the BUSCO analysis tool (Simao et al., 732 2015). The BUSCO plant dataset and software were downloaded from http://buscos.ezlab.org/. 733 734 Profile hidden Markov (HMMER) models for finding putative AtAGP6/11 orthologs 735 Protein sequences were aligned with MUSCLE v3.81 (Edgar, 2004) using default parameter settings. 736 Sequence alignments were then used as input into HMMER v3.1b1 hmmbuild (http://hmmer.org/) to 737 construct a profile HMM. Subsequently, hmmsearch was used (with default thresholds) to search 738 against MAAB 1KP input sequences (see Data file 2, Johnson et al., companion paper). Seven 739 different models were evaluated (Supplemental Table S3) and the results of the best model (see Data 740 file 4 Johnson et al., companion paper) are summarized in Supplemental Table S4. For each positive 741 1KP dataset a single ‘best’ sequence was selected manually based on presence of N-terminal ER 742 signal, full or partial C-terminal GPI-signal sequences and then checked by iterative multiple sequence 743 alignment and phylogenetic analysis. 744 potential partial GPI-AGPs, proteins needed to have the following GPI-anchor signal characteristics: a 745 potential omega (ω), ω+1 and ω+2 sites and a basic residue and/or a hydrophobic domain) (Eisenhaber 746 et al., 2003), unless otherwise noted (e.g. ZHMB_16956, XMVD_14518), as summarized in 747 Supplemental Table S4. To distinguish between true class 4 non-GPI-AGPs and 748 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 29 749 Additional AtAGP6/11-like sequences were obtained from other sources (outlined below) for use as 750 full-length controls and to increase the sequence representation in various 1KP groups. TBLASTn 751 was used to search Phytozome and/or the nucleotide collection (NR/NT at NCBI) using word size 2, 752 no filtering, and restricting organisms to same species, genera or family as appropriate, with either 753 AtAGP6/11 or OsAGP7/10 as query sequences. If unsuccessful, a suitable 1KP sequence was selected 754 (coding sequence only, Data file 7) and BLASTn performed (word size 7 and no filtering). When 755 positive hits were identified, these were used as query sequences to find additional sequences 756 (maximum of two per species used). Q. robur AtAGP6/11 putative orthologs were identified in 757 TBLASTn searches of Q. robur assembly v1 scaffolds available at https://urgi.versailles.inra.fr/blast/. 758 For the basal angiosperm Amborella trichopoda (not present in Phytozome v9) GPI-AGPs were 759 identified amongst annotated proteins accessible at ftp://ftp.ensemblgenomes.org/pub/plants/release- 760 31/fasta/amborella_trichopoda/ using 50% PAST as outlined in Schultz et al. (2002). A similar 761 approach was used to find evidence for genomic sequences of CL-EXTs from selected genomes using 762 1KP-identified CL-EXTs as query sequences (Data file 8). For example, 1KP sequence 763 AYMT_Locus_34756 identified 764 36791844..36792375, and 765 1845754..1846286). partial CL-EXT VGHH_Locus_4377 sequence identified E. A. grandis coerulea scaffold_6: scaffold_34: 766 767 Data access & Datasets 768 Additional datasets for this paper. Numbering continues from Johnson et al., companion paper, and a 769 full list of data files is provided there. 770 Data file 6. Mean number of HRGPs in each HRGP class (columns) for each 1KP dataset (rows), 771 calculated from MAAB output files (Data files 3, 4). Filename: SA003_MAAB-hits-summary-by- 772 class-and-sample.xls. 773 Data file 7. DNA sequences for 1KP GPI-AGPs (class 1) detected by MAAB. Locus ID (Data files 774 3, 4) of class 1 GPI-AGPs was used to extract the DNA sequence from the appropriate multiple k-mer Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 30 775 assembly. Where more than one locus ID is reported for a single protein (e.g. with different k-mers), 776 only one DNA sequence is reported. Filename: 1kp_agp_incl_dnas_9-12-2015.xls. 777 Data file 8. DNA sequences for 1KP CL-EXTs (class 2) detected by MAAB. Locus ID (Data files 3, 778 4) of class 2 CL-EXTs was used to extract the DNA sequence from the appropriate multiple k-mer 779 assembly. Where more than one locus ID is reported for a single protein (e.g. with different k-mers), 780 only one DNA sequence is reported. Filename: class2_incl_dnas_9-12-2015.xls. 781 Data file 9. Sequences identified by HMMER model 1 (putative AtAGP6/11 orthologs). Filename: 782 agp6_model1_hmm_hits_20150922.xls. 783 784 Supplemental Data 785 786 Supplemental Table S1. Summary of selected AGP and extensin epitopes described in the literature. 787 788 Supplemental Table S2. HRGP detection rate for each 1KP taxonomic order. 789 790 Supplemental Table S3. Summary of HMMER models used to identify putative orthologs of 791 AtAGP6 and AtAGP11. 792 793 Supplemental Table S4. Summary of HMMER model performance by 1KP sample and analysis of 794 sequences selected for phylogenetic analysis. 795 796 Supplemental Fig. S1. Number of GPI-AGPs and CL-EXTs by tissue in Ranunculaceae and 797 Papavaceae samples represented in 1KP. 798 799 Supplemental Fig. S2. Bryophyte CL-EXTs sequences. 800 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 31 801 Supplemental Fig. S3. Phylogenetic analysis (ML-tree) of GPI-AGPs (all sub-clades) from selected 802 angiosperms. 803 804 Supplemental Fig. S4. Conserved features for putative orthologs of selected Arabidopsis GPI-AGPs. 805 Supplemental Fig. S5. Sequences used for phylogenetic analysis (in fasta format). 806 807 Acknowledgements 808 The authors thank Mark Chase from the Kew Royal Botanic Gardens, Markus Ruhsam and Philip 809 Thomas at the Royal Botanic Garden Edinburgh, and Steven Joya from the University of British 810 Columbia for supplying plant samples to 1KP. We also thank Drs Birgit and Frank Eisenhaber, 811 Bioinformatics Institute A*Star Singapore for providing us with a standalone version of big-PI plant 812 predictor. 813 814 Figure Legends 815 Figure 1. Number of HRGP sequences identified by MAAB in each HRGP class by 1KP group. 816 The number of datasets per clade is shown in brackets after the 1KP group name. All MAAB classes 817 are reported for multiple k-mers (multk: 39,49,59,69) (unshaded columns). No sequences were 818 detected for HRGP class 14. Selected data is reported for assemblies with k-mer = 25 (k25, shaded 819 column). Red text indicates that the sequences detected within this class and 1KP group are likely to 820 be contaminants (see Results/Discussion). 821 822 Figure 2. Summary of HRGP sequences identified using the MAAB pipeline. A, Percentage of 823 total HRGP sequences (by 1KP group) that are found in the classical HRGP classes (1 to 4), hybrid 824 HRGP classes (5 to 23) and non-HRGPs (class 24). B, Number of orders analyzed and number of 825 samples per order for each 1KP group. The monocots group excludes the commelinid monocots. C, Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 32 826 Overview of the mean number of HRGPs for each MAAB class in the 1KP dataset (by 1KP group). 827 Shading of boxes represents the detection rate, indicated as the % of orders with hits (see 828 Supplemental Table S1). A shaded box showing detection of HRGPs with a value of zero indicates 829 an average number of sequences between 0-0.1. Red text indicates that the sequences detected within 830 this HRGP class and 1KP group are likely to be contaminants (see Results/Discussion). 831 832 Figure 3. Distribution and number of CL-EXTs (class 2) in the bryophytes. MAAB detected CL- 833 EXTs in all hornwort species in the 1KP data whereas the distribution in mosses and liverworts shows 834 either loss of, or multiple independent gains of, CL-EXTs. Tandem Repeat Annotation Libraries 835 (TRAL) of repeats identified in class 2 and class 24 (control) sequences were used to search the 836 MAAB input sequences and identify repeats present in the bryophyte lineages. 837 correlation between the species with CL-EXTs identified by MAAB and those with SPn/Y-based 838 repeats identified by TRAL using class 2 repeats. No CL-EXTs were identified using TRAL repeats 839 from class 24 sequences. The phylogenetic tree is based on Cole and Hilger (2013) and was selected 840 because it includes all bryophyte orders. There is good 841 842 Figure 4. Mean number of MAAB class 2 CL-EXT sequences (A) identified using the MAAB 843 pipeline within each family of monocots/commelinids. Many families lack CL-EXTs but this is not 844 due to poor quality transcriptomes as BUSCO analysis (B) shows the percentage of 956 single-copy 845 orthologous genes found in the monocots/commelinids families is high (avg. ~80%) Family data (k- 846 mer 39 only) are summarized in a box-and-whisker plot after removal of outliers (2 Poaceae sp.: 59.4, 847 65.7; 1 Aracaeae sp.: 18.1). The mean number of class 1 GPI-AGPs (C) was evaluated and these are 848 present in all families that lack CL-EXTs. Only Cannaceae and Restionaceae have both GPI-AGPs 849 and CL-EXTs, although for most families the number of datasets is low (1 to 3). The total number of 850 datasets per family is indicated in parentheses. 851 852 Figure 5. ML tree of Brassicales 1KP and Arabidopsis GPI-AGPs (A), and CL-EXTs (B). (A,B) 853 ML trees generally show strong support for sub-clades with Arabidopsis sequences and 1KP 854 sequences from family Brassicaceae. Putative GPI-AGP sub-clades (A) are separated by a horizontal 855 dotted line and the Arabidopsis ortholog(s) are indicated by boxes (right of tree). Sequences from the 856 Brassicaceae family are in green with Arabidopsis sequences in larger, bold font. AtEXT3 was 857 included with CL-EXTs (B) even though it was classified as class 20 because of its shared bias (<2% 858 difference (Δ)) between % PSKY (74.2%) and % PVYK (73.1%); see Johnson et al., companion 859 paper). Numbers on the nodes represent support with 100 bootstrap replicates (≥70 green, 60-69 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 33 860 orange, 40-59 black). 1KP datasets (1KP locus ID and family name) are shown on the Brassicales tree 861 (inset) (Stevens (2001), Version 12, July 2012 http://www.mobot.org/mobot/research/apweb/). Scale 862 bars for branch length measures the number of substitutions per site. Sequences: GPI-AGPs 863 Brassicales (Supplemental Fig. S5A) and CL-EXTs Brassicales (Supplemental Fig. S5B). 864 865 Figure 6. Phylogenetic analysis of class 1 GPI-AGPs to identify AtAGP6/11 orthologs. ML tree 866 (MEGA) was constructed using putative AtAGP6/11 orthologs identified predominantly from 1KP 867 transcriptomes and HMMER model 1 (see Supplemental Table S3), with a few sequences from other 868 sources to increase the breadth of sampling (see Methods; Supplemental Table S4). The tree also 869 includes at least one sequence from each Arabidopsis GPI-AGP sub-clade, representative rice GPI- 870 AGPs identified by MAAB (see Fig. 2, Johnson et al., companion paper) and four GPI-AGPs from 871 Amborella 872 Amtri_ERN06202). Sequence names are coloured by 1KP group: basal angiosperms (grey), non- 873 commelinid monocots (purple), commelinid monocots (pink), basal eudicots (aqua), core eudicots 874 (green), asterids (red) and rosids (orange). Numbers on the nodes represent support with 100 bootstrap 875 replicates (≥70 green, 60-69 orange, 40-59 black). 1KP sequences are identified by Group_Order_1KP 876 identifier_sequence locus. Genomic sequences and other sequences from NCBI follow a similar 877 format, but include a five letter abbreviation of genus and species (see Methods). Symbols next to 878 sequence names are used to indicate the source and MAAB class of sequences and other relevant 879 information. An asterisk (*) after the sequence name indicates additional information is provided in 880 Supplemental Table S4, for example, the YNXR (Poales) dataset is contaminated with Asparagales, 881 however, the ML tree suggests these sequences are indeed Poales. Scale bar for branch length 882 measures the number of substitutions per site. trichopoda (Amtri_ERM96654, Amtri_ERM95342, Amtri_ERN01113 and 883 884 Figure 7. Schematic representation of the GPI-AGP sub-clades showing the presence and 885 distribution of specific amino acids in the PAST-rich protein backbone. Order of GPI-AGPs is 886 based on clades in angiosperm GPI-AGP tree (Supplemental Fig. S3); figure part numbers (A to K) 887 are based on alignments (Supplemental Fig. S4) and the AtAGP5/10 (G) subclade is included for 888 comparison (based on AGP-c subclade (see Fig. 2C, Johnson et al., companion paper)). A, Putative 889 orthologs of Lys-rich AtAGP17/18 have scattered lysine (K) residues throughout the sequence and 890 contain a short K-rich domain near the C-terminus. The glycomotifs are predominantly XP1-2. B, The 891 AGP9 sub-clade also has a short Lys-rich region with ≥3 K residues and scattered K residues, 892 however, the glycomotifs are distinct from AtAGP17/18, being predominantly [S/T]P3. AGP4/7/10 893 (C), AGP2/3 (D), AGP5/10 (G), AGP5 (H) and AGP1(I) sub-clades have a ‘classical’ PAST-rich 894 backbone with no obvious bias towards other amino acids. E, Most of the putative orthologs of Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 34 895 AtAGP58 are relatively methionine (M)-rich, with scattered M residues between AGP glycomotifs 896 and contain a Q residue at the presumed N-terminus. This Q residue at the presumed mature N- 897 terminus is also found in putative orthologs of Lys-rich AtAGP17/18 (A), AtAGP9 (B), 898 AtAGP1/2/3/4/7/5/10 (C,D,G,H,I), and AtAGP58 (E), but not AtAGP25/26/27 (F), or AGP6/11/59 899 (J/K). J/K, Putative orthologs of AtAGP6/11/59 share scattered K residues, concentrated in the first 900 half of the protein and also a small cluster of acidic residues, either aspartic acid (D) or glutamic acid 901 (E). 902 903 904 905 906 Literature Cited 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 Adair WS, Hwang C, Goodenough UW (1983) Identification and visualization of the sexual agglutinin from the mating-type plus flagellar membrane of Chlamydomonas. Cell 33: 183-193 APG IV (2016) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Botanical Journal of the Linnean Society 181: 1-20 Averyhart-Fullard V, Datta K, Marcus A (1988) A hydroxyproline-rich protein in the soybean cell wall. Proc. Natl. Acad. Sci. USA 85: 1082-1085 Bacic A, Harris PJ, Stone BA (1988) Structure and function of plant cell walls. In J Priess, ed, The Biochemistry of Plants, Vol 14. Academic Press, New York, pp 297-371 Basile DV, Basile MR (1987) The Occurrence of Cell Wall-Associated Arabinogalactan Proteins in the Hepaticae. Bryologist 90: 401-404 Bernhardt C, Tierney ML (2000) Expression of AtPRP3, a proline-rich structural cell wall protein from Arabidopsis, is regulated by cell-type-specific developmental pathways involved in root hair formation. Plant Physiology 122: 705-714 Biegert A, Soding J (2008) De novo identification of highly diverged protein repeats by probabilistic consistency. Bioinformatics 24: 807-814 Bollig K, Lamshoft M, Schweimer K, Marner FJ, Budzikiewicz H, Waffenschmidt S (2007) Structural analysis of linear hydroxyproline-bound O-glycans of Chlamydomonas reinhardtii-conservation of the inner core in Chlamydomonas and land plants. Carbohydr Res 342: 25572566 Borner GHH, Sherrier DJ, Weimar T, Michaelson LV, Hawkins ND, MacAskill A, Napier JA, Beale MH, Lilley KS, Dupree P (2005) Analysis of detergent-resistant membranes in Arabidopsis. Evidence for plasma membrane lipids rafts. Plant Physiology 137: 104-116 Bowman JL (2013) Walkabout on the long branches of plant evolution. Current Opinion in Plant Biology 16: 70-77 Cannon MC, Terneus K, Hall Q, Tan L, Wang Y, Wegenhart BL, Chen L, Lamport DTA, Chen Y, Kieliszewski MJ (2008) Self-assembly of the plant cell wall requires an extensin scaffold. Proceedings of the National Academy of Sciences, USA 105: 2226-2231 Carpita NC, Gibeaut DM (1993) Structural models of primary cell walls in flowering plants: consistency of molecular structure with the physical properties of the walls during growth. Plant Journal 3: 1-30 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 35 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 Chaw SM, Chang CC, Chen HL, Li WH (2004) Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. Journal of Molecular Evolution 58: 424-441 Chen J, and Varner JE (1985) Isolation and characterization of cDNA clones for carrot extensin and a proline-rich 33-kDa protein. Proc Natl Acad Sci USA 82: 4399-4403 Choudhary P, Saha P, Ray T, Tang Y, Yang D, Cannon MC (2015) EXTENSIN18 is required for full male fertility as well as normal vegetative growth in Arabidopsis. Front Plant Sci 6: 553 Coimbra S, Costa M, Jones B, Mendes MA, Pereira LG (2009) Pollen grain development is compromised in Arabidopsis agp6 agp11 null mutants. J Exp Bot 60: 3133-3142 Coimbra S, Costa M, Mendes MA, Pereira AM, Pinto J, Pereira LG (2010) Early germination of Arabidopsis pollen in a double null mutant for the arabinogalactan protein genes AGP6 and AGP11. Sexual Plant Reproduction 23: 199-205 Cole TCH, Hilger HH (2013) Bryophyte phylogeny, viewed 07/09/2015, http://www2.biologie.fuberlin.de/sysbot/poster/BPP-E.pdf. Costa M, Nobre MS, Becker JD, Masiero S, Amorim MI, Pereira LG, Coimbra S (2013) Expressionbased and co-localization detection of arabinogalactan protein 6 and arabinogalactan protein 11 interactors in Arabidopsis pollen and pollen tubes. BMC Plant Biol 13: 7 Costa ML, Sobral R, Ribeiro Costa MM, Amorim MI, Coimbra S (2015) Evaluation of the presence of arabinogalactan proteins and pectins during Quercus suber male gametogenesis. Ann Bot 115: 81-92 Datta K, Schmidt A, Marcus A (1989) Characterization of two soybean repetitive proline-rich proteins and a cognate cDNA from germinated axes. Plant Cell 1: 945-952 Doblin MS, Johnson KL, Humphries J, Newbigin EJ, Bacic A (2014) Are designer plant cell walls a realistic aspiration or will the plasticity of the plant's metabolism win out? Current Opinion in Biotechnology 26: 108-114 Doblin MS, Pettolino F, Bacic A (2010) Plant cell walls: the skeleton of the plant world. Functional Plant Biology 37: 357-381 Domozych DS, Ciancia M, Fangel JU, Mikkelsen MD, Ulvskov P, Willats WG (2012) The Cell Walls of Green Algae: A Journey through Evolution and Diversity. Front Plant Sci 3: 82 Domozych DS, Wilson R, Domozych CR (2009) Photosynthetic eukaryotes of freshwater wetland biofilms: adaptations and structural characteristics of the extracellular matrix in the green alga, Cosmarium reniforme (Zygnematophyceae, Streptophyta). Journal of Eukaryotic Microbiology 56: 314-322 Draeger C, Ndinyanka Fabrice T, Gineau E, Mouille G, Kuhn BM, Moller I, Abdou MT, Frey B, Pauly M, Bacic A, Ringli C (2015) Arabidopsis leucine-rich repeat extensin (LRX) proteins modify cell wall composition and influence plant growth. BMC Plant Biol 15: 155 Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792-1797 Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460-2461 Eisenhaber B, Wildpaner M, Schultz CJ, Borner GHH, Dupree P, Eisenhaber F (2003) Glycosylphosphatidylinositol lipid anchoring of plant proteins. Sensitive prediction from sequence- and genome-wide studies for Arabidopsis and rice. Plant Physiology 133: 16911701 Ellis M, Egelund J, Schultz CJ, Bacic A (2010) Arabinogalactan-proteins: key regulators at the cell surface? Plant Physiology 153: 403-419 Ferris PJ, Waffenschmidt S, Umen JG, Lin H, Lee JH, Ishida K, Kubo T, Lau J, Goodenough UW (2005) Plus and minus sexual agglutinins from Chlamydomonas reinhardtii. Plant Cell 17: 597-615 Ferris PJ, Woessner JP, Waffenschmidt S, Kilz S, Drees J, and Goodenough UW (2001) Glycosylated polyproline II rods with kinks as a structural motif in plant hydroxyproline-rich glycoproteins. Biochemistry 40: 2978-2987 Filmus J, Capurro M, Rast J (2008) Glypicans. Genome Biol. 9: 224 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 36 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 Fincher GB, Stone BA, Clarke AE (1983) Arabinogalactan-proteins: structure, biosynthesis and function. Annual Review of Plant Physiology 34: 47-70 Fong C, Kieliszewski MJ, Zacks Rd, Leykam JF, Lamport DTA (1992) A gymnosperm extensin contains the serine-tetrahydroxyproline motif. Plant Physiology 99: 548-552 Fowler TJ, Bernhardt C, Tierney ML (1999) Characterization and expression of four proline-rich cell wall protein genes in Arabidopsis encoding two distinct subsets of multiple domain proteins. Plant Physiology 121: 1081-1091 Gaspar Y, Johnson KL, McKenna JA, Bacic A, Schultz CJ (2001) The complex structures of arabinogalactan-proteins and the journey towards understanding function. Plant Mol Biol. 47: 161-176 Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge YC, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang JH (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 5 Gerken TA, Revoredo L, Thome JJ, Tabak LA, Vester-Christensen MB, Clausen H, Gahlay GK, Jarvis DL, Johnson RW, Moniz HA, Moremen K (2013) The lectin domain of the polypeptide GalNAc transferase family of glycosyltransferases (ppGalNAc Ts) acts as a switch directing glycopeptide substrate glycosylation in an N- or C-terminal direction, further controlling mucin type O-glycosylation. J Biol Chem 288: 19900-19914 Godl K, Hallmann A, Wenzl S, Sumper M (1997) Differential targeting of closely related ECM glycoproteins: The pherophorin family from Volvox. EMBO Journal 16: 25-34 Gotelli IB, Cleland R (1968) Differences in the occurrence and distribution of hydroxyprolineproteins among the algae. Am J Bot 55: 907-914 Grennan AK (2007) Lipid rafts in plants. Plant Physiol 143: 1083-1085 Gruenheit N, Deusch O, Esser C, Becker M, Voelckel C, Lockhart P (2012) Cutoffs and k-mers: implications from a transcriptome study in allopolyploid plants. Bmc Genomics 13 Harris PJ (2005) Diversity in plant cell walls In RJ Henry, ed, Plant Diversity and Evolution: Genotypic and Phenotypic Variation in Higher Plants. CAB International Publishing, Wallingford, pp 201–227 Hashimoto K, Tokimatsu T, Kawano S, Yoshizawa AC, Okuda S, Goto S, Kanehisa M (2009) Comprehensive analysis of glycosyltransferases in eukaryotic genomes for structural and functional characterization of glycans. Carbohydrate Research 344: 881-887 Herve C, Simeon A, Jam M, Cassin A, Johnson KL, Salmean AA, Willats WG, Doblin MS, Bacic A, Kloareg B (2016) Arabinogalactan proteins have deep roots in eukaryotes: identification of genes and epitopes in brown algae and their role in Fucus serratus embryo development. New Phytol 209: 1428-1441 Ishizaki K, Nishihama R, Yamato KT, Kohchi T (2016) Molecular Genetic Tools and Techniques for Marchantia polymorpha Research. Plant Cell Physiol 57: 262-270 Jermyn MA, Yeow YM (1975) A class of lectins present in the tissues of seed plants. Australian Journal of Plant Physiology 2: 501–531 Johnson MTJ, Carpenter EJ, Tian Z, Bruskiewich R, Burris JN, Carrigan CT, Chase MW, Clarke ND, Covshoff S, dePamphilis CW, Edger PP, Goh F, Graham S, Greiner S, Hibberd JM, JordonThaden I, Kutchan TM, Leebens-Mack J, Melkonian M, Miles N, Myburg H, Patterson J, Pires JC, Ralph P, Rolf M, Sage RF, Soltis D, Soltis P, Stevenson D, Stewart CN, Jr., Surek B, Thomsen CJM, Villarreal JC, Wu X, Zhang Y, Deyholos MK, Wong GK-S (2012) Evaluating methods for isolating total RNA and predicting the success of sequencing phylogenetically diverse plant transcriptomes. Plos One 7: e50226 Jorda J, Kajava AV (2009) T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm. Bioinformatics 25: 2632-2638 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 37 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 Kieliszewski M, Leykam JF, Lamport DTA (1990) Structure of the threonine-rich extensin from Zea mays. Plant Physiology 85: 823-827 Kieliszewski MJ, Lamport DTA (1994) Extensin: repetitive motifs, functional sites, post-translational codes, and phylogeny. Plant Journal 5: 157-172 Kitazawa K, Tryfona T, Yoshimi Y, Hayashi Y, Kawauchi S, Antonov L, Tanaka H, Takahashi T, Kaneko S, Dupree P, Tsumuraya Y, Kotake T (2013) beta-galactosyl Yariv reagent binds to the beta-1,3-galactan of arabinogalactan proteins. Plant Physiol 161: 1117-1126 Kleis-San Francisco SM, Tierney ML (1990) Isolation and characterization of a proline-rich cell wall protein from soybean seedlings. Plant Physiology 94: 1897-1902 Knox JP, Linstead PJ, Peart JM, Cooper C, Roberts K (1991) Developmentally regulated epitopes of cell surface arabinogalactan proteins and their relation to root tissue pattern formation. Plant Journal 1: 317-326 Kofuji R, Hasebe M (2014) Eight types of stem cells in the life cycle of the moss Physcomitrella patens. Curr Opin Plant Biol 17: 13-21 Kurotani A, Sakurai T (2015) In Silico Analysis of Correlations between Protein Disorder and PostTranslational Modifications in Algae. Int J Mol Sci 16: 19812-19835 Lamport DTA (1977) Structure, biosynthesis and significance of cell wall glycoproteins. Recent Advances in Phytochemistry 11: 79-115 Lamport DTA, Kieliszewski MJ, Chen Y, Cannon MC (2011) Role of the extensin superfamily in primary cell wall architecture. Plant Physiology 156: 11-19 Lamport DTA, Miller DH (1971) Hydroxyproline arabinosides in the plant kingdom. Plant Physiol. 48: 454-456 Lee KJD, Sakata Y, Mau SL, Pettolino F, Bacic A, Quatrano RS, Knight CD, Knox JP (2005) Arabinogalactan proteins are required for apical cell extension in the moss Physcomitrella patens. Plant Cell 17: 3051-3065 Leliaert F, Smith DR, Moreau H, Herron MD, Verbruggen H, Delwiche CF, De Clerck O (2012) Phylogeny and molecular evolution of the green algae. Critical Reviews in Plant Sciences 31: 1-46 Levitin B, Richter D, Markovich I, Zik M (2008) Arabinogalactan proteins 6 and 11 are required for stamen and pollen function in Arabidopsis. Plant J 56: 351-363 Li J, Ouyang B, Wang T, Luo Z, Yang C, Li H, Sima W, Zhang J, Ye Z (2016) HyPRP1 Gene Suppressed by Multiple Stresses Plays a Negative Role in Abiotic Stress Tolerance in Tomato. Front Plant Sci 7: 967 Li X-B, Kieliszewski M, Lamport DTA (1990) A chenopod extensin lacks repetitive tetrahydroxyproline blocks. Plant Physiology 92: 327-333 Ligrone R, Vaughn KC, Renzaglia KS, Knox JP, Duckett JG (2002) Diversity in the distribution of polysaccharide and glycoprotein epitopes in the cell walls of bryophytes: new evidence for the multiple evolution of water-conducting cells. New Phytologist 156: 491-508 Lindstrom JT, Vodkin LO (1991) A soybean cell wall protein is affected by seed color genotype. Plant Cell 3: 561-571 Liu X, Wolfe R, Welch LR, Domozych DS, Popper ZA, Showalter AM (2016) Bioinformatic Identification and Analysis of Extensins in the Plant Kingdom. PLoS One 11: e0150177 MacAlister CA, Ortiz-Ramírez C, Becker JD, Feijó JA, Lippman ZB (2016) Hydroxyproline Oarabinosyltransferase mutants oppositely alter tip growth in Arabidopsis thaliana and Physcomitrella patens. Plant Journal 85: 193-208 Majewska-Sawka A, Nothnagel EA (2000) The multiple roles of arabinogalactan proteins in plant development. Plant Physiology 122: 3-9 Matasci N, Hung LH, Yan Z, Carpenter EJ, Wickett NJ, Mirarab S, Nguyen N, Warnow T, Ayyampalayam S, Barker M, Burleigh JG, Gitzendanner MA, Wafula E, Der JP, DePamphilis CW, Roure B, Philippe H, Ruhfel BR, Miles NW, Graham SW, Mathews S, Surek B, Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 38 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 Melkonian M, Soltis DE (2014) Data access for the 1,000 plants (1KP) project. Gigascience 3: 1 Moesa HA, Wakabayashi S, Nakai K, Patil A (2012) Chemical composition is maintained in poorly conserved intrinsically disordered regions and suggests a means for their classification. Molecular Biosystems 8: 3262-3273 Moller I, Sørensen I, Bernal AJ, Blaukopf C, Lee K, Øbro J, Pettolino F, Roberts A, Mikkelsen JD, Knox JP, Bacic A, Willats WGT (2007) High-throughput mapping of cell-wall polymers within and between plants using novel microarrays. Plant Journal 50: 1118-1128 Moore JP, Nguema-Ona E, Fangel JU, Willats WG, Hugo A, Vivier MA (2014) Profiling the main cell wall polysaccharides of grapevine leaves using high-throughput and fractionation methods. Carbohydr Polym 99: 190-198 Motose H, Sugiyama M, Fukuda H (2004) A proteoglycan mediates inductive interaction during plant vascular development. Nature 429: 873-878 Nguema-Ona E, Coimbra S, Vicré-Gibouin M, Mollet J-C, Driouich A (2012) Arabinogalactan proteins in root and pollen-tube cells: distribution and functional aspects. Annals of Botany 110: 383404 Pattathil S, Avci U, Baldwin D, Swennes AG, McGill JA, Popper Z, Bootten T, Albert A, Davis RH, Chennareddy C, Dong R, O'Shea B, Rossi R, Leoff C, Freshour G, Narra R, O'Neil M, York WS, Hahn MG (2010) A comprehensive toolkit of plant cell wall glycan-directed monoclonal antibodies. Plant Physiology 153: 514-525 Paulsen BS, Craik DJ, Dunstan DE, Stone BA, Bacic A (2014) The Yariv reagent: behaviour in different solvents and interaction with a gum arabic arabinogalactan-protein. Carbohydr Polym 106: 460-468 Pereira AM, Lopes AL, Coimbra S (2016) JAGGER, an AGP essential for persistent synergid degeneration and polytubey block in Arabidopsis. Plant Signal Behav 11: e1209616 Pereira AM, Masiero S, Nobre MS, Costa ML, Solis MT, Testillano PS, Sprunck S, Coimbra S (2014) Differential expression patterns of arabinogalactan proteins in Arabidopsis thaliana reproductive tissues. J Exp Bot 65: 5459-5471 Pereira AM, Pereira LG, Coimbra S (2015) Arabinogalactan proteins: rising attention from plant biologists. Plant Reprod 28: 1-15 Popper ZA, Michel G, Herve C, Domozych DS, Willats WG, Tuohy MG, Kloareg B, Stengel DB (2011) Evolution and diversity of plant cell walls: from algae to flowering plants. Annu Rev Plant Biol 62: 567-590 Prochnik SE, Umen J, Nedelcu AM, Hallmann A, Miller SM, Nishii I, Ferris P, Kuo A, Mitros T, FritzLaylin LK, Hellsten U, Chapman J, Simakov O, Rensing SA, Terry A, Pangilinan J, Kapitonov V, Jurka J, Salamov A, Shapiro H, Schmutz J, Grimwood J, Lindquist E, Lucas S, Grigoriev IV, Schmitt R, Kirk D, Rokhsar DS (2010) Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri. Science 329: 223-226 Schaefer L, Schaefer RM (2010) Proteoglycans: from structural compounds to signaling molecules. Cell Tissue Res 339: 237-246 Schaper E, Korsunsky A, Pecerska J, Messina A, Murri R, Stockinger H, Zoller S, Xenarios I, Anisimova M (2015) TRAL: tandem repeat annotation library. Bioinformatics 31: 3051-3053 Schultz CJ, Gilson P, Oxley D, Youl JJ, Bacic A (1998) GPI-anchors on arabinogalactan-proteins: implications for signalling in plants. Trends in Plant Science 3: 426-431 Schultz CJ, Rumsewicz MP, Johnson KL, Jones BJ, Gaspar YM, Bacic A (2002) Using genomic resources to guide research directions. The arabinogalactan protein gene family as a test case. Plant Physiology 129: 1448-1463 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 39 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 Sherrier DJ, Taylor GS, Silverstein KA, Gonzales MB, VandenBosch KA (2005) Accumulation of extracellular proteins bearing unique proline-rich motifs in intercellular spaces of the legume nodule parenchyma. Protoplasma 225: 43-55 Shibaya T, Sugawara Y (2009) Induction of multinucleation by β-glucosyl Yariv reagent in regenerated cells from Marchantia polymorpha protoplasts and involvement of arabinogalactan proteins in cell plate formation. Planta 230: 581-588 Shimizu M, Igasaki T, Yamada M, Yuasa K, Hasegawa J, Kato T, Tsukagoshi H, Nakamura K, Fukuda H, Matsuoka K (2005) Experimental determination of proline hydroxylation and hydroxyproline arabinogalactosylation motifs in secretory proteins. Plant Journal 42: 877889 Showalter AM, Basu D (2016) Extensin and arabinogalactan-protein biosynthesis: glycosyltransferases, research challenges, and biosensors. Frontiers in Plant Science 7: 814 Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31: 3210-3212 Smith JJ, Muldoon EP, Willard JJ, Lamport DTA (1986) Tomato Extensin Precursors P1 and P2 Are Highly Periodic Structures. Phytochemistry 25: 1021-1030 Somerville C, Bauer S, Brininstool G, Facette M, Hamann T, Milne J, Osborne E, Paredez A, Persson S, Raab T, Vorwerk S, Youngs H (2004) Toward a systems approach to understanding plant cell walls. Science 306: 2206-2211 Sørensen I, Domozych D, Willats WGT (2010) How have plant cell walls evolved? Plant Physiology 153: 366-372 Sørensen I, Pettolino FA, Bacic A, Ralph J, Lu F, O'Neill MA, Fei Z, Rose JKC, Domozych DS, Willats WGT (2011) The charophycean green algae provide insights into the early origins of plant cell walls. Plant Journal 68: 201-211 Szalkowski AM, Anisimova M (2013) Graph-based modeling of tandem repeats improves global multiple sequence alignment. Nucleic Acids Res 41: e162 Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30: 2725-2729 Tan L, Eberhard S, Pattathil S, Warder C, Glushka J, Yuan C, Hao Z, Zhu X, Avci U, Miller JS, Baldwin D, Pham C, Orlando R, Darvill A, Hahn MG, Kieliszewski MJ, Mohnen D (2013) An Arabidopsis cell wall proteoglycan consists of pectin and arabinoxylan covalently linked to an arabinogalactan protein. Plant Cell 25: 270-287 Tan L, Leykam JF, Kieliszewski MJ (2003) Glycosylation motifs that direct arabinogalactan addition to arabinogalactan-proteins. Plant Physiology 132: 1362-1369 Tan L, Showalter AM, Egelund J, Hernandez-Sanchez A, Doblin MS, Bacic A (2012) Arabinogalactanproteinsandtheresearchchallengesfortheseenigmaticplantcellsurfaceproteoglycans. Front. Plant Sc 3: 1-10 Tapken W, Murphy AS (2015) Membrane nanodomains in plants: capturing form, function, and movement. J Exp Bot 66: 1573-1586 Tierney ML, Varner JE (1987) The extensins. Plant Physiol 84: 1-2 Ulvskov P, Paiva DS, Domozych D, Harholt J (2013) Classification, naming and evolutionary history of gycosyltransferases from sequenced green and red algal genomes. Plos One 8 van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones DT, Kim PM, Kriwacki RW, Oldfield CJ, Pappu RV, Tompa P, Uversky VN, Wright PE, Babu MM (2014) Classification of intrinsically disordered regions and proteins. Chemical Reviews 114: 6589-6631 van Holst G-J, Clarke AE (1985) Quantification of arabinogalactan-protein in plant extracts by single radial gel diffusion. Anal. Biochem. 148: 446-450 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 40 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 Velasquez M, Salgado Salter J, Gloazzo Dorosz J, Petersen BL, Estevez JM (2012) Recent advances on the posttranslational modifications of EXTs and their roles in plant cell walls. Frontiers in Plant Science 3 Velasquez SM, Marzol E, Borassi C, Pol-Fachin L, Ricardi MM, Mangano S, Juarez SP, Salter JD, Dorosz JG, Marcus SE, Knox JP, Dinneny JR, Iusem ND, Verli H, Estevez JM (2015) Low Sugar Is Not Always Good: Impact of Specific O-Glycan Defects on Tip Growth in Arabidopsis. Plant Physiol 168: 808-813 Velasquez SM, Ricardi MM, Dorosz JG, Fernandez PV, Nadra AD, Pol-Fachin L, Egelund J, Gille S, Harholt J, Ciancia M, Verli H, Pauly M, Bacic A, Olsen CE, Ulvskov P, Petersen BL, Somerville C, Iusem ND, Estevez JM (2011) O-glycosylated cell wall proteins are essential in root hair growth. Science 332: 1401-1403 Voxeur A, Hofte H (2016) Cell wall integrity signaling in plants: "To grow or not to grow that's the question". Glycobiology 26: 950-960 Wickett NJ, Mirarab S, Nam N, Warnow T, Carpenter E, Matasci N, Ayyampalayam S, Barker MS, Burleigh JG, Gitzendanner MA, Ruhfel BR, Wafula E, Der JP, Graham SW, Mathews S, Melkonian M, Soltis DE, Soltis PS, Miles NW, Rothfels CJ, Pokorny L, Shaw AJ, DeGironimo L, Stevenson DW, Surek B, Villarreal JC, Roure B, Philippe H, dePamphilis CW, Chen T, Deyholos MK, Baucom RS, Kutchan TM, Augustin MM, Wang J, Zhang Y, Tian Z, Yan Z, Wu X, Sun X, Wong GK-S, Leebens-Mack J (2014) Phylotranscriptomic analysis of the origin and early diversification of land plants. Proceedings of the National Academy of Sciences of the United States of America 111: E4859-E4868 Woessner JP, Goodenough UW (1989) Molecular characterization of a zygote wall protein: An extensin-like molecule in Chlamydomonas reinhardtii. Plant Cell 1: 901-911 Woessner JP, Molendijk AJ, Egmond Pv, Klis FM, Goodenough UW, Haring MA (1994) Domain conservation in several volvocalean cell wall proteins. Plant Mol. Biol. 26: 947-960 Xie Y, Wu G, Tang J, Luo R, Patterson J, Liu S, Huang W, He G, Gu S, Li S, Zhou X, Lam T-W, Li Y, Xu X, Wong GK-S, Wang J (2014) SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30: 1660-1666 Xu WL, Zhang DJ, Wu YF, Qin LX, Huang GQ, Li J, Li L, Li XB (2013) Cotton PRP5 gene encoding a proline-rich protein is involved in fiber development. Plant Mol Biol 82: 353-365 Yang Y, Smith SA (2013) Optimizing de novo assembly of short-read RNA-seq data for phylogenomics. BMC Genomics 14: 328 1222 Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. 41 1KP group Class GPI-AGP 1 multk CL-EXT Non-GPIAGP PRP 1 2 2 3 3 4 k 25 multk k 25 multk k 25 multk 4 k 25 GPIEXT AGP Bias 5 6 7 8 9 EXT Bias 10 11 12 PRP Bias 13 15 16 17 <15% motif Shared Bias 18 19 20 21 22 23 Total Total Grand Class 1-4 Class 5-23 Total 24 multk k 25 multk multk k 25 Vascular Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. Core eudicots/Rosids (227) 1629 411 801 159 167 19 4271 1853 186 164 49 1 33 15 24 26 13 8 5 206 111 210 452 84 17 16 5281 6868 2442 1620 16211 5798 Core eudicots/Asterids (224) 1517 461 767 177 140 16 3547 2236 128 184 51 2 48 20 37 46 12 5 3 157 47 154 505 98 32 15 4286 5971 2890 1544 14691 5022 Core eudicots (101) 343 83 288 47 24 6 1437 769 54 40 11 18 15 14 38 32 3 5 119 23 113 138 17 17 Basal eudicots (50) 342 81 223 39 39 803 368 55 65 1 5 12 45 13 11 2 4 62 141 44 11 1 2 2 2 3 4 Magnoliids (25) 134 43 90 19 9 385 226 16 4 Monocots/Commelinid (42) 85 25 24 1 1 932 628 11 8 4 Monocots (63) 226 53 165 28 3 1313 706 33 20 6 Basalmost angiosperms (8) 21 10 22 4 1 91 46 3 4 Conifers (70) 587 158 367 88 50 5 997 562 Gnetales (3) 17 4 3 1 25 12 Chloranthales (1) 1 Ginkgoales (1) 1 2 1 2 2 3 Cycadales (4) 16 7 17 3 29 15 Eusporangiate monilophytes (8) 25 7 32 7 Leptosporangiate monilophytes (54) 154 27 216 44 Lycophytes (16) 45 12 96 24 11 25 18 7 1 122 13 1 2 3 4 1 4 1 2 7 1 2 6 11 17 1 15 12 16 1 1 2 4 1 5 15 11 14 14 60 1 1 1 2 196 126 1608 679 31 66 3 319 187 2 10 1171 609 1 35 5 709 515 2 22 1 144 18 2 5 1 3 1 3 12 17 23 40 10 5 23 10 38 13 2 11 19 10 81 75 22 3 4 2 5 13 1 2 7 10 68 159 13 1 1 1 16 4 16 5 58 54 38 70 7 3 5 10 20 67 5 15 3 2 7 2 3 64 14 1 14 1 18 3 1 1 29 50 39 23 4 5 5 12 32 4 26 6 58 13 4 7 25 9 3 5 2 3 2 12 2 3 71 6 105 41 34 28 905 664 5542 2166 694 1407 488 514 3103 1001 7 6 8 1 22 13 595 618 288 141 1642 662 1670 1042 655 155 3522 1600 4764 1745 1951 1707 787 319 1 138 135 60 43 376 16 909 2001 813 530 4253 1498 28 45 17 9 1 2 1881 2092 4 99 131 36 1 4 7 0 12 15 49 62 25 24 160 69 174 279 253 140 66 633 1798 1989 750 497 5034 1705 350 460 223 89 1122 1235 1357 697 246 3535 1555 1076 867 584 99 2626 1403 227 175 19 45 466 500 Non-vascular Mosses (38) 170 81 15 6 Liverworts (24) 135 62 23 7 Hornworts (6) 2 1 29 11 1 1 2 2 3 1 52 Algae Green algae (152) 242 156 Red algae (20) 5 2 Chromista (algae) (25) 50 21 7 2 Glaucophyta (algae) (5) TOTAL 5 6 4 1 5754 1711 3193 663 455 1 8887 6900 604 348 935 703 241 139 45 220 7 1 3 1 5 1 4 6 1 2 2 1 3 2 2 14758 9146 7062 622 31588 18834 831 609 350 7 1797 1783 989 724 13 3509 2099 211 248 142 50 28649 17652 699 864 142 6 107 186 323 336 168 37 98 691 307 1030 1702 359 136 62 39933 38051 20076 5 7253 606 853 290 85237 47326 Figure 1. Number of HRGP sequences identified by MAAB in each HRGP class by 1KP group. The number of datasets per clade is shown in brackets after the 1KP group name. All MAAB classes are reported for multiple k -mers (multk : 39,49,59,69) (unshaded columns). No sequences were detected for HRGP class 14. Selected data is reported for assemblies with k -mer = 25 (k 25, shaded column). Red text indicates that the sequences detected within this class and 1KP group are likely to be contaminants (see Results/Discussion). A B C % of total HRGPs/group MAAB Class Number 1−4 5−23 24 orders samples Average no. of proteins / MAAB class 1KP group 1 2 3 4 5 6 7 8 9 10 11 12 13 15 16 17 18 19 20 21 22 23 24 50.6 13.1 36.3 12 227 Core eudicots/Asterids 6.7 3.4 0.6 15.6 0.6 0.8 0.2 0 0.2 0.1 0.2 0.2 0.1 0 0 49.9 11.8 38.4 16 224 Core eudicots/Rosids 7.3 3.6 0.7 19.1 0.8 0.7 0.2 0 0.1 0.1 0.1 0.1 0.1 0 0 0.9 0.5 0.9 2 45.1 14.3 40.6 3 101 Core eudicots 3.4 2.9 0.2 14.2 0.5 0.4 0.1 0.2 0.1 0.1 0.4 0.3 0 53.8 19.7 26.5 8 50 Basal eudicots 6.8 4.5 0.8 16.1 1.1 1.3 0 0.1 0.2 0.9 0.3 0.2 42.9 7.1 50 1 1 Chloranthales Magnoliids 1 2 5 25 36.3 5.4 58.2 3 42 42.9 8 49.1 6 63 42.7 13.6 43.7 4 8 58.2 15.4 26.4 1 70 Conifers 8.4 5.2 0.7 14.2 1.7 0.2 0 54.9 11 34.1 2 3 Gnetales 5.7 80 0 20 1 1 Ginkgoales 1 1 2 45.9 17.8 36.3 1 4 Cycadales 4 4.2 7.2 51.3 13.4 35.3 4 8 Eusp. monilophytes 3.1 4 24.5 46.4 11.6 42 8 54 Lept. monilophytes 2.9 4 0.2 29.8 0.6 1.2 0.1 51.2 9.9 38.9 3 16 Lycophytes 2.8 6 19.9 0.1 0.6 47.4 9.4 43.2 18 38 Mosses 42.4 4.9 52.7 8 24 Liverworts 5.6 39.1 10.1 50.8 3 6 Hornworts 0.3 4.8 37.3 2.5 60.2 34 152 Green algae 1.6 0.1 42.1 0.5 57.4 10 20 35.5 0.5 64 10 25 Chromista (algae) 53.4 1.1 45.5 2 5 Glaucophyta (algae) 22.2 0.5 77.4 1 1 Dinophyceae 53.4 0 46.6 1 1 Euglenozoa 5.4 3.6 0.4 15.4 0.6 0.2 0.6 0 22.2 0.3 0.2 0.1 3.6 2.6 0 20.8 0.5 0.3 0.1 1 0.2 1.4 0.1 0.3 0.4 0 0 0.2 0.2 0.3 0 0.1 0.5 0.7 0.9 1.6 0.4 0.2 0.5 0.2 0.9 0.3 0 0 0.3 0.2 1.3 1.2 0.3 0.1 0.2 0.5 0.1 31 0.2 13 9.3 0.3 0.3 1 0.2 0.5 0.2 0.4 0 2 0.5 1.1 1 0.7 1.3 0 0 0.4 0.1 0.5 0.9 0.7 0.4 0.5 0.1 33.3 0.1 0.2 0.3 0.2 21.9 0 0 0.5 0 0 0 32.5 1 0.4 0.1 0.1 0.8 0.3 0.5 0.3 2 0 0.1 21.8 0.7 0.2 1.5 0.3 0.6 0.1 0.3 1.2 0.3 0 0 0.2 0.3 0.3 0.8 2 0.2 0.6 24 0.3 0.8 0.2 0.2 12.2 2.2 0.4 0.1 0.3 0.5 1.8 0.1 29.5 0.1 0.9 0 3.5 0.2 0.6 0.4 0.2 0.3 0.9 0.1 0 58.5 0.3 1.4 0 2 0 97.1 0.1 41.5 0 0.1 0.4 0.4 0.2 48.2 44.8 37.8 0.3 0.7 0.3 0.2 0.2 0.2 47 4 0 1 0 30.8 0 37.4 39.8 1 2.3 0.2 0.3 0.3 30.2 0.2 0.3 0.5 0.2 0.6 1.6 0.1 0.2 0.1 17.2 0.1 0.2 0.2 0.2 0.2 0.9 0.1 0.1 0.3 0 0 23.8 1 71.3 42.2 164 75 69 % orders with hits 25 50 75 100 Figure 2. Summary of HRGP sequences identified using the MAAB pipeline. A, Percentage of total HRGP sequences (by 1KP group) that are found in the classical HRGP classes (1 to 4), hybrid HRGP classes (5 to 23) and non-HRGPs (class 24). B, Number of orders analyzed and number of samples per order for each 1KP group. The monocots group excludes the commelinid monocots. C, Overview of the mean number of HRGPs for each MAAB class in the 1KP dataset (by 1KP group). Shading of boxes represents the detection rate, indicated as the % of orders with hits (see Supplemental Table S1). A shaded box showing detection of HRGPs with a value of zero indicates an average number of sequences between 0-0.1. Red text indicates that the sequences detected within this HRGP class and 1KP group are likely to be contaminants (see Results/Discussion). Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. Algae 2 0 7 Non-vascular Red algae 4.5 0.4 8.3 1.3 3.6 0.1 0.3 0 Basalmost angiosperms 2.6 2.8 0.1 11.4 0.4 0.5 1 13.9 Vascular 43.9 Monocots 0.1 0.5 0.4 1.2 2.8 0.9 0.2 1 10.4 2 0.4 0.1 0.1 23.6 0 1.2 0.2 1.1 1.4 0.2 0.2 0.1 18.6 3 45.6 Monocots/Commelinids 0 0.7 0.2 0.7 2.2 0.4 0.1 0.1 18.9 Samples Families Species MAAB (class 2) TRAL (class 2) TRAL (class 24) no. species + CL-EXTs - CL-EXTs Liverworts Mosses Hornworts (class 2) in the bryophytes. MAAB detected CL-EXTs in all hornwort species in the 1KP data whereas the distribution in mosses and liverworts shows either loss of, or multiple independent Treubiales Haplomitriales Blasiales Neohodgsoniales Sphaerocarpales Marchantiales Ricciales Pelliales Fossombroniales Pallaviciniales Pleuroziales Metzgeriales Ptilidiales Porellales Jungermanniales Sphagnales Takakiales Andreaeaeales Andreaeobryales Tetraphidales Polytrichales Buxbaumiales Diphysciales Gigaspermales Funariales Timmiales Dicranales Pottiales Grimmiales Hedwigiales Bartramiales Splachnales Bryales Orthotrichales Orthodontiales Rhizogoniales Aulacomniales Hypnodendrales Ptychomniales Hookeriales Hypnales Figure 3. Distribution and number of CL-EXTs 1 1 1 0 0 1 gains of, CL-EXTs. Tandem Repeat Annotation 1 8 1 5 1 7 1 7 1 6 1 7 Libraries (TRAL) of repeats identified in class 2 2 1 2 0 0 2 and class 24 (control) sequences were used to 1 1 1 0 0 1 search the MAAB input sequences and identify 1 1 1 0 0 1 repeats present in the bryophyte lineages. There 1 8 3 1 1 1 7 1 1 1 2 7 3 1 1 0 0 3 0 0 0 0 3 0 0 1 8 2 0 1 2 1 1 2 1 1 2 1 1 0 1 1 0 0 1 1 0 1 2 1 4 1 3 1 1 2 1 3 1 2 1 1 2 1 4 1 3 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 2 4 1 2 1 0 3 1 3 1 3 1 0 0 0 0 1 1 11 11 11 1 0 9 2 2 3 3 3 3 3 1 Leiosporocerotales Anthocerotales 3 1 Notothyladales Phymatocerotales 3 3 Dendrocerotales is good correlation between the species with CLEXTs identified by MAAB and those with SPn/Ybased repeats identified by TRAL using class 2 repeats. No CL-EXTs were identified using TRAL repeats from class 24 sequences. The phylogenetic tree is based on Cole and Hilger (2013) and was selected because it includes all bryophyte orders. Vascular Plants Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. A Figure 4. Mean number of MAAB class 2 CL-EXT 4 Mean proteins by MAAB CL-EXTs 3 % of BUSCO single-copy genes pipeline within using each the family MAAB of monocots/commelinids. Many families lack CL- EXTs due but this is not to poor quality percentage of 956 single-copy orthologous genes found in the monocots/commelinids families is high (avg. ~80%) Family data (k-mer 39 only) are summarized in a box-and-whisker plot after removal 90 of outliers (2 Poaceae sp.: 59.4, 65.7; 1 Aracaeae sp.: 18.1). The mean number of class 1 GPI-AGPs 80 (C) was evaluated and these are present in all 70 families that lack CL-EXTs. Only Cannaceae and Restionaceae have both GPI-AGPs and CL-EXTs, 60 although for most families the number of datasets is 8 GPI-AGPs low (1 to 3). The total number of datasets per family is indicated in parentheses. 6 4 2 Arecaceae (5) Bromeliaceae (1) Cyperaceae (3) Joinvilleaceae (1) Juncaceae (1) Poaceae (21) Restionaceae (1) Typhaceae (2) Cannaceae (1) Heliconiaceae (2) Marantaceae (1) Strelitziaceae (1) 0 Zingiberaceae (2) Mean proteins by MAAB identified transcriptomes as BUSCO analysis (B) shows the 1 100 C (A) 2 0 B sequences Zingiberales Poales Arecales Monocots/Commelinid family (as assigned by 1KP) Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. A 90 VMNH_Locus_18822 52 IPWB_Locus_30545 LAPO_Locus_128094 78 AtAGP1_At5G64310.1|PACid:19673006 52 VMNH_Locus_14415 UAXP_Locus_13472 SWPE_Locus_6427 62 DZTK_Locus_3934 41 HYZL_Locus_2834 99 HYZL_Locus_9103 HYZL_Locus_1464 42 HYZL_Locus_544 51 CZPV_Locus_36180 HYZL_Locus_244 100 CZPV_Locus_9612 AtAGP59_At3G27416.1|PACid:19660750 AtAGP11_At3G01700.1|PACid:19663150 45 VMNH_Locus_9336 83 VMNH_Locus_10786 74 VMNH_Locus_329 43 77 AtAGP6_At5G14380.1|PACid:19665343 MYZV_Locus_476 64 66 MYZV_Locus_9204 53 HYZL_Locus_5884 CZPV_Locus_47861 UAXP_Locus_29413 74 83 UAXP_Locus_4754 98 UVQL_Locus_476 LAPO_Locus_2422 58 100 AtAGP5_At1G35230.1|PACid:19657214 98 HABV_Locus_1044 54 UVQL_Locus_1887 AtAGP10_At4G09030.1|PACid:19644227 100 IPWB_Locus_34587 90 91 VMNH_Locus_19171 DZTK_Locus_3667 MYZV_Locus_5182 HYZL_Locus_12251 MYZV_Locus_4182 91 CRNC_Locus_20076 IPWB_Locus_2586 67 92 AtAGP3_At4G40090.1|PACid:19648772 AtAGP2_At2G22470.1|PACid:19638226 DZTK_Locus_800 JOIS_Locus_5719 100 TZWR_Locus_6853 43 AtAGP4_At5G10430.1|PACid:19665913 LAPO_Locus_16533 99 TZWR_Locus_4310 97 AtAGP7_At5G65390.1|PACid:19672144 64 HYZL_Locus_6813 83 CZPV_Locus_24329 CRNC_Locus_4364 95 VMNH_Locus_2993 76 IPWB_Locus_377 IPWB_Locus_13012 100 TZWR_Locus_973 65 89 50 AtAGP9_At2G14890.1|PACid:19640833 UAXP_Locus_11432 HYZL_Locus_1046 56 CZPV_Locus_36036 62 46 57 77 80 AtAGP1 AtAGP6/11 AtAGP59 56 100 IPWB_Locus_7995 LAPO_Locus_140 DZTK_Locus_278 AtEXT3_AT1G21310_Class20_manual_add 66 CSUV_Locus_1111 VMNH_Locus_21622 92 HABV_Locus_8657 AtEXT1_At1G76930.1|PACid:19657443 79 CRNC_Locus_5130 HYZL_Locus_6673 MYZV_Locus_17462 48 HYZL_Locus_11975 CZPV_Locus_41301 45 JOIS_Locus_247 JOIS_Locus_9845 JOIS_Locus_502 UAXP_Locus_12039 UAXP_Locus_16149 SWPE_Locus_8953 AtEXT21_At1G26250.1|PACid:19658113 43 tandem duplication AtEXT20_At1G26240.1|PACid:19655134 98 AtEXT22_At4G08370.1|PACid:19648501 AtEXT17_At4G08380.1|PACid:19645658 41 AtAGP5/10 SWPE_Locus_1416 AtEXT18_At4G13390.1|PACid:19648172 TZWR_Locus_6475 AtEXT15_At5G35190.1|PACid:19670300 AtEXT11_At4G08400.1|PACid:19645018 89 AtEXT7_At2G24980.1|PACid:19638555 71 AtEXT13_At5G06630.1|PACid:19667515 AtEXT2_At3G54590.1|PACid:19665143 56 tandem duplication 88 AtEXT10_At3G54580.1|PACid:19664950 AtEXT9_At3G28550.1|PACid:19661711 AtEXT35_At3G20850.1|PACid:19664633 HYZL_Locus_199 42 MYZV_Locus_22840 HYZL_Locus_10140 69 JOIS_Locus_1958 51 CRNC_Locus_1155 DZTK_Locus_3778 UAXP_Locus_18193 81 IPWB_Locus_10603 85 CSUV_Locus_11630 97 AtEXT8_At2G43150.1|PACid:19639362 UVQL_Locus_10839 49 HABV_Locus_1095 59 LAPO_Locus_26253 100 HYZL_Locus_41339 HYZL_Locus_25529 MYZV_Locus_4163 96 92 HABV_Locus_341 96 LAPO_Locus_6067 TZWR_Locus_943 64 0.5 IPWB_Locus_15342 69 49 HABV_Locus_9830 81 92 AtAGP2/3 AtAGP4/7 AtAGP9 JOIS_Locus_1119 AtAGP HYZL_Locus_1648 17/18 CZPV_Locus_16268 MYZV_Locus_8965 DZTK_Locus_3085 67 IPWB_Locus_8564 AtAGP17_At2G23130.1|PACid:19639138 AtAGP18_At4G37450.1|PACid:19645067 80 100 CSUV_Locus_9603_T2/2 89 CSUV_Locus_9603_T1/2 IPWB_Locus_633 100 VMNH_Locus_2336 VMNH_Locus_27554 MYZV_Locus_28246 MYZV_Locus_23577 98 HYZL_Locus_1144 61 VMNH_Locus_7551 AtAGP 75 88 AtAGP25_At5G18690.1|PACid:19666952 25/26/27 AtAGP27_At3G06360.1|PACid:19664968 VMNH_Locus_17896 AtAGP26_At2G47930.1|PACid:19640209 99 TZWR_Locus_23380 CRNC_Locus_15389 MYZV_Locus_3750 HYZL_Locus_1766 77 AtAGP58 40 CZPV_Locus_22718 CRNC_Locus_98 MYZV_Locus_1492 CSUV_Locus_2765 67 VMNH_Locus_1191 89 AtAGP58_At4G16980.1|PACid:19647627 54 TZWR_Locus_39 0.5 69 LAPO_Locus_1150 69 67 56 B Brassicales HYZL Akaniaceae MYZV Tropaeolaceae CZPV Moringaceae CRNC Limnanthaceae JOIS Koeberliniaceae DZTK Bataceae SWPE Resedaceae UAXP Gyrostemonaceae Arabidopsis Brassicaceae CSUV Brassicaceae HABV Brassicaceae IPWB Brassicaceae LAPO Brassicaceae TZWR Brassicaceae UVQL Brassicaceae VMNH Brassicaceae Figure 5. ML tree of Brassicales 1KP and Arabidopsis GPI-AGPs (A), and CL-EXTs (B). (A,B) ML trees generally show strong support for sub-clades with Arabidopsis sequences and 1KP sequences from family Brassicaceae. Putative GPI-AGP sub-clades (A) are separated by a horizontal dotted line and the Arabidopsis ortholog(s) are indicated by boxes (right of tree). Sequences from the Brassicaceae family are in green with Arabidopsis sequences in larger, bold font. AtEXT3 was included with CL-EXTs (B) even though it was classified as class 20 because of its shared bias (<2% difference (Δ)) between % PSKY (74.2%) and % PVYK (73.1%); see Johnson et al., companion paper). Numbers on the nodes represent support with 100 bootstrap replicates (≥70 green, 60-69 orange, 40-59 black). 1KP datasets (1KP locus ID and family name) are shown on the Downloaded from on June 17, 2017 Brassicales - Published bytree www.plantphysiol.org (inset) (Stevens (2001), Version 12, July 2012 Copyright © 2017 American Society of Plant Biologists. All rights reserved. http://www.mobot.org/mobot/research/apweb/). Scale bars for branch length measures the number of substitutions per site. Sequences: GPI-AGPs Brassicales (Supplemental Fig. S5A) and CL-EXTs Brassicales (Supplemental Fig. S5B). Rosid_Brassicales_AtAGP11 Rosid_Brassicales_VMNH_10291 85 Rosid_Brassicales_AtAGP6 Rosid_Brassicales_Brrap_XP_009121671 97 Rosid_Brassicales_VMNH_329 100 Rosid_Brassicales_Brrap_XP_009131402 Rosid_Fabales_Medtr6g021940 P Rosid_Fabales_RKFX_3135 Asterid_Apiales_CWYJ_8974 Asterid_Dipsacales_CAQZ_225 Asterid_Solanales_Solyc_XP_010314526.1* 88 MonocotComm_Poales_ROEI_15291* P Asterid_Asterales_DESP_2262* P 100 Asterid_Asterales_TEZA_272* Asterid_Ericales_SERM_1392 100 Asterid_Ericales_SERM_4447 Asterid_Gentianales_ECTD_9536 Asterid_Asterales_DESP_3601* BasalEudicots_Proteales_Nelnu_XP_010262888_1 BasalEudicots_Proteales_Nelnu_XP_010262887.1 64 Rosid_Myrtales_Eucgr_XP_010034647 71 Asterid_Gentianales_UFQC_11993* P Rosid_Vitales_Vivin_XP_010660830_1 Rosid_Rosales_KYAD_1380 Rosid_Brassicales_AtAGP50* Rosid_Malpighiales_CKDK_252* 42 P Rosid_Malpighiales_CKDK_16890 BasalEudicot_Ranunculales_Aquca_scaffold_11:4769931..4770658* Rosid_Malpighiales_Potri_001G339700 Rosid_Malpighiales_OODC_940 Rosid_Malpighiales_OODC_6813 90 Rosid_Fagales_Qurob_scaffold_362 CoreEudicot_Caryophyllales_MRKX_37906 P 41 CoreEudicot_Caryophyllales_BJKT_5* P BasalEudicot_Ranunculales_XHKT_2993* BasalEudicot_Ranunculales_CCHG_13874 P BasalEudicot_Ranunculales_NMGG_7583 BasalEudicot_Ranunculales_NMGG_682 Rosid_Zygophyllales_ZHMB_16956 P BasalEudicot_Ranunculales_UDHA_3261 BasalEudicot_Ranunculales_CCHG_5462* P BasalEudicot_Ranunculales_ERXG_35249 P BasalEudicot_Ranunculales_XMVD_14518* BasalEudicot_Ranunculales_ERXG_12856 Rosid_Malpighiales_VVPY_1668 Rosid_Fagales_Qurob_scaffold_554* Monocot_Arecales_Phdac_XP_008788907_1 100 Monocot_Arecales_Elgui_XP_010943292_1 P Monocot_Asparagales_XZME_884* Monocot_Asparagales_EMJJ_7786* Rosid_Brassicales_AtAGP58 Rosid_Brassicales_AtAGP59 MonocotComm_Poales_Seita_4G125000_1* 100 80 MonocotComm_Poales_BXAY_2382 96 MonocotComm_PoalesAsparagales_YXNR_30* C MonocotComm_Poales_OsAGP7_LOC_Os06g17450_1 Monocot_Liliales_OOSO_6521 MonocotComm_PoalesAsparagales_YXNR_16252* C P MonocotComm_Poales_OsAGP10_LOC_Os08g38250_1 96 MonocotComm_Poales_Seita_6G192500_1* 64 MonocotComm_Poales_BXAY_427 94 MonocotComm_Poales_OsAGP1_LOC_Os08g37630_1 Rosid_Fabales_OHAE_12357 58 Rosid_Fabales_OHAE_4810 Rosid_Brassicales_AtAGP2 62 Rosid_Brassicales_AtAGP4 Rosid_Brassicales_AtAGP5 Rosid_Brassicales_AtAGP1 BasalAngiosperm_Amborellales_Amtri_ERM95342* BasalAngiosperm_Amborellales_Amtri_ERN06202* Rosid_Brassicales_AtAGP17 MonocotComm_Poales_LOC_Os04g41480_1_HisRichAGP MonocotComm_Poales_LOC_Os11g10890_1 Rosid_Brassicales_AtAGP25 Rosid_Brassicales_AtAGP27 68 Rosid_Brassicales_AtAGP26 MonocotComm_Poales_LOC_Os01g71170_1 MonocotComm_Poales_LOC_Os11g05935_1 Rosid_Brassicales_AtAGP9 BasalAngiosperm_Amborellales_Amtri_ERM96654* BasalAngiosperm_Amborellales_Amtri_ERN01113* 72 52 Figure 6. Phylogenetic analysis of class 1 GPI-AGPs to identify AtAGP6/11 orthologs. ML tree (MEGA) was constructed using putative AtAGP6/11 orthologs identified predominantly from 1KP transcriptomes and HMMER model 1 (see Supplemental Table S3), with a few sequences from other sources to increase the breadth of sampling (see Methods; Supplemental Table S4). The tree also includes at least one sequence from each Arabidopsis GPI-AGP sub-clade, representative rice GPI-AGPs identified by MAAB (see Fig. 2, Johnson et al., companion paper) and four GPI-AGPs from Amborella trichopoda (Amtri_ERM96654, Amtri_ERN01113 and Amtri_ERM95342, Amtri_ERN06202). Sequence names are coloured by 1KP group: basal angiosperms (grey), non-commelinid monocots (purple), commelinid monocots (pink), basal eudicots (aqua), core eudicots (green), asterids (red) and rosids (orange). Numbers on the nodes represent support with 100 bootstrap replicates (≥70 green, 60-69 orange, 40-59 black). 1KP sequences are identified by Group_Order_1KP identifier_sequence locus. Genomic sequences and other sequences from NCBI follow a similar format, but include a five letter abbreviation of genus and species (see Methods). Symbols next to sequence names are used to indicate the source and MAAB class of sequences and other relevant information. An asterisk (*) after the 0.5 C P Phytozome v9, GPI-AGP (MAAB class 1) NCBI or Phytozome v11, GPI-AGP 1KP multK, GPI-AGP (class 1) 1KP multK, GPI-AGP (class 1), contaminant, same order 1KP multK, class 4, partial GPI-AGP or big-PI false neg 1KP sequence used as query at Phytozome/NCBI AtAGP6 or OsAGP7 used as query at Phytozome/NCBI Sequence used to generate AGP6/11 HMMER model AtAGP6 ortholog identified by HMMER model False positive, HMMER model Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. A. AGP17/18 subclade, XP1-2 dominant M Q scattered K K-rich Figure 7. Schematic representation of the GPI-AGP subclades showing the presence and distribution of specific amino acids in the PAST-rich protein backbone. Order of GPI-AGPs is based on clades in angiosperm GPI-AGP tree B. AGP9 subclade, [S/T]P3 dominant M Q (Supplemental Fig. S3); figure part numbers (A to K) are scattered K based on alignments (Supplemental Fig. S4) and the AtAGP5/10 (G) subclade is included for comparison (based C. AGP4/7/10 D. AGP2/3 G/H. AGP5/10 I. AGP1 M subclade(s) Q on AGP-c subclade (see Fig. 2C, Johnson et al., companion paper). A, Putative orthologs of Lys-rich AtAGP17/18 have scattered lysine (K) residues throughout the sequence and E. AGP58 subclade Q M contain a short K-rich domain near the C-terminus. The scattered M glycomotifs are predominantly XP1-2. B, The AGP9 subclade also has a short Lys-rich region with ≥3 K residues F. and scattered K residues, however, the glycomotifs are AGP25/26/27 subclade M distinct from AtAGP17/18, being predominantly [S/T]P3. AGP4/7/10 (C), AGP2/3 (D), AGP5/10 (G), AGP5 (H) and AGP1(I) sub-clades have a ‘classical’ PAST-rich backbone J/K. AGP6/11/59 subclade M with no obvious bias towards other amino acids. E, Most of scattered K scattered D/E the putative orthologs of AtAGP58 are relatively methionine (M)-rich, with scattered M residues between AGP glycomotifs and contain a Q residue at the presumed NER signal PAST-rich backbone GPI signal terminus. This Q residue at the presumed mature N- ER or GPI cleavage site terminus is also found in putative orthologs of Lys-rich AtAGP17/18 (A), AtAGP9 (B), AtAGP1/2/3/4/7/5/10 (C,D,G,H,I), and AtAGP58 (E), but not AtAGP25/26/27 (F), or AGP6/11/59 (J/K). J/K, Putative orthologs of AtAGP6/11/59 share scattered K residues, concentrated in the first half of the protein and also a small cluster of acidic residues, either aspartic acid (D) or glutamic acid (E). Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. Parsed Citations Adair WS, Hwang C, Goodenough UW (1983) Identification and visualization of the sexual agglutinin from the mating-type plus flagellar membrane of Chlamydomonas. Cell 33: 183-193 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title APG IV (2016) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Botanical Journal of the Linnean Society 181: 1-20 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Averyhart-Fullard V, Datta K, Marcus A (1988) A hydroxyproline-rich protein in the soybean cell wall. Proc. Natl. Acad. Sci. USA 85: 1082-1085 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Bacic A, Harris PJ, Stone BA (1988) Structure and function of plant cell walls. In J Priess, ed, The Biochemistry of Plants, Vol 14. Academic Press, New York, pp 297-371 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Basile DV, Basile MR (1987) The Occurrence of Cell Wall-Associated Arabinogalactan Proteins in the Hepaticae. Bryologist 90: 401404 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Bernhardt C, Tierney ML (2000) Expression of AtPRP3, a proline-rich structural cell wall protein from Arabidopsis, is regulated by cell-type-specific developmental pathways involved in root hair formation. Plant Physiology 122: 705-714 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Biegert A, Soding J (2008) De novo identification of highly diverged protein repeats by probabilistic consistency. Bioinformatics 24: 807-814 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Bollig K, Lamshoft M, Schweimer K, Marner FJ, Budzikiewicz H, Waffenschmidt S (2007) Structural analysis of linear hydroxyproline-bound O-glycans of Chlamydomonas reinhardtii--conservation of the inner core in Chlamydomonas and land plants. Carbohydr Res 342: 2557-2566 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Borner GHH, Sherrier DJ, Weimar T, Michaelson LV, Hawkins ND, MacAskill A, Napier JA, Beale MH, Lilley KS, Dupree P (2005) Analysis of detergent-resistant membranes in Arabidopsis. Evidence for plasma membrane lipids rafts. Plant Physiology 137: 104116 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Bowman JL (2013) Walkabout on the long branches of plant evolution. Current Opinion in Plant Biology 16: 70-77 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Cannon MC, Terneus K, Hall Q, Tan L, Wang Y, Wegenhart BL, Chen L, Lamport DTA, Chen Y, Kieliszewski MJ (2008) Selfassembly of the plant cell wall requires an extensin scaffold. Proceedings of the National Academy of Sciences, USA 105: 22262231 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Carpita NC, Gibeaut DM (1993) Structural models of primary cell walls in flowering plants: consistency of molecular structure with the physical properties of the walls during growth. Plant Journal 3: 1-30 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Chaw SM, Chang CC, Chen HL, Li WH (2004) Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. Journal of Molecular Evolution 58: 424-441 Pubmed: Author and Title Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Chen J, and Varner JE (1985) Isolation and characterization of cDNA clones for carrot extensin and a proline-rich 33-kDa protein. Proc Natl Acad Sci USA 82: 4399-4403 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Choudhary P, Saha P, Ray T, Tang Y, Yang D, Cannon MC (2015) EXTENSIN18 is required for full male fertility as well as normal vegetative growth in Arabidopsis. Front Plant Sci 6: 553 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Coimbra S, Costa M, Jones B, Mendes MA, Pereira LG (2009) Pollen grain development is compromised in Arabidopsis agp6 agp11 null mutants. J Exp Bot 60: 3133-3142 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Coimbra S, Costa M, Mendes MA, Pereira AM, Pinto J, Pereira LG (2010) Early germination of Arabidopsis pollen in a double null mutant for the arabinogalactan protein genes AGP6 and AGP11. Sexual Plant Reproduction 23: 199-205 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Cole TCH, Hilger HH (2013) Bryophyte phylogeny, viewed 07/09/2015, http://www2.biologie.fu-berlin.de/sysbot/poster/BPP-E.pdf. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Costa M, Nobre MS, Becker JD, Masiero S, Amorim MI, Pereira LG, Coimbra S (2013) Expression-based and co-localization detection of arabinogalactan protein 6 and arabinogalactan protein 11 interactors in Arabidopsis pollen and pollen tubes. BMC Plant Biol 13: 7 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Costa ML, Sobral R, Ribeiro Costa MM, Amorim MI, Coimbra S (2015) Evaluation of the presence of arabinogalactan proteins and pectins during Quercus suber male gametogenesis. Ann Bot 115: 81-92 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Datta K, Schmidt A, Marcus A (1989) Characterization of two soybean repetitive proline-rich proteins and a cognate cDNA from germinated axes. Plant Cell 1: 945-952 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Doblin MS, Johnson KL, Humphries J, Newbigin EJ, Bacic A (2014) Are designer plant cell walls a realistic aspiration or will the plasticity of the plant's metabolism win out? Current Opinion in Biotechnology 26: 108-114 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Doblin MS, Pettolino F, Bacic A (2010) Plant cell walls: the skeleton of the plant world. Functional Plant Biology 37: 357-381 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Domozych DS, Ciancia M, Fangel JU, Mikkelsen MD, Ulvskov P, Willats WG (2012) The Cell Walls of Green Algae: A Journey through Evolution and Diversity. Front Plant Sci 3: 82 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Domozych DS, Wilson R, Domozych CR (2009) Photosynthetic eukaryotes of freshwater wetland biofilms: adaptations and structural characteristics of the extracellular matrix in the green alga, Cosmarium reniforme (Zygnematophyceae, Streptophyta). Journal of Eukaryotic Microbiology 56: 314-322 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Draeger C, Ndinyanka Fabrice T, Gineau E, Mouille G, Kuhn BM, Moller I, Abdou MT, Frey B, Pauly M, Bacic A, Ringli C (2015) Arabidopsis leucine-rich repeat extensin (LRX) proteins modify cell wall composition and influence plant growth. BMC Plant Biol 15: 155 Pubmed: Author and Title CrossRef: Author and Title Downloaded Google Scholar: Author Only Title Only Author and Title from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 17921797 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460-2461 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Eisenhaber B, Wildpaner M, Schultz CJ, Borner GHH, Dupree P, Eisenhaber F (2003) Glycosylphosphatidylinositol lipid anchoring of plant proteins. Sensitive prediction from sequence- and genome-wide studies for Arabidopsis and rice. Plant Physiology 133: 1691-1701 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Ellis M, Egelund J, Schultz CJ, Bacic A (2010) Arabinogalactan-proteins: key regulators at the cell surface? Plant Physiology 153: 403-419 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Ferris PJ, Waffenschmidt S, Umen JG, Lin H, Lee JH, Ishida K, Kubo T, Lau J, Goodenough UW (2005) Plus and minus sexual agglutinins from Chlamydomonas reinhardtii. Plant Cell 17: 597-615 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Ferris PJ, Woessner JP, Waffenschmidt S, Kilz S, Drees J, and Goodenough UW (2001) Glycosylated polyproline II rods with kinks as a structural motif in plant hydroxyproline-rich glycoproteins. Biochemistry 40: 2978-2987 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Filmus J, Capurro M, Rast J (2008) Glypicans. Genome Biol. 9: 224 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Fincher GB, Stone BA, Clarke AE (1983) Arabinogalactan-proteins: structure, biosynthesis and function. Annual Review of Plant Physiology 34: 47-70 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Fong C, Kieliszewski MJ, Zacks Rd, Leykam JF, Lamport DTA (1992) A gymnosperm extensin contains the serinetetrahydroxyproline motif. Plant Physiology 99: 548-552 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Fowler TJ, Bernhardt C, Tierney ML (1999) Characterization and expression of four proline-rich cell wall protein genes in Arabidopsis encoding two distinct subsets of multiple domain proteins. Plant Physiology 121: 1081-1091 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Gaspar Y, Johnson KL, McKenna JA, Bacic A, Schultz CJ (2001) The complex structures of arabinogalactan-proteins and the journey towards understanding function. Plant Mol Biol. 47: 161-176 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge YC, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang JH (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 5 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Gerken TA, Revoredo L, Thome JJ, Tabak LA, Vester-Christensen MB, Clausen H, Gahlay GK, Jarvis DL, Johnson RW, Moniz HA, Moremen K (2013) The lectin domain of the polypeptide GalNAc transferase family of glycosyltransferases (ppGalNAc Ts) acts as a switch directing glycopeptide substrate glycosylation in an N- or C-terminal direction, further controlling mucin type Oglycosylation. J Biol Chem 288: 19900-19914 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. Godl K, Hallmann A, Wenzl S, Sumper M (1997) Differential targeting of closely related ECM glycoproteins: The pherophorin family from Volvox. EMBO Journal 16: 25-34 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Gotelli IB, Cleland R (1968) Differences in the occurrence and distribution of hydroxyproline-proteins among the algae. Am J Bot 55: 907-914 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Grennan AK (2007) Lipid rafts in plants. Plant Physiol 143: 1083-1085 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Gruenheit N, Deusch O, Esser C, Becker M, Voelckel C, Lockhart P (2012) Cutoffs and k-mers: implications from a transcriptome study in allopolyploid plants. Bmc Genomics 13 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Harris PJ (2005) Diversity in plant cell walls In RJ Henry, ed, Plant Diversity and Evolution: Genotypic and Phenotypic Variation in Higher Plants. CAB International Publishing, Wallingford, pp 201-227 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Hashimoto K, Tokimatsu T, Kawano S, Yoshizawa AC, Okuda S, Goto S, Kanehisa M (2009) Comprehensive analysis of glycosyltransferases in eukaryotic genomes for structural and functional characterization of glycans. Carbohydrate Research 344: 881-887 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Herve C, Simeon A, Jam M, Cassin A, Johnson KL, Salmean AA, Willats WG, Doblin MS, Bacic A, Kloareg B (2016) Arabinogalactan proteins have deep roots in eukaryotes: identification of genes and epitopes in brown algae and their role in Fucus serratus embryo development. New Phytol 209: 1428-1441 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Ishizaki K, Nishihama R, Yamato KT, Kohchi T (2016) Molecular Genetic Tools and Techniques for Marchantia polymorpha Research. Plant Cell Physiol 57: 262-270 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Jermyn MA, Yeow YM (1975) A class of lectins present in the tissues of seed plants. Australian Journal of Plant Physiology 2: 501531 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Johnson MTJ, Carpenter EJ, Tian Z, Bruskiewich R, Burris JN, Carrigan CT, Chase MW, Clarke ND, Covshoff S, dePamphilis CW, Edger PP, Goh F, Graham S, Greiner S, Hibberd JM, Jordon-Thaden I, Kutchan TM, Leebens-Mack J, Melkonian M, Miles N, Myburg H, Patterson J, Pires JC, Ralph P, Rolf M, Sage RF, Soltis D, Soltis P, Stevenson D, Stewart CN, Jr., Surek B, Thomsen CJM, Villarreal JC, Wu X, Zhang Y, Deyholos MK, Wong GK-S (2012) Evaluating methods for isolating total RNA and predicting the success of sequencing phylogenetically diverse plant transcriptomes. Plos One 7: e50226 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Jorda J, Kajava AV (2009) T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm. Bioinformatics 25: 2632-2638 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Kieliszewski M, Leykam JF, Lamport DTA (1990) Structure of the threonine-rich extensin from Zea mays. Plant Physiology 85: 823827 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Kieliszewski MJ, Lamport DTA (1994) Extensin: repetitive motifs, functional sites, post-translational codes, and phylogeny. Plant Journal 5: 157-172 Pubmed: Author and Title CrossRef: Author and Title Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. Google Scholar: Author Only Title Only Author and Title Kitazawa K, Tryfona T, Yoshimi Y, Hayashi Y, Kawauchi S, Antonov L, Tanaka H, Takahashi T, Kaneko S, Dupree P, Tsumuraya Y, Kotake T (2013) beta-galactosyl Yariv reagent binds to the beta-1,3-galactan of arabinogalactan proteins. Plant Physiol 161: 11171126 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Kleis-San Francisco SM, Tierney ML (1990) Isolation and characterization of a proline-rich cell wall protein from soybean seedlings. Plant Physiology 94: 1897-1902 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Knox JP, Linstead PJ, Peart JM, Cooper C, Roberts K (1991) Developmentally regulated epitopes of cell surface arabinogalactan proteins and their relation to root tissue pattern formation. Plant Journal 1: 317-326 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Kofuji R, Hasebe M (2014) Eight types of stem cells in the life cycle of the moss Physcomitrella patens. Curr Opin Plant Biol 17: 1321 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Kurotani A, Sakurai T (2015) In Silico Analysis of Correlations between Protein Disorder and Post-Translational Modifications in Algae. Int J Mol Sci 16: 19812-19835 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Lamport DTA (1977) Structure, biosynthesis and significance of cell wall glycoproteins. Recent Advances in Phytochemistry 11: 79115 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Lamport DTA, Kieliszewski MJ, Chen Y, Cannon MC (2011) Role of the extensin superfamily in primary cell wall architecture. Plant Physiology 156: 11-19 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Lamport DTA, Miller DH (1971) Hydroxyproline arabinosides in the plant kingdom. Plant Physiol. 48: 454-456 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Lee KJD, Sakata Y, Mau SL, Pettolino F, Bacic A, Quatrano RS, Knight CD, Knox JP (2005) Arabinogalactan proteins are required for apical cell extension in the moss Physcomitrella patens. Plant Cell 17: 3051-3065 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Leliaert F, Smith DR, Moreau H, Herron MD, Verbruggen H, Delwiche CF, De Clerck O (2012) Phylogeny and molecular evolution of the green algae. Critical Reviews in Plant Sciences 31: 1-46 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Levitin B, Richter D, Markovich I, Zik M (2008) Arabinogalactan proteins 6 and 11 are required for stamen and pollen function in Arabidopsis. Plant J 56: 351-363 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Li J, Ouyang B, Wang T, Luo Z, Yang C, Li H, Sima W, Zhang J, Ye Z (2016) HyPRP1 Gene Suppressed by Multiple Stresses Plays a Negative Role in Abiotic Stress Tolerance in Tomato. Front Plant Sci 7: 967 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Li X-B, Kieliszewski M, Lamport DTA (1990) A chenopod extensin lacks repetitive tetrahydroxyproline blocks. Plant Physiology 92: 327-333 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Downloaded on June 2017Diversity - Published Ligrone R, Vaughn KC, Renzaglia KS, Knox JP,from Duckett JG 17, (2002) in by thewww.plantphysiol.org distribution of polysaccharide and glycoprotein Copyright © 2017 American Society of Plant Biologists. All rights reserved. epitopes in the cell walls of bryophytes: new evidence for the multiple evolution of water-conducting cells. New Phytologist 156: 491-508 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Lindstrom JT, Vodkin LO (1991) A soybean cell wall protein is affected by seed color genotype. Plant Cell 3: 561-571 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Liu X, Wolfe R, Welch LR, Domozych DS, Popper ZA, Showalter AM (2016) Bioinformatic Identification and Analysis of Extensins in the Plant Kingdom. PLoS One 11: e0150177 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title MacAlister CA, Ortiz-Ramírez C, Becker JD, Feijó JA, Lippman ZB (2016) Hydroxyproline O-arabinosyltransferase mutants oppositely alter tip growth in Arabidopsis thaliana and Physcomitrella patens. Plant Journal 85: 193-208 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Majewska-Sawka A, Nothnagel EA (2000) The multiple roles of arabinogalactan proteins in plant development. Plant Physiology 122: 3-9 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Matasci N, Hung LH, Yan Z, Carpenter EJ, Wickett NJ, Mirarab S, Nguyen N, Warnow T, Ayyampalayam S, Barker M, Burleigh JG, Gitzendanner MA, Wafula E, Der JP, DePamphilis CW, Roure B, Philippe H, Ruhfel BR, Miles NW, Graham SW, Mathews S, Surek B, Melkonian M, Soltis DE (2014) Data access for the 1,000 plants (1KP) project. Gigascience 3: 1 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Moesa HA, Wakabayashi S, Nakai K, Patil A (2012) Chemical composition is maintained in poorly conserved intrinsically disordered regions and suggests a means for their classification. Molecular Biosystems 8: 3262-3273 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Moller I, Sørensen I, Bernal AJ, Blaukopf C, Lee K, Øbro J, Pettolino F, Roberts A, Mikkelsen JD, Knox JP, Bacic A, Willats WGT (2007) High-throughput mapping of cell-wall polymers within and between plants using novel microarrays. Plant Journal 50: 11181128 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Moore JP, Nguema-Ona E, Fangel JU, Willats WG, Hugo A, Vivier MA (2014) Profiling the main cell wall polysaccharides of grapevine leaves using high-throughput and fractionation methods. Carbohydr Polym 99: 190-198 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Motose H, Sugiyama M, Fukuda H (2004) A proteoglycan mediates inductive interaction during plant vascular development. Nature 429: 873-878 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Nguema-Ona E, Coimbra S, Vicré-Gibouin M, Mollet J-C, Driouich A (2012) Arabinogalactan proteins in root and pollen-tube cells: distribution and functional aspects. Annals of Botany 110: 383-404 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Pattathil S, Avci U, Baldwin D, Swennes AG, McGill JA, Popper Z, Bootten T, Albert A, Davis RH, Chennareddy C, Dong R, O'Shea B, Rossi R, Leoff C, Freshour G, Narra R, O'Neil M, York WS, Hahn MG (2010) A comprehensive toolkit of plant cell wall glycandirected monoclonal antibodies. Plant Physiology 153: 514-525 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Paulsen BS, Craik DJ, Dunstan DE, Stone BA, Bacic A (2014) The Yariv reagent: behaviour in different solvents and interaction with a gum arabic arabinogalactan-protein. Carbohydr Polym 106: 460-468 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. Pereira AM, Lopes AL, Coimbra S (2016) JAGGER, an AGP essential for persistent synergid degeneration and polytubey block in Arabidopsis. Plant Signal Behav 11: e1209616 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Pereira AM, Masiero S, Nobre MS, Costa ML, Solis MT, Testillano PS, Sprunck S, Coimbra S (2014) Differential expression patterns of arabinogalactan proteins in Arabidopsis thaliana reproductive tissues. J Exp Bot 65: 5459-5471 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Pereira AM, Pereira LG, Coimbra S (2015) Arabinogalactan proteins: rising attention from plant biologists. Plant Reprod 28: 1-15 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Popper ZA, Michel G, Herve C, Domozych DS, Willats WG, Tuohy MG, Kloareg B, Stengel DB (2011) Evolution and diversity of plant cell walls: from algae to flowering plants. Annu Rev Plant Biol 62: 567-590 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Prochnik SE, Umen J, Nedelcu AM, Hallmann A, Miller SM, Nishii I, Ferris P, Kuo A, Mitros T, Fritz-Laylin LK, Hellsten U, Chapman J, Simakov O, Rensing SA, Terry A, Pangilinan J, Kapitonov V, Jurka J, Salamov A, Shapiro H, Schmutz J, Grimwood J, Lindquist E, Lucas S, Grigoriev IV, Schmitt R, Kirk D, Rokhsar DS (2010) Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri. Science 329: 223-226 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Schaefer L, Schaefer RM (2010) Proteoglycans: from structural compounds to signaling molecules. Cell Tissue Res 339: 237-246 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Schaper E, Korsunsky A, Pecerska J, Messina A, Murri R, Stockinger H, Zoller S, Xenarios I, Anisimova M (2015) TRAL: tandem repeat annotation library. Bioinformatics 31: 3051-3053 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Schultz CJ, Gilson P, Oxley D, Youl JJ, Bacic A (1998) GPI-anchors on arabinogalactan-proteins: implications for signalling in plants. Trends in Plant Science 3: 426-431 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Schultz CJ, Rumsewicz MP, Johnson KL, Jones BJ, Gaspar YM, Bacic A (2002) Using genomic resources to guide research directions. The arabinogalactan protein gene family as a test case. Plant Physiology 129: 1448-1463 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Sherrier DJ, Taylor GS, Silverstein KA, Gonzales MB, VandenBosch KA (2005) Accumulation of extracellular proteins bearing unique proline-rich motifs in intercellular spaces of the legume nodule parenchyma. Protoplasma 225: 43-55 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Shibaya T, Sugawara Y (2009) Induction of multinucleation by ß-glucosyl Yariv reagent in regenerated cells from Marchantia polymorpha protoplasts and involvement of arabinogalactan proteins in cell plate formation. Planta 230: 581-588 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Shimizu M, Igasaki T, Yamada M, Yuasa K, Hasegawa J, Kato T, Tsukagoshi H, Nakamura K, Fukuda H, Matsuoka K (2005) Experimental determination of proline hydroxylation and hydroxyproline arabinogalactosylation motifs in secretory proteins. Plant Journal 42: 877-889 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Showalter AM, Basu D (2016) Extensin and arabinogalactan-protein biosynthesis: glycosyltransferases, research challenges, and biosensors. Frontiers in Plant Science 7: 814 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. completeness with single-copy orthologs. Bioinformatics 31: 3210-3212 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Smith JJ, Muldoon EP, Willard JJ, Lamport DTA (1986) Tomato Extensin Precursors P1 and P2 Are Highly Periodic Structures. Phytochemistry 25: 1021-1030 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Somerville C, Bauer S, Brininstool G, Facette M, Hamann T, Milne J, Osborne E, Paredez A, Persson S, Raab T, Vorwerk S, Youngs H (2004) Toward a systems approach to understanding plant cell walls. Science 306: 2206-2211 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Sørensen I, Domozych D, Willats WGT (2010) How have plant cell walls evolved? Plant Physiology 153: 366-372 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Sørensen I, Pettolino FA, Bacic A, Ralph J, Lu F, O'Neill MA, Fei Z, Rose JKC, Domozych DS, Willats WGT (2011) The charophycean green algae provide insights into the early origins of plant cell walls. Plant Journal 68: 201-211 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Szalkowski AM, Anisimova M (2013) Graph-based modeling of tandem repeats improves global multiple sequence alignment. Nucleic Acids Res 41: e162 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30: 2725-2729 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Tan L, Eberhard S, Pattathil S, Warder C, Glushka J, Yuan C, Hao Z, Zhu X, Avci U, Miller JS, Baldwin D, Pham C, Orlando R, Darvill A, Hahn MG, Kieliszewski MJ, Mohnen D (2013) An Arabidopsis cell wall proteoglycan consists of pectin and arabinoxylan covalently linked to an arabinogalactan protein. Plant Cell 25: 270-287 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Tan L, Leykam JF, Kieliszewski MJ (2003) Glycosylation motifs that direct arabinogalactan addition to arabinogalactan-proteins. Plant Physiology 132: 1362-1369 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Tan L, Showalter AM, Egelund J, Hernandez-Sanchez A, Doblin MS, Bacic A (2012) Arabinogalactanproteinsandtheresearchchallengesfortheseenigmaticplantcellsurfaceproteoglycans. Front. Plant Sc 3: 1-10 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Tapken W, Murphy AS (2015) Membrane nanodomains in plants: capturing form, function, and movement. J Exp Bot 66: 1573-1586 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Tierney ML, Varner JE (1987) The extensins. Plant Physiol 84: 1-2 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Ulvskov P, Paiva DS, Domozych D, Harholt J (2013) Classification, naming and evolutionary history of gycosyltransferases from sequenced green and red algal genomes. Plos One 8 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones DT, Kim PM, Kriwacki RW, Oldfield CJ, Pappu RV, Tompa P, Uversky VN, Wright PE, Babu MM (2014) Classification of intrinsically disordered regions and proteins. Chemical Reviews 114: 6589-6631 Pubmed: Author and Title CrossRef: Author and Title Downloaded Google Scholar: Author Only Title Only Author and Title from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved. van Holst G-J, Clarke AE (1985) Quantification of arabinogalactan-protein in plant extracts by single radial gel diffusion. Anal. Biochem. 148: 446-450 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Velasquez M, Salgado Salter J, Gloazzo Dorosz J, Petersen BL, Estevez JM (2012) Recent advances on the posttranslational modifications of EXTs and their roles in plant cell walls. Frontiers in Plant Science 3 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Velasquez SM, Marzol E, Borassi C, Pol-Fachin L, Ricardi MM, Mangano S, Juarez SP, Salter JD, Dorosz JG, Marcus SE, Knox JP, Dinneny JR, Iusem ND, Verli H, Estevez JM (2015) Low Sugar Is Not Always Good: Impact of Specific O-Glycan Defects on Tip Growth in Arabidopsis. Plant Physiol 168: 808-813 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Velasquez SM, Ricardi MM, Dorosz JG, Fernandez PV, Nadra AD, Pol-Fachin L, Egelund J, Gille S, Harholt J, Ciancia M, Verli H, Pauly M, Bacic A, Olsen CE, Ulvskov P, Petersen BL, Somerville C, Iusem ND, Estevez JM (2011) O-glycosylated cell wall proteins are essential in root hair growth. Science 332: 1401-1403 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Voxeur A, Hofte H (2016) Cell wall integrity signaling in plants: "To grow or not to grow that's the question". Glycobiology 26: 950960 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Wickett NJ, Mirarab S, Nam N, Warnow T, Carpenter E, Matasci N, Ayyampalayam S, Barker MS, Burleigh JG, Gitzendanner MA, Ruhfel BR, Wafula E, Der JP, Graham SW, Mathews S, Melkonian M, Soltis DE, Soltis PS, Miles NW, Rothfels CJ, Pokorny L, Shaw AJ, DeGironimo L, Stevenson DW, Surek B, Villarreal JC, Roure B, Philippe H, dePamphilis CW, Chen T, Deyholos MK, Baucom RS, Kutchan TM, Augustin MM, Wang J, Zhang Y, Tian Z, Yan Z, Wu X, Sun X, Wong GK-S, Leebens-Mack J (2014) Phylotranscriptomic analysis of the origin and early diversification of land plants. Proceedings of the National Academy of Sciences of the United States of America 111: E4859-E4868 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Woessner JP, Goodenough UW (1989) Molecular characterization of a zygote wall protein: An extensin-like molecule in Chlamydomonas reinhardtii. Plant Cell 1: 901-911 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Woessner JP, Molendijk AJ, Egmond Pv, Klis FM, Goodenough UW, Haring MA (1994) Domain conservation in several volvocalean cell wall proteins. Plant Mol. Biol. 26: 947-960 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Xie Y, Wu G, Tang J, Luo R, Patterson J, Liu S, Huang W, He G, Gu S, Li S, Zhou X, Lam T-W, Li Y, Xu X, Wong GK-S, Wang J (2014) SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30: 1660-1666 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Xu WL, Zhang DJ, Wu YF, Qin LX, Huang GQ, Li J, Li L, Li XB (2013) Cotton PRP5 gene encoding a proline-rich protein is involved in fiber development. Plant Mol Biol 82: 353-365 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Yang Y, Smith SA (2013) Optimizing de novo assembly of short-read RNA-seq data for phylogenomics. BMC Genomics 14: 328 Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Downloaded from on June 17, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved.
© Copyright 2026 Paperzz