Southern Cross University ePublications@SCU, Theses, 2013

Characterisation of starch traits and genes in Australian rice germplasm

Ardashir Kharabian Masouleh (B.Sc, M.Sc)
Southern Cross University

Publication details: Kharabian Masouleh, A 2013, 'Characterisation of starch traits and genes in Australian rice germplasm', PhD thesis, Southern Cross University, Lismore, NSW. Copyright A Kharabian Masouleh 2013.

A thesis submitted to Southern Cross University in fulfillment of the requirements for the degree of Doctor of Philosophy

Southern Cross Plant Science, Southern Cross University, Lismore, NSW, Australia
March 2013

Statement of originality

I certify that the work presented in this thesis is, to the best of my knowledge and belief, original, except as acknowledged in the text, and that the material has not been submitted, either in whole or in part, for a degree at this or any other university. I acknowledge that I have read and understood the University's rules, requirements, procedures and policy relating to my higher degree research award and to my thesis. I certify that I have complied with the rules, requirements, procedures and policy of the University.

Ardashir Kharabian Masouleh
March 2013

Acknowledgements

First, my great gratitude to my principal supervisors, Robert J Henry, Daniel LE Waters and Russell F. Reinke, for allowing me to undertake this project at Southern Cross Plant Science. I would also like to thank them for their direction and endless support during this PhD project.
Next, I would like to thank my other supervisors in the centre, Graham King and Michael Heinrich, for their help, thoughts and valuable suggestions throughout my candidature. Thanks to the many people who have been of great help in the lab, Stirling Bowen, Peter Bundock, Timothy Sexton and everyone else who helped me learn various techniques. Thanks to all in the postgraduate room and beyond, especially Tiffeny Byrnes and Cathy Nock, who have been a great support in laboratory management and administration. And last but not least, thanks to my family, especially my wife Shiva, for the endless support during these four and a half years. I could not have completed this big commitment without my family's support.

Abstract

Starch is a major component of human diets. The physicochemical properties of starch influence the nutritional value of starch and the functional properties of starch-containing foods. Many of these traits have been under strong selection in the domestication of rice as a food. A population of 233 breeding lines of rice was analysed for variation in 17 rice starch synthesis genes, encoding seven classes of enzymes: ADP-glucose pyrophosphorylase (AGPase), granule bound starch synthase (GBSS), soluble starch synthase (SS), starch branching enzyme (BE), starch debranching enzyme (DBE), starch phosphorylase (SPHOL) and glucose-6-phosphate translocator (GPT1). This approach employed semi- to long-range PCR (LR-PCR) followed by next-generation sequencing. The amplification products were pooled in equimolar amounts and sequenced using massively parallel sequencing (MPS) technology. SNPs and indels in both coding and non-coding regions were identified and their distribution patterns among individual starch candidate genes characterised. Approximately 60.9 million reads were generated, of which 54.8 million (90%) mapped to the reference sequences. Coverage ranged from 12,708× for SSIIa to 38,300× for SSIIIb.
SNPs and single/multiple-base indels were analysed in a total assembled length of 116,403 bp. In total, 501 SNPs, of which 110 were non-synonymous (functional), and 113 indels were detected across the 17 starch-related loci. Five genes, AGPL2a, Isoamylase1, SPHOL, SSIIb and SSIVb, showed no polymorphism. The ratio of non-synonymous to synonymous SNPs (Ka/Ks) suggested that GBSSI and Isoamylase 1 (ISA1) are the least diversified (most purified) and most conserved genes, as the studied populations have been through several cycles of selection for low amylose content and gelatinization temperature. The 110 functional SNP loci were analysed for associations with rice pasting and cooking quality. Associations of 65 functional SNPs with starch traits were detected. GBSSI (the waxy gene) and SSIIa had a major influence on starch properties and the other genes had minor associations. The 'G/T' SNP at the exon 1/intron 1 boundary of GBSSI showed the strongest association with retrogradation and amylose content. The TT allele has been selected in much of the domesticated japonica genepool, providing rice with a desirable texture but with less resistant starch, which is associated with human health advantages. The GC/TT SNP at exon 8 of SSIIa showed a very significant association with pasting temperature (PT), gelatinization temperature (GT) and peak time. No significant association was found between SSIIa and retrogradation. Other genes contributing to retrogradation were SSI, BEI and SSIIIa. The highest level of polymorphism was observed in SSIIIa, with 22 SNPs, but only limited associations were observed with starch phenotypic values. None of the SNPs was strongly associated with chalkiness, except for a weak link with a 'T/C' SNP at position 960 (Thr482 to Ala) in Isoamylase2. These associations provide new tools for deliberate selection of rice genotypes for specific functional and nutritional outcomes. Resistant-retrograded starch is widely associated with human health.
The highly retrograded starches of cereals usually have a lower glycemic index (GI), which may be beneficial in many human diets. The data reported here suggest that the glucose-6-phosphate translocator (GPT1), an enzyme early in the biochemical pathway of starch synthesis, has a major influence on resistant starch production in rice. A 'T/C' SNP at position 1188 of the GPT1 encoding gene alters Leu24 to Phe and is highly associated with resistant-retrograded starch and amylose content. The 'T' and 'C' alleles produce high and low levels of retrograded starch, respectively. An association study of 233 genotypes demonstrated highly significant correlations (R²) of 0.57 and 0.36 (P=0.00099) between this SNP and retrogradation degree and apparent amylose content, respectively. Haplotype and association analysis of this SNP and another 'G/T' SNP at the exon 1/intron 1 boundary of the GBSSI encoding gene explains most of the variability of retrogradation degree and amylose content in the rice population. These two SNPs together produce higher levels of resistant-retrograded starch when the 'T' allele in GPT1 and the 'G' allele in GBSSI are present. This 'T:G' haplotype can provide a new tool for deliberate selection of rice genotypes for specific functional and nutritional outcomes such as resistant-retrograded starch and high amylose content, non-sticky rices.

Granule Bound Starch Synthase I (GBSSI) influences the grain quality of all cereals and, particularly, rice. Using GBSSI as a model plant gene, a number of different computational tools and programs were used to explore the functional SNPs of this important rice gene and the possible relationships between genetic mutation and phenotypic variation. A total of 51 SNPs/indels were retrieved from databases, including three important coding non-synonymous SNPs, namely those in exons 6, 9 and 10.
Sorting Intolerant from Tolerant (SIFT) results showed that a candidate [C/A] SNP (ID: OryzaSNP2) in exon 6 (coordinate 2494) is the most important non-synonymous SNP, with the highest phenotypic impact on GBSSI. This SNP alters a tyrosine to serine at position 224 of the waxy protein. Computational simulation of the GBSSI protein with Geno3D suggested this mutant SNP creates a bigger loop on the surface of GBSSI and results in a shape different from that of native GBSSI. Here, we suggest a potential transcription factor binding site (TBF8) which has one [C/T] SNP [rs53176842] at coordinate 2777 at the intron 7/exon 8 boundary, according to Transcriptional Factor (TF) Search analysis. This SNP might potentially have a major effect on the regulation and function of GBSSI.

The application of single nucleotide polymorphisms (SNPs) in plant breeding involves the analysis of a large number of samples, and therefore requires rapid, inexpensive and highly automated multiplex methods to genotype the sequence variants. A high-throughput multiplexed SNP assay for eight polymorphisms which explain two agronomic and three grain quality traits in rice was optimised. Gene fragments coding for the agronomic traits plant height (semi-dwarf, sd-1) and blast disease resistance (Pi-ta) and the quality traits amylose content (waxy), gelatinization temperature (alk) and fragrance (fgr) were amplified in a multiplex polymerase chain reaction. A single base extension reaction carried out at the polymorphism responsible for each of these phenotypes within these genes generated extension products which were quantified by a matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) system. The assay detects both SNPs and indels and is co-dominant, simultaneously detecting both homozygous and heterozygous samples in a multiplex system. This assay analyses eight functional polymorphisms in one 5 μL reaction, demonstrating the high-throughput and cost-effective capability of this system.
At this conservative level of multiplexing, 3072 assays can be performed in a single 384-well microtitre plate, allowing the rapid production of valuable information for selection in rice breeding.

Table of Contents

Title page
Statement of originality
Acknowledgements
Abstract
Table of contents
List of abbreviations
Publications arising from thesis

Chapter 1 Allele mining and characterization of starch genes in rice: from SNPs to phenotype
  Starch structure
  Starch synthesis
  Starch synthesis enzymes and genes
  Ka/Ks ratio ("purifying" vs "diversifying" genes)
  Definition of purifying and diversifying genes
  ADP-glucose pyrophosphorylase (AGPase), Starch phosphorylase (PHO) and Glucose phosphate translocator (GPT) gene families
  AGPS2b (small subunit)
  SPHOL (alpha 1,4 glucan starch phosphorylase)
  GPT1 (Glucose-6-phosphate translocator)
  Pathway to amylose
  Granule bound starch synthase (GBSS)
  Pathway to amylopectin
  Starch Synthase (SS) genes
  SSI
  SSII
  SSIIa
  SSIIb
  SSIIIa
  SSIIIb
  SSIVa
  Starch Branching enzymes (SBEs)
  BEI
  BEIIa
  BEIIb
  Debranching Enzymes (DBEs)
  ISA1 (Iso 1)
  ISA2 (Iso2)
  Pullulanase (PUL)
  Proteins
  Lipids
  Environmental factors: Nitrogen (N), Phosphorus (P) and Potassium (K)
  Thermal stress
  CO2
  Objectives of thesis
  Key concepts
  Major activities reported in the thesis

Chapter 2 Discovery of polymorphisms in starch related genes in rice germplasm by amplification of pooled DNA and deeply parallel sequencing
  Summary
  Introduction
  Materials and methods
  Plant materials
  Variability of genotypes
  Sample preparation and DNA extraction
  Designation of starch-metabolizing enzymes/genes involved in starch synthesis
  Target genes for sequence analysis
  Designing primers to capture target genes
  Long range PCR protocol (LR-PCR)
  DNA equimolar pooling
  Massively parallel sequencing
  SNP detection and data analysis
  Total polymorphism rate and functional SNPs
  Results
  Number of reads and average coverage
  Polymorphism discovery and SNP/Indel detection
  SNP variation across the starch related candidate loci
  ADP-glucose pyrophosphorylase (AGPase), Starch phosphorylase (PHO) and Glucose phosphate translocator (GPT) gene families
  AGPS2b (small subunit)
  SPHOL (alpha 1,4 glucan starch phosphorylase)
  GPT1 (Glucose-6-phosphate translocator)
  Granule bound starch synthase (GBSS) gene family
  GBSSI (Granule bound starch synthase I)
  GBSSII (Granule bound starch synthase II)
  Starch synthase (SS) family
  SSI
  SSIIa
  SSIIb
  SSIIIa
  SSIIIb
  SSIVa
  Starch Branching enzymes (SBEs)
  BEI
  BEIIa
  BEIIb
  Debranching Enzymes (DBEs)
  ISA1 (Iso 1)
  ISA2 (Iso2)
  Pullulanase (PUL)
  Distribution of SNPs across the loci
  Ka/Ks ratio ("purifying" vs "diversifying" genes)
  Discussion

Chapter 3 Bioinformatic tools assist screening of functional SNPs in plants: GBSSI in rice as a model gene
  Summary
  Introduction
  Materials and methods
  GBSSI gene as a case study
  Sequence alignment
  SNP dataset
  Computational tools for SNP analysis
  3D Modelling of GBSSI and comparative study
  Functional flow chart
  Results
  SNPs in GBSSI gene and comparative study
  Computational algorithm tools
  UTR Scan
  TF Search
  SIFT (Sorting Intolerant from Tolerant)
  GeneSplicer
  SEE ESE (Sequence Evaluator of Exonic Splicing Enhancers)
  FAS-ESS (Systematic identification and analysis of exonic splicing silencers)
  Simulation for finding functional, constructive changes of ns-coding SNPs
  Discussion
  Conclusion

Chapter 4 SNP in starch biosynthesis genes associated with the nutritional and functional properties of domesticated rice
  Summary
  Introduction
  Materials and methods
  Plant materials
  Physicochemical properties
  Designation of starch-synthesis genes involved in starch metabolism
  Candidate genes/enzymes for SNP genotyping
  SNP dataset
  Primer design and SNP genotyping
  Capture PCR protocol, primer extension and mass spectrometry
  Association analysis
  Statistical parameters
  Results
  AGPS2b (small subunit)
  SPHOL (alpha 1,4 glucan starch phosphorylase)
  GBSSI (Granule bound starch synthase I)
  GBSSII (Granule bound starch synthase II)
  SSI
  SSIIa
  SSIIb
  SSIIIa
  SSIIIb
  SSIVa
  SSIVb
  BEI
  BEIIb
  Debranching Enzymes (DBEs)
  ISA1 (Isoamylase 1)
  ISA2 (Isoamylase 2)
  Pullulanase
  Discussion
  Neutral genes with no polymorphism or association
  Major genes with highly significant associations
  Contributory genes with low-medium associations
  Minor genes with very low associations

Chapter 5 A SNP in GPT1 is closely associated with nutritionally important resistant-retrograded starch in rice
  Summary
  Introduction
  Materials and methods
  Plant materials
  Physicochemical properties
  Designation of starch-synthesis genes involved in starch metabolism
  Discovery of novel SNP in GPT1 and SNP genotyping in population
  Association analysis
  Results
  GPT1 (Glucose-6-phosphate translocator)
  GBSSI (Granule bound starch synthase I)
  Allelic combination of SNPs in GPT1 and GBSSI
  Discussion

Chapter 6 SNPs and marker assisted selection (MAS) in plant breeding. A high-throughput assay for rapid and simultaneous analysis of perfect markers for important quality and agronomic traits in rice using multiplexed MALDI-TOF Mass Spectrometry
  Summary
  Introduction
  Materials and methods
  Genotypes
  DNA extraction
  Primer design/generation of SNP markers
  Capture PCR protocol
  Shrimp alkaline phosphatase (SAP) incubation
  Primer extension and mass spectrometry
  Results
  Analysis of PCR products
  Optimal capture primer concentration
  MgCl2 concentration
  Identification of SNPs and polymorphisms in agronomic and quality loci
  sd-1
  Pi-ta
  waxy
  alk
  fgr
  Missing data and heterozygosity
  Discussion

Chapter 7 General discussion - Characterisation of starch traits and genes in Australian rice germplasm
  Background principles
  Search in SNP data bases and discovery of polymorphisms
  Screening of functional SNPs
  Gene copy number in the rice genome
  Multiplexed MALDI-TOF Mass Spectrometry markers help to genotype individuals in a cost effective manner
  Association between SNPs in starch biosynthesis genes and the nutritional and functional properties of domesticated rice
  The glucose-6-phosphate translocator (GPT1) may contribute to resistant starch
  Conclusion and further directions

References
Appendices

List of abbreviations

AC    Amylose content
BDV   Breakdown viscosity
CHK   Chalkiness
FV    Final viscosity
GPT1  Glucose-6-phosphate translocator gene
GT    Gelatinisation temperature
MT    Martin Test
MPS   Massively parallel sequencing
NGS   Next generation sequencing
Ns    Non-synonymous
PT    Pasting temperature
PaT   Paste temperature
PeT   Peak time
P1    Peak
PKT   Peak time
PKV   Peak viscosity
PN    Predicted N
TF    Transcriptional factors
TFBS  Transcriptional factor binding site
SB    Set back
T1    Trough
UTR   Untranslated region

Publications arising from thesis

1) Masouleh AK, Waters DLE, Reinke RF, Henry RJ (2009) A high-throughput
assay for rapid and simultaneous analysis of perfect markers for important quality and agronomic traits in rice using multiplexed MALDI-TOF mass spectrometry. Plant Biotechnology Journal 7:355–363.
2) Kharabian, A (2010) An efficient computational method for screening functional SNPs in plants. Journal of Theoretical Biology 265(1):55-62.
3) Kharabian-Masouleh A, Waters DLE, Reinke RF, Henry RJ (2011) Discovery of polymorphisms in starch-related genes in rice germplasm by amplification of pooled DNA and deeply parallel sequencing. Plant Biotechnology Journal 9:1074-1085.
4) Kharabian-Masouleh A, Waters DLE, Reinke RF, Ward R, Henry RJ (2012) SNP in starch biosynthesis genes associated with nutritional and functional properties of rice. Scientific Reports 2:557; DOI:10.1038/srep00557.

CHAPTER 1 Allele mining and characterization of starch genes in rice: From SNPs to phenotype

Starch constitutes most of the dry matter in the harvested organs of crop plants and is one of the most important human foods. Starch is an end product of photosynthesis that is mainly stored in the form of granules in the endosperm of grains and in specialized organelles such as chloroplasts and amyloplasts. Numerous studies have been undertaken to elucidate starch biosynthesis and its genetic control and to discover the relationship between its structure, physical properties and the influence of environment on starch properties. Although a number of comprehensive research and review articles have been published on starch chemistry and pathways of synthesis, much remains unknown, which means it is not yet possible to modify starch components or quality in a predictable way.

Starch structure

Starch, a complex carbohydrate, is a polymer of glucose molecules.
It occurs in two main forms: amylose, consisting of predominantly linear chains of glucose monomers linked by α1-4 glycosidic bonds, and amylopectin, in which the chains are branched by the addition of α1-6 glycosidic bonds. Depending upon species and the site of storage, amylose generally constitutes approximately 10 to 35% of the starch found in plants and the remainder is amylopectin.

Starch synthesis

The biochemistry of starch synthesis is relatively well understood although it is a complex process (Buléon et al., 1998; Libessart et al., 1995). Many enzymes are involved in starch synthesis and several isoforms of these enzymes exist, leading to a highly complex biosynthetic process. The starting point of starch synthesis is glucose, which is derived from photosynthesis in the green parts of plants. This glucose is transported to and deposited in storage tissue, including grain endosperm and tuberous roots. In the amyloplast, glucose is activated by the addition of ADP by ADP-glucose pyrophosphorylase (AGPase) (James et al., 2003). The ADP-glucose is then used by starch synthases, which add glucose units to the growing polymer chain to build the starch molecules (Buléon et al., 1998).

Starch synthesis enzymes and genes

A significant number of enzyme isoforms and activities contribute to starch synthesis and therefore many genes are involved in the process. A simplified pathway diagram of starch biosynthesis and the enzymes and genes involved is shown in Figure 1. If we consider ADP-glucose as the main substrate, then there are two different pathways which lead to starch, one toward amylose and the other to amylopectin. In each of these biochemical pathways, different enzymes and genes play a role. These enzymes and genes work in a complex process and each one makes a partial contribution to the starch end product and its quality (Tester et al., 2004).
Some starch genes, such as SSIIa, are mainly expressed in the endosperm, others only in leaves, and others in both green and storage tissues. Genes belonging to the “non-endosperm type” are often expressed together with one gene of the “both tissue type”. For example, GBSSII, SSIIb and SSIIIb are leaf expressed and are co-ordinately expressed with SSI, which is expressed in both tissue types (Hirose and Terao, 2004). For this reason, when investigating the association of starch genes with grain quality, it is necessary to focus on all genes with a possible phenotypic effect. Mutations in genes which operate early in the starch biosynthesis pathway (Fig 1) are likely to influence starch quality or quantity.

Figure 1. A schematic diagram showing the biochemical pathways of cereal starch production.

Ka/Ks ratio ("purifying" vs "diversifying" genes)

The ratio of non-synonymous (Ka) to synonymous (Ks) SNPs can reveal whether a gene has been under purifying, neutral or diversifying selection. The Ka/Ks ratio is used here to classify candidate genes into two main categories, “purifying” and “diversifying” genes. Under neutral evolution at the amino acid level, Ka should equal Ks and hence Ka/Ks = 1. Any deviation from this value indicates selection pressure on the genetic structure of the population or candidate genes. A Ka/Ks ratio < 1 indicates negative (purifying) selection and a ratio > 1 indicates positive (diversifying) selection (Roth and Liberles, 2006). SNPs in the genes studied in this thesis were retrieved from the International Rice Functional Genomics Consortium (IRFGC) database (http://oryzasnp.plantbiology.msu.edu/). This database holds records of the sequence analysis, including SNPs, of 20 diverse rice (Oryza sativa L.) cultivars. Ka/Ks ratios were calculated for the candidate genes, ranging from 0.11 to 2.40 for SSI and SSIIIa, respectively (Table 1).
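The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the thesis: it takes Ka/Ks as the simple ratio of non-synonymous to synonymous SNP counts, and it assumes that +1 is added to both counts whenever either is zero (one reading of the Table 1 footnote). It also shows the SNPs-per-kb polymorphism rate reported in Table 1. All function names are hypothetical.

```python
def ka_ks_ratio(n_nonsyn: int, n_syn: int) -> float:
    """Count-based Ka/Ks: non-synonymous over synonymous SNP counts.

    Assumption: +1 is added to both counts when either is zero, so the
    ratio is always defined and never zero.
    """
    if n_nonsyn == 0 or n_syn == 0:
        n_nonsyn, n_syn = n_nonsyn + 1, n_syn + 1
    return n_nonsyn / n_syn


def selection_type(ratio: float, tol: float = 1e-9) -> str:
    """Classify a Ka/Ks ratio: < 1 negative, = 1 neutral, > 1 positive."""
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    if ratio < 1.0:
        return "negative (purifying)"
    return "positive (diversifying)"


def snps_per_kb(total_snps: int, length_bp: int) -> float:
    """Polymorphism rate expressed as SNPs per kilobase of sequence."""
    return total_snps / (length_bp / 1000.0)


if __name__ == "__main__":
    # (non-synonymous, synonymous) SNP counts chosen for illustration
    for ns, syn in [(2, 4), (0, 3), (2, 0), (0, 0)]:
        ratio = ka_ks_ratio(ns, syn)
        print(ns, syn, round(ratio, 2), selection_type(ratio))
```

A formal Ka/Ks estimate normalises the substitution counts by the number of non-synonymous and synonymous sites in the gene; the count-based ratio sketched here does not, so it should be read only as a rough indicator.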
These results indicate that genes such as SSIIIa are under diversifying selection whereas others such as SSI are under purifying selection.

Definition of purifying and diversifying genes

These terms extend the concept of evolution, in which genes, or more accurately allele frequencies, are diversified (diversifying) or purified (purifying) under natural or artificial selection pressure. In natural selection, purifying selection is equivalent to negative selection, in which deleterious alleles (SNPs, point mutations) are gradually removed from the population, which tends to stabilise the population (stabilising selection). In contrast, diversifying (or disruptive) selection is where allele frequencies change and extreme trait values are favoured over intermediate values. This normally follows positive natural or artificial selection.

ADP-glucose pyrophosphorylase (AGPase), Starch phosphorylase (PHO) and Glucose phosphate translocator (GPT) gene families

These enzymes/genes reside at the top of the starch biosynthetic pathway and are the starting point of grain starch production. Glucose is first activated by the addition of ADP by AGPase, and the resulting ADP-glucose then becomes the substrate for the other major starch enzymes. There are several genes/isozymes in this class but AGPS2b has the highest expression level in rice endosperm (Hirose et al., 2006).

AGPS2b (small subunit)

The role of this subunit in starch granule synthesis has been identified by way of its association with rice shrunken mutants (Kawagoe et al., 2005). A dramatic inhibition of starch synthesis has been observed in AGPase-deficient mutants of rice and some other species, resulting in increased soluble sugars, a large number of underdeveloped granules, small grains and pleomorphic amyloplasts (Rolletschek et al., 2002). In total, 12 SNPs have been retrieved for the 20 fully sequenced cultivars in the OryzaSNP@MSU database.
The polymorphism rate detected for AGPS2b in the OryzaSNP database is relatively high at 1.876 SNPs/kb and the Ka/Ks ratio (0.25) indicates that this gene has been under negative or purifying human selection.

SPHOL (alpha 1,4 glucan starch phosphorylase)

This gene is generally considered to be involved in starch degradation but recent studies suggest some important roles in starch biosynthesis. Although its precise mechanism and influence are still not well known, the mechanism appears to be associated with phosphorylation of some starch-related enzymes and proteins such as starch branching enzymes (SBEs) and starch synthase (SSIIa) (Tetlow et al., 2004). In total, 11 SNPs are known in this gene, including two non-synonymous and four synonymous. The SNP rate is 1.46 SNPs/kb and the gene has been under negative selection (Table 1).

GPT1 (Glucose-6-phosphate translocator)

GPT1 is strongly expressed in the endosperm. This gene is believed to be responsible for the import of essential carbon substrates such as Glc6P into the plastids during grain development (Fischer and Weber, 2002; Jiang et al., 2003). Three SNPs, all in introns, and a Ka/Ks ratio of 1.00 (Table 1) suggest this gene has not been under any selection pressure by humans.

Pathway to amylose

This is the shortest and simplest pathway of starch synthesis and the best characterised. ADP-glucose is converted to amylose by one major enzyme, granule bound starch synthase I (GBSSI).

Granule bound starch synthase (GBSSI)

GBSSI, encoded by the Waxy gene, is the most well characterised starch biosynthesis enzyme in plants and has a very significant effect on starch composition and quality. The α1-4 glycosidic bonds of amylose are synthesised by GBSSI. In rice, high activity of GBSSI produces high amylose content, leading to a non-waxy, non-sticky or non-glutinous phenotype. In contrast, if the GBSSI gene is partially active or inactive, a waxy (sticky), glutinous phenotype is produced.
In maize, the waxy phenotype contains no amylose due to a defect in the GBSS encoding gene (Kiesselbach, 1944), while amylose-free potato and cassava cultivars have been generated by GBSS suppression (Raemakers et al., 2005; Visser et al., 1991; Hovenkamp-Hermelink et al., 1987; Kuipers et al., 1994). Several wx mutants and isoforms of GBSS have been reported in barley waxy cultivars, which synthesize small amounts of endosperm amylose (Ishikawa et al., 1995; Patron et al., 2002). There are two isoforms of GBSS, GBSSI and GBSSII. These isoforms are homologous and have approximately 66-69% amino acid sequence identity, but their encoding genes are situated at different loci. The gene encoding GBSSI is predominantly expressed in endosperm whereas GBSSII is expressed in leaves and other non-storage tissues (Vrinten and Nakamura, 2000). Therefore, GBSSI is the most important enzyme responsible for endosperm amylose content. GBSSI has been widely studied in different plant species (Nakamura et al., 1998; Nakamura, 2002; Domon et al., 2002; Saito et al., 2004; Shapter et al., 2009). In rice, a significant association between RVA pasting properties and the waxy gene sequence has been found. Three SNP sites in the waxy gene, at the exon 1/intron 1 boundary, in exon 6 and in exon 10, were determined to be responsible for different apparent amylose content and pasting properties (Larkin and Park, 2003; Chen et al., 2008). Chen et al. (2008) identified four SNP haplotypes/alleles that explained the high variability of RVA pasting properties in international rice germplasm.

GBSSII in rice is exclusively bound to the starch granules of leaves and has an important function in amylose synthesis in the pericarp of the mature ovary (Nakamura et al., 1998). Starch produced by GBSSII may be stored temporarily in the pericarp and later converted to sugar and transferred to the endosperm as a substrate for starch synthesis during endosperm development (Sato, 1984).
The possible existence of SNPs/indels in GBSSII and their impact on starch properties and grain quality is still unknown. GBSSI has been widely selected by breeders over the past two decades and is thus under purifying selection or has almost been purified. In contrast, GBSSII appears to be one of the most conserved of all starch genes, as only one SNP has been detected, suggesting this gene has not undergone any artificial selection pressure (Novaes et al., 2008).

Pathway to amylopectin

This is the second branch of the starch synthesis pathway (Fig 1), the end product of which is amylopectin. This pathway is more complex, with many genes/enzymes and their isoforms being involved in the process. Although amylopectin is the most abundant constituent of grain starch, the role of different genes in this pathway in determining starch composition is relatively unknown, perhaps because of the complexity of the pathway.

Starch Synthase (SS) genes

The starch synthases (SS), or soluble starch synthases (SSS), exist in all plants in multiple isoforms and are responsible for the formation of α1-4 glycosidic bonds in amylopectin. There are five genes encoding five different SS isoforms in the rice genome (SSI, SSII, SSIII, SSIV and SSVI). All classes of SS are expressed in the endosperm of plants (Li et al., 1999a; Li et al., 1999b; Li et al., 2000) and probably in all starch synthesising cells (Smith, 1999). There is good evidence that amylopectin chains are synthesized by the coordinated actions of the SSI, SSIIa and SSIIIa isoforms.

Table 1. Polymorphism in rice genes responsible for starch synthesis and their status during domestication.
Gene    Locus           Chr  Length  Total  n.s  Syn  Funct.  Polym.  Ka/Ks*  Selection  Status
AGPS2b  LOC_Os08g25734  8    6394    12     0    3    0.00    1.876   0.25    Negative   Purifying
SPHOL   LOC_Os03g55090  3    7489    11     2    4    0.181   1.469   0.5     Negative   Purifying
GPT1    LOC_Os08g08840  8    4073    3      0    0    0.00    0.736   1.00    Neutral    Intact
GBSSI   LOC_Os06g04200  6    5035    8      1    2    0.125   1.588   0.5     Negative   Purifying
GBSSII  LOC_Os07g22930  7    8049    1      0    0    0.00    0.124   1.00    Neutral    Intact
SSI     LOC_Os06g06560  6    7750    67     2    6    0.014   8.645   0.33    Negative   Purifying
SSIIa   LOC_Os06g12450  6    4981    12     2    0    0.166   2.409   3.00    Positive   Diversifying
SSIIb   LOC_Os02g51070  2    5323    16     6    4    0.375   3.00    1.5     Positive   Diversifying
SSIIIa  LOC_Os08g09230  8    11263   52     19   11   1.686   4.61    1.72    Positive   Diversifying
SSIIIb  LOC_Os04g53310  4    8624    24     8    5    0.927   2.782   1.6     Positive   Diversifying
SSIVa   LOC_Os01g52250  1    10480   25     5    2    0.477   2.385   2.5     Positive   Diversifying
BEI     LOC_Os06g51084  6    7258    14     2    1    0.275   1.928   2       Positive   Diversifying
BEIIa   LOC_Os04g33460  4    2265    0      0    0    0       0       1.00    Neutral    Conservative
BEIIb   LOC_Os02g32660  2    10900   5      0    0    0       0.458   1.00    Neutral    Conservative
ISA1    LOC_Os08g40930  8    6592    16     2    3    0.303   2.427   0.666   Negative   Purifying
ISA2    LOC_Os05g32710  5    2403    10     6    4    2.496   4.16    1.5     Positive   Diversifying
PUL     LOC_Os04g08270  4    10399   108    10   9    0.961   10.385  1.11    Positive   Diversifying

Column key: Locus, locus number (Rice Genome Annotation Project); Chr, chromosome; Length, nucleotide length (bp); Total, total number of SNPs; n.s, number of non-synonymous SNPs; Syn, number of synonymous SNPs; Funct., functional SNP rate (n.s/total); Polym., polymorphism rate (SNP/kb); Selection, selection type; Status, gene status during domestication.
Sequencing data and SNP polymorphism were derived from the OryzaSNP@MSU database for 20 cultivated rices (http://oryzasnp.plantbiology.msu.edu/).
*To avoid a value of zero for the Ka/Ks ratio, +1 is added when the numbers of non-synonymous and synonymous SNPs are zero.

SSI

SSI is primarily responsible for the synthesis of the shortest chains of amylopectin, of about 10 glucosyl units or less (DP 7-11). This gene/protein is presumed to be expressed in the endosperm and leaf of rice (Fujita et al., 2006).
The SSI gene is located on chromosome 7S of wheat and encodes a Mr 75000 protein that is distributed between the starch granule and the soluble phase (Li et al., 1999). Studies on the chain-length specificities of maize SSI have revealed that the entire carboxy-terminal region of this protein is required for starch binding (Commuri and Keeling, 2001). RT-PCR analysis shows that there is only one SSI isoform in rice, which has steady expression (Hirose and Terao, 2004). The transcript level of SSI is higher in endosperm than in leaf sheaths and blades, and SSI has therefore been classified as an endosperm and non-endosperm expressing gene (Hirose et al., 2006). Measurement of SSI transcript levels at different seed developmental stages found high expression at 1-3 days after flowering (DAF), peaking at 5 DAF and remaining almost constant during endosperm starch synthesis, suggesting SSI is the major SS form in cereals (Cao et al., 1999). A comprehensive analysis of mutant rice with a retrotransposon inserted into the SSI encoding gene revealed that SSI has the capacity to synthesise chains of DP 8-12 by extending smaller chains (Nakamura, 2002). Fujita et al. (2006) generated four SSI-deficient rice mutant lines using retrotransposon Tos17 insertion. The deficient mutants exhibited a 0%-20% decrease in the amount of SSI protein in comparison to wild type, changed amylopectin structure and increased gelatinization temperature of endosperm starch, although the complete absence of SSI had no effect on the size and shape of seeds and starch granules or on the crystallinity of endosperm starch (Fujita et al., 2006). This gene has a very small phenotypic effect on rice eating quality, although a significant negative correlation between the ratio of short chains (DP 6-12) and gelatinization temperature has been reported (Umemoto et al., 2008).
Although 46 SNPs, including one nonsynonymous and four synonymous, have been detected in rice, no SNP associated with starch composition, amylopectin quality or quantity has yet been reported in the SSI gene of any plant species. This gene is highly polymorphic, with 8.64 SNPs/Kb, and is undergoing purifying selection.

SSII

SSII is responsible for the synthesis of the shortest chains, and further extensions to produce longer chains are catalysed by SSIIa and/or SSIII (Commuri and Keeling, 2001). Previous studies show that there are three isoforms of SSII in monocots: SSIIa, SSIIb and SSIIc. The role of the latter two in starch biosynthesis, especially SSIIc, which is only expressed in source tissue, is unknown as no mutants have yet been found (Tetlow et al., 2004).

SSIIa

SSIIa is known to have a major effect on starch quality. This gene is predominantly expressed in cereal endosperm at very high levels and affects amylopectin structure (Craig et al., 1998; Morell et al., 2003). Loss of SSIIa results in reduced starch content and amylopectin chain length, and in modified granule morphology and crystallinity. In monocots, SSIIa elongates the short glucan chains of DP≤10 to the intermediate size of DP 12-24, thus its loss or down-regulation has a dramatic impact on the amount and composition of starch (Tetlow et al., 2004). The effect of this gene on rice cooking quality and rice starch texture has been clearly demonstrated by virtue of a significant correlation between gelatinisation temperature (GT) and particular SSIIa alleles (Umemoto et al., 2002; Umemoto et al., 2004). Alk, a major gene regulating alkali disintegration, resides at the same position as SSIIa on chromosome 6 of rice (Gao et al., 2003). Further studies have shown that the GT of rice flour, the chain length distribution of amylopectin and the alkali spreading score are associated with different SSIIa haplotypes (Umemoto and Aoki, 2005).
GT, alkali disintegration and eating quality of rice starch have been explained by two polymorphisms, [A/G] and [GC/TT], within exon 8 of the alk locus (Waters et al., 2006). These two SNPs were able to explain the classification of 70 rice genotypes into either high GT or low GT types, which differed in GT by 8 °C (Waters et al., 2006). Polymorphism analysis of this gene found 2.4 SNPs per Kb and indicates it is under positive human selection (Table 1).

SSIIb

SSIIb is a low-level, early-expressed gene which is primarily expressed in leaf blades and sheaths (leaf specific) at an early stage of grain filling (Hirose and Terao, 2004). However, a recent study presented evidence that SSIIb contributes, with six other starch genes, to altering some Rapid Viscosity Analyser (RVA) parameters in glutinous rice (Yan et al., 2010). The exact role of SSIIb in starch synthesis is currently unknown, mainly due to a lack of mutant phenotypes. There were 12 SNPs, including six non-synonymous SNPs, indicating this gene has some phenotypic impact due to the high number of SNPs in exonic regions. Domestication has exerted positive selection pressure on SSIIb.

SSIIIa

The SSIIIa encoding gene is highly expressed in endosperm, although some reports reveal expression in green tissues (Dian et al., 2005). A recent study of an SSIIIa-deficient rice mutant found that amylose content and the extra long chains of amylopectin increased by 1.3- and 12-fold, respectively, due to an increase in GBSSI activity (Fujita et al., 2007). In spite of a relatively high functional SNP rate of 1.686, this gene does not show a highly significant association with rice physiochemical characteristics. For example, Yan et al. (2010) found no functional effect on RVA parameters, at least among glutinous cultivars. Out of 52 SNPs, 19 are non-synonymous and 11 synonymous, suggesting SSIIIa is under diversifying selection with a Ka/Ks ratio of 1.727.
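As an illustration of how the Ka/Ks values quoted in this section and in Table 1 follow from the SNP counts, the sketch below recomputes the ratio as a simple count of non-synonymous over synonymous SNPs. It assumes, from the Table 1 footnote and the tabulated values, that 1 is added to both counts whenever either count is zero. The function names `ka_ks_ratio` and `selection_type` are hypothetical and are not part of the original analysis.

```python
def ka_ks_ratio(nonsyn: int, syn: int) -> float:
    """Count-based Ka/Ks: non-synonymous SNPs divided by synonymous SNPs.

    Assumption (from the Table 1 footnote): when either count is zero,
    1 is added to both counts so the ratio is neither zero nor undefined.
    """
    if nonsyn < 0 or syn < 0:
        raise ValueError("SNP counts must be non-negative")
    if nonsyn == 0 or syn == 0:
        nonsyn, syn = nonsyn + 1, syn + 1
    return nonsyn / syn


def selection_type(ratio: float) -> str:
    """Label a ratio as in Table 1: below 1 negative, 1 neutral, above 1 positive."""
    if ratio < 1:
        return "Negative"
    if ratio > 1:
        return "Positive"
    return "Neutral"


# (non-synonymous, synonymous) SNP counts taken from Table 1.
counts = {"SSIIIa": (19, 11), "SSIIa": (2, 0), "ISA1": (2, 3), "GPT1": (0, 0)}
for gene, (ns, s) in counts.items():
    ratio = ka_ks_ratio(ns, s)
    print(f"{gene}: Ka/Ks = {ratio:.3f} ({selection_type(ratio)})")
```

A count ratio of this kind does not normalise by the number of synonymous and non-synonymous sites, so it is a coarser measure than a per-site Ka/Ks estimate.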
SSIIIb

SSIIIb is mainly expressed in rice endosperm, but transient expression in leaf sheaths and leaves has also been reported (Hirose et al., 2006). It has been classified into two different categories on the basis of the timing of expression in the developing seed: the late expression category, in which it is expressed in the mid to later stage of grain filling (Hirose and Terao, 2004), and the early expression category, in which the transcript level increases to a maximum at 3-5 days after flowering (Ohdan et al., 2005). An association study of glutinous near-isogenic rice lines suggested SSIIIb has a significant impact on RVA parameters such as peak time and pasting temperature (Yan et al., 2010). The total number of SNPs reported in the OryzaSNP database is 24, of which eight are nonsynonymous and five synonymous, giving a high Ka/Ks ratio of 1.6. This ratio suggests the gene is diversifying and has been under positive selection during domestication.

SSIVa

SSIVa is one of the least known starch genes in plants. Like most starch synthase genes, SSIVa is exclusively involved in amylopectin biosynthesis. Expression analysis by reverse transcription PCR indicated SSIVa is preferentially expressed in rice endosperm, and to a degree in leaf blades, as a late or steady expresser gene during grain filling (Hirose and Terao, 2004). QTL mapping and expression profile analysis have shown that high temperature during grain filling can increase the transcription level of SSIVa by up to 1.11-fold, which is considerably higher than for other starch synthase genes (Yamakawa et al., 2007), and may contribute to grain chalkiness (Yamakawa et al., 2008). In total, 25 SNPs have been reported for this gene in the OryzaSNP database, of which five are nsSNPs and two synonymous, giving a Ka/Ks ratio of 2.5 and indicating SSIVa is diversifying under human selection. SSIVa may also affect some secondary RVA parameters such as breakdown and setback (Yan et al., 2010).
Starch Branching enzymes (SBEs)

Starch branching enzymes (SBEs) break α-(1→4)-linkages in existing chains and attach the released reducing ends to C6 hydroxyls, forming the branched glucan, amylopectin (Tetlow et al., 2004).

BEI

BEI is mainly expressed in the endosperm and transcript levels increase rapidly 3-5 days after flowering. Biochemical observations with purified BEI from maize endosperm indicate BEI preferentially branches amylose-type polyglucans and has a high capacity for branching less branched α-glucans (Takeda et al., 1993). Analysis of the catalytic properties of BEI has indicated the N- and C-termini play a critical role in chain length transfer and substrate preference (Kuriki et al., 1997). A rice BEI-deficient mutant induced by mutagenesis exhibited modified amylopectin structure and grain morphology but the same quantity of starch as the wild type (Satoh et al., 2003), and the BEI encoding gene also affects the RVA profile (Yan et al., 2010). The OryzaSNP@MSU database showed 14 SNPs in total, of which only two were nsSNPs and one synonymous. Therefore, the Ka/Ks ratio of two suggests this is a diversifying gene.

BEIIa

BEIIa is a leaf-expressed gene involved in amylopectin synthesis. BEIIa is also expressed in the endosperm but at levels 10-fold lower than in leaf tissue (Gao et al., 1997). An association study including the gene and RVA properties demonstrated a low F value (6.60) with a very slight influence in glutinous rice (Yan et al., 2010). No SNP/Indel has been reported in the OryzaSNP database, suggesting BEIIa might be one of the most conserved starch-related genes in rice.

BEIIb

BEIIb is known as amylose extender (ae) in maize and other cereals (Yun and Matheson, 1993), and many studies have reported the significance of this gene for starch properties in various plant species (Fisher et al., 1993; Sun et al., 1997; Sun et al., 1998). This is a granule- and soluble-associated enzyme which is only expressed in the endosperm.
Expression of three different functional maize SBE genes in BE-deficient yeast strains demonstrated that the presence of BEIIb is necessary to activate BEI and BEIIa (Seo et al., 2002). Additionally, a 0.5- to 0.7-fold decrease in the expression of BEIIb during grain filling creates chalky rice (Tanaka et al., 2004). Only five SNPs are in the OryzaSNP database, none of which are in the exonic regions, despite the results of a recent association study that determined a very high F value of 11.12 between BEIIb and RVA properties in rice (Yan et al., 2010).

Debranching Enzymes (DBEs)

DBEs belong to the α-amylase family, of which two classes exist in plants, Isoamylase and Pullulanase. These enzymes debranch (hydrolyse) α-(1→6)-linkages in amylopectin and pullulan. Defective DBEs in plants are thought to be responsible for the accumulation of phytoglycogen rather than starch and, in turn, to change the phenotypic appearance of the endosperm (Bustos et al., 2004).

ISA1 (Iso 1)

In wheat, the expression of ISA1 cDNA was highest in developing endosperm and undetectable in mature grains, suggesting a fundamental biosynthetic role of Isoamylase 1 in plant starch, although the precise roles of DBEs are not yet known (Tetlow et al., 2004). Transcript level regulation of ISA1 during rice grain filling in response to high temperatures has been reported by Yamakawa et al. (2007), in which the expression level of ISA1 mRNA increased by 0.94-fold under high temperatures, 8 to 30 days after flowering. In rice endosperm, antisense inhibition of Isoamylase 1 altered the structure of amylopectin and the physiochemical properties of starch (Fujita et al., 2003). ISA genes are also thought to contribute to the degree of setback in glutinous rice cultivars (Yan et al., 2010). In the OryzaSNP database 16 SNPs were detected, of which two are nsSNPs. The Ka/Ks ratio of 0.666 signifies this gene has undergone negative selection during domestication.
ISA2 (Iso2)

Isoamylase 2, which corresponds to sugary1 (su1), was first reported in maize endosperm and could be separated from Isoamylase 1 by anion-exchange chromatography (Beatty et al., 1999; Doehlert and Knutson, 1991). A high rate of functional SNPs and total polymorphisms was observed for this gene (Table 1). The Ka/Ks ratio of 1.5 suggests this gene is under positive selection and is one of the most diversifying of the starch-related genes. The high polymorphism rate of 4.16 supports this assertion. The association between ISA2 and rice grain quality is unclear. There is no intron in this relatively small gene (2625 bp), thus each detected SNP/Indel is potentially important.

Pullulanase (PUL)

In rice endosperm a defect in pullulanase-type DBE activity triggers and modulates some phenotypic effects (Nakamura et al., 1998). In maize endosperm, it is believed that pullulanase has a dual role, contributing either to starch synthesis or degradation (Dinges et al., 2003). Kubo et al. (1999) suggest pullulanase plays a predominant and essential role in amylopectin synthesis and compensates for shortages of isoamylase activity in the construction of the multiple cluster structure of amylopectin. The highest polymorphism rate in the OryzaSNP database was observed for PUL. In total, 108 SNPs were detected in this relatively large gene (10399 bp), of which 10 are nonsynonymous and 9 synonymous. The Ka/Ks ratio of 1.11 indicates that, although this gene is diversifying, the ratio is close to one and the gene could easily become neutral. A recent association study between PUL and RVA profile parameters in glutinous rice has shown strong relations of this gene with peak viscosity, hot paste viscosity, breakdown viscosity and peak time (Yan et al., 2010). Nevertheless, our study only showed a very minor association between PUL and some physiochemical properties such as chalkiness, gelatinization and pasting temperature (chapter 4).
Proteins

A wide range of starch granule-associated proteins, diverse in number, identity and possibly function, has been found in different botanical sources (Baldwin, 2001; Schofield and Greenwell, 1987). They are classified into two main categories of low and high molecular weight proteins. It is widely accepted that these proteins are located inside and/or on the surface of starch granules and influence starch properties. The composition and content of these proteins significantly affect the structure and quality of starch and the baking quality of cereals. Juliano et al. (1965) studied the relation of amylose content, protein content, water absorption and gelatinization temperature to the cooking and eating qualities of non-waxy rice and found that protein content and cooked rice colour are positively correlated.

Retrogradation is the hardening of cooked rice after storage or cooling. Retrogradation rate has significant implications for rice consumers, as many of them cook rice in the morning and consume it after several hours of refrigerated storage. Recent studies have shown that removal of total proteins produces softer gels after storage at 20, 40 and 60 °C but does not affect firmness following refrigerated storage (Philpot et al., 2006). This suggests protein could have a major influence on retrogradation at room temperature and above; however, other factors or biochemical mechanisms, such as lipid content, might be involved at lower temperatures. Application of pronase, which digests peptides and proteins, to rice kernels and milled starch caused a significant change in thermal properties and gelatinization profile (Marshall et al., 1990). Ragaee and Abdel-Aal (2006) observed significant differences among a number of cereals in physiochemical properties such as starch peak, breakdown and setback viscosities (RVA curves) as well as in protein peak viscosity.
However, proteins inside hydrated, swollen maize or wheat starch granules are degraded extensively by proteases after gelatinization without any apparent change in properties (Debet and Gidley, 2007).

Lipids

A number of lines of evidence suggest lipids have a significant role in influencing the physiochemical properties of rice, such as retrogradation. Debet and Gidley (2007) found that proteins and lipids on the granule surface are determinants of ghost robustness and have a role in ghost formation and integrity: a surface film, rich in protein and lipid, limits expansion of starch granules and prevents dissolution after gelatinization. Lipid removal from the rice variety Koshihikari, grown in different countries, increased the retrogradation rate and the firmness of gels after storage at different temperatures. The greater the amount of long chain amylose complexed with lipids, the greater the reduction in the degree of retrogradation, which is caused by the unavailability of long amylose chains (Philpot et al., 2006). Lipids are also involved in the physical structure of the rice kernel. Treatment of different rice varieties with hexane caused significant changes in gelatinization parameters and kernel shape and caused extensive fissure formation (Marshall et al., 1990).

Environmental factors: Nitrogen (N), Phosphorous (P) and Potassium (K)

The optimal application of nitrogen fertilizers for Australian rice cultivars is 170-180 kg/ha, and an amount higher or lower than this is considered to be a high or low level, respectively. Application of different levels of nitrogen in the field during rice plant development influences solid loss and water uptake ratio during cooking. Starch from rice grown under nitrogen application has higher cooked grain hardness, cohesiveness and chewiness, lower amylose content, and higher pasting and gelatinisation temperature and enthalpy (Singh et al., 2011).
Pot and field experiments confirmed that increasing N application decreased amylose content, peak viscosity and breakdown viscosity, while setback and consistency increased (Dayong et al., 2004). Other macro-nutrients such as phosphorous (P) and potassium (K) have also been studied in relation to rice grain amylose content and starch viscosity properties. Application of P has no obvious effect on amylose content, peak viscosity, breakdown, setback or gel consistency. However, increasing amounts of K increased amylose content, peak viscosity and breakdown, while setback and gel consistency were reduced. The interaction of NPK fertilizers on the quality characters of different varieties was significant, and reducing N while increasing K improved rice cooking and eating quality (Dayong et al., 2004).

Thermal stress

Thermal stress (temperatures above 37 °C) during the critical grain filling period can affect the biochemical processes of starch deposition, causing yield loss and starch defects (Peng et al., 2004). Failure in starch deposition results in lightly packed granules and grain chalkiness (Zakaria et al., 2002). It is believed that during heat shock, especially at grain filling, expression of some starch related genes such as GBSSI, BEIIb and a cytosolic dikinase gene is down-regulated, whereas some heat shock proteins and alpha-amylases are up-regulated (Yamakawa et al., 2008). Yamakawa et al. (2007) suggested that the decreased level of amylose and the long chain-enriched amylopectin in high temperature-ripened grains are mainly due to repressed expression of GBSSI and BEIIb, respectively. They also reported the expression level of various genes in response to high temperature. Tashiro and Wardlaw (1991a) showed that high temperature at the milky stage of grain filling has the most extensive influence on rice grain chalkiness because the panicle is the organ most sensitive to high temperature (Sato et al., 1973).
CO2

Differences in carbon dioxide concentration have no consistent effect on grain and starch parameters of wheat, although small effects have been detected on thousand grain weight, starch content and lipid-free amylose content (Tester et al., 1995). Evaluation of the long-term effects of different CO2 concentrations on carbohydrate status and partitioning of rice (Oryza sativa L. cv. IR-30) found that the photosynthesis rate increased substantially with CO2 concentrations up to 500 μmol mol−1 and then reached a plateau at higher concentrations (Rowland-Bamford et al., 1990). The ratio of starch to sucrose concentration was positively correlated with the CO2 concentration, but CO2 concentration had no effect on the carbohydrate concentration in the grain at maturity.

Objectives of thesis

The objectives of this thesis were to:
1) Characterise rice starch biosynthesis genes;
2) Discover DNA polymorphisms in Australian rice germplasm using new cutting edge technologies (Next Generation Sequencing);
4) Detect and prioritise functional SNPs in starch related genes using computational tools;
5) Associate SNPs (genes) with the physiochemical properties of rice grain.

Key concepts

The key concepts encompassed by this thesis are:
1) Rice starch varies between rice cultivars due to differences in the gene sequence of the enzymes which synthesise the rice starch;
2) Humans can detect differences in rice starch and these differences define rice quality. These differences can be instrumentally quantified;
3) Differences in gene sequence can be defined and the extent to which they control, or are associated with, rice quality differences measured;
4) The chemical properties and structure of the 20 constituent amino acids of proteins, and the accumulated knowledge of how protein structure and function are linked, can be used in algorithms which predict how amino acid differences in any one protein may impact the function of that protein.
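Key concept 4 can be illustrated with a deliberately simple sketch: an amino acid substitution is labelled conservative when both residues fall in the same broad physicochemical class and radical otherwise. The four-class grouping and the names `AA_GROUPS`, `AA_CLASS` and `substitution_effect` are assumptions made for illustration only. This is not the prediction tool used in the thesis, and real predictors weigh many more features, such as sequence conservation and protein structure.

```python
# Broad textbook grouping of the 20 amino acids by side-chain chemistry
# (an illustrative assumption; other groupings are equally defensible).
AA_GROUPS = {
    "nonpolar": "GAVLIMFWP",
    "polar": "STCYNQ",
    "positive": "KRH",
    "negative": "DE",
}

# Map each one-letter amino acid code to its class name.
AA_CLASS = {aa: group for group, residues in AA_GROUPS.items() for aa in residues}


def substitution_effect(ref: str, alt: str) -> str:
    """Classify a single amino acid change using one-letter codes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in AA_CLASS or alt not in AA_CLASS:
        raise ValueError(f"unknown amino acid code: {ref!r} or {alt!r}")
    if ref == alt:
        return "synonymous"  # no change at the protein level
    if AA_CLASS[ref] == AA_CLASS[alt]:
        return "conservative"  # same physicochemical class
    return "radical"  # class changes, e.g. nonpolar to charged


for ref, alt in [("D", "E"), ("G", "D"), ("L", "L")]:
    print(f"{ref}->{alt}: {substitution_effect(ref, alt)}")
```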
Major activities reported in the thesis

The major activities reported in the thesis were:
1) The DNA sequences of 18 starch related genes were retrieved from databases and the location and type of SNPs identified;
2) The retrieved SNPs were analysed and then prioritised based on their predicted importance using bioinformatic tools and algorithms;
3) Long range PCRs (LR-PCR) amplified the 18 starch related genes in 233 Australian rice lines/cultivars;
5) The amplified products were pooled and then sequenced using an Illumina GAIIx platform;
6) The sequencing data were analysed and SNPs detected;
7) The SNPs retrieved from databases were compared with SNPs discovered in the sequencing experiment and novel variations identified;
8) Specific markers were designed and generated for multiplexed MALDI-TOF assay of SNPs;
9) All 233 genotypes were assayed (genotyped) individually for all SNPs using multiplexed MALDI-TOF;
10) The phenotypic data for physiochemical traits of the 233 rice individuals were obtained from an Australian breeding program;
11) Association between SNPs (genes) and traits was assessed using the software TASSEL following a General Linear Model;
12) Data flowing from these activities were discussed within the context of published work in the field.

CHAPTER 2

Discovery of polymorphisms in starch related genes in rice germplasm by amplification of pooled DNA and deeply parallel sequencing

Summary

High-throughput sequencing of pooled DNA was utilised for polymorphism discovery in candidate genes involved in starch synthesis. A total of 17 rice starch synthesis genes, encoding seven classes of enzymes, including ADP-glucose pyrophosphorylase (AGPases), granule starch synthases (GBSS), soluble starch synthase (SS), starch branching enzyme (BE), starch debranching enzyme (DBE), starch phosphorylase (SPHOL) and phosphate translocator (GPT1), from 233 genotypes were PCR amplified using semi- to long range PCR.
The amplification products were equimolarly pooled and sequenced using massively parallel sequencing technology (MPS). By detecting SNPs/Indels in both coding and non-coding areas of the genes, the SNP/Indel variation and distribution patterns among individual starch candidate genes were identified and characterized. Approximately 60.9 million reads were generated, of which 54.8 million (90%) mapped to the reference sequences. The average coverage ranged from 12,708× to 38,300× for SSIIa and SSIIIb, respectively. SNPs and single/multiple-base Indels were analysed in a total assembled length of 116,403 bp. A total of 501 SNPs and 113 Indels were detected across the 17 starch related loci. The test of the ratio of non-synonymous to synonymous SNPs (Ka/Ks) indicated GBSSI and Isoamylase 1 (ISA1) as the least diversified (most purified), reflecting the population's history of selection for low amylose content and gelatinization temperature. This report demonstrates a useful strategy for screening germplasm by MPS to discover variants in a specific target group of genes.

Introduction

The capacity of massively parallel sequencing to simultaneously assay millions of single nucleotide polymorphisms (SNPs) has made genome-wide studies possible (Schuster, 2008). The use of next generation sequencing (Thomas et al., 2006) platforms for population-based sequencing of targeted genomic regions enables the discovery of new variants and their frequencies across selected genes (Harismendy and Frazer, 2009), and allows identification of errors in previously published reference sequences (Bentley, 2006) or SNP databases (Velicer et al., 2006). Massively parallel sequencing technology (Genome Analyser) is a groundbreaking, flexible and high-throughput platform for genetic analysis and functional genomics which is based on ultra deep sequencing of short reads and a huge number of sequencing reactions (Imelfort et al., 2009).
This platform utilizes a sequencing-by-synthesis approach in which all four nucleotides are added simultaneously, followed by an optical imaging procedure which occurs at each base incorporation step (Mardis, 2008b), and has been widely used by researchers to discover SNPs associated with human genetic diseases, particularly in cancer studies (Bentley, 2006; Mardis, 2008a). This platform can be utilised in different ways, from whole genome sequencing (WGS) of plants and animals to sequencing of specific genomic regions or even functional encoding genes or loci (Bentley et al., 2008; Hillier et al., 2008; Kim et al., 2009). Massively parallel sequencing (MPS) is an attractive, cost-efficient technology that enables characterisation of genetic traits on an unprecedented scale in terms of the number of genes, number of samples and allele frequency, which is necessary if rare alleles are to be found (Kaiser, 2008; Pettersson et al., 2009). Recently, targeted MPS has been effectively integrated with Long Range PCR (LR-PCR) of pooled DNA samples, which minimises the cost of sequencing, amplification, oligonucleotides and labour (Out et al., 2009). LR-PCR targeted MPS can be employed to deeply sequence regions surrounding candidate genes containing SNPs/indels (Varley and Mitra, 2008). Utilising this approach, the full extent of allelic variation in a vast number of encoding genes involved in various aspects of physiology, disease etc. can be recovered and large regions of linkage disequilibrium (~5-11 kb) identified (Bodmer and Bonilla, 2008). One of the major advantages of this approach is the capacity of MPS and targeted gene amplification to provide a high sequence depth in all studied loci simultaneously. For example, a total sequence yield of 1 Gb means a fragment of 10 kb will be read approximately 100,000 times (Out et al., 2009), which meets the requirements for discovery of rare alleles (Druley et al., 2009; Thomas et al., 2006; Ingman and Gyllensten, 2008).
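The depth estimate in the example above is a single division: total sequence yield over target length. A minimal sketch follows, assuming all reads map uniformly to the target. The helpers `expected_depth` and `depth_per_individual` are hypothetical names, and the per-individual figure further assumes that every individual contributes equally to the pool.

```python
def expected_depth(total_yield_bp: float, target_length_bp: float) -> float:
    """Average number of times each base of the target is read."""
    if target_length_bp <= 0:
        raise ValueError("target length must be positive")
    return total_yield_bp / target_length_bp


def depth_per_individual(depth: float, n_individuals: int) -> float:
    """Average depth contributed by each individual in an equimolar pool."""
    if n_individuals <= 0:
        raise ValueError("number of individuals must be positive")
    return depth / n_individuals


# 1 Gb of sequence over a 10-kb fragment, as in the example from the text.
depth = expected_depth(1e9, 10_000)
print(depth)  # 100000.0

# The same depth shared across a pool of 233 lines (illustrative extension).
print(round(depth_per_individual(depth, 233), 1))
```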
The flexibility of the platform is extended when multiple genomic regions of numerous individuals from wild or segregating populations are pooled.

Rice (Oryza sativa L.) starch, a complex carbohydrate, is one of the most important crop products for humankind (Fitzgerald, 2004). Starch is synthesized by the activity of several enzymes and has been the subject of extensive study (Morell et al., 2003). Each of the starch synthesis enzymes exists as a number of different isoforms and is usually classified into one of several specific groups of genes, such as ADP-glucose pyrophosphorylase (AGPases and GPT1), starch synthase, starch branching enzyme, starch debranching enzyme and starch phosphorylase (James et al., 2003).

In this study, DNA of 233 individuals from a breeding population was equimolarly pooled, and 17 rice starch quality-related genes, encoding seven classes of enzymes of the starch biosynthesis pathway, were amplified by an LR- or Semi LR-PCR (SLR-PCR) protocol. The pooled targeted amplifications were subsequently sequenced using MPS sequencing technology (Illumina Inc., San Diego, CA). By detecting SNPs/Indels contained in both coding and non-coding areas of the genes, SNP/Indel distribution patterns were characterised.

Materials and methods

Plant materials

All plant material was supplied by Industry and Investment NSW, Yanco Agricultural Research Institute, Australia. Two hundred and thirty three rice lines from a breeding program were analysed. These lines were significantly diverse in starch quality properties, providing a high rate of variation for starch traits.

Variability of genotypes

This population comprised a series of lines at the F6 stage, from harvested pedigree rows entering the first stage of plot testing. This captured a wide set of lines from a breeding program focused on temperate (japonica-type) rice.
Selection was primarily based on plant height, the capacity to flower and set seed in the temperate environment of Australia, and visual inspection of grain size and shape. No selection had taken place for quality traits such as gelatinization temperature and RVA curve data (Appendix 2).

Sample preparation and DNA extraction

Rice seeds of each line were germinated at 25 °C and 10-20 seedlings from each individual selected for DNA extraction. Total genomic DNA was extracted from 15-day-old seedlings using the DNeasy 96 Plant kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions.

Designation of starch-metabolizing enzymes/genes involved in starch synthesis

The major databases such as NCBI (http://www.ncbi.nlm.nih.gov/) and the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/cgi-bin/putative_function_search.pl) were searched for the general entries of nucleotide sequences (gDNA) and full-length cDNAs of important gene classes which are presumed to be involved in starch biosynthesis. The available literature was used to choose the most likely candidate genes associated with rice starch quantity and quality (Ohdan et al., 2005; Waters and Henry, 2007; Nakamura, 2002; Hirose et al., 2006; Rahman et al., 2000). Multiple sequence alignment of the selected genes was carried out using Sequencher® (Gene Codes Corporation, Ann Arbor, MI, USA) and CLUSTAL W (http://www.ebi.ac.uk/Tools/clustalw2/index.html), and a consensus sequence alignment was generated for each candidate gene to design the amplification primers.

Target genes for sequence analysis

The present study focused on the genes encoding seven groups of enzymes, namely ADP-glucose pyrophosphorylase (AGPase), granule bound starch synthase (GBSS), starch synthase (SS), branching enzyme (BE), debranching enzyme (DBE), starch phosphorylase (PHO) and glucose phosphate translocator (GPT).
Designing primers to capture target genes

The sequence of each target gene, including exons and introns, was divided into two relatively equal fragments. Each selected sequence included 500 bp up- and downstream of the coding regions (5´ and 3´ UTRs) and an approximately 300 bp overlap in the middle. A set of specific primers was designed for each half using Clone Manager V9.1 (Sci-Ed Software, NC 27513 USA) (Appendix 3).

Long range PCR protocol (LR-PCR)

The concentration of extracted DNA was quantified by the automated fluorometric protocol of PicoGreen (PicoGreen dsDNA Quantification Kit, Invitrogen, CA 92008 USA) and then diluted to 30-40 ng/µl for amplifications. A unified LR-PCR approach was applied to amplify all genes (with few exceptions) simultaneously. BioRad iProof™ High-Fidelity DNA polymerase was used for PCR amplifications, in 10 µl reactions containing 20 ng of pooled genomic DNA of 20 individuals. The extreme fidelity of iProof™ makes it the enzyme of choice for SNP detection in long amplicons. PCRs were performed using 2 µl of HF or GC buffer (HF buffer for normal and GC buffer for GC-rich sequences), 0.2 µl of dNTPs (10 mM), 2 µl of each forward and reverse primer (2.5 µM), 1 µl of pooled DNA (20 ng/µl), 0.1 µl of iProof™ polymerase (0.2 unit) and 2.7 µl of sterile water. As the different genes needed unique optimal conditions for amplification, a unified PCR method to amplify all targeted genes simultaneously was attempted. The touchdown PCR protocol was performed using a Corbett PCR thermal cycler as follows: 98 °C for 1 min (1 cycle), followed by 10 touchdown cycles of 98 °C for 10 s, annealing at 72-62 °C (a 10 °C touchdown) and extension at 72 °C for 4 min, followed by 28 cycles of normal amplification at 98 °C for 10 s, 62 °C for 20 s and 72 °C for 4 min. The final extension was a single cycle of 72 °C for 10 min.
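The reaction recipe and touchdown profile above lend themselves to a quick consistency check. The sketch below sums the listed component volumes and lists one annealing temperature per touchdown cycle. The names `REACTION_MIX_UL`, `total_volume_ul` and `touchdown_annealing_temps` are hypothetical, and the equal 1 °C decrement per cycle is an assumption: the text gives only the 72-62 °C range over 10 cycles.

```python
# Per-reaction volumes in microlitres, copied from the protocol text.
REACTION_MIX_UL = {
    "HF or GC buffer": 2.0,
    "dNTPs (10 mM)": 0.2,
    "forward primer (2.5 uM)": 2.0,
    "reverse primer (2.5 uM)": 2.0,
    "pooled DNA (20 ng/ul)": 1.0,
    "iProof polymerase": 0.1,
    "sterile water": 2.7,
}


def total_volume_ul(mix: dict) -> float:
    """Sum of all component volumes, rounded to avoid float noise."""
    return round(sum(mix.values()), 2)


def touchdown_annealing_temps(start_c=72.0, end_c=62.0, cycles=10):
    """Annealing temperature for each touchdown cycle.

    Assumption: equal steps, chosen so that the first cycle after the
    touchdown phase runs at end_c (here 72, 71, ..., 63, then 62).
    """
    step = (start_c - end_c) / cycles
    return [round(start_c - i * step, 2) for i in range(cycles)]


print(total_volume_ul(REACTION_MIX_UL))  # 10.0, matching the 10 ul reaction
print(touchdown_annealing_temps())
```

The components add up to the stated 10 µl reaction volume, and under the equal-step assumption the touchdown phase hands over to the 28 normal cycles at 62 °C without a jump.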
Prior to Illumina sequencing, the PCR products were Sanger sequenced using BigDye Terminator version 3.1 (Applied Biosystems, Foster City, CA). The generated sequences were aligned with the reference sequence to ensure the correct gene had been captured.

DNA equimolar pooling

A uniform pooling strategy was applied to all samples. The genomic DNA of the 233 breeding lines, which had already been normalised for PCR in the previous stage (30 ng/µl), was divided into 12 sections, each containing a pool of approximately 20 individuals, and LR-PCRs were carried out on these pools. The concentration of the PCR products from these pools was measured using PicoGreen (PicoGreen dsDNA Quantification Kit, Invitrogen, CA 92008 USA). A second pool was then made for each fragment from the PCR products. To facilitate the final equimolar pooling of PCR products, the concentrations of the second pools (33 second pools/amplicons) were individually normalised to 25 ng/µl and then equimolarly pooled into a mega pool based on the predicted amplicon lengths, taking into account that larger amplicons need a higher number of copies than smaller fragments. The final mega pool was prepared with the aim of obtaining 2.5 µg of long amplicons representing all 233 individuals.

Massively parallel sequencing

The final mega pool was subjected to Illumina GA sequencing (Illumina Inc., San Diego, CA). PCR product fragmentation and library preparation were carried out according to the manufacturer's instructions. Fragments of approximately 200 bp were selected for sequencing and 4 pmol of the library was loaded onto one flow cell.

SNP detection and data analysis

Data analysis, including filtering, trimming and mapping to the reference sequences, was performed with CLC bio Workbench 4 using 17 reference sequences with the specified coordinates, extracted from GenBank (Table 3).
The general parameters of the CLC Workbench were set as follows: conflict resolution was set to vote among all four nucleotides (A, C, G, T), and non-specific matches and reference masking were ignored. The read parameters were left at their defaults: the min-max distance, mismatch cost, length fraction and similarity were 100-1000, 3, 0.9 and 0.9, respectively, for both single and paired end reads. This set of parameters was selected in order to minimize read alignment ambiguities as well as to detect rare SNPs. The minimum coverage and the minimum variant frequency were set at 20 and 0.5%, respectively, which meant that all variations at or above a frequency of 0.5% were considered SNPs.

Total polymorphism rate and functional SNPs

The total polymorphism rate was calculated as:

$$\text{Total polymorphism rate} = \frac{TSI}{TL} \times 100$$

where TSI is the total number of SNPs and Indels and TL is the total length of each candidate gene. The functional or non-synonymous SNP rate was calculated as:

$$\text{Non-synonymous SNP rate} = \frac{NS}{TL} \times 1000$$

where NS is the number of non-synonymous SNPs in each locus and TL is the total length of each candidate gene.

Results

Number of reads and average coverage

Sequencing of the LR-PCR products of all 17 studied loci generated ~60.9 million reads, of which 54.8 million (90%) mapped to the reference sequences. Table 1 shows the summary statistics of the mapping report. The average coverage differed among loci and ranged from 12,708× for SSIIa to 38,300× for SSIIIb (Fig. 1). This difference may be related to factors such as the concentration of amplicons, PCR efficiency, the number of non-specific products and contamination with external PCR products. For example, the LR-PCR products of the SSIIa gene showed a number of non-specific bands on agarose gel, which led to more unmapped reads and lower coverage. The highest and lowest numbers of reads were counted for SSIIIa and SSIIa, with 5,920,785 and 876,986 reads, respectively.
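The two rate definitions given in the Methods can be illustrated with a small sketch using counts reported elsewhere in this chapter (AGPS2b: 30 SNPs and 4 Indels over 5,635 bp; SSIIa: six verified amino acid changes over 4,420 bp). The helper name `rate` is hypothetical. The sketch evaluates the total polymorphism rate at both scalings because the values listed in Table 3 appear to correspond to a per-kb (×1000) scaling rather than the ×100 of the written formula; this is an observation from the arithmetic, not a statement of the author's intent.

```python
def rate(count, length_bp, scale):
    """Polymorphisms per `scale` base pairs: scale * count / length."""
    return scale * count / length_bp


# AGPS2b: total SNPs + Indels (TSI) over the assembled length (TL).
agps2b_x100 = rate(30 + 4, 5635, 100)    # scaling of the written formula
agps2b_x1000 = rate(30 + 4, 5635, 1000)  # per-kb scaling

# SSIIa: non-synonymous SNPs (NS) over TL, per kb.
ssiia_nonsyn = rate(6, 4420, 1000)

print(round(agps2b_x100, 3))
print(round(agps2b_x1000, 3))
print(round(ssiia_nonsyn, 3))
```

The per-kb result for AGPS2b is close to the 6.033 listed in Table 3, and the SSIIa value agrees with the 1.357 listed there.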
Polymorphism discovery and SNP/Indel detection

Starch quality loci of 233 breeding lines were successfully sequenced to great depth and coverage. SNPs and single/multiple-base Indels were discovered in a total length of 116,403 bp assembled by Genome Analyser (GA). In total, 501 SNPs and 113 Indels were detected across the 17 starch related loci (Appendix 1). The total number of polymorphisms was then compared to the SNPs available in the OryzaSNP MSU database (http://oryzasnp.plantbiology.msu.edu/) (Table 2). A total of 399 SNPs for the targeted loci had already been reported in this database for 20 rice cultivars. As expected, the total number of polymorphisms in this experiment was significantly higher than that reported in the rice database, and the confidence score was significantly higher due to the very high read coverage. On average, the SNP rate was 4.31 SNPs/kb and the Indel rate 0.97 Indels/kb. Previous data have reported an average rate of one SNP every 170 bp and one Indel every 540 bp (Goff et al., 2002; Yu et al., 2002).

Figure 1. The average coverage in starch related genes (y-axis: average coverage; x-axis: name of gene/enzyme, covering AGPS2b, SPHOL, GPT1, GBSSI, GBSSII, SSI, SSIIa, SSIIb, SSIIIa, SSIIIb, SSIVa, BEI, BEIIa, BEIIb, ISA1, ISA2 and PUL).

The data indicate one SNP every 232 bp and one Indel every 1030 bp within this set of germplasm and for these candidate genes. Although the average SNP rate is gene specific and related to the species and the structure of the studied population, these results are similar to previous reports (Nasu et al., 2002; Yu et al., 2002). Of the 501 identified SNPs, 75 (~14.9%) caused an amino acid change, making them potentially functional. All Indels resided in intronic regions and were thus not responsible for any stop codons, frameshift mutations or amino acid changes.
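The density figures quoted above follow directly from the totals reported in this section (501 SNPs, 113 Indels and 75 non-synonymous SNPs over 116,403 bp). The sketch below simply redoes that arithmetic; the variable names are illustrative only.

```python
TOTAL_LENGTH_BP = 116_403
N_SNPS = 501
N_INDELS = 113
N_NONSYNONYMOUS = 75

# Average spacing between polymorphisms, in base pairs.
bp_per_snp = TOTAL_LENGTH_BP / N_SNPS
bp_per_indel = TOTAL_LENGTH_BP / N_INDELS

# The same information expressed per kilobase.
snps_per_kb = 1000 * N_SNPS / TOTAL_LENGTH_BP
indels_per_kb = 1000 * N_INDELS / TOTAL_LENGTH_BP

# Share of SNPs that change an amino acid.
nonsyn_fraction = N_NONSYNONYMOUS / N_SNPS

print(round(bp_per_snp), round(bp_per_indel))
print(round(snps_per_kb, 2), round(indels_per_kb, 2))
print(round(100 * nonsyn_fraction, 1))
```

The spacing values reproduce the "one SNP every 232 bp" and "one Indel every 1030 bp" statements; the per-kb SNP rate computed this way comes out marginally below the 4.31 quoted in the text, presumably a rounding difference.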
The Indel rate was slightly higher than previously reported (Goff et al., 2002; Yu et al., 2002), which may have been due to the lower-stringency criteria used for mapping the short reads in CLC Workbench (Min Cost 2; Min Insert 2; Similarity 0.7). The largest and smallest Indels were 8 bp and 1 bp, respectively.

Table 1. Summary statistics of the mapping report, generated by Illumina Genome Analyser sequencing

| Statistics | Number of reads (count) | Average length of reads (bp) | Total bases |
| Reads | 60,985,472 | 64.4 | 3,927,761,087 |
| Mapped to reference | 54,813,065 | 64.76 | 3,549,512,020 |
| Unmapped | 6,172,407 | 61.28 | 378,249,067 |
| Reference sequence* | 17 | 6,415 | 109,067 |
| Paired reads | 42,720,746 | 256.68 | |

*Reference sequence was taken from the NCBI database.

SNP variation across the starch related candidate loci

To evaluate the capacity of MPS to detect new variants in pools of starch synthesizing enzymes/genes, a comprehensive experiment was conducted on the Illumina GA platform with 17 different rice starch related genes. Table 2 summarises the newly discovered variation in the studied genes. Seven classes of starch related enzymes which affect starch structure and quality were pool sequenced: ADP-glucose pyrophosphorylase (AGPase), granule bound starch synthase (GBSS), starch synthase (SS), branching enzyme (BE), debranching enzyme (DBE), starch phosphorylase (PHO) and glucose phosphate translocator (GPT). The details of each gene member and its detected polymorphism are as follows.

ADP-glucose pyrophosphorylase (AGPase), starch phosphorylase (PHO) and glucose phosphate translocator (GPT) gene families

These enzymes/genes reside at the top of the starch biosynthesis pathway and are classified as the starting point of grain starch production. Glucose is first activated by the addition of ADP by AGPase and then becomes the substrate for the starch synthase enzymes.
There are several genes/isozymes in this class, but AGPS2b has the highest expression level in rice endosperm (Hirose et al., 2006).

AGPS2b (small subunit)

The role of this subunit in starch granule synthesis has been identified by way of its association with rice shrunken mutants (Kawagoe et al., 2005). A dramatic inhibition of starch synthesis has been observed in AGPase-deficient mutants of rice and some other species, resulting in increased soluble sugars, a large number of underdeveloped granules, small grains and pleomorphic amyloplasts (Rolletschek et al., 2002). In total, 30 SNPs and 4 Indels were found across the population for this gene. None of them caused an amino acid change, suggesting this gene has little impact on starch quality in this population. However, the number of SNPs found was significantly higher than previously reported in rice databases (Table 2).

SPHOL (alpha 1,4-glucan starch phosphorylase)

This gene is generally considered to be involved in starch degradation, but recent studies suggest it has some important roles in starch biosynthesis. Although its precise mechanism and influence are still not well known, the mechanism appears to be associated with phosphorylation of some starch-related enzymes and proteins such as starch branching enzymes (SBEs) and starch synthase (SSIIa) (Tetlow et al., 2004). In total, five SNPs and seven Indels, all non-functional, were found in this gene. The SNP rate was lower than that reported in databases (Tables 2 and 3).

GPT1 (Glucose-6-phosphate translocator)

GPT1 is strongly expressed in endosperm. This gene is believed to be responsible for the import of essential carbon substrates such as Glc6P into the plastids during grain development (Fischer and Weber, 2002; Jiang et al., 2003). There were 16 SNPs, one of which causes an amino acid change, and 8 Indels. A 'C/T' SNP at reference position 1188 changes an amino acid from Leu to Phe (Leu42Phe).
This is a conservative non-polar amino acid substitution (L→F) and therefore might not significantly alter protein activity. However, this is a new functional SNP which has not previously been reported in databases.

Table 2. Total polymorphism detected across the 17 starch quality related genes.

| No | GenBank No. | Gene ID in NCBI | Gene ID in OryzaSNP@MSU database | Gene | Chr No. | Average coverage (×) | Length assembled by Illumina (bp) | SNPs detected by GA | In/dels detected by GA* | SNPs in OryzaSNP@MSU database: high quality† | SNPs in OryzaSNP@MSU database: Perlegen + machine learning† | Number of functional SNPs (amino acid changes) |
| 1 | NC_008401 | Os08g0345800 | LOC_Os08g25734 | AGPS2b (ADP-glucose pyrophosphorylase, small unit) | 8 | 19,203 | 5,635 | 30 | 4 | 4 | 12 | N/A |
| 2 | NC_008396 | Os03g0758100 | LOC_Os03g55090 | SPHOL (alpha 1,4-glucan phosphorylase) | 3 | 29,561 | 7,489 | 5 | 7 | 11 | 18 | N/A |
| 3 | NC_008401-2 | Os08g0187800 | LOC_Os08g08840 | GPT1 (Glucose-6-phosphate/phosphate translocator) | 8 | 21,042 | 4,073 | 16 | 8 | 3 | 6 | 1 |
| 4 | NC_008399-1 | Os06g0133000 | LOC_Os06g04200 | GBSSI | 6 | 32,486 | 3,480 | 1 | 1 | 8 | 13 | 1 |
| 5 | NC_008400 | Os07g0412100 | LOC_Os07g22930 | GBSSII (expressed in leaf) | 7 | 31,260 | 8,049 | 4 | 8 | 1 | 1 | 1 |
| 6 | NC_008399-2 | Os06g0160700 | LOC_Os06g06560 | SSI | 6 | 32,707 | 7,750 | 73 | 8 | 46 | 67 | N/A |
| 7 | NC_008399-3 | Os06g0229800 | LOC_Os06g12450 | SSIIa | 6 | 12,708 | 4,420 | 31 | 1 | 5 | 12 | 22 |
| 8 | NC_008395-1 | Os02g0744700 | LOC_Os02g51070 | SSIIb (expressed in leaf) | 2 | 22,494 | 5,323 | 3 | 2 | 4 | 16 | N/A |
| 9 | NC_008401-3 | Os08g0191500 | LOC_Os08g09230 | SSIIIa | 8 | 19,203 | 11,263 | 83 | 15 | 20 | 52 | 23 |
| 10 | NC_008397-2 | Os04g0624600 | LOC_Os04g53310 | SSIIIb | 4 | 38,300 | 8,624 | 26 | 11 | 9 | 24 | 7 |
| 11 | NC_008394-1 | Os01g0720500 | LOC_Os01g52250 | SSIVa | 1 | 16,497 | 10,480 | 27 | 6 | 19 | 25 | 5 |
| 12 | NC_008399 | Os06g0726400 | LOC_Os06g51084 | BEI | 6 | 30,255 | 7,258 | 18 | 6 | 3 | 14 | 1 |
| 13 | NC_008397 | Os04g0409200 | LOC_Os04g33460 | BEIIa | 4 | 15,958 | 2,265 | 6 | N/A | N/A | N/A | 1 |
| 14 | NC_008395 | Os02g0528200 | LOC_Os02g32660 | BEIIb (Amylose extender) | 2 | 25,491 | 10,900 | 53 | 17 | 4 | 5 | 3 |
| 15 | NC_008401-1 | Os08g0520900 | LOC_Os08g40930 | Isoamylase1 (DBE) | 8 | 37,526 | 6,592 | 0 | 9 | 2 | 16 | N/A |
| 16 | AC132483 | OSJNBa0014C03.3 | LOC_Os05g32710 | Isoamylase2 (DBE) | 5 | 20,373 | 2,403 | 16 | N/A | 7 | 10 | 9 |
| 17 | NC_008397-1 | Os04g0164900 | LOC_Os04g08270 | Pullulanase (DBE) | 4 | 30,067 | 10,399 | 109 | 10 | 54 | 109 | 1 |
| Total | | | | | N/A | | 116,403 | 501 | 113* | 200 | 399 | 75 |

* Lower stringency was used for in/dels, as follows: Min Cost 2; Min Insert 2; Similarity 0.7.
† SNPs in the OryzaSNP@MSU database were detected and analysed using the Perlegen model-based method as well as a machine learning method. In total, over 158,000 high quality SNPs have been identified in the rice genome by these two technologies.

Granule bound starch synthase (GBSS) gene family

This family of genes is responsible for production of the amylose component of starch in plants.

GBSSI (Granule bound starch synthase I)

GBSSI, or the waxy gene, is one of the most important genes involved in starch synthesis and influences cereal grain quality, particularly in rice. The major role of GBSSI in determining amylose content is well known, and several SNPs associated with starch quality have been characterized in rice (Chen et al., 2008b). Previous studies have shown that three SNPs, one each at the intron/exon 1 boundary, exon 6 and exon 10, have the most significant impact on starch quality (Cai et al., 1998). Only one functional 'A/C' SNP was detected, at position 1086 of the reference sequence. It corresponds to the previously reported exon 6 SNP and causes a Tyr→Ser substitution at amino acid position 224. This substitution is non-conservative: it changes the polarity of the amino acid and affects the function of GBSSI, its enzyme activity and amylose content. Larkin and Park (2003) have suggested this SNP affects amylose content. One non-functional Indel was also found in this gene. The single SNP detected in GBSSI in this population, compared to the eight SNPs retrieved from the OryzaSNP database, indicates that significant selection pressure has been imposed on this locus in this population during the course of breeding. A multiplex SNP verification experiment was conducted to validate the data (Masouleh et al., 2009).
The results showed that only this SNP, at very low frequency, exists in this population. The breeding selection criteria applied to this population appear to have restricted the polymorphism of the GBSSI gene. The Ka/Ks data also suggest GBSSI has been under selection in this population: a Ka/Ks ratio of 2.00, among the highest in this study, was calculated for this gene (Table 3).

GBSSII (Granule bound starch synthase II)

This gene/enzyme is predominantly expressed at a low level in leaf, leaf sheath, culm and pericarp tissue, particularly during pre-heading and 1-3 days after flowering (Ohdan et al., 2005). The impact of GBSSII on the elongation of amylose in non-storage tissues of cereals has been confirmed (Vrinten and Nakamura, 2000). GBSSII is found exclusively bound to starch granules in green tissues and synthesises amylose which is subsequently consumed by the plant or accumulated in the endosperm (Dian et al., 2003). Four SNPs and 8 Indels were identified. One of the SNPs, an A/G SNP at coordinate 1638 of the reference sequence, altered a Leu to a Ser at position 523 of GBSSII. This SNP changes the polarity of the amino acid and hence may affect the activity and function of the protein. All Indels were detected in introns.

Starch synthase (SS) family

This gene/enzyme family is primarily involved in the production of the amylopectin component of starch in plants.

SSI

This protein is presumed to be expressed in the endosperm and leaf of rice (Fujita et al., 2006). The transcript level of SSI is higher in endosperm than in leaf sheaths and blades, and the gene has therefore been classified as both endosperm and non-endosperm expressed (Hirose et al., 2006). Measurement of SSI transcript levels at different seed developmental stages found high expression at 1-3 DAF, peaking at 5 DAF and remaining almost constant during starch synthesis in the endosperm, suggesting SSI is the major SS form in cereals (Cao et al., 1999).

Table 3. The polymorphism analysis of starch-related candidate genes in rice

| No | Gene/Enzyme symbol | Gene/Enzyme name | Gene coordinates in GenBank | Length assembled by Illumina (bp) | Total polymorphism rate | Non-synonymous SNP rate | SNP per kb | Indel per kb | Ka/Ks ratio |
| 1 | AGPS2b | ADP-glucose pyrophosphorylase (small unit) | 15,760,599-15,754,206 | 5,635 | 6.033 | 0.000 | 5.323 | 0.709 | 1.00 |
| 2 | SPHOL | alpha 1,4-glucan phosphorylase | 32,183,093-32,190,581 | 7,489 | 1.602 | 0.000 | 0.667 | 0.934 | 1.00 |
| 3 | GPT1 | Glucose-6-phosphate/phosphate translocator | 5,138,640-5,142,712 | 4,073 | 5.892 | 0.245 | 3.928 | 1.964 | 0.50 |
| 4 | GBSSI | Granule Bound Starch Synthase I (Waxy gene) | 1,764,623-1,769,657 | 3,480 | 0.574 | 0.287 | 0.287 | 0.287 | 2.00 |
| 5 | GBSSII (expressed in leaves) | Granule Bound Starch Synthase II | 13,584,483-13,576,435 | 8,049 | 1.490 | 0.124 | 0.496 | 0.993 | 2.00 |
| 6 | SSI | Starch Synthase I | 3,078,060-3,085,809 | 7,750 | 10.451 | 0.000 | 9.419 | 1.032 | 0.11 |
| 7 | SSIIa | Starch Synthase IIa | 6,747,562-6,751,981 | 4,420 | 7.239 | 1.357 | 7.013 | 0.226 | 1.00 |
| 8 | SSIIb (expressed in leaves) | Starch Synthase IIb | 32,125,071-32,119,749 | 5,323 | 0.939 | 0.000 | 0.563 | 0.375 | 0.25 |
| 9 | SSIIIa | Starch Synthase IIIa | 5,351,108-5,362,370 | 11,263 | 8.701 | 2.04 | 7.369 | 1.331 | 2.40 |
| 10 | SSIIIb | Starch Synthase IIIb | 32,149,493-32,158,120 | 8,624 | 4.290 | 0.811 | 3.014 | 1.275 | 1.33 |
| 11 | SSIVa | Starch Synthase IVa | 31,786,842-31,797,321 | 10,480 | 3.148 | 0.477 | 2.576 | 0.572 | 1.20 |
| 12 | BEI | Branching Enzyme I | 31,775,431-31,782,688 | 7,258 | 3.306 | 0.137 | 2.480 | 0.826 | 0.400 |
| 13 | BEIIa | Branching Enzyme IIa | 20,260,837-20,265,349 | 2,265 | 2.649 | 0.441 | 2.649 | 0.000 | 2.00 |
| 14 | BEIIb (Amylose extender) | Branching Enzyme IIb | 20,213,965-20,224,864 | 10,900 | 6.422 | 0.275 | 4.862 | 1.559 | 0.285 |
| 15 | ISA1 (DBE) | Debranching Enzyme - Isoamylase 1 | 25,981,756-25,988,347 | 6,592 | 1.378 | 0.000 | 0.000 | 1.365 | 1.00 |
| 16 | ISA2 (DBE) | Debranching Enzyme - Isoamylase 2 | 23,596-25,998 | 2,403 | 6.658 | 3.745 | 6.658 | 0.000 | 0.70 |
| 17 | PUL (DBE) | Debranching Enzyme - Pullulanase | 4,399,980-4,410,318 | 10,399 | 11.443 | 0.096 | 10.481 | 0.961 | 1.00 |

Ka/Ks ratio: the proportion of non-synonymous (Ka) relative to synonymous (Ks) substitutions can reveal whether a gene has been under purifying, neutral or diversifying selection. The data for the calculation of Ka/Ks (the numbers of Ka and Ks) can be found in columns R and S of Appendix 1. Column R shows the SNPs located in the coding region (marked as CDS or mRNA) and column S shows the number of nsSNPs. The total polymorphism rate was calculated as:

$$\text{Total polymorphism rate} = \frac{TSI}{TL} \times 100$$

where TSI is the total number of SNPs and Indels and TL is the total length of each candidate gene. The functional or non-synonymous SNP rate was calculated as:

$$\text{Non-synonymous SNP rate} = \frac{NS}{TL} \times 1000$$

where NS is the number of non-synonymous SNPs in each locus and TL is the total length of each candidate gene.

A comprehensive analysis of mutant rice with a retrotransposon inserted into the SSI encoding gene revealed that SSI has the capacity to synthesise chains of DP 8-12 by extending smaller chains (Nakamura, 2002). This gene has a very small phenotypic effect on rice eating quality, although a significant negative correlation between the ratio of short chains (DP 6-12) and gelatinization temperature has been reported (Umemoto et al., 2008). There were 73 SNPs and 8 Indels detected in this gene. No functional SNPs/Indels were found, in comparison with the two amino acid changes that have been reported in the OryzaSNP database.

SSIIa

SSIIa is known to have a major effect on starch quality. This gene is expressed in the endosperm at very high levels and presumably affects amylopectin structure (Craig et al., 1998; Morell et al., 2003). The effect of this gene on cooking quality and starch texture has been clearly demonstrated (Umemoto et al., 2008; Umemoto et al., 2004). The gelatinisation temperature (GT), alkali disintegration and eating quality of rice starch have been explained by two polymorphisms, [A/G] and [GC/TT], within exon 8 of the alk locus (Umemoto and Aoki, 2005; Waters et al., 2006). In total, 31 SNPs and 1 Indel were detected in this gene, a number of SNPs significantly higher than that reported in the OryzaSNP database (12 SNPs).
Surprisingly, 22 of the 31 SNPs were functional and introduced an amino acid change, as determined by CLC Workbench. SNP distribution analysis revealed that 80% of these low frequency SNPs (25) were located at the beginning of the reference sequence, between coordinates 13 and 553, bringing about 17 amino acid changes. This high SNP rate may be associated with inefficient PCR and the consequent low coverage (45×-129×) of this GC-rich region (Appendix 1). Re-sequencing verified only four SNPs between coordinates 13 and 553, with a minimum frequency of 8-10% for the minor allele. Taking the high false positive rate into account, a total of nine SNPs and one Indel (six amino acid changes) were identified in this gene (Appendix 1). Three tri-allelic SNPs [G/T/A] and a G/T SNP, at positions 72, 77, 81 and 87 of the reference sequence respectively, are new polymorphisms. Of the three tri-allelic SNPs, one 'G/T/A' SNP is presumed to cause the most critical amino acid substitution, Arg26Met/Lys, which induces a polar to non-polar alteration in the protein.

SSIIb

SSIIb is believed to be an early expressed gene with a low expression level, expressed primarily in sink and source leaf blades and sheaths (leaf specific) at an early stage of grain filling (Hirose and Terao, 2004). However, a recent study presented evidence that it contributes, together with six other starch genes, to altering some RVA (Rapid Visco Analyser) parameters in glutinous rice (Yan et al., 2010). Only three SNPs and two Indels were found, all of which were non-functional, indicating this gene does not affect phenotypic variation in this population.

SSIIIa

The highest rate of polymorphism in terms of amino acid changes was observed in this gene. In total, 83 SNPs and 15 Indels, including 23 non-synonymous and nine synonymous substitutions, were detected, indicating this is the most diverse gene in our population. Previous studies have detected 52 SNPs.
Of the 23 non-synonymous substitutions, 10 are amino acid changes which alter polarity and may produce significant changes in protein structure (Appendix 1). The SSIIIa encoding gene is highly expressed in the endosperm, although some reports have revealed its expression in green tissues (Dian et al., 2005). A recent study of amylopectin chain length in an SSIIIa deficient mutant suggests SSIIIa plays an important role in the elongation of amylopectin B2 to B4 chains. Furthermore, in these mutants the amylose content and the extra long chains of amylopectin increased by 1.3- and 12-fold, respectively, due to an increase in GBSSI activity (Fujita et al., 2007). Conversely, no functional effect of SSIIIa differentiation was observed on RVA parameters, at least among glutinous cultivars (Yan et al., 2010).

SSIIIb

SSIIIb is mainly expressed in endosperm, but transient expression in leaf sheaths and leaves has also been reported (Hirose et al., 2006). This might be due to the existence of two divergent groups of SSIIIb in rice that are expressed in different tissues (Dian et al., 2005). It has also been classified into two different categories on the basis of the timing of expression in the developing seed. In the late expression category, the gene is expressed in the mid to late stage of grain filling (Hirose and Terao, 2004); in the early expression category, the transcript level usually increases to a peak at 3-5 days after flowering (Ohdan et al., 2005). An association study of glutinous near-isogenic rice lines suggested SSIIIb has a significant impact on RVA parameters such as peak time and pasting temperature (Yan et al., 2010). In total, 26 SNPs and 11 Indels were found in this gene. No functional Indels were detected, and of the seven amino acid changes, three changed the polarity of the amino acid: Thr1176Ala, Glu634Gly and Ser756Ile.

SSIVa

SSIVa is one of the least well known starch genes in plants.
Like most starch synthase genes, SSIVa is exclusively involved in amylopectin biosynthesis. Expression analysis by reverse transcription PCR has indicated SSIVa is preferentially expressed in rice endosperm, and to a degree in leaf blades, as a late or steady expresser gene during grain filling (Hirose and Terao, 2004). QTL mapping and expression profile analysis have shown that high temperature during grain filling can increase the transcription level of SSIVa up to 1.11-fold, which is considerably higher than for the other starch synthase genes, with a general expression level range of 0.8- to 1.2-fold (Yamakawa et al., 2007), and may contribute to grain chalkiness (Yamakawa et al., 2008). SSIVa may also affect some secondary RVA parameters such as breakdown and setback (Yan et al., 2010). Of the 27 SNPs identified, five were non-synonymous; six intronic Indels were also detected. Only one SNP modified amino acid polarity: a 'C/T' SNP at coordinate 4048 of the gene nucleotide sequence induced a Gly708Asp substitution.

Starch branching enzymes (SBEs)

Starch branching enzymes (SBEs) determine the structure of amylopectin by breaking α(1→4)-linkages in existing chains and attaching the released reducing ends to C6 hydroxyls, forming the elongated and branched glucan, amylopectin (Tetlow et al., 2004). The nucleotide polymorphisms of the different isoforms of branching enzymes were studied and the results are as follows.

BEI

BEI is mainly expressed in the endosperm. Biochemical observations with purified BEI from maize endosperm indicate that BEI preferentially branches amylose-type polyglucans and has a high capacity for branching less branched α-glucans (Takeda et al., 1993). Analysis of the catalytic properties of BEI has indicated the N- and C-termini play a critical role in chain length transfer and substrate preference (Kuriki et al., 1997). BEI transcript levels increase rapidly 3-5 days after flowering.
A rice BEI deficient mutant induced by mutagenesis exhibited modified amylopectin structure and grain morphology but the same quantity of starch as the wild type (Satoh et al., 2003), and the BEI encoding gene also affects the RVA profile (Yan et al., 2010). The maize sugary gene arises from a mutation in the maize BEI encoding ortholog (Boyer and Preiss, 1978). In total, 18 SNPs and 6 Indels were found in this gene. One of the SNPs is a non-synonymous 'C/T' SNP causing a Gly607Asp substitution, which is potentially very important as it changes the polarity of the amino acid.

BEIIa

BEIIa is a leaf expressed gene involved in amylopectin synthesis. BEIIa is also expressed in the endosperm, but at levels 10-fold lower than in leaf tissue (Gao et al., 1997). Variation in this gene/enzyme may have a significant influence on rice starch properties, considering that BEIIa is preferentially expressed along with at least one important starch synthesis gene expressed in both leaf and endosperm (Hirose et al., 2006). An association study of this gene and RVA properties demonstrated a low F value (6.60), indicating a very slight influence in glutinous rice (Yan et al., 2010). Application of a BEIIa-specific antibody has demonstrated this protein is present in both soluble and granule bound forms in developing wheat endosperm (Rahman et al., 2001). In total, six SNPs were detected, including a non-synonymous 'T/G' SNP which causes a Tyr140Ser substitution with no polarity alteration. No SNP/Indel has previously been reported for this gene, suggesting BEIIa might be one of the most conserved starch-related genes in rice.

BEIIb

A relatively high variation rate of 6.422 was detected for this important gene (Table 3), which is also known as amylose extender (ae) in maize and other cereals (Yun and Matheson, 1993). Many studies have reported the significance of this gene for starch properties in various plant species (Fisher et al., 1993; Sun et al., 1997; Sun et al., 1998).
This is a granule- and soluble-associated enzyme which is only expressed in the endosperm. Expression of three different functional maize SBE genes in BE-deficient yeast strains demonstrated that the presence of BEIIb is necessary to activate BEI and BEIIa (Seo et al., 2002). A recent association study has determined a very high F value of 11.12 between SSIIb and RVA properties in rice (Yan et al., 2010). Additionally, a 0.5- to 0.7-fold decrease in the expression of BEIIb (amylose extender) during grain filling creates chalky rice (Tanaka et al., 2004). In total, 53 SNPs, three of which were non-synonymous, and 17 Indels were found in amylose extender. No functional polymorphisms were recorded in the available databases, but three non-synonymous SNPs, 'C/T' (Val403Ile), 'C/T' (His196Arg) and 'C/A' (Leu97Val), were detected here, none of which changed amino acid polarity.

Debranching enzymes (DBEs)

DBEs belong to the α-amylase family, of which two classes exist in plants: isoamylase and pullulanase. These enzymes debranch (hydrolyse) α-(1-6)-linkages in amylopectin and pullulan. Defective DBEs in plants are thought to be responsible for the accumulation of phytoglycogen rather than starch and, in turn, to change the phenotypic appearance of the endosperm (Bustos et al., 2004).

ISA1

In wheat, the expression of ISA1 cDNA was highest in the developing endosperm and undetectable in mature grains. This suggests a fundamental biosynthetic role for Isoamylase 1 in plant starch, although the precise roles of DBEs are not yet known (Tetlow et al., 2004). Regulation of the ISA1 gene at the transcriptional level during rice grain filling in response to high temperatures has been reported (Yamakawa et al., 2008). In rice endosperm, antisense inhibition of Isoamylase 1 altered the structure of amylopectin and the physicochemical properties of starch (Fujita et al., 2003).
The ISA genes are also presumed to make some contribution to the degree of setback in glutinous rice cultivars (Yan et al., 2010). No functional polymorphism was found in this gene in the studied population. Only 9 Indels, in the intronic regions, were detected. This suggests that this gene has no or minimal effect on variation in starch properties in this population.

ISA2

The existence of this type of isoamylase was first reported in maize endosperm (Doehlert and Knutson, 1991). It was suggested that two isoamylase isoforms, I and II, exist in maize endosperm and are distinguishable by anion-exchange chromatography. On the basis of enzymic characteristics, the sugary1 (su1) protein corresponds to the isoamylase II form in the maize endosperm (Beatty et al., 1999). The association between ISA2 and rice grain quality is unknown. There is no intron in this relatively small gene (2625 bp), so each detected SNP/Indel is potentially important. The polymorphism rate was high, at about 0.66. There were 16 SNPs, including nine non-synonymous SNPs, and no Indels in the ISA2 gene. Three of the non-synonymous SNPs altered the polarity of amino acids: T/C, C/A and T/G at coordinates 960, 1712 and 2067 of the reference sequence, which cause Thr482Ala, Arg231Leu and Thr113Pro substitutions, respectively.

Pullulanase (PUL)

In rice endosperm, a defect in pullulanase-type DBE activity triggers and modulates some phenotypic effects (Nakamura et al., 1998). In maize endosperm, pullulanase is believed to have a dual role, contributing to both starch synthesis and degradation (Dinges et al., 2003). Kubo et al. (1999) suggest pullulanase plays a predominant and essential role in amylopectin synthesis and compensates for shortages of isoamylase activity in the construction of the multiple cluster structure of amylopectin.
A recent association study between pullulanase and RVA profile parameters in glutinous rice has shown strong relationships between this gene and peak viscosity, hot paste viscosity, breakdown viscosity and peak time (Yan et al., 2010). The highest polymorphism rate (1.14) was seen in pullulanase, where in total 109 SNPs and 10 Indels were detected. This number of SNPs exactly equals the number already reported in the OryzaSNP database. In our population, only one non-synonymous SNP was detected, at coordinate 2319 of the reference, which substitutes a Ser with an Asn at position 217 of the protein. This alteration might not be very influential as it does not change the polarity of the molecule.

Distribution of SNPs across the loci

The distributions of detected polymorphisms and the coverage patterns of short reads across the length of the candidate genes indicated no specific correlation among the 17 studied loci (Fig. 2). Some genes, such as SPHOL and GBSSI, exhibited similar distribution patterns. However, there were no associations among the patterns of different genes. Based on the distribution patterns, it can be concluded that most of the candidate genes showed a higher polymorphism rate in the central intron/exon regions than at the UTR ends.

Ka/Ks ratio ("purifying" vs "diversifying" genes)

The proportion of non-synonymous (Ka) relative to synonymous (Ks) SNPs can reveal whether a gene has been under purifying, neutral or diversifying selection. The Ka/Ks ratio was used to classify candidate genes into two main categories, 'purifying' and 'diversifying' genes. Under neutral evolution at the amino acid level, Ka should equal Ks and hence Ka/Ks = 1. Any deviation from this value indicates selection pressure on the genetic structure of the population or candidate genes. A Ka/Ks ratio < 1 indicates negative (purifying) selection, while positive (adaptive) selection is indicated when Ka/Ks > 1.
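The three-way interpretation of Ka/Ks described above can be written as a small classifier. This is only a sketch of the decision rule as stated in the text; the function name, the tolerance used to treat a ratio as equal to 1 and the choice of example genes (with ratios as listed in Table 3) are assumptions made for illustration.

```python
def classify_ka_ks(ratio, tolerance=1e-9):
    """Map a Ka/Ks ratio onto the selection regime described in the text.

    ratio < 1  -> negative (purifying) selection
    ratio == 1 -> neutral
    ratio > 1  -> positive (adaptive/diversifying) selection
    """
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if ratio < 1.0 else "diversifying"


# Ka/Ks ratios as listed in Table 3 for a few of the candidate genes.
examples = {"SSI": 0.11, "AGPS2b": 1.00, "GBSSI": 2.00, "SSIIIa": 2.40}

for gene, ratio in examples.items():
    print(gene, ratio, classify_ka_ks(ratio))
```

With so few coding SNPs per gene (GBSSI has a single SNP), ratios such as 2.00 or 0.50 rest on very small counts, so a classification like this is better read as indicative than as a formal test of selection.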
This indicator was applied to assess the diversity of samples in the database (20 diverse cultivars) and the Australian population of 233 genotypes. GBSSI and GBSSII were classified as highly conserved genes which are being passed through the adaptive phase by the artificial selection pressure of rice breeders, as they showed a low polymorphism rate and a high Ka/Ks ratio. Some genes, such as AGPS2b, SSIIa and SPHOL, had a Ka/Ks ratio of one, which means they have probably not been under significant selection pressure.

Discussion
This MPS analysis of rice starch metabolism candidate genes identified relatively high SNP/Indel variation at all loci. In total, 501 SNPs and 113 Indels were detected, in comparison with 399 SNPs that are already available in the public domain. No Indels are recorded in public databases such as OryzaSNP. Out of 501 SNPs, 75 SNPs (~14.9%) were non-synonymous, leading to amino acid changes. All Indels resided in intronic regions, so no obviously functional Indels were found. The highest and lowest polymorphism rates were observed in Pullulanase (11.443) and GBSSI (0.574), respectively (Table 3). The low polymorphism rate in the important GBSSI gene is one of the surprising results of this study. A possible cause is the source of this population, which was an Australian rice breeding population. Indica cultivars do not grow in temperate Australia because the day length is not suitable, and so indica cultivars are rarely used as parents. As a result, one of the waxy gene (GBSSI) SNPs had a very low frequency (<0.05), which was confirmed by Sequenom resequencing (Chapter 6). The low frequency of SNPs in some genes can also be attributed to the negative (purifying) selection pressure imposed by breeders within this population. Massively parallel sequencing combined with LR-PCR ensured high sequence depth in terms of the number of candidate genes and number of samples at all studied loci (Pettersson et al., 2009).
Among the numerous elements involved in MPS, amplification efficiency and pooling strategy are the most important parameters. The error rate of BioRad iProof™ High-Fidelity DNA polymerase is low, 4.4 × 10⁻⁷, which is approximately 50-fold lower than that of normal polymerases, and its extension rate is high, 5-30 s/kb, about four times faster, which shortens the PCR and makes it the enzyme of choice. However, establishing an efficient long-range PCR for large genomic regions can be costly and time consuming (Ingman and Gyllensten, 2008). To solve this problem, semi-long range PCR (SLR-PCR), which is generally more robust and saves time and primer cost, can be used. Preparing an optimal SLR-PCR will increase the performance of MPS, and it must be established before pooling genomic DNA samples. The error rate of the Illumina GA is reportedly about 0.5-1.0%. In this experiment 233 DNA samples were pooled. This means that a variant present in only one of the 233 samples has a frequency of ~0.43%, which is lower than the reported error rate. A two-step pooling strategy and high coverage depth reduced the risk of false SNP detection. Together these two strategies largely overcome the effect of Illumina GA platform error. Figure 1 displays the coverage for all loci, starting from 12,000×. The raw data show that the maximum coverage in some regions reaches 240,000× (data not shown). This high coverage has substantially neutralised the error rate. Out et al. (2009) discussed the correlations between allele frequencies, pool size, coverage depth and error rates in the Illumina GA. They demonstrated that a coverage depth of 25,000× would be enough for detection of SNP frequencies at or above 0.3%. Currently, there is much interest in applying the Illumina GA platform to targeted sequencing of specific candidate genes, particularly for finding SNPs in a large number of individuals in the targeted populations (Hodges et al., 2007).
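The pool-size arithmetic above (one variant sample among 233 gives ~0.43%) can be reproduced with the short sketch below. It follows the simplification used in the text, namely that each pooled sample contributes one equally weighted allele; the function names are hypothetical and introduced only for illustration.

```python
def singleton_frequency(pool_size: int) -> float:
    """Frequency of a variant carried by exactly one sample in the pool.

    Assumes, as the text does, that every sample contributes one
    equally weighted allele to the pool.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return 1.0 / pool_size


def below_error_rate(pool_size: int, error_rate: float) -> bool:
    """True when a singleton variant is rarer than the platform error rate."""
    return singleton_frequency(pool_size) < error_rate


if __name__ == "__main__":
    freq = singleton_frequency(233)
    print(f"singleton frequency in a pool of 233: {freq:.2%}")
    # Compare against the lower bound of the reported 0.5-1.0% error rate.
    print("below 0.5% error rate:", below_error_rate(233, 0.005))
```

With a pool of 20 samples the same singleton would sit at 5%, well above the reported error rate, which is one way of seeing why smaller pools make rare variants easier to separate from sequencing error.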
An incorrect pooling strategy is another important issue that may be encountered in generating and analysing data. DNA samples from different individuals may not be amplified with similar efficiency in PCRs, creating random bias. To rectify this issue, smaller pools (20) were tried, which minimised the chance of biased amplification of target regions. Using this strategy, rare SNPs which occurred at a frequency lower than 1% were detected. Coverage is also critical. It is believed that 20-fold coverage is sufficient for accurate SNP detection (Dohm et al., 2008). Even coverage is also highly desirable. Given this, with an average coverage of 90×, reasonable SNP data should have been obtained from the beginning of the SSIIaH1 fragment, but only 18.5% of the observed SNPs in this region were validated. This can be attributed to the difficulties encountered in amplifying this high-GC region. The main reason for different coverage patterns is still unknown (Mardis, 2008b). The highest peak was observed at position 4885 of BEIIa, with 239,019× coverage. However, coverage only affects the accuracy of SNP frequency and not the number of discovered SNPs (Morozova and Marra, 2008). Ingman and Gyllensten (2008) studied the effect of different pooling strategies and coverage levels to evaluate SNP frequencies of pooled and un-pooled individuals in a ~17 kb region, and found that all SNPs, including low-frequency ones (not under 0.4%), can be detected at coverage levels above 500×. They suggested that for pooled PCR products, 50× coverage would be sufficient for SNP frequencies at or above 4%. The very high coverage obtained here enabled discovery of rare SNPs with frequencies lower than 0.5%. Sequencing errors are common in NGS and are easily confounded with low-frequency SNPs if the minimum number of reads is too low (Futschik and Schlotterer, 2010). The high level of coverage for all candidate genes enabled us to recognise rare SNPs effectively.
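The coverage thresholds cited above can be related to each other by asking how many reads are expected to carry a variant at a given depth. The sketch below is a back-of-the-envelope illustration under a simplifying assumption that is not stated in the thesis: reads sample alleles in proportion to their frequency in the pool. The function name is hypothetical.

```python
def expected_variant_reads(coverage: float, allele_frequency: float) -> float:
    """Expected number of reads carrying a variant at one position.

    Assumes reads are drawn in proportion to allele frequency in the
    pool (an idealisation; real amplification is not perfectly even).
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must lie between 0 and 1")
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return coverage * allele_frequency


if __name__ == "__main__":
    # (coverage, frequency) pairs quoted in the surrounding discussion.
    scenarios = [(50, 0.04), (500, 0.004), (25_000, 0.003)]
    for coverage, freq in scenarios:
        reads = expected_variant_reads(coverage, freq)
        print(f"{coverage:>6}x at {freq:.1%}: about {reads:.0f} variant reads")
```

Under this idealisation the 50× and 500× thresholds both correspond to roughly two variant-supporting reads, which suggests why the minimum read count, rather than coverage alone, determines whether a rare SNP can be separated from error.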
It has been demonstrated that allelic variation of amino acids and the structure of proteins correlate with the effect of natural selection, seen as an excess of rare SNPs which affect actual phenotypes (Sunyaev et al., 2000). The distribution of genetic variation in the 17 starch candidate genes indicates they have been selected for starch properties. The pressure of natural selection can significantly influence the extant pattern of genetic variation (Akey et al., 2002; Barreiro et al., 2008). In this study, the total polymorphism rate and distribution pattern indicate that the candidate genes have been subjected to selection by breeders, as some important genes with high impact on starch properties, such as GBSSI and ISA1, showed unusually low levels of polymorphism. Artificial selection during the breeding program has had a major influence on the genetic variation of the population studied. These changes in population structure mainly occur through narrowing of the gene pool and a change in the balance between genetic drift and population size during the breeding process.

Appendices: Chapter 2
Appendix 1: Full list of discovered SNPs/Indels in the 17 studied starch-related genes.
Appendix 2: Full list of Australian breeding lines (population) and their pedigree information.
Appendix 3: Target genes and sequence of gene-specific LR-PCR primers.
Appendix 4: SNP/Indel distribution and short read co

CHAPTER 3
Bioinformatic tools assist screening of functional SNPs in plants: rice GBSSI as a model

Summary
Granule Bound Starch Synthase I (GBSSI) influences cereal grain quality and is one of the most important plant genes. Using GBSSI as a model, a number of different computational tools and programs were used to explore functional SNPs and the possible relationships between genetic mutation and phenotypic variation. A total of 51 SNPs/indels were retrieved from databases, including three non-synonymous SNPs, namely those in exons 6, 9 and 10.
Sorting Intolerant from Tolerant (SIFT) results showed that a candidate [C/A] SNP (ID: OryzaSNP2) in exon 6 (coordinate 2494) is probably the non-synonymous SNP with the highest phenotypic impact on GBSSI. This SNP alters a tyrosine to serine at position 224 of GBSSI. Computational simulation of GBSSI with Geno3D suggested this mutant SNP creates a bigger loop on the surface of GBSSI and results in a shape different from that of native GBSSI. According to Transcriptional Factor (TF) Search analysis, a potential transcription factor binding site (TFBS8), which carries one [C/T] SNP [rs53176842] at coordinate 2777 at the intron 7/exon 8 boundary, might affect the regulation and function of GBSSI. Combining SNP mining data with in silico structural analysis of GBSSI is a computational pathway which can be applied to other plant genes.

Introduction
Single nucleotide polymorphisms (SNPs) are the most common and simplest type of genetic variation in organisms. SNPs occur at a frequency of approximately one in a thousand base pairs in the human genome (Brookes, 1999) and one in every 170 bp in rice (Yu et al., 2002). Although SNPs can be found throughout the genome, including gene promoter regions, coding sequences and intronic sequences, most of them are probably located in intergenic regions, and most of these are believed to be stable without any deleterious effect. The occurrence of human disease and evolution (Shastry, 2002), as well as many important traits in plants (Bryan et al., 2000; Kennedy et al., 2006; Edwards et al., 2007), can be attributed to the presence of SNPs. SNPs can be categorized and named based on their location and function. For example, SNPs within the coding regions (cSNPs) of functional genes which introduce amino acid sequence variations are called non-synonymous SNPs (nsSNPs) and are of major interest.
Those SNPs which occur in coding sequences but do not change amino acids are called synonymous SNPs. However, most SNPs occur in intronic regions. Study of these SNPs is also important because of their influence on gene expression, which can occur through different molecular pathways such as changes in regulatory elements, splicing patterns, and up- and down-regulation of exonic splice enhancers (ESE), intronic splice enhancers (ISE) and so forth (El Sharawy et al., 2006). Understanding the functional effect of SNPs is a major challenge. SNPs that lead to a single amino acid substitution, stop codon or frame shift mutation are normally recognized as functional and are easily detected. An experiment-based approach can provide the strongest evidence for the functional role of genetic variations. Consequently, many different types of SNP assays have been applied for experimental prioritization of SNPs (Chen and Sullivan, 2003). However, owing to the lack of reliable genotypic and phenotypic data, these experiments are not always easy to set up for characterizing the real effect of SNPs. For example, functional analysis of SNPs in important plant genes needs a segregating population or breeding lines, such as near isogenic lines (NILs) (Umemoto et al., 2008; Mikami et al., 2008). On the other hand, many genes may have a vast number of intronic SNPs that cannot be easily associated with in vivo variation of plant populations. Previous studies have focused on non-synonymous SNPs of human disease genes (George Priya Doss et al., 2008; Rajasekaran et al., 2008). Here, a plant-relevant computational pipeline was developed which covers most of the important functional elements at the DNA level in addition to non-synonymous SNPs. The model gene chosen to identify and prioritize substitutions was Granule Bound Starch Synthase I (GBSSI), a major and well characterised gene affecting the amylose content of rice grain (Chen et al., 2004).
Different computational tools, including Sorting Intolerant from Tolerant (SIFT), Exonic Splicing Enhancer Finder (ESE Finder), Transcriptional Factor search (TF search) and Exonic Splicing Silencer Search (FAS-ESS), were used to prioritize the candidate SNPs most likely to affect the encoded protein and, subsequently, amylose content and rice grain quality.

Materials and methods

GBSSI encoding gene as a case study
Granule Bound Starch Synthase I (GBSSI) was analysed as an example to provide a guide for further confirmatory experimental studies for this or other genes. This gene is well known in most cereals for its effect on amylose content (Chen et al., 2008a), pasting properties (Chen et al., 2008b) and eating quality (Umemoto et al., 2008).

Sequence alignment
GBSSI coding DNA and mRNA sequences for Oryza sativa L. were downloaded from GenBank (http://www.ncbi.nlm.nih.gov/nuccore/297423). (The GenBank locus number is X65183.1 for genomic DNA and NM_001063239.1 for mRNA.) Nucleotide coordinates 1765922 - 1769401 on chromosome 6 (LOC_Os06g04200) were extracted from the Rice Genome Annotation Project at Michigan State University (http://rice.plantbiology.msu.edu/LocusNameSearch.shtml). Total sequence lengths of 5035 bp, 1830 bp and 609 amino acids were recognized for the genomic DNA, cDNA and GBSSI protein, respectively.

SNP dataset
The SNP dataset for the GBSSI gene (Figure 1) was retrieved from the NCBI database (Sherry et al., 2001) at (http://www.ncbi.nlm.nih.gov/sites/entrez?db=snp&TabCmd=Limits) for the relevant chromosome range (gene coordinates) and then checked against the SNP dataset of the OryzaSNP Consortium (http://oryzasnp.plantbiology.msu.edu/cgibin/find_snps_in_genes.pl) by using the following TIGR gene ID: LOC_Os06g04200. An extra 2 kbp of DNA from each end of the coding region was also searched for the possible existence of SNPs in the 3´ and 5´ UTRs.
Final alignment was carried out with Sequencher 4.6 software (Ann Arbor, MI) and ClustalW2 (http://www.ebi.ac.uk/Tools/clustalw2/index.html) to identify the exact location of SNPs in UTR or intronic/exonic regions. Five different functional classes of SNPs were selected to cover the entire gene region, as follows: (1) non-synonymous coding (2) intronic (3) coding synonymous (4) locus region (5) 5´ and 3´ UTRs.

Computational tools for SNP analysis
Several computational software programs were applied to predict the actual or possible impact of SNPs on plant phenotypes, as follows: (1) UTRScan (2) TF Search (3) SIFT: Sorting Intolerant from Tolerant (4) GeneSplicer (5) SEE-ESE (6) FAS-ESS (7) Geno3D (8) PDB viewer and (9) RasMol.

3D Modelling of GBSSI and comparative study
The native and mutant structures of GBSSI were modelled with Geno3D software (http://geno3d-pbil.ibcp.fr/cgi-bin/geno3d_automat.pl?page=/GENO3D/geno3d_home.html). This program predicts 3D structures of proteins and enzymes based on their amino acid sequences. It extracts 3D structures of very similar proteins from different databases (specifically, PDB) and then models the query sequence using the available structure, which, for GBSSI, has the PDB identification number 3D1J. The modelled structure can be validated by PROCHECK (Laskowski et al., 1993).

Table 1. List of SNPs extracted from dbSNP and OryzaSNP consortium sorted by chromosome base position from 5´ UTR region.
SNP No | SNP ID | SNP/indel | Symbol* | Position | Coordinate (bp)† | Functional class
1 | rs20225948 | [A/C] | M | 5´ UTR | 157 | UTR
2 | rs53176842 | [C/T] | Y | Intron 7 | 2777 | Intronic
3 | rs54208215 | [G/T] | K | Exon 8 | 2845 | Coding-synonymous
4 | rs52851700 | [A/T] | W | Intron 8 | 2902 | Intronic
5 | rs53723422 | [A/G] | R | Intron 8 | 2910 | Intronic
6 | rs53846809 | [A/T] | W | Intron 8 | 2912 | Intronic
7 | rs53666816 | [A/C] | M | Intron 8 | 2913 | Intronic
8 | rs53675396 | [C/T] | Y | Intron 8 | 2927 | Intronic
9 | rs54192177 | [-/ATAT] | in/del | Intron 8 | 2931 | Intronic
10 | rs53646676 | [-/T] | in/del | Intron 8 | 2958 | Intronic
11 | rs54248596 | [-/C] | in/del | Intron 8 | 2971 | Intronic
12 | rs54113008 | [A/G] | R | Intron 8 | 2973 | Intronic
13 | rs53274252 | [C/T] | Y | Intron 8 | 2983 | Intronic
14 | rs54167626 | [C/T] | Y | Intron 8 | 2987 | Intronic
15 | rs53460774 | [C/T] | Y | Intron 8 | 2989 | Intronic
16 | rs53001101 | [A/G] | R | Intron 8 | 2992 | Intronic
17 | rs53728166 | [-/A] | in/del | Intron 8 | 2994 | Intronic
18 | rs53922988 | [A/T] | W | Intron 8 | 3004 | Intronic
19 | rs54148480 | [C/T] | Y | Intron 8 | 3007 | Intronic
20 | rs53616087 | [A/G] | R | Exon 9 | 3023 | Coding-synonymous
21 | rs54179168 | [C/T] | Y | Exon 9 | 3056 | Coding-synonymous
22 | rs54262002 | [A/T] | W | Exon 9 | 3059 | Coding-synonymous
23 | rs53810836 | [A/G] | R | Exon 9 | 3065 | Coding-synonymous
24 | rs53561561 | [C/T] | Y | Exon 9 | 3235 | Coding-ns
25 | rs53120489 | [A/C] | M | Intron 9 | 3264 | Intronic
26 | rs52893679 | [C/T] | Y | Intron 9 | 3266 | Intronic
27 | rs53682342 | [-/TT] | in/del | Intron 9 | 3268 | Intronic
28 | rs54228853 | [-/TTC] | in/del | Intron 9 | 3269 | Intronic
29 | rs54192742 | [-/G] | in/del | Intron 9 | 3280 | Intronic
30 | rs53932573 | [G/T] | K | Intron 9 | 3282 | Intronic
31 | rs53273159 | [A/G] | R | Intron 9 | 3285 | Intronic
32 | rs53746900 | [A/T] | W | Intron 9 | 3291 | Intronic
33 | rs53843894 | [-/C] | in/del | Intron 9 | 3292 | Intronic
34 | rs54317579 | [A/C] | M | Intron 9 | 3294 | Intronic
35 | rs54396762 | [G/T] | K | Intron 9 | 3295 | Intronic
36 | rs53235554 | [-/GA] | in/del | Intron 9 | 3297 | Intronic
37 | rs53562846 | [-/C] | in/del | Intron 9 | 3298 | Intronic
38 | rs53208426 | [A/T] | W | Intron 9 | 3300 | Intronic
39 | rs53304553 | [-/GAA] | in/del | Intron 9 | 3311 | Intronic
40 | rs53124355 | [A/G] | R | Intron 9 | 3350 | Intronic
41 | rs53134907 | [A/G] | R | Intron 9 | 3352 | Intronic
42 | rs54228170 | [A/C] | M | Intron 9 | 3357 | Intronic
43 | rs53693780 | [A/C] | M | Intron 9 | 3360 | Intronic
44 | rs54363195 | [A/T] | W | Intron 9 | 3365 | Intronic
45 | rs53781551 | [G/T] | K | Exon 10 | 3422 | Coding-synonymous
46 | OryzaSNP1 | [G/T] | K | Intron 1 | 246 | Intronic
47 | OryzaSNP2 | [C/A] | M | Exon 6 | 2494 | Coding-ns
48 | OryzaSNP3 | [C/T] | Y | Exon 9 | 3212 | Coding-synonymous
49 | OryzaSNP4 | [A/-] | in/del | Intron 9 | 3304 | Intronic
50 | OryzaSNP5 | [G/-] | in/del | Intron 9 | 3309 | Intronic
51 | OryzaSNP6 | [C/T] | Y | Exon 10 | 3486 | Coding-ns
* IUPAC nucleotide symbol
† Coordinates from the beginning of genomic DNA (GenBank accession: X65183).

Figure 1: The structure of GBSSI encoding gene. Blue boxes represent exons.

[Figure 2 chart, "Distribution of SNPs in Waxy gene": intronic 78%, csSNPs 14%, nsSNPs 6%, UTR 2%]
Figure 2. Distribution and percentage of SNPs in GBSSI encoding gene. csSNP: coding synonymous; nsSNPs: coding non-synonymous SNPs; UTR: 5´ Untranslated regions

Comparative studies were also carried out with Swiss PDB viewer (http://spdbv.vitalit.ch/download.html) and RasMol (http://openrasmol.org), based on superimposed structures and homology analysis of the native and mutant proteins (Rajesh et al. 2008).

Functional flow chart
A flow chart was prepared for computational analysis and prioritization of SNPs based on their functionality and possible impact on plant phenotypes (Figure 3).

Figure 3. Computational pipeline for in silico analysis of functional SNPs.
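The IUPAC symbol column of Table 1 follows the standard two-base ambiguity codes, and the mapping can be sketched as below. This is an illustration rather than part of the original analysis; the function name `iupac_symbol` is hypothetical, and the "in/del" label for insertion/deletion alleles simply mirrors the convention used in Table 1.

```python
# Standard IUPAC ambiguity codes for two-base combinations.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
}


def iupac_symbol(variant: str) -> str:
    """Return the symbol for a variant written like '[A/C]' or '[-/TT]'.

    Alleles containing '-' or more than one base are reported as
    'in/del', following the labelling used in Table 1.
    """
    alleles = variant.strip().strip("[]").upper().split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got: {variant!r}")
    if any(a == "-" or len(a) != 1 for a in alleles):
        return "in/del"
    key = frozenset(alleles)
    if key not in IUPAC_TWO_BASE:
        raise ValueError(f"not a two-base substitution: {variant!r}")
    return IUPAC_TWO_BASE[key]


if __name__ == "__main__":
    for v in ["[A/C]", "[C/T]", "[G/T]", "[-/ATAT]", "[A/-]"]:
        print(v, iupac_symbol(v))
```

Using a `frozenset` as the lookup key makes the mapping independent of allele order, so `[G/A]` and `[A/G]` both resolve to R.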
Results

SNPs in GBSSI and comparative study
A total of 51 SNPs and in/dels were extracted from databases, consisting of the following: one in the 5´ UTR, three coding non-synonymous, seven coding synonymous and forty in intronic sequences (Table 1) (Figure 2).

Computational algorithm tools
The following computational tools were used consecutively for comprehensive functional analysis of the GBSSI encoding gene:

UTR Scan
UTR Scan (http://www.ba.itb.cnr.it/BIG/UTRScan/) identifies patterns of regulatory region motifs from the UTR database and gives information about important elements in the 5´ and 3´ UTRs, including whether the matched pattern is damaged (Pesole and Liuni 1999). One regulatory element was found in the 3´ UTR. No functional element was recognized by UTR Scan in the 5´ UTR region of the GBSSI gene. One [A/C] SNP [rs20225948] was found in a non-regulatory part of the 5´ UTR. Since only one SNP was found in the UTR regions of the GBSSI gene, and none in the regulatory regions, it may be presumed that SNPs in the GBSSI UTRs do not change the protein expression level.

TF Search
Two of the most important functional elements in plant genomes are transcriptional factors (TFs) and transcriptional factor binding sites (TFBSs). These sites are usually short DNA sequences, around 5-15 bp, to which the TF elements bind to begin the transcriptional process involving RNA polymerase and the promoter. Any mutation in these regions can alter motifs and possibly transcriptional patterns (Bulyk, 2004).

Table 2. Transcriptional factor binding sites in GBSSI gene as distinguished by TF Search program
No | TFBS sequence | Coordinates (gDNA) | Score | SNP/Indel*
1 | TTCTAATTATTTGA | 560-573 | 86.2 | N/A
2 | TCCAACCAA | 741-749 | 85.1 | N/A
3 | GCGGTCGGT | 1591-1599 | 85.1 | N/A
4 | GAGGTAGGA | 1939-1947 | 86.0 | N/A
5 | ATGGTTGGA | 2358-2366 | 85.6 | N/A
6 | AGCTACCTG | 2639-2647 | 86.5 | N/A
7 | AACTACCAG | 2654-2662 | 87.4 | N/A
8 | CAGGTTGCT | 2777-2785 | 85.6 | C/T
9 | TCCTACCAG | 2804-2812 | 89.6 | N/A
10 | TCAAATAATTAGAA | 3652-3665 | 86.2 | -/TAA
11 | TCATTGTTAAATAT | 4566-4579 | 86.8 | N/A
12 | ATATTTAACCAAAT | 5198-5211 | 86.8 | N/A
* SNPs/indels that may affect the TFBS and its function.

Various experimental and computational approaches have been used to identify genomic locations of transcription factor binding sites, particularly in higher eukaryotic genomes (Sinha and Tompa, 2002). TF Search recognizes the transcription factor binding sites of a gene and can therefore indicate whether a non-coding SNP alters such a site (Heinemeyer et al., 1998). This program can be reached at: (http://www.cbrc.jp/research/db/TFSEARCH.html). A total of twelve TFBSs were recognized in the GBSSI gene. Scores in this version of TF Search (v 1.3) ranged from 85.1 to 89.6, with higher scores indicating more important TFBSs. A [C/T] SNP (rs53176842) was found in TFBS number 8 (TFBS8), which begins at position 2777 at the junction of intron 7 and exon 8 (Table 2). This [C/T] SNP at position 2777 (5´ end of the TFBS8 sequence) potentially has the highest impact on transcriptional factor binding sites in this gene. It should be noted that, in human, the strongest selective pressure was detected for proteins involved in transcription regulation (Ramensky et al., 2002).

SIFT (Sorting Intolerant from Tolerant)
Each amino acid substitution has the potential to affect protein function.
SIFT is a Web-based program which predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids (Ng and Henikoff, 2003). SIFT focuses on non-synonymous SNPs and can be applied to naturally occurring and laboratory-induced point mutations. SIFT is based on the premise that important amino acids will be conserved among sequences in a protein family. As a consequence, changes at amino acids conserved in the family should affect protein function (Ng and Henikoff, 2002). The program uses a standard tolerance index threshold of 0.05, and a separate index value is calculated for each amino acid position. Substitutions with values at or above this threshold (≥ 0.05) are assumed to have lower impact on plants, whereas those with lower values (< 0.05) are regarded as important, with higher phenotypic impact. This program can be found at: http://sift.jcvi.org/www/SIFT_seq_submit2.html. Of the 51 investigated SNPs, two have already been verified to be non-synonymous with functional impact on waxy protein (in exons 6 and 10) (Chen et al., 2008 a, b; Larkin and Park, 2003). The impact of the other nsSNP, in exon 9, has not been characterized and its functional impact needs to be confirmed by association analysis (Tables 1 and 3). These nsSNPs at positions 2494, 3486 and 3235 can cause functional mutations at amino acids 224, 415 and 370, respectively (Table 3). Among these, the tyrosine to serine (Y→S) substitution at amino acid 224, which arose from the C/A SNP in exon 6, has the highest impact on GBSSI (Figure 4) and corresponds to the lowest possible SIFT score, less than 0.05 (Table 3). The other two nsSNPs were predicted to be tolerated by SIFT analysis, with an index of 1.00, which indicates that these SNPs have smaller effects such as a reduction (or increase) of amylose content and endosperm quality.

Table 3. nsSNPs predicted to have functional significance by SIFT.
SNP No | SNP ID | Position | Coordinate (gDNA) | SNP | Coordinate (protein) | Amino acid substitution | SIFT score (tolerance) | Impact on protein | Confidence of prediction†
47 | OryzaSNP2 | Exon 6 | 2494 | C/A | 224 | Y→S | 0.00* | High | High
24 | rs53561561 | Exon 9 | 3235 | C/T | 370 | A→V | 1.00 | Low | High
51 | OryzaSNP6 | Exon 10 | 3486 | C/T | 415 | P→S | 1.00 | Low | High
* Substitutions with scores under 0.05 are predicted to be Not Tolerated
† The confidence of predictions has been calculated based on the default median conservation value of 3.0

GeneSplicer
Splicing is the post-transcriptional modification of RNA in which introns are removed and exons are joined to form mature mRNA. SNPs or in/dels at splice sites may lead to a truncated or mutant protein (El Sharawy et al., 2006). GeneSplicer (http://www.cbcb.umd.edu/software/GeneSplicer/gene_spl.shtml) is a computational tool which predicts splice sites in DNA sequence (Pertea et al., 2001). Although splice sites in the GBSSI encoding gene have been identified, this tool was used to identify the exact location of exon-intron boundaries and the possible existence of SNPs/indels in unusual donor-acceptor sites which might change GBSSI structure. A maximum deviation of 2 bp was found in the splice sites predicted by this software. This deviation probably results from the tendency of this software to recognize alternative splicing patterns. Only one putative SNP was recognized at the exon 1-intron 1 junction, at position 246. The functional effect of this SNP on waxy protein has already been reported (Cai et al., 1998).

SEE ESE (Sequence Evaluator of Exonic Splicing Enhancers)
Exonic Splicing Enhancers (ESE) are prevalent in plant sequences and normally promote exon recognition and inclusion. These sequences have been identified in several plant genes and reside at variable distances from splice sites.
Although such splicing enhancers have been identified in both exons and introns, exon splicing enhancers are generally better characterized and are probably more common.

Table 4. Exon/intron boundaries in rice GBSSI gene recognized by GeneSplicer and existence of possible SNPs/Indels
ID | 5´ Donor | Acceptor 3´ | Confidence | Confidence score* | Deviation (bp) | SNP/Indel | Position
Exon/Intron1 | 120 | 246 | High | 14.09 | 0 | [G/T] | 246
Exon/Intron2 | 1408 | 1749 | High | 12.22 | 0 | N/A | N/A
Exon/Intron3 | 1861 | 1942 | High | 14.45 | 1 | N/A | N/A
Exon/Intron4 | 2048 | 2150 | High | 13.75 | 1 | N/A | N/A
Exon/Intron5 | 2245 | 2335 | High | 16.98 | 0 | N/A | N/A
Exon/Intron6 | 2433 | 2502 | High | 16.79 | 2 | N/A | N/A
Exon/Intron7 | 2593 | 2692 | High | 18.47 | 1 | N/A | N/A
Exon/Intron8 | 2781 | 2896 | High | 21.16 | 2 | N/A | N/A
Exon/Intron9 | 3016 | 3256 | High | 19.92 | 0 | N/A | N/A
Exon/Intron10 | 3372 | 3550 | High | 14.13 | 0 | N/A | N/A
Exon/Intron11 | 3789 | 3981 | High | 18.43 | 1 | N/A | N/A
Exon/Intron12 | 4086 | 4174 | High | 14.07 | 1 | N/A | N/A
Exon/Intron13 | 4285 | 4416 | High | 20.53 | 0 | N/A | N/A
Exon/Intron14 | 4772 | 4889 | High | 23.96 | 1 | N/A | N/A
* The confidence score must be higher than 12

Most of the ESE candidates are hexamers, and the most important candidates are highlighted by this software whenever they overlap the three 9-mers, GAAGAAGAA, CGATCAACG and TGCTGCTGG, which have been found to be very effective ESEs in plants (Tacke and Manley, 1999). SNPs in these regions may generate aberrant mRNAs that are either unstable or code for defective, truncated or deficient protein isoforms. Sequence Evaluator for ESEs (SEE ESE) (http://www.cbcb.umd.edu/software/SeeEse/index.html) was applied to locate conserved motifs represented by these hexamers in exonic regions near splice sites in the GBSSI gene. Although a total of 17 potential ESE motifs were found, no SNPs/indels were found in these regions.
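The idea of checking whether a known SNP falls inside an enhancer motif can be illustrated with a minimal sketch. This is not the SEE ESE algorithm, which scores hexamers; it only performs an exact search for the three 9-mers named above and tests SNP coordinates against the hits. The function names and the toy sequence are hypothetical, and coordinates are 1-based to match the tables in this chapter.

```python
from typing import List, Sequence, Tuple

# The three 9-mers cited in the text as effective ESEs.
ESE_9MERS = ("GAAGAAGAA", "CGATCAACG", "TGCTGCTGG")

Hit = Tuple[int, int, str]  # (start, end, motif), 1-based inclusive


def find_motif_hits(sequence: str, motifs: Sequence[str] = ESE_9MERS) -> List[Hit]:
    """Return every exact occurrence of each motif, including overlaps."""
    seq = sequence.upper()
    hits: List[Hit] = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start + 1, start + len(motif), motif))
            start = seq.find(motif, start + 1)  # step by one to allow overlaps
    return sorted(hits)


def snps_overlapping(hits: Sequence[Hit], snp_positions: Sequence[int]):
    """Pair each SNP coordinate with every motif hit that contains it."""
    return [
        (pos, hit)
        for pos in snp_positions
        for hit in hits
        if hit[0] <= pos <= hit[1]
    ]


if __name__ == "__main__":
    toy_exon = "TTGAAGAAGAATT"  # hypothetical sequence, not from GBSSI
    hits = find_motif_hits(toy_exon)
    print(hits)
    print(snps_overlapping(hits, [5, 12]))
```

An empty result from `snps_overlapping` corresponds to the situation reported above, where motifs are present but no SNP or indel coordinate lies within them.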
FAS-ESS (Systematic identification and analysis of exonic splicing silencers)
Exonic splicing silencers (ESSs) are cis-regulatory sequences (elements) in exons or introns that either inhibit the use of adjacent splice sites, often contributing to alternative splicing (AS), or promote exon skipping. Bioinformatics analyses suggest that these ESS motifs play important roles in suppression of pseudo-exons, in splice site definition, and in AS (Wang et al., 2004).

Table 5. List of Exonic Splicing Silencers in GBSSI gene recognized by FAS-ESS program
ESS ID | ESS coordinates* | ESS sequence | SNP ID | SNP position* | SNP
ESS1 | 244-250 | AG[G]TATA | OryzaSNP1 | 246 | [G/T]
ESS2 | 2492-2497 | TT[A]TGG | OryzaSNP2 | 2494 | [C/A]
ESS3 | 2986-2992 | T[C]GTTCA | rs54167626 | 2987 | [C/T]
ESS4 | 2986-2992 | TCGTTC[A] | rs53001101 | 2992 | [A/G]
ESS5 | 3061-3066 | TCCT[G]G | rs53810836 | 3065 | [G/A]
* All coordinates and positions calculated based on GBSSI genomic DNA (GenBank accession number X65183). Bracketed nucleotides show the position of SNPs.

Using the FAS-hex2 set (http://genes.mit.edu/fas-ess/), 113 domains were predicted, of which 77 were located in SNP high-density regions, namely exons/introns 8, 9 and 10. FAS-ESS analysis identified five exonic splicing silencer sites in the GBSSI gene that contain SNPs (Table 5). The most important silencer elements are probably ESS2, ESS1 and ESS3, in that order, because they contain SNPs in coding regions or at exon-intron splicing sites. ESS2 in exon 6 contains a putative non-synonymous [C/A] SNP (ID: OryzaSNP2) responsible for a Y→S change, which may have significant effects on GBSSI protein characteristics.

Simulation for finding functional, constructive changes of ns-coding SNPs
The secondary and tertiary structure of protein molecules can alter their function and activity (Fersht, 1985; Hrmova and Fincher, 2001). Modelling of GBSSI is the final stage of functional analysis.
These computational algorithms can recognize the impact of different nsSNPs by simulation and comparison of native and mutant molecular structures. Results from SIFT analysis were included at this level. Of the three non-synonymous SNPs, only the [C/A] SNP in exon 6 [OryzaSNP2] was recognized as important; the other two SNPs, in exons 9 and 10, were found to have lower impact by SIFT analysis (Table 3). This nsSNP is associated with the Y→S amino acid substitution at position 224 of GBSSI. The 3D modelling was performed with Geno3D. This software identifies the structure most similar to the query amino acid sequence and simulates a 3D protein automatically. Based on a Geno3D search of different protein databases, the template structure for GBSSI has the PDB id 3D1J. 3D1J is a glycogen synthase, and the 3D crystal structure of this protein has been elucidated in E. coli (Buschiazzo et al. 2004; Sheng et al. 2009). It is thought that synthesis of storage polysaccharides in bacteria and plants is fulfilled by a similar ADP glucose-based pathway (Ball and Morell, 2003). The exact location of the exon 6 SNP was detected with Swiss PDB viewer. Figures 5a and 5c show the exact position of this high-impact substitution in the modelled protein; the Y→S substitution at residue 224 may change the shape, stability and, in turn, the activity of the protein (Figures 5b and 5d).

Figure 4. ClustalW2 alignment of native (WxN) and mutant (WxM) GBSSI. Y→S, A→V and P→S substitutions were found at residues 224, 370 and 415, corresponding to SNPs number 47, 24 and 51 in exons 6, 9 and 10, respectively. SIFT analysis found that the [C/A] SNP in exon 6 had the highest impact on protein structure.

Discussion
Computational algorithms are useful and cost-effective tools for analysis of SNPs and genes.
Even with the emergence of new high-throughput technologies to sequence the whole genome of plants (Henry, 2008), it is not possible to recognize all functional SNPs in a pool of sequencing data which also contains neutral SNPs. Assessment of functional SNPs can be performed by phylogenetic comparison (George Priya Doss et al., 2008), such as the study of statistical correlation with residue substitution. Recently, SNP-linkage disequilibrium and association studies, which need accurate phenotypic data from appropriate populations, have gained acceptance as procedures to assess functional SNPs (Carlson et al., 2003). However, these populations can be difficult to generate (Gupta et al., 2005), and they must have high variation in the studied traits. The efficiency of computational tools for identification of functional SNPs in human cancer-related genes such as BRCA1 and BRCA2 has been reported by a number of authors (Rajasekaran et al., 2007; Rajasekaran et al., 2008). Shen et al. (2006) demonstrated the application of in silico analysis tools such as SIFT, Polyphen and UTRScan to recognize SNPs in a cytokine gene that has a known role in human immune-related diseases. Based on the success of the latter, a computational screening pathway was developed to prioritize and rank plant SNPs according to their functionality and impact on plant phenotypes. The results here show that there are significant numbers of important elements in the GBSSI gene and that SNPs occur within these regions; such correlations have already been demonstrated by Soussi et al. (2006). Four of these elements were found to have the highest functional effects, and these effects appeared to result from the existence of SNPs in these regions. The non-synonymous [C/A] SNP in exon 6 [OryzaSNP2], for example, has the most significant effect on amylose content according to SIFT, and this has previously been shown to be the case experimentally (Chen et al. 2008 b).
Figure 5. The 3D molecular modelling of GBSSI generated by Geno3D and viewed with Swiss PDB viewer and RasMol software. Panels (a) and (b) show tyrosine; panels (c) and (d) show serine. Arrows indicate the exact location of the Y→S (tyrosine to serine) substitution at residue 224 derived from the most important SNP [C/A] in the waxy protein gene at exon 6 (OryzaSNP2). (5a and 5b) The native GBSSI protein contains the ´A´ nucleotide. (5c and 5d) The mutant protein carries ´C´ instead of ´A´. The arrows in Figures 5b and 5d also indicate a significant difference in the structural loop that occurs in the substitution region.

SIFT assumes important amino acids will be conserved in the protein family, and so changes at well conserved, charged or polar residues are predicted to be high impact, or to affect protein function. If a position in an alignment contains hydrophobic amino acids, then SIFT assumes this position can only tolerate amino acids with hydrophobic character without a marked effect on protein function, and such changes can be prioritized by SIFT score. The quantitative SIFT score allows amino acid changes to be prioritised and their possible functional effects to be ranked. An important feature of this algorithm is the confidence value. Confidence in a predicted high impact substitution depends on the diversity of the aligned protein sequences and how closely related they are. If the aligned sequences are closely related, many amino acid residues will appear conserved and SIFT will predict most substitutions to affect protein function, which leads to a high false positive error. In fact, a number of functionally neutral substitutions may be predicted as high impact (false positives), or vice versa (false negatives). To alert the user to these situations, SIFT calculates the median conservation value, which measures the diversity of the sequences in the alignment. In SIFT, the conservation is calculated for each position in the alignment and the median of these values is reported.
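The idea of a per-position conservation value and its median can be sketched as follows. This is a simplified illustration, not SIFT's implementation: it assumes conservation is measured as information content in bits (log2 20 minus the Shannon entropy of the observed residues, so that a fully conserved column scores about 4.32), and it uses raw residue frequencies without the sequence weighting and pseudocounts that SIFT applies. The toy alignment and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def column_conservation(column):
    """Information content (bits) of one alignment column; gaps are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None  # column consists of gaps only
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(20) - entropy


def median_conservation(alignment):
    """Median of the per-column conservation values of an alignment."""
    scores = [column_conservation(col) for col in zip(*alignment)]
    return median(s for s in scores if s is not None)


# Hypothetical four-sequence toy alignment (not GBSSI data).
toy_alignment = ["MKYS", "MKYT", "MRYS", "MKFA"]
for position, col in enumerate(zip(*toy_alignment), start=1):
    print(position, "".join(col), round(column_conservation(col), 3))
print("median conservation:", round(median_conservation(toy_alignment), 3))
```

With few, closely related sequences most columns approach the maximum of log2 20, the median rises, and predictions at those positions deserve less confidence, which is the situation the confidence warning is meant to flag.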
By default, SIFT builds alignments with a median conservation value of 3.0. Alignments with higher median conservation values are less diverse, and predictions based on them will have a higher false positive error (Ng and Henikoff, 2003). As the default median conservation value of 3.0 was used and only a few homologous sequences were available for alignment, the predictions reported here represent the highest confidence SIFT could achieve in this study (Table 3). Based on SIFT analysis, the [C/A] SNP at exon 6 [OryzaSNP2] is located in a conserved region and affects a charged or polar residue. Larkin and Park (2003) found two other coding SNPs, ´C/T´ [OryzaSNP3] and ´C/T´ [OryzaSNP6] at exons 9 and 10 of GBSSI, which have non-functional and functional effects, respectively (Tables 1 and 3). They also verified that haplotypes composed of SNPs at the exon 1/intron 1 boundary site, exon 6 and exon 10 regulate GBSSI function. Chen et al. (2008 a, b) have also confirmed that these SNPs can alter the apparent amylose content and pasting properties of rice. Since the [C/A] SNP at exon 6 [OryzaSNP2] had the highest predicted impact on GBSSI, 3D structures of both the native (Y) and mutant (S) GBSSI proteins (Y→S = tyrosine → serine at residue 224) were simulated (Figure 5). The superimposed structures of the two proteins showed a distinctive deformed loop at the mutation position in the mutant in comparison with the native structure. This deformed loop is located at the outer layer (surface) of GBSSI and may alter the 3D shape, structure and function of the protein, possibly owing to a change in the accuracy of the protein binding site. Sequence similarity confers structural similarity (Chothia and Lesk 1986; Hegyi and Gerstein, 1999), but unfortunately, the relationship between a protein’s sequence similarity and functional similarity is not straightforward (Bork and Koonin, 1998). Exonic splicing silencers (ESSs) have relatively major effects on splicing patterns through their influence on the recognition of splice sites (Wang et al., 2006).
Application of FAS-ESS suggested OryzaSNP2 also has silencing action because it is located in an exonic splicing silencer region, although there are no reports of experimental evidence of alternatively spliced mRNAs or altered protein size in rice plants with regard to this SNP (Table 5). The effect of the [C/A] SNP at exon 6 on amylose content and grain quality has been confirmed by many authors (Sano, 1984; Larkin and Park, 2003; Chen et al., 2008a). FAS-ESS analysis has also suggested another important silencer (ESS1) at the splice site of exon/intron 1, which has a [G/T] SNP [OryzaSNP1]. The significance of this SNP in reducing amylose content has already been reported by experimental analysis, which found that this SNP decreases the activity of GBSSI by altering the mRNA splice site (Cai et al., 1997). Application of TF Search has also identified an important TFBS, including one [C/T] SNP [rs53176842] at coordinate 2777, which may potentially have a major effect on GBSSI function.

Conclusion

Most genetic analysis software has been designed for human or animal genetic studies. Application of a number of programs allowed construction of a computational pathway for SNP analysis in plants. There is a significant relationship between in silico and experimental results, thus confirming that computational tools can help identify and characterize functional SNPs. Following Transcriptional Factor (TF) Search analysis, a new [C/T] SNP [rs53176842] at coordinate 2777, near the boundary site of intron 7/exon 8, was predicted which may have a major impact on GBSSI and related phenotypes.

CHAPTER 4

SNP in starch biosynthesis genes associated with the nutritional and functional properties of domesticated rice

Summary

Starch is a major component of human diets. The physio-chemical properties of starch influence the nutritional value of starch and the functional properties of starch containing foods. Many of these traits have been under strong selection in domestication of rice as a food.
A population of 233 breeding lines of rice was analysed for variation at 110 functional SNP loci in exonic regions of 18 starch-related genes and the results related to rice pasting and cooking quality. Associations of 65 functional SNPs were detected. Five genes, AGPL2a, Isoamylase1, SPHOL, SSIIb and SSIVb, showed no polymorphism. GBSSI (waxy gene) and SSIIa had a major influence on starch properties and the other genes had minor associations. The ´G/T´ SNP at the boundary site of exon/intron1 in GBSSI showed the strongest association with retrogradation and amylose content. The TT allele has been selected in much of the domesticated japonica genepool, providing rice with a desirable texture but with less resistant starch and its associated human health advantages. The GC/TT SNP at exon 8 of SSIIa showed a very significant association with pasting temperature (PT), gelatinization temperature (GT) and peak time. No significant association was found between SSIIa and retrogradation. Other genes contributing to retrogradation were SSI, BEI and SSIIIa. The highest level of polymorphism was observed in SSIIIa, with 22 SNPs, but only limited associations were observed with starch phenotypic values. No SNP was found to be strongly associated with chalkiness; there was only a weak link with a ´T/C´ SNP at position 960 (Thr482 to Ala) in Isoamylase2. These associations provide new tools for deliberate selection of rice genotypes for specific functional and nutritional outcomes.

Introduction

Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation. Many important plant traits and human genetic diseases are attributed to these sequence variations (Shastry, 2002; Bryan et al., 2000), through their influence on either gene expression or protein function (Kennedy et al., 2006).
Identifying SNPs associated with grain starch quality advances our understanding of the starch biosynthesis pathway and highlights potential ways to generate crops with higher yield and better quality, which directly impacts human nutrition and health. Starch is mainly composed of amylose and amylopectin (Miles et al., 1985). Seven classes of starch-related enzymes with high impact on grain starch structure and quality are known, including ADP-glucose pyrophosphorylase (AGPase), granule bound starch synthase (GBSS), starch synthase (SS), branching enzyme (BE), debranching enzyme (DBE), starch phosphorylase (PHO) and glucose phosphate translocator (GPT; see Chapter 5). These genes/enzymes contribute directly or indirectly to the production of starch granules composed of amylose and amylopectin. The Rapid Visco Analyser (RVA) is one of the most important means of measuring grain quality parameters (Limpisut and Jindal, 2002). Using 43 gene-specific molecular markers, Yan et al. (2010) analysed the association of 17 starch synthesis-related genes with RVA profile parameters in a collection of 118 glutinous rice accessions. They found that 10 of the 17 starch-related genes are involved in controlling RVA profile parameters, the most significant being pullulanase, which plays an important role in the control of peak viscosity (PKV), hot paste viscosity (HPV), cool paste viscosity (CPV), breakdown viscosity (BDV), peak time (PKT) and paste temperature (PT), while seven other starch genes had minor impacts on a few RVA profile parameters. The RVA parameters are controlled by a complex genetic system involving many starch-related genes (Tester et al., 1995). This complexity can be attributed to many factors, such as genetic, epigenetic and environmental effects and G×E interactions in the studied population (Tester et al., 1995; Morell, 2003). Granule bound starch synthase (GBSSI) is the most important starch synthesis gene in rice and other cereal grains.
A number of SNPs in rice GBSSI (waxy gene), at the intron/exon 1 junction site, exon 6 and exon 10, have a significant impact on starch quality (Chen et al., 2008a, b; Cai et al., 1998; Larkin et al., 2003) via their impact on amylose content. Starch synthase IIa (SSIIa) has a major effect on starch quality through its impact on amylopectin structure (Craig et al., 1998; Morell, 2003). The effect of this gene on cooking quality and starch texture, as measured by alkali spreading value, gelatinisation temperature (GT) and eating quality of rice starch, has been extensively studied through the polymorphism of two SNPs, [A/G] and [GC/TT], in alk, the gene which codes for SSIIa (Umemoto et al., 2008; Umemoto et al., 2004; Umemoto and Aoki, 2005; Waters et al., 2006). Except for GBSSI and SSIIa, there are no reports of associations between SNPs and starch quality parameters, with most studies focusing on the gene level rather than the SNP level by comparing gene-deficient mutants (Fujita et al., 2006). Massively parallel sequencing (MPS) technology is a flexible and high-throughput platform for genetic analysis and functional genomics which is based on ultra deep sequencing of short read lengths and a huge number of sequencing reactions (Imelfort et al., 2009). Kharabian-Masouleh et al. (2011) discovered more than 501 SNPs and 113 In/dels in 17 starch-related genes in an Australian rice breeding population using a combination of target-pooled long range PCR and MPS, clearly indicating the capacity of high-throughput MPS technology to discover new SNP variants in plant populations. This technology can be used in combination with multiplexed MALDI-TOF (Sequenom) to quickly identify genetic variation within plant populations and then assign this variation to individual plants and phenotypes, improving the efficiency of marker assisted selection (MAS) (Masouleh et al., 2009). Resistant starch has a significant impact on human health (Sajilata et al., 2006).
The incomplete digestion-absorption of non-digestible resistant starch in the small intestine leads to starch fractions with physiological functions similar to those of dietary fibre and with significant beneficial impact (Asp and Björck, 1992). Starch retrogradation describes the hardening of cooked starch after cooling due to re-crystallization of gelatinized starch components (Fan and Marks, 1998). There is a significant association between retrograded and resistant starch and hence, in this study, the term retrograded-resistant starch is used. The in vivo digestibility and structural features of resistant-retrograded starch with high amylose content in maize, bean and potato flakes were assessed using the ileal contents of four human populations (Faisant et al., 1993). The main resistant starch fraction consisted primarily of retrograded amylose with a degree of polymerization of approximately 35 glucose units and a melting temperature of 150 °C. Likewise, retrograded amylose in peas, maize, wheat and potatoes was found to be highly resistant to amylolysis (Ring et al., 1988), suggesting this fraction had high amylose content. Other characteristics which may have an influence on the rate of retrogradation, firmness and resilience of rice starch after cooking are protein and lipid contents (Philpot et al., 2006). High amylose rice cultivars are characterized by low RVA parameters, high resistant starch (RS) content and a lower estimated glycemic score (EGS), and highly retrograded rice starch tends to have a reduced hydrolysis index (HI) and glycemic index (Hu et al., 2004). Waxy and low amylose rice starch is hydrolysed more quickly and completely than intermediate and high amylose rice starch (Chung et al., 2006; Hu et al., 2004). In this study, a novel SNP in the Glucose-6-phosphate translocator 1 (GPT1) gene which is highly associated with amylose content and retrogradation rate of resistant starch is reported (Chapter 5).
In addition, an explicit and coherent gene-by-gene approach is established to reveal the association of 18 starch-related genes and their SNPs with the physiochemical properties of rice starch.

Materials and methods

Plant materials

Plant material was supplied by Industry and Investment NSW, Yanco Agricultural Research Institute, Australia. A population of 233 F6 lines was selected from the Australian temperate (japonica-type) rice breeding program (Appendix 5). Selection was primarily based on capacity to flower and set seed and on the morphological traits of plant height, grain size and shape. No selection took place for quality traits such as gelatinization temperature and RVA curve characteristics.

Physiochemical properties

In total, 13 physiochemical traits, including four phenotypic traits and RVA characteristics, were measured. The phenotypic traits, consisting of apparent amylose content (AC), gelatinization temperature (GT), grain chalkiness and retrogradation rate [scored by the Martin test (Philpot et al., 2006)], were quantified according to standard methods. RVA characteristics such as peak viscosity (PKV), trough viscosity (TV), final viscosity (FV), breakdown, setback, peak time (PKT) and pasting temperature (PT) were measured by a Rapid Visco Analyser (Model, City, country) according to the manufacturer’s instructions.

Designation of starch-synthesis genes involved in starch metabolism

The available literature was used to identify the most likely candidate genes associated with rice starch quality (Ohdan et al., 2005; Waters and Henry, 2007; Nakamura, 2002; Hirose et al., 2006; Rahman et al., 2000). The genetic map and approximate location of genes on chromosomes are shown in Appendix 8.
The general entries of nucleotide sequences (gDNA) and full-length cDNAs of important gene classes which are involved in starch biosynthesis were retrieved from the NCBI (http://www.ncbi.nlm.nih.gov/) and the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/cgi-bin/putative_function_search.pl) databases and resequenced using long range PCR and massively parallel sequencing (Illumina® GAII) to find novel SNPs/Indels in the studied population (Kharabian-Masouleh et al., 2011). Amplification primers were designed based on consensus sequence alignment of each candidate gene.

Candidate genes/enzymes for SNP genotyping

In total, eighteen genes representing seven groups of enzymes, namely ADP-glucose pyrophosphorylase (AGPase), granule bound starch synthases (GBSSI and GBSSII), starch synthases (SSI, SSIIa, SSIIb, SSIIIa, SSIIIb, SSIVa, SSIVb), branching enzymes (BEI, BEIIa, BEIIb), debranching enzymes (ISA1, ISA2, Pullulanase), starch phosphorylase (SPHOL) and glucose-6-phosphate translocator (GPT1; see Chapter 5), were selected for SNP genotyping.

SNP dataset

SNPs for each gene were retrieved from the previous SNP discovery approach (Kharabian-Masouleh et al., 2011). The total number of functional polymorphisms discovered in the population was then compared to SNPs available at the OryzaSNP MSU database (http://oryzasnp.plantbiology.msu.edu/) and 59 additional SNPs were harvested to ensure that all known non-synonymous SNPs (nsSNPs) were assayed. In total, 110 nsSNPs with possible functional effects (amino acid change) were chosen for primer design and genotyping, of which 65 were successfully genotyped and scored as polymorphic or non-polymorphic (Appendix 6). The remaining 45 SNPs/primer sets either failed to genotype individuals or did not exist in the population (such as those extracted from OryzaSNP databases) and were therefore disregarded in the analysis.
Primer design and SNP genotyping

Several multiplexed assays were designed with Sequenom® MassARRAY® Assay design 3.1 software to cover all available SNPs. The optimal size of the amplicon containing the polymorphic site was set to 80–120 bp in the software. A 10-mer tag (5′-ACGTTGGATG-3′) was added to the 5′ end of each amplification primer to avoid confusion in the mass spectrum and to improve PCR performance (Masouleh et al., 2009).

Capture PCR protocol, primer extension and mass spectrometry

The steps of capture PCR, primer extension, resin cleanup and mass spectrometry were undertaken according to the manufacturer’s (Sequenom® MassARRAY®) instructions.

Association analysis

Assays were constructed for 110 polymorphisms defining each of the alleles of 18 genes controlling starch quality traits and retrogradation. SNP call data for the genotyped polymorphic alleles, along with phenotypic data, were then transferred into TASSEL v2.1 (Bradbury et al., 2007) to find associations between SNPs and physiochemical properties. A gene-by-gene approach was employed to understand the association of each individual gene/SNP with the target traits. A comprehensive association analysis including all SNPs significantly associated with starch properties was also performed. The latter analysis shows the impact of significant SNPs on starch quality traits when combined with one another, representing the possible compensating or balancing effects of polymorphism on the final phenotypic value of individuals.

Statistical parameters

Critical statistics such as the F-test, p-value, adjusted p-value and R2 were calculated to measure associations, with the number of permutations set to 1000.

Results

Sixty-five SNP assays were successfully genotyped in 233 individuals. To avoid complications in the association study, a gene-by-gene approach was applied to find the impact and possible linkage of individual genes on physiochemical and quality-related properties of rice grain.
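The statistics named under Statistical parameters can be illustrated with a single-marker sketch: a one-way ANOVA of a trait across genotype classes gives the F value and R2 (the fraction of trait variance explained by the SNP), and a permutation p-value is obtained by shuffling the trait values 1000 times. This is a minimal sketch under stated assumptions, not TASSEL's implementation; the function names and the amylose-like trait values below are hypothetical and were invented only to make the example runnable.

```python
import random


def anova_f_r2(genotypes, values):
    """One-way ANOVA of a trait across genotype classes: returns (F, R2)."""
    groups = {}
    for g, v in zip(genotypes, values):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )
    ss_within = ss_total - ss_between
    f_value = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_value, ss_between / ss_total


def permutation_p(genotypes, values, n_perm=1000, seed=0):
    """Share of shuffled data sets whose F is at least the observed F."""
    rng = random.Random(seed)
    observed_f, _ = anova_f_r2(genotypes, values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-trait relationship
        f_perm, _ = anova_f_r2(genotypes, shuffled)
        if f_perm >= observed_f:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: two homozygous classes at one SNP and a trait value.
genotypes = ["GG"] * 6 + ["TT"] * 6
trait = [24.1, 25.3, 23.8, 24.9, 25.6, 24.4, 17.2, 18.5, 16.9, 18.1, 17.7, 18.8]

f_value, r2 = anova_f_r2(genotypes, trait)
print("F =", round(f_value, 2), "R2 =", round(r2, 3))
print("permutation p =", permutation_p(genotypes, trait))
```

Reading R2 as the share of phenotypic variance explained by a single SNP is what allows the gene-by-gene results that follow to be compared across traits.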
Appendices 6 and 7 present the identification codes, coordinates and associations of all studied SNPs.

AGPS2b (small subunit)

No functional polymorphism was found in AGPS2b, suggesting that this gene does not have any impact on physiochemical properties of grain quality in this population.

SPHOL (alpha 1,4 glucan starch phosphorylase)

Five SNPs were retrieved from the OryzaSNP database and genotyped, but no functional polymorphism was recognised in this population. Therefore, this gene had no effect on the studied traits.

GBSSI (Granule bound starch synthase I)

This gene is the most important gene involved in starch synthesis of rice and other cereal grains. The association study showed a strong correlation between the WAXYEXIN1 (G/T) SNP at the junction site of Exon1/Intron1 and RVA curve characteristics such as Peak Viscosity (PKV) and Breakdown. The highest F-value in this experiment (223.29) was observed for the association of this SNP with retrogradation rate (Martin test); the SNP was also significantly linked to amylose content (F-value = 121.52). The R2 values for retrogradation and amylose content were 0.66 and 0.51, respectively. The second SNP in GBSSI associated with grain properties was WAXYEX10. This ´C/T´ SNP at coordinate 3486 of the gene creates a P→S substitution and has a very significant association with Trough, Final Viscosity (FV), setback, retrogradation and amylose content, although with lower linkage than WAXYEXIN1. The R2 values for retrogradation and amylose content were 0.39 and 0.16, respectively. The third SNP, WAXYEX6, also revealed some significant associations according to calculated p-values ≤ 0.01 but did not show any remarkable F and R2 values, which suggests only minor control of critical pasting properties. In total, the results indicate that this gene alone can explain a significant portion of the variation in retrograded-resistant starch in rice. Section 3 of Appendix 7 shows the comprehensive results of the association study for GBSSI.
The data suggest that SNPs WAXYEXIN1 and WAXYEX10 act in close concert, while WAXYEX6 has less value in controlling starch properties.

GBSSII (Granule bound starch synthase II)

GBSSII is found exclusively bound to starch granules in green tissues and synthesises amylose. The synthesised amylose is subsequently consumed by the plant or accumulated in the endosperm (Dian et al., 2003). During pre-heading, about 1-3 days after flowering, this gene/enzyme is expressed in leaf, leaf sheath, culm and pericarp tissue at a low level (Ohdan et al., 2005). Vrinten and Nakamura (2000) confirmed the role of GBSSII in the elongation of amylose in non-storage tissues of cereals. One nsSNP was found at position 1638 of this gene and tested for association with starch physiochemical traits (Appendix 6). Only one considerable association, with pasting temperature (PT; R2 = 0.20), was observed for this SNP, although some minor associations were also found with GT and peak time (sect. 4 Appendix 7).

SSI

Only one ´T/C´ SNP at position 5153 of this gene showed minor associations with FV, SB and Martin test (MT), with R2 values of 0.16, 0.11 and 0.16, respectively (Appendix 7).

SSIIa

The starch synthase IIa (SSIIa) gene has effects on starch quality, presumably by affecting amylopectin structure. Two SNPs at positions 631 and 4827-4828 (ALKSSIIA4) were tested for association (Appendix 6). The effect of [GC/TT] on alkali disintegration and eating quality of rice starch is already known (Umemoto and Aoki, 2005; Waters and Henry, 2007). Highly significant associations were found between SNPs of SSIIa and important physiochemical properties such as pasting temperature (PT), peak time (PKT), GT and Breakdown viscosity. The highest F-test value of 199.65 was observed for the ALKSSIIA4 [GC/TT] SNP and PT, suggesting this SNP controls PT, PKT and BDV, with R2 values of 0.642, 0.323 and 0.168, respectively.
This SNP has one of the strongest associations with the physiochemical properties of rice studied in this population (R2=0.642). This suggests the [GC/TT] SNP at position 4827-4828 of SSIIa is one of the most influential SNPs among the assayed polymorphisms. The other G/T SNP at position 631 showed no significant association with any trait.

SSIIb

It is believed SSIIb is a low level, early expressed gene, which is primarily expressed in green tissues and at an early stage of grain filling (Hirose and Terao, 2004). In total, 6 different SNPs were genotyped in this population (Appendix 6) and no polymorphism was found, suggesting that this gene has no effect in our study.

SSIIIa

The highest polymorphism was observed in this gene, with 22 SNPs in the coding region causing an amino acid change. The polymorphisms in this gene showed association with a number of studied properties such as FV, SBV, PT, M-test, AC, Predicted N, Dif, GT and chalkiness. However, most of them had very low R2 values of less than 0.1, indicating that although they are associated, they do not have a highly significant effect on physiochemical properties (sect 8. Appendix 7). Some SNPs in SSIIIa appear to be highly associated with GT and M-test. The highest R2 values of 0.243, 0.156, 0.130, 0.113 and 0.113 were observed for GT, M-test, Dif, AC and predicted N, respectively (Appendix 6).

SSIIIb

The main effect of SSIIIb was observed on pasting temperature (PT). Strong associations were found between PT and the ´T/G´ and ´C/A´ SNPs at positions 7232 and 4543 of SSIIIb, with R2 values of 0.315 and 0.225, respectively. These relatively high R2 values suggest an influence of SNPs in the coding regions of this gene on PT, although minor associations were found with peak viscosity (PKV) and difference as well. These SNPs cause Lys→Asn and Ser→Ile substitutions at amino acid positions 207 and 756, respectively.
This gene can be classified as a major contributor to pasting temperature, as some of its other SNPs also exhibited significant associations with PT (sect 9. Appendix 7). Hence, this gene is referred to here as the pasting temperature (PT) gene.

SSIVa

SSIVa is one of the least known starch genes expressed in rice endosperm. Our study showed the impact of this gene on PT and GT. Five SNPs were examined in this gene (Appendix 6), of which four showed significant association with PT (sect 10. Appendix 7). A relatively high R2 of 0.259 was observed for the ´A/G´ SNP at position 7160 which influences PT. In addition, four other SNPs, with R2 values ranging from 0.198-0.222, had an influence on this property. Considering all the influential SNPs in SSIVa, a large portion of phenotypic variation of PT in this population of rice is explained by variation within SSIVa. Some minor associations were also observed with GT, PKT, AC and PN. Together, these data suggest SSIIIb and SSIVa in combination make a very strong contribution to PT.

SSIVb

No variation was observed in two SNPs assayed in this population (sect 11. Appendix 7).

BEI

Only one C/T SNP at position 1558 of this gene was discovered (Kharabian-Masouleh et al., 2011). Nine of 13 studied physiochemical traits were associated with this SNP at a medium level, with the highest R2 values observed for AC, MT, SBV and FV, respectively. The relatively high R2 values of 0.260 and 0.238 for AC and MT suggest there is a significant contribution of this gene to amylose content and retrogradation. Minor associations were also found between this SNP and PV, BDV and FV (sect 12. Appendix 7).

BEIIb

BEIIb is coded by the amylose extender (ae) locus in maize and other cereals (Yun and Matheson, 1993). Two SNPs were examined in this gene (Appendix 6) but no significant association was found with starch properties.
Previous studies on biochemical analysis of the amylose-extender (ae) mutant of rice (Oryza sativa) had revealed the influence of mutation in this gene on gelatinization properties through the structural alteration of amylopectin by reducing short chains and degree of polymerization (Nishi et al., 2001). No pleiotropic effect with other genes such as BEIIa and SSI was found, suggesting this is a neutral gene in this population. This inconsistency may be due to the nature and minor significance of the SNPs. Most studies of this gene have focused on mutant populations, where a large segment of the gene has been deleted. Therefore, the results of these experiments are not comparable; they must only be interpreted at the gene level and cannot be extended to naturally occurring SNPs (sect 14. Appendix 7).

Debranching Enzymes (DBEs)

ISA1 (Isoamylase 1)

Two SNPs were retrieved from databases and genotyped. No polymorphism was detected in this population, indicating no association with physiochemical properties of rice (sect.15 Appendix 7).

ISA2 (Isoamylase 2)

Variation of two SNPs was assessed in this gene and very minor associations were found with BDV, PT and chalkiness. All R2 values were less than 0.1, which indicates very low association with the variability of the associated traits (sect.16 Appendix 7).

Pullulanase (PUL)

A recent association study between pullulanase and RVA profile parameters in glutinous rice has shown strong relations of this gene with peak viscosity, hot paste viscosity, breakdown viscosity and peak time (Yan et al., 2010). In this study there were only weak associations between two SNPs in pullulanase and PT, GT and CHK, with R2 values of 0.174, 0.167 and 0.066, respectively. The values above 0.1 represent a low degree of association and can explain a portion of the current variability in this population of rice (sect.17 Appendix 7).

Discussion

SSI transcript level has been measured at different seed developmental stages.
A high expression level was reported at 1-3 DAF, peaking at 5 DAF and remaining almost constant during starch synthesis in the endosperm. This suggests SSI is a major SS form in cereals (Cao et al., 1999).

Neutral genes with no polymorphism or association

In total, 65 SNPs were successfully genotyped in 233 breeding lines (Appendix 6). No polymorphism was detected for AGPS2b, SPHOL, SSIIb, SSIVb and ISA1. Moreover, there was no association between BEIIa or BEIIb and any physiochemical properties in rice. Therefore, seven genes out of eighteen did not contribute to the physiochemical properties of this population. Although there have been no reports of associations between naturally occurring SNPs in these genes and quality properties, many studies have reported the importance of these genes in the physiochemical properties and quality of starch granules. For example, Kawagoe et al. (2005) reported that the AGPS2b subunit plays an important role in starch granule synthesis and is associated with rice shrunken mutants. SPHOL is also supposedly involved in starch degradation and biosynthesis. The mechanism appears to be associated with phosphorylation of some starch-related enzymes and proteins such as starch branching enzymes (SBEs) and starch synthase (SSIIa) (Tetlow et al., 2004). As almost all of these studies have been based on deficient mutants (Rolletschek et al., 2002), it can be concluded that major mutations, such as In/dels, which abolish gene function have an impact on soluble sugar content, structure and appearance of starch granules and quality of endosperm in rice and other species, but SNPs may not have any impact on starch quality. Despite the reported impact of BEIIb (amylose extender) and ISA1 on physiochemical properties in several cereal species (Fisher et al., 1993; Sun et al., 1997; Sun et al., 1998; Yamakawa et al., 2008), this study found no association with any physiochemical properties in rice starch.
In rice endosperm, antisense inhibition of ISA1 has altered the structure of amylopectin and the physiochemical properties of starch (Fujita et al., 2003). The ISA genes are also presumed to contribute to the degree of setback in glutinous rice cultivars (Yan et al., 2010). No significant association was found between the two detected SNPs in BEIIb and quality traits in this population. The contradictory results can be attributed to the composition or structure of populations. Not all alleles which affect any one trait may be represented in a particular population, and different minor genes might have particular regulatory roles and impacts which are mediated by different genetic backgrounds.

Major genes with highly significant associations

GBSSI and SSIIa are major genes involved in some of the most important grain quality properties such as amylose content and gelatinization temperature. Highly significant associations were found between GBSSI and retrogradation and amylose content, in addition to further significant relationships with RVA properties such as BDV, SBV and FV. A number of authors have reported the importance of this gene for the starch physiochemical properties of rice and other cereals, with SNPs at the intron/exon 1 junction site, exon 6 and exon 10 in rice GBSSI (waxy gene) having the most significant impact on starch quality (Chen et al., 2008a, b; Cai et al., 1998). Larkin and Park (2003) have already reported that a SNP in exon 6 affects amylose content. This study confirms the G/T SNP at the intron/exon 1 junction site has a major influence on a number of physiochemical properties. SSIIa presented very high association with pasting temperature, gelatinization temperature and peak time. The effect of this gene on cooking quality and starch texture has been extensively studied (Umemoto et al., 2004; Umemoto et al., 2008).
Umemoto and Aoki (2005) explained the alkali disintegration and eating quality of rice starch by polymorphism of two SNPs, [A/G] and [GC/TT]. These SNPs within exon 8 of the alk locus are significantly associated with gelatinisation temperature (GT) (Waters et al., 2006), and here it has been confirmed there is a very significant association between the SSIIa exon 8 GC/TT SNP and pasting temperature (R² = 0.642).

Contributory genes with low-medium associations

In this study, six genes, GBSSII, SSI, SSIIIa, SSIIIb, SSIVa and BEI, had low to medium effects on the final phenotypic variation of individuals. SNPs in these genes showed significant associations with a number of the studied characters, with low to medium R² values. Here these genes are termed contributory, where the sum of their effects can, in part or in full, account for the phenotypic values. Some of these genes might work with one another to reach a certain level of phenotypic expression. The effects of contributory genes and how they might act together have been widely studied at the gene level (Dian et al., 2003; Fujita et al., 2006; Hirose et al., 2006; Umemoto et al., 2008). SSIIIb and SSIVa are PT-associated genes with a medium to high level of association with pasting and/or gelatinization temperatures.

Minor genes with very low associations

Debranching enzymes showed minor influences in this population. Isoamylase was first reported in maize endosperm (Doehlert and Knutson, 1991). ISA2 is a relatively small gene (2625 bp) with no introns. Therefore, it is presumed each detected SNP/Indel could be potentially important in this gene. However, no strong association between the two SNPs in this gene and any physiochemical property was found. Another debranching enzyme, pullulanase, had low associations with PT, GT and chalkiness (CHK).
A recent association study between pullulanase and RVA profile parameters in glutinous rice showed strong relationships between this gene and peak viscosity, hot paste viscosity, breakdown viscosity and peak time (Yan et al., 2010). Our results differ from these, which may be attributed to the structure of the population. Minor genes are very population-specific, and in each population different minor genes might contribute to the final phenotypic variability of physiochemical properties.

Appendices: Chapter 4

Appendix 5: Full list of 233 studied Australian rice genotypes and their pedigree information.
Appendix 6: Name and characteristics of SNPs genotyped in the rice population.
Appendix 7: The results of association study among 13 physiochemical traits and SNPs of 18 different starch-related genes.
Appendix 8: Linkage map of 17 starch-related genes, showing the approximate chromosomal location of each gene.

CHAPTER 5

Rice GPT1 SNP associated with resistant-retrograded starch

Summary

Resistant-retrograded starch is widely associated with human health. The highly retrograded starches of cereals usually have a lower glycemic index (GI), which may be beneficial in many human diets. Presented here is evidence that the GPT1 gene, which acts early in the biochemical pathway of starch synthesis and encodes the glucose-6-phosphate translocator, has a major influence on resistant starch production in rice. A ´T/C´ SNP at position 1188 of the GPT1 gene alters Leu42 to Phe and is associated with resistant-retrograded starch and amylose content. The ´T´ and ´C´ alleles produce high and low levels of retrograded starch, respectively. An association study of 233 genotypes demonstrated a significant correlation (R²) of 0.57 and 0.36 (P=0.00099) between this SNP and retrogradation degree and apparent amylose content, respectively.
Haplotype and association analysis of this SNP and another ´G/T´ SNP at the exon/intron 1 boundary of the GBSSI gene can explain most of the variability in retrogradation degree and amylose content in this rice population. These two SNPs, ´T´ in GPT1 and ´G´ in GBSSI, combine to produce higher levels of resistant-retrograded starch and may provide a new tool for deliberate selection of rice genotypes for specific functional and nutritional outcomes, such as resistant-retrograded starch and high amylose content non-sticky rices.

Introduction

Resistant starch is a major contributor to starch quality with a significant impact on human health (Sajilata et al., 2006). The incomplete digestion and absorption of resistant starch in the small intestine leads to non-digestible starch fractions with a physiological function similar to the beneficial impact of dietary fiber in food (Asp and Björck, 1992). On the other hand, the formation of resistant starch through retrogradation hardens cooked starch after cooling, because gelatinized starch components re-crystallize, leading to loss of desirable food texture during the storage of some starch-containing foods (Fan and Marks, 1998). The staling of bread and the hardening of pasta or rice on refrigeration after cooking are examples of this process. The concept of starch retrogradation and appropriate methods to measure and score its rate in rice have already been described (Philpot et al., 2006). It is believed there is a significant association between retrograded and resistant starches (Sajilata et al., 2006), and so in this study the term retrograded-resistant starch in rice is used.
The in vivo digestibility and structural features of resistant-retrograded starch with high amylose content (in maize, bean and potato flakes) were assessed using the ileal contents of different human populations (Faisant et al., 1993), and it was found resistant starch consisted mainly of retrograded amylose with a degree of polymerization of approximately 35 glucose units and a melting temperature of 150 °C. Retrograded amylose in peas, maize, wheat and potatoes was found to be highly resistant to amylolysis and digestion (Ring et al., 1988). The factors which might have a direct or indirect influence on the rate of retrogradation, firmness and resilience of rice starch after cooking are amylose, protein and lipid contents (Philpot et al., 2006). Highly retrograded cooked rice has a low hydrolysis index (HI) and glycemic index (Hu et al., 2004), while waxy and low amylose rice shows more rapid and complete hydrolysis (Chung et al., 2006; Hu et al., 2004).

The Rapid Visco Analyser (RVA) has been widely used to measure grain quality parameters (Limpisut and Jindal, 2002). Hu et al. (2004) reported that high amylose rice cultivars are normally characterized by low RVA parameters, such as peak viscosity (PKV), hot paste viscosity (HPV) and cool paste viscosity (CPV), with higher resistant starch (RS) content and lower estimated glycemic index (EGS). Yan et al. (2010) analysed the association of 17 starch synthesis-related genes with RVA profile parameters in a collection of 118 glutinous rice accessions using 43 gene-specific molecular markers. They concluded that 10 of the 17 starch-related genes are involved in controlling RVA profile parameters. The association analysis revealed that pullulanase plays an important role in the control of peak viscosity (PKV), hot paste viscosity (HPV), cool paste viscosity (CPV), breakdown viscosity (BDV), peak time (PeT) and paste temperature (PaT) in glutinous rice.
Alleles associated with starch quality have been characterized. Granule bound starch synthase (GBSSI) is the most important gene involved in starch synthesis in rice and other cereal grains. A number of SNPs in rice GBSSI (waxy gene), at the intron/exon 1 junction site and in exons 6 and 10, with significant impact on starch quality have been characterized (Chen et al., 2008a, b; Cai et al., 1998; Larkin and Park, 2003). Starch synthase IIa (SSIIa) is also known to have a major effect on starch quality and is exclusively expressed in the endosperm at very high levels. SSIIa affects the amylopectin structure of starch (Craig et al., 1998; Morell, 2003). The influence of this gene on cooking quality and starch texture has been studied extensively (Umemoto et al., 2008; Umemoto et al., 2004). Umemoto and Aoki (2005) explained the alkali disintegration and eating quality of rice starch by polymorphism of two SNPs, [A/G] and [GC/TT], in SSIIa. These SNPs within exon 8 of SSIIa are also significantly associated with gelatinisation temperature (GT) (Waters et al., 2006). Although the effect of many starch-related genes on grain quality has been widely studied, little is known about how polymorphisms in starch-related genes influence starch quality parameters, except for those reported for GBSSI and SSIIa. In fact, most studies have focused on comparison of gene-deficient mutants (Fujita et al., 2006) at the gene level rather than the DNA sequence level, probably due to the lack of high-throughput technologies to discover new variants in widely diverse populations. The emergence of new technologies such as next generation sequencing and multiplexed MALDI-TOF has removed the limitations of traditional sequencing and genotyping methods and improved the efficiency of SNP-trait analysis in plants.

The glucose-6-phosphate translocator (GPT) was first isolated from plastid envelope membranes of maize (Zea mays) endosperm (Kammerer et al., 1998).
GPT is a key enzyme found early in the starch biosynthesis pathway and controls the production of precursors for starch and fatty acid biosynthesis. Plant genomes normally contain two functional homologous GPT genes, GPT1 and GPT2, both of which have glucose 6-phosphate translocator activity in the plastids of non-green tissues and can import carbon in the form of glucose 6-phosphate. Mutation in the GPT genes of Arabidopsis disrupts starch synthesis and the oxidative pentose phosphate cycle of cereals, which in turn affects fatty acid biosynthesis and oil accumulation (Niewiadomski et al., 2005; Wakao et al., 2008). Sequencing 17 starch-related genes in rice using a long-range PCR protocol combined with massively parallel sequencing discovered a number of novel SNPs in the GPT1 gene, indicating that this gene potentially has an influence on rice starch quality properties. This study reports a novel SNP in the rice glucose-6-phosphate translocator 1 (GPT1) gene closely associated with resistant-retrograded starch and amylose content, and identifies an allelic combination with the waxy gene which explains most of the variability in retrogradation degree and amylose content in rice.

Materials and methods

Plant materials

A population of 233 F6 lines from the Australian temperate (japonica-type) rice breeding program was supplied by Industry and Investment NSW, Yanco Agricultural Research Institute, Australia. Selections for the capacity to flower and set seed and for the morphological traits of plant height, grain size and shape had been made on this population. No selection had taken place for quality traits.

Physiochemical properties

Thirteen physiochemical traits including four phenotypic and RVA characteristics were measured. The phenotypic traits, consisting of apparent amylose content (AC), gelatinization temperature (GT), grain chalkiness and retrogradation rate [scored by the Martin test (Philpot et al., 2006)], were quantified according to standard methods.
RVA characteristics such as peak viscosity (PKV), trough viscosity (TV), final viscosity (FV), breakdown, setback, peak time (PKT) and pasting temperature (PT) were measured by a Rapid Visco Analyser (Perten RVA 4500, Segeltorp, Sweden) according to the manufacturer’s instructions.

Designation of starch-synthesis genes involved in starch metabolism

The available literature was used to identify the most likely candidate genes associated with rice starch quality (Ohdan et al., 2005; Waters and Henry, 2007; Nakamura, 2002; Hirose et al., 2006; Rahman et al., 2000). The general entries of nucleotide sequences (gDNA) and full-length cDNAs of gene classes involved in starch biosynthesis were retrieved from the NCBI (http://www.ncbi.nlm.nih.gov/) and the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/cgi-bin/putative_function_search.pl) databases. Amplification primers were designed based on consensus sequence alignment of each candidate gene. SNP/Indels were identified by long range PCR and massively parallel sequencing (Illumina® GAII) of the starch biosynthesis genes in this population (Kharabian-Masouleh et al., 2011).

Discovery of novel SNP in GPT1 and SNP genotyping

A ´C/T´ SNP at reference position 1188 of GPT1 was found in this breeding population, which changes an amino acid from Leu to Phe (Leu42Phe). The position and function of a SNP at the intron/exon 1 boundary of GBSSI (waxy gene) have been well characterized (Cai et al., 1998; Isshiki et al., 1998). A specific multiplexed mass spectrometry assay (Sequenom® MassARRAY) was designed for simultaneous genotype analysis of each of these SNPs according to Masouleh et al. (2009), with modification of the sequences of the capture and extension primers (Table 1).

Association analysis

SNP data and phenotypic data were analysed in TASSEL v2.1 (Bradbury et al., 2007) software to identify SNPs associated with physiochemical properties.
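The association test for a single SNP, with significance assessed by permutation (1000 permutations were used in this analysis), can be illustrated with a minimal sketch. This is not TASSEL's implementation: the function names, the one-way F statistic and the toy data are assumptions made for illustration only. With 1000 permutations, the smallest p-value such a test can return is 1/1001, which is about 9.99E-04.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of phenotype values."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(genotypes, phenotypes, n_perm=1000, seed=0):
    """Permutation p-value for a SNP-trait association (hypothetical helper).

    Phenotypes are shuffled against fixed genotypes, and the p-value is
    (permutations with F >= observed F, plus 1) / (n_perm + 1).
    """
    def stat(phen):
        by_genotype = {}
        for g, p in zip(genotypes, phen):
            by_genotype.setdefault(g, []).append(p)
        return f_statistic(list(by_genotype.values()))

    observed = stat(phenotypes)
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Toy data (invented): 15 'C' lines with low scores, 15 'T' lines with high scores.
genotypes = ["C"] * 15 + ["T"] * 15
phenotypes = [0.30 + 0.01 * i for i in range(15)] + [2.70 + 0.01 * i for i in range(15)]
p = permutation_p_value(genotypes, phenotypes)
print(p)  # equals 1/1001 when no permutation reaches the observed F
```

The floor of 1/1001 is a property of the permutation count, so very different raw p-values can share the same permutation-based value.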
The input genotypic and phenotypic files were prepared according to Bradbury et al. (2007) and then imported into the software. The general linear model (GLM) was used to analyse the data with 1000 permutations.

Results

Two SNPs were genotyped in 233 individuals. To avoid complications in the association study, a gene by gene approach was applied. The results of the association study for each gene were then related to the physiochemical properties to find the combinations which cause the highest and lowest retrogradation degree and amylose content.

Table 1. MassARRAY primers for GPT1 and GBSSI.

Primer                Type   Sequence 5´→3´
GPT1_GA_Ref1188ER     F      *ACGTTGGATGGCTTCGGTTTCATCTGTCTC
                      R      *ACGTTGGATGTAGTGGTGCAAGGTAGAGTG
                      E†     AAGGTAGAGTGGTCTGA
GBSSI_EXIN1           F      *ACGTTGGATGGATCGATCTGAATAAGAGGG
                      R      *ACGTTGGATGCTGCTTGTGTTGTTCTGTTG
                      E†     AGGAAGAACATCTGCAAG

*A 10-mer tag, sequence 5´-ACGTTGGATG-3´, was added to the 5´ end of each amplification primer to avoid confusion in the mass spectrum and improve polymerase chain reaction performance. †Extension primer.

GPT1 (Glucose-6-phosphate translocator)

GPT1 is found early in the starch biosynthesis pathway (Fig 2). Theoretically, any polymorphism in the coding regions or critical domains of genes can influence starch properties. GPT1 is strongly expressed in the endosperm and imports essential carbon substrates such as Glc6P into plastids during grain development (Fischer and Weber, 2002; Jiang et al., 2003). A number of SNP/Indels in GPT1, including a novel non-synonymous ´C/T´ SNP at position 1188 of the gene, were detected. This SNP generates two alleles that encode either a Leu or a Phe. The results of this association study revealed a significant association between this SNP and some physiochemical properties of rice starch. The C/T SNP showed an association with amylose content, predicted N, difference and set back (Table 2). However, the strongest association was found between this SNP and retrogradation degree.
The R² value for this important starch property was 0.58. Apparent amylose content, another critical component of starch quality, also has a very strong association with this SNP, with an R² value of 0.365. The ´C´ allele in GPT1 results in the lowest degree of retrogradation, about 0.34, while the ´T´ allele gives the highest value, 2.74 (Fig 1a). Highly retrograded resistant starch releases glucose monomers very slowly, which is highly desirable in human diets. Fig 1b also shows how different alleles of this gene influence the production of amylose, with ´C´ and ´T´ generating the lowest and highest amylose contents of 17.8 and 24.7%, respectively.

[Figure 1: four bar-chart panels. (a) and (b): GPT1 gene, alleles C and T. (c) and (d): haplotype combinations of SNPs in GPT1 and GBSSI. y-axes: retrogradation degree (a, c) and amylose content (%) (b, d).]

Figure 1. Effect of GPT1 and GBSSI SNPs on retrogradation degree and amylose content (%). (a) Allele ´C´ in GPT1 represents the low retrogradation rate and (b) low amylose content, whereas ´T´ produces the highest values in both studied traits (±SD=0.34 and 1.57, respectively). (c) Haplotype combination of studied SNPs in GPT1 and GBSSI which creates high and low retrogradation degree and (d) amylose content (%) (±SD=0.34 and 1.57, respectively).

Table 2. Physiochemical properties associated with ´C/T´ SNP in GPT1. The R² values show the portion of total variation explained by the SNP in the GPT1 gene.

Trait                                  F-test      p-value*    p-value adjusted†   R²
Peak viscosity (PV)                    21.1979     7.10E-06    9.99E-04            0.090
Break down (BD)                        37.1798     4.97E-09    9.99E-04            0.148
Final viscosity (FV)                   31.1074     7.32E-08    9.99E-04            0.126
Set back viscosity (SB)                83.2826     5.40E-17    9.99E-04            0.280
Retrogradation degree (Martin-test)    292.1427    7.10E-42    9.99E-04            0.577
Amylose content (AC)                   123.0474    6.97E-23    9.99E-04            0.365
Predicted N                            122.5425    8.19E-23    9.99E-04            0.364
Difference                             103.1431    4.97E-20    9.99E-04            0.325

*Values < 0.01 are significant. †p-values adjusted for multiple tests, with the number of permutations set to 1000.

Figure 2. Simplified biochemical pathway of starch synthesis in cereals. GPT1 directly affects the structural fatty acids of amylose and causes high and low resistant-retrograded starch upon occurrence of the ´T´ and ´C´ SNPs, respectively.

GBSSI (Granule bound starch synthase I)

GBSSI is probably the most important gene involved in starch synthesis in rice and other cereal grains. This association study showed a strong correlation between the WaxyIN1 (G/T) SNP at the Exon1/Intron1 junction site and important RVA curve characteristics such as peak viscosity (PKV), set back and breakdown. This SNP has an influence similar to that of GPT1 on the physiochemical properties of rice starch (Table 3). The association study showed significant F-values of 223.29 and 121.52 for this ´G/T´ SNP, indicating significant association with retrogradation degree (Martin test) and amylose content, respectively. The R² values were 0.66 and 0.51 for retrogradation and amylose content, respectively, suggesting this SNP also controls these important traits and can explain a substantial portion of the variability in this rice population.
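The link between the reported F-values and R² values can be checked with a short calculation. For a fixed-effect linear model, R² = (F × df_model) / (F × df_model + df_error). The degrees of freedom are not stated in the text, so the values below are assumptions: three GBSSI genotype classes (G:G, G:T, T:T) give df_model = 2, and 233 lines give df_error = 230.

```python
def r_squared_from_f(f_value, df_model, df_error):
    """R^2 implied by an F statistic: SS_model / (SS_model + SS_error)."""
    explained = f_value * df_model
    return explained / (explained + df_error)


# Assumed degrees of freedom (not given in the text): 2 and 230.
retrogradation_r2 = r_squared_from_f(223.29, 2, 230)
amylose_r2 = r_squared_from_f(121.52, 2, 230)
print(round(retrogradation_r2, 2), round(amylose_r2, 2))  # 0.66 0.51
```

Under these assumptions the calculation reproduces the reported R² values of 0.66 and 0.51, which suggests the F-tests and R² values for GBSSI come from the same three-genotype model.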
Individuals with the ´T´ allele exhibited the lowest retrogradation degree of 0.730, whereas the ´G´ allele gave highly retrograded resistant starch (2.60) with high amylose content. Amylose content for the T and G alleles was 17.5 and 24.5%, respectively (Fig 1c and 1d).

Allelic combination of SNPs in GPT1 and GBSSI

An association between allelic combinations of GPT1 and GBSSI and the control of retrogradation degree and amylose content was detected. The T:G GPT1:GBSSI allelic combination produces the highest amylose content and amount of retrograded resistant starch (Fig 1c and 1d). Conversely, the C:T allelic combination (GPT1:GBSSI) produces the lowest retrogradation and amylose content. Other combinations of T:G and C:T SNPs resulted in values of 0.73-2.0 and 14.1-22.1% for retrogradation and amylose content, respectively. Some other starch-related genes such as BEI, SSI and SSIIa may also play a complementary role.

Table 3. Physiochemical properties associated with ´G/T´ SNP in GBSSI. The R² values show the portion of total variation explained by the waxy1 SNP in the GBSSI gene.

Trait                                  F-test     p-value*    p-value adjusted†   R²
Peak viscosity (PV)                    34.346     8.87E-14    9.99E-04            0.231
Break down (BD)                        35.189     4.64E-14    9.99E-04            0.234
Final viscosity (FV)                   15.050     7.18E-07    9.99E-04            0.115
Set back (SB)                          76.273     3.89E-26    9.99E-04            0.398
Retrogradation degree (Martin-test)    223.294    1.29E-54    9.99E-04            0.660
Amylose content (AC)                   121.529    9.63E-37    9.99E-04            0.513
Predicted N                            121.542    9.56E-04    9.99E-04            0.513
Difference                             54.612     3.92E-04    9.99E-04            0.322

*Values < 0.01 are significant. †p-values adjusted for multiple tests, with the number of permutations set to 1000.

Discussion

Resistant starch can play an important role in human nutrition and health. Resistant starch is digested more slowly than non-resistant starch and releases glucose slowly into the blood stream, resulting in a low glycemic index (GI).
An in vitro enzymatic starch digestion study showed a close relationship between resistant starch, amylose content, retrogradation and glycemic index (Hu et al., 2004). The study revealed that high amylose rice cultivars, characterized by low major RVA parameters such as peak viscosity, hot paste viscosity and cool paste viscosity, had more resistant starch and a lower estimated glycemic index (Hu et al., 2004). When the retrogradation degree is higher, the starch is more resistant to digestion and the GI is lower. This study found a significant association between a ´T/C´ SNP at position 1188 of GPT1, which alters Leu42 to Phe, and the presence of resistant-retrograded starch and high amylose content, where the ´T´ and ´C´ alleles produce a high and low retrogradation rate, respectively. It is believed amylose content has a significant influence on retrogradation rate (Hu et al., 2004), but some studies show that these two important starch properties might work independently of one another.

Table 4. Allelic combinations of GPT1 and GBSSI represent different classifications of amylose content and retrogradation degree.

Classification         GPT1   GBSSI   Retrogradation degree (M-test)   Amylose content (%)   *Status                   No. of lines in each class
Group 1-High           T:T    G:G     2.840±0.73                       24.45±1.63            High AC and Ret           15
Group 2-High medium    C:C    G:G     1.577±0.45                       22.85±1.08            High-medium AC and Ret    1
Group 3-Low medium     C:C    G:T     1.032±0.22                       20.55±1.16            Low-medium AC and Ret     5
Group 4-Low            C:C    T:T     0.679±0.35                       17.5±3.12             Low AC and Ret            205

*AC=Amylose content; Ret=Retrogradation; M-test=Martin test. Values are presented as mean±SD.

Panlasigui et al. (1991) revealed that rice cultivars with very similar amylose content have different digestibility and glycemic index in humans, suggesting that some other mechanisms such as retrogradation must be involved in the process.
In spite of a correlation coefficient of 0.70 between amylose content and retrogradation degree in this study, the conclusion here is that these two traits work independently but have some contributing influences on each other. Major genes such as GBSSI and SSIIa and their functional SNPs have a major influence on amylose and amylopectin content in cereals (Nakamura et al., 2005; Umemoto et al., 2004b; Yamamori et al., 2006). The glucose 6-phosphate/phosphate translocator (GPT1) imports carbon resources into non-photosynthetic plastids (Kammerer et al., 1998) and appears to be a key gene controlling retrogradation degree in rice. Andriotis et al. (2010) conducted an experiment to determine the importance of GPT1 in embryo development in Arabidopsis. They reported a major influence of GPT1 on seed development, where a strong reduction in the activity of this gene resulted in abortion of the embryo due to ultra-structural and biochemical defects including proliferation of starch granules. It was proposed GPT1 is necessary for early embryo development because it catalyses the import of glucose 6-phosphate into plastids as the substrate for NADPH generation via the oxidative pentose phosphate pathway (Andriotis et al., 2010). Loss of GPT1 activity in developing bean embryos has large effects on storage product synthesis (Rolletschek et al., 2007). The same loss of, or variation in, GPT1 activity (caused by SNPs) in the biochemical pathway can change the constitution of starch and particularly amylose content, which normally accounts for 20-30% of starch content. GPT1 is important in fatty acid synthesis in oilseeds, where oil can account for up to 30-40% of dry matter. In Brassica species, application of exogenous Glc6P changed the activity of GPT1, and the uptake and metabolism of Glc6P to fatty acids through plastidial glycolysis was altered significantly (Eastmond and Rawsthorne, 2000; Hutchings et al., 2005).
The influence of GPT1 on starch retrogradation may be explained by its role in fatty acid synthesis. The role of fatty acids and lipids in the helical structure of amylose has long been studied (Tester and Morrison, 1990). It is thought lipids play a structural role as a central scaffold holding together the helical architecture of amylose, and it has been suggested amylose content is correlated with lipid content (Morrison, 1988). Philpot et al. (2006) reported that removal of lipids significantly increased the retrogradation rate and the firmness of rice starch gels. They found O. sativa cv Koshihikari grown in Japan had a lower retrogradation rate than O. sativa cv Koshihikari grown in Australia, despite the fact that flour from both origins was 18% amylose. This can be attributed to the amount of long amylose chains complexed with lipids. Apparently, the amount of long amylose chains associated with lipid is greater in the Japanese rice, and the higher lipid content linked to long amylose chains explains its lower retrogradation. It is possible then that GPT1 influences retrogradation degree via its influence on fatty acid content rather than by directly influencing amylose content. The lipids complex with long chain amylose, and relatively high concentrations of lipid disrupt recrystallisation, lowering the extent of starch retrogradation.

CHAPTER 6

A high-throughput assay for rapid and simultaneous analysis of perfect markers for important quality and agronomic traits in rice using multiplexed MALDI-TOF Mass Spectrometry

Summary

The application of single nucleotide polymorphisms (SNPs) in plant breeding involves the analysis of a large number of samples, and therefore requires rapid, inexpensive and highly automated multiplex methods to genotype the sequence variants. A high-throughput multiplexed SNP assay for eight polymorphisms which explain two agronomic and three grain quality traits in rice was optimised.
Gene fragments coding for the agronomic traits plant height (semi-dwarf, sd-1) and blast disease resistance (Pi-ta) and the quality traits amylose content (waxy), gelatinization temperature (alk) and fragrance (fgr) were amplified in a multiplex polymerase chain reaction. A single base extension reaction carried out at the polymorphism responsible for each of these phenotypes within these genes generated extension products which were quantified by a matrix-assisted laser desorption ionization-time of flight system. The assay detects both SNPs and indels and is co-dominant, simultaneously detecting both homozygous and heterozygous samples in a multiplex system. This assay analyses eight functional polymorphisms in one 5 μL reaction, demonstrating the high-throughput and cost-effective capability of this system. At this conservative level of multiplexing, 3072 assays (eight polymorphisms in each of 384 wells) can be performed in a single 384-well microtitre plate, allowing the rapid production of valuable information for selection in rice breeding.

Introduction

Single nucleotide polymorphisms (SNPs) are the most abundant class of sequence variation and explain the occurrence of human genetic disease (Shastry, 2002) and many important traits in plants (Bryan et al., 2000; Kennedy et al., 2006). The high frequency of SNPs in many plant species, including rice, where comparison of data from japonica and indica cultivars identified one SNP every 170 bp and one indel every 540 bp (Yu et al., 2002), in combination with their genome-wide distribution (Garg et al., 1999; Drenkard et al., 2000; Nasu et al., 2002; Batley et al., 2003), means that they have the capacity to generate high-resolution genetic maps (Bhattramakki et al., 2002). The capacity for high resolution means SNP markers are an attractive tool for gene identification. When identified, causal SNPs are the perfect markers within marker-assisted selection programs (Gupta et al., 2001; Rafalski, 2002; Batley et al., 2003).
Several techniques have been developed to assay SNPs, including SNP microarray hybridization-based methods (Rapley and Harbron, 2004) and enzyme-based methods including those involving the use of DNA ligase, polymerase and nuclease (McGuigan and Ralston, 2002; Olivier, 2005; Costabile et al., 2006; Gunderson et al., 2006). Other methods, such as Pyrosequencing (Ahmadian et al., 2000) and PCR-based approaches (Hayashi et al., 2004) including TaqMan® (Livak, 1999), have been designed for SNP and indel detection; however, they are generally not cost- or time-effective per sample. PCR-based markers are preferable because they are efficient, cost-effective and require only a small quantity of genomic DNA for genotyping, and are thus suitable at all stages of plant growth, including early seedling stages.

An increasing number of genes controlling important traits in plants are being discovered, and the underlying polymorphisms can be converted into perfect molecular markers. Some recent examples of perfect markers for important traits in plants include rice fragrance (Bradbury et al., 2005), wheat grain hardness (Morris, 2002), rice blast resistance (Kennedy et al., 2006) and a range of other disease resistance genes (Jeong et al., 2002); however, each of these has been a single-trait, uniplex assay. Plant breeders often track and select for more than one trait within any one cross, and as the number of genes which control important traits expands, the need for rapid, simple, inexpensive, reliable multiplex genotyping methods will become more urgent (Hayashi et al., 2004). The objective of this study was to investigate the capability of the multiplex matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry system (Sequenom® MassARRAY®, San Diego, CA, USA) as a high-throughput platform for the rapid, simultaneous and robust multiplex assay of SNPs responsible for important agronomic and grain quality traits in rice.
In this article, an assay for distinguishing between eight different important polymorphisms simultaneously in a single 5 μL reaction is reported.

Materials and methods

Genotypes

All plant material was supplied by the Australian Plant DNA Bank (http://www.biobank.com). Twenty-five commercial rice cultivars were analysed: Amaroo, Amber, Basmati 370, BL24, Calrose, Calmochi 202, Dawn, Della, Dellmont, Domsorkh, Doongara, Dragon Eye Ball, Goolarah, Jarrah, Jasmine, Kyeema, Khao Dawk Mali 105, L202, Langi, Millin, M7, Nipponbare, Opus, Teqing and YRF204.

DNA extraction

Total plant DNA was extracted from individual seedlings at 10 days after germination using a Qiagen (Valencia, CA, USA) DNeasy Plant Kit, according to the manufacturer’s instructions.

Primer design/generation of SNP markers

Capture and extension primers were designed by Sequenom® MassARRAY® Assay Design 3.1 software, with the exception of the sd-1DEL primers, which were designed by Primer3 (http://frodo.wi.mit.edu). The optimal amplicon size containing the polymorphic site was set in the software to 80–120 bp. A 10-mer tag (5′-ACGTTGGATG-3′) was added to the 5′ end of each amplification primer to avoid confusion in the mass spectrum and to improve PCR performance.

Capture PCR protocol

Platinum® Taq DNA Polymerase (Invitrogen, Carlsbad, CA, USA) in a final volume of 5 μL was used for all capture PCRs. The eight-plex reaction was optimized by testing a number of capture primer and MgCl2 concentrations in the ranges 0.2–1 μM and 1–3.5 μM, respectively. Uniplex assays using identical PCR conditions confirmed the results of all eight-plex experiments. The optimal eight-plex capture PCR consisted of 3–5 ng of template DNA, 0.5 μL 10 × PCR buffer (Invitrogen), 3 mM MgCl2, 2.5 mM of each deoxynucleoside triphosphate (dNTP), 5 μM of each primer and 1 unit of Taq polymerase (5 U/μL).
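The primer tagging step described under primer design can be sketched in a few lines. This is only an illustration: the helper name is hypothetical, and the two sequences are the waxyIN1 capture primers listed in Table 1 of this chapter.

```python
TAG = "ACGTTGGATG"  # 10-mer tag given in the text


def add_tag(primer, tag=TAG):
    """Prepend the tag to the 5' end of a capture primer written 5'->3'."""
    return tag + primer.upper()


# waxyIN1 capture primers (forward and reverse), without the tag.
waxy_in1 = {"F": "GATCGATCTGAATAAGAGGG", "R": "CTGCTTGTGTTGTTCTGTTG"}
tagged = {name: add_tag(seq) for name, seq in waxy_in1.items()}
for name, seq in tagged.items():
    print(name, seq, len(seq))
```

The tagged sequences produced this way match the GBSSI_EXIN1 forward and reverse primers listed with their tags in Table 1 of Chapter 5.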
The reactions were heated to 94 °C for 15 min, followed by 45 cycles of amplification at 94 °C for 20 s, 56 °C for 30 s and 72 °C for 1 min, and a final extension at 72 °C for 3 min. As sd-1Del is relatively large, the amplification protocol was modified as follows: 3.75 μL of 10× PCR buffer (50 mM), 2.25 μL of MgCl2 (50 mM), 2.1 μL of each 10 μM primer, 6 μL of dNTPs (2.5 mM), 12 μL of 2× Enhancer [6% glycerol + 10% dimethyl sulphoxide (DMSO)], 0.3 μL of Platinum® Taq polymerase (5 U/μL) and 1.5 μL of template. The thermocycling program was 94 °C for 5 min, followed by 45 cycles of amplification at 94 °C for 30 s, 55 °C for 30 s and 72 °C for 1 min, and a final extension at 72 °C for 3 min. Finally, 1 μL of PCR product was added to the multiplex test tubes.

Shrimp alkaline phosphatase (SAP) incubation

Unincorporated dNTPs were removed by SAP incubation according to the manufacturer’s (Sequenom, San Diego, CA, USA) instructions.

Table 1. MassARRAY markers for eight different functional polymorphisms

Polymorphism | Trait | Forward capture primer | Reverse capture primer | Extension primer | Expected polymorphism
sd-1SNP | Semi-dwarf | *CGATGTTGATGACCATGGCG | *CATCCTCCTCCAGGACGAC | AGGACGACGTCGGCGGC | [C/T]
sd-1Del | Semi-dwarf** | *CACGCACGGGTTCTTCCAG | *AGGAGTTCCATGATCGTCAG | GCGACAGCTCCTTCATCTCCTCGC | [C/T/A]
Pi-ta | Blast resistance | *GCTTCTTTCTTTCTCTGCCG | *CAAACAATCATCAAGTCAGG | AAGTCAGGTTGAAGATGCATAG | [G/T]
waxyIN1 | Amylose content | *GATCGATCTGAATAAGAGGG | *CTGCTTGTGTTGTTCTGTTG | CAGGAAGAACATCTGCAAG | [G/T]
waxyEX6 | Amylose content | *ACCTCAACAACAACCCATAC | *GATCATCATGGATTCCTTCG | CCCATACTTCAAAGGAACTT | [C/A]
alk3 | Gelatinization temperature | *TGTCCTCGAACGGGTCGAAC | *CTCAACCAGCTCTACGCCAT | CTTCTGCGGGCTGAGGGACACC | [A/G]
alk4 | Gelatinization temperature | *TGACAAGGACCTCCTCGTAG | *CGCAAGTACAAGGAGAGCTG | AAGGAGAGCTGGAGGGG | [GC/TT]
fgr | Fragrance | *ACCTCAACAACAACCCATAC | *GTTAGGTTGCATTTACTGGG | TGGGAGTTATGAAACTGGTA | [TATAT/AAAAGATTATGGC]

* A 10-mer tag, sequence 5′-ACGTTGGATG-3′, was added to the 5′ end of each amplification primer to avoid confusion in the mass spectrum and improve PCR performance.
** A modified method was applied to amplify this allele.

Primer extension and mass spectrometry

The remaining assay steps of primer extension, resin cleanup and mass spectrometry were undertaken according to the manufacturer’s (Sequenom® MassARRAY®) instructions.

Results

Analysis of PCR products

Assays were constructed for eight polymorphisms defining each of the alleles of five genes controlling five important commercial traits. The traits and genes were semi-dwarf (sd-1, two alleles) (Sasaki et al., 2002; Spielmeyer et al., 2002), blast disease resistance (Pi-ta, one allele) (Bryan et al., 2000), amylose content (waxy, two SNPs) (Cai et al., 1998; Larkin and Park, 1999; Chen et al., 2008), gelatinization temperature (alk, two SNPs) (Umemoto, 2005; Waters et al., 2006) and fragrance (fgr, one allele) (Bradbury et al., 2005) (Table 1).

Optimal capture primer concentration

The optimal primer concentration for the amplification of each target polymorphism in uniplex and eight-plex was 0.3 μM. Polymorphism detection in eight-plex was consistent with the uniplex data. Increasing the uniplex primer concentration to 0.5 μM led to PCR products of higher concentration, except for waxyIN1, for which there was nonspecific amplification at this concentration. The concentrations of PCR products, as measured with a Bioanalyser 2100 (Agilent Technologies, Palo Alto, CA, USA) DNA 500 LabChip® Kit, ranged from 7.8 ng/μL (sd-1SNP) to 12.2 ng/μL (alk4) in uniplex (Figure 1a,b), and were somewhat lower in eight-plex, ranging from 6.40 ng/μL (sd-1SNP) to 11.21 ng/μL (alk4), which was sufficient to produce an excellent mass spectrum (Figure 1c,d).

Figure 1. Concentration of PCR products in uni-plex and 8-plex: (a) Concentration of sd-1SNP = 7.8 ng/µl (major peak), minor peaks correspond to size standard.
(b) alk4 = 12.2 ng/µl (major peak), minor peaks correspond to size standard. (c) Concentration of PCR products in 8-plex. (d) Concentration of PCR products in an 8-plex which has been analysed individually (all in ng/µl).

MgCl2 concentration

MgCl2 concentration is one of the most important factors for accurate concurrent amplification of different loci in a multiplex system. The optimal concentration for the amplification of all loci in uniplex and multiplex was 3 mM. At this MgCl2 concentration, all target loci were amplified free from nonspecific amplicons and primer dimers. At lower MgCl2 concentrations of 2 and 2.5 mM, no target DNA was amplified and there were a surprising number of nonspecific bands and primer dimers. At concentrations higher than 3 mM, nonspecific bands were present in addition to the target loci. These results were consistent and reproducible in both uniplex and eight-plex.

Identification of SNPs and polymorphisms in agronomic and quality loci

All eight loci were amplified in 25 cultivars and genotyped by multiplex MALDI-TOF analysis of single-base extension products, and the polymorphisms were compared (Table 2). Of the eight polymorphisms, three were responsible for important agronomic traits and five for grain quality traits; six were nucleotide substitutions and two were insertions/deletions (indels). Polymorphisms were distinguished at all agronomic and quality loci, as described below.

sd-1

The semi-dwarf phenotype is caused by a loss of function of the enzyme gibberellin 20-oxidase (GA 20-oxidase). Plants carrying the non-functional form of the gene, sd-1, have a diminished capacity to produce gibberellin, resulting in a reduced plant height and enhanced grain yield. Two alleles of sd-1 were assayed. One sd-1 allele, here called sd-1SNP, contains a C/T SNP in exon 2 of the gene that changes a leucine codon (CTC) to a phenylalanine codon (TTC), a nonsynonymous substitution (Spielmeyer et al., 2002; Monna et al., 2002).
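The effect of the exon 2 C/T SNP on the encoded amino acid can be checked against the standard genetic code. The sketch below uses a hand-written lookup containing only the two codons quoted above. The helper name `classify_substitution` is hypothetical, and a complete codon table would be needed for any other site.

```python
# Minimal sketch: classify a single-codon substitution as synonymous or not.
# Only the two codons quoted in the text are included; both assignments are
# from the standard genetic code.
CODON_TABLE = {"CTC": "Leu", "TTC": "Phe"}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Describe the amino acid change caused by swapping one codon for another."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return f"{ref_aa} -> {alt_aa} ({kind})"


print(classify_substitution("CTC", "TTC"))
```

Under the standard code the two codons specify different residues, so the sd-1SNP change is classed as nonsynonymous.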
The other allele, here called sd-1Del, is characterized by a 280-bp (Spielmeyer et al., 2002) or 278-bp (Sasaki et al., 2002) deletion of part of exon 1 and exon 2 together with 102–105 bp of the intron sequence, a deletion of 380–383 bp in total (Figure 2).

Table 2. Polymorphisms (SNP) in 25 commercial rice cultivars at eight different functional loci.

Cultivar | sd-1SNP | sd-1Del | Pi-ta | waxyIN1 | waxyEX6 | alk3 | alk4 | fgr | Semi-dwarf/Tall
Amaroo | T | A | T | T | A | G | TT | AAAGATT | Semi-dwarf
Amber | C | A | T | G | C | G | GC | TATAT | Semi-dwarf
Basmati 370 | C | A | T | G | C | G | GC | TATAT | Semi-dwarf
BL24 | C | T | G | G | A | G | GC | AAAGATT | Tall
Calrose | C | A | T | T | A | G | TT | AAAGATT | Semi-dwarf
Calmochi 202 | T | A | T | T | A | G | TT | AAAGATT | Semi-dwarf
Dawn | C | A | T | G | C | G | GC | AAAGATT | Semi-dwarf
Della | C | A | T | G | C | G | GC | TATAT | Semi-dwarf
Dellmont | C | T | T | G | C | G | GC | TATAT | Tall
Domsorkh | C | A | T | G | C | G | GC | TATAT | Semi-dwarf
Doongara | C | T | T | G | C | G | GC | AAAGATT | Tall
Dragon Eye Ball | C | A | T | T | A | A | GC | TATAT | Semi-dwarf
Goolarah | C | A | T | T | A | G | GC | TATAT | Semi-dwarf
Jarrah | T | A | T | T | A | G | TT | AAAGATT | Semi-dwarf
Jasmine | T | T | T | T | A | G | TT | TATAT | Tall
Kyeema | C | A | T | T | A | G | GC | TATAT | Semi-dwarf
Khao Dawk Mali 105 | C | A | T | T | A | G | TT | TATA | Semi-dwarf
L 202 | C | T | T | T | A | G | GC | AAAGATT | Tall
Langi | T | A | T | T | A | G | GC | AAAGATT | Semi-dwarf
Millin | T | A | T | T | A | A | GC | AAAGATT | Semi-dwarf
M7 | C | T | T | T | A | G | GC | AAAGATT | Tall
Nipponbare | C | A | T | T | A | A | GC | AAAGATT | Semi-dwarf
Opus | T | A | T | T | A | A | GC | AAAGATT | Semi-dwarf
Teqing | C | T | G | G | A | G | GC | AAAGATT | Tall
YRF 204 | C | T | T | T | A | G | GC | TATAT | Tall

Figure 2. Determination of sd-1Del gene on a 2% agarose gel. Fragments of around 300 bp indicate the 383-bp deletion in the sd-1Del gene which is responsible for the semi-dwarf phenotype. Fragments of approximately 700 bp are the intact sd-1Del gene of tall plants. Lanes from left to right: 100 bp ladder, negative control, and rice varieties Nipponbare, Kyeema, Doongara, Amaroo, BL24, Della and Domsorkh.
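The genotype calls in Table 2 lend themselves to simple tallies. The sketch below transcribes seven of the eight loci from Table 2 (fgr is omitted for brevity) and counts alleles per locus. All variable and function names are hypothetical, and the transcription should be checked against the table before reuse.

```python
from collections import Counter

LOCI = ["sd-1SNP", "sd-1Del", "Pi-ta", "waxyIN1", "waxyEX6", "alk3", "alk4"]

# One row per cultivar, transcribed from Table 2 in the column order of LOCI.
ROWS = """
Amaroo,T,A,T,T,A,G,TT
Amber,C,A,T,G,C,G,GC
Basmati 370,C,A,T,G,C,G,GC
BL24,C,T,G,G,A,G,GC
Calrose,C,A,T,T,A,G,TT
Calmochi 202,T,A,T,T,A,G,TT
Dawn,C,A,T,G,C,G,GC
Della,C,A,T,G,C,G,GC
Dellmont,C,T,T,G,C,G,GC
Domsorkh,C,A,T,G,C,G,GC
Doongara,C,T,T,G,C,G,GC
Dragon Eye Ball,C,A,T,T,A,A,GC
Goolarah,C,A,T,T,A,G,GC
Jarrah,T,A,T,T,A,G,TT
Jasmine,T,T,T,T,A,G,TT
Kyeema,C,A,T,T,A,G,GC
Khao Dawk Mali 105,C,A,T,T,A,G,TT
L 202,C,T,T,T,A,G,GC
Langi,T,A,T,T,A,G,GC
Millin,T,A,T,T,A,A,GC
M7,C,T,T,T,A,G,GC
Nipponbare,C,A,T,T,A,A,GC
Opus,T,A,T,T,A,A,GC
Teqing,C,T,G,G,A,G,GC
YRF 204,C,T,T,T,A,G,GC
""".strip().splitlines()

# Map each cultivar to a {locus: call} dictionary.
GENOTYPES = {}
for row in ROWS:
    name, *calls = row.split(",")
    GENOTYPES[name] = dict(zip(LOCI, calls))


def allele_counts(locus):
    """Count how many cultivars carry each call at the given locus."""
    return Counter(calls[locus] for calls in GENOTYPES.values())


def cultivars_with(locus, allele):
    """List cultivars (in table order) carrying the given call at a locus."""
    return [name for name, calls in GENOTYPES.items() if calls[locus] == allele]


print(allele_counts("waxyEX6"))
print(cultivars_with("Pi-ta", "G"))
```

Tallies of this kind give a quick consistency check between the table and the per-locus counts quoted in the text.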
Although a large deletion, such as sd-1Del, can be determined by the size difference of amplification products on a simple 2% agarose gel (Figure 2), the suitability of MALDI-TOF for the identification of large indels was also assessed. In theory, only one base (the terminator) is added at the SNP site downstream of the extension primer. Accurate gene sequence information, particularly for the flanking regions immediately before and after the indel, is therefore necessary because single-base extension recognizes one base either inside or outside the indel. Theoretically, the indel can be determined from the ddNTP which terminates the extension reaction (Figure 3). However, when the assay designed by the Sequenom® software (MassARRAY® Assay Design 3.1) was used in both uniplex and eight-plex, no logical call was obtained and all genotypes showed ‘A’, which corresponds to the sd-1Del allele. Modification of the method substantially improved the analysis of this allele, reducing the proportion of missing data from 43.7% to 5% (Table 3). The modification involved amplification of the region containing the deletion in uniplex using PCR primers designed with Primer3 (http://frodo.wi.mit.edu), followed by the addition of these uniplex amplicons to the other loci, which had been amplified in seven-plex, for all subsequent manipulations.

Figure 3. Determination of sd-1Del gene by MALDI-TOF. There is a 383-bp deletion in semi-dwarf plants; the extended single base (mass-modified terminator) therefore matches ‘C’ or ‘A’, which is located just after the deletion, otherwise there will be a peak of ‘T’ for tall plants.

Pi-ta

Pi-ta is a major blast resistance gene in rice. Pi-ta encodes a 928-amino-acid polypeptide with a molecular mass of 105 kDa. A [G/T] SNP distinguishes susceptible and resistant genotypes (Bryan et al., 2000); amino acid 918 differs between resistant and susceptible genotypes: all susceptible genotypes have a serine (T) at this position, whereas resistant plants have alanine (G).
Most of the cultivars in this study carried the ‘T’ allele that translates to serine (susceptible), whereas BL24 and Teqing contained the resistant ‘G’ allele (alanine).

waxy

The waxy gene encodes the enzyme granule-bound starch synthase, which is one of the key factors influencing rice starch quality by affecting apparent amylose content (Sano, 1984; Webb, 1991; Chen et al., 2008). The [G/T] SNP at the intron 1/exon 1 splice site (waxyIN1) differentiates between varieties of high and low amylose content (Cai et al., 1998) and, in combination with the exon 6 [C/A] SNP (waxyEX6), differentiates between varieties of high, intermediate and low amylose content in southern US germplasm (Chen et al., 2008). Cultivars with ‘T’ in waxyIN1 and ‘A’ in waxyEX6 have the lowest amylose content or even glutinous starch. High polymorphism was found at waxyIN1 in the studied cultivars: Amber, Basmati 370, BL24, Dawn, Della, Dellmont, Domsorkh, Doongara and Teqing contained the ‘G’ allele, whereas cultivars such as Jasmine, Nipponbare, Langi and M7 carried the ‘T’ allele (Figure 4). At waxyEX6, 18 of the 25 cultivars displayed ‘A’, which suggests low amylose content.

alk

The major gene regulating alkali disintegration in rice grains, alk (Gao et al., 2003), encodes the enzyme starch synthase IIa (Umemoto et al., 2004). Alkali disintegration is a convenient indirect measure of the gelatinization temperature of rice starch, which is, in turn, associated with rice cooking and eating quality. Two polymorphisms within exon 8 of alk, [A/G] (alk3) and [GC/TT] (alk4), are associated with gelatinization temperature class (Umemoto, 2005; Waters et al., 2006).

Figure 4. Sequenom® MassARRAY® waxyIN1 uni-plex spectrum for cv Langi, which shows a peak for ‘T’.

Figure 5.
An 8-plex Sequenom® MassARRAY® spectrum for cv Langi.

A combination of alk3 ‘G’ and alk4 ‘GC’ is found within varieties of high gelatinization temperature and low alkali spreading, whereas varieties with either alk3 ‘A’ or alk4 ‘TT’ are low gelatinization temperature varieties. Both the [GC/TT] (alk4) and [A/G] (alk3) polymorphisms were determined in all cultivars.

fgr

A recessive gene (fgr) on chromosome 8 controls rice fragrance. The intact Fgr allele encodes a betaine aldehyde dehydrogenase (BADH2) in non-fragrant rice, whereas fragrant rice contains an 8-bp deletion and three SNPs which prematurely terminate the translation of BADH2. This changes the biosynthetic pathways in which BADH2 is active, resulting in the accumulation of 2-acetyl-1-pyrroline, which is responsible for fragrance (Bradbury et al., 2005). The eight-plex assay identified 11 varieties with the fragrant allele fgr.

Missing data and heterozygosity

The highest rate of missing data belonged to sd-1Del in eight-plex, which suggests that this allele is not compatible with the multiplex system (Table 3). No missing data were found in waxyIN1, Pi-ta, alk4 and fgr. The apparent heterozygosity values were 3.9% and 3.1% in sd-1SNP and alk4, respectively.

Discussion

This report demonstrates that DNA polymorphisms can be efficiently confirmed and analysed in rice using a MALDI-TOF mass spectrometry system (Ding and Cantor, 2003). These assays can be used as a marker-assisted selection tool in conventional breeding programs. Rice has been at the forefront of the application of genomics and genomics tools to plant breeding and serves as a model for other crops. A whole rice genome sequence has been available for several years (Goff et al., 2002; Yu et al., 2002), and a comprehensive DNA polymorphism database has recently become available online (http://irfgc.irri.org/index.php). The availability of these resources has accelerated the rate at which gene function has been elucidated.
Emerging DNA sequencing technologies are revolutionizing the field of genomics, bringing the reality of relatively inexpensive comparative genome sequencing of all the major crops much closer. MALDI-TOF mass spectrometry, in combination with comparative genome sequence data, will become increasingly useful in marker-assisted breeding as more genes that control important traits are identified.

An efficient PCR is the most important predictor for producing a reliable and consistent assay on this platform (Figure 5). The uniform simultaneous amplification of all loci will resolve the most commonly encountered problems (Siebert and Larrick, 1992). The number and intensity of correct SNP calls are increased with higher PCR product concentrations. The minimum concentration of PCR product is 4 ng/μL for loci that fall within the default size range of 80–120 bp; however, longer PCR products require a higher concentration as measured by mass to maintain the molar concentration at acceptable levels for iPlex extension reactions.

Table 3. Percent of missing data in uni-plex and 8-plex and apparent heterozygosity in 8-plex

Assay | Plex level | sd-1SNP | sd-1Del | Pi-ta | waxyIN1 | waxyEX6 | alk3 | alk4 | fgr
Missing data | uni-plex * | 0% | 8% | 0% | 0% | 0% | 1.1% | 0% | 0%
Missing data | 8-plex * | 4.5% | 43.7% | 0% | 0% | 4.2% | 3.1% | 0% | 0%
Missing data | modified 8-plex † | 4.5% | 5% | 0% | 0% | 4.2% | 3.1% | 0% | 0%
Apparent heterozygosity | 8-plex and modified 8-plex | 3.9% | 0% | 0% | 0% | 0% | 0% | 3.1% | 0%

* Assays designed with Sequenom® MassARRAY® Assay Design 3.1.
† Assays designed with Sequenom® MassARRAY® Assay Design 3.1, except for sd-1Del, for which PCR primers were designed with Primer3; sd-1Del was amplified in uni-plex and extended and analysed in 8-plex.

The concentration of PCR products differs between uniplex and eight-plex systems, which may have an effect on peak height calls.
These differences are a result of competition between the individual PCRs in multiplex, and amount to reductions of 5.7%–17.8% in the final eight-plex PCR assay compared with the uniplex assay. Even spectral peak heights (Figure 5) are critical for accurate genotype calls using MALDI-TOF mass spectrometry, and are achieved by adjusting the concentration of individual extension primers rather than by modifying capture PCR conditions, because the latter does not have a significant effect on the final spectra. PCR yield is intrinsic to the PCR conditions, which, once optimized, should be adhered to; increasing the concentration of template, primer and Taq enzyme above the recommended concentration may increase yield in uniplex but, in multiplex, may lead to the generation of dimers and spurious PCR products.

Accurate DNA sequence data for each polymorphism represent the most important prerequisite for accurate assay design. However, public domain databases and published papers can have conflicting data for each locus. For example, three different sequences for sd-1Del appear in the public domain: the deletion has been reported to be 382 bp (Spielmeyer et al., 2002) or 383 bp (Monna et al., 2002; Sasaki et al., 2002), and the reports differ in the length of the intron and the exact location of the deletion. In cases such as this, re-sequencing the target region is necessary for accurate primer design, which ultimately leads to an accurate, consistent assay.

The capture PCR stage is important in uniplex reactions, but it is critical in the multiplex system because of the high rate of competition between primers for template and enzyme. Some primers worked well in uniplex, but had missing calls in multiplex, suggesting that there were interactions between primers in eight-plex (Table 3). For example, interactions between waxyIN1 and fgr increased the number of missing calls in eight-plex.
There was, however, a high correlation of more than 98% between uniplex and eight-plex calls, and missing calls were around 0.15% and 1.68% (not including sd-1Del) for uniplex and eight-plex, respectively, which compares favourably with other sequencing methods (Jones et al., 2007).

Multiplex MALDI-TOF is a powerful tool for the detection and confirmation of SNPs in rice. It has been suggested that this platform has the capability of determining more than 40 SNPs in multiplex (Sequenom, 2006) and, given that the platform can process ten 384-well plates per day, users can theoretically analyse in excess of 153 000 SNPs daily (Perkel, 2008). This technique can be applied to segregating populations in the early stages of breeding programs to positively select desired polymorphisms and traits, and is a co-dominant system, having the ability to detect alleles in hybrids, heterozygotes (Jones et al., 2007) and polyploids (Henry et al., 2008). The capacity of the system to accurately identify haplotypes at one or more loci, alk and waxy for example, allows for the efficient selection of target phenotypes within breeding programs.

CHAPTER 7

General discussion - Characterisation of starch traits and genes in Australian rice germplasm

Background principles

Starch is a carbohydrate consisting of a large number of glucose units. A significant number of enzyme isoforms and activities contribute to starch synthesis in cereals, including rice. A substantial number of genes are therefore involved in the process of starch synthesis. A simplified pathway diagram of starch biosynthesis in Figure 1 (Chapter 1) shows how starch is synthesised by different enzymes and genes in plant green tissues and then deposited in the grain of cereals. Starch consists of two major components, amylose (~20-25%) and amylopectin (~75-80%).
Variation in the genes and enzymes involved in the synthesis of starch can change the composition and structure of starch considerably, which can significantly affect the quality and palatability of rice. The variations normally occur at the DNA level due to spontaneous or induced mutations. The most abundant type of variation in all organisms is the SNP (Bryan et al., 2000; Kennedy et al., 2006). The main hypothesis of this thesis emerges from the fact that SNPs can change a gene, leading to alteration of the encoded enzyme, which in turn modifies the biochemical and physicochemical properties of starch (quality). For this purpose, a diverse set of Australian rice germplasm was obtained, the variation of starch-related genes at the SNP level was studied, and a comprehensive association study was pursued to ascertain the effect of each gene and its alleles on starch quality.

Search in SNP databases and discovery of polymorphisms

First, a range of databases such as OryzaSNP (http://oryzasnp.plantbiology.msu.edu/) were interrogated with BLAST (http://blast.ncbi.nlm.nih.gov/) to identify the previously reported SNPs in the targeted genes. In total, 399 SNPs were detected in 18 starch-related genes in database records. In contrast, sequencing 233 Australian rice breeding lines resulted in the detection of 501 SNPs, 102 more than were available in the public domain, and 113 indels. One of the advantages of this approach was the capacity to detect indels, none of which were recorded in public databases. However, all indels detected resided in introns and therefore had no obvious impact on gene function. Of the 501 SNPs, only 75 (~15%) were nonsynonymous, leading to amino acid changes. This study clearly demonstrated that massively parallel sequencing (MPS) in combination with long-range PCR (LR-PCR) allows analysis of many candidate genes and ensures high sequence depth at all loci (Pettersson et al., 2009).
There are a number of available databases which curate DNA polymorphism data that can be converted to DNA markers. Much time, money and intellectual energy has been expended in building and maintaining these databases, which are now being rendered redundant by MPS technologies. MPS allows markers to be easily developed for any population of interest at reasonable cost. This means questions can now be tailored for each population/species and answers provided with much greater precision than was possible with marker information derived from unrelated germplasm.

The error rate of the Illumina GAIIx and bias in coverage are challenges for this method and for the accuracy of the data. The error rate is reportedly about 0.5-1.0% (Out et al., 2009). In this experiment, 233 DNA samples were pooled, which means that the detection of one SNP (variant) out of 233 corresponds to a SNP frequency of ~0.43%, which is lower than the reported error rate. However, this error rate corresponds to single reads and does not take into account high coverage and the creation of a consensus sequence. The coverage at each base pair reported in this thesis was so high, generally ranging from 12,000× to 38,000× and in one gene reaching 240,000×, that errors could be identified and screened out by imposing a minimum SNP requirement. This has been discussed by Out et al. (2009), who reported strong correlations between allele frequencies, pool size, coverage and error rates in Illumina GAIIx sequencing. They demonstrated that coverage of 25,000× would be sufficient for detection of SNPs with frequencies at or above 0.3%. High coverage ensured that Illumina GAIIx platform error was neutralised in this experiment.

Screening of functional SNPs

Since the emergence of high-throughput whole genome sequencing technologies (Henry, 2008), it has not been possible to readily recognize functional SNPs in a pool of DNA sequence data which also contains neutral SNPs (George Priya Doss et al., 2008).
Computational algorithms are useful and cost-effective tools for analysis of SNPs and genes. Recently, SNP-linkage disequilibrium and association studies, which need accurate phenotypic data from appropriate populations, have gained acceptance as procedures to assess functional SNPs (Carlson et al., 2003). However, these populations can be difficult to generate (Gupta et al., 2005), and they must have high variation in the studied traits. In this thesis, a computational screening pathway was developed to prioritize and rank plant SNPs to predict their functionality and impact on plant phenotypes. This showed there are significant numbers of important elements in the GBSSI gene, some of which have a strong association with starch physicochemical properties (Soussi et al., 2006). Based on computational analysis, the [C/A] SNP at exon 6 [OryzaSNP2] and the ‘C/T’ SNPs [OryzaSNP3] and [OryzaSNP6] at exons 9 and 10 of GBSSI were the most influential SNPs in this population. Larkin and Park (2003) verified that haplotypes composed of SNPs at the exon 1/intron 1 boundary site, exon 6 and exon 10 regulate GBSSI function. Chen et al. (2008a and 2008b) have also confirmed that these SNPs can alter the apparent amylose content and pasting properties of rice. The effect of the [C/A] SNP at exon 6 on amylose content and grain quality has been confirmed by many authors (Sano, 1984; Larkin and Park, 2003; Chen et al., 2008a). In silico analysis with FAS-ESS has also suggested another important silencer (ESS1) at the splice site of exon 1/intron 1, which has a [G/T] SNP [OryzaSNP1]. The significance of this SNP, which reduces amylose content, was confirmed by Cai et al. (1997) and in the association study (Chapter 4). With the advent of whole genome sequencing, this suggests that computational analysis of whole genome data is a means by which identification of important polymorphisms can be accelerated.
Gene copy number in the rice genome

In polyploid species which have two or more genomes, such as wheat and brassica, there are as many copies of each gene as there are haploid genomes. Tetraploids have at least two copies of each gene, hexaploids have three copies, and so on. Rice is a diploid species with mostly one copy of each gene. Some genes, such as the starch synthases, exist as gene families. The rice genome has been fully sequenced and annotated and an explicit list of genes is available. In this study, there were 17 different genes, some of which were similar in sequence, such as the starch synthase genes SSI, SSIIa, SSIIb etc. However, this gene family has been extensively characterised at both the gene and enzyme level, so each member of the gene family is now uniquely identified. In addition, these genes are located on different chromosomes, a means by which they are further separated.

Multiplexed MALDI-TOF Mass Spectrometry markers help to genotype individuals in a cost effective manner

Multiplex MALDI-TOF is a powerful tool for the detection and confirmation of SNPs in rice. It has been suggested that this platform has the capability of determining more than 40 SNPs in multiplex (Sequenom, 2006) and, given that the platform can process ten 384-well plates per day, users can theoretically analyse in excess of 153 000 SNPs daily (Perkel, 2008). This technique can be applied to segregating populations in the early stages of breeding programs to select desired polymorphisms and traits, and is a co-dominant system, having the ability to detect alleles in hybrids, heterozygotes (Jones et al., 2007) and polyploids (Henry et al., 2008). After recognition and prioritization of important SNPs, it was essential to find an appropriate way to genotype 110 functional SNPs in 233 individuals. Based on regular sequencing methods, at least 110 × 233 = 25,630 assays were needed, which would be too elaborate and expensive.
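The arithmetic behind these figures can be laid out explicitly. The sketch below recomputes the number of single-SNP assays and the theoretical daily throughput quoted above and, as an added illustration, the number of wells needed at a given plex level. The function names are hypothetical, and the well count assumes that SNPs can be packed into full multiplexes, which is an idealisation: assay design constraints limit which SNPs can be combined in practice.

```python
import math

N_SNPS = 110           # functional SNPs to genotype (from the text)
N_LINES = 233          # rice breeding lines (from the text)
WELLS_PER_PLATE = 384  # plate format quoted in the text
PLATES_PER_DAY = 10    # platform capacity quoted in the text
MAX_PLEX = 40          # SNPs per well suggested for the platform


def single_assays(n_snps: int, n_lines: int) -> int:
    """Assays needed when every SNP is typed separately in every line."""
    return n_snps * n_lines


def daily_throughput(plates: int, wells: int, plex: int) -> int:
    """Theoretical SNP calls per day at a given plex level."""
    return plates * wells * plex


def wells_needed(n_snps: int, n_lines: int, plex: int) -> int:
    """Wells needed if SNPs pack into full multiplexes (idealised assumption)."""
    return math.ceil(n_snps / plex) * n_lines


print(single_assays(N_SNPS, N_LINES))                               # 25630
print(daily_throughput(PLATES_PER_DAY, WELLS_PER_PLATE, MAX_PLEX))  # 153600
print(wells_needed(N_SNPS, N_LINES, 8))                             # 3262
print(wells_needed(N_SNPS, N_LINES, MAX_PLEX))                      # 699
```

Even at the eight-plex level used in this work, the idealised well count is roughly an eighth of the number of single-SNP assays, which is the cost argument made in the text.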
In this thesis, it was demonstrated that DNA polymorphisms can be efficiently confirmed and analysed in rice using a MALDI-TOF mass spectrometry system. These assays can be used as a marker-assisted selection tool in conventional breeding programs. For this reason, an 8-plex assay was designed to check the suitability of some multiplexed MALDI-TOF SNP-specific markers for the first time in plants. The results showed that optimal conditions can be achieved and that the method can be efficiently used in the genotyping of rice individuals in association studies (Chapter 4). The only drawback of this system was the inefficient recognition of large indels. However, this study found no indels in protein coding sequences, suggesting that indels are of minor importance in terms of trait determination generally. In the specific case of this rice breeding population, there were no functional indels and they therefore did not influence these data.

Association between SNPs in starch biosynthesis genes and the nutritional and functional properties of domesticated rice

The main aim of this thesis was to find associations between SNPs in different starch-related genes and rice physicochemical properties. For this reason, 110 functional SNPs derived from database searches and direct sequencing (Chapter 2) were chosen and then genotyped in 233 Australian rice lines using multiplexed MALDI-TOF (Chapter 6). In total, 65 SNPs were successfully genotyped in 233 breeding lines. No polymorphism was detected for AGPS2b, SPHOL, SSIIb, SSIVb and ISA1 (Yan et al., 2010). Moreover, there was no association between BEIIa and BEIIb (Fisher et al., 1993; Sun et al., 1997; Sun et al., 1998; Yamakawa et al., 2008) and any physicochemical properties in rice. Therefore, seven genes out of eighteen did not contribute to the physicochemical properties of this population.
Despite the existence of some reports at the gene level of the importance of these genes, no association between any of these genes and quality properties was found (Kawagoe et al., 2005; Tetlow et al., 2004). This may be because almost all of these studies have been based on artificially induced mutants that abolish enzyme activity (Rolletschek et al., 2002). The data here suggest that artificially induced mutations, such as indels, which abolish gene function may have utility in understanding the role of particular genes and enzymes in starch biosynthesis, but provide little guidance as to which genes are important in the natural system.

In contrast, GBSSI and SSIIa are major determinants of the important grain quality properties amylose content and gelatinization temperature. Highly significant associations were found between GBSSI and retrogradation and amylose content, in addition to more significant relationships with RVA properties such as BDV, SBV and FV (Chen et al., 2008a, b; Cai et al., 1998). Larkin and Park (2003) have already reported that a SNP in exon 6 affects amylose content. This study confirms that the T/G SNP at the exon 1/intron 1 junction site has a major influence on a number of physicochemical properties.

SSIIa presented very high association with pasting temperature, gelatinization temperature and peak time (Umemoto et al., 2004; Umemoto et al., 2008). Umemoto and Aoki (2005) explained the alkali disintegration and eating quality of rice starch by polymorphism at two SNPs, [A/G] and [GC/TT]. These SNPs within exon 8 of the alk locus are also significantly associated with gelatinisation temperature (GT) (Waters et al., 2006). We also confirmed a very significant association between the GC/TT SNP at exon 8 of SSIIa and pasting temperature (R² = 0.642).
In this thesis, six genes, GBSSII, SSI, SSIIIa, SSIIIb, SSIVa and BEI, had low to medium effects on the final phenotypic variation of individuals, so these genes were called contributory (Dian et al., 2003; Fujita et al., 2006; Hirose et al., 2006; Umemoto et al., 2008). Here, for the first time, SSIIIb and SSIVa were identified as PT-associated genes with medium to high levels of association with pasting and/or gelatinization temperature. A statistically high R² value of 0.315 was calculated between a T/G SNP at position 7232 of SSIIIb and pasting temperature.

Analysis of this population identified a mixture of genes previously known to have a major impact on rice starch quality, GBSSI and SSIIa, and a set of genes which play relatively minor roles. It is possible that the minor genes make a contribution to starch quality in the Australian rice growing environment only, and that other sets of genes are important in other environments. This may be a blueprint for future approaches to developing molecular markers for plant breeding. In the past, the relative paucity of data has meant that only genes of major effect could be pursued in plant breeding. It is now possible to identify more easily environmentally affected genes of small effect which, in combination, may have a significant impact on traits of interest.

The glucose 6-phosphate/phosphate translocator (GPT1) may contribute to resistant starch

The glucose 6-phosphate/phosphate translocator (GPT1) imports carbon into non-photosynthetic plastids (Kammerer et al., 1998) and appears to be a key gene controlling retrogradation degree in rice. The results of this study revealed that high amylose rice cultivars, characterized by low major RVA parameters, such as peak viscosity, hot paste viscosity and cool paste viscosity, had more resistant starch and a lower estimated glycaemic index (GI). When the retrogradation degree is higher, the starch is more resistant to digestion and the GI is lower (Hu et al., 2004).
In this study, a significant association was found between a T/C SNP at position 1188 of GPT1, which causes the Leu42Phe substitution, and both resistant-retrograded starch and amylose content. Amylose content is believed to have a significant influence on retrogradation rate (Hu et al., 2004), but some studies show that these two important starch properties might act independently of one another. Despite a correlation coefficient of 0.70 between amylose content and retrogradation degree in this study, the conclusion here is that these two traits act independently while exerting some influence on each other.
Conclusion and future directions
I conclude that there are three different gene categories affecting starch quality in Australian rice germplasm. First, there are genes with major effects, such as GBSSI and SSIIa, which greatly impact starch characteristics. Any new SNP found in these genes can significantly influence starch phenotype. The second category contains genes with an intermediate impact on starch properties. In this thesis, I named this category contributory genes, which act together with the major genes to shape the final starch composition. Any variation in these genes has a low to intermediate impact on starch. I suggest that six genes, GBSSII, SSI, SSIIIa, SSIIIb, SSIVa and BEI, belong to this category. Finally, there are genes, such as the debranching enzyme genes, which had minor or no impact on starch physicochemical properties. It should be noted that contributory and minor genes may affect starch phenotypic variation differently in different germplasm and environments. Environmental factors may have a major influence on plant growth, starch genes, enzyme activities and traits and, therefore, on the results of any association study.
Do all roads lead to GBSSI and SSIIa?
This study suggests that the primary determinants of rice starch quality are GBSSI and SSIIa. However, there might be some other genes in the genome that have a significant impact on starch quality.
In this thesis, I suggest that GPT1 is one of these novel, important contributing genes. Expanding the collection of known alleles of starch synthesis structural genes through whole genome sequencing, and associating these alleles with starch traits, will improve resolution of the interactions in the starch gene network. Whole genome sequencing of up to 3000 cultivars is underway. When complete, associations between starch gene alleles and starch quality parameters will help us to understand the role of natural variation in starch genes in determining starch quality.
Protein quantity influences rice eating quality. However, does protein composition influence rice eating quality? Protein bodies (PBs) may be an important factor influencing rice composition and quality. PBs reside among starch granules and their distribution may have a significant impact on starch quality. There are two types of protein bodies: PB-I, which are prolamins and constitute 18-20% of grain protein, of which there are several subunits of known amino acid/gene sequence, and PB-II, which are glutelins and constitute 70-80% of grain protein. There has been much activity investigating the mechanism of storage protein synthesis and relatively little investigating the quantity and amino acid composition of the different storage protein classes in mature grain. A possible path for assessing the role of proteins in rice quality may involve assembling a panel of rice genotypes with known differences in eating quality parameters, as measured by taste panel, and associating these differences with protein body subunit concentration (proteomics). Protein body subunit composition could be confirmed through whole genome sequencing and assembly to known subunit gene sequences, and correlations/associations of protein body subunit concentration and composition with eating quality parameters could then be tested.
Regardless of whether the target for further investigation of rice grain quality is starch or protein, high throughput sequencing applied to structured genetic populations will prove to be a powerful tool which will be invaluable in determining the contribution of each of these entities to grain quality.
References
Ahmadian A, Gharizadeh B, Gustafsson AC, Sterky F, Nyrén P, Uhlén M and Lundeberg J (2000) Single-nucleotide polymorphism analysis by pyrosequencing. Anal. Biochem. 280, 103–110. Akey JM, Zhang G, Zhang K, Jin L and Shriver MD (2002) Interrogating a high-density SNP map for signatures of natural selection. Genome. Res. 12, 1805-1814. Andriotis VME, Pike MJ, Bunnewell S, Hills MJ and Smith AM (2010) The plastidial glucose 6 phosphate/phosphate antiporter GPT1 is essential for morphogenesis in Arabidopsis embryos. Plant. J. 64 (1) 128-139. Andriotis VME, Pike MJ, Kular B, Rawsthorne S and Smith AM (2010) Starch turnover in developing oilseed embryos. New Phytol. 187:791-804. Asp NG and Björck I (1992) Resistant starch. Trends. Food. Sci. Tech. 3:111-114. Baldwin PM (2001) Starch Granule Associated Proteins and Polypeptides: A Review. Starch-Stärke 53:475-503. Ball SG and Morell MK (2003) From bacterial glycogen to starch: Understanding the biogenesis of the plant starch granule. Annu. Rev. Plant. Biol. 54(1): 207-233. Barreiro LB, Laval G, Quach H, Patin E and Quintana-Murci L (2008) Natural selection has driven population differentiation in modern humans. Nat. Genet. 40, 340-345. Batley J, Mogg R, Edwards D, O’Sullivan H and Edwards KJ (2003) A high-throughput SNuPE assay for genotyping SNPs in the flanking regions of Zea mays sequence tagged simple sequence repeats. Mol. Breed. 11, 111–120. Beatty MK, Rahman A, Cao H, Woodman W, Lee M, Myers AM and James MG (1999) Purification and molecular genetic characterization of ZPU1, a pullulanase-type starch-debranching enzyme from maize. Plant. Physiol. 119, 255-266.
Bentley DR (2006) Whole-genome re-sequencing. Curr. Opin. Genet. Dev. 16, 545-552. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL and Bignell HR (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 456:53-59. Bhattramakki D, Dolan M, Hanafey M, Wineland R, Vaske D, Register JC, Tingey SV and Rafalski A (2002) Insertion–deletion polymorphisms in 3′ regions of maize genes occur frequently and can be used as highly informative genetic markers. Plant Mol. Biol. 48, 539–547. Bodmer W and Bonilla C (2008) Common and rare variants in multifactorial susceptibility to common diseases. Nat. Genet. 40, 695-701. Bork P and Koonin EV (1998) Predicting functions from protein sequences—where are the bottlenecks? Nat. Genet. 18(4): 313-318 Boyer CD and Preiss J (1978) Multiple forms of starch branching enzyme of maize: evidence for independent genetic control. Biochem. Biophys. Res. Commun. 80, 169-175. Bradbury LMT, Fitzgerald TL, Henry RJ, Jin Q and Waters DLE (2005) The gene for fragrance in rice. Plant Biotechnol. J. 3, 363–370. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y and Buckler ES (2007) TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 23:2633-2635. Brookes AJ (1999) The essence of SNPs. Gene. 234 (2): 177-186 Bryan GT, Wu KS, Farrall L, Jia Y, Hershey HP, McAdams SA, Faulk KN, Donaldson GK, Tarchini R, and Valent B (2000) A single amino acid difference distinguishes resistant and susceptible alleles of the rice blast resistance gene Pi-ta. Plant. Cell. Online. 12(11): 2033-2046 Buléon A, Colonna P, Planchot V and Ball S (1998) Starch granules: structure and biosynthesis. Int. J. Biological. Macromol. 23:85-112. Bulyk ML (2004) Computational prediction of transcription-factor binding site locations. Genome. Biol.
5(1): 201-201 Buschiazzo A, Ugalde JE, Guerin ME, Shepard W, Ugalde RA and Alzari PM (2004) Crystal structure of glycogen synthase: homologous enzymes catalyze glycogen synthesis and degradation. EMBO. J. 23(16): 3196 Bustos R, Fahy B, Hylton CM, Seale R, Nebane NM, Edwards A, Martin C and Smith AM (2004) Starch granule initiation is controlled by a heteromultimeric isoamylase in potato tubers. Proc. Natl. Acad. Sci. USA. 101, 2215-2220. Cai XL, Wang ZY, Zheng FQ and Hong MM (1997) Regulation-related Intron in 5' Untranslated Region of Rice Waxy Gene. Acta. Phytophysio. Sini. 23: 257-261 Cai XL, Wang ZY, Xing YY, Zhang JL and Hong MM (1998) Aberrant splicing of intron 1 leads to the heterogeneous 5' UTR and decreased expression of waxy gene in rice cultivars of intermediate amylose content. Plant. J. 14(4): 459-465 Cao H, Imparl-Radosevich J, Guan H, Keeling PL, James MG and Myers AM (1999) Identification of the soluble starch synthase activities of maize endosperm. Plant. Physiol. 120, 205-216. Carlson CS, Eberle MA, Rieder MJ, Smith JD, Kruglyak L and Nickerson DA (2003) Additional SNPs and linkage-disequilibrium analyses are necessary for whole-genome association studies in humans. Nat. Genet. 33(4): 518-521 Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, Warthmann N, Hu TT, Fu G and Hinds DA (2007) Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science. 317:338 Craig J, Lloyd JR, Tomlinson K, Barber L, Edwards A, Wang TL, Martin C, Hedley CL and Smith AM (1998) Mutations in the gene encoding starch synthase II profoundly alter amylopectin structure in pea embryos. Plant. Cell. Online. 10, 413-426. Chen X and Sullivan PF (2003) Single nucleotide polymorphism genotyping: biochemistry, protocol, cost and throughput. Pharmacogenomics. J. 3(2): 77-96 Chen MH, Bergman C and Fjellstrom R (2004) Waxy locus genetic variation associated with amylose content in international rice germplasm.
In: 30th Proceedings of the Rice Technical Working Group Meeting. Beaumont, Texas, USA Chen MH, Bergman C, Pinson S and Fjellstrom R (2008a) Waxy gene haplotypes: Associations with apparent amylose content and the effect by the environment in an international rice germplasm collection. J. Cereal. Sci. 47(3): 536-545 Chen MH, Bergman CJ, Pinson S and Fjellstrom R (2008b) Waxy gene haplotypes: Associations with pasting properties in an international rice germplasm collection. J. Cereal. Sci. 48(3): 781-788 Chothia C and Lesk AM (1986) The relation between the divergence of sequence and structure in proteins. EMBO. J. 5(4): 823-826 Chung HJ, Lim HS and Lim ST (2006) Effect of partial gelatinization and retrogradation on the enzymatic digestion of waxy rice starch. J. Cereal. Sci. 43:353-359. Commuri PD and Keeling PL (2001) Chain-length specificities of maize starch synthase I enzyme: studies of glucan affinity and catalytic properties. Plant. J. 25:475-486. Costabile M, Quach A and Ferrante A (2006) Molecular approaches in the diagnosis of primary immunodeficiency diseases. Hum. Mutat. 27, 1163–1173. Dayong X, Jun J, Suyun H, Xiehong W, Yun G and Qingsen Z (2004) Effects of N, P and K fertilizer amount on rice grain amylose content and starch viscosity properties. Chinese. Agri. Sci. Bull. 20:99-99. Debet MR and Gidley MJ (2007) Why do gelatinized starch granules not dissolve completely? Roles for amylose, protein, and lipid in granule “ghost” integrity. J. Agri. Food. Chem. 55:4752-4760. Dian W, Jiang H, Chen Q, Liu F and Wu P (2003) Cloning and characterization of the granule-bound starch synthase II gene in rice: gene expression is regulated by the nitrogen level, sugar and circadian rhythm. Planta. 218, 261-268. Dian W, Jiang H and Wu P (2005) Evolution and expression analysis of starch synthase III and IV in rice. J. Exp. Bot., 56, 623-632.
Ding C and Cantor CR (2003) A high-throughput gene expression analysis technique using competitive PCR and matrix-assisted laser desorption ionization time-of-flight MS. Proc. Natl. Acad. Sci. USA. 100:3059-3064. Dinges JR, Colleoni C, James MG and Myers AM (2003) Mutational analysis of the pullulanase-type debranching enzyme of maize indicates multiple functions in starch metabolism. Plant. Cell. Online. 15, 666-680. Doehlert DC and Knutson CA (1991) Two classes of starch debranching enzymes from developing maize kernels. Jahresheft der Albrecht-Thaer-Gesellschaft (Germany). 138(5) 566-572 Dohm JC, Lottaz C, Borodina T and Himmelbauer H (2008) Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic. Acids. Res. 36 (16) e105 Domon E, Saito A and Takeda K (2002) Comparison of the waxy locus sequence from a nonwaxy strain and two waxy mutants of spontaneous and artificial origins in barley. Genes. Genet. Sys. 77:351-359. Druley TE, Vallania FLM, Wegner DJ, Varley KE, Knowles OL, Bonds JA, Robison SW, Doniger SW, Hamvas A and Cole FS (2009) Quantification of rare allelic variants from pooled genomic DNA. Nat. Methods. 6:263-265. Drenkard E, Richter BG, Rozen S, Stutius LM, Angell NA, Mindrinos M, Cho RJ, Oefner PJ, Davis RW and Ausubel FM (2000) A simple procedure for the analysis of single nucleotide polymorphisms facilitates map-based cloning in Arabidopsis. Plant. Physiol. 124, 1483–1492. Eagles HA, Cane K, Appelbee M, Kuchel H, Eastwood RF and Martin PJ (2012) The storage protein activator gene Spa-B1 and grain quality traits in southern Australian wheat breeding programs. Crop and Pasture Sci. 63(4) 311-318. Eastmond PJ and Rawsthorne S (2000) Coordinate changes in carbon partitioning and plastidial metabolism during the development of oilseed rape embryos. Plant. Physiol. 122:767-774. Edwards D, Forster JW, Cogan NOI, Batley J and Chagné D (2007) Chapter 4: Single nucleotide polymorphism discovery in plants.
Association mapping in plants Springer, New York: 53–76 ElSharawy A, Manaster C, Teuber M, Rosenstiel P, Kwiatkowski R, Huse K, Platzer M, Becker A, Nurnberg P and Schreiber S (2006) SNPSplicer: systematic analysis of SNP-dependent splicing in genotyped cDNAs. Hum. Mutat. 27(11) 1129-1134 Fairbrother WG and Chasin LA (2000) Human genomic sequences that inhibit splicing. Mol. Cell. Biol. 20(18): 6816-6825 Fairbrother WG, Yeh RF, Sharp PA and Burge CB (2002) Predictive identification of exonic splicing enhancers in human genes. Science 9(297) 1007-1013. Faisant N, Champ M, Colonna P, Buleon A, Molis C, Langkilde A, Schweizer T, Flourie B and Galmiche J (1993) Structural features of resistant starch at the end of the human small intestine. Europ. J. Clin. Nutr. 47:285. Fan J and Marks B (1998) Retrogradation kinetics of rice flours as influenced by cultivar. Cereal. Chem. 75:153-155. Fersht AR (1985) Enzyme structure and function New York, Freeman and Co. USA Fischer K and Weber A (2002) Transport of carbon in non-green plastids. Trends. Plant. Sci., 7, 345-351. Fisher DK, Boyer CD and Hannah LC (1993) Starch branching enzyme II from maize endosperm. Plant. Physiol. 102, 1045-1046. Fitzgerald M (2004) Starch. Rice Chemistry and Technology. 109–141. Fujita N, Kubo A, Suh DS, Wong KS, Jane JL, Ozawa K, Takaiwa F, Inaba Y and Nakamura Y (2003) Antisense inhibition of isoamylase alters the structure of amylopectin and the physicochemical properties of starch in rice endosperm. Plant. Cell. Physiol., 44, 607-618. Fujita N, Yoshida M, Asakura N, Ohdan T, Miyao A, Hirochika H and Nakamura Y (2006) Function and characterization of starch synthase I using mutants in rice. Plant. Physiol. 140:1070. Fujita N, Yoshida M, Kondo T, Saito K, Utsumi Y, Tokunaga T, Nishi A, Satoh H, Park JH and Jane JL (2007) Characterization of SSIIIa-deficient mutants of rice: the function of SSIIIa and pleiotropic effects by SSIIIa deficiency in the rice endosperm. Plant. Physiol. 144, 2009-2023. 
Futschik A and Schlotterer C (2010) The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics. 186, 207-218. Gao M, Fisher DK, Kim KN, Shannon JC and Guiltinan MJ (1997) Independent genetic control of maize starch-branching enzymes IIa and IIb (Isolation and characterization of a Sbe2a cDNA). Plant. Physiol. 114, 69-78. Gao ZY, Zeng DL, Cui X, Zhou YH, Yan MX, Huang DN, Li JY and Qian Q (2003) Map-based cloning of the ALK gene, which controls the gelatinization temperature of rice. Sci. Chin. Ser. C, Life Sci. 46, 661–668. Garg K, Green P and Nickerson DA (1999) Identification of candidate coding region single nucleotide polymorphisms in 165 human genes using assembled expressed sequence tags. Genome. Res. 9, 1087–1092. George Priya Doss C, Sudandiradoss C, Rajasekaran R, Choudhury P, Sinha P, Hota P and Batra UP (2008) Applications of computational algorithm tools to identify functional SNPs. Funct. Integr. Genomic. 8(4) 309-316 Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P and Varma H (2002) A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica). Science 296:92-100. Gunderson KL, Steemers FJ, Ren H, Ng P, Zhou L, Tsan C, Chang W, Bullis D, Musmacker J and King C (2006) Whole-genome genotyping. Methods. Enzymol. 410, 359–376. Gupta PK, Roy JK and Prasad M (2001) Single nucleotide polymorphisms: a new paradigm for molecular marker technology and DNA polymorphism detection with emphasis on their use in plants. Curr. Sci. 80, 524–535. Gupta PK, Rustgi S and Kulwal PL (2005) Linkage disequilibrium and association studies in higher plants: present status and future prospects. Plant Mol Biol 57(4): 461-485 Hanashiro I, Itoh K, Kuratomi Y, Yamazaki M, Igarashi T, Matsugasako J and Takeda Y (2008) Granule-bound starch synthase I is responsible for biosynthesis of extra-long unit chains of amylopectin in rice. Plant. Cell. Physiol. 49:925.
Harismendy O and Frazer K (2009) Method for improving sequence coverage uniformity of targeted genomic intervals amplified by LR-PCR using Illumina GA sequencing-by-synthesis technology. Biotechniques. 46, 229-231. Harn C, Knight M, Ramakrishnan A, Guan H, Keeling PL and Wasserman BP (1998) Isolation and characterization of the zSSIIa and zSSIIb starch synthase cDNA clones from maize endosperm. Plant. Mol Biol. 37:639-649. Hayashi K, Hashimoto N, Daigen M and Ashikawa I (2004) Development of PCR-based SNP markers for rice blast resistance genes at the Piz locus. Theor. Appl. Genet. 108, 1212–1220. Hebsgaard SM, Korning PG, Tolstrup N, Engelbrecht J, Rouze P and Brunak S (1996) Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic. Acids. Res. 24(17): 3439-3452 Hegyi H and Gerstein M (1999) The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol. 288(1) 147-164 Heinemeyer T, Wingender E, Reuter I, Hermjakob H, Kel AE, Kel OV, Ignatieva EV, Ananko EA, Podkolodnaya OA and Kolpakov FA (1998) Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. Nucleic. Acids. Res. 26(1): 362-367 Henry RJ (2008) Future prospects for plant penotyping, (ed), Plant genotyping II: SNP technology, CABI Publishing, Wallingford, UK, pp. 272-280. Henry RJ, Pattemore JA, Waters DLE, Kharabian-Masouleh A, Bundock PC and Eliott FG (2008) Applications of the sequenom platform to SNP analysis in plants. In: Plant and Animal Genomes XVI Conference. San Diego, CA: Town and Country Convention Center. Hillier LDW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, Fox P, Glasscock JI, Hickenbotham M and Huang W (2008) Whole-genome sequencing and variant discovery in C. elegans. Nat. Methods. 5:183-188. Hirose T and Terao T (2004) A comprehensive expression analysis of the starch synthase gene family in rice (Oryza sativa L.). Planta. 220, 9-16.
Hirose T, Ohdan T, Nakamura Y and Terao T (2006) Expression profiling of genes related to starch synthesis in rice leaf sheaths during the heading period. Physiol. Plant. 128, 425-435. Hodges E, Xuan Z, Balija V, Kramer M, Molla MN, Smith SW, Middle CM, Rodesch MJ, Albert TJ and Hannon GJ (2007) Genome-wide in situ exon capture for selective resequencing. Nat. Genet. 39, 1522-1527. Hovenkamp-Hermelink JHM, Jacobsen E, Ponstein AS, Visser RGF, Vos-Scheperkeuter GH, Bijmolt EW, Vries JN, Witholt B and Feenstra WJ (1987) Isolation of an amylose-free starch mutant of the potato (Solanum tuberosum L.). Theor. Appl. Genet. 75:217-221. Hrmova M and Fincher GB (2001) Structure-function relationships of ß-D-glucan endo-and exohydrolases from higher plants. Plant. Mol. Biol. 47(1): 73-91 Hu P, Zhao H, Duan Z, Linlin Z and Wu D (2004) Starch digestibility and the estimated glycemic score of different types of rice differing in amylose contents. J. Cereal. Sci. 40:231-237. Hutchings D, Rawsthorne S and Emes MJ (2005) Fatty acid synthesis and the oxidative pentose phosphate pathway in developing embryos of oilseed rape (Brassica napus L.). J. Exp. Bot. 56:577 Imelfort M, Duran C, Batley J and Edwards D (2009) Discovering genetic polymorphisms in next generation sequencing data. Plant Biotechnol J. 7:312-317. Imparl-Radosevich JM, Nichols DJ, Li P, McKean AL, Keeling PL and Guan H (1999) Analysis of purified maize starch synthases IIa and IIb: SS isoforms can be distinguished based on their kinetic properties. Arch. Biochem. Biophys. 362:131-138. Imparl-Radosevich JM, Gameon JR, McKean A, Wetterberg D, Keeling PL and Guan H (2003) Understanding catalytic properties and functions of maize starch synthase isozymes. J. Appl. Glycoscience. 50:177-182.
Ingman M and Gyllensten U (2008) SNP frequency estimation using massively parallel sequencing of pooled DNA. Eur. J. Hum. Genet. 17, 383-386. Ishikawa N, Ishihara J and Itoh M (1995) Artificial induction and characterization of amylose-free mutants of barley. Barley. Genet. Newsl. 24:49-53. Isshiki M, Morino K, Nakajima M, Okagaki RJ, Wessler SR, Izawa T and Shimamoto K (1998) A naturally occurring functional allele of the rice waxy locus has a GT to TT mutation at the 5' splice site of the first intron. Plant. J. 15:133-138. James MG, Denyer K and Myers AM (2003) Starch synthesis in the cereal endosperm. Curr. Opin. Plant Biol. 6, 215-222. Jeong SC, Kristipati S, Hayes AJ, Maughan PJ, Noffsinger SL, Gunduz I, Buss GR and Maroof MAS (2002) Genetic and sequence analysis of markers tightly linked to the soybean mosaic virus resistance gene, Rsv 3. Crop. Sci. 42, 265–270. Jiang H, Dian W, Liu F and Wu P (2003) Cloning and characterization of a glucose 6-phosphate/phosphate translocator from Oryza sativa. J. Zhejiang. Univ-Sci. A., 4, 331-335. Jones ES, Sullivan H, Bhattramakki D and Smith JSC (2007) A comparison of simple sequence repeat and single nucleotide polymorphism marker technologies for the genotypic analysis of maize (Zea mays L.). Theor. Appl. Genet. 115:361-371. Juliano BO, Onate LU and Del Mundo AM (1965) Relation of starch composition, protein content, and gelatinization temperature to cooking and eating qualities of milled rice. Food. Technol. 19. Kaiser J (2008) DNA sequencing: A plan to capture human diversity in 1000 Genomes. Science. 319:395. Kammerer B, Fischer K, Hilpert B, Schubert S, Gutensohn M, Weber A and Flügge UI (1998) Molecular characterization of a carbon transporter in plastids from heterotrophic tissues: the glucose 6-phosphate/phosphate antiporter. Plant. Cell. Online. 10:105.
Kawagoe Y, Kubo A, Satoh H, Takaiwa F and Nakamura Y (2005) Roles of isoamylase and ADP glucose pyrophosphorylase in starch granule synthesis in rice endosperm. Plant. J. 42, 164-174. Kennedy BG, Waters DLE, Henry RJ (2006) Screening for the rice blast resistance gene Pi-ta using LNA displacement probes and real-time PCR. Mol Breeding 18(3): 185-193 Kharabian-Masouleh A, Waters DLE, Reinke RF and Henry RJ (2011) Discovery of polymorphisms in starch related genes in rice germplasm by amplification of pooled DNA and deeply parallel sequencing. Plant. Biotechnol. J. 9 (9):1074-1085. Kiesselbach TA (1944) Character, field performance, and commercial production of waxy corn. J. Am. Soc. Agron. 36:668-682. Kim JI, Ju YS, Park H, Kim S, Lee S, Yi JH, Mudge J, Miller NA, Hong D and Bell CJ (2009) A highly annotated whole-genome sequence of a Korean individual. Nature. 460:1011-1015. Kubo A, Fujita N, Harada K, Matsuda T, Satoh H, Nakamura Y (1999) The starch-debranching enzymes isoamylase and pullulanase are both involved in amylopectin biosynthesis in rice endosperm. Plant. Physiol. 121(2): 399-410 Kuipers AGJ, Jacobsen E and Visser RGF (1994) Formation and deposition of amylose in the potato tuber starch granule are affected by the reduction of granule-bound starch synthase gene expression. Plant. Cell. Online. 6:43. Kuriki T, Stewart DC and Preiss J (1997) Construction of chimeric enzymes out of maize endosperm branching enzymes I and II. J. Biol. Chem. 272, 28999-29004. Larkin PD and Park WD (2003) Association of waxy gene single nucleotide polymorphisms with starch characteristics in rice (Oryza sativa L.). Mol. Breeding. 12(4): 335-339 Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26(2): 283-291 Li Z, Rahman S, Kosar-Hashemi B, Mouille G, Appels R and Morell MK (1999) Cloning and characterization of a gene encoding wheat starch synthase I. Theor. Appl.
Genet. 98:1208-1216. Li Z, Chu X, Mouille G, Yan L, Kosar-Hashemi B, Hey S, Napier J, Shewry P, Clarke B and Appels R (1999) The localization and expression of the class II starch synthases of wheat. Plant. Physiol. 120:1147. Li Z, Mouille G, Kosar-Hashemi B, Rahman S, Clarke B, Gale KR, Appels R and Morell MK (2000) The structure and expression of the wheat starch synthase III gene. Motifs in the expressed gene define the lineage of the starch synthase III gene family. Plant. Physiol. 123:613. Libessart N, Maddelein ML, Koornhuyse NV, Decq A, Delrue B, Mouille G, D'Hulst C and Ball S (1995) Storage, photosynthesis, and growth: the conditional nature of mutations affecting starch synthesis and structure in Chlamydomonas. Plant. Cell. Online. 7:1117. Limpisut P and Jindal VK (2002) Comparison of rice flour pasting properties using Brabender Viscoamylograph and Rapid Visco Analyser for evaluating cooked rice texture. Starch-Stärke. 54:350-357. Livak KJ (1999) Allelic discrimination using fluorogenic probes and the 5′ nuclease assay. Genetic analysis: Biomol. Eng. 14, 143–149. Lumdubwong N and Seib P (2000) Rice starch isolation by alkaline protease digestion of wet-milled rice flour. J. Cereal. Sci. 31:63-74. Mardis ER (2008a) The impact of next-generation sequencing technology on genetics. Trends. Genet. 24, 133-141. Mardis ER (2008b) Next-generation DNA sequencing methods. Annu. Rev. Genom. Hum. G. 9 387-402. Marshall W, Normand F and Goynes W (1990) Effects of lipid and protein removal on starch gelatinization in whole grain milled rice. Cereal. Chem. 67:458-463. Masouleh AK, Waters DLE, Reinke RF and Henry RJ (2009) A high-throughput assay for rapid and simultaneous analysis of perfect markers for important quality and agronomic traits in rice using multiplexed MALDI-TOF mass spectrometry. Plant. Biotechnol. J. 7, 355-363. McGuigan FE and Ralston SH (2002) Single nucleotide polymorphism detection: allelic discrimination using TaqMan. Psychiatr. Genet.
12, 133–136. McNally KL, Childs KL, Bohnert R, Davidson RM, Zhao K, Ulat VJ, Zeller G, Clark RM, Hoen DR and Bureau TE (2009) Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. Proceedings of the National Academy of Sciences 106:12273. Mikami I, Uwatoko N, Ikeda Y, Yamaguchi J, Hirano HY, Suzuki Y and Sano Y (2008) Allelic diversification at the wx locus in landraces of Asian rice. Theor. Appl. Genet. 116(7): 979-989 Miles MJ, Morris VJ, Orford PD and Ring SG (1985) The roles of amylose and amylopectin in the gelation and retrogradation of starch. Carbohydrate. Res. 135:271-281. Monna L, Kitazawa N, Yoshino R, Suzuki J, Masuda H, Maehara Y, Tanji M, Sato M, Nasu S and Minobe Y (2002) Positional cloning of rice semidwarfing gene, sd-1: ‘Rice green revolution gene’ encodes a mutant enzyme involved in gibberellin synthesis. DNA. Res. 9, 11–17. Morell MK, Kosar-Hashemi B, Cmiel M, Samuel MS, Chandler P, Rahman S, Buleon A, Batey IL and Li Z (2003) Barley sex6 mutants lack starch synthase IIa activity and contain a starch with novel properties. Plant. J. 34, 173-185. Morozova O and Marra MA (2008) Applications of next-generation sequencing technologies in functional genomics. Genomics. 92, 255-264. Morrison WR (1988) Lipids in cereal starches: A review. J. Cereal. Sci. 8:1-15. Morris CF (2002) Puroindolines: the molecular genetic basis of wheat grain hardness. Plant. Mol. Biol. 48, 633–647. Nakamura T, Vrinten P, Hayakawa K and Ikeda J (1998) Characterization of a granule-bound starch synthase isoform found in the pericarp of wheat. Plant. Physiol. 118, 451-459. Nakamura Y (2002) Towards a better understanding of the metabolic system for amylopectin biosynthesis in plants: rice endosperm as a model tissue. Plant. Cell. Physiol. 43, 718-725.
Nakamura Y, Francisco PB, Hosaka Y, Sato A, Sawada T, Kubo A and Fujita N (2005) Essential amino acids of starch synthase IIa differentiate amylopectin structure and starch quality between japonica and indica rice varieties. Plant Mol. Biol. 58:213-227. Nasu S, Suzuki J, Ohta R, Hasegawa K, Yui R, Kitazawa N, Monna L and Minobe Y (2002) Search for and analysis of single nucleotide polymorphisms (SNPs) in rice (Oryza sativa, Oryza rufipogon) and establishment of SNP markers. DNA Res. 9, 163-171. Ng PC and Henikoff S (2002) Accounting for human polymorphisms predicted to affect protein function. Cold Spring Harbor Laboratory Press, pp. 436-446. Ng PC and Henikoff S (2003) SIFT: Predicting amino acid changes that affect protein function. Nucleic. Acids. Res. 31(13): 3812 Niewiadomski P, Knappe S, Geimer S, Fischer K, Schulz B, Unte US, Rosso MG, Ache P, Flügge UI and Schneider A (2005) The Arabidopsis plastidic glucose 6-phosphate/phosphate translocator GPT1 is essential for pollen maturation and embryo sac development. Plant. Cell. Online. 17:760. Nishi A, Nakamura Y, Tanaka N and Satoh H (2001) Biochemical and Genetic Analysis of the Effects of Amylose-Extender Mutation in Rice Endosperm. Plant. Physiol. 127:459. Novaes E, Drost DR, Farmerie WG, Pappas GJ, Grattapaglia D, Sederoff RR and Kirst M (2008) High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. BMC. Genomics. 9:312. Ohdan T, Francisco Jr PB, Sawada T, Hirose T, Terao T, Satoh H and Nakamura Y (2005) Expression profiling of genes involved in starch synthesis in sink and source organs of rice. J. Exp. Bot. 56, 3229-3244. Olivier M (2005) The Invader® assay for SNP genotyping. Mutat. Res. 573, 103–110. Out AA, van Minderhout I, Goeman JJ, Ariyurek Y, Ossowski S, Schneeberger K, Weigel D, van Galen M, Taschner PEM and Tops CMJ (2009) Deep sequencing to reveal new variants in pooled DNA samples. Hum. Mutat. 30, 1703-1712.
Panlasigui L, Thompson L, Juliano B, Perez C, Yiu S and Greenberg G (1991) Rice varieties with similar amylose content differ in starch digestibility and glycemic response in humans. Am. J. Clin. Nutr. 54:871. Patron NJ, Smith AM, Fahy BF, Hylton CM, Naldrett MJ, Rossnagel BG and Denyer K (2002) The altered pattern of amylose accumulation in the endosperm of low-amylose barley cultivars is attributable to a single mutant allele of granule-bound starch synthase I with a deletion in the 5'-non-coding region. Plant. Physiol. 130:190. Peng S, Huang J, Sheehy JE, Laza RC, Visperas RM, Zhong X, Centeno GS, Khush GS and Cassman KG (2004) Rice yields decline with higher night temperature from global warming. Proc. Natl. Acad. Sci. USA. 101:9971. Perkel J (2008) SNP genotyping: six technologies that keyed a revolution. Nat Methods, 5, 447–453. Pertea M, Lin X, Salzberg SL (2001) GeneSplicer: a new computational method for splice site prediction. Nucleic. Acids. Res. 29(5): 1185-1190 Pesole G and Liuni S (1999) Internet resources for the functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs. Trends. Genet. 15(9): 378-378 Pettersson E, Lundeberg J and Ahmadian A (2009) Generations of sequencing technologies. Genomics. 93, 105-111. Philpot K, Martin M, Butardo Jr V, Willoughby D and Fitzgerald M (2006) Environmental factors that affect the ability of amylose to contribute to retrogradation in gels made from rice flour. J. Agri. Food. Chem. 54:5182-5190. Raemakers K, Schreuder M, Suurs L, Furrer-Verhorst H, Vincken JP, Vetten N, Jacobsen E and Visser RGF (2005) Improved cassava starch by antisense inhibition of granule-bound starch synthase I. Mol. Breeding. 16:163-172. Rafalski A (2002) Applications of single nucleotide polymorphisms in crop genetics. Curr. Opin. Plant. Biol. 5, 94–100. Ragaee S and Abdel-Aal ESM (2006) Pasting properties of starch and protein in selected cereals and quality of their food products. Food. Chem. 95:9-18.
Rahman S, Li Z, Batey I, Cochrane MP, Appels R and Morell M (2000) Genetic alteration of starch functionality in wheat. J. Cereal Sci. 31, 91-110. Rahman S, Regina A, Li Z, Mukai Y, Yamamoto M, Kosar-Hashemi B, Abrahams S and Morell MK (2001) Comparison of starch-branching enzyme genes reveals evolutionary relationships among isoforms. Characterization of a gene for starch-branching enzyme IIa from the wheat D genome donor Aegilops tauschii. Plant. Physiol. 125, 1314-1324. Rajasekaran R, Sudandiradoss C, Doss CGP, Sethumadhavan R (2007) Identification and in silico analysis of functional SNPs of the BRCA1 gene. Genomics. 90(4): 447-452 Rajasekaran R, George Priya Doss C, Sudandiradoss C, Ramanathan K, Rituraj P and Rao S (2008) Computational and structural investigation of deleterious functional SNPs in breast cancer BRCA2 gene. Chinese. J. Biotech. 24(5): 851-856 Rajesh S, Raveendran M and Manickam A (2008) Prediction of 3-dimensional structure of EMV1, a group 1 late embryogenesis abundant protein of Vigna radiata Wilczek. Plant. Omics. J. 1(1): 17-25 Ramensky V, Bork P and Sunyaev S (2002) Human non-synonymous SNPs: server and survey. Nucleic. Acids. Res. 30(17): 3894-3900 Rapley R and Harbron SE (2004) Molecular analysis and genome discovery. Sussex, UK: Wiley. 144 Ring SG, Gee JM, Whittam M, Orford P and Johnson IT (1988) Resistant starch: its chemical form in foodstuffs and effect on digestibility in vitro. Food. Chem. 28:97-109. Roth C and Liberles DA (2006) A systematic search for positive selection in higher plants (Embryophytes). BMC. Plant. Biol. 6, 12. Rowland-Bamford AJ, Allen LH, Baker JT and Boote K (1990) Carbon dioxide effects on carbohydrate status and partitioning in rice. J. Exp. Bot. 41:1601. Rolletschek H, Hajirezaei MR, Wobus U and Weber H (2002) Antisense-inhibition of ADPglucose pyrophosphorylase in Vicia narbonensis seeds increases soluble sugars and leads to higher water and nitrogen uptake. Planta. 214, 954-964. 
Rolletschek H, Nguyen TH, Häusler RE, Rutten T, Göbel C, Feussner I, Radchuk R, Tewes A, Claus B and Klukas C (2007) Antisense inhibition of the plastidial glucose 6 phosphate/phosphate translocator in Vicia seeds shifts cellular differentiation and promotes protein storage. Plant. J. 51:468-484. Saito M, Konda M, Vrinten P, Nakamura K and Nakamura T (2004) Molecular comparison of waxy null alleles in common wheat and identification of a unique null allele. Theor. Appl. Genet. 108:1205-1211. Sajilata M, Singhal RS and Kulkarni PR (2006) Resistant starch–a review. Comprehensive. Rev. Food. Sci. F. 5:1-17. Sano Y (1984) Differential regulation of waxy gene expression in rice endosperm. Theor. Appl. Genet. 68(5): 467-473 Sasaki A, Ashikari M, Ueguchi-Tanaka M, Itoh H, Nishimura A, Swapan D, Ishiyama K, Saito T, Kobayashi M and Khush GS (2002) A mutant gibberellin-synthesis gene in rice. Nature. 416, 701–702. Sato K, Inaba K and Tozawa M (1973) High temperature injury of ripening in rice plant. I. The effects of high temperature treatments at different stages of panicle development on the ripening, pp 207-213. Sato K (1984) Starch granules in tissues of rice plants and their changes in relation to plant growth. Jap. Agric. Res. Quarter. 18:78-86. Satoh H, Nishi A, Yamashita K, Takemoto Y, Tanaka Y, Hosaka Y, Sakurai A, Fujita N and Nakamura Y (2003) Starch-branching enzyme I-deficient mutation specifically affects the structure and properties of starch in rice endosperm. Plant. Physiol. 133, 1111-1121. 145 Schofield J and Greenwell P (1987) Wheat starch granule proteins and their technological significance. Cereals in a European context. First European Conference on Food Science and Technology, Morton, I.D. (eds.).- New York, NY (USA): VCH, 1987.- ISBN 08-95735237. p. 407-420 Schuster SC (2008) Next-generation sequencing transforms today’s biology. Nat. Methods. 5, 1,16-18. 
Seo B, Kim S, Scott MP, Singletary GW, Wong K, James MG and Myers AM (2002) Functional interactions between heterologously expressed starch-branching enzymes of maize and the glycogen synthases of brewer's yeast. Plant. Physiol. 128, 1189-1199. Sequenom I (2006) iPLEX™ Gold Assay for SNP Genotyping. Biotechniques. Protocol. Guide. 41. San Diego, CA: Sequenom. Shapter FM, Eggler P, Lee LS and Henry RJ (2009) Variation in Granule Bound Starch Synthase I (GBSSI) loci amongst Australian wild cereal relatives (Poaceae). J. Cereal. Sci. 49:4-11. Shastry BS (2002) SNP alleles in human disease and evolution. J. Hum. Genet. 47(11): 561-566 Shen J, Deininger PL and Zhao H (2006) Applications of computational algorithm tools to identify functional SNPs in cytokine genes. Cytokine. 35(1-2) 62-66 Sheng F, Jia X, Sheng F, Jia X, Yep A, Jack P and Geiger JH (2009) The crystal structures of the open and catalytically competent closed conformation of Escherichia coli glycogen synthase. J. Biol. Chem. 284(26): 17796-17807 Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K (2001) dbSNP: the NCBI database of genetic variation. Nucleic. Acids. Res. 29(1): 308-311 Siebert PD and Larrick JW (1992) Competitive PCR. Nature 359:557-558. Singh N, Pal N, Mahajan G, Singh S and Shevkani K (2011) Rice grain and starch properties: effects of nitrogen fertilizer application. Carbohyd. Polym. 86 (1) 219-225 Sinha S and Tompa M (2002) Discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic. Acids. Res. 30(24): 5549–5560 Smith AM (1999) Making starch. Curr. Opin. Plant. Biol. 2:223-229. Soussi T, Asselain B, Hamroun D, Kato S, Ishioka C, Claustres M and Beroud C (2006) Metaanalysis of the p53 mutation database for mutant p53 biological activity reveals a methodologic bias in mutation detection. Clin. Cancer. Res. 
12:62-69 146 Spielmeyer W, Ellis MH and Chandler PM (2002) Semidwarf (sd-1),“green revolution” rice, contains a defective gibberellin 20-oxidase gene. Proc. Natl. Acad. Sci. USA 99:9043-9048. Sun C, Sathish P, Ahlandsberg S, Deiber A and Jansson C (1997) Identification of four starch branching enzymes in barley endosperm: partial purification of forms I, IIa and IIb. New. Phytol. 137, 215-222. Sun C, Sathish P, Ahlandsberg S and Jansson C (1998) The two genes encoding starchbranching enzymes IIa and IIb are differentially expressed in barley. Plant. Physiol. 118, 37-49. Sunyaev SR, Lathe WC, Ramensky VE and Bork P (2000) SNP frequencies in human genes-an excess of rare alleles and differing modes of selection. Trends. Genet. 16, 335-337. Tacke R and Manley JL (1999) Determinants of SR protein specificity. Curr. Opin. Cell. Biol. 11(3): 358-362 Takeda Y, Guan HP and Preiss J (1993) Branching of amylose by the branching isoenzymes of maize endosperm. Carbohydr. Res. 240, 253-263. Tanaka N, Fujita N, Nishi A, Satoh H, Hosaka Y, Ugaki M, Kawasaki S and Nakamura Y (2004) The structure of starch can be manipulated by changing the expression levels of starch branching enzyme IIb in rice endosperm. Plant. Biotech. J. 2, 507-516. Tashiro T and Wardlaw I (1991) The effect of high temperature on kernel dimensions and the type and occurrence of kernel damage in rice. Aust. J. Agric. Res. 42:485-496. Tester R, Morrison W, Ellis R, Piggo J, Batts G, Wheeler T, Morison J, Hadley P and Ledward D (1995) Effects of elevated growth temperature and carbon dioxide levels on some physicochemical properties of wheat starch. J. Cereal. Sci. 22:63-71. Tester RF and Morrison WR (1990) Swelling and gelatinization of cereal starches. I. Effects of amylopectin, amylose, and lipids. Cereal. Chem. 67:551-557. Tester RF, Karkalas J and Qi X (2004) Starch--composition, fine structure and architecture. J. Cereal. Sci. 39:151-165. 
Tetlow IJ, Wait R, Lu Z, Akkasaeng R, Bowsher CG, Esposito S, Kosar-Hashemi B, Morell MK and Emes MJ (2004) Protein phosphorylation in amyloplasts regulates starch branching enzyme activity and protein-protein interactions. Plant. Cell. Online. 16, 694-708. Thomas RK, Nickerson E, Simons JF, Jänne PA, Tengs T, Yuza Y, Garraway LA, LaFramboise T, Lee JC and Shah K (2006) Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing. Nature. Med. 12, 852-855. 147 Umemoto T, Yano M, Satoh H, Shomura A and Nakamura Y (2002) Mapping of a gene responsible for the difference in amylopectin structure between japonica-type and indicatype rice varieties. Theor. Appl. Genet. 104:1-8. Umemoto T, Aoki N, Lin H, Nakamura Y, Inouchi N, Sato Y, Yano M, Hirabayashi H and Maruyama S (2004) Natural variation in rice starch synthase IIa affects enzyme and starch properties. Funct. Plant Biol. 31, 671-684. Umemoto T and Aoki N (2005) Single-nucleotide polymorphisms in rice starch synthase IIa that alter starch gelatinisation and starch association of the enzyme. Funct. Plant Biol. 32, 763-768. Umemoto T, Horibata T, Aoki N, Hiratsuka M, Yano M and Inouchi N (2008) Effects of variations in starch synthase on starch properties and eating quality of rice. Plant Prod. Sci. 11, 472-480. Varley KE and Mitra RD (2008) Nested Patch PCR enables highly multiplexed mutation discovery in candidate genes. Genome Res. 18, 1844-1850. Velicer GJ, Raddatz G, Keller H, Deiss S, Lanz C, Dinkelacker I and Schuster SC (2006) Comprehensive mutation identification in an evolved bacterial cooperator and its cheating ancestor. Proc. Natl. Acad. Sci. USA. 103 (21).8107-8112. Visser RGF, Somhorst I, Kuipers GJ, Ruys NJ, Feenstra WJ and Jacobsen E (1991) Inhibition of the expression of the gene for granule-bound starch synthase in potato by antisense constructs. Mol. General. Genet. 225:289-296. 
Vrinten PL and Nakamura T (2000) Wheat granule-bound starch synthase I and II are encoded by separate genes that are expressed in different tissues. Plant. Physiol. 122, 255-264. Wakao S, Andre C and Benning C (2008) Functional analyses of cytosolic glucose-6-phosphate dehydrogenases and their contribution to seed oil accumulation in Arabidopsis. Plant. Physiol. 146:277. Wang YJ, White P, Pollak L and Jane J (1993) Characterization of starch structures of 17 maize endosperm mutant genotypes with Oh43 inbred line background. Cereal Chem. 70:171-171. Wang Z, Rolish ME, Yeo G, Tung V, Mawson M and Burge CB (2004) Systematic identification and analysis of exonic splicing silencers. Cell. 119(6): 831-845 Wang Z, Xiao X, Van Nostrand E and Burge CB (2006) General and specific functions of exonic splicing silencers in splicing control. Mol. Cell. 23(1): 61-70 Waters DLE, Henry RJ, Reinke RF and Fitzgerald MA (2006) Gelatinization temperature of rice explained by polymorphisms in starch synthase. Plant. Biotechnol. J. 4, 115-122. 148 Waters DLE, Henry RJ (2007) Genetic manipulation of starch properties in plants: patents 2001-2006. Recent. Pat. Biotechnol. 1(3): 52-259. Waters DLE, Henry RJ, Reinke RF and Fitzgerald MA (2006) Gelatinization temperature of rice explained by polymorphisms in starch synthase. Plant. Biotechnol. J. 4:115-122. Webb BD (1991) Rice quality and grades. In: Rice (Bor S. Luh ed.), pp. 89–93. College Station, TX: USDA, Rice Quality Laboratory, Texas A&M University. Yamakawa H, Hirose T, Kuroda M and Yamaguchi T (2007) Comprehensive expression profiling of rice grain filling-related genes under high temperature using DNA microarray. Plant. Physiol. 144, 258-277. Yamakawa H, Ebitani T and Terao T (2008) Comparison between locations of QTLs for grain chalkiness and genes responsive to high temperature during grain filling on the rice chromosome map. Breed. Sci. 58, 337-343. 
Yamamori M, Kato M, Yui M and Kawasaki M (2006) Resistant starch and starch pasting properties of a starch synthase IIa-deficient wheat with apparent high amylose. Aust J Agri Res. 57:531-536. Yan CJ, Tian ZX, Fang YW, Yang YC, Li J, Zeng SY, Gu SL, Xu CW, Tang SZ and Gu MH (2010) Genetic analysis of starch paste viscosity parameters in glutinous rice (Oryza sativa L.). Theor. Appl. Genet. 122(1) 63-76. Yu J, Hu S, Wang J, Wong GKS, Li S, Liu B, Deng Y, Dai L, Zhou Y and Zhang X (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 296, 79-92. Yun SH and Matheson NK (1993) Structures of the amylopectins of waxy, normal, amyloseextender, and wx: ae genotypes and of the phytoglycogen of maize. Carbohydr. Res. 243, 307-321. Zakaria S, Matsuda T, Tajima S and Nitta Y (2002) Effect of high temperature at ripening stage on the reserve accumulation in seed in some rice cultivars. Plant Production Science-Tokyo5:160-168. 149 Appendices 1-8: Attached Appendices: Chapter 2 Appendix 1: Full list of discovered SNP/Indel is 17 studies starch related genes. Appendix 2: Full list of Australian breeding lines (population) and their pedigree information. Appendix 3: Target genes and sequence of gene-specific LR-PCR primers. Appendix 4: SNP/Indel distribution and short read coverage pattern across candidate loci. Appendices: Chapter 4 Appendix 5: Full list of 233 studied Australian rice genotypes and their pedigree information. Appendix 6: Name and characteristics of SNPs genotyped in the rice population. Appendix 7: The results of association study among 13 physiochemical traits and SNPs of 18 different starch-related genes. Appendix 8: Linkage map of 17 starch-related genes, showing the approximate each gene’s chromosomal location. 
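The Frequencies, Counts and Coverage columns of Appendix 1 appear to be related by simple arithmetic: each allele frequency matches that allele's read count divided by the total coverage at the position, expressed as a percentage. The following is a minimal Python sketch under that assumption. The function name `allele_frequencies` is hypothetical and is not part of the analysis pipeline used in the thesis. It reproduces the first AGPS2b row (counts 22557/1252, coverage 23813).

```python
def allele_frequencies(counts, coverage):
    """Return each allele's share of the reads at one position, in percent.

    Assumption (not stated in the thesis): frequency = 100 * count / coverage.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return [100.0 * count / coverage for count in counts]


# First AGPS2b row of Appendix 1: counts 22557/1252, coverage 23813.
freqs = allele_frequencies([22557, 1252], 23813)
print([round(f, 1) for f in freqs])  # prints [94.7, 5.3]
```

The two percentages need not sum to exactly 100, because coverage can include reads that support neither listed allele (here 23813 - 22557 - 1252 = 4 reads).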
150 Appendix 1: Full list of discovered SNPs/Indels in 17 starch related genes Gene AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b AGPS2b BEI BEI BEI BEI BEI BEI BEI BEI BEI BEI BEI BEI BEI BEI BEI BEI BEI BEI BEIIa BEIIa BEIIa BEIIa BEIIa BEIIa BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb Reference Consensu Variation position s position type 2533 161 SNP 2644 277 SNP 2713 347 SNP 2799 436 SNP 2886 530 SNP 3380 1029 SNP 3497 1146 SNP 3585 1234 SNP 3609 1258 SNP 3685 1334 SNP 3919 1568 SNP 4259 1908 SNP 4260 1909 SNP 4309 1958 SNP 4355 2004 SNP 4359 2008 SNP 4361 2010 SNP 4372 2021 SNP 4460 2109 SNP 4486 2135 SNP 4499 2148 SNP 4501 2150 SNP 4694 2343 SNP 4787 2436 SNP 4826 2475 SNP 4952 2601 SNP 5026 2675 SNP 5035 2684 SNP 5087 2736 SNP 5477 3126 SNP 193 193 SNP 810 810 SNP 828 828 SNP 1218 1218 SNP 1268 1268 SNP 1341 1341 SNP 1558 1558 SNP 1902 1902 SNP 2410 2410 SNP 3178 3178 SNP 3610 3610 SNP 4480 4480 SNP 6386 6386 SNP 6403 6403 SNP 6554 6554 SNP 6887 6887 SNP 7130 7130 SNP 7214 7214 SNP 1690 531 SNP 1925 766 SNP 1941 782 SNP 1942 783 SNP 2048 889 SNP 3266 2018 SNP 538 541 SNP 1365 1368 SNP 1449 1452 SNP 1658 1661 SNP 2874 2877 SNP 2875 2878 SNP 3137 3140 SNP 3138 3141 SNP 3794 3797 SNP 3957 3960 SNP 4099 4102 SNP Length 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Referenc Allele e Variants variations C 2 C/T G 2 G/C C 2 C/A C 2 C/T T 2 T/C A 2 A/G G 2 G/A A 2 A/G C 2 C/A T 2 T/A T 2 T/G C 2 C/T C 2 C/T G 2 G/T T 2 T/A C 2 C/T C 2 C/A A 2 A/G A 2 A/G T 2 T/A C 2 C/A A 2 A/G A 2 A/T A 2 A/G A 2 A/G A 2 A/T T 2 T/G C 2 C/T A 2 C/A G 2 G/C G 2 G/A C 2 C/T G 2 G/A C 2 C/T C 2 C/A A 2 A/G C 2 C/T C 2 C/T G 2 G/A G 2 G/T G 2 G/A G 2 G/C G 2 G/A A 2 A/G T 2 T/G G 2 G/A A 2 A/G G 2 G/T N 1T G 1T G 1T G 1T A 1T T 
2 T/G T 2 G/T T 2 T/C A 2 C/A C 2 T/C C 2 T/C A 2 G/A T 2 T/A T 2 T/A A 2 C/A C 2 G/C T 2 C/T Frequencies 94.7/5.3 95.2/4.7 99.0/1.0 94.8/5.2 98.9/1.1 94.9/5.1 94.8/5.2 99.0/1.0 98.3/1.7 95.0/5.0 98.9/1.1 97.6/2.4 99.3/0.7 98.6/1.4 99.3/0.7 99.2/0.8 99.2/0.8 98.6/1.4 95.7/4.3 95.6/4.4 97.5/2.5 95.8/4.2 98.6/1.4 98.6/1.4 96.1/3.9 97.5/2.5 96.4/3.6 98.6/1.4 77.1/22.9 99.2/0.8 89.2/10.8 88.9/11.1 88.7/11.3 88.3/11.7 88.9/11.1 89.0/11.0 94.5/5.4 88.9/11.1 94.3/5.7 93.9/6.1 88.5/11.5 87.3/12.7 86.8/13.2 97.8/2.1 91.8/8.1 86.8/13.2 99.4/0.6 99.0/1.0 100 100 100 100 100 97.2/2.8 92.1/7.9 99.2/0.8 90.5/9.5 90.5/9.5 90.0/10.0 89.8/10.2 99.5/0.5 99.1/0.9 89.7/10.3 89.9/10.0 86.8/13.2 Counts 22557/1252 65324/3253 27197/275 33604/1828 36781/412 37725/2013 35569/1941 27517/281 24298/413 19394/1014 90/1 2685/65 3463/25 31625/447 30364/229 29463/244 29349/228 28229/411 28708/1283 39609/1819 48230/1237 48626/2145 34913/507 44438/632 34982/1403 31775/822 35926/1341 38005/534 22765/6747 95747/781 15320/1847 20419/2556 20882/2652 20787/2765 21991/2743 21364/2644 19230/1108 19821/2479 21442/1297 26643/1726 35853/4638 32896/4768 13151/2003 16105/352 3257/288 8414/1276 20181/117 1271/13 27778 29024 24302 24331 22534 1795/51 27333/2350 34460/279 31003/3254 27136/2863 26475/2946 26541/3000 21745/114 18785/169 18492/2119 23577/2634 24411/3717 Coverage 23813 68588 27474 35435 37195 39740 37517 27799 24717 20411 91 2750 3488 32075 30593 29711 29580 28641 29991 41431 49469 50771 35423 45072 36390 32603 37273 38546 29514 96562 17170 22976 23536 23553 24739 24008 20341 22308 22744 28372 40502 37668 15155 16459 3546 9692 20298 1284 27780 29028 24305 24334 22543 1846 29685 34741 34258 30001 29421 29542 21859 18956 20613 26220 28133 Variant #1 C G C C T A G A C T T C C G T C C A A T C A A A A A T C C G G C G C C A C C G G G G G A T G A G T T T T T T G T C T T G T T C G C Frequency of Frequency of #1 Count of #1 Variant #2 #2 Count of #2 94.72557007 22557 T 5.257632386 1252 95.24115006 65324 C 
4.742812154 3253 98.99177404 27197 A 1.000946349 275 94.83279244 33604 T 5.158741357 1828 98.88694717 36781 C 1.107675763 412 94.92954202 37725 G 5.065425264 2013 94.80768718 35569 A 5.17365461 1941 98.98557502 27517 G 1.010827728 281 98.30481045 24298 A 1.670914755 413 95.01739258 19394 A 4.967909461 1014 98.9010989 90 G 1.098901099 1 97.63636364 2685 T 2.363636364 65 99.28325688 3463 T 0.716743119 25 98.59703819 31625 T 1.39360873 447 99.25146275 30364 A 0.748537247 229 99.16529232 29463 T 0.821244657 244 99.21906694 29349 A 0.770791075 228 98.56150274 28229 G 1.435005761 411 95.72204995 28708 G 4.277950052 1283 95.60232676 39609 A 4.390432285 1819 97.49540116 48230 A 2.500555904 1237 95.77514723 48626 G 4.22485277 2145 98.56025746 34913 T 1.431273466 507 98.59336173 44438 G 1.402200923 632 96.13080517 34982 G 3.855454795 1403 97.46035641 31775 T 2.521240377 822 96.386124 35926 G 3.597778553 1341 98.59648213 38005 T 1.385357754 534 77.13288609 22765 A 22.86033747 6747 99.15598268 95747 C 0.808806777 781 89.22539313 15320 A 10.75713454 1847 88.87099582 20419 T 11.12465181 2556 88.72365738 20882 A 11.267845 2652 88.25627309 20787 T 11.73948117 2765 88.89203282 21991 A 11.08775617 2743 88.98700433 21364 G 11.01299567 2644 94.53812497 19230 T 5.447126493 1108 88.85153308 19821 T 11.11260534 2479 94.2754133 21442 A 5.702602884 1297 93.90596363 26643 T 6.083462569 1726 88.52155449 35853 A 11.45128636 4638 87.3314219 32896 C 12.65795901 4768 86.77664137 13151 A 13.21676015 2003 97.84920105 16105 G 2.138647548 352 91.8499718 3257 G 8.121827411 288 86.81386711 8414 A 13.16549732 1276 99.42358853 20181 G 0.576411469 117 98.98753894 1271 T 1.012461059 13 99.99280058 27778 99.9862202 29024 99.98765686 24302 99.98767157 24331 99.9600763 22534 97.23726977 1795 G 2.762730228 51 92.07680647 27333 T 7.916456123 2350 99.19115742 34460 C 0.803085691 279 90.49856968 31003 A 9.498511297 3254 90.45031832 27136 C 9.543015233 2863 89.98674416 26475 C 10.01325584 2946 89.84158148 26541 A 
10.15503351 3000 99.47847569 21745 A 0.521524315 114 99.09791095 18785 A 0.891538299 169 89.71037695 18492 A 10.27992044 2119 89.91990847 23577 C 10.04576659 2634 86.76998543 24411 T 13.21224185 3717 Amino acid Overlapping annotations change Quality Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A Low Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A Low Gene: Os08g0345800 N/A Low Gene: Os08g0345800 N/A Low Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A High Gene: Os08g0345800 N/A Low Gene: Os06g0726400, CDS: Os06g0726400, N/A mRNA:High Os06g0726400 Gene: Os06g0726400 N/A High Gene: Os06g0726400 N/A High Gene: Os06g0726400 N/A High Gene: Os06g0726400 N/A High Gene: Os06g0726400 N/A High Gene: Os06g0726400, CDS: Os06g0726400, Gly607Asp mRNA:High Os06g0726400 Gene: Os06g0726400 N/A High Gene: Os06g0726400 N/A High Gene: Os06g0726400, CDS: Os06g0726400, N/A mRNA:High Os06g0726400 Gene: Os06g0726400 N/A High Gene: Os06g0726400 N/A High Gene: Os06g0726400 N/A High Gene: Os06g0726400 N/A High Gene: Os06g0726400 N/A High Gene: Os06g0726400 N/A High Gene: Os06g0726400 N/A Low Gene: Os06g0726400, CDS: Os06g0726400, N/A mRNA:High Os06g0726400 Gene: Os04g0409200 N/A Low Gene: Os04g0409200 N/A Low Gene: Os04g0409200 N/A Low Gene: Os04g0409200 N/A Low Gene: Os04g0409200, CDS: Os04g0409200, N/A mRNA:Low Os04g0409200 Gene: Os04g0409200, CDS: Os04g0409200, Tyr140Ser mRNA:High Os04g0409200 Gene: 
Os02g0528200 N/A High Gene: Os02g0528200 N/A Low Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A Low Gene: Os02g0528200 N/A Low Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 Isoamylase 2 GBSSI GBSSII GBSSII GBSSII GBSSII GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 4288 4293 4357 4538 5016 5259 5280 5349 5641 5828 5867 5933 6119 6159 6167 6375 6385 6429 6457 6727 6905 6908 6920 6932 6941 6959 6962 6968 7051 7207 7394 7826 8139 8272 8310 8334 8775 9035 9294 9761 10068 10561 136 799 960 1120 1462 1712 2040 2067 2122 2130 2161 2163 2169 2170 2214 2217 1086 670 1638 5170 5174 386 429 499 513 1094 1188 4291 4296 4360 4541 5019 5262 5283 5352 5644 5830 5869 5935 6121 6161 6169 6377 6387 6431 6459 6729 6907 6910 6922 6934 6943 6961 6964 6970 7053 7211 7398 7830 8143 8276 8314 8338 8779 9039 9298 9761 10068 10561 136 799 960 1120 1462 1712 2040 2067 2122 2130 2161 2163 2169 2170 2214 2217 1086 585 1553 5085 5089 331 374 444 459 1040 1134 SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 A C C A A A A G T C G T T G T T C C A A T C T G A G T C C T C A C T A G T T A T A C 
A A T C G C G T C C C C C C C C A T A C C G C A G C C 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 C/A A/C T/C G/A A/G C/A A/G T/G C/T A/C A/G C/T A/T A/G C/T T/C C/G C/T A/G T/A T/C C/T T/C G/A A/T G/A T/A C/T T/C C/T T/C T/A T/C C/T A/T A/G A/T C/T C/A C/T C/A C/A A/G A/G T/C C/T G/A C/A G/T T/G C/T C/A C/A C/A C/T C/T C/A C/A A/C T/A A/G C/G C/G G/A C/G A/G G/T C/T C/T 90.1/9.9 90.0/10.0 90.6/9.4 89.9/10.1 99.4/0.6 90.4/9.6 99.1/0.9 85.0/15.0 86.5/13.5 86.4/13.6 86.1/13.9 87.0/13.0 86.2/13.8 84.3/15.6 84.2/15.8 99.5/0.5 93.4/6.6 98.4/1.6 98.1/1.9 86.1/13.9 96.1/3.9 96.1/3.9 92.7/7.3 89.9/10.1 89.9/10.1 96.4/3.6 96.4/3.6 99.1/0.9 84.8/15.2 85.1/14.9 85.5/14.5 87.7/12.3 85.2/14.8 60.3/39.7 57.2/42.7 84.0/16.0 84.4/15.6 83.6/16.4 84.8/15.2 87.0/13.0 85.1/14.9 99.2/0.8 54.2/45.8 53.3/46.7 51.9/48.1 59.7/40.3 65.8/34.2 66.1/33.9 65.3/34.7 85.7/14.3 99.5/0.5 99.5/0.5 98.3/1.7 98.3/1.7 97.6/2.4 97.8/2.2 98.6/1.4 98.0/2.0 99.1/0.9 97.3/2.7 97.5/2.5 97.9/2.1 98.0/1.9 95.2/4.8 75.0/24.9 95.1/4.9 97.9/2.1 94.9/5.1 95.8/4.2 33436/3659 32840/3638 30362/3141 31063/3499 66816/415 114671/12210 108792/953 2914/513 14612/2285 15779/2477 15582/2511 15158/2273 15951/2548 14548/2698 15838/2972 25803/136 20636/1449 18045/296 16662/317 13187/2124 18473/741 18278/735 19151/1513 18875/2125 19086/2146 18930/711 18937/711 19054/165 13272/2374 16485/2881 17518/2970 10001/1404 12817/2225 4666/3076 2930/2188 4810/919 15578/2889 13341/2612 14371/2582 13612/2030 11744/2062 125/1 8106/6843 10794/9440 34842/32345 66247/44769 5351/2786 3097/1590 673/358 240/40 183/1 196/1 58/1 59/1 41/1 45/1 72/1 50/1 47013/447 33112/909 39384/1003 33987/727 35723/709 4605/232 4213/1400 26198/1338 12441/270 16845/901 13206/574 37097 36479 33504 34570 67232 126895 109752 3428 16897 18260 18096 17432 18501 17249 18814 25940 22088 18343 16980 15311 19214 19018 20666 21002 21235 19646 19649 19223 15649 19367 20489 11406 15044 
7743 5119 5729 18468 15958 16954 15645 13807 126 14949 20234 67193 111022 8137 4687 1031 280 184 197 59 60 42 46 73 51 47463 34021 40390 34718 36436 4837 5615 27537 12712 17755 13781 C A T G A C A T C A A C A A C T C C A T T C T G A G T C T C T T T C A A A C C C C C A A T C G C G T C C C C C C C C A T A C C G C A G C C 90.13127746 90.0243976 90.62201528 89.85536592 99.38124703 90.36683872 99.12530068 85.00583431 86.47688939 86.41292442 86.10742706 86.95502524 86.21696125 84.34112122 84.18199213 99.47185813 93.42629482 98.37540206 98.12720848 86.12762066 96.14343708 96.10894942 92.66911836 89.87239311 89.87991523 96.35549221 96.37640592 99.12084482 84.81053102 85.11901688 85.49953634 87.6819218 85.19675618 60.2608808 57.23774175 83.95880607 84.35131037 83.60070184 84.76465731 87.00543305 85.05830376 99.20634921 54.22436283 53.34585351 51.8536157 59.67015546 65.7613371 66.07638148 65.27643065 85.71428571 99.45652174 99.49238579 98.30508475 98.33333333 97.61904762 97.82608696 98.63013699 98.03921569 99.05189305 97.32812087 97.50928448 97.89446397 98.04314414 95.20363862 75.03116652 95.13745143 97.86815607 94.87468319 95.82758871 33436 32840 30362 31063 66816 114671 108792 2914 14612 15779 15582 15158 15951 14548 15838 25803 20636 18045 16662 13187 18473 18278 19151 18875 19086 18930 18937 19054 13272 16485 17518 10001 12817 4666 2930 4810 15578 13341 14371 13612 11744 125 8106 10794 34842 66247 5351 3097 673 240 183 196 58 59 41 45 72 50 47013 33112 39384 33987 35723 4605 4213 26198 12441 16845 13206 A C C A G A G G T C G T T G T C G T G A C T C A T A A T C T C A C T T G T T A T A A G G C T A A T G T A A A T T A A C A G G G A G G T T T 9.863331267 9.972861098 9.375 10.12149262 0.617265588 9.622128531 0.868321306 14.96499417 13.52311061 13.56516977 13.87599469 13.03923818 13.77222853 15.64148646 15.7967471 0.524286816 6.560123144 1.613694597 1.866902238 13.87237934 3.856562923 3.864759701 7.32120391 10.11808399 10.10595715 3.619057314 3.618504759 0.858346772 
15.17029842 14.87581969 14.495583 12.30931089 14.78994948 39.72620431 42.74272319 16.04119393 15.64327485 16.36796591 15.22944438 12.9753915 14.93445354 0.793650794 45.77563717 46.65414649 48.13745479 40.3244402 34.2386629 33.92361852 34.72356935 14.28571429 0.543478261 0.507614213 1.694915254 1.666666667 2.380952381 2.173913043 1.369863014 1.960784314 0.941786233 2.671879133 2.483287943 2.094014632 1.945877703 4.796361381 24.9332146 4.858917093 2.123977344 5.074626866 4.165154923 3659 3638 3141 3499 415 12210 953 513 2285 2477 2511 2273 2548 2698 2972 136 1449 296 317 2124 741 735 1513 2125 2146 711 711 165 2374 2881 2970 1404 2225 3076 2188 919 2889 2612 2582 2030 2062 1 6843 9440 32345 44769 2786 1590 358 40 1 1 1 1 1 1 1 1 447 909 1003 727 709 232 1400 1338 270 901 574 Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A Low Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:High Os02g0528200 Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:Low Os02g0528200 Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:High Os02g0528200 Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A Low Gene: Os02g0528200 N/A High Gene: Os02g0528200, CDS: Os02g0528200, Val403IlemRNA:High Os02g0528200 Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:High Os02g0528200 Gene: Os02g0528200 N/A High Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:High Os02g0528200 Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:High Os02g0528200 Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:High Os02g0528200 Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:High Os02g0528200 Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:High Os02g0528200 Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:High Os02g0528200 Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:High Os02g0528200 
Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:Low Os02g0528200 Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:High Os02g0528200 Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200, CDS: Os02g0528200, His196Arg mRNA:High Os02g0528200 Gene: Os02g0528200 N/A High Gene: Os02g0528200 N/A High Gene: Os02g0528200, CDS: Os02g0528200, Leu94ValmRNA:High Os02g0528200 Gene: Os02g0528200, CDS: Os02g0528200, N/A mRNA:Low Os02g0528200 Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 N/A High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 N/A High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 Thr482Ala High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 N/A High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 N/A High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 Arg231Leu High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 Leu122Met High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 Thr113Pro High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 N/A Low Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 Gly92Cys Low Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 N/A High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 Gly81Trp High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 Glu79Lys High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 N/A High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 Gly64Trp High Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3 Gly63Cys High Gene: Os06g0133000, CDS: Os06g0133000, Tyr224Ser mRNA:Low Os06g0133000 Gene: Os07g0412100, mRNA: Os07g0412100 N/A High Gene: Os07g0412100, CDS: Os07g0412100, Leu523Ser mRNA:High Os07g0412100 Gene: Os07g0412100 N/A High Gene: Os07g0412100 N/A High Gene: Os08g0187800 N/A High Gene: Os08g0187800 N/A High Gene: Os08g0187800 N/A High Gene: Os08g0187800 N/A High Gene: Os08g0187800, CDS: Os08g0187800, N/A mRNA:High Os08g0187800 Gene: Os08g0187800, CDS: Os08g0187800, Leu42PhemRNA:High 
Os08g0187800 GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase 1680 2395 2479 2688 2736 2890 3325 3475 3507 3574 901 1021 1052 1098 1268 1271 1302 1356 1388 1390 1564 1595 1620 1625 1635 1642 1715 1742 1763 1792 1804 1813 1910 2023 2042 2125 2154 2159 2225 2319 2365 2374 2408 2429 2471 2478 2490 2606 2624 2631 2691 2742 2746 2799 2822 2849 2886 2911 2925 3013 3075 3168 3199 3256 3296 3332 3367 3392 3394 1626 2341 2425 2634 2682 2836 3271 3421 3453 3520 206 326 357 403 573 576 607 661 693 695 869 900 925 930 940 947 1020 1047 1068 1097 1109 1118 1215 1328 1347 1430 1459 1464 1530 1624 1670 1679 1713 1734 1776 1783 1795 1911 1929 1936 1996 2047 2051 2104 2127 2154 2191 2216 2230 2318 2380 2473 2504 2561 2601 2637 2672 2697 2699 SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 G C A C T G A A T C C T C A T T T T A G A T C C C G G A G G G C T T A C A G G G G G T A T T G C T T T C T C A C C G T G C A G T C T C T T 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 G/A C/G A/G C/T T/G G/A A/G A/G T/C C/A C/T T/G C/T A/C T/C T/C T/C T/A A/G G/C A/G T/A C/T C/T C/T G/T G/T A/G G/A G/A G/A C/T T/C T/C A/T C/T A/G G/T G/A G/A G/A G/C T/A G/A T/C T/G G/A C/T T/A T/C T/C C/T T/C C/T A/G C/T C/T G/A T/G G/A C/G A/C G/A T/C C/G T/A C/G T/C T/C 95.6/4.4 96.2/3.8 68.6/31.4 73.5/26.5 95.9/4.1 95.7/4.3 73.0/27.0 95.8/4.2 96.8/3.2 97.0/3.0 99.0/1.0 98.8/1.1 98.8/1.1 99.1/0.9 98.9/1.1 98.9/1.1 98.9/1.1 99.0/1.0 99.1/0.9 99.1/0.9 98.9/1.1 98.7/1.2 98.7/1.3 98.7/1.3 98.8/1.2 98.7/1.3 99.3/0.7 98.8/1.2 98.7/1.2 98.7/1.3 98.7/1.3 98.7/1.3 98.4/1.5 99.1/0.9 99.4/0.6 98.2/1.8 98.8/1.1 99.4/0.6 98.5/1.5 99.1/0.9 98.4/1.6 98.3/1.7 98.2/1.8 64.7/35.3 98.4/1.6 98.5/1.4 98.5/1.5 98.9/1.1 98.5/1.5 98.6/1.4 98.2/1.8 98.4/1.6 98.4/1.6 99.4/0.6 99.5/0.5 99.3/0.6 98.0/2.0 99.3/0.7 98.9/1.1 99.4/0.6 99.2/0.8 97.9/2.0 98.2/1.8 99.0/1.0 99.2/0.8 99.4/0.6 98.7/1.3 98.3/1.7 98.3/1.7 18024/829 20520/818 12564/5754 17374/6259 24255/1027 22880/1027 14302/5295 21595/952 20097/655 17760/550 75204/779 43188/501 102730/1191 76276/700 34504/389 33664/384 41282/465 41726/412 37356/346 37209/346 45550/517 44787/562 42075/540 41551/528 40330/487 39111/499 39700/282 42595/521 44421/560 46344/599 47243/617 47570/614 36053/567 45060/399 40103/241 33016/616 31371/360 32002/181 41557/634 38785/354 41880/671 42169/729 45905/844 27149/14815 36288/598 36932/543 37148/548 43040/464 41713/617 43670/631 31735/566 33436/546 34317/550 42312/256 43328/230 44396/288 30765/636 34113/245 36011/411 44995/262 43947/347 41109/859 42926/803 44869/448 41967/327 36667/228 38002/484 37643/662 37830/662 18854 21340 18319 23636 25284 23909 19600 22547 20752 18311 75996 43693 103936 76979 34894 34048 41750 42140 37704 37558 46069 45356 42625 42082 40820 39613 39986 43119 44988 46945 47864 48188 36621 45462 40346 33634 31736 32200 42198 39143 42557 42899 46752 41965 
36887 37478 37699 43509 42333 44302 32303 33990 34867 42573 43559 44690 31403 34364 36427 45267 44305 41971 43733 45320 42298 36897 38494 38305 38494 G C A C T G A A T C C T C A T T T T A G A T C C C G G A G G G C T T A C A G G G G G T G T T G C T T T C T C A C C G T G C A G T C T C T T 95.59775114 96.1574508 68.58452972 73.50651548 95.93023256 95.69618135 72.96938776 95.77770879 96.84367772 96.9908798 98.95783989 98.84420845 98.83967057 99.08676392 98.88232934 98.87218045 98.87904192 99.01756051 99.07702101 99.07077054 98.87342899 98.7454802 98.70967742 98.73817784 98.79960804 98.73273925 99.28474966 98.78475846 98.73966391 98.71977846 98.70257396 98.71752303 98.44897736 99.11574502 99.39770981 98.16257359 98.84988656 99.38509317 98.48097066 99.0854048 98.40919238 98.29832863 98.1883128 64.69438818 98.37612167 98.54314531 98.53842277 98.9220621 98.53542154 98.57342784 98.24164938 98.37010886 98.42257722 99.38693538 99.4696848 99.34213471 97.96834697 99.26958445 98.85798995 99.39912077 99.19196479 97.94620095 98.15471155 99.00485437 99.21745709 99.37664309 98.72187873 98.27176609 98.27505585 18024 20520 12564 17374 24255 22880 14302 21595 20097 17760 75204 43188 102730 76276 34504 33664 41282 41726 37356 37209 45550 44787 42075 41551 40330 39111 39700 42595 44421 46344 47243 47570 36053 45060 40103 33016 31371 32002 41557 38785 41880 42169 45905 27149 36288 36932 37148 43040 41713 43670 31735 33436 34317 42312 43328 44396 30765 34113 36011 44995 43947 41109 42926 44869 41967 36667 38002 37643 37830 A G G T G A G G C A T G T C C C C A G C G A T T T T T G A A A T C C T T G T A A A C A A C G A T A C C T C T G T T A G A G C A C G A G C C 4.396944945 3.833177132 31.41001146 26.48079201 4.061857301 4.295453595 27.01530612 4.222291214 3.156322282 3.003659003 1.02505395 1.146636761 1.145897475 0.909338911 1.114804838 1.127819549 1.113772455 0.977693403 0.917674517 0.921241813 1.122229699 1.239086339 1.26686217 1.254693218 1.193042626 1.259687476 0.705246836 1.208284051 
1.244776385 1.275961231 1.289069029 1.274176143 1.548291964 0.877656064 0.597333069 1.83148005 1.134358457 0.562111801 1.502440874 0.904376261 1.576708885 1.699340311 1.805270363 35.30322888 1.621167349 1.448849992 1.453619459 1.066446023 1.457491791 1.424314929 1.752159242 1.60635481 1.577422778 0.601320086 0.528019468 0.644439472 2.025284209 0.712955418 1.128283965 0.57878808 0.783207313 2.046651259 1.836142044 0.988526037 0.773086198 0.617936418 1.257338806 1.728233912 1.719748532 829 818 5754 6259 1027 1027 5295 952 655 550 779 501 1191 700 389 384 465 412 346 346 517 562 540 528 487 499 282 521 560 599 617 614 567 399 241 616 360 181 634 354 671 729 844 14815 598 543 548 464 617 631 566 546 550 256 230 288 636 245 411 262 347 859 803 448 327 228 484 662 662 Gene: Os08g0187800, CDS: Os08g0187800, N/A mRNA:High Os08g0187800 Gene: Os08g0187800 N/A High Gene: Os08g0187800, CDS: Os08g0187800, N/A mRNA:High Os08g0187800 Gene: Os08g0187800 N/A High Gene: Os08g0187800 N/A High Gene: Os08g0187800 N/A High Gene: Os08g0187800 N/A High Gene: Os08g0187800 N/A High Gene: Os08g0187800 N/A High Gene: Os08g0187800 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900, CDS: Os04g0164900, N/A mRNA:High Os04g0164900 Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A High Gene: Os04g0164900, CDS: 
Os04g0164900, Ser217Asn mRNA:Low Os04g0164900 Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A Low Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Gene: Os04g0164900 N/A High Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase SPHOL SPHOL SPHOL SPHOL SPHOL SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI 3419 3513 3608 3612 3634 3666 3719 3878 3891 4030 4081 4082 4088 4131 4163 4181 4190 4198 4284 4309 4327 4347 4612 4703 4720 4985 5020 5062 5100 5127 5161 5259 5266 5295 5306 5319 5409 5418 5425 5441 5727 5729 5735 6160 7176 7837 7845 7859 8136 8708 3645 3683 3770 3785 6447 919 1132 1212 1231 2268 2334 2611 2712 3544 3773 4021 4335 4340 4399 2724 2818 2913 2917 2939 2971 3024 3183 3196 3335 3386 3387 3393 3436 3468 3486 3495 3503 3589 3614 3632 3652 3917 4008 
4025 4290 4325 4367 4405 4432 4466 4564 4571 4600 4611 4624 4714 4723 4730 4746 5032 5034 5040 5465 6481 7142 7150 7164 7441 8013 3539 3577 3664 3679 6341 709 922 1002 1021 2058 2124 2401 2502 3334 3563 3811 4125 4130 4189 SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 C A C A A G C T C T T A G T T T T C G C G T C G A T G T T T A C C C A A G G A A T C G T C G A G T C T G C A C A T G C G C C G A A T T T A 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 C/A A/G C/T A/C A/G G/C C/T T/C C/T T/G T/C A/C G/A T/C T/C T/G T/A C/T G/A C/G G/T T/A C/T G/C A/G T/C G/A T/C T/C T/G A/T C/T C/T C/G A/G A/T G/A G/T A/G A/T T/A C/A G/A A/T T/C G/A A/G G/C T/A T/C A A A T C/G A/G T/A G/A C/T G/T C/A C/A G/A A/C A/G T/C T/C T/A A/T 98.5/1.5 99.4/0.6 99.0/1.0 99.4/0.6 98.5/1.5 98.9/1.1 99.1/0.9 98.9/1.1 98.9/1.1 98.8/1.2 99.0/1.0 99.0/1.0 98.8/1.1 98.8/1.2 98.8/1.1 98.8/1.1 99.0/1.0 98.8/1.2 98.6/1.3 98.6/1.4 98.7/1.3 98.8/1.2 99.4/0.6 98.7/1.3 98.9/1.1 98.6/1.4 99.5/0.5 99.3/0.7 98.6/1.4 97.7/2.3 97.8/2.2 99.2/0.8 99.2/0.8 99.1/0.9 98.9/1.1 98.6/1.4 98.7/1.3 98.8/1.2 98.4/1.6 99.3/0.7 99.4/0.6 99.4/0.6 99.3/0.7 67.2/32.8 69.2/30.8 99.4/0.6 99.4/0.6 99.4/0.6 99.5/0.5 68.3/31.7 37581/555 49128/303 36594/374 38365/221 39035/584 42292/461 43624/400 42999/476 43089/487 34510/416 40136/422 40054/411 40780/474 41352/510 46158/536 46654/536 45224/472 47443/577 37601/510 39186/553 40404/542 43039/522 39027/236 40860/530 40899/456 32932/478 35331/186 36342/262 40252/577 37613/891 34930/772 39071/325 40306/335 41503/385 42190/465 41905/590 
31268/416 31067/392 30639/503 32769/235 129483/779 127023/775 114076/802 10477/5105 13740/6119 19591/121 19142/119 18033/101 18870/98 11092/5153 100 100 100 100 98.9/1.1 99.5/0.5 99.5/0.5 99.5/0.5 99.4/0.5 99.3/0.7 99.1/0.9 99.4/0.6 99.5/0.5 99.4/0.6 93.5/6.5 99.4/0.6 98.1/1.9 98.2/1.8 98.5/1.5 69231 30446 19556 20255 89/1 36518/193 37891/205 37864/199 36320/199 32010/211 30531/284 34113/190 37591/193 42375/255 18944/1313 4922/28 83965/1607 87233/1627 33814/507 38140 49431 36975 38590 39624 42768 44028 43478 43581 34928 40558 40469 41261 41863 46696 47200 45698 48024 38116 39749 40946 43566 39266 41394 41355 33413 35525 36605 40830 38510 35705 39404 40648 41898 42655 42498 31687 31460 31142 33011 130270 127810 114886 15582 19859 19717 19262 18139 18971 16246 69244 30450 19561 20261 90 36711 38097 38071 36523 32225 30820 34304 37789 42635 20258 4950 85576 88866 34322 C A C A A G C T C T T A G T T T T C G C G T C G A T G T T T A C C C A A G G A A T C G A T G A G T T A A A T C A T G C G C C G A A T T T A 98.53434714 99.38702434 98.96957404 99.4169474 98.51352716 98.88701833 99.08240211 98.89829339 98.87106767 98.8032524 98.95951477 98.97452371 98.83425026 98.77935169 98.84786705 98.84322034 98.96275548 98.79018824 98.64886137 98.58361217 98.67630538 98.79034109 99.39133092 98.70995796 98.89735219 98.56044055 99.4539057 99.28151892 98.58437423 97.67073487 97.82943565 99.15490813 99.15863019 99.05723424 98.90985816 98.60464022 98.67769117 98.75079466 98.38481793 99.26691103 99.39587012 99.38424223 99.29495326 67.23783853 69.18777381 99.36095755 99.37701173 99.41562379 99.46760846 68.27526776 99.98122581 99.98686371 99.97443893 99.97038646 98.88888889 99.47427202 99.45927501 99.45627906 99.44418586 99.33281614 99.06229721 99.44321362 99.476038 99.39017239 93.51367361 99.43434343 98.11746284 98.16240182 98.51989977 37581 49128 36594 38365 39035 42292 43624 42999 43089 34510 40136 40054 40780 41352 46158 46654 45224 47443 37601 39186 40404 43039 39027 40860 40899 32932 
35331 36342 40252 37613 34930 39071 40306 41503 42190 41905 31268 31067 30639 32769 129483 127023 114076 10477 13740 19591 19142 18033 18870 11092 69231 30446 19556 20255 89 36518 37891 37864 36320 32010 30531 34113 37591 42375 18944 4922 83965 87233 33814 A G T C G C T C T G C C A C C G A T A G T A T C G C A C C G T T T G G T A T G T A A A T C A G C A C 1.455165181 0.612975663 1.011494253 0.572687225 1.47385423 1.077908717 0.908512765 1.094806569 1.117459443 1.19102153 1.040485231 1.015592182 1.148784567 1.218259561 1.147849923 1.13559322 1.032867959 1.201482592 1.338020779 1.391229968 1.323694622 1.198182069 0.60102888 1.280378799 1.102647806 1.430580912 0.523574947 0.715749215 1.413176586 2.313684757 2.162162162 0.824789361 0.82414879 0.918898277 1.090141836 1.388300626 1.312841228 1.246026701 1.615182069 0.711883917 0.597988793 0.606368829 0.698083317 32.76216147 30.81222619 0.613683623 0.617796698 0.556811291 0.516577935 31.71857688 555 303 374 221 584 461 400 476 487 416 422 411 474 510 536 536 472 577 510 553 542 522 236 530 456 478 186 262 577 891 772 325 335 385 465 590 416 392 503 235 779 775 802 5105 6119 121 119 101 98 5153 G G A A T T A A A C G C C A T 1.111111111 0.525727983 0.538100113 0.522707573 0.544862142 0.65477114 0.921479559 0.553871269 0.510730636 0.598100152 6.481390068 0.565656566 1.877862952 1.830846443 1.477186644 1 193 205 199 199 211 284 190 193 255 1313 28 1607 1627 507 Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A 
Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os03g0758100, mRNA: Os03g0758100 N/A Gene: Os03g0758100 N/A Gene: Os03g0758100 N/A Gene: Os03g0758100 N/A Gene: Os03g0758100 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700, mRNA: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700, mRNA: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A High Low High Low High High Low High High High High High High High High High High High High High High High Low High High High Low Low High High High Low Low Low High High High High High Low Low Low Low High High Low Low Low Low High Low Low Low Low High Low Low Low Low Low Low Low Low Low High Low High High High SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSI SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa 4421 4487 4563 4612 4619 4666 4675 4696 4712 4715 4833 4875 4876 4908 4975 5127 5188 5200 5249 5271 5307 5361 5379 5437 5439 5538 5545 5579 5613 5636 5656 5723 5822 5846 5859 5872 5895 5939 6087 6090 6110 6443 6674 6727 6759 6828 6842 6939 6988 7028 7052 7100 7101 7115 7171 7177 7253 7254 7255 13 21 33 67 71 72 77 
78 80 81 4211 4277 4353 4402 4409 4456 4465 4486 4502 4505 4623 4665 4666 4698 4765 4917 4978 4990 5039 5061 5097 5151 5169 5227 5229 5328 5335 5369 5403 5426 5446 5513 5612 5636 5649 5662 5685 5729 5877 5880 5900 6233 6464 6517 6549 6618 6632 6729 6778 6818 6842 6890 6891 6905 6961 6967 7043 7044 7045 13 21 33 67 71 72 77 78 80 81 SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP Complex SNP Complex SNP Complex SNP SNP SNP Complex SNP 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 G C G C T C C C T T A G C C G A G G G G A T C G A G C C G A C G T C G T A A T G A C A C T A T C A C A T T T A G C C A G G G G G G G G G G 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 2 2 3 G/A C/T G/A C/T T/C C/T C/T C/T T/G T/G A/C G/A C/T C/T G/T A/T G/A G/A G/A G/T A/G T/C C/G G/A A/C G/T C/A C/T G/A A/T C/T G/T T/C C/T G/A T/C A/G A/G T/C G/A A/G C/T A/C C/G T/G A/G T/C C/T A/G C/A A/C T/A T/C T/C A/G G/C C/T C/T A/G G/T G/T G/T G/T G/C/T G/T/A G/T/A G/T G/A G/T/A 98.3/1.7 98.1/1.9 98.0/2.0 98.1/1.8 98.1/1.9 98.1/1.9 98.1/1.8 98.1/1.9 98.2/1.8 98.1/1.9 98.2/1.8 98.4/1.6 98.4/1.6 98.0/2.0 97.9/2.1 98.0/2.0 97.6/2.4 97.7/2.3 98.3/1.7 98.0/1.8 98.0/2.0 97.7/2.3 97.5/2.5 98.0/2.0 98.0/2.0 98.2/1.8 98.1/1.9 97.5/2.5 97.7/2.3 97.8/2.2 97.8/2.2 98.0/2.0 97.9/2.1 97.8/2.2 97.8/2.2 97.8/2.2 98.0/2.0 98.1/1.8 98.2/1.8 98.2/1.8 98.3/1.7 98.2/1.8 98.8/1.2 98.7/1.3 98.1/1.9 97.3/2.7 97.9/2.1 98.5/1.5 98.6/1.4 97.9/2.1 97.7/2.3 98.3/1.7 98.3/1.7 97.4/2.6 98.0/2.0 92.8/7.2 97.7/2.2 97.8/2.2 97.8/2.2 97.8/2.2 98.4/1.6 98.9/1.1 97.9/2.1 95.5/2.2/2.2 90.8/8.0/1.1 87.2/11.5/1.3 96.2/3.8 98.8/1.2 93.8/4.9/1.2 21836/382 28376/558 
34697/710 34464/648 34265/654 35320/666 34568/651 34719/655 34877/647 33983/646 35883/667 33423/535 33573/541 33040/662 34187/728 29893/601 32098/774 33341/768 35903/630 32696/610 31586/633 30340/701 31297/795 34033/680 33044/682 36486/678 36651/709 37012/939 38556/911 36868/831 36553/830 40860/830 36753/779 36993/833 37409/842 37961/855 38508/799 36385/685 31938/572 32045/581 29831/523 23554/432 23938/301 23066/295 24674/477 24897/692 24700/534 17105/260 14880/207 38472/806 30825/712 14591/254 15157/260 7899/211 2050/41 1669/129 44447/1016 44786/1001 44769/1020 44/1 61/1 92/1 95/2 85/2/2 79/7/1 68/9/1 77/3 79/1 76/4/1 22220 28937 35407 35116 34920 35990 35224 35378 35525 34634 36551 33962 34116 33712 34919 30497 32875 34115 36542 33368 32222 31042 32097 34716 33730 37167 37366 37958 39473 37702 37386 41693 37533 37827 38257 38816 39309 37071 32510 32631 30358 23988 24240 23366 25153 25590 25234 17367 15087 39283 31538 14847 15417 8111 2091 1799 45474 45803 45791 45 62 93 97 89 87 78 80 80 81 G C G C T C C C T T A G C C G A G G G G A T C G A G C C G A C G T C G T A A T G A C A C T A T C A C A T T T A G C C A G G G G G G G G G G 98.27182718 98.06130559 97.9947468 98.1432965 98.12428408 98.13837177 98.13763343 98.13726044 98.17593244 98.12034417 98.17241662 98.4129321 98.40837144 98.00664452 97.90372004 98.01947733 97.6365019 97.73120328 98.25132724 97.98609446 98.02619328 97.73854777 97.50755522 98.03260744 97.96620219 98.16772944 98.08649574 97.50777175 97.67689307 97.78791576 97.77189322 98.0020627 97.92182879 97.79522563 97.78341219 97.79730008 97.96229871 98.14949691 98.24054137 98.20416169 98.26404902 98.19076205 98.75412541 98.7160832 98.09565459 97.2919109 97.88380756 98.49139172 98.62795784 97.93549373 97.73923521 98.27574594 98.31354998 97.38626557 98.03921569 92.7737632 97.74156661 97.77962142 97.76812037 97.77777778 98.38709677 98.92473118 97.93814433 95.50561798 90.8045977 87.17948718 96.25 98.75 93.82716049 21836 28376 34697 34464 34265 35320 34568 
34719 34877 33983 35883 33423 33573 33040 34187 29893 32098 33341 35903 32696 31586 30340 31297 34033 33044 36486 36651 37012 38556 36868 36553 40860 36753 36993 37409 37961 38508 36385 31938 32045 29831 23554 23938 23066 24674 24897 24700 17105 14880 38472 30825 14591 15157 7899 2050 1669 44447 44786 44769 44 61 92 95 85 79 68 77 79 76 A T A T C T T T G G C A T T T T A A A T G C G A C T A T A T T T C T A C G G C A G T C G G G C T G A C A C C G C T T G T T T T C T T T A T 1.719171917 1.928327055 2.005253199 1.845312678 1.872852234 1.850514032 1.848171701 1.851433094 1.821252639 1.865219149 1.824847473 1.57529003 1.585766209 1.963692454 2.08482488 1.970685641 2.354372624 2.251209146 1.724043566 1.828098777 1.964496307 2.258230784 2.476866997 1.958751008 2.021938927 1.824198886 1.897446877 2.473786817 2.30790667 2.204127102 2.220082384 1.990741851 2.075506887 2.202130753 2.20090441 2.202699918 2.032613396 1.847805562 1.759458628 1.780515461 1.722774886 1.80090045 1.241749175 1.262518189 1.896394068 2.704181321 2.116192439 1.497092186 1.372042155 2.051778123 2.257594014 1.710783323 1.686450023 2.601405499 1.960784314 7.170650361 2.234243744 2.185446368 2.227511956 2.222222222 1.612903226 1.075268817 2.06185567 2.247191011 8.045977011 11.53846154 3.75 1.25 4.938271605 382 558 710 648 654 666 651 655 647 646 667 535 541 662 728 601 774 768 630 610 633 701 795 680 682 678 709 939 911 831 830 830 779 833 842 855 799 685 572 581 523 432 301 295 477 692 534 260 207 806 712 254 260 211 41 129 1016 1001 1020 1 1 1 2 2 7 9 3 1 4 Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700, mRNA: 
Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700, mRNA: Os06g0160700 N/A High Gene: Os06g0160700, mRNA: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700, mRNA: Os06g0160700 N/A High Gene: Os06g0160700, mRNA: Os06g0160700 N/A High Gene: Os06g0160700, mRNA: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0160700 N/A High Gene: Os06g0229800, CDS: Os06g0229800, mRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, N/A mRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, N/A mRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Gly23StpmRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Gly24Ala,Val mRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, N/A mRNA:High Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Arg26Met,Lys mRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Arg26SermRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Arg27LysmRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Arg27Ser,Arg mRNA:Low 
Os06g0229800 SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIa SSIIb SSIIb SSIIb SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa 82 83 84 87 90 91 93 94 95 100 107 128 129 131 174 533 553 2524 4196 4327 4328 675 695 703 97 433 590 659 901 1058 1134 1357 1379 1457 1615 1680 1708 1722 1834 2024 2080 2276 2488 2618 2758 3073 3135 3136 3179 3274 3391 3481 3559 3779 4384 4493 4496 4742 4857 4926 5021 5047 5097 5110 5308 5330 5466 5515 5614 82 83 84 87 90 91 93 94 95 100 107 128 129 131 174 533 553 2524 4196 4327 4328 40 60 68 97 433 590 659 901 1058 1134 1357 1379 1457 1615 1680 1708 1722 1834 2024 2080 2276 2488 2618 2758 3073 3135 3136 3179 3274 3391 3481 3559 3779 4384 4493 4496 4742 4857 4926 5021 5047 5097 5110 5308 5330 5466 5515 5614 SNP SNP Complex SNP SNP Complex SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 G G G G G G G G G G C G C G T G G G A G C C G G G G A C C T A G A A C G G G C G C T C C G G C G C G T G T A G A A C G T T A C A A C G G T 2 2 3 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 G/T G/T G/T/A G/T G/T/A G/T G/T G/T G/T G/T C/A G/A C/A G/T T/C G/T G/T G/C G/A G/T C/T C/A G/A G/T G/A G/A A/G C/T C/T T/A A/G G/A A/C A/C C/T G/A G/A G/A C/T G/A C/T T/C C/T C/T G/A G/A C/A G/A C/T G/A 
T/A G/A T/A A/T G/A A/T A/G C/T G/A T/A T/C A/G C/T A/C A/C C/T G/A G/A T/C 96.3/3.7 95.5/4.5 95.3/3.5/1.2 90.9/9.1 93.1/4.6/2.3 97.6/2.4 97.6/2.4 97.5/2.5 97.4/2.6 95.7/4.3 98.8/1.2 99.2/0.8 99.2/0.8 99.2/0.8 98.6/1.4 99.4/0.6 99.1/0.9 96.2/3.8 87.9/12.1 63.9/36.1 55.6/44.4 99.5/0.5 61.7/38.3 97.6/2.4 87.3/12.7 87.5/12.5 80.7/19.3 94.7/5.2 78.4/21.5 94.8/5.2 95.0/5.0 87.6/12.4 94.7/5.3 87.7/12.3 79.9/20.1 94.3/5.6 94.2/5.8 94.7/5.3 79.8/20.2 87.0/13.0 86.9/13.1 62.6/37.4 87.7/12.3 87.2/12.8 87.3/12.7 87.0/13.0 80.0/20.0 87.2/12.8 86.9/13.1 87.1/12.9 88.8/11.2 88.2/11.8 62.8/37.2 92.0/8.0 89.6/10.3 58.2/41.8 57.7/42.3 89.7/10.3 91.2/8.8 56.1/43.9 58.1/41.9 92.6/7.4 72.2/27.8 56.4/43.6 71.8/28.2 95.5/4.5 95.3/4.7 56.9/43.0 95.8/4.2 79/3 84/4 82/3/1 80/8 81/4/2 83/2 80/2 78/2 76/2 67/3 84/1 126/1 128/1 124/1 71/1 169/1 109/1 25504/1015 3119/428 195/110 179/143 30150/152 3240/2011 1202/30 27329/3982 31199/4461 23183/5550 27668/1532 19741/5423 29861/1624 29780/1565 31457/4449 34679/1938 25359/3556 24247/6083 29003/1737 28942/1773 29504/1641 19948/5052 23646/3538 23232/3502 15456/9219 21796/3053 26402/3888 27977/4064 26439/3948 23988/5982 26441/3884 29015/4379 22993/3412 30877/3883 25143/3354 2011/1190 11550/1010 32666/3770 20437/14656 20230/14817 34312/3946 36783/3553 20884/16322 18861/13618 39441/3156 30372/11693 24651/19025 30418/11945 40705/1911 51381/2531 22510/17017 41044/1807 82 88 86 88 87 85 82 80 78 70 85 127 129 125 72 170 110 26522 3549 305 322 30305 5252 1232 31313 35667 28734 29206 25166 31485 31345 35910 36617 28918 30335 30745 30717 31148 25006 27186 26740 24676 24850 30292 32043 30390 29972 30330 33396 26406 34764 28500 3201 12561 36442 35099 35050 38262 40346 37212 32483 42599 42067 43679 42374 42627 53918 39531 42853 G G G G G G G G G G C G C G T G G G G G C C G G G G A C C T A G A A C G G G C G C T C C G G C G C G T G T A G A A C G T T A C A A C G G T 96.34146341 95.45454545 95.34883721 90.90909091 93.10344828 97.64705882 97.56097561 97.5 97.43589744 
95.71428571 98.82352941 99.21259843 99.2248062 99.2 98.61111111 99.41176471 99.09090909 96.1616771 87.88391096 63.93442623 55.59006211 99.48853325 61.69078446 97.56493506 87.27684987 87.47301427 80.6814227 94.73395878 78.44313757 94.84198825 95.00717818 87.59955444 94.70737636 87.6927865 79.93077303 94.33403805 94.2214409 94.72197252 79.77285451 86.97859192 86.88107704 62.63575944 87.71026157 87.15832563 87.31080111 86.99901283 80.03469905 87.17771184 86.88166247 87.07490722 88.81889311 88.22105263 62.82411746 91.95127776 89.6383294 58.22673011 57.71754636 89.67644138 91.16888911 56.12168118 58.06421821 92.58668044 72.19911094 56.43673161 71.78458489 95.49112065 95.29470678 56.9426526 95.77859193 79 84 82 80 81 83 80 78 76 67 84 126 128 124 71 169 109 25504 3119 195 179 30150 3240 1202 27329 31199 23183 27668 19741 29861 29780 31457 34679 25359 24247 29003 28942 29504 19948 23646 23232 15456 21796 26402 27977 26439 23988 26441 29015 22993 30877 25143 2011 11550 32666 20437 20230 34312 36783 20884 18861 39441 30372 24651 30418 40705 51381 22510 41044 T T T T T T T T T T A A A T C T T C A T T A A T A A G T T A G A C C T A A A T A T C T T A A A A T A A A A T A T G T A A C G T C C T A A C 3.658536585 4.545454545 3.488372093 9.090909091 4.597701149 2.352941176 2.43902439 2.5 2.564102564 4.285714286 1.176470588 0.787401575 0.775193798 0.8 1.388888889 0.588235294 0.909090909 3.827011538 12.05973514 36.06557377 44.40993789 0.501567398 38.29017517 2.435064935 12.71676301 12.50735974 19.3150971 5.245497501 21.5489152 5.158011752 4.992821822 12.3893066 5.292623645 12.29683934 20.05274435 5.649699138 5.772048052 5.268396045 20.20315124 13.01405135 13.09648467 37.36018804 12.28571429 12.83507197 12.68295728 12.9911155 19.95862805 12.80580284 13.11234878 12.92130576 11.16960074 11.76842105 37.17588254 8.040761086 10.34520608 41.75617539 42.27389444 10.31310439 8.806325286 43.86219499 41.92346766 7.408624616 27.79613474 43.5564001 28.1894558 4.483074108 4.694165214 43.04722876 
4.216740952 3 4 3 8 4 2 2 2 2 3 1 1 1 1 1 1 1 1015 428 110 143 152 2011 30 3982 4461 5550 1532 5423 1624 1565 4449 1938 3556 6083 1737 1773 1641 5052 3538 3502 9219 3053 3888 4064 3948 5982 3884 4379 3412 3883 3354 1190 1010 3770 14656 14817 3946 3553 16322 13618 3156 11693 19025 11945 1911 2531 17017 1807 Gene: Os06g0229800, CDS: Os06g0229800, Gly28TrpmRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Gly28ValmRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, N/A mRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Arg29SermRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, N/A mRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Val31LeumRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, N/A mRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Gly32CysmRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Gly32ValmRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Ala34SermRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Pro36GlnmRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Gly43AspmRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, N/A mRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Arg44LeumRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, N/A mRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Gly135Val mRNA:Low Os06g0229800 Gene: Os06g0229800, CDS: Os06g0229800, Ala142SermRNA:Low Os06g0229800 Gene: Os06g0229800 N/A High Gene: Os06g0229800, Gene: Os06g0229900, Met737Val CDS: Os06g0229800, High CDS: Os06g0229900, mRNA: Os06g0229800, mRNA: Os06g0229900 Gene: Os06g0229800, Gene: Os06g0229900, Pro32ThrCDS: Os06g0229800, High CDS: Os06g0229900, mRNA: Os06g0229800, mRNA: Os06g0229900 Gene: Os06g0229800, Gene: Os06g0229900, Leu781Phe CDS: Os06g0229800, High CDS: Os06g0229900, mRNA: Os06g0229800, mRNA: Os06g0229900 Gene: Os02g0744700, Gene: Os02g0744800, N/A mRNA: Low Os02g0744700, mRNA: Os02g0744800 Gene: Os02g0744700, Gene: 
Os02g0744800, N/A mRNA: High Os02g0744700, mRNA: Os02g0744800 Gene: Os02g0744700, Gene: Os02g0744800, N/A mRNA: High Os02g0744700, mRNA: Os02g0744800 N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A Low N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIVa SSIVa SSIVa SSIVa SSIVa 5898 5925 6086 6204 6242 6414 6421 6507 6678 6699 6752 6795 6858 6886 6895 6998 7004 7395 7443 7444 7522 7660 7810 7869 7920 8102 8159 8456 8491 8519 8888 9076 9366 9467 9517 10336 10761 11041 1315 2155 2348 2493 3683 3686 3933 4543 5032 5057 5321 5333 5384 5451 5927 5945 5949 5967 5969 6475 7232 7255 7437 8079 8530 8544 1349 1748 2222 2874 3246 5898 5925 6086 6204 6242 6414 6421 6507 6678 6699 6752 6795 6858 6886 6895 6998 7004 7377 7425 7426 7504 7642 7792 7851 7902 8084 8141 8438 8473 8501 8870 9058 9348 9449 9499 10318 10743 11023 316 1170 1363 1508 2698 2701 2948 3558 4047 4072 4336 4348 4399 4466 4941 4959 4963 4981 4983 5489 6246 6269 6451 7093 7544 7558 141 542 1022 1674 2046 SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP 
SNP 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 C A A C T A A T A C T G C A C C T C C G A A T G G T T C T G G C A T T G C T T A A A A A A C T C T C G T C C G C T C T C A A C A T A A A T 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 C/T A/G A/G C/T T/C A/T A/G T/C A/G C/T T/C G/A C/T A/G C/T C/T T/C C/T C/A A/G A/G A/T T/C G/T A/G T/A G/T C/T A/T A/G A/G C/A A/T C/T C/T A/G T/C A/T T/C A/T A/G A/G A/G A/T A/G C/A T/C C/T T/G C/T G/A T/C A A A A A C/T T/G C/A A/C A/T C/T A/G T/C A/G A/G A/T T/G 89.0/11.0 56.9/43.1 89.6/10.3 89.6/10.4 89.5/10.5 89.8/10.2 95.4/4.6 96.0/4.0 99.0/0.9 90.7/9.3 96.0/4.0 92.7/7.3 59.6/40.4 58.0/42.0 56.9/43.1 92.6/7.4 93.0/7.0 61.2/38.8 99.4/0.6 59.8/40.2 95.4/4.6 52.5/47.5 95.0/5.0 94.9/5.1 57.9/42.1 94.8/5.2 61.5/38.4 97.4/2.6 62.1/37.9 63.2/36.8 61.8/38.2 94.9/5.1 94.9/5.1 59.1/40.9 59.5/40.5 54.3/45.7 51.4/48.6 58.0/42.0 92.7/7.3 90.1/9.9 90.6/9.3 90.2/9.8 90.8/9.2 91.2/8.8 91.2/8.8 90.0/10.0 90.8/9.2 91.4/8.6 90.0/10.0 90.5/9.5 90.2/9.8 90.6/9.4 100 100 100 100 100 94.7/5.3 89.2/10.7 90.3/9.7 90.6/9.4 90.7/9.3 77.7/22.3 79.2/20.8 86.7/13.3 87.6/12.4 87.9/12.1 87.7/12.3 87.9/12.1 45236/5592 27586/20925 44693/5159 42505/4917 36574/4300 36697/4155 41615/1999 39114/1627 58954/565 49110/5048 25338/1051 22616/1779 23914/16215 24363/17639 20295/15387 18662/1488 21015/1588 1337/847 5756/33 3494/2346 17729/863 11967/10843 33289/1747 33544/1809 16585/12049 27061/1470 16935/10581 38756/1015 21756/13254 17456/10182 20575/12720 31344/1686 35309/1904 15656/10851 17021/11575 10401/8753 7587/7177 12822/9303 50816/3984 36870/4035 50981/5256 54436/5904 38648/3893 38783/3725 48003/4622 36879/4090 16420/1660 41687/3933 22468/2492 26146/2738 29827/3253 31071/3236 27741 31417 31395 30019 29935 25219/1424 31919/3841 37021/3996 33029/3437 23640/2431 3450/990 6541/1716 
21321/3279 9650/1367 12246/1692 12149/1698 11799/1627 50832 48515 49857 47427 40875 40856 43617 40747 59522 54160 26391 24398 40135 42006 35685 20151 22604 2184 5793 5841 18592 22811 35038 35358 28636 28532 27519 39772 35012 27640 33300 33033 37216 26511 28598 19156 14765 22125 54800 40913 56241 60351 42544 42510 52633 40974 18081 45629 24961 28892 33085 34309 27744 31422 31400 30022 29940 26644 35764 41020 36466 26071 4441 8257 24602 11017 13938 13849 13427 C A A C T A A T A C T G C A C C T C C A A A T G A T G C A A A C A C C A T A T A A A A A A C T C T C G T A A A A A C T C A A C A T A A A T 88.99118665 56.86076471 89.6423772 89.62194531 89.47767584 89.82034463 95.41004654 95.99234299 99.04573099 90.67577548 96.01000341 92.69612263 59.58390432 57.99885731 56.87263556 92.61078855 92.97027075 61.21794872 99.36129812 59.81852423 95.35821859 52.46153172 95.00827673 94.86961932 57.91660846 94.84438525 61.53930012 97.445439 62.1386953 63.15484805 61.78678679 94.88693125 94.87585985 59.054732 59.51814812 54.29630403 51.38503217 57.95254237 92.72992701 90.11805539 90.64739247 90.1990025 90.84242196 91.23265114 91.20323751 90.00585737 90.8135612 91.36075741 90.01241937 90.49563893 90.15263715 90.56224314 99.98918685 99.98408758 99.98407643 99.99000733 99.98329993 94.65170395 89.24896544 90.25109703 90.57478199 90.67546316 77.68520603 79.21763352 86.66368588 87.59190342 87.86052518 87.72474547 87.87517688 45236 27586 44693 42505 36574 36697 41615 39114 58954 49110 25338 22616 23914 24363 20295 18662 21015 1337 5756 3494 17729 11967 33289 33544 16585 27061 16935 38756 21756 17456 20575 31344 35309 15656 17021 10401 7587 12822 50816 36870 50981 54436 38648 38783 48003 36879 16420 41687 22468 26146 29827 31071 27741 31417 31395 30019 29935 25219 31919 37021 33029 23640 3450 6541 21321 9650 12246 12149 11799 T G G T C T G C G T C A T G T T C T A G G T C T G A T T T G G A T T T G C T C T G G G T G A C T G T A C 11.00094429 43.13099042 10.34759412 10.36751218 10.51987768 
10.16986489 4.583075406 3.992931995 0.949228857 9.320531758 3.982418249 7.291581277 40.40114613 41.99162024 43.11895755 7.384248921 7.025305256 38.78205128 0.56965303 40.16435542 4.641781411 47.53408443 4.986015184 5.116239606 42.07640732 5.152109912 38.44979832 2.552046666 37.85559237 36.83791606 38.1981982 5.103986922 5.116079106 40.93017993 40.47485838 45.69325538 48.60819506 42.04745763 7.270072993 9.862390927 9.345495279 9.782770791 9.150526514 8.762644084 8.781562898 9.981939767 9.180908136 8.619518289 9.983574376 9.476671743 9.832250264 9.431927483 5592 20925 5159 4917 4300 4155 1999 1627 565 5048 1051 1779 16215 17639 15387 1488 1588 847 33 2346 863 10843 1747 1809 12049 1470 10581 1015 13254 10182 12720 1686 1904 10851 11575 8753 7177 9303 3984 4035 5256 5904 3893 3725 4622 4090 1660 3933 2492 2738 3253 3236 T G A C T T G C G G T G 5.344542861 10.73985013 9.741589469 9.425218011 9.324536842 22.29227651 20.78236648 13.3281847 12.40809658 12.13947482 12.26081306 12.11737544 1424 3841 3996 3437 2431 990 1716 3279 1367 1692 1698 1627 N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A Low N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A Low N/A High N/A High N/A High N/A Low N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High N/A High Gene: Os08g0191500, mRNA: Os08g0191500 N/A High Gene: Os04g0624600, CDS: Os04g0624600, Thr1176Ala mRNA:High Os04g0624600 Gene: Os04g0624600, CDS: Os04g0624600, N/A mRNA:High Os04g0624600 Gene: Os04g0624600 N/A High Gene: Os04g0624600, CDS: Os04g0624600, N/A mRNA:High Os04g0624600 Gene: Os04g0624600 N/A High Gene: Os04g0624600 N/A High Gene: Os04g0624600 N/A High Gene: Os04g0624600, CDS: Os04g0624600, Ser756IlemRNA:High Os04g0624600 Gene: Os04g0624600 N/A High Gene: Os04g0624600 N/A High Gene: Os04g0624600, CDS: Os04g0624600, N/A mRNA:High Os04g0624600 Gene: Os04g0624600, CDS: Os04g0624600, N/A mRNA:High 
Os04g0624600 Gene: Os04g0624600, CDS: Os04g0624600, N/A mRNA:High Os04g0624600 Gene: Os04g0624600, CDS: Os04g0624600, Glu643Gly mRNA:High Os04g0624600 Gene: Os04g0624600 N/A Low Gene: Os04g0624600 N/A Low Gene: Os04g0624600 N/A Low Gene: Os04g0624600 N/A Low Gene: Os04g0624600 N/A Low Gene: Os04g0624600, CDS: Os04g0624600, Glu460Lys mRNA:High Os04g0624600 Gene: Os04g0624600, CDS: Os04g0624600, Lys207Asn mRNA:High Os04g0624600 Gene: Os04g0624600, CDS: Os04g0624600, Val200Phe mRNA:High Os04g0624600 Gene: Os04g0624600, CDS: Os04g0624600, Phe139Cys mRNA:High Os04g0624600 Gene: Os04g0624600 N/A High N/A High N/A High N/A High Gene: Os01g0720600, mRNA: Os01g0720600 N/A High Gene: Os01g0720600 N/A High Gene: Os01g0720600 N/A High Gene: Os01g0720600 N/A High SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa 3423 3575 4048 4394 4604 5720 6389 6800 6901 7160 7289 7401 7506 7702 7744 7823 8383 8772 9016 9109 10020 10411 2223 2375 2848 3194 3404 4520 5189 5600 5701 5960 6089 6201 6306 6502 6544 6623 7183 7572 7816 7909 8820 9211 SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP SNP 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

List of Indels
Mapping
AGPS2b AGPS2b AGPS2b AGPS2b BEI BEI BEI BEI BEI BEI BEI BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb BEIIb Isoamylase1 Isoamylase1 Isoamylase1 Isoamylase1 Isoamylase1

T T C G T G C T C A T A A T T T C G A C G A 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 T/C T/A C/T G/A T/A G/A C/T T/C C/T A/G T/G A/T A/T T/C T/A T/C C/A G/A A/G C/T G/A A/C 88.9/11.1 88.2/11.7 88.3/11.7 86.8/13.2 87.8/12.2 87.7/12.3 86.7/13.3 87.0/13.0 87.9/12.1 74.1/25.9 87.1/12.9 87.4/12.6 86.8/13.2 87.5/12.5 87.7/12.2 87.1/12.9 86.9/13.1 86.1/13.9 87.5/12.5 86.7/13.3 88.5/11.5 88.5/11.5 9943/1242 8572/1141 9640/1279 12943/1972 11431/1586 10685/1504 
22821/3487 25532/3807 22567/3104 15409/5383 23450/3474 18706/2692 24225/3676 22936/3270 23961/3341 21389/3176 22602/3406 22505/3632 21481/3066 24682/3776 5112/662 8892/1152 11186 9715 10919 14915 13018 12190 26312 29340 25676 20794 26924 21399 27903 26207 27307 24565 26013 26138 24549 28462 5775 10046 T T C G T G C T C A T A A T T T C G A C G A 88.88789558 88.23468863 88.28647312 86.778411 87.80918728 87.6538146 86.73228945 87.02113156 87.89141611 74.10310667 87.09701382 87.41529978 86.81862165 87.5186019 87.74673161 87.07103603 86.88732557 86.1006963 87.50254593 86.71913428 88.51948052 88.51284093 9943 8572 9640 12943 11431 10685 22821 25532 22567 15409 23450 18706 24225 22936 23961 21389 22602 22505 21481 24682 5112 8892 C A T A A A T C T G G T T C A C A A G T A C 11.10316467 11.74472465 11.71352688 13.221589 12.18313105 12.33798195 13.25250836 12.97546012 12.08911045 25.88727518 12.90298618 12.5800271 13.17421066 12.47758233 12.23495807 12.92896397 13.09345327 13.89547785 12.4893071 13.26681189 11.46320346 11.46725065 1242 1141 1279 1972 1586 1504 3487 3807 3104 5383 3474 2692 3676 3270 3341 3176 3406 3632 3066 3776 662 1152 Gene: Os01g0720600 N/A High Gene: Os01g0720600 N/A High Gene: Os01g0720600, CDS: Os01g0720600, Gly708Asp mRNA:High Os01g0720600 Gene: Os01g0720600 N/A High Gene: Os01g0720600 N/A High Gene: Os01g0720600 N/A High Gene: Os01g0720600 N/A High Gene: Os01g0720600 N/A High Gene: Os01g0720600 N/A High Gene: Os01g0720600, CDS: Os01g0720600, Val480AlamRNA:High Os01g0720600 Gene: Os01g0720600 N/A High Gene: Os01g0720600 N/A High Gene: Os01g0720600, CDS: Os01g0720600, Ser469Thr mRNA:High Os01g0720600 Gene: Os01g0720600, CDS: Os01g0720600, N/A mRNA:High Os01g0720600 Gene: Os01g0720600, CDS: Os01g0720600, N/A mRNA:High Os01g0720600 Gene: Os01g0720600, CDS: Os01g0720600, His363Arg mRNA:High Os01g0720600 Gene: Os01g0720600, CDS: Os01g0720600, Leu176Phe mRNA:High Os01g0720600 Gene: Os01g0720600 N/A High Gene: Os01g0720600 N/A High Gene: Os01g0720600 N/A 
High Gene: Os01g0720600 N/A High Gene: Os01g0720600, mRNA: Os01g0720600 N/A High

List of Insertion/Deletion
Reference position    Consensus position    Variation type    Length
3926    1579    INDEL    1
4245    1898    INDEL    1
4294    1947    INDEL    4
4305    1958    INDEL    1
5733    5733    INDEL    1
5733    5733    INDEL    1
6401    6401    INDEL    1
6574    6574    INDEL    1
6579    6579    INDEL    4
6772    6772    INDEL    2
6789    6789    INDEL    1
156    156    INDEL    1
408    409    INDEL    2
612    615    INDEL    1
612    615    INDEL    1
3127    3130    INDEL    1
3127    3130    INDEL    1
4058    4061    INDEL    1
4058    4061    INDEL    1
5378    5381    INDEL    1
5378    5381    INDEL    1
5813    5816    INDEL    1
7056    7058    INDEL    2
7056    7058    INDEL    1
7148    7151    INDEL    1
7724    7728    INDEL    8
8297    8301    INDEL    2
8309    8313    INDEL    2
8310    8314    INDEL    1
8311    8315    INDEL    1
8311    8315    INDEL    3
8311    8315    INDEL    5
9548    9552    INDEL    1
9600    9603    INDEL    1
9600    9603    INDEL    1
9642    9645    INDEL    3
3530    3530    INDEL    1
3539    3539    INDEL    1
3786    3786    INDEL    2
3970    3970    INDEL    1
4533    4533    INDEL    1

Reference Variants
2 2 AACT 2 2 2 A 2 2 2 ---2 GA 2 2 2 -2 A 2 2 T 2 2 2 A 2 A 2 2 T 2 -2 2 2 GGGGGGGG 3 AT 2 AA 3 2 2 --2 ----2 T 2 2 T 2 TTA 2 2 2 TA 2 2 A 2

Allele variations Frequencies Counts
-/G 99.1/0.9 108/1 -/A 98.9/1.1 939/10 AACT/---98.7/1.1 11484/131 -/T 99.1/0.9 14049/121 -/A 97.9/2.1 12716/267 A/95.1/4.8 13121/665 -/G 92.0/8.0 4234/368 -/A 99.2/0.8 1159/9 ----/CCTG 99.2/0.8 1553/12 GA/-99.4/0.5 3220/17 -/A 99.3/0.7 2630/19 G/81.4/18.6 5342/1218 CA/-69.0/31.0 3731/1675 A/96.6/3.3 14474/500 -/A 99.3/0.7 14574/101 T/96.9/2.9 6201/184 -/T 99.0/1.0 5923/57 -/A 98.1/1.9 11607/220 A/98.9/1.0 12075/128 A/98.9/1.1 2468/28 -/A 99.1/0.9 2344/21 -/T 73.1/26.8 5192/1902 --/AA 41.7/1.9 2455/110 A/56.4/41.7 3323/2455 T/81.0/19.0 7810/1829 GGGGGGGG/GGTGGGGG/-------97.2/1.4/0.7 138/2/1 AT/-98.7/1.3 3220/43 AA/AT/-64.9/33.9/1.1 1352/706/23 -/T 65.1/34.9 1356/726 -/T 64.7/15.2 1328/313 ---/TAT 64.7/16.2 1328/332 -----/TATAT 64.7/3.8 1328/79 -/T 76.1/23.8 4778/1493 -/T 98.6/1.4 8114/112 T/96.4/3.5 8035/292 ---/TTA 76.7/23.2 4224/1276 -/A 99.3/0.7 16623/113 -/T 98.2/1.8 17273/312 
TA/-99.4/0.6 20518/124 -/A 99.0/0.9 17704/169 A/98.5/1.5 15699/240 Coverage 109 949 11639 14170 12991 13794 4602 1168 1565 3239 2649 6560 5408 14982 14676 6401 5981 11829 12205 2496 2366 7098 5894 5894 9642 142 3263 2082 2082 2053 2053 2053 6279 8232 8333 5505 16736 17585 20652 17877 15943 Variant Frequency of Frequency of #1 #1 Count of #1 Variant #2 #2 Count of #2 Variant #3 99.08256881 108 G 0.917431193 1 98.94625922 939 A 1.05374078 10 AACT 98.66827047 11484 ---1.125526248 131 99.14608327 14049 T 0.853916725 121 97.88314987 12716 A 2.055269032 267 A 95.12106713 13121 4.820936639 665 92.00347675 4234 G 7.996523251 368 99.22945205 1159 A 0.770547945 9 ---99.23322684 1553 CCTG 0.766773163 12 GA 99.4133992 3220 -0.52485335 17 99.28274821 2630 A 0.717251793 19 G 81.43292683 5342 18.56707317 1218 CA 68.99038462 3731 -30.97263314 1675 A 96.60926445 14474 3.337338139 500 99.30498774 14574 A 0.688198419 101 T 96.8754882 6201 2.874550851 184 99.0302625 5923 T 0.95301789 57 98.1232564 11607 A 1.859835996 220 A 98.93486276 12075 1.048750512 128 A 98.87820513 2468 1.121794872 28 99.07016061 2344 A 0.887573964 21 73.14736546 5192 T 26.79628064 1902 -41.65252799 2455 AA 1.866304717 110 A 56.37936885 3323 41.65252799 2455 T 80.99979257 7810 18.96909355 1829 GGGGGGGG 97.18309859 138 GGTGGGGG1.408450704 2 -------AT 98.6821943 3220 -1.3178057 43 AA 64.93756004 1352 AT 33.90970221 706 -65.129683 1356 T 34.870317 726 64.68582562 1328 T 15.24598149 313 --64.68582562 1328 TAT 16.17145641 332 ----64.68582562 1328 TATAT 3.848027277 79 76.09491957 4778 T 23.7776716 1493 98.56656948 8114 T 1.360544218 112 T 96.42385695 8035 3.504140166 292 --76.73024523 4224 TTA 23.17892825 1276 99.3248088 16623 A 0.675191205 113 98.22576059 17273 T 1.774239409 312 TA 99.35115243 20518 -0.600426109 124 99.03227611 17704 A 0.945348772 169 A 98.46954776 15699 1.505362855 240 Frequency of #3 Count of #3 Variant #4 0.704225352 1 1.104707012 23 Frequency of #4 Count of #4 Overlappin g annotation Amino acid s 
change Gene: Os08g0345800 N/A Gene: Os08g0345800 N/A Gene: Os08g0345800 N/A Gene: Os08g0345800 N/A Gene: Os06g0726400 N/A Gene: Os06g0726400 N/A Gene: Os06g0726400 N/A Gene: Os06g0726400 N/A Gene: Os06g0726400 N/A Gene: Os06g0726400 N/A Gene: Os06g0726400 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os02g0528200 N/A Gene: Os08g0520900 N/A Gene: Os08g0520900 N/A Gene: Os08g0520900 N/A Gene: Os08g0520900 N/A Gene: Os08g0520900 N/A Isoamylase1 Isoamylase1 Isoamylase1 Isoamylase1 Isoamylase1 Isoamylase1 Isoamylase1 Isoamylase1 GBSSI GBSSI GBSSII GBSSII GBSSII GBSSII GBSSII GBSSII GBSSII GBSSII GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 GPT1 Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase Pullulanase SPHOL SPHOL SPHOL SPHOL SPHOL SPHOL SPHOL SPHOL SSI SSI SSI SSI SSI SSI SSI SSI SSIIa SSIIb SSIIb SSIIb SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa 4533 5685 5685 5685 5685 5871 5871 6331 2208 2208 291 491 3358 3447 3447 3529 3529 5173 315 315 315 326 326 365 518 852 3623 3623 3657 2133 2133 2169 2584 2683 2686 5166 5368 5662 6656 8651 8651 411 411 1613 2552 3567 3620 4569 4579 1712 3773 4884 4987 5365 5422 6578 7231 3196 703 3213 3213 199 448 4520 4520 5011 5011 5170 5196 4533 5685 5685 5685 5685 5871 5871 6331 2208 2208 212 412 3279 3368 3368 3450 3450 5094 260 260 260 271 271 310 463 797 3568 3568 3602 1438 1438 1474 1889 1988 1991 4471 4673 4967 5961 7956 7956 
305 305 1507 2446 3461 3513 4463 4474 1501 3562 4673 4776 5154 5211 6367 7020 3196 68 2578 2578 199 448 4520 4520 5011 5011 5170 5196 INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL 1 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 3 1 2 1 1 4 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 2 1 1 1 1 AA AAA A A A A A A A A T C T TT ----T TTAC T -G -A T T T A A C T C A A A G GGT G A T -T C 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 -/A AA/-AAA/--A/-/A -/A A/A/-/A A/A/A/A/A/-/A T/-/T C/T/TT/--/T --/TA -/A ---/GCC -/A --/TG -/T T/TTAC/---T/-/T -/T -/A -/G --/GA G/--/GC A/T/T/-/T T/-/T A/A/-/C C/T/-/T C/A/-/G A/-/T A/G/GGT/--G/-/T -/T A/-/A -/A T/-/A --/AA -/T T/-/T C/- 99.2/0.8 57.9/33.8 56.7/4.0 59.1/4.3 98.8/1.2 98.9/1.1 96.9/3.0 97.8/2.2 97.1/2.7 96.1/3.7 99.3/0.7 99.0/1.0 97.6/2.4 99.2/0.8 99.5/0.5 96.5/3.4 98.5/1.5 98.0/2.0 75.7/22.6 75.1/1.7 99.0/1.0 96.6/2.4 96.6/0.8 96.8/3.2 99.4/0.6 75.0/24.9 94.3/5.6 96.6/3.3 98.1/1.5 98.8/1.2 98.6/1.3 99.3/0.7 98.7/1.3 99.5/0.5 98.6/1.4 97.8/2.2 98.7/1.3 99.2/0.8 98.4/1.3 97.7/2.2 98.8/1.2 99.4/0.6 99.4/0.6 98.7/1.2 83.0/17.0 96.5/3.5 96.1/3.9 92.2/7.7 92.4/7.6 99.2/0.8 81.7/9.6/8.3 99.4/0.6 97.9/2.1 97.9/2.0 97.5/2.5 98.8/1.1 96.4/3.5 93.3/6.6 99.4/0.6 99.4/0.6 99.3/0.7 79.7/20.3 64.0/36.0 73.8/25.4 73.8/0.8 98.3/1.7 98.7/1.3 97.0/3.0 99.1/0.9 15529/123 6393/3731 6107/426 6733/490 11089/132 4179/48 4438/137 1088/24 6607/186 6967/270 26301/185 8488/82 22410/553 15182/126 14686/80 13208/464 13219/202 17655/359 8013/2394 7744/174 10293/104 7238/182 7238/63 
1075/36 2535/16 5216/1734 6430/381 6679/230 5930/93 10170/122 9973/134 14079/104 16977/226 13502/70 13091/182 12509/277 13498/176 22397/171 9251/126 7532/172 7548/92 7412/46 7299/47 11876/147 13202/2696 3046/109 17705/717 6227/522 6872/569 10416/81 5135/602/522 12595/71 14704/311 13265/276 15682/395 8013/88 4016/146 10588/747 166/1 22033/125 21721/163 6615/1684 9456/5313 9839/3381 9839/109 12638/218 12963/171 17265/537 15444/134 15653 11050 10764 11386 11225 4227 4582 1113 6801 7249 26490 8573 22967 15311 14767 13681 13424 18020 10582 10315 10400 7492 7492 1111 2551 6952 6822 6916 6045 10295 10111 14183 17203 13572 13276 12788 13674 22577 9404 7712 7641 7458 7346 12027 15900 3155 18425 6751 7441 10503 6286 12667 15015 13544 16084 8113 4164 11343 167 22159 21885 8300 14772 13335 13335 12862 13135 17802 15588 AA AAA A A A A A A A A T C T TT ----T TTAC T -G -A T T T A A C T C A A A G GGT G A T -T C 99.20781959 57.85520362 56.73541434 59.13402424 98.78841871 98.86444287 96.85726757 97.75381851 97.14747831 96.10980825 99.28652322 99.00851511 97.57478121 99.15746849 99.45147965 96.54265039 98.47288439 97.97447281 75.72292572 75.0751333 98.97115385 96.60971703 96.60971703 96.75967597 99.37279498 75.0287687 94.25388449 96.57316368 98.09760132 98.78581836 98.63514984 99.26672777 98.68627565 99.48423224 98.60650798 97.81826713 98.71288577 99.20272844 98.37303275 97.6659751 98.78288182 99.38321266 99.36019603 98.74449156 83.03144654 96.5451664 96.09226594 92.23818694 92.35317834 99.17166524 81.68946866 99.43159391 97.92873793 97.94004725 97.50062174 98.76741033 96.44572526 93.34391255 99.4011976 99.43138228 99.25062828 79.69879518 64.01299756 73.78327709 73.78327709 98.2584357 98.69052151 96.983485 99.07621247 15529 6393 6107 6733 11089 4179 4438 1088 6607 6967 26301 8488 22410 15182 14686 13208 13219 17655 8013 7744 10293 7238 7238 1075 2535 5216 6430 6679 5930 10170 9973 14079 16977 13502 13091 12509 13498 22397 9251 7532 7548 7412 7299 11876 13202 3046 17705 6227 6872 
10416 5135 12595 14704 13265 15682 8013 4016 10588 166 22033 21721 6615 9456 9839 9839 12638 12963 17265 15444 A ---A A A A T -T TA A GCC A TG T ---T T A G GA GC T T C T T --T T A A A AA T T - 0.785791861 33.76470588 3.957636566 4.303530652 1.175946548 1.135557133 2.989960716 2.156334232 2.734891928 3.724651676 0.698376746 0.95649131 2.407802499 0.822937757 0.541748493 3.391564944 1.50476758 1.992230855 22.62332262 1.686863791 1 2.429257875 0.840896957 3.240324032 0.627205018 24.9424626 5.584872471 3.325621747 1.538461538 1.185041282 1.325289289 0.733272227 1.31372435 0.515767757 1.370894848 2.166093212 1.287114231 0.757407982 1.339855381 2.230290456 1.204030886 0.616787342 0.639803975 1.222249938 16.95597484 3.454833597 3.891451832 7.732187824 7.646821664 0.771208226 9.576837416 0.560511565 2.071262071 2.037802717 2.455856752 1.08467891 3.506243996 6.585559376 0.598802395 0.564104878 0.744802376 20.28915663 35.96669374 25.35433071 0.817397825 1.694915254 1.301865246 3.016514998 0.859635617 123 3731 426 490 132 48 137 24 186 270 185 82 553 126 80 464 202 359 2394 174 104 182 63 36 16 1734 381 230 93 122 134 104 226 70 182 277 176 171 126 172 92 46 47 147 2696 109 717 522 569 81 602 G 71 311 276 395 88 146 747 1 125 163 1684 5313 3381 109 218 171 537 134 8.304167992 522 Gene: Os08g0520900 N/A Gene: Os08g0520900 N/A Gene: Os08g0520900 N/A Gene: Os08g0520900 N/A Gene: Os08g0520900 N/A Gene: Os08g0520900 N/A Gene: Os08g0520900 N/A Gene: Os08g0520900 N/A Gene: Os06g0133000 N/A Gene: Os06g0133000 N/A Gene: Os07g0412100, N/A mRNA: Os07g0412100 Gene: Os07g0412100, N/A mRNA: Os07g0412100 Gene: Os07g0412100 N/A Gene: Os07g0412100 N/A Gene: Os07g0412100 N/A Gene: Os07g0412100 N/A Gene: Os07g0412100 N/A Gene: Os07g0412100 N/A Gene: Os08g0187800 N/A Gene: Os08g0187800 N/A Gene: Os08g0187800 N/A Gene: Os08g0187800 N/A Gene: Os08g0187800 N/A Gene: Os08g0187800 N/A Gene: Os08g0187800 N/A Gene: Os08g0187800 N/A Gene: Os08g0187800 N/A Gene: Os08g0187800 N/A Gene: Os08g0187800 N/A 
Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os04g0164900 N/A Gene: Os03g0758100 N/A Gene: Os03g0758100 N/A Gene: Os03g0758100 N/A Gene: Os03g0758100 N/A Gene: Os03g0758100, N/A mRNA: Os03g0758100 Gene: Os03g0758100, N/A mRNA: Os03g0758100 Gene: Os03g0758100, N/A mRNA: Os03g0758100 Gene: Os03g0758100, N/A mRNA: Os03g0758100 Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0160700 N/A Gene: Os06g0229800 N/A Gene: Os02N/A Gene: Os02g0744700 N/A Gene: Os02g0744700 N/A N/A N/A N/A N/A N/A N/A N/A N/A SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIa SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIIIb SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa SSIVa 6030 6914 6914 7619 7622 7658 7660 7757 7757 7757 10220 10852 1076 2957 2957 2957 3521 4720 4720 5529 5926 7722 7722 7862 7862 8436 8431 8439 4151 4492 4492 5509 5768 5774 9939 6030 6914 6914 7593 7596 7632 7634 7731 7731 7731 10194 10826 45 1943 1943 1943 2507 3706 3706 4515 4912 6707 6707 6847 6847 7421 7416 7424 2951 3292 3292 4309 4568 4574 8739 INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL INDEL

Stringency criteria for INDEL detection (DIP)
Mismatch cost: 2
Similarity: 0.7
Min variant frequency: 0.5%

1 2 1 4 1 1 1 1 2 1 3 1 2 3 1 2 2 2 2 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 2 T ----A -T --TC AAA A AA AT -GA A G A A CCT C A A A -- 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 T/--/AA -/A ----/CTTA -/A -/T A/T/-/T --/TT T/---/CCT -/T 
TC/-AAA/--A/AA/-AT/---/GA GA/-A/A/-/A -/A A/CCT/--C/-/C -/A -/A A/A/-/C A/--/CG 95.7/4.3 93.0/6.3 93.0/0.6 70.6/29.2 90.2/9.8 98.5/1.4 53.7/42.1/4.0 63.7/34.5 63.7/1.7 98.7/1.2 54.0/45.9 91.6/8.4 89.3/10.6 96.5/0.6 97.0/1.6 96.8/0.7 91.2/8.6 97.8/2.2 98.2/1.8 90.3/9.6 100 94.8/5.0 96.4/3.5 98.3/1.7 98.8/1.1 98.9/1.1 99.3/0.7 97.9/2.1 88.8/11.2 99.2/0.7 99.1/0.9 98.4/1.6 99.4/0.6 98.0/1.9 92.6/7.4 23530/1057 11219/761 11219/72 3382/1400 5548/605 10887/157 5911/4636/445 5078/2751 5078/136 8329/105 5151/4375 6537/598 18803/2237 7131/41 8201/133 7663/59 11174/1057 31903/704 30606/555 11203/1195 15265 9147/483 9023/325 6241/109 6759/73 370/4 432/3 374/8 5936/746 6970/51 7089/61 5383/87 4773/28 5026/98 2430/194 24590 12063 12063 4789 6154 11048 11013 7976 7976 8435 9532 7135 21052 7388 8456 7919 12249 32607 31174 12403 15272 9644 9356 6352 6841 374 435 382 6684 7023 7151 5471 4801 5126 2625 T ----A -T --TC AAA A AA AT -GA A A A CCT C A A A -- 95.6893046 93.00339882 93.00339882 70.62017123 90.15274618 98.54272266 53.67293199 63.66599799 63.66599799 98.74333136 54.03902644 91.61878066 89.31692951 96.52138603 96.98438978 96.76726859 91.22377337 97.8409544 98.17796882 90.32492139 99.95416448 94.84653671 96.44078666 98.25251889 98.80134483 98.93048128 99.31034483 97.90575916 88.80909635 99.24533675 99.13298839 98.39151892 99.41678817 98.04916114 92.57142857 23530 11219 11219 3382 5548 10887 5911 5078 5078 8329 5151 6537 18803 7131 8201 7663 11174 31903 30606 11203 15265 9147 9023 6241 6759 370 432 374 5936 6970 7089 5383 4773 5026 2430 AA A CTTA A T T T TT CCT T -----GA -- 4.298495323 6.308546796 0.596866451 29.23366047 9.831004225 1.421071687 42.09570508 34.49097292 1.705115346 1.244813278 45.8980277 8.381219341 10.62606878 0.554953979 1.572847682 0.745043566 8.629275859 2.159045604 1.780329762 9.634765782 1057 761 72 1400 605 157 4636 2751 136 105 4375 598 2237 41 133 59 1057 704 555 1195 A A --C A A C CG 5.008295313 3.473706712 1.715994962 1.067095454 1.069518717 
0.689655172 2.094240838 11.16098145 0.726185391 0.853027549 1.590202888 0.583211831 1.911822083 7.39047619 483 325 109 73 4 3 8 746 51 61 87 28 98 194 4.040679197 445 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A Gene: Os08g0191500, N/A mRNA: Os08g0191500 Gene: Os04g0624600, N/A mRNA: Os04g0624600 Gene: Os04g0624600 N/A Gene: Os04g0624600 N/A Gene: Os04g0624600 N/A Gene: Os04g0624600 N/A Gene: Os04g0624600 N/A Gene: Os04g0624600 N/A Gene: Os04g0624600 N/A Gene: Os04g0624600 N/A Gene: Os04g0624600 N/A Gene: Os04g0624600 N/A Gene: Os04g0624600 N/A Gene: Os04g0624600 N/A Gene: Os04g0624600, N/A mRNA: Os04g0624600 Gene: Os04g0624600, N/A mRNA: Os04g0624600 Gene: Os04g0624600, N/A mRNA: Os04g0624600 Gene: Os01g0720600 N/A Gene: Os01g0720600 N/A Gene: Os01g0720600 N/A Gene: Os01g0720600 N/A Gene: Os01g0720600 N/A Gene: Os01g0720600 N/A Gene: Os01g0720600 N/A

Appendix 2. The full list of breeding lines (studied population) and their pedigree information.

Barcode    Barcode 08    Pedigree    Cross
*YRR07=01-01*    *YRR08=01-03*    ILLABONG/SARA    YC 01008-0-0-56-B
*YRR07=01-15*    *YRR08=01-05*    ILLABONG/VIALONE NANO-Y4    YC 01011-0-0-3-B
*YUR07=08-19*    *YRR08=01-07*    ILLABONG///M102//M201/YRM3    YC 99160-0-0-4-10-B
*YRR07=01-19*    *YRR08=01-08*    YRB4    YC 92013-1-108
*YRR07=03-14*    *YRR08=01-08*    YRB4    YC 92013-1-108
*YUR07=11-18*    *YRR08=01-09*    ILLABONG/4/YRB3///YRM2//M7/RINGO    YC 00067-0-0-51-B
*YUR07=02-17*    *YRR08=01-10*    ILLABONG/MILLIN    YC 97205-0-22-B
*YRR07=02-10*    *YRR08=02-11*    ILLABONG///YR83/M9//M7    YC 99159-0-0-29-B
*YRR07=02-20*    *YRR08=02-13*    ILLABONG/VIALONE NANO-Y4    YC 01011-0-0-5-B
*YRR07=02-16*    *YRR08=02-16*    ILLABONG///YR83/M9//M7    YC 99159-0-0-13-B
*YRE07=10-02*    *YRR08=02-19*    YRM67    YC 94002-1-62
*YRR07=02-13*    *YRR08=02-20*    JARRAH    YC 82003-14-2
*YRR07=04-17*    *YRR08=02-20*    JARRAH    YC 82003-14-2
*YRR07=02-06*    *YRR08=03-02*    ILLABONG/YRM39    YC 94208-0-12-B
*YRR07=02-07*    *YRR08=03-09*    M103///YRM34//YRM3/HUNG.NO.1    YC 00085-0-0-95-B
*YUR07=02-16*    *YRR08=04-02*    ILLABONG/4/YRB3///YRM2//M7/RINGO    YC 00067-0-0-46-B
*YRR07=01-03*    *YRR08=04-12*    ILLABONG/IR65600-27-1-2-2    YC 97157-0-1-13-B
*YRR07=01-08*    *YRR08=07-01*    YRB3/ARBORIO//MILLIN/WC1043    YC 00038-0-0-60-B
*YRR07=01-05*    *YRR08=07-06*    ILLABONG    YC 81116-1-5
*YRR07=03-11*    *YRR08=07-06*    ILLABONG    YC 81116-1-5
*YRI07=04-08*    *YRI08=01-01*    71048.200/H1//INGA///YRL113    YC 98023-0-44-B
*YUD07=14-15*    *YRI08=01-02*    YRL118///INGA/M9//213D.25    YC 99046-0-0-11-S-7
*YRI07=01-04*    *YRI08=01-04*    71048.200/H1//INGA///YRL113    YC 98023-0-63-B
*YUD07=03-24*    *YRI08=01-05*    BBL//M9/PELDE///YRL30/4/YRL113    YC 98054-0-53-S-6
*YRI07=01-03*    *YRI08=01-06*    YRL39/IR65587-134-2-2//YRL113    YC 01084-0-0-8-B
*YRI07=02-05*    *YRI08=01-07*    YRL113/WAB450-11-1-P31-1-HB    YC 00006-0-0-30-B
*YRI07=01-02*    *YRI08=01-08*    YRL39/IR65587-134-2-2//YRL113    YC 01084-0-0-67-B
*YRI07=03-08*    *YRI08=02-01*    BBL//M9/PELDE///YRL30/4/YRL101    YC 98095-0-49-B
*YRI07=03-03*    *YRI08=01-09*    YRL113///RD91V55//P/GO(4)/D.10    YC 98149-0-6-B
*YUI07=08-12*    *YRI08=01-10*    YRL113/WAB450-11-1-P31-1-HB    YC 00006-0-0-40-S-12
*YUI07=14-10*    *YRI08=02-03*    YRL113//L203/YRL34    YC 00064-0-0-82-S-6
*YRI07=02-03*    *YRI08=02-06*    BBL//M9/PELDE///YRL30/4/LANGI    YC 98084-0-44-7-B
*YRI07=02-08*    *YRI08=02-08*    YRL113    YC 89045J-0-17
*YRI07=02-04*    *YRI08=03-02*    71011//73/M7///P/4/YRL34/5/BBL//M9/P/4/YRL34    YC 99221-0-0-3-B
*YRI07=01-07*    *YRI08=03-03*    71048.200/H1//INGA///YRL113    YC 98023-0-74-B
*YRI07=03-02*    *YRI08=04-04*    YRL113//L203/YRL34    YC 00064-0-0-3-B
*YUI07=12-12*    *YRI08=04-09*    BBL//M9/PELDE///YRL30/4/YRL113    YC 98054-0-53-S-9
*YUI07=12-10*    *YRI08=04-10*    YRL113//L203/YRL34    YC 00064-0-0-82-S-10
*YRI07=04-05*    *YRI08=05-09*    BBL//M9/PELDE///YRL30/4/YRL113    YC 98054-0-114-B
*YUI07=06-10*    *YRI08=05-10*    RIZABELL/YRL113    YC 99293-0-0-25-S-14
*YRI07=02-02*    *YRI08=06-02*    YRL39/IR65587-134-2-2//YRL113    YC 01084-0-0-69-B
*YRI07=01-06*    *YRI08=06-05*    YRL113//L203/YRL34    YC 00064-0-0-147-B
*YRJ07=05-15*    *YRJ08=02-33*    M103///M201//YR196/ARDITO    YC 92243-0-11-B
*YRJ07=01-08*    *YRJ08=02-30*    M103/OEIRAS    YC 99089-0-38-B
*YRJ07=04-08*    *YRJ08=03-28*    M103/YRK2    YC 94161S-2-0-10-B
*YUJ07=09-22*    *YRJ08=03-26*    YRM49///ILLABONG/YRM54    YC 02041-B-2S-9
*YRJ07=03-13*    *YRJ08=03-35*    AKIHIKARI//KOSHIHIKARI (T)/YRK4    YC 00238A-0-0-42-B
*YRJ07=03-12*    *YRJ08=03-30*    M103/YRM44    YC 95050S-5-0-B
*YRI07=04-06*    *YRI08=06-07*    BBL//M9/PELDE///YRL30/4/LANGI    YC 98084-0-44-10-B
*YRI07=04-03*    *YRI08=06-08*    YRL113//L203/YRL34    YC 00064-0-0-29-B
*YRI07=04-01*    *YRI08=07-03*    BBL//M9/PELDE///YRL30/4/YRL113    YC 98054-0-65-B
*YRI07=03-07*    *YRI08=07-07*    BBL//M9/PELDE///YRL30/4/LANGI    YC 98084-0-44-12-B
*YRI07=03-01*    *YRI08=08-10*    LANGI/LAGRUE//YRL113    YC 01087-0-0-22-B
*YRI07=01-08*    *YRI08=09-01*    YRL101///PELDE/M9//M101    YC 94118-368-59-B
*YRI07=02-07*    *YRI08=09-10*    BBL//M9/P///YRL30/4/LANGI/INGA    YC 99222-0-0-96-B
*YRI07=01-01*    *YRI08=10-04*    YRL113//L203/YRL34    YC 00064-0-0-76-B
*YUD07=07-19*    *YRI08=14-09*    L203//YRL101/LANGI    YC 02056-B-2S-10
*YRJ07=01-02*    *YRJ08=01-13*    M103///M201//196/ARD/4/M103/YRM31    YC 99140-0-117-B
*YRJ07=04-13*    *YRJ08=01-17*    M103///M201//196/ARD/4/M103/YRM31    YC 99140-0-105-B
*YRJ07=03-09*    *YRJ08=01-20*    M103/YRM54    YC 99114-0-9-1-B
*YUJ07=19-23*    *YRJ08=01-22*    M103//M201/YRM3///IR65600-42-5-/MILLIN    YC 00039-0-0--38-S-6
*YRJ07=05-08*    *YRJ08=01-24*    M103///M201/EIKO//CALROSE    YC 92212-0-73-B
*YRJ07=03-04*    *YRJ08=01-23*    M103/YRM54    YC 99114S-26-B
*YRJ07=04-01*    *YRJ08=01-27*    M103///YRM3/HUNG.NO.1//M401/4/YRM54    YC 00268-0-0-5-B
*YRJ07=04-03*    *YRJ08=01-30*    ECHUCA/SHIMUZI MOCHI//MILLIN    YC 97035-0-309-B
*YRJ07=02-14*    *YRJ08=01-32*    M103///M201//196/ARD/4/M103/YRM31    YC 99140-0-3-B
*YRJ07=05-01*    *YRJ08=01-33*    M103/YRM54    YC 99114S-30-B
*YRJ07=01-20*    *YRJ08=01-35*    M103/YRM54    YC 99114-0-12-B
*YRJ07=02-15*    *YRJ08=02-16*    M103///M201//196/ARD/4/M103/YRM31    YC 99140-0-135-B
*YRJ07=04-02*    *YRJ08=02-19*    M102/M103//M103    YC 97063-0-106-B
*YRJ07=01-12*    *YRJ08=02-20*    ECHUCA/80023-TR166-2-1-4    YC 96041T-12-12-B
*YRJ07=01-07*    *YRJ08=02-21*    M103///YRM3/HUNG.NO.1//M401/4/YRM54    YC 00268-0-0-7-B
*YRJ07=05-14*    *YRJ08=02-22*    M103///M201//196/ARD/4/M103/YRM31    YC 99140-0-50-B
*YUJ07=19-25*    *YRJ08=02-24*    M103//M201/R///M201/YRM3//BOG.    YC 98064-0-8-S-6
*YRJ07=04-19*    *YRJ08=02-26*    M103///M201//YR196/ARDITO    YC 92243J-0-5-B
*YRJ07=01-18*    *YRJ08=02-27*    M103/YRM18    YC 90019J-50-0-B
*YRJ07=02-09*    *YRJ08=02-28*    M103//M401/CALROSE    YC 96106-39-2-B
*YRJ07=03-15*    *YRJ08=02-35*    M103///YRM3/HUNG.NO.1//M401/4/YRM54    YC 00268-0-0-15-B
*YRJ07=01-03*    *YRJ08=03-12*    M102/M103    YC 92076S-301-0-B
*YUR07=01-16*    *YRJ08=03-13*    VIALONE NANO    Y01/008
*YRJ07=05-19*    *YRJ08=03-14*    M103///M201//YR196/ARDITO    YC 92243J-0-11-B
*YRJ07=02-04*    *YRJ08=03-15*    M103///YRM3/HUNG.NO.1//M401/4/YRM54    YC 00268-0-0-30-B
*YRJ07=02-19*    *YRJ08=03-17*    M103///YRM3/HUNG.NO.1//M401/4/YRM54    YC 00268-0-0-23-B
*YRJ07=02-03*    *YRJ08=03-18*    M104    Y03/009
*YRJ07=04-09*    *YRJ08=03-16*    JARRAH    YC 82003-14-2
*YRJ07=02-12*    *YRJ08=03-25*    M103/YRM54    YC 99114-0-80-9-B
*YRJ07=05-06*    *YRJ08=03-23*    LIMAN    Y98/001
*YRE07=10-04*    *YRA08=01-03*    YRM64/NORIN PL11    YC 01012-0-0-9-B
*YRA07=03-10*    *YRA08=01-06*    REIZIQ    YC 86003S-12-0
*YRE07=19-01*    *YRA08=01-05*    M401/YRM42//YRM54    YC 01038-0-0-118-B
*YRE07=15-01*    *YRA08=01-09*    NAMAGA///M201/YRM3//BOGAN    YC 98140S-89-B
*YUA07=04-28*    *YRA08=01-10*    KOSHIHIKARI(T)/M202//BOGAN    YC 00018-0-0-6-S-10
*YUA07=05-27*    *YRA08=02-01*    YRM65//JARRAH/AMAROO    YC 02035-B-2S-19
*YRA07=04-05*    *YRA08=02-02*    M201//YR196/ARDITO///YRM54    YC 99113S-57-B
*YRA07=03-04*    *YRA08=02-03*    PARAGON    YC 92061-0-13
*YRA07=02-10*    *YRA08=02-05*    M201//YR196/ARDITO///YRM54    YC 99113-0-0-8-B
*YRA07=04-14*    *YRA08=02-06*    M7/M201//M103    YC 94014-1-59-B
*YRA07=02-16*    *YRA08=02-07*    YRM54/YRM61    YC 97073S-93-0-B
*YRA07=03-05*    *YRA08=02-08*    YRM54/YRM61    YC 97073S-93-0-12-B
*YUA07=05-20*    *YRA08=02-12*    YRM65//JARRAH/AMAROO    YC 02035-B-2S-1
*YRA07=04-09*    *YRA08=03-01*    YRM66    YC 92175-1-30
*YUA07=17-19*    *YRA08=03-02*    IR65600-42-5-2/MILLIN//YRM63    YC 01088-0-0-42-B
*YRA07=05-03*    *YRA08=03-03*    YRM68    YC 92086-1-15
*YRA07=05-02*    *YRA08=03-04*    NAMAGA    YC 84019-43-3
*YRA07=03-03*    *YRA08=03-05*    M201//YR196/ARDITO///YRM54    YC 99113-0-0-57-B
*YRA07=04-12*    *YRA08=03-06*    M201//YR196/ARDITO///YRM54    YC 99113-0-0-19-B
*YRA07=03-12*    *YRA08=03-08*    QUEST_CT19    YC 86008-96-3
*YUA07=17-22*    *YRA08=03-10*    KOSHIHIKARI/M102//YRM43    YC 96102-1-94-4-B
*YRA07=03-06*    *YRA08=03-11*    YRK4/SR13925-13-1    YC 97053-1-112-7-B
*YRA07=01-06*    *YRA08=04-02*    M201//YR196/ARDITO///YRM54    YC 99113-0-0-100-B
*YRA07=05-10*    *YRA08=04-04*    OPUS/MATSURIBARE    YC 99009-0-0-47-B
*YRE07=14-04*    *YRA08=04-05*    YRM64/NORIN PL11    YC 01012-0-0-47-B
*YUA07=05-21*    *YRA08=04-07*    M201/YRM3//BOGAN///YRM33    YC 98039-0-44-9-B
*YRA07=03-07*    *YRA08=04-10*    M201//YR196/ARDITO///YRM54    YC 99113-0-0-141-B
*YRA07=02-08*    *YRA08=04-11*    YRM54/YRM61    YC 97073S-93-0-4-B
*YRA07=02-01*    *YRA08=04-12*    QUEST    YC 86008-96-3
*YRA07=05-08*    *YRA08=05-02*    M201//YR196/ARDITO///YRM54    YC 99113-0-0-84-B
*YRA07=01-07*    *YRA08=05-03*    MILLIN    YC 82003-28-4
*YRA07=01-16* *YRA07=04-03* *YRA07=02-04* *YRA07=01-02* *YRA07=01-03* *YRA07=05-06* *YRA07=01-11* *YUA07=17-25* *YRA07=03-16* *YUA07=01-28* *YUA07=20-22* *YRA07=01-13* *YRA07=05-05* *YRB07=03-08* *YRB07=04-07* *YRB07=05-08* *YUB07=14-24* *YUB07=08-26* *YRB07=04-02* *YRB07=05-03* *YRB07=05-01* *YUB07=13-20* *YRB07=01-09* *YRB07=04-15* *YRB07=01-16* *YRD07=04-10* *YRB07=05-12* *YRD07=04-02* *YRB07=04-03* *YUE07=08-07* *YRA08=05-04* *YRA08=05-08* *YRA08=05-11* *YRA08=06-01* *YRA08=06-02* *YRA08=06-04* *YRA08=06-05* *YRA08=06-06* *YRA08=06-07* *YRA08=06-08* *YRA08=06-10* *YRA08=06-11* *YRA08=06-12* *YRB08=01-03* *YRB08=01-06* *YRB08=01-09* *YRB08=01-10* *YRB08=01-11* *YRB08=01-16* *YRB08=02-01* *YRB08=02-03* *YRB08=02-04* *YRB08=02-05* *YRB08=02-08* *YRB08=02-10* *YRB08=02-12* *YRB08=02-16* *YRB08=03-03* *YRB08=03-06* *YRE08=05-11* JARRAH YC 82003-14-2 CALHIKARI Y03/004 YRM42//BOGAN/M302 YC 97048S-28-0-B OPUS/MATSURIBARE YC 99009-0-0-14-B YRM64 YC 92061-0-59 AMAROO YC 79011S-0-32 YRM54//ECHUCA/SHIMUZI MOCHI YC 97086-0-43-B M204/YRM43 YC 94227-B-2S-13-6B-S-5 ILLABONG/YRM54 YC 96027-1-33-B M401/YRM42//YRM54 YC 01038-0-0-95-B YRM64/NORIN PL11 
YC 01012-0-0-21-B OPUS YC 87332-27-7 KOSHIHIKARI YC 82003-28-4 YRF205/LANGI YC 98086-0-18-B DELLMONT/LANGI YC 98081-0-7-B DOONGARA/YRL38 YC 95096S-172-0-4-B LANGI///&(DAWN/K//IR579/K)/P//DOONGARA YC 02112-B-2S-4 YRL39/IR65597-134-2-2//YRL123 YC 02093B-B-2S-13 YRL34//INGA/M9(5)/PEL///DOONGARA YC 99225-0-0-11-B YRL125 YC 92108-1-11 YRL122/4/71011//M9/PEL//YRL29 YC 98090-0-251-B YRL39/IR65597-134-2-2//YRL123 YC 02093B-B-2S-31 PELDE/GOPALBHOG(4)/YR71048-10//YRL101 YC 95200-34-9 YRF205/LANGI YC 97074-1-38-B PELDE/GOPALBHOG(4)/YR71048-10//YRL101 YC 95200-47-3 71011//73/M7///PEL/4/YRL34/5/IR20 YC 99231-0-0-2-B YRM54/CN892-874/1 YC 98003-0-42-B INGA/L201//DOONGARA///L202 YC 00184-0-0-37-B KYEEMA YC 83110-3-4 M102/M103//YRM42/SR11327-22-3-2YC 02058-B-2S-4 *YRB07=02-12* *YRB07=05-11* *YRB07=02-15* *YRB07=03-17* *YRD07=04-11* *YRB07=04-13* *YRE07=12-02* *YUB07=02-21* *YRB07=03-10* *YRB07=05-09* *YRB07=02-07* *YRB07=02-01* *YRB07=03-04* *YRB07=05-10* *YRB07=04-05* *YRE07=17-02* *YUE07=05-02* *YRE07=14-02* *YUE07=05-03* *YRE07=11-02* *YRE07=22-05* *YRE07=22-01* *YRE07=18-02* *YUE07=15-01* *YRE07=22-03* *YUE07=11-04* *YUE07=01-12* *YRE07=09-05* *YRE07=20-02* *YRE07=23-04* *YRB08=03-08* *YRB08=03-09* *YRB08=03-12* *YRB08=03-13* *YRB08=04-01* *YRB08=04-02* *YRE08=05-07* *YRB08=04-03* *YRB08=04-06* *YRB08=04-07* *YRB08=04-10* *YRB08=04-11* *YRB08=04-16* *YRB08=05-02* *YRB08=05-11* *YRE08=01-01* *YRE08=01-11* *YRE08=02-01* *YRE08=05-06* *YRE08=02-02* *YRE08=02-03* *YRE08=02-12* *YRE08=02-09* *YRE08=05-08* *YRE08=02-10* *YRE08=02-11* *YRE08=02-05* *YRE08=05-05* *YRE08=03-03* *YRE08=03-06* DOONGARA/YRL38 YC 95096S-146-0-B DOONGARA/SERATUS MALAM YC 00128T-0-2-B YRL123 YC 91063-1-11 LANGI YC 82079-66-4 INGA/L201//DOONGARA///L202 YC 00184-0-0-15-B YRF205/LANGI YC 98086-0-18-B QUEST YC 86008-96-3 YRL39/IR65597-134-2-2//YRL123 YC 02093B-B-2S-2 YRF205/LANGI YC 98086-0-21-B INGA/L201//DOONGARA///L202 YC 00184-0-0-52-B DOONGARA YC 71048-111-9 YRL118 YC 89198J-1-1 YRF207/L202 YC 99092-0-0-24-B 
YRL125_S_CT18 YC 92108-1-11 LANGI/JOJUTLA 4 YC 94070-0-2-B YRM64/TOYONISHIKI YC 01013-0-0-58-B OPUS/4/M7/KITAKOGANE///M201//EIKO/H.NO.1 YC 00248-0-0-36-B YRM65 YC 92061-0-51 M103///YRM3/HUNG.NO.1//M401/4/YRM54 YC 00268-0-0-9-B M103/YRM54 YC 99114S-38-B M103 YU87/001 YRM49//IR65600-42-5-2/MILLIN YC 01029-0-0-38-B M103/YRM54 YC 99114S-25-B OPUS//KOSHIHIKARI (T)/M202 YC 00002-0-0-18-S-5 M103///YRM3/HUNG.NO.1//M401/4/YRM54 YC 00268-0-0-9-B HITOMEBORE//YRM39/AKITAKOMACHI YC 00056-0-0-98-S-1 YRM49///ILLABONG/YRM54 YC 02041-B-2S-5 M102//M201/BOGAN///YRM54 YC 99104S-9-B MILLIN YC 82003-28-4 JARRAH YC 82003-14-2 *YUE07=20-13* *YRE07=14-03* *YUE07=14-11* *YUE07=10-03* *YRE07=10-03* *YRE07=09-02* *YUE07=08-02* *YUE07=01-10* *YUE07=17-11* *YRE07=20-03* *YUE07=02-01* *YUE07=04-09* *YRJ07=04-06* *YUD07=02-14* *YUD07=02-19* *YUD07=01-16* *YRD07=04-03* *YUD07=05-19* *YRD07=04-01* *YRD07=02-05* *YUD07=14-22* *YRD07=01-11* *YRD07=01-07* *YRD07=03-04* *YRD07=02-08* *YRD07=05-12* *YRD07=05-08* *YUD07=08-15* *YRD07=02-02* *YRD07=02-07* *YRE08=03-05* *YRE08=03-09* *YRE08=05-03* *YRE08=03-10* *YRE08=03-11* *YRE08=04-01* *YRE08=04-07* *YRE08=04-09* *YRE08=04-10* *YRE08=04-11* *YUE08=02-14* *YUE08=02-18* *YUJ08=13-20* *YUD08=01-22* *YUD08=01-15* *YUD08=01-20* *YRD08=01-03* *YRD08=01-04* *YRD08=01-08* *YRD08=01-09* *YRD08=01-10* *YRD08=01-07* *YRD08=02-03* *YRD08=02-04* *YRD08=02-05* *YRD08=02-06* *YRD08=02-08* *YRD08=02-09* *YRD08=03-01* *YRD08=03-02* M103/YRM54 YC 99114S-11-B ECHUCA YC 81121DS M201/YRM3//BOGAN///OPUS YC 99226-0-0-33-10-B M103/YRM49 YC 99219S-7-B QUEST_CT19 YC 86008-96-3 YRM64/TOYONISHIKI YC 01013-0-0-47-B M103/HITOMEBORE YC 98052-0-51-2-S-11 YRM54//YRK4/KOSHIHIKARI (TYNAN) YC 02020-B-2S-5 OPUS//KOSHIHIKARI (T)/M202 YC 00002-0-0-18-S-8 YRM54/M202 YC 97027S-22-0-B MILLIN YC 82003-28-4 JARRAH YC 82003-14-2 SPRINT Y98/005 L205 Y03/008 YRL113 YC 89045J-0-17 YRL118 YC 89198J-1-1 YRL111 YC 89097-0-55 L202///BASMATI 370/PELDE//BASMATI YC 01102-0-0-40-B 370 
YRL113//H263-9-1-1/YRL34 YC 00158-0-0-19-B LANGI/IR66167-27-5-1-6 YC 97181-0-28-8-B THAIBONNET/YRL101 YC 98073-0-28-4-S-6 BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-65-B L205 Y03/008 YRL118 YC 89198J-1-1 LANGI/IR 65600-38-1-2 YC 95341-1-95-4-B 213D.25/83//M7/IRR.ING///YRL38 YC 95146-2-0-6-B BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-35-B DELLMONT//BASMATI 370/PELDEYC 00210-0-24-S-16 BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-13-B LANGI/INGA//PELDE YC 99217-0-0-23-3-B *YRD07=02-04* *YRD07=05-11* *YRD07=01-03* *YRD07=01-04* *YRD07=03-05* *YUD07=03-22* *YRD07=03-07* *YRD07=03-08* *YUD07=06-17* *YRD07=02-03* *YRD07=01-05* *YUD07=03-20* *YRD07=03-10* *YRD07=05-04* *YRD07=05-02* *YUD07=11-14* *YRD07=01-10* *YUD07=05-18* *YRD07=03-11* *YRD07=05-07* *YRD07=01-02* *YRD07=05-10* *YRD07=02-10* *YRD07=03-02* *YRD07=02-06* *YRD08=03-04* *YRD08=03-05* *YRD08=03-06* *YRD08=03-07* *YRD08=03-10* *YRD08=04-01* *YRD08=04-02* *YRD08=04-03* *YRD08=04-04* *YRD08=04-05* *YRD08=04-08* *YRD08=04-09* *YRD08=05-01* *YRD08=05-03* *YRD08=05-05* *YRD08=05-06* *YRD08=05-07* *YRD08=05-09* *YRD08=05-10* *YRD08=06-01* *YRD08=06-02* *YRD08=06-04* *YRD08=06-06* *YRD08=06-09* *YRD08=06-10* LANGI/INGA//PELDE YC 99217-0-0-23-5-B LANGI/LAGRUE YC 95073-0-8-B (PELDE*2/CALROSE76)*2//DOONGARA YC 99248S-10-B YRL101//IR72/YRL39 YC 97098-0-151-B YRL122/THAIBONNET YC 98110-0-42-3-B M103/DOONGARA DH1 YC 03370DH-15 L203/YRL39//YRL101 YC 97107-0-23-14-B BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-72-B YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-46-B YRL113 YC 89045J-0-17 LANGI/IR 65600-134-2-2 YC 95342-1-1-7-B YRF205/LANGI YC 98086-0-4-B L202/DOONGARA YC 99247-0-0-17-B BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-84-B YRL101/4/YRL39///213D.25/YR83//M7/IRR.INGA YC 00153-0-0-25-B M103/DOONGARA DH1 YC 03370DH-24 YRB90 V31/YRL34 YC 90041J-1-24-B I/M9(5)/3/M101/73//P(2)/4/I/5/YRL101YC 99186-0-0-17-S-6 LANGI/IR65600.27.1.2.2.2 YC 97182A-0-2-4-B GULFMONT//YRL39/IR65597-134-2-2 YC 1080-0-0-24-B 
YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-42-B 71048.166//R/IR36///I/M9(5)/P YC 92164-3-B LANGI/IR66167-27-5-1-6 YC 97181-0-28-10-B L202///BASMATI 370/PELDE//BASMATI YC 01102-0-0-32-B 370 L203 YU85/001

Appendix 3. Target genes and sequence of gene-specific LR-PCR primers.

| No | Gene | Fragment | Length (bp) | Forward primer (5'-3') | Reverse primer (5'-3') |
|---|---|---|---|---|---|
| 1 | AGPS2b | H1 | 3144 | AATCTTGACCGCAGTGTCG | AAGTGTTGCCTGTGCATTAC |
| 1 | AGPS2b | H2 | 3028 | TTGTAATGCACAGGCAACAC | ATCTCGACTGCCCAGTTAAG |
| 2 | SPHOL | H1 | 3119 | TGCTGGTTGGTCGTAATGTG | AGCCTCATTCCAGCTTAACC |
| 2 | SPHOL | H2 | 3362 | CACTGTGCATTCCTGAGTTG | TTGTCCATAGCTGCAGGTAG |
| 3 | GPT1 | C | 3940 | TATCAGATTCCGAGGGCTTG | GTTACCTTCCCACACCCAGA |
| 4 | GBSSI | H1 | 2164 | CCAACTAGCTCCACAAGATG | CATTGGGCTGGTAGTTGTTC |
| 4 | GBSSI | H2 | 2475 | CCTTCCGGTTTGTTACTGAC | CACACCCAGAAGAGTACAAC |
| 5 | GBSSII | C | 5405 | CCATCGCATAGGATGAGTGA | TGGAACACAACCCAGATGAA |
| 6 | SSI | H1 | 3201 | AGCCCGATCTAGAAGGTACG | TGATAGGCTCAAACCTGATG |
| 6 | SSI | H2 | 3888 | ACATCAGGTTTGAGCCTATC | GACACTTGACATCGCAGTAG |
| 7 | SSIIa | H1 | 2681 | AGAAGAAACGGCTACTCCTC | TCATGCGCTTCATGATTTCC |
| 7 | SSIIa | H2 | 2316 | AGGAGACGCAAATTCCTTAG | CATTGGTACTTGGCCTTGAC |
| 8 | SSIIb | H1 | 2229 | ACACCTCCGGCAGATCTTTC | CAACAGCAGCTTTGCAGAAC |
| 8 | SSIIb | H2 | 2322 | TCTGCAAAGCTGCTGTTGAG | ACTTCCACCGTTGCTCCTAC |
| 9 | SSIIIa | H1 | 3833 | GGTTCTCAGTGTGGTGTTTG | CATCCTTCGGAGTTCTTGTG |
| 9 | SSIIIa | H2 | 3666 | GTCACCACAGGACAATATCG | ACCCTGCATCTTAGGCTTAG |
| 9 | SSIIIa | H3 | 3863 | GTTCCTGTCGAGTACAAGAG | AGCCATAGTCCAGATGTAGC |
| 10 | SSIIIb | H1 | 3709 | GGAGCCTTTCTTTCTCCTTC | CTCCACTTGGGTTTCATGTC |
| 10 | SSIIIb | H2 | 3880 | AGCAACCTTGGGTAGGAATG | CAATGTAGAAGCCGGGATTG |
| 11 | SSIVa | H1 | 4549 | GTCCCGATACTGTTGTCTTG | ATTGCCAGCAGACTACTTTC |
| 11 | SSIVa | H2 | 4767 | AGAAGGTGCTAGGTTTGTTG | CTTAGCCACCCATTCTTCTC |
| 12 | BEI | H1 | 4303 | AATCGCCGCCGATTTCGAAG | TTGCCAGGCGGAAGTCAAAC |
| 12 | BEI | H2 | 4159 | TCTGGGATAGTCGTCTGTTC | TACTTGTCTGGTGCTAGGAG |
| 13 | BEIIa | H1 | 1511 | GGTGCTCTTCAGGAGGAAGG | TACCTGCGGGTGAATCCAAG |
| 13 | BEIIa | H2 | 1754 | GCGCTGAAGGCATTACCTAC | AGTTGAACAGGCGAGAATCC |
| 14 | BEIIb | H1 | 5437 | GGTAAATCGTCGTGATCTTC | AAGGGAAGTAGCGATTAACG |
| 14 | BEIIb | H2 | 5581 | CATCATTGGATGTGGGATTC | GATGTACAGAAGTGCAGAAC |
| 15 | ISA1 | H1 | 4027 | GTCAATTTCGCCGTCTACTC | CAGTAGCCCTGAGAAATAGC |
| 15 | ISA1 | H2 | 2979 | GGTCAGAATGGAATGGAAAG | TCATCTTTCGTCCACTCAAC |
| 16 | ISA2 | H1 | 1256 | GCGGCGGAAGAGTTGTAGCG | GCTTCTGAGTCACCGGATGG |
| 16 | ISA2 | H2 | 1369 | CGTGGCATTGATAATTCCTC | TCAGGGAACATGAAGGTAAC |
| 17 | PUL | H1 | 5150 | TAACCCAGATGGTCCTAGTC | ACCAGTGGTCAACCTGTATG |
| 17 | PUL | H2 | 4857 | CTATTGGTTTCCAGCCTAGC | CCTTACGGAGATGACAAAGC |
| | Total | 33 | 109,067 | | |

H1 = first half, H2 = second half, H3 = third fragment, C = complete fragment.

Appendix 4. SNP/Indel distribution and short read coverage pattern across candidate loci. The name of each gene is indicated at the top. X-Y plots (top): the X-axis indicates the length of the sequenced region (gene) in kb and the Y-axis shows the number of detected SNPs/Indels. The graphs show the distribution of SNPs across the gene (values below zero must be regarded as zero). Graphics in the middle show the corresponding gene structure (blue = introns, yellow = exons). Graphics at the bottom (pink) show the coverage pattern of each gene.

Appendix 5. The full list of breeding lines (studied population) and their pedigree information.

barcode barcode 08 pedigree Cross *YRR07=01-01* *YRR08=01-03* ILLABONG/SARA YC 01008-0-0-56-B *YRR07=01-15* *YRR08=01-05* ILLABONG/VIALONE NANO-Y4 YC 01011-0-0-3-B *YUR07=08-19* *YRR08=01-07* ILLABONG///M102//M201/YRM3 YC 99160-0-0-4-10-B *YRR07=01-19* *YRR08=01-08* YRB4 YC 92013-1-108 *YRR07=03-14* *YRR08=01-08* YRB4 YC 92013-1-108 *YUR07=11-18* *YRR08=01-09* ILLABONG/4/YRB3///YRM2//M7/RINGO YC 00067-0-0-51-B *YUR07=02-17* *YRR08=01-10* ILLABONG/MILLIN YC 97205-0-22-B *YRR07=02-10* *YRR08=02-11* ILLABONG///YR83/M9//M7 YC 99159-0-0-29-B *YRR07=02-20* *YRR08=02-13* ILLABONG/VIALONE NANO-Y4 YC 01011-0-0-5-B *YRR07=02-16* *YRR08=02-16* ILLABONG///YR83/M9//M7 YC 99159-0-0-13-B *YRE07=10-02* *YRR08=02-19* YRM67 YC 94002-1-62 *YRR07=02-13* *YRR08=02-20* JARRAH YC 82003-14-2 *YRR07=04-17* *YRR08=02-20* JARRAH YC 82003-14-2 *YRR07=02-06* *YRR08=03-02* ILLABONG/YRM39 YC 94208-0-12-B *YRR07=02-07* *YRR08=03-09* M103///YRM34//YRM3/HUNG.NO.1 YC 00085-0-0-95-B *YUR07=02-16* *YRR08=04-02* ILLABONG/4/YRB3///YRM2//M7/RINGO YC 00067-0-0-46-B *YRR07=01-03* *YRR08=04-12* ILLABONG/IR65600-27-1-2-2 YC 97157-0-1-13-B *YRR07=01-08*
*YRR08=07-01* YRB3/ARBORIO//MILLIN/WC1043 YC 00038-0-0-60-B *YRR07=01-05* *YRR08=07-06* ILLABONG YC 81116-1-5 *YRR07=03-11* *YRR08=07-06* ILLABONG YC 81116-1-5 *YRI07=04-08* *YRI08=01-01* 71048.200/H1//INGA///YRL113 YC 98023-0-44-B *YUD07=14-15* *YRI08=01-02* YRL118///INGA/M9//213D.25 YC 99046-0-0-11-S-7 *YRI07=01-04* *YRI08=01-04* 71048.200/H1//INGA///YRL113 YC 98023-0-63-B *YUD07=03-24* *YRI08=01-05* BBL//M9/PELDE///YRL30/4/YRL113YC 98054-0-53-S-6 *YRI07=01-03* *YRI08=01-06* YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-8-B *YRI07=02-05* *YRI08=01-07* YRL113/WAB450-11-1-P31-1-HB YC 00006-0-0-30-B *YRI07=01-02* *YRI08=01-08* YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-67-B *YRI07=03-08* *YRI08=02-01* BBL//M9/PELDE///YRL30/4/YRL101YC 98095-0-49-B *YRI07=03-03* *YUI07=08-12* *YUI07=14-10* *YRI07=02-03* *YRI07=02-08* *YRI07=02-04* *YRI07=01-07* *YRI07=03-02* *YUI07=12-12* *YUI07=12-10* *YRI07=04-05* *YUI07=06-10* *YRI07=02-02* *YRI07=01-06* *YRJ07=05-15* *YRJ07=01-08* *YRJ07=04-08* *YUJ07=09-22* *YRJ07=03-13* *YRJ07=03-12* *YRI07=04-06* *YRI07=04-03* *YRI07=04-01* *YRI07=03-07* *YRI07=03-01* *YRI07=01-08* *YRI07=02-07* *YRI07=01-01* *YUD07=07-19* *YRJ07=01-02* *YRI08=01-09* *YRI08=01-10* *YRI08=02-03* *YRI08=02-06* *YRI08=02-08* *YRI08=03-02* *YRI08=03-03* *YRI08=04-04* *YRI08=04-09* *YRI08=04-10* *YRI08=05-09* *YRI08=05-10* *YRI08=06-02* *YRI08=06-05* *YRJ08=02-33* *YRJ08=02-30* *YRJ08=03-28* *YRJ08=03-26* *YRJ08=03-35* *YRJ08=03-30* *YRI08=06-07* *YRI08=06-08* *YRI08=07-03* *YRI08=07-07* *YRI08=08-10* *YRI08=09-01* *YRI08=09-10* *YRI08=10-04* *YRI08=14-09* *YRJ08=01-13* YRL113///RD91V55//P/GO(4)/D.10 YC 98149-0-6-B YRL113/WAB450-11-1-P31-1-HB YC 00006-0-0-40-S-12 YRL113//L203/YRL34 YC 00064-0-0-82-S-6 BBL//M9/PELDE///YRL30/4/LANGI YC 98084-0-44-7-B YRL113 YC 89045J-0-17 71011//73/M7///P/4/YRL34/5/BBL//M9/P/4/YRL34 YC 99221-0-0-3-B 71048.200/H1//INGA///YRL113 YC 98023-0-74-B YRL113//L203/YRL34 YC 00064-0-0-3-B BBL//M9/PELDE///YRL30/4/YRL113YC 98054-0-53-S-9 
YRL113//L203/YRL34 YC 00064-0-0-82-S-10 BBL//M9/PELDE///YRL30/4/YRL113YC 98054-0-114-B RIZABELL/YRL113 YC 99293-0-0-25-S-14 YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-69-B YRL113//L203/YRL34 YC 00064-0-0-147-B M103///M201//YR196/ARDITO YC 92243-0-11-B M103/OEIRAS YC 99089-0-38-B M103/YRK2 YC 94161S-2-0-10-B YRM49///ILLABONG/YRM54 YC 02041-B-2S-9 AKIHIKARI//KOSHIHIKARI (T)/YRK4 YC 00238A-0-0-42-B M103/YRM44 YC 95050S-5-0-B BBL//M9/PELDE///YRL30/4/LANGI YC 98084-0-44-10-B YRL113//L203/YRL34 YC 00064-0-0-29-B BBL//M9/PELDE///YRL30/4/YRL113YC 98054-0-65-B BBL//M9/PELDE///YRL30/4/LANGI YC 98084-0-44-12-B LANGI/LAGRUE//YRL113 YC 01087-0-0-22-B YRL101///PELDE/M9//M101 YC 94118-368-59-B BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-96-B YRL113//L203/YRL34 YC 00064-0-0-76-B L203//YRL101/LANGI YC 02056-B-2S-10 M103///M201//196/ARD/4/M103/YRM31 YC 99140-0-117-B *YRJ07=04-13* *YRJ07=03-09* *YUJ07=19-23* *YRJ07=05-08* *YRJ07=03-04* *YRJ07=04-01* *YRJ07=04-03* *YRJ07=02-14* *YRJ07=05-01* *YRJ07=01-20* *YRJ07=02-15* *YRJ07=04-02* *YRJ07=01-12* *YRJ07=01-07* *YRJ07=05-14* *YUJ07=19-25* *YRJ07=04-19* *YRJ07=01-18* *YRJ07=02-09* *YRJ07=03-15* *YRJ07=01-03* *YUR07=01-16* *YRJ07=05-19* *YRJ07=02-04* *YRJ07=02-19* *YRJ07=02-03* *YRJ07=04-09* *YRJ07=02-12* *YRJ07=05-06* *YRE07=10-04* *YRJ08=01-17* *YRJ08=01-20* *YRJ08=01-22* *YRJ08=01-24* *YRJ08=01-23* *YRJ08=01-27* *YRJ08=01-30* *YRJ08=01-32* *YRJ08=01-33* *YRJ08=01-35* *YRJ08=02-16* *YRJ08=02-19* *YRJ08=02-20* *YRJ08=02-21* *YRJ08=02-22* *YRJ08=02-24* *YRJ08=02-26* *YRJ08=02-27* *YRJ08=02-28* *YRJ08=02-35* *YRJ08=03-12* *YRJ08=03-13* *YRJ08=03-14* *YRJ08=03-15* *YRJ08=03-17* *YRJ08=03-18* *YRJ08=03-16* *YRJ08=03-25* *YRJ08=03-23* *YRA08=01-03* M103///M201//196/ARD/4/M103/YRM31 YC 99140-0-105-B M103/YRM54 YC 99114-0-9-1-B M103//M201/YRM3///IR65600-42-5-/MILLIN YC 00039-0-0--38-S-6 M103///M201/EIKO//CALROSE YC 92212-0-73-B M103/YRM54 YC 99114S-26-B M103///YRM3/HUNG.NO.1//M401/4/YRM54 YC 00268-0-0-5-B ECHUCA/SHIMUZI MOCHI//MILLINYC 
97035-0-309-B M103///M201//196/ARD/4/M103/YRM31 YC 99140-0-3-B M103/YRM54 YC 99114S-30-B M103/YRM54 YC 99114-0-12-B M103///M201//196/ARD/4/M103/YRM31 YC 99140-0-135-B M102/M103//M103 YC 97063-0-106-B ECHUCA/80023-TR166-2-1-4 YC 96041T-12-12-B M103///YRM3/HUNG.NO.1//M401/4/YRM54 YC 00268-0-0-7-B M103///M201//196/ARD/4/M103/YRM31 YC 99140-0-50-B M103//M201/R///M201/YRM3//BOG.YC 98064-0-8-S-6 M103///M201//YR196/ARDITO YC 92243J-0-5-B M103/YRM18 YC 90019J-50-0-B M103//M401/CALROSE YC 96106-39-2-B M103///YRM3/HUNG.NO.1//M401/4/YRM54 YC 00268-0-0-15-B M102/M103 YC 92076S-301-0-B VIALONE NANO Y01/008 M103///M201//YR196/ARDITO YC 92243J-0-11-B M103///YRM3/HUNG.NO.1//M401/4/YRM54 YC 00268-0-0-30-B M103///YRM3/HUNG.NO.1//M401/4/YRM54 YC 00268-0-0-23-B M104 Y03/009 JARRAH YC 82003-14-2 M103/YRM54 YC 99114-0-80-9-B LIMAN Y98/001 YRM64/NORIN PL11 YC 01012-0-0-9-B *YRA07=03-10* *YRE07=19-01* *YRE07=15-01* *YUA07=04-28* *YUA07=05-27* *YRA07=04-05* *YRA07=03-04* *YRA07=02-10* *YRA07=04-14* *YRA07=02-16* *YRA07=03-05* *YUA07=05-20* *YRA07=04-09* *YUA07=17-19* *YRA07=05-03* *YRA07=05-02* *YRA07=03-03* *YRA07=04-12* *YRA07=03-12* *YUA07=17-22* *YRA07=03-06* *YRA07=01-06* *YRA07=05-10* *YRE07=14-04* *YUA07=05-21* *YRA07=03-07* *YRA07=02-08* *YRA07=02-01* *YRA07=05-08* *YRA07=01-07* *YRA08=01-06* *YRA08=01-05* *YRA08=01-09* *YRA08=01-10* *YRA08=02-01* *YRA08=02-02* *YRA08=02-03* *YRA08=02-05* *YRA08=02-06* *YRA08=02-07* *YRA08=02-08* *YRA08=02-12* *YRA08=03-01* *YRA08=03-02* *YRA08=03-03* *YRA08=03-04* *YRA08=03-05* *YRA08=03-06* *YRA08=03-08* *YRA08=03-10* *YRA08=03-11* *YRA08=04-02* *YRA08=04-04* *YRA08=04-05* *YRA08=04-07* *YRA08=04-10* *YRA08=04-11* *YRA08=04-12* *YRA08=05-02* *YRA08=05-03* REIZIQ M401/YRM42//YRM54 NAMAGA///M201/YRM3//BOGAN KOSHIHIKARI(T)/M202//BOGAN YRM65//JARRAH/AMAROO M201//YR196/ARDITO///YRM54 PARAGON M201//YR196/ARDITO///YRM54 M7/M201//M103 YRM54/YRM61 YRM54/YRM61 YRM65//JARRAH/AMAROO YRM66 IR65600-42-5-2/MILLIN//YRM63 YRM68 NAMAGA M201//YR196/ARDITO///YRM54 
M201//YR196/ARDITO///YRM54 QUEST_CT19 KOSHIHIKARI/M102//YRM43 YRK4/SR13925-13-1 M201//YR196/ARDITO///YRM54 OPUS/MATSURIBARE YRM64/NORIN PL11 M201/YRM3//BOGAN///YRM33 M201//YR196/ARDITO///YRM54 YRM54/YRM61 QUEST M201//YR196/ARDITO///YRM54 MILLIN YC 86003S-12-0 YC 01038-0-0-118-B YC 98140S-89-B YC 00018-0-0-6-S-10 YC 02035-B-2S-19 YC 99113S-57-B YC 92061-0-13 YC 99113-0-0-8-B YC 94014-1-59-B YC 97073S-93-0-B YC 97073S-93-0-12-B YC 02035-B-2S-1 YC 92175-1-30 YC 01088-0-0-42-B YC 92086-1-15 YC 84019-43-3 YC 99113-0-0-57-B YC 99113-0-0-19-B YC 86008-96-3 YC 96102-1-94-4-B YC 97053-1-112-7-B YC 99113-0-0-100-B YC 99009-0-0-47-B YC 01012-0-0-47-B YC 98039-0-44-9-B YC 99113-0-0-141-B YC 97073S-93-0-4-B YC 86008-96-3 YC 99113-0-0-84-B YC 82003-28-4 *YRA07=01-16* *YRA07=04-03* *YRA07=02-04* *YRA07=01-02* *YRA07=01-03* *YRA07=05-06* *YRA07=01-11* *YUA07=17-25* *YRA07=03-16* *YUA07=01-28* *YUA07=20-22* *YRA07=01-13* *YRA07=05-05* *YRB07=03-08* *YRB07=04-07* *YRB07=05-08* *YUB07=14-24* *YUB07=08-26* *YRB07=04-02* *YRB07=05-03* *YRB07=05-01* *YUB07=13-20* *YRB07=01-09* *YRB07=04-15* *YRB07=01-16* *YRD07=04-10* *YRB07=05-12* *YRD07=04-02* *YRB07=04-03* *YUE07=08-07* *YRA08=05-04* *YRA08=05-08* *YRA08=05-11* *YRA08=06-01* *YRA08=06-02* *YRA08=06-04* *YRA08=06-05* *YRA08=06-06* *YRA08=06-07* *YRA08=06-08* *YRA08=06-10* *YRA08=06-11* *YRA08=06-12* *YRB08=01-03* *YRB08=01-06* *YRB08=01-09* *YRB08=01-10* *YRB08=01-11* *YRB08=01-16* *YRB08=02-01* *YRB08=02-03* *YRB08=02-04* *YRB08=02-05* *YRB08=02-08* *YRB08=02-10* *YRB08=02-12* *YRB08=02-16* *YRB08=03-03* *YRB08=03-06* *YRE08=05-11* JARRAH YC 82003-14-2 CALHIKARI Y03/004 YRM42//BOGAN/M302 YC 97048S-28-0-B OPUS/MATSURIBARE YC 99009-0-0-14-B YRM64 YC 92061-0-59 AMAROO YC 79011S-0-32 YRM54//ECHUCA/SHIMUZI MOCHIYC 97086-0-43-B M204/YRM43 YC 94227-B-2S-13-6B-S-5 ILLABONG/YRM54 YC 96027-1-33-B M401/YRM42//YRM54 YC 01038-0-0-95-B YRM64/NORIN PL11 YC 01012-0-0-21-B OPUS YC 87332-27-7 KOSHIHIKARI YC 82003-28-4 YRF205/LANGI YC 98086-0-18-B 
DELLMONT/LANGI YC 98081-0-7-B DOONGARA/YRL38 YC 95096S-172-0-4-B LANGI///&(DAWN/K//IR579/K)/P//DOONGARA YC 02112-B-2S-4 YRL39/IR65597-134-2-2//YRL123 YC 02093B-B-2S-13 YRL34//INGA/M9(5)/PEL///DOONGARA YC 99225-0-0-11-B YRL125 YC 92108-1-11 YRL122/4/71011//M9/PEL//YRL29 YC 98090-0-251-B YRL39/IR65597-134-2-2//YRL123 YC 02093B-B-2S-31 PELDE/GOPALBHOG(4)/YR71048-10//YRL101 YC 95200-34-9 YRF205/LANGI YC 97074-1-38-B PELDE/GOPALBHOG(4)/YR71048-10//YRL101 YC 95200-47-3 71011//73/M7///PEL/4/YRL34/5/IR20 YC 99231-0-0-2-B YRM54/CN892-874/1 YC 98003-0-42-B INGA/L201//DOONGARA///L202 YC 00184-0-0-37-B KYEEMA YC 83110-3-4 M102/M103//YRM42/SR11327-22-3-2YC 02058-B-2S-4 *YRB07=02-12* *YRB07=05-11* *YRB07=02-15* *YRB07=03-17* *YRD07=04-11* *YRB07=04-13* *YRE07=12-02* *YUB07=02-21* *YRB07=03-10* *YRB07=05-09* *YRB07=02-07* *YRB07=02-01* *YRB07=03-04* *YRB07=05-10* *YRB07=04-05* *YRE07=17-02* *YUE07=05-02* *YRE07=14-02* *YUE07=05-03* *YRE07=11-02* *YRE07=22-05* *YRE07=22-01* *YRE07=18-02* *YUE07=15-01* *YRE07=22-03* *YUE07=11-04* *YUE07=01-12* *YRE07=09-05* *YRE07=20-02* *YRE07=23-04* *YRB08=03-08* *YRB08=03-09* *YRB08=03-12* *YRB08=03-13* *YRB08=04-01* *YRB08=04-02* *YRE08=05-07* *YRB08=04-03* *YRB08=04-06* *YRB08=04-07* *YRB08=04-10* *YRB08=04-11* *YRB08=04-16* *YRB08=05-02* *YRB08=05-11* *YRE08=01-01* *YRE08=01-11* *YRE08=02-01* *YRE08=05-06* *YRE08=02-02* *YRE08=02-03* *YRE08=02-12* *YRE08=02-09* *YRE08=05-08* *YRE08=02-10* *YRE08=02-11* *YRE08=02-05* *YRE08=05-05* *YRE08=03-03* *YRE08=03-06* DOONGARA/YRL38 YC 95096S-146-0-B DOONGARA/SERATUS MALAM YC 00128T-0-2-B YRL123 YC 91063-1-11 LANGI YC 82079-66-4 INGA/L201//DOONGARA///L202 YC 00184-0-0-15-B YRF205/LANGI YC 98086-0-18-B QUEST YC 86008-96-3 YRL39/IR65597-134-2-2//YRL123 YC 02093B-B-2S-2 YRF205/LANGI YC 98086-0-21-B INGA/L201//DOONGARA///L202 YC 00184-0-0-52-B DOONGARA YC 71048-111-9 YRL118 YC 89198J-1-1 YRF207/L202 YC 99092-0-0-24-B YRL125_S_CT18 YC 92108-1-11 LANGI/JOJUTLA 4 YC 94070-0-2-B YRM64/TOYONISHIKI YC 
01013-0-0-58-B OPUS/4/M7/KITAKOGANE///M201//EIKO/H.NO.1 YC 00248-0-0-36-B YRM65 YC 92061-0-51 M103///YRM3/HUNG.NO.1//M401/4/YRM54 YC 00268-0-0-9-B M103/YRM54 YC 99114S-38-B M103 YU87/001 YRM49//IR65600-42-5-2/MILLIN YC 01029-0-0-38-B M103/YRM54 YC 99114S-25-B OPUS//KOSHIHIKARI (T)/M202 YC 00002-0-0-18-S-5 M103///YRM3/HUNG.NO.1//M401/4/YRM54 YC 00268-0-0-9-B HITOMEBORE//YRM39/AKITAKOMACHI YC 00056-0-0-98-S-1 YRM49///ILLABONG/YRM54 YC 02041-B-2S-5 M102//M201/BOGAN///YRM54 YC 99104S-9-B MILLIN YC 82003-28-4 JARRAH YC 82003-14-2 *YUE07=20-13* *YRE07=14-03* *YUE07=14-11* *YUE07=10-03* *YRE07=10-03* *YRE07=09-02* *YUE07=08-02* *YUE07=01-10* *YUE07=17-11* *YRE07=20-03* *YUE07=02-01* *YUE07=04-09* *YRJ07=04-06* *YUD07=02-14* *YUD07=02-19* *YUD07=01-16* *YRD07=04-03* *YUD07=05-19* *YRD07=04-01* *YRD07=02-05* *YUD07=14-22* *YRD07=01-11* *YRD07=01-07* *YRD07=03-04* *YRD07=02-08* *YRD07=05-12* *YRD07=05-08* *YUD07=08-15* *YRD07=02-02* *YRD07=02-07* *YRE08=03-05* *YRE08=03-09* *YRE08=05-03* *YRE08=03-10* *YRE08=03-11* *YRE08=04-01* *YRE08=04-07* *YRE08=04-09* *YRE08=04-10* *YRE08=04-11* *YUE08=02-14* *YUE08=02-18* *YUJ08=13-20* *YUD08=01-22* *YUD08=01-15* *YUD08=01-20* *YRD08=01-03* *YRD08=01-04* *YRD08=01-08* *YRD08=01-09* *YRD08=01-10* *YRD08=01-07* *YRD08=02-03* *YRD08=02-04* *YRD08=02-05* *YRD08=02-06* *YRD08=02-08* *YRD08=02-09* *YRD08=03-01* *YRD08=03-02* M103/YRM54 YC 99114S-11-B ECHUCA YC 81121DS M201/YRM3//BOGAN///OPUS YC 99226-0-0-33-10-B M103/YRM49 YC 99219S-7-B QUEST_CT19 YC 86008-96-3 YRM64/TOYONISHIKI YC 01013-0-0-47-B M103/HITOMEBORE YC 98052-0-51-2-S-11 YRM54//YRK4/KOSHIHIKARI (TYNAN) YC 02020-B-2S-5 OPUS//KOSHIHIKARI (T)/M202 YC 00002-0-0-18-S-8 YRM54/M202 YC 97027S-22-0-B MILLIN YC 82003-28-4 JARRAH YC 82003-14-2 SPRINT Y98/005 L205 Y03/008 YRL113 YC 89045J-0-17 YRL118 YC 89198J-1-1 YRL111 YC 89097-0-55 L202///BASMATI 370/PELDE//BASMATI YC 01102-0-0-40-B 370 YRL113//H263-9-1-1/YRL34 YC 00158-0-0-19-B LANGI/IR66167-27-5-1-6 YC 97181-0-28-8-B THAIBONNET/YRL101 
YC 98073-0-28-4-S-6 BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-65-B L205 Y03/008 YRL118 YC 89198J-1-1 LANGI/IR 65600-38-1-2 YC 95341-1-95-4-B 213D.25/83//M7/IRR.ING///YRL38 YC 95146-2-0-6-B BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-35-B DELLMONT//BASMATI 370/PELDEYC 00210-0-24-S-16 BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-13-B LANGI/INGA//PELDE YC 99217-0-0-23-3-B *YRD07=02-04* *YRD07=05-11* *YRD07=01-03* *YRD07=01-04* *YRD07=03-05* *YUD07=03-22* *YRD07=03-07* *YRD07=03-08* *YUD07=06-17* *YRD07=02-03* *YRD07=01-05* *YUD07=03-20* *YRD07=03-10* *YRD07=05-04* *YRD07=05-02* *YUD07=11-14* *YRD07=01-10* *YUD07=05-18* *YRD07=03-11* *YRD07=05-07* *YRD07=01-02* *YRD07=05-10* *YRD07=02-10* *YRD07=03-02* *YRD07=02-06* *YRD08=03-04* *YRD08=03-05* *YRD08=03-06* *YRD08=03-07* *YRD08=03-10* *YRD08=04-01* *YRD08=04-02* *YRD08=04-03* *YRD08=04-04* *YRD08=04-05* *YRD08=04-08* *YRD08=04-09* *YRD08=05-01* *YRD08=05-03* *YRD08=05-05* *YRD08=05-06* *YRD08=05-07* *YRD08=05-09* *YRD08=05-10* *YRD08=06-01* *YRD08=06-02* *YRD08=06-04* *YRD08=06-06* *YRD08=06-09* *YRD08=06-10* LANGI/INGA//PELDE YC 99217-0-0-23-5-B LANGI/LAGRUE YC 95073-0-8-B (PELDE*2/CALROSE76)*2//DOONGARA YC 99248S-10-B YRL101//IR72/YRL39 YC 97098-0-151-B YRL122/THAIBONNET YC 98110-0-42-3-B M103/DOONGARA DH1 YC 03370DH-15 L203/YRL39//YRL101 YC 97107-0-23-14-B BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-72-B YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-46-B YRL113 YC 89045J-0-17 LANGI/IR 65600-134-2-2 YC 95342-1-1-7-B YRF205/LANGI YC 98086-0-4-B L202/DOONGARA YC 99247-0-0-17-B BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-84-B YRL101/4/YRL39///213D.25/YR83//M7/IRR.INGA YC 00153-0-0-25-B M103/DOONGARA DH1 YC 03370DH-24 YRB90 V31/YRL34 YC 90041J-1-24-B I/M9(5)/3/M101/73//P(2)/4/I/5/YRL101YC 99186-0-0-17-S-6 LANGI/IR65600.27.1.2.2.2 YC 97182A-0-2-4-B GULFMONT//YRL39/IR65597-134-2-2 YC 1080-0-0-24-B YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-42-B 71048.166//R/IR36///I/M9(5)/P YC 92164-3-B LANGI/IR66167-27-5-1-6 YC 
97181-0-28-10-B L202///BASMATI 370/PELDE//BASMATI YC 01102-0-0-32-B 370 L203 YU85/001

Appendix 6. Name and characteristics of SNPs genotyped in the rice population.

| No | Gene | SNP ID* | Coordinates on gDNA | Expected SNP | SNP assayed† | Association with physicochemical traits | Status |
|---|---|---|---|---|---|---|---|
| 1 | AGPS2b | TBGU388647 | 233 | G/T | G/G | N/A | No polymorphism |
| 2 | AGPS2b | TBGI050742 | 1507 | T/C | T/T | N/A | No polymorphism |
| 3 | SPHOL | TBGU168031 | 2501 | G/T | G/G | N/A | No polymorphism |
| 4 | SPHOL | TBGU168032 | 2920 | C/T | C/C | N/A | No polymorphism |
| 5 | SPHOL | TBGU168027 | 1001 | C/A | C/C | N/A | No polymorphism |
| 6 | SPHOL | TBGU168024 | 176 | G/T | G/G | N/A | No polymorphism |
| 7 | SPHOL | TBGU168039 | 5514 | G/T | G/G | N/A | No polymorphism |
| 8 | GBSSI | WAXYEXIN1 | 246 | T/G | T/G | P1,BD,FV,SB,MT,AC,PN,Dif | Highly associated |
| 9 | GBSSI | WAXYEX6 | 2494 | A/C | A/C | SB,BD,MT,AC | Highly associated |
| 10 | GBSSI | WAXYEX10 | 3486 | C/T | C/T | T1,FV,SB,MT,AC,PN,Dif | Highly associated |
| 11 | GBSSII | GBSSII_GA_1638 | 1638 | G/A | G/A | PT,GT | Low-Medium association |
| 12 | SSI | TBGU272768 | 5153 | T/C | T/C | FV,SBV,MT | Low-Medium association |
| 13 | SSIIa | SSIIa_GA_Ref631 | 631 | G/T | G/T | N/A | No association |
| 14 | SSIIa | ALKSSIIA4 | 4827-4828 | GC/TT | GC/TT | BDV,SB,PKT,PT,GT,CHK | Highly associated |
| 15 | SSIIb | TBGU116115 | 3416 | A/G | A/A | N/A | No polymorphism |
| 16 | SSIIb | TBGU116120 | 3948 | G/C | G/G | N/A | No polymorphism |
| 17 | SSIIb | TBGU116121 | 3979 | T/C | T/T | N/A | No polymorphism |
| 18 | SSIIb | TBGU116109 | 330 | G/A | A/A | N/A | No polymorphism |
| 19 | SSIIb | TBGU116119 | 3946 | C/T | C/C | N/A | No polymorphism |
| 20 | SSIIb | TBGU116116 | 3487 | T/G | T/T | N/A | No polymorphism |
| 21 | SSIIIa | GA_Ref1058 | 1058 | T/A | T/A | PT,MT | Low-Medium association |
| 22 | SSIIIa | GA_Ref1680 | 1680 | G/A | G/A | SBV,PT,MT,AC,PN,Dif,GT | Low association |
| 23 | SSIIIa | GA_Ref3136 | 3136 | G/A | G/A | N/A | No association |
| 24 | SSIIIa | GA_Ref3391 | 3391 | T/A | T/A | N/A | No association |
| 25 | SSIIIa | GA_Ref3559 | 3559 | T/A | T/A | CHK | Low association |
| 26 | SSIIIa | GA_Ref4384 | 4384 | G/A | G/A | N/A | No association |
| 27 | SSIIIa | GA_Ref1379 | 1379 | A/C | A/C | FV,SBV,PT,MT,AC,PN,Dif | Low-Medium association |
| 28 | SSIIIa | GA_Ref1708 | 1708 | G/A | G/A | MT,AC,PN,Dif,GT | Low-Medium association |
| 29 | SSIIIa | GA_Ref3274 | 3274 | G/A | G/A | N/A | No association |
| 30 | SSIIIa | GA_Ref6242 | 6242 | T/C | T/C | N/A | No association |
| 31 | SSIIIa | GA_Ref1457 | 1457 | A/C | A/C | N/A | No association |
| 32 | SSIIIa | GA_Ref1615 | 1615 | C/T | C/T | N/A | No association |
| 33 | SSIIIa | GA_Ref1834 | 1834 | C/T | C/T | N/A | No association |
| 34 | SSIIIa | GA_Ref2758 | 2758 | G/A | G/A | N/A | No association |
| 35 | SSIIIa | GA_Ref1722ER | 1722 | G/A | G/A | FV,SBV,PT,MT,AC,PN,Dif,GT | Low-Medium association |
| 36 | SSIIIa | GA_Ref2488 | 2488 | C/T | C/T | N/A | No association |
| 37 | SSIIIa | GA_Ref3073 | 3073 | G/A | G/A | N/A | No association |
| 38 | SSIIIa | GA_Ref1357 | 1357 | G/A | G/A | MT | No association |
| 39 | SSIIIa | GA_Ref2080 | 2080 | C/T | C/T | N/A | No association |
| 40 | SSIIIa | GA_Ref3481 | 3481 | G/A | G/A | N/A | No association |
| 41 | SSIIIa | GA_Ref5466 | 5466 | G/A | G/A | FV,SBV,PT,MT,AC,PN,Dif | Low-Medium association |
| 42 | SSIIIa | GA_Ref10761 | 10761 | C/T | C/T | PT | Low association |
| 43 | SSIIIb | GA_Ref1315 | 1315 | T/C | T/C | PT | Medium association |
| 44 | SSIIIb | GA_Ref4543 | 4543 | C/A | C/A | PT | Medium association |
| 45 | SSIIIb | GA_Ref5451 | 5451 | T/C | T/C | PT | Medium association |
| 46 | SSIIIb | GA_Ref7232 | 3232 | T/G | T/G | PT | Medium-High association |
| 47 | SSIIIb | GA_Ref7255ER | 7255 | C/A | C/A | PKV | Medium association |
| 48 | SSIIIb | GA_Ref7437 | 7437 | A/C | A/C | PT,Dif | Low-Medium association |
| 49 | SSIVa | GA_Ref4048 | 4048 | C/T | C/T | PT,GT | Low-Medium association |
| 50 | SSIVa | GA_Ref7160 | 7160 | A/G | A/G | PKT,PT,AC,PN,GT | Low-Medium association |
| 51 | SSIVa | GA_Ref7506 | 7506 | A/T | A/T | PT,GT | Low-Medium association |
| 52 | SSIVa | GA_Ref7823 | 7823 | T/C | T/C | PT,GT | Low-Medium association |
| 53 | SSIVa | GA_Ref8383 | 8383 | C/A | C/A | PT,GT | Medium association |
| 54 | SSIVb | TBGU260749 | 5090 | G/C | G/G | N/A | No polymorphism |
| 55 | SSIVb | TBGU260765 | 9525 | G/A | G/G | N/A | No polymorphism |
| 56 | BEI | GA_Ref1558 | 1558 | C/T | C/T | PV,BDV,FV,SBV,PT,MT,AC,PN,Dif | Low-Medium association |
| 57 | BEIIa | GA_Ref3266 | 3266 | T/G | T/G | N/A | No association |
| 58 | BEIIb | GA_Ref9035 | 9035 | C/T | C/T | N/A | No association |
| 59 | BEIIb | GA_Ref10068 | 10068 | C/A | C/A | N/A | No association |
| 60 | ISA1 | TBGU362347 | 1748 | G/A | G/G | N/A | No polymorphism |
| 61 | ISA1 | TBGU362346 | 1746 | C/G | C/C | N/A | No polymorphism |
| 62 | ISA2 | Iso2_GA_Ref960 | 960 | T/C | T/C | BDV,PT,CHK | Low association |
| 63 | ISA2 | Iso2_GA_Ref1712 | 1712 | C/A | C/A | BDV,PT,CHK | Low association |
| 64 | Pullulanase | TBGU185983 | 1938 | G/A | G/A | PT,GT | Low association |
| 65 | Pullulanase | TBGU185989 | 2380 | T/C | T/C | CHK | Low association |

*SNP identifiers can be found in Kharabian-Masouleh et al., 2011 (starting with the GA code) or in the OryzaSNP MSU database (http://oryzasnp.plantbiology.msu.edu/) (starting with TBG or TBU codes).
†A homozygous SNP call means no polymorphism at the corresponding allele.
MT = Martin test (retrogradation), PN = Predicted N, Dif = Difference, CHK = Chalkiness (%).

Appendix 7. Results of the association study between 13 physicochemical traits and SNPs of 18 genes. The most important columns are F-test and R2_Marker.

Trait Locus/SNP df_Marker F-test AGPS2b Section 1 No Functional Polymorphism found in this gene SPHOL Section 2 No Functional Polymorphism found in this gene GBSSI Section 3 Peak1 WAXYEXIN1 2 34.346 Trough1 WAXYEX10 1 36.9498 Breakdown WAXYEXIN1 2 35.1893 Breakdown WAXYEX10 1 18.9223 Final Viscosity WAXYEXIN1 2 15.0534 Final Viscosity WAXYEX10 1 106.0684 Setback WAXYEXIN1 2 76.2739 Setback WAXYEX10 1 59.8068 Martin_N WAXYEXIN1 2 223.2942 Martin_N WAXYEX10 1 147.7825 Martin_N WAXYEX6 1 16.8014 AC_percent WAXYEXIN1 2 121.5295 AC_percent WAXYEX10 1 44.0661 AC_percent WAXYEX6 1 16.2252 predicted_N WAXYEXIN1 2 121.5429 predicted_N WAXYEX10 1 43.967 predicted_N WAXYEX6 1 16.3841 diff WAXYEXIN1 2 54.612 diff WAXYEX10 1 97.1222 GBSSII Section 4 Past_temp GBSSII_GA_Ref1638 2 27.8519 GT GBSSII_GA_Ref1638 2 9.7254 SSI Section 5 Trough1 SSI_TBGU272768_5153 1 14.2713 FinalVisc SSI_TBGU272768_5153 1 43.6138 Setback SSI_TBGU272768_5153 1 28.8805 Martin_N SSI_TBGU272768_5153 1 45.7145 AC_percent SSI_TBGU272768_5153 1 20.5891 SSI_TBGU272768_5153 1 20.4244 predicted_N diff SSI_TBGU272768_5153 1 22.1635 SSIIa Section 6 Breakdown ALKSSIIA4 2 22.4536 p-value #perm_Marker p-perm_Marker p-adjusted df_Model value df_Error MS_Error 9.99E-04 9.99E-04 9.99E-04 0.002 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 0.028 9.99E-04 9.99E-04 0.026 9.99E-04 9.99E-04
0.023 9.99E-04 9.99E-04 R2_ModelR2_Marker 8.87E-14 4.99E-09 4.64E-14 2.05E-05 7.18E-07 1.06E-20 3.89E-26 3.26E-13 1.29E-54 1.37E-26 5.76E-05 9.63E-37 2.26E-10 7.64E-05 9.56E-37 2.36E-10 7.07E-05 3.92E-20 2.44E-19 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 2 1 2 1 2 1 2 1 2 1 1 2 1 1 2 1 1 2 1 230 230 230 230 230 230 230 230 230 230 230 230 230 230 230 230 230 230 230 38700.419 16578.651 33032.8292 39440.6237 36055.0683 27888.2408 41231.9601 54241.7925 0.041 0.0733 0.0974 1.2099 2.0869 2.2559 0.0199 0.0344 0.0372 0.042 0.0436 0.23 0.1384 0.2343 0.076 0.1157 0.3156 0.3988 0.2064 0.6601 0.3912 0.0681 0.5138 0.1608 0.0659 0.5138 0.1605 0.0665 0.322 0.2969 0.23 0.1384 0.2343 0.076 0.1157 0.3156 0.3988 0.2064 0.6601 0.3912 0.0681 0.5138 0.1608 0.0659 0.5138 0.1605 0.0665 0.322 0.2969 1.67E-11 9.57E-05 1000 9.99E-04 9.99E-04 1000 0.004 9.99E-04 2 2 219 188 15.5428 48.9948 0.2028 0.0938 0.2028 0.0938 2.02E-04 2.80E-10 1.91E-07 1.14E-10 9.23E-06 9.98E-06 4.36E-06 1000 9.99E-04 9.99E-04 1000 9.99E-04 9.99E-04 1000 9.99E-04 9.99E-04 1000 0.011 9.99E-04 1000 0.009 9.99E-04 1000 0.012 9.99E-04 1000 0.004 9.99E-04 1 1 1 1 1 1 1 227 227 227 227 227 227 227 16583.9218 28064.635 54692.8998 0.0718 2.1071 0.0347 0.042 0.0592 0.1612 0.1129 0.1676 0.0832 0.0825 0.089 0.0592 0.1612 0.1129 0.1676 0.0832 0.0825 0.089 1.32E-09 1000 9.99E-04 9.99E-04 2 222 36814.9988 0.1682 0.1682 Setback ALKSSIIA4 PeakTime ALKSSIIA4 Past_temp ALKSSIIA4 GT ALKSSIIA4 Chalk% ALKSSIIA4 SSIIb Section 7 No Functional Polymorphism found in this gene SSIIIa Section 8 FinalVisc SSIIIa_GA_Ref1379 FinalVisc SSIIIa_GA_Ref1722ER FinalVisc SSIIIa_GA_Ref5466 Setback SSIIIa_GA_Ref1680 Setback SSIIIa_GA_Ref1379 Setback SSIIIa_GA_Ref1722ER Setback SSIIIa_GA_Ref5466 Past_temp SSIIIa_GA_Ref1058 Past_temp 
SSIIIa_GA_Ref1680 Past_temp SSIIIa_GA_Ref1379 Past_temp SSIIIa_GA_Ref1722ER Past_temp SSIIIa_GA_Ref10761 Past_temp SSIIIa_GA_Ref5466 Martin_N SSIIIa_GA_Ref1058 Martin_N SSIIIa_GA_Ref1680 Martin_N SSIIIa_GA_Ref1379 Martin_N SSIIIa_GA_Ref1708 Martin_N SSIIIa_GA_Ref1722ER Martin_N SSIIIa_GA_Ref1357 Martin_N SSIIIa_GA_Ref5466 AC_percent SSIIIa_GA_Ref1680 AC_percent SSIIIa_GA_Ref1379 AC_percent SSIIIa_GA_Ref1708 AC_percent SSIIIa_GA_Ref1722ER AC_percent SSIIIa_GA_Ref5466 predicted_N SSIIIa_GA_Ref1680 predicted_N SSIIIa_GA_Ref1379 predicted_N SSIIIa_GA_Ref1708 predicted_N SSIIIa_GA_Ref1722ER predicted_N SSIIIa_GA_Ref5466 diff SSIIIa_GA_Ref1680 diff SSIIIa_GA_Ref1379 2 5.7025 0.0038 2 53.0867 1.44E-19 2 199.6523 2.45E-50 2 32.806 5.55E-13 2 8.9273 1.87E-04 1000 0.007 0.0659 1000 9.99E-04 9.99E-04 1000 9.99E-04 9.99E-04 1000 0.004 9.99E-04 1000 0.005 9.99E-04 2 2 2 2 2 222 222 222 192 222 67276.7238 0.0136 6.9583 40.2842 91.0399 0.0489 0.3235 0.6427 0.2547 0.0744 0.0489 0.3235 0.6427 0.2547 0.0744 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 222 226 225 225 222 226 225 224 225 222 226 218 225 224 225 222 221 226 223 225 225 222 221 226 225 225 222 221 226 225 225 222 37968.6918 38124.2411 38444.9081 57537.4747 62943.584 63363.2621 64066.5547 17.9321 18.1252 17.9153 17.9836 18.0475 17.7556 0.0778 0.0727 0.0994 0.0766 0.1035 0.1158 0.104 2.1222 2.2508 2.1493 2.298 2.2906 0.035 0.0371 0.0354 0.0379 0.0378 0.0423 0.0552 0.0753 0.0723 0.0736 0.0655 0.0948 0.0821 0.0745 0.0722 0.0622 0.0667 0.0763 0.062 0.0722 0.1058 0.1564 0.2002 0.1301 0.1546 0.0639 0.1536 0.084 0.1136 0.0779 0.0864 0.0902 0.0837 0.1135 0.0777 0.0862 0.09 0.0787 0.1305 0.0753 0.0723 0.0736 0.0655 0.0948 0.0821 0.0745 0.0722 0.0622 0.0667 0.0763 0.062 0.0722 0.1058 0.1564 
0.2002 0.1301 0.1546 0.0639 0.1536 0.084 0.1136 0.0779 0.0864 0.0902 0.0837 0.1135 0.0777 0.0862 0.09 0.0787 0.1305 9.0413 8.8028 8.9423 7.8821 11.6269 10.1037 9.0543 8.7158 7.4574 7.9273 9.3315 7.2026 8.756 13.2478 20.8545 27.7893 16.5211 20.6652 7.6136 20.4182 10.3167 14.2201 9.3351 10.6866 11.1556 10.2716 14.2099 9.3091 10.6615 11.1281 9.6109 16.6551 1.68E-04 2.08E-04 1.83E-04 4.91E-04 1.58E-05 6.27E-05 1.65E-04 2.26E-04 7.31E-04 4.73E-04 1.28E-04 9.35E-04 2.18E-04 3.65E-06 4.91E-09 1.70E-11 2.06E-07 5.73E-09 6.33E-04 7.10E-09 5.17E-05 1.55E-06 1.28E-04 3.68E-05 2.40E-05 5.38E-05 1.56E-06 1.31E-04 3.76E-05 2.46E-05 9.88E-05 1.82E-07 0.013 0.006 0.01 0.003 0.002 9.99E-04 0.015 9.99E-04 9.99E-04 0.002 9.99E-04 0.002 9.99E-04 0.006 0.002 0.002 0.002 9.99E-04 0.03 0.003 0.002 9.99E-04 0.003 0.002 0.006 0.003 9.99E-04 0.007 0.004 0.006 0.005 0.007 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 diff diff diff GT GT GT Chalk% SSIIIb Peak Viscosity Past_temp Past_temp Past_temp Past_temp Past_temp Past_temp diff SSIVa PeakTime Past_temp Past_temp Past_temp Past_temp Past_temp AC_percent predicted_N GT GT GT GT GT SSIVb SSIIIa_GA_Ref1708 SSIIIa_GA_Ref1722ER SSIIIa_GA_Ref5466 SSIIIa_GA_Ref1680 SSIIIa_GA_Ref1708 SSIIIa_GA_Ref1722ER SSIIIa_GA_Ref3559 Section 9 SSIIIb_GA_Ref7255ER SSIIIb_GA_Ref4543 SSIIIb_GA_Ref5451 SSIIIb_GA_Ref1315 SSIIIb_GA_Ref7232 SSIIIb_GA_Ref7255ER SSIIIb_GA_Ref7437 SSIIIb_GA_Ref7437 Section 10 SSIva_GA_Ref7160 SSIva_GA_Ref4048 SSIva_GA_Ref7160 SSIva_GA_Ref7823 SSIva_GA_Ref8383 SSIva_GA_Ref7506 SSIva_GA_Ref7160 SSIva_GA_Ref7160 SSIva_GA_Ref4048 SSIva_GA_Ref7160 SSIva_GA_Ref7823 SSIva_GA_Ref8383 SSIva_GA_Ref7506 Section 11 SSIvb_TBGU260749_5090 SSIvb_TBGU260765_9525 BEI Section 12 Peak Viscosity BEI_GA_Ref1558 
Breakdown Viscosity BEI_GA_Ref1558 FinalViscosity BEI_GA_Ref1558 Setback ViscosityBEI_GA_Ref1558 Past_temp BEI_GA_Ref1558 Martin_N BEI_GA_Ref1558 2 2 2 2 2 2 2 7.166 12.0666 12.3189 10.0271 30.2791 15.2535 8.9878 9.65E-04 1.05E-05 8.38E-06 7.21E-05 3.99E-12 7.07E-07 1.83E-04 1000 0.012 1000 0.003 1000 0.005 1000 0.024 1000 0.006 1000 0.013 1000 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 2 2 2 2 2 2 2 221 226 225 192 188 193 201 0.044 0.0566 0.0566 48.6731 40.6703 46.254 95.3296 0.0609 0.0965 0.0987 0.0946 0.2436 0.1365 0.0821 0.0609 0.0965 0.0987 0.0946 0.2436 0.1365 0.0821 2 2 2 2 1 2 2 2 7.7442 21.3553 23.0673 25.0653 41.4018 29.1937 21.0809 7.338 5.64E-04 7.21E-09 8.32E-10 1.55E-10 5.87E-09 5.91E-12 4.03E-09 8.17E-04 1000 1000 1000 1000 1000 1000 1000 1000 0.002 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 0.01 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 2 2 2 2 1 2 2 2 217 147 216 221 90 217 226 226 47808.4024 15.5668 15.4824 15.4957 13.8698 15.0332 16.2001 0.0591 0.0666 0.2251 0.176 0.1849 0.3151 0.212 0.1572 0.061 0.0666 0.2251 0.176 0.1849 0.3151 0.212 0.1572 0.061 2 2 2 2 2 2 2 2 2 2 2 2 2 10.7899 27.6864 39.5053 30.2856 30.8007 29.3874 9.1222 9.077 8.5371 19.7873 10.209 10.4426 10.6137 3.35E-05 1.82E-11 1.97E-15 2.41E-12 1.73E-12 4.41E-12 1.55E-04 1.62E-04 2.82E-04 1.54E-08 6.18E-05 5.05E-05 4.21E-05 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 0.002 0.019 9.99E-04 0.015 0.01 0.012 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 9.99E-04 2 2 2 2 2 2 2 2 2 2 2 2 2 225 223 225 220 215 228 225 225 190 192 188 185 195 0.0183 15.3704 14.1948 15.0459 15.2886 15.3101 2.3238 0.0383 49.075 44.2873 48.1588 48.9202 48.0344 0.0875 0.1989 0.2599 0.2159 0.2227 0.205 0.075 0.0747 0.0825 0.1709 0.098 0.1014 0.0982 0.0875 0.1989 0.2599 0.2159 0.2227 0.205 0.075 0.0747 
0.0825 0.1709 0.098 0.1014 0.0982 1000 0.006 9.99E-04 1000 9.99E-04 9.99E-04 1000 0.013 9.99E-04 1000 9.99E-04 9.99E-04 1000 9.99E-04 9.99E-04 1000 0.004 9.99E-04 2 2 2 2 2 2 221 221 221 221 221 221 46937.7885 39184.4825 37058.8842 53674.1793 18.1687 0.0951 0.0796 0.092 0.1054 0.2255 0.0708 0.2383 0.0796 0.092 0.1054 0.2255 0.0708 0.2383 No polymorphism detected 2 2 2 2 2 2 9.5546 11.2003 13.0129 32.1812 8.4131 34.5608 1.05E-04 2.33E-05 4.54E-06 5.42E-13 3.01E-04 8.71E-14 AC_percent predicted_N diff BEIIa Iso1 Iso2 Breakdown Breakdown Past_temp Past_temp Chalk% Chalk% Pullulanase Past_temp GT Chalk% BEI_GA_Ref1558 BEI_GA_Ref1558 BEI_GA_Ref1558 Section 13 BEIIa_GA_Ref3266 Section 14 BEIIb_GA_Ref9035 BEIIb_GA_Ref10068 Section 15 TBGU362347_1748ER TBGU362346_1746EF Section 16 Iso2_GA_Ref1712 Iso2_GA_Ref960 Iso2_GA_Ref1712 Iso2_GA_Ref960 Iso2_GA_Ref1712 Is02_GA_Ref960 Section 17 Pullu_TBGU185983_1938 Pullu_TBGU185983_1938 Pullu_TBGU185989_2380 2 2 2 38.8652 3.44E-15 39.1031 2.89E-15 8.6861 2.34E-04 1000 9.99E-04 9.99E-04 1000 9.99E-04 9.99E-04 1000 0.005 9.99E-04 2 2 2 221 221 221 1.8804 0.031 0.0592 0.2602 0.2614 0.0729 0.2602 0.2614 0.0729 No significant association with starch properties was observed for this SNP No significant association with starch properties was observed for this SNP No polymorphism detected 2 2 2 2 2 2 2 2 2 8.2378 9.0028 7.8355 7.8341 7.2855 8.2391 3.66E-04 1.75E-04 5.31E-04 5.18E-04 8.85E-04 3.55E-04 1000 0.002 9.99E-04 1000 0.002 9.99E-04 1000 9.99E-04 9.99E-04 1000 9.99E-04 9.99E-04 1000 9.99E-04 9.99E-04 1000 0.002 9.99E-04 2 2 2 2 2 2 198 219 198 219 198 219 39338.6699 38822.3178 17.8716 18.0195 100.3691 96.6328 0.0768 0.076 0.0733 0.0668 0.0685 0.07 0.0768 0.076 0.0733 0.0668 0.0685 0.07 23.5989 5.05E-10 19.1496 2.63E-08 7.5266 6.96E-04 1000 9.99E-04 9.99E-04 1000 9.99E-04 9.99E-04 1000 0.002 9.99E-04 2 2 2 223 191 211 15.7171 23.0156 96.2102 0.1747 0.167 0.0666 0.1747 0.167 0.0666 Appendix 8: The linkage map of 17 starch-related genes, 
showing the approximate location of each gene on the chromosomes (Chr). The red lines show the exact location of each gene on its chromosome.
[Figure: linkage map. Genes labelled: SSIVa, SSIIb, BEIIb, SPHOL, Pullulanase, BEIIa, SSIIIb, ISA2, GBSSI, SSI, SSIIa, BEI, GBSSII, GPT1, SSIIIa, ISA1, AGPS2b]
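
The F-test and R2_Marker columns of Appendix 7 are not independent. For a model with a single marker term, the standard linear-model identity R2 = (df_Marker × F) / (df_Marker × F + df_Error) links them. The sketch below is only a reader's consistency check, not the analysis pipeline used in the thesis. The function names are hypothetical, and the remark about permutation p-values assumes the common (count + 1) / (permutations + 1) estimator, which would explain the recurring value 9.99E-04 with 1000 permutations.

```python
def r2_from_f(f_stat: float, df_marker: int, df_error: int) -> float:
    """Recover R2 from an F statistic for a single-term linear model.

    Assumes the marker is the only term in the model, so that
    F = (R2 / df_marker) / ((1 - R2) / df_error).
    """
    explained = f_stat * df_marker
    return explained / (explained + df_error)


def min_permutation_p(n_permutations: int) -> float:
    """Smallest p-value reachable under the (count + 1) / (n + 1) estimator.

    This estimator is an assumption; the source does not state which one was used.
    """
    return 1.0 / (n_permutations + 1)


if __name__ == "__main__":
    # Peak1 / WAXYEXIN1 row of Appendix 7: F = 34.346, df_Marker = 2, df_Error = 230
    print(round(r2_from_f(34.346, 2, 230), 4))
    # Trough1 / SSI_TBGU272768_5153 row: F = 14.2713, df_Marker = 1, df_Error = 227
    print(round(r2_from_f(14.2713, 1, 227), 4))
    # Floor of the permutation p-value with 1000 permutations
    print(f"{min_permutation_p(1000):.2E}")
```

Rows where R2_Model and R2_Marker are equal are consistent with the marker being the only term in the fitted model, which is the case this check covers.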