
Southern Cross University
ePublications@SCU
Theses
2013
Characterisation of starch traits and genes in
Australian rice germplasm
Ardashir Kharabian Masouleh
Southern Cross University
Publication details
Kharabian Masouleh, A 2013, 'Characterisation of starch traits and genes in Australian rice germplasm', PhD thesis, Southern Cross
University, Lismore, NSW.
Copyright A Kharabian Masouleh 2013
Characterisation of starch traits and genes in Australian rice germplasm
Ardashir Kharabian Masouleh (B.Sc, M.Sc)
A thesis submitted to Southern Cross University in fulfillment of the requirements for
the degree of Doctor of Philosophy
Southern Cross Plant Science
Southern Cross University
Lismore, NSW Australia
March 2013
Statement of originality
I certify that the work presented in this thesis is, to the best of my knowledge and belief,
original, except as acknowledged in the text, and that the material has not been submitted,
either in whole or in part, for a degree at this or any other university. I acknowledge that I
have read and understood the University's rules, requirements, procedures and policy
relating to my higher degree research award and to my thesis. I certify that I have
complied with the rules, requirements, procedures and policy of the University.
Ardashir Kharabian Masouleh
March 2013
Acknowledgements
First, I express my great gratitude to my principal supervisors, Robert J Henry, Daniel LE Waters and
Russell F Reinke, for allowing me to undertake this project at Southern Cross Plant
Science. I would also like to thank them for their direction and endless support during this
PhD project.
Next I would like to thank my other supervisors in the centre, Graham King and Michael
Heinrich, for their help, thoughts and valuable suggestions throughout my candidature.
Thanks to the many people who have been of great help in the lab: Stirling Bowen, Peter
Bundock, Timothy Sexton and everyone else who helped me learn various techniques.
Thanks to all in the postgraduate room and beyond, especially Tiffeny Byrnes and Cathy Nock,
who have been a great support in lab management and administration.
And last but not least, thanks to my family, especially my wife Shiva, for the endless support
during these four and a half years. I could not have completed this big commitment without my
family's support.
Abstract
Starch is a major component of human diets. The physicochemical properties of starch
influence the nutritional value of starch and the functional properties of starch-containing
foods. Many of these traits have been under strong selection during the domestication of rice as a
food. A population of 233 rice breeding lines was analysed for variation in 17 starch
synthesis genes encoding seven classes of enzymes: ADP-glucose
pyrophosphorylase (AGPase), granule-bound starch synthase (GBSS), soluble starch synthase
(SS), starch branching enzyme (BE), starch debranching enzyme (DBE), starch
phosphorylase (SPHOL) and glucose-6-phosphate translocator (GPT1). The approach employed semi- to long-range PCR (LR-PCR) followed by next-generation sequencing. The
amplification products were pooled in equimolar amounts and sequenced using massively parallel
sequencing (MPS). SNPs/Indels in both coding and non-coding regions were identified
and their distribution patterns among individual starch candidate genes characterized.
Approximately 60.9 million reads were generated, of which 54.8 million (90%) mapped to
the reference sequences. Coverage ranged from 12,708× for SSIIa to 38,300× for
SSIIIb. SNPs and single/multiple-base Indels were analysed in a total assembled
length of 116,403 bp. In total, 501 SNPs, of which 110 were non-synonymous/functional, and
113 Indels were detected across the 17 starch-related loci. Five genes (AGPL2a, Isoamylase1,
SPHOL, SSIIb and SSIVb) showed no polymorphism. The ratio of non-synonymous to synonymous SNPs (Ka/Ks) suggested GBSSI and Isoamylase 1 (ISA1) are the least
diversified (most purified) and most conservative genes, as the studied populations have been
through several cycles of selection for low amylose content and gelatinization temperature.
The 110 functional SNP loci were analysed for associations with rice pasting and cooking
quality. Associations of 65 functional SNPs with starch traits were detected. GBSSI
(the waxy gene) and SSIIa had a major influence on starch properties and the other genes had
minor associations. The ´G/T´ SNP at the exon 1/intron 1 boundary of GBSSI showed the
strongest association with retrogradation and amylose content. The TT allele has been
selected in much of the domesticated japonica genepool, providing rice with a desirable
texture but with less resistant starch and its associated human health advantages. The GC/TT SNP at
exon 8 of SSIIa showed a very significant association with pasting temperature (PT),
gelatinization temperature (GT) and peak time. No significant association was found
between SSIIa and retrogradation. Other genes contributing to retrogradation were SSI, BEI
and SSIIIa. The highest level of polymorphism was observed in SSIIIa with 22 SNPs but only
limited associations were observed with starch phenotypic values. None of the SNPs was
found to be strongly associated with chalkiness except for a weak link with a ´T/C´ SNP at
position 960 (Thr482 to Ala) in Isoamylase2. These associations provide new tools for
deliberate selection of rice genotypes for specific functional and nutritional outcomes.
Resistant-retrograded starch is widely associated with human health. The highly retrograded
starches of cereals usually have a lower glycemic index (GI), which may be beneficial in
many human diets. The data reported here suggest that the glucose-6-phosphate translocator (GPT1),
an enzyme early in the biochemical pathway of starch synthesis, has a major influence on
resistant starch production in rice. A ´T/C´ SNP at position 1188 of the GPT1-encoding gene
alters Leu24 to Phe and is highly associated with resistant-retrograded starch and amylose
content. The ´T´ and ´C´ alleles produce high and low levels of retrograded starch,
respectively. An association study of 233 genotypes demonstrated highly significant
correlations (R²) of 0.57 and 0.36 (P=0.00099) between this SNP and retrogradation degree
and apparent amylose content, respectively. Haplotype and association analysis of this SNP
and another ´G/T´ SNP at the exon 1/intron 1 boundary of the GBSSI-encoding gene explains
most of the variability of retrogradation degree and amylose content in the rice population.
These two SNPs together produce higher levels of resistant-retrograded starch when the ´T´
allele in GPT1 and the ´G´ allele in GBSSI are present. This ´T:G´ haplotype can provide a new tool for
deliberate selection of rice genotypes for specific functional and nutritional outcomes such as
resistant-retrograded starch and high-amylose, non-sticky rices.
Granule Bound Starch Synthase I (GBSSI) influences the grain quality of all cereals and,
particularly, rice. Using GBSSI as a model plant gene, a number of computational
tools and programs were used to explore the functional SNPs of this important rice
gene and the possible relationships between genetic mutation and phenotypic variation. A
total of 51 SNPs/indels were retrieved from databases, including three important coding non-synonymous SNPs, namely those in exons 6, 9 and 10. Sorting Intolerant from Tolerant
(SIFT) results showed that a candidate [C/A] SNP (ID: OryzaSNP2) in exon 6 (coordinate
2494) is the most important non-synonymous SNP with the highest phenotypic impact on
GBSSI. This SNP alters a tyrosine to serine at position 224 of the waxy protein.
Computational simulation of the GBSSI protein with Geno3D suggested this mutant SNP
creates a bigger loop on the surface of GBSSI and results in a shape different from that of
native GBSSI. Here, we suggest a potential transcription factor binding site (TBF8) which
has one [C/T] SNP [rs53176842] at coordinate 2777 at the intron 7/exon 8 boundary,
according to Transcriptional Factor (TF) Search analysis. This SNP might potentially have a
major effect on the regulation and function of GBSSI.
The application of single nucleotide polymorphisms (SNPs) in plant breeding involves the
analysis of a large number of samples, and therefore requires rapid, inexpensive and highly
automated multiplex methods to genotype the sequence variants. A high-throughput
multiplexed SNP assay for eight polymorphisms which explain two agronomic and three
grain quality traits in rice was optimised. Gene fragments coding for the agronomic traits plant
height (semi-dwarf, sd-1) and blast disease resistance (Pi-ta) and the quality traits amylose
content (waxy), gelatinization temperature (alk) and fragrance (fgr) were amplified in a
multiplex polymerase chain reaction. A single base extension reaction carried out at the
polymorphism responsible for each of these phenotypes within these genes generated
extension products which were quantified by a matrix-assisted laser desorption ionization-time of flight system. The assay detects both SNPs and indels and is co-dominant,
simultaneously detecting both homozygous and heterozygous samples in a multiplex system.
This assay analyses eight functional polymorphisms in one 5 μL reaction, demonstrating the
high-throughput and cost-effective capability of this system. At this conservative level of
multiplexing, 3072 assays can be performed in a single 384-well microtitre plate, allowing
the rapid production of valuable information for selection in rice breeding.
Table of Contents
Title page ... i
Statement of originality ... ii
Acknowledgements ... iii
Abstract ... iv
Table of contents ... viii
List of abbreviations ... xv
Publications arising from thesis ... xvi
Chapter 1
Allele mining and characterization of starch genes in rice: from SNPs to phenotype
Starch structure ... 1
Starch synthesis ... 1
Starch synthesis enzymes and genes ... 2
Ka/Ks ratio ("purifying" vs "diversifying" genes) ... 3
Definition of purifying and diversifying genes ... 4
ADP-glucose pyrophosphorylase (AGPase), Starch phosphorylase (PHO) and Glucose phosphate translocator (GPT) gene families ... 5
AGPS2b (small subunit) ... 5
SPHOL (alpha 1,4 glucan starch phosphorylase) ... 5
GPT1 (Glucose-6-phosphate translocator) ... 6
Pathway to amylose ... 6
Granule bound starch synthase (GBSS) ... 6
Pathway to amylopectin ... 8
Starch Synthase (SS) genes ... 8
SSI ... 10
SSII ... 11
SSIIa ... 11
SSIIb ... 12
SSIIIa ... 13
SSIIIb ... 13
SSIVa ... 14
Starch Branching enzymes (SBEs) ... 14
BEI ... 14
BEIIa ... 15
BEIIb ... 15
Debranching Enzymes (DBEs) ... 16
ISA1 (Iso 1) ... 16
ISA2 (Iso2) ... 17
Pullulanase (PUL) ... 17
Proteins ... 18
Lipids ... 19
Environmental factors: Nitrogen (N), Phosphorus (P) and Potassium (K) ... 19
Thermal stress ... 20
CO2 ... 21
Objectives of thesis ... 21
Key concepts ... 21
Major activities reported in the thesis ... 22
Chapter 2
Discovery of polymorphisms in starch related genes in rice germplasm by amplification of pooled
DNA and deeply parallel sequencing
Summary ... 24
Introduction ... 25
Materials and methods ... 27
Plant materials ... 27
Variability of genotypes ... 27
Sample preparation and DNA extraction ... 27
Designation of starch-metabolizing enzymes/genes involved in starch synthesis ... 28
Target genes for sequence analysis ... 28
Designing primers to capture target genes ... 28
Long range PCR protocol (LR-PCR) ... 29
DNA equimolar pooling ... 29
Massively parallel sequencing ... 30
SNP detection and data analysis ... 30
Total polymorphism rate and functional SNPs ... 31
Results ... 31
Number of reads and average coverage ... 31
Polymorphism discovery and SNP/Indel detection ... 32
SNP variation across the starch related candidate loci ... 33
ADP-glucose pyrophosphorylase (AGPase), Starch phosphorylase (PHO) and Glucose phosphate translocator (GPT) gene families ... 34
AGPS2b (small subunit) ... 34
SPHOL (alpha 1,4 glucan starch phosphorylase) ... 35
GPT1 (Glucose-6-phosphate translocator) ... 35
Granule bound starch synthase (GBSS) gene family ... 37
GBSSI (Granule bound starch synthase I) ... 37
GBSSII (Granule bound starch synthase II) ... 38
Starch synthase (SS) family ... 38
SSI ... 38
SSIIa ... 40
SSIIb ... 41
SSIIIa ... 41
SSIIIb ... 42
SSIVa ... 42
Starch Branching enzymes (SBEs) ... 43
BEI ... 43
BEIIa ... 44
BEIIb ... 44
Debranching Enzymes (DBEs) ... 45
ISA1 (Iso 1) ... 45
ISA2 (Iso2) ... 46
Pullulanase (PUL) ... 46
Distribution of SNPs across the loci ... 47
Ka/Ks ratio ("purifying" vs "diversifying" genes) ... 47
Discussion ... 48
Chapter 3
Bioinformatic tools assist screening of functional SNPs in plants: GBSSI in rice as a model
gene
Summary ... 52
Introduction ... 52
Materials and methods ... 54
GBSSI gene as a case study ... 54
Sequence alignment ... 54
SNP dataset ... 55
Computational tools for SNP analysis ... 55
3D Modelling of GBSSI and comparative study ... 56
Functional flow chart ... 58
Results ... 60
SNPs in GBSSI gene and comparative study ... 60
Computational algorithm tools ... 60
UTR Scan ... 60
TF Search ... 60
SIFT (Sorting Intolerant from Tolerant) ... 62
GeneSplicer ... 63
SEE ESE (Sequence Evaluator of Exonic Splicing Enhancers) ... 64
FAS-ESS (Systematic identification and analysis of exonic splicing silencers) ... 65
Simulation for finding functional, constructive changes of ns-coding SNPs ... 66
Discussion ... 68
Conclusion ... 71
Chapter 4
SNP in starch biosynthesis genes associated with the nutritional and functional properties of
domesticated rice
Summary ... 73
Introduction ... 74
Materials and methods ... 77
Plant materials ... 77
Physicochemical properties ... 77
Designation of starch-synthesis genes involved in starch metabolism ... 78
Candidate genes/enzymes for SNP genotyping ... 78
SNP dataset ... 78
Primer design and SNP genotyping ... 79
Capture PCR protocol, primer extension and mass spectrometry ... 79
Association analysis ... 79
Statistical parameters ... 80
Results ... 80
AGPS2b (small subunit) ... 80
SPHOL (alpha 1,4 glucan starch phosphorylase) ... 81
GBSSI (Granule bound starch synthase I) ... 81
GBSSII (Granule bound starch synthase II) ... 82
SSI ... 82
SSIIa ... 82
SSIIb ... 83
SSIIIa ... 83
SSIIIb ... 84
SSIVa ... 84
SSIVb ... 85
BEI ... 85
BEIIb ... 85
Debranching Enzymes (DBEs) ... 86
ISA1 (Isoamylase 1) ... 86
ISA2 (Isoamylase 2) ... 86
Pullulanase ... 86
Discussion ... 86
Neutral genes with no polymorphism or association ... 87
Major genes with highly significant associations ... 88
Contributory genes with low-medium associations ... 89
Minor genes with very low associations ... 89
Chapter 5
A SNP in GPT1 is closely associated with nutritionally important resistant-retrograded starch in
rice
Summary ... 91
Introduction ... 92
Materials and methods ... 95
Plant materials ... 95
Physicochemical properties ... 95
Designation of starch-synthesis genes involved in starch metabolism ... 95
Discovery of novel SNP in GPT1 and SNP genotyping in population ... 96
Association analysis ... 96
Results ... 96
GPT1 (Glucose-6-phosphate translocator) ... 97
GBSSI (Granule bound starch synthase I) ... 100
Allelic combination of SNPs in GPT1 and GBSSI ... 100
Discussion ... 101
Chapter 6
SNPs and marker assisted selection (MAS) in plant breeding. A high-throughput assay for rapid
and simultaneous analysis of perfect markers for important quality and agronomic traits in rice
using multiplexed MALDI-TOF Mass Spectrometry
Summary ... 104
Introduction ... 105
Materials and methods ... 106
Genotypes ... 106
DNA extraction ... 106
Primer design/generation of SNP markers ... 107
Capture PCR protocol ... 107
Shrimp alkaline phosphatase (SAP) incubation ... 108
Primer extension and mass spectrometry ... 110
Results ... 110
Analysis of PCR products ... 110
Optimal capture primer concentration ... 110
MgCl2 concentration ... 111
Identification of SNPs and polymorphisms in agronomic and quality loci ... 112
sd-1 ... 112
Pi-ta ... 115
waxy ... 115
alk ... 116
fgr ... 117
Missing data and heterozygosity ... 118
Discussion ... 118
Chapter 7
General discussion - Characterisation of starch traits and genes in Australian rice
germplasm
Background principles ... 122
Search in SNP databases and discovery of polymorphisms ... 122
Screening of functional SNPs ... 124
Gene copy number in the rice genome ... 125
Multiplexed MALDI-TOF Mass Spectrometry markers help to genotype individuals in a cost effective manner ... 125
Association between SNPs in starch biosynthesis genes and the nutritional and functional properties of domesticated rice ... 126
The glucose-6-phosphate translocator (GPT1) may contribute to resistant starch ... 128
Conclusion and further directions ... 129
References ... 132
Appendices ... 150
List of abbreviations
AC      Amylose content
BDV     Breakdown viscosity
CHK     Chalkiness
FV      Final viscosity
GPT1    Glucose-6-phosphate translocator gene
GT      Gelatinisation temperature
MT      Martin Test
MPS     Massively parallel sequencing
NGS     Next generation sequencing
Ns      Non-synonymous
PT      Pasting temperature
PaT     Paste temperature
PeT     Peak time
P1      Peak
PKT     Peak time
PKV     Peak viscosity
PN      Predicted N
TF      Transcriptional factors
TFBS    Transcriptional factor binding site
SB      Set back
T1      Trough
UTR     Untranslated region
Publications arising from thesis
1) Masouleh AK, Waters DLE, Reinke RF, Henry RJ (2009) A high-throughput assay for
rapid and simultaneous analysis of perfect markers for important quality and agronomic traits
in rice using multiplexed MALDI-TOF mass spectrometry. Plant Biotechnology Journal.
7:355–363
2) Kharabian, A (2010) An efficient computational method for screening functional SNPs in
plants. Journal of Theoretical Biology 265(1):55-62
3) Kharabian-Masouleh A, Waters DLE, Reinke RF, Henry RJ (2011) Discovery of
polymorphisms in starch-related genes in rice germplasm by amplification of pooled DNA
and deeply parallel sequencing. Plant Biotechnology Journal. 9:1074-1085.
4) Kharabian-Masouleh A, Waters DLE, Reinke RF, Ward R, Henry RJ (2012) SNP in starch
biosynthesis genes associated with nutritional and functional properties of rice. Scientific
Reports. 2:557; DOI:10.1038/srep00557.
CHAPTER 1
Allele mining and characterization of starch genes in rice: From SNPs to
phenotype
Starch constitutes most of the dry matter in the harvested organs of crop plants and is one of
the most important human foods. Starch is an end product of photosynthesis that is mainly
stored in the form of granules in the endosperm of grains and specialized organelles such as
chloroplasts and amyloplasts. Numerous studies have been undertaken to elucidate starch biosynthesis and its genetic control and to discover the relationship between its structure,
physical properties and the influence of environment on starch properties. Although a
number of comprehensive research and review articles have been published on starch
chemistry and pathways of synthesis, much remains unknown, which means it is
not yet possible to modify starch components or quality in a predictable way.
Starch structure
Starch, a complex carbohydrate, is a polymer of glucose molecules. It occurs as two main
forms: amylose, consisting of predominantly linear chains of glucose monomers linked by
α1-4 glycosidic bonds, and amylopectin, in which the chains are branched by the addition of
α1-6 glycosidic bonds. Depending upon species and the site of storage, amylose generally
constitutes approximately 10 to 35% of the starch found in plants and the remainder is
amylopectin.
Starch synthesis
The biochemistry of starch synthesis is relatively well understood although it is a complex
process (Buléon et al., 1998; Libessart et al., 1995). Many enzymes are involved in starch
synthesis and several isoforms of these enzymes exist, leading to a highly complex
biosynthetic process.
The starting point of starch synthesis is glucose which is derived from photosynthesis in the
green parts of plants. This glucose is transported to and deposited in storage tissue including
grain endosperm and tuberous roots. In the amyloplast, glucose is activated by the addition of
ADP by ADP-glucose pyrophosphorylase (AGPase) (James et al., 2003). The ADP-glucose is
then used by starch synthases which add glucose units to the growing polymer chain to build
the starch molecules (Buléon et al., 1998).
Starch synthesis enzymes and genes
A significant number of enzyme isoforms and activities contribute to starch synthesis and
therefore many genes are involved in the process. A simplified pathway diagram of starch
biosynthesis and the enzymes and genes involved is shown in Figure 1. If we consider ADP-glucose as the main substrate then there are two different pathways which lead to starch, one
toward amylose and the other to amylopectin. In each of these biochemical pathways,
different enzymes and genes play a role. These enzymes and genes work in a complex
process and each one makes a partial contribution to the starch end product and its quality
(Tester et al., 2004).
Some starch genes, such as SSIIa, are mainly expressed in the endosperm, others only in
leaves, and others in both green and storage tissues. Genes belonging to the “non-endosperm type” are often expressed together with one gene of the “both tissue type”. For example,
GBSSII, SSIIb and SSIIIb are leaf expressed and are co-ordinately expressed with SSI, which is
expressed in both tissue types (Hirose and Terao, 2004). For this reason, when investigating
the association of starch genes with grain quality it is necessary to focus on all genes with a
possible phenotypic effect. Mutations in genes which operate early in the starch biosynthesis
pathway (Fig 1) are likely to influence starch quality or quantity.
Figure 1. A schematic diagram showing the biochemical pathways of cereal starch
production.
Ka/Ks ratio ("purifying" vs "diversifying" genes)
The ratio of non-synonymous (Ka) to synonymous (Ks) SNPs can reveal whether a gene has
been under purifying, neutral or diversifying selection. Here the Ka/Ks ratio is used to
classify candidate genes into two main categories, “purifying” and “diversifying” genes.
Under neutral evolution at the amino acid level, Ka should equal Ks and hence
the ratio Ka/Ks = 1. Any deviation from this value indicates selection pressure on the genetic
structure of the population or candidate genes. A Ka/Ks ratio < 1 indicates negative (purifying)
selection and a Ka/Ks ratio > 1 indicates positive (diversifying) selection (Roth and Liberles, 2006). SNPs in
the genes studied in this thesis were retrieved from the International Rice Functional
Genomics Consortium (IRFGC) database (http://oryzasnp.plantbiology.msu.edu/). This
database holds records of the sequence analysis, including SNPs, of 20 diverse rice (Oryza
sativa L.) cultivars. Ka/Ks ratios were calculated for the candidate genes, ranging
from 0.11 to 2.40 for SSI and SSIIIa, respectively (Table 1). These results indicate that genes
such as SSIIIa are under diversifying selection whereas others such as SSI are under
purifying selection.
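The classification rule described here can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used to build Table 1: the function names `ka_ks_ratio` and `selection_type` are hypothetical, Ka and Ks are approximated by raw counts of non-synonymous and synonymous SNPs, and the rule of adding 1 to both counts whenever either count is zero is an assumption inferred from the footnote and values of Table 1.

```python
def ka_ks_ratio(n_nonsyn: int, n_syn: int) -> float:
    """Approximate Ka/Ks from raw SNP counts (an assumed simplification)."""
    if n_nonsyn < 0 or n_syn < 0:
        raise ValueError("SNP counts must be non-negative")
    # Assumed pseudocount rule: add 1 to both counts when either is zero,
    # so the ratio is never zero or undefined.
    if n_nonsyn == 0 or n_syn == 0:
        n_nonsyn += 1
        n_syn += 1
    return n_nonsyn / n_syn


def selection_type(ratio: float, tol: float = 1e-9) -> str:
    """Map a Ka/Ks ratio to the selection category used in the text."""
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    if ratio < 1.0:
        return "negative (purifying)"
    return "positive (diversifying)"


if __name__ == "__main__":
    # (non-synonymous, synonymous) SNP counts for three loci listed in Table 1
    counts = {"SPHOL": (2, 4), "GPT1": (0, 0), "SSIIb": (6, 4)}
    for gene, (ns, syn) in counts.items():
        ratio = ka_ks_ratio(ns, syn)
        print(f"{gene}: Ka/Ks = {ratio:.2f}, {selection_type(ratio)}")
```

Counting SNPs is a coarse proxy for Ka/Ks; a fuller analysis would normalise by the number of non-synonymous and synonymous sites in each coding sequence.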
Definition of purifying and diversifying genes
These terms extend the concept of evolution, in which genes, or more accurately allele
frequencies, are diversified (diversifying) or purified (purifying) under natural or artificial
selection pressure. Purifying selection is equivalent to negative selection, in which
deleterious alleles (SNPs, or point mutations) are gradually removed from the population, which
tends to stabilise the population. In contrast, diversifying (or disruptive) selection
is where allele frequencies change and extreme trait values are favoured over intermediate
values. This normally follows positive natural or artificial selection.
ADP-glucose pyrophosphorylase (AGPase), Starch phosphorylase (PHO) and Glucose
phosphate translocator (GPT) gene families
These enzymes/genes reside at the top of the starch biosynthetic pathway and are the starting
point of grain starch production. Glucose is first activated by the addition of ADP by AGPase,
and the resulting ADP-glucose becomes the substrate for the other major starch enzymes. There are several
genes/isozymes in this class but AGPS2b has the highest expression level in rice
endosperm (Hirose et al., 2006).
AGPS2b (small subunit)
The role of this subunit in starch granule synthesis has been identified by way of its
association with rice shrunken mutants (Kawagoe et al., 2005). A dramatic inhibition of
starch synthesis has been observed in AGPase-deficient mutants of rice and some other species,
resulting in increased soluble sugars, a large number of underdeveloped granules, small
grains and pleomorphic amyloplasts (Rolletschek et al., 2002). In total, 12 SNPs have been
retrieved for the 20 fully sequenced cultivars in the OryzaSNP@MSU database. The polymorphism
rate detected for AGPS2b in the OryzaSNP database is relatively high at 1.876 SNP/kb and the Ka/Ks
ratio (0.25) indicates that this gene has been under negative or purifying human selection.
SPHOL (alpha 1,4 glucan starch phosphorylase)
This gene is generally considered to be involved in starch degradation but recent studies
suggest some important roles in starch biosynthesis. Although its precise mechanism and
influence are still not well known, the mechanism appears to be associated with
phosphorylation of some starch-related enzymes and proteins such as starch branching
enzymes (SBEs) and starch synthase
(SSIIa) (Tetlow et al., 2004). In total, 11 SNPs are known in this gene, including two non-synonymous and four synonymous. The SNP rate is 1.46 SNP/kb and the gene has been under negative
selection (Table 1).
GPT1 (Glucose-6-phosphate translocator)
GPT1 is strongly expressed in the endosperm. This gene is believed to be responsible for the
import of essential carbon substrates such as Glc6P into the plastids during grain
development (Fischer and Weber, 2002; Jiang et al., 2003). Only three SNPs, all in introns, are known, and a
Ka/Ks ratio of 1.00 (Table 1) suggests this gene has not been under any selection pressure by humans.
Pathway to amylose
This is the shortest and simplest pathway of starch synthesis and the best characterised.
ADP-glucose is converted to amylose by one major enzyme, granule-bound
starch synthase I (GBSSI).
Granule-bound starch synthase (GBSSI)
GBSSI, encoded by the Waxy gene, is the best characterised starch biosynthesis enzyme in
plants and has a very significant effect on starch composition and quality. The α1-4 glycosidic
bonds of amylose are synthesised by GBSSI. In rice, high activity of GBSSI produces high
amylose content leading to a non-waxy, non-sticky or non-glutinous phenotype. Conversely,
if the GBSSI gene is partially active or inactive, the waxy (sticky), glutinous
phenotype is produced. In maize the waxy phenotype contains no amylose due to a
defect in the GBSS-encoding gene (Kiesselbach, 1944), while amylose-free potato and cassava
cultivars have been generated
by GBSS suppression (Raemakers et al., 2005; Visser et al., 1991; Hovenkamp-Hermelink et
al., 1987; Kuipers et al., 1994). Several wx mutants and isoforms of GBSS have been reported
in barley waxy cultivars which synthesize small amounts of endosperm amylose (Ishikawa et
al., 1995; Patron et al., 2002).
There are two isoforms of GBSS, GBSSI and GBSSII. These isoforms are homologous and
have approximately 66-69% amino acid sequence identity but their encoding genes are
situated at different loci. The gene encoding GBSSI is predominantly expressed in endosperm
whereas GBSSII is expressed in leaves and other non-storage tissues (Vrinten and Nakamura,
2000). Therefore, GBSSI is the most important enzyme responsible for endosperm amylose
content. GBSSI has been widely studied in different plant species (Nakamura et al., 1998;
Nakamura, 2002; Domon et al., 2002; Saito et al., 2004; Shapter et al., 2009). In rice, a
significant association between RVA pasting properties and the waxy gene sequence has been
found. Three SNP sites in the waxy gene, at the exon 1/intron 1 boundary, in exon 6 and in exon 10,
were determined to be responsible for differences in apparent amylose content and pasting
properties (Larkin and Park, 2003; Chen et al., 2008). Chen et al. (2008) identified four SNP haplotypes/alleles that explained the high variability of RVA pasting properties in
international rice germplasm.
GBSSII in rice is exclusively bound to the starch granules of leaves and has an important
function in amylose synthesis in the pericarp of the mature ovary (Nakamura et al., 1998).
Starch produced by GBSSII may be stored temporarily in the pericarp and later converted to
sugar and transferred to the endosperm as a substrate for starch synthesis during endosperm
development (Sato, 1984). The possible existence of SNPs/indels in GBSSII and their
impacts on starch properties and grain quality are still unknown.
GBSSI has been widely selected by breeders over the past two decades and is thus under
purifying selection, or has almost been purified. In contrast, GBSSII appears to be one of the most
conservative of all starch genes, as only one SNP has been detected, suggesting this gene has
not undergone any artificial selection pressure (Novaes et al., 2008).
Pathway to amylopectin
This is the second branch of the starch synthesis pathway (Fig 1), the end product of which is
amylopectin. This pathway is more complex, with many genes/enzymes and their isoforms
being involved in the process. Although amylopectin is the most abundant constituent of
grain starch, the roles of the different genes in this pathway in determining starch composition are relatively
unknown, perhaps because of the complexity of the pathway.
Starch Synthase (SS) genes
The Starch Synthases (SS) or Soluble Starch Synthases (SSS) exist in all plants in multiple
isoforms and are responsible for the construction of α1-4 glycosidic bonds in amylopectin.
There are five genes encoding five different SS isoforms in the rice genome (SSI, SSII, SSIII,
SSIV and SSVI). All classes of SS are expressed in the endosperm of plants (Li et al., 1999a;
Li et al., 1999b; Li et al., 2000) and probably in all starch synthesising cells (Smith, 1999).
There is good evidence that amylopectin chains are synthesized by the coordinated actions of SSI,
SSIIa, and SSIIIa isoforms.
Table 1. Polymorphism in rice genes responsible for starch synthesis and their status during domestication.
Gene | Locus No (Rice Genome Annotation Project) | Chr | Nucleotide length (bp) | Total number of SNPs | Number of n.s SNPs | Number of synonymous SNPs | Functional SNP rate (n.s/total) | Polymorphism rate (SNP/kb) | Ka/Ks ratio* | Selection type | Gene status during domestication
AGPS2b | LOC_Os08g25734 | 8 | 6394 | 12 | 0 | 3 | 0.00 | 1.876 | 0.25 | Negative | Purifying
SPHOL | LOC_Os03g55090 | 3 | 7489 | 11 | 2 | 4 | 0.181 | 1.469 | 0.5 | Negative | Purifying
GPT1 | LOC_Os08g08840 | 8 | 4073 | 3 | 0 | 0 | 0.00 | 0.736 | 1.00 | Neutral | Intact
GBSSI | LOC_Os06g04200 | 6 | 5035 | 8 | 1 | 2 | 0.125 | 1.588 | 0.5 | Negative | Purifying
GBSSII | LOC_Os07g22930 | 7 | 8049 | 1 | 0 | 0 | 0.00 | 0.124 | 1.00 | Neutral | Intact
SSI | LOC_Os06g06560 | 6 | 7750 | 67 | 2 | 6 | 0.014 | 8.645 | 0.33 | Negative | Purifying
SSIIa | LOC_Os06g12450 | 6 | 4981 | 12 | 2 | 0 | 0.166 | 2.409 | 3.00 | Positive | Diversifying
SSIIb | LOC_Os02g51070 | 2 | 5323 | 16 | 6 | 4 | 0.375 | 3.00 | 1.5 | Positive | Diversifying
SSIIIa | LOC_Os08g09230 | 8 | 11263 | 52 | 19 | 11 | 1.686 | 4.61 | 1.72 | Positive | Diversifying
SSIIIb | LOC_Os04g53310 | 4 | 8624 | 24 | 8 | 5 | 0.927 | 2.782 | 1.6 | Positive | Diversifying
SSIVa | LOC_Os01g52250 | 1 | 10480 | 25 | 5 | 2 | 0.477 | 2.385 | 2.5 | Positive | Diversifying
BEI | LOC_Os06g51084 | 6 | 7258 | 14 | 2 | 1 | 0.275 | 1.928 | 2 | Positive | Diversifying
BEIIa | LOC_Os04g33460 | 4 | 2265 | 0 | 0 | 0 | 0 | 0 | 1.00 | Neutral | Conservative
BEIIb | LOC_Os02g32660 | 2 | 10900 | 5 | 0 | 0 | 0 | 0.458 | 1.00 | Neutral | Conservative
ISA1 | LOC_Os08g40930 | 8 | 6592 | 16 | 2 | 3 | 0.303 | 2.427 | 0.666 | Negative | Purifying
ISA2 | LOC_Os05g32710 | 5 | 2403 | 10 | 6 | 4 | 2.496 | 4.16 | 1.5 | Positive | Diversifying
PUL | LOC_Os04g08270 | 4 | 10399 | 108 | 10 | 9 | 0.961 | 10.385 | 1.11 | Positive | Diversifying
Sequencing data and SNP polymorphism derived from the OryzaSNP@MSU database for 20 cultivated rices (http://oryzasnp.plantbiology.msu.edu/).
*To avoid a Ka/Ks ratio of zero, 1 is added when the number of non-synonymous or synonymous SNPs is zero.
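The polymorphism rate column of Table 1 is expressed in SNPs per kilobase, so it can be recomputed from the total SNP count and the nucleotide length of each locus. The sketch below is a hedged illustration under that assumption: the function name `polymorphism_rate` and the dictionary `LOCI` are hypothetical, and the inputs are copied from three rows of Table 1.

```python
def polymorphism_rate(total_snps: int, length_bp: int) -> float:
    """Return the number of SNPs per kilobase of sequence."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    if total_snps < 0:
        raise ValueError("total_snps must be non-negative")
    return total_snps / (length_bp / 1000.0)


# (total number of SNPs, nucleotide length in bp), copied from Table 1
LOCI = {
    "AGPS2b": (12, 6394),
    "SPHOL": (11, 7489),
    "GPT1": (3, 4073),
}

if __name__ == "__main__":
    for gene, (snps, length) in LOCI.items():
        print(f"{gene}: {polymorphism_rate(snps, length):.3f} SNP/kb")
```

The recomputed values agree with the tabulated ones to within about 0.001 SNP/kb; small differences in the last digit may reflect rounding versus truncation in the original table.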
SSI
SSI is primarily responsible for the synthesis of the shortest chains of amylopectin of about
10 glucosyl units or less (DP 7-11). This gene/protein is presumed to be expressed in the
endosperm and leaf of rice (Fujita et al., 2006). The SSI gene is located on chromosome 7S of
wheat and encodes a Mr 75000 protein that is distributed between the starch granule and the
soluble phase (Li et al., 1999). Studies of the chain-length specificities and affinities of maize SSI
have revealed that the entire carboxy-terminal region of this protein is required
for starch binding (Commuri and Keeling, 2001).
RT-PCR analysis shows that there is only one SSI isoform in rice which has steady
expression (Hirose and Terao, 2004). The transcript level of SSI is higher in endosperm than
leaf sheaths and blades and has therefore been classified as an endosperm and non-endosperm
expressing gene (Hirose et al., 2006). The measurement of SSI transcript levels at different
seed developmental stages found high expression at 1-3 days after flowering (DAF), peaking
at 5 DAF, and remaining almost constant during endosperm starch synthesis, suggesting SSI
is the major SS form in cereals (Cao et al., 1999).
A comprehensive analysis of mutant rice with a retrotransposon inserted into the SSI
encoding gene revealed SSI has a capacity for the synthesis of chains with DP8-12 with the
extension of smaller chains (Nakamura, 2002). Fujita et al. (2006) generated four SSI-deficient rice mutant lines using retrotransposon Tos17 insertion. The deficient mutants
exhibited a 0%-20% decrease in the amount of SSI protein in comparison to wild type,
changed amylopectin structure and increased gelatinization temperature of endosperm
starch, although the complete absence of
SSI had no effect on the size and shape of seeds and starch granules or on the crystallinity of
endosperm starch (Fujita et al., 2006).
This gene has a very small phenotypic effect on rice eating quality although a significant
negative correlation between the ratio of short chains (DP 6-12) and gelatinization
temperature has been reported (Umemoto et al., 2008). Although 46 SNPs, including one non-synonymous and four synonymous, have been detected in rice, no SNP in
the SSI gene of any plant species has yet been reported to be associated with starch composition or with amylopectin quality or
quantity. This gene is highly polymorphic, with 8.64 SNPs/Kb, and is undergoing purifying
selection.
SSII
SSII is responsible for the synthesis of shortest chains and further extensions to produce
longer chains are catalysed by SSIIa and/or SSIII (Commuri and Keeling, 2001). Previous
studies show that there are three isoforms for SSII in monocots: SSIIa, SSIIb and SSIIc. The
role of the latter two in starch biosynthesis, especially SSIIc, which is only expressed in source
tissue, is unknown as no mutants have yet been found (Tetlow et al., 2004).
SSIIa
SSIIa is known to have a major effect on starch quality. This gene is predominantly expressed
in cereal endosperm at very high levels and affects amylopectin structure (Craig et al., 1998;
Morell et al., 2003). Loss of SSIIa results in reduced starch content and amylopectin chain
length, and in modified granule morphology and crystallinity. In monocots, SSIIa elongates
the short glucan chains DP≤10 to the intermediate size of DP 12-24, thus its loss or down
regulation has a dramatic impact on amount and composition of starch (Tetlow et al., 2004).
The effect of this gene on rice cooking quality and rice starch texture has clearly been
demonstrated by virtue of a significant correlation between gelatinisation temperature (GT)
and particular SSIIa alleles (Umemoto et al., 2002; Umemoto et al., 2004). Alk, a major gene
regulating alkali disintegration resides on the same position as SSIIa on chromosome 6 of rice
(Gao et al., 2003). Further studies have shown the GT of rice flour, chain length distribution
of amylopectin and alkali spreading score are associated with different SSIIa haplotypes
(Umemoto and Aoki, 2005).
GT, alkali disintegration and eating quality of rice starch have been explained by
polymorphism at two SNPs, [A/G] and [GC/TT], within exon 8 of the alk locus (Waters et al.,
2006). These two SNPs were able to explain the classification of 70 rice genotypes into either
high GT or low GT types, which differed in GT by 8 °C (Waters et al., 2006). Polymorphism
analysis of this gene found 2.4 SNPs per Kb and indicates it is under positive human selection
(Table 1).
SSIIb
SSIIb is a low level early expressed gene which is primarily expressed in leaf blades and
sheaths (leaf specific) at an early stage of grain filling (Hirose and Terao, 2004). However, a
recent study presented evidence that SSIIb contributes with six other starch genes to alter
some Rapid Viscosity Analyser (RVA) parameters in glutinous rice (Yan et al., 2010). The
exact role of SSIIb in starch synthesis is currently unknown mainly due to lack of mutant
phenotypes.
There were 12 SNPs, including six non-synonymous SNPs, indicating this gene has some
phenotypic impact due to the high number of SNPs in exonic regions. Domestication has exerted
positive selection pressure on SSIIb.
SSIIIa
The SSIIIa encoding gene is highly expressed in endosperm, although some reports reveal
expression in green tissues (Dian et al., 2005). A recent study of an SSIIIa-deficient rice
mutant found amylose content and the extra long chains of amylopectin increased by 1.3- and
12-fold, respectively, due to an increase in GBSSI activity (Fujita et al., 2007).
In spite of a relatively high functional SNP rate of 1.686, this gene does not show a highly
significant association with rice physiochemical characteristics. For example, Yan et al. (2010)
found no functional effect on RVA parameters, at least among glutinous cultivars. Out of 52
SNPs, 19 are non-synonymous and 11 are synonymous, suggesting
SSIIIa is under diversifying selection with a Ka/Ks ratio of 1.727.
SSIIIb
SSIIIb is mainly expressed in rice endosperm but transient expression in leaf sheaths and
leaves has also been reported (Hirose et al., 2006). It has been classified into two
different categories on the basis of timing of expression in the developing seed: the late
expression category, in which it is expressed in the mid to later stage of grain filling (Hirose
and Terao, 2004), and the early expression category, in which the transcript level increases to a
maximum at 3-5 days after flowering (Ohdan et al., 2005).
An association study of glutinous rice near-isogenic lines suggested SSIIIb has a significant
impact on RVA parameters such as peak time and pasting temperature (Yan et al., 2010). The
total number of SNPs reported in the OryzaSNP database is 24, of which eight are non-synonymous and five synonymous, giving a high Ka/Ks ratio of 1.6. This ratio suggests this is
a diversifying gene which has been under positive selection during domestication.
SSIVa
SSIVa is one of the least known starch genes in plants. Like most starch synthase genes,
SSIVa is exclusively involved in amylopectin biosynthesis. Expression analysis by reverse
transcription PCR indicated SSIVa is preferentially expressed in rice endosperm and to a
degree in leaf blades as a late or steady expresser gene during grain filling (Hirose and Terao,
2004). QTL mapping and expression profile analysis have shown that high temperature
during the grain filling can considerably increase the transcription level of SSIVa by up to
1.11-fold, which is considerably higher than other starch synthase genes (Yamakawa et al.,
2007), and may contribute to grain chalkiness (Yamakawa et al., 2008). In total, 25 SNPs have
been reported for this gene in the OryzaSNP database, of which five are nsSNPs and two
synonymous, giving a Ka/Ks ratio of 2.5 and indicating SSIVa is diversifying under human selection.
SSIVa may also affect some secondary RVA parameters such as breakdown and setback
(Yan et al., 2010).
Starch Branching enzymes (SBEs)
Starch branching enzymes (SBEs) break α-(1→4)-linkages in existing chains and attach the
released reducing ends to C6 hydroxyls, forming the branched glucan, amylopectin (Tetlow
et al., 2004).
BEI
BEI is mainly expressed in the endosperm and transcript levels increase rapidly 3-5 days after
flowering. Biochemical observations with purified BEI from maize endosperm indicate BEI
preferentially branches amylose-type polyglucans and has a high capacity for branching less
branched α-glucans (Takeda et al., 1993). Analysis of the catalytic properties of BEI has
indicated the N- and C-termini play a critical role in chain length transfer and substrate
preference (Kuriki et al., 1997). A rice BEI deficient mutant induced by mutagenesis
exhibited modified amylopectin structure and grain morphology but the same quantity of
starch as the wild type (Satoh et al., 2003) and the BEI encoding gene also affects the RVA
profile (Yan et al., 2010).
The OryzaSNP@MSU database showed 14 SNPs in total, of which only two were nsSNPs
and one synonymous. Therefore, the Ka/Ks ratio of two suggests this is a diversifying gene.
BEIIa
BEIIa is a leaf expressed gene involved in amylopectin synthesis. BEIIa is also expressed in
the endosperm but at levels 10-fold lower than leaf tissue (Gao et al., 1997). An association
study including the gene and RVA properties demonstrated a low F value (6.60) with a very
slight influence in glutinous rice (Yan et al., 2010). No SNP/Indel has been reported in
OryzaSNP database, suggesting BEIIa might be one of the most conservative starch-related
genes in rice.
BEIIb
BEIIb is known as amylose extender (ae) in maize and other cereals (Yun and Matheson,
1993) and many studies have reported the significance of this gene on starch properties in
various plant species (Fisher et al., 1993; Sun et al., 1997; Sun et al., 1998). This is a granule- and soluble-associated enzyme which is only expressed in the endosperm. Expression of
three different functional maize SBE genes in BE-deficient yeast strains demonstrated the
presence of BEIIb is necessary to activate BEI and BEIIa (Seo et al., 2002). Additionally, a
0.5- to 0.7-fold decrease in the expression of BEIIb during grain filling creates chalky rice
(Tanaka et al., 2004).
Only five SNPs are in the OryzaSNP database, none of which are in the exonic regions,
despite a recent association study that found a very high F value of 11.12
between BEIIb and RVA properties in rice (Yan et al., 2010).
Debranching Enzymes (DBEs)
DBEs belong to the α-amylase family, of which two classes exist in plants, isoamylase and
pullulanase. These enzymes debranch (hydrolyse) α-(1→6)-linkages in amylopectin and
pullulan. Defective DBEs in plants are thought to be responsible for accumulation of
phytoglycogen rather than starch, and in turn, change the phenotypic appearance of the
endosperm (Bustos et al., 2004).
ISA1 (Iso 1)
In wheat, the expression of ISA1 cDNA was highest in developing endosperm and
undetectable in mature grains, suggesting a fundamental biosynthetic role of Isoamylase 1 in
plant starch, although precise roles of DBEs are not yet known (Tetlow et al., 2004).
Transcript level regulation of ISA1 during rice grain filling in response to high temperatures
has been reported by Yamakawa et al. (2007), in which the expression level of ISA1 mRNA
increased by 0.94 fold under high temperatures, 8 to 30 days after flowering. In rice
endosperm, antisense inhibition of Isoamylase 1 altered the structure of amylopectin and the
physiochemical properties of starch (Fujita et al., 2003). ISA genes are also thought to
contribute to the degree of setback in glutinous rice cultivars (Yan et al., 2010). In OryzaSNP
database 16 SNPs were detected, of which two are nsSNPs. The Ka/Ks ratio of 0.666
signifies this gene has undergone negative selection during domestication.
ISA2 (Iso2)
Isoamylase 2, which corresponds to sugary1 (su1), was first reported in maize endosperm and could be
separated from Isoamylase 1 by anion-exchange chromatography (Beatty et al., 1999;
Doehlert and Knutson, 1991). A high rate of functional SNPs and total polymorphisms was
observed for this gene (Table 1). The Ka/Ks ratio of 1.5 suggests this gene is under positive
selection and is one of the most diversifying genes among starch-related genes. The high
polymorphism rate of 4.16 supports this assertion. Association between ISA2 and rice grain
quality is unclear. There is no intron in this relatively small gene (2625 bp), thus each
detected SNP/Indel can be potentially important.
Pullulanase (PUL)
In rice endosperm a defect in pullulanase-type DBE activity triggers and modulates some
phenotypic effects (Nakamura et al., 1998). In maize endosperm, it is believed that
pullulanase has a dual role, contributing either to starch synthesis or degradation (Dinges et
al., 2003). Kubo et al. (1999) suggest pullulanase plays a predominant and essential role in
amylopectin synthesis and compensates for shortages of isoamylase activity in the construction
of the multiple cluster structure of amylopectin.
The highest polymorphism rate was observed for PUL in the OryzaSNP database. In total, 108
SNPs were detected in this relatively large gene (10399 bp), of which 10 are non-synonymous and 9 synonymous. A Ka/Ks ratio of 1.11 indicates that, although this
gene is classed as diversifying, its ratio is close to one and it could easily be reclassified as a neutral gene. A
recent association study between PUL and RVA profile parameters in glutinous rice has
shown strong relations of this gene with peak viscosity, hot paste viscosity, breakdown
viscosity and peak time (Yan et al., 2010). Nevertheless, our study only showed very minor
association between PUL and some physiochemical properties such as chalkiness,
gelatinization and pasting temperature (Chapter 4).
Proteins
A wide range of starch granule-associated proteins has been found from different botanical
sources that are diverse in number, identity and possibly function (Baldwin, 2001; Schofield
and Greenwell, 1987). They are classified into two main categories of low and high
molecular weight proteins. It is widely accepted these proteins are located either inside and/or
on the surface of starch granules and influence starch properties. The composition and
content of these proteins significantly affects the structure and quality of starch and the
baking quality of cereals. Juliano et al. (1965) studied the relation of amylose content, protein
content, water absorption and gelatinization temperature on cooking and eating qualities of
non-waxy rice and found protein content and cooked rice colour are positively correlated.
Retrogradation is the hardening of cooked rice after storage or cooling. Retrogradation rate
has significant implications for rice consumers as many of them cook rice in the morning and
consume it after several hours of refrigerated storage. Recent studies have shown removal of
total proteins causes softer gels at different storage treatments such as 20, 40 and 60 ºC but
does not affect firmness following refrigerated storage (Philpot et al., 2006). This suggests
protein could have a major influence on room-temperature retrogradation; however, other
factors or biochemical mechanisms such as lipid content might be involved at lower
temperatures. Application of pronase, which digests peptides and proteins, to rice kernels and
milled starch caused a significant change in thermal properties and gelatinization profile
(Marshall et al., 1990). Ragaee and Abdel-Aal (2006) observed significant differences
among a number of cereals in physiochemical properties such as starch peak, breakdown and setback
viscosities (RVA curves) as well as in protein peak viscosity.
However, proteins inside maize or wheat hydrated-swollen starch granules after
gelatinization are degraded extensively by proteases without any apparent change in
properties (Debet and Gidley, 2007).
Lipids
A number of lines of evidence suggest lipids have a significant role in influencing the
physiochemical properties of rice such as retrogradation. Debet and Gidley (2007) found
proteins and lipids on the granule surface are determinants of ghost robustness and have a
role in ghost formation and integrity; a surface film, rich in protein and lipid, limits expansion
of starch granules and prevents dissolution after gelatinization. Lipid removal from rice
variety Koshihikari grown in different countries increased retrogradation rate and firmness of
gels after storage at different temperatures. The greater the amount of long chain amylose
complexed with lipids, the greater the reduction in retrogradation degree which is caused by
the unavailability of long amylose chains (Philpot et al., 2006).
Lipids are also involved in the physical structure of the rice kernel. Treatment of different
rice varieties with hexane caused significant changes in gelatinization parameters and kernel
shape and caused extensive fissure formation (Marshall et al., 1990).
Environmental factors: Nitrogen (N), Phosphorus (P) and Potassium (K)
The optimal application of nitrogen fertilizers for Australian rice cultivars is 170-180 kg/ha
and an amount higher or lower than this is considered to be a high or low level. Application
of different levels of nitrogen in the field during rice plant development influences solid loss
and water uptake ratio during cooking. Rice starch grown under nitrogen application has a
higher cooked grain hardness, cohesiveness, chewiness, lower amylose content and higher
pasting and gelatinisation temperature and enthalpy (Singh et al., 2011). Pot and field
experiments confirmed that increasing N application decreased amylose content, peak viscosity
and breakdown viscosity, while setback and consistency increased (Dayong et al., 2004).
Other macro-nutrients such as phosphorus (P) and potassium (K) have also been studied in
relation to rice grain amylose content and starch viscosity properties. Application of P has no
obvious effect on amylose content, peak viscosity, breakdown, setback and gel consistency.
However, increasing amounts of K increased amylose content, peak viscosity and breakdown,
while setback and gel consistency were reduced. The interaction of NPK
fertilizers on quality characters of different varieties was significant, and reducing N
while increasing K improves rice cooking and eating quality (Dayong et al., 2004).
Thermal stress
Thermal stress (temperatures above 37 °C) during the critical grain filling period can affect
the biochemical processes of starch deposition causing yield loss and starch defects (Peng et
al., 2004). Failure in starch deposition results in lightly packed granules and grain chalkiness
(Zakaria et al., 2002). It is believed that during heat shock, especially at grain filling,
expression of some starch related genes such as GBSSI, BEIIb and a cytosolic dikinase gene
is down-regulated, whereas some heat shock proteins and alpha-amylases are up-regulated
(Yamakawa et al., 2008). Yamakawa et al. (2007) suggested that the decreased level of amylose
and the long chain-enriched amylopectin in high temperature-ripened grains are mainly due to
repressed expression of GBSSI and BEIIb, respectively. They also reported the expression
level of various genes in response to high temperature. Tashiro and Wardlaw (1991a)
showed that high temperature at the milky stage of grain filling has the most extensive
influence on rice grain chalkiness because the panicle is the most sensitive organ to high
temperature (Sato et al., 1973).
CO2
Differences in carbon dioxide concentration have no consistent effect on grain and starch
parameters of wheat, although small effects have been detected on thousand grain weight, starch
content and lipid-free amylose content (Tester et al., 1995). Evaluation of the long-term
effects of different CO2 concentrations on carbohydrate status and partitioning of rice (Oryza
sativa L. cv. IR-30) found the photosynthesis rate was substantially increased with CO2
concentrations up to 500 μmol mol−1 and then reached a plateau at higher concentrations
(Rowland-Bamford et al., 1990). The ratio of starch to sucrose concentration was positively
correlated with the CO2 concentration but had no effect on the carbohydrate concentration in
the grain at maturity.
Objectives of thesis
The objectives of this thesis were to:
1) Characterise rice starch biosynthesis genes;
2) Discover DNA polymorphisms in Australian rice germplasm using new cutting edge
technologies (Next Generation Sequencing);
4) Detect and prioritise functional SNPs in starch related genes using computational tools;
5) Associate SNPs (genes) with the physiochemical properties of rice grain.
Key concepts
The key concepts encompassed by this thesis are:
1) Rice starch varies between rice cultivars due to differences in the gene sequence of the
enzymes which synthesise the rice starch;
2) Humans can detect differences in rice starch and these differences define rice quality.
These differences can be instrumentally quantified;
3) Differences in gene sequence can be defined and the extent to which they control, or are
associated with, rice quality differences measured;
4) The chemical properties of the 20 constituent amino acids of proteins and the
accumulated knowledge of how protein structure and function are linked can be used in
algorithms which predict how amino acid differences in any one protein may impact the
function of that protein.
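The fourth key concept can be illustrated with a deliberately simple sketch. The code below is not one of the prediction tools used in this thesis; it is a toy classifier, with hypothetical names, that labels an amino acid substitution as conservative or radical according to a common textbook grouping of side chains (nonpolar, polar uncharged, acidic, basic). Real predictors also weigh sequence conservation and structural context.

```python
# Coarse side-chain classes for the 20 standard amino acids (one-letter codes).
# This is a common textbook grouping, used here purely for illustration.
AA_CLASS = {
    **dict.fromkeys("GAVLIMPFW", "nonpolar"),
    **dict.fromkeys("STCYNQ", "polar"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("KRH", "basic"),
}


def substitution_effect(ref: str, alt: str) -> str:
    """Label an amino acid change by whether it crosses a side-chain class."""
    if ref == alt:
        return "unchanged"
    if AA_CLASS[ref] == AA_CLASS[alt]:
        return "conservative"
    return "radical"


# Hypothetical substitutions, not SNPs reported in this thesis.
for ref, alt in [("L", "I"), ("L", "D"), ("K", "R"), ("S", "S")]:
    print(f"{ref}->{alt}: {substitution_effect(ref, alt)}")
```

A change within a class (for example leucine to isoleucine) is treated as less likely to disturb function than a change between classes (leucine to aspartate), which is the intuition that more elaborate algorithms refine.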
Major activities reported in the thesis
The major activities reported in the thesis were:
1) DNA sequences of 18 starch related genes were retrieved from databases and the location
and type of SNPs identified;
2) The retrieved SNPs were analysed and then prioritised based on their predicted importance
using bioinformatic tools and algorithms;
3) Long range PCRs (LR-PCR) amplified the 18 starch related genes in 233 Australian rice
lines/cultivars;
5) The amplified products were pooled and then sequenced using an Illumina GAIIx
platform.
6) The sequencing data was analysed and SNPs detected;
7) The SNPs retrieved from databases were compared with SNPs discovered in the
sequencing experiment and novel variations identified;
8) Specific markers were designed and generated for multiplexed MALDI-TOF assay of
SNPs.
9) All 233 genotypes were assayed (genotyped) individually for all SNPs using multiplexed
MALDI-TOF;
10) The phenotypic data for physiochemical traits of 233 rice individuals were obtained from
an Australian breeding program;
11) Association between SNPs (genes) and traits was assessed using the software TASSEL
following a General Linear Model;
12) Data flowing from these activities were discussed within the context of published work in
the field.
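The association step in activity 11 can be sketched in its simplest form. When genotype at a single SNP is the only term in a general linear model, the test reduces to a one-way analysis of variance, and the F statistic can be computed directly. The function name and the data below are hypothetical, and the sketch does not reproduce TASSEL, whose models can also include covariates.

```python
from collections import defaultdict


def single_marker_f(genotypes, phenotypes):
    """One-way ANOVA F statistic for a trait grouped by SNP genotype."""
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, phenotypes):
        groups[genotype].append(value)

    n = len(phenotypes)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two genotype classes and n > k")

    grand_mean = sum(phenotypes) / n
    # Variation explained by genotype class means.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    # Residual variation within genotype classes.
    ss_within = sum(
        (value - sum(vals) / len(vals)) ** 2
        for vals in groups.values()
        for value in vals
    )
    if ss_within == 0:
        raise ValueError("no within-class variation; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical gelatinization temperatures (degrees C) for six inbred lines.
snp = ["AA", "AA", "AA", "GG", "GG", "GG"]
gt = [70, 71, 72, 78, 79, 80]
print(single_marker_f(snp, gt))
```

A large F value indicates that trait means differ between genotype classes by more than the scatter within classes would suggest.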
CHAPTER 2
Discovery of polymorphisms in starch related genes in rice germplasm by
amplification of pooled DNA and deeply parallel sequencing
Summary
High-throughput sequencing of pooled DNA was utilised for polymorphism discovery in
candidate genes involved in starch synthesis. A total of 17 rice starch synthesis genes,
encoding seven classes of enzymes, namely ADP-glucose pyrophosphorylase (AGPase),
granule bound starch synthase (GBSS), soluble starch synthase (SS), starch branching enzyme
(BE), starch debranching enzyme (DBE), starch phosphorylase (SPHOL) and phosphate
translocator (GPT1), from 233 genotypes were PCR amplified using semi- to long range PCR.
The amplification products were equimolarly pooled and sequenced using massively parallel
sequencing technology (MPS). By detecting SNP/Indel in both coding and non-coding areas
of the genes, the SNP/Indel variation and distribution patterns among individual starch
candidate genes were identified and characterized. Approximately 60.9 million reads were
generated, of which 54.8 million (90%) mapped to the reference sequences. The average
coverage ranged from 12,708× to 38,300× for SSIIa and SSIIIb, respectively. SNPs and
single/multiple-base Indels were analysed in a total assembled length of 116,403 bp. A total
of 501 SNPs and 113 Indels were detected across the 17 starch related loci. The ratio of
non-synonymous to synonymous SNPs (Ka/Ks) test indicated GBSSI and Isoamylase 1
(ISA1) as the least diversified (most purified), reflecting the population's history of selection
for low amylose content and gelatinization temperature. This report demonstrates a useful
strategy for screening germplasm by MPS to discover variants in a specific target group of
genes.
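The headline figures in this summary can be cross-checked with simple arithmetic. The short sketch below uses only the numbers reported above; the variable names are hypothetical, and the per-kilobase densities are derived here for orientation rather than quoted from the text.

```python
# Figures reported in the Summary.
total_reads = 60.9e6
mapped_reads = 54.8e6
assembled_bp = 116_403
snps, indels = 501, 113

# Share of reads that mapped to the reference sequences.
mapped_pct = 100 * mapped_reads / total_reads

# Average variant density across the assembled length.
snps_per_kb = snps / (assembled_bp / 1000)
indels_per_kb = indels / (assembled_bp / 1000)

print(f"mapped reads: {mapped_pct:.1f}%")
print(f"SNP density: {snps_per_kb:.2f} per kb")
print(f"Indel density: {indels_per_kb:.2f} per kb")
```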
Introduction
The capacity of massively parallel sequencing to simultaneously assay millions of single
nucleotide polymorphisms (SNPs) has made genome-wide studies possible (Schuster, 2008).
The use of next generation sequencing (Thomas et al., 2006) platforms for population-based
sequencing of targeted genomic regions enables the discovery of new variants and their
frequencies across selected genes (Harismendy and Frazer, 2009), and allows the identification of
errors in previously published reference sequences (Bentley, 2006) or SNP databases (Velicer
et al., 2006). Massively parallel sequencing technology (Genome Analyser) is a
groundbreaking, flexible and high-throughput platform for genetic analysis and functional
genomics which is based on ultra deep sequencing of short reads and a huge number of
sequencing reactions (Imelfort et al., 2009). This platform utilizes a sequencing-by-synthesis
approach in which all four nucleotides are added simultaneously followed by an optic
imaging procedure which occurs at each base incorporation step (Mardis, 2008b) and has
widely been used by researchers to discover SNPs associated with human genetic diseases,
particularly cancer studies (Bentley, 2006; Mardis, 2008a). This platform can be utilised in
different ways, from whole genome sequencing (WGS) of plants and animals to specific
genomic regions or even functional encoding genes or loci (Bentley et al., 2008; Hillier et al.,
2008; Kim et al., 2009).
Massively parallel sequencing (MPS) is an attractive cost efficient technology that enables
characterisation of genetic traits on an unprecedented scale, in terms of the number of genes,
number of samples and allele frequency which is necessary if rare alleles are to be found
(Kaiser, 2008; Pettersson et al., 2009). Recently, targeted MPS has been effectively
integrated with Long Range PCR (LR-PCR) of pooled DNA samples which minimises the
cost of sequencing, amplification, oligonucleotides, and labour (Out et al., 2009). LR-PCR
targeted MPS can be employed to deeply sequence regions surrounding candidate genes
containing SNPs/indels (Varley and Mitra, 2008). Utilising this approach, the full extent of
allelic variation in a vast number of encoding genes involved in various aspects of
physiology, disease etc. can be recovered and large regions of linkage disequilibrium (~5-11
kb) identified (Bodmer and Bonilla, 2008).
One of the major advantages of this approach is the capacity of MPS and targeted gene
amplification to provide a high sequence depth in all studied loci simultaneously. For
example, a total sequence yield of 1 Gb means a fragment of 10-kb will be read
approximately 100,000 times (Out et al., 2009) which meets the requirements for discovery
of rare alleles (Druley et al., 2009; Thomas et al., 2006; Ingman and Gyllensten, 2008). The
flexibility of the platform is extended when multiple genomic regions of numerous
individuals from wild or segregating populations are pooled.
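The depth calculation quoted above is a single division, and the same arithmetic shows how depth is shared when samples are pooled. In the sketch below the function names are hypothetical, and the per-line figure assumes every line contributes equally to the pool, which real pools only approximate. The 12,708× coverage and 233 lines are the values reported for this study.

```python
def expected_depth(total_yield_bp: float, target_length_bp: float) -> float:
    """Average number of times each base of the target is read."""
    return total_yield_bp / target_length_bp


def depth_per_line(pool_depth: float, n_lines: int) -> float:
    """Average depth contributed by one line, assuming equal representation."""
    return pool_depth / n_lines


# 1 Gb of sequence spread over a 10-kb fragment.
print(expected_depth(1_000_000_000, 10_000))  # 100000.0

# Lowest average coverage reported in this study, shared among 233 pooled lines.
print(round(depth_per_line(12_708, 233), 1))
```

An allele carried by a single line in the pool is therefore expected in only a small fraction of reads, which is why very high total depth is needed to distinguish rare alleles from sequencing error.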
Rice (Oryza sativa L.) starch, a complex carbohydrate, is one of the most important crop
products for humankind (Fitzgerald, 2004). Starch is synthesized by the activity of several
enzymes and has been subjected to extensive studies (Morell et al., 2003).
Each of the starch synthesis enzymes exists as a number of different isoforms and is usually
classified into one of several specific groups, such as ADP-glucose pyrophosphorylase
(AGPases and GPT1), starch synthase, starch branching enzyme, starch debranching enzyme
and starch phosphorylase (James et al., 2003).
In this study, DNA of 233 individuals from a breeding population was equimolarly pooled
and 17 rice starch quality-related genes encoding seven classes of starch enzymes which are
part of the starch bio-synthesis pathway were amplified by a LR- or Semi LR-PCR (SLR-PCR) protocol. The pooled-targeted amplifications were subsequently sequenced using MPS
technology (Illumina Inc., San Diego, CA). By detecting SNP/Indels contained in
both coding and non-coding areas of the genes, SNP/Indel distribution patterns were
characterised.
Materials and methods
Plant materials
All plant material was supplied by Industry and Investment NSW, Yanco Agricultural
Research Institute, Australia. Two hundred and thirty three rice lines from a breeding
program were analysed. These lines were significantly diverse in starch quality properties,
providing a high rate of variation for starch traits.
Variability of genotypes
This population comprised a series of lines at the F6 stage, from harvested pedigree rows
entering the first stage of plot testing. This captured a wide set of lines from a breeding
program focused on temperate (japonica-type) rice. Selection was primarily done on plant
height, the capacity to flower and set seed in our temperate environment of Australia, and
visual inspection of grain size and shape. No selection had taken place for quality traits like
gelatinization temperature and RVA curve data (Appendix 2).
Sample preparation and DNA extraction
Rice seeds of each line were germinated at 25°C and 10-20 seedlings from each individual
selected for DNA extraction. Total genomic DNA was extracted from 15-day-old seedlings
using DNeasy 96 Plant kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s
instructions.
Designation of starch-metabolizing enzymes/genes involved in starch synthesis
The major databases such as NCBI (http://www.ncbi.nlm.nih.gov/) and the Rice Genome
Annotation Project (http://rice.plantbiology.msu.edu/cgi-bin/putative_function_search.pl)
were searched for the general entries of nucleotide sequences (gDNA) and full-length cDNAs
of important gene classes which are presumed to be involved in starch biosynthesis. The
available literature was used to choose the most likely candidate genes associated with rice
starch quantity and quality (Ohdan et al., 2005; Waters and Henry, 2007; Nakamura, 2002;
Hirose et al., 2006; Rahman et al., 2000). The multiple sequence alignment of selected genes
was carried out using Sequencher® (Gene Codes Corporation, Ann Arbor, MI, USA) and
CLUSTAL W (http://www.ebi.ac.uk/Tools/clustalw2/index.html) and a consensus sequence
alignment generated for each candidate gene to design the amplification primers.
Target genes for sequence analysis
The present study focused on the genes encoding seven groups of enzymes, namely ADP-glucose pyrophosphorylase (AGPase), granule bound starch synthase (GBSS), starch
synthase (SS), branching enzyme (BE), debranching enzyme (DBE), starch phosphorylase
(PHO) and glucose phosphate translocator (GPT).
Designing primers to capture target genes
The sequence of each target gene, including exons and introns, was divided into two
approximately equal fragments. Each selected sequence included 500 bp up- and
downstream of the coding regions (5´ and 3´ UTRs) and an approximately 300 bp overlap in
the middle. A set of specific primers was designed for each half using Clone Manager V9.1
(Sci-Ed Software, NC 27513 USA) (Appendix 3).
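The fragment layout described above can be expressed as a small coordinate calculation. This is a minimal sketch under stated assumptions: the function name, the 1-based inclusive coordinates and the example gene position are hypothetical, and the split is placed at the exact midpoint of the flanked region, whereas actual primer positions depend on primer design constraints.

```python
def split_into_amplicons(coding_start: int, coding_end: int,
                         flank: int = 500, overlap: int = 300):
    """Return two overlapping target fragments as (start, end) pairs.

    Coordinates are 1-based and inclusive. The region is the coding
    sequence plus `flank` bp on each side, cut at its midpoint with
    `overlap` bp shared between the two fragments.
    """
    region_start = max(1, coding_start - flank)
    region_end = coding_end + flank
    midpoint = (region_start + region_end) // 2
    half = overlap // 2
    first = (region_start, midpoint + half)
    second = (midpoint - half + 1, region_end)
    return first, second


# Hypothetical gene whose coding region spans positions 1001 to 6000.
first, second = split_into_amplicons(1001, 6000)
print(first, second)
print("overlap (bp):", first[1] - second[0] + 1)
```

For this example both fragments are 3150 bp long and share 300 bp in the middle, so a primer pair failing on one half still leaves the junction covered by the other.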
Long range PCR protocol (LR-PCR)
The concentration of extracted DNA was quantified by the automated fluorometric protocol of
PicoGreen (PicoGreen dsDNA Quantification Kit, Invitrogen, CA 92008 USA) and then
diluted to 30-40 ng/µl for amplifications. A unified LR-PCR approach was applied to amplify
all genes (with few exceptions) simultaneously. BioRad iProof™ High-Fidelity DNA
polymerase was used for PCR amplifications, in 10 µl reactions, containing 20 ng of pooled
genomic DNA of 20 individuals. The extreme fidelity of iProof™ makes it the enzyme of
choice for SNP detection in long amplicons. PCRs were performed using 2 µl of HF or GC
buffer (the HF buffer is used for normal and the GC buffer for GC-rich sequences), 0.2 µl of
dNTPs (10 mM), 2 µl each of forward and reverse primer (2.5 µM), 1 µl of pooled DNA (20
ng/µl), 0.1 µl of iProof™ polymerase (0.2 unit) and 2.7 µl sterile water. As the different
genes needed unique optimal conditions for amplification, a unified PCR method to amplify
all targeted genes simultaneously was attempted. The touchdown PCR protocol was
performed using a Corbett PCR thermocycler as follows: 98°C for 1 min (1 cycle), followed
by 10 touchdown cycles of 98°C for 10 s, an annealing temperature of 72-62°C (10°C touchdown)
and 72°C extension for 4 min, followed by 28 cycles of normal amplification at
98°C for 10 s, 62°C for 20 s and 72°C for 4 min. The final extension was one cycle of
72°C for 10 min. Prior to Illumina sequencing, the PCR products were Sanger sequenced
using BigDye Terminator version 3.1 (Applied Biosystems, Foster City, CA). The generated
sequences were aligned with the reference sequence to ensure the correct gene had been
captured.
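As a quick consistency check on the reaction set-up above, the listed component volumes can be summed and the touchdown annealing steps laid out. The sketch below is illustrative only: the dictionary keys and function name are hypothetical, and the 1°C decrement per cycle is an assumption inferred from a 10°C touchdown spread over 10 cycles, not a setting stated in the protocol.

```python
# Component volumes (µl) as listed for one LR-PCR reaction.
components_ul = {
    "HF or GC buffer": 2.0,
    "dNTPs (10 mM)": 0.2,
    "forward primer (2.5 uM)": 2.0,
    "reverse primer (2.5 uM)": 2.0,
    "pooled DNA (20 ng/ul)": 1.0,
    "iProof polymerase": 0.1,
    "sterile water": 2.7,
}
total_ul = round(sum(components_ul.values()), 2)
template_ng = components_ul["pooled DNA (20 ng/ul)"] * 20  # 1 µl at 20 ng/µl


def touchdown_annealing(start_c=72.0, end_c=62.0, cycles=10):
    """Annealing temperature for each touchdown cycle (assumed even steps)."""
    step = (start_c - end_c) / cycles
    return [start_c - step * i for i in range(cycles)]


temps = touchdown_annealing()
total_cycles = len(temps) + 28  # touchdown cycles plus normal amplification cycles
print(total_ul, template_ng, temps, total_cycles)
```

The volumes add up to the stated 10 µl reaction, and 1 µl of pooled DNA at 20 ng/µl supplies the stated 20 ng of template.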
DNA equimolar pooling
A uniform pooling strategy was applied to all samples. The genomic DNA of 233 breeding
lines, which had already been normalised for PCR in the previous stage (30 ng/µl), was divided into
12 sections, each containing a pool of approximately 20 individuals, and LR-PCRs were
carried out. The concentration of the PCR products from these pools was measured using
PicoGreen (PicoGreen dsDNA Quantification Kit, Invitrogen, CA 92008 USA). A second
pool was made for each fragment from the PCR products. To facilitate the final equimolar
pooling of PCR products, the concentrations of the second pools (33 second pools/amplicons)
were individually normalised to 25 ng/µl and then equimolarly pooled into a mega pool based
on the predicted lengths, giving consideration to the requirement that larger amplicons need a
higher number of copies than smaller fragments. The final mega pool was prepared with the
aim of obtaining a final quantity of 2.5 µg of long amplicons, representing all 233
individuals.
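One common reading of length-based equimolar pooling is that each amplicon contributes the same number of molecules, so the mass taken from each normalised pool scales with its length. The sketch below works through that reading; the amplicon names and lengths are hypothetical placeholders (the 33 actual lengths are not listed here), and the function names are illustrative.

```python
def equimolar_masses(lengths_bp, total_mass_ng):
    """Mass (ng) of each amplicon so that all contribute equal molecule counts.

    For double-stranded DNA, mass per molecule is proportional to length,
    so equal molarity means mass shares proportional to length.
    """
    total_length = sum(lengths_bp.values())
    return {name: total_mass_ng * length / total_length
            for name, length in lengths_bp.items()}


def volumes_ul(masses_ng, conc_ng_per_ul=25.0):
    """Volume to take from each pool normalised to 25 ng/µl."""
    return {name: mass / conc_ng_per_ul for name, mass in masses_ng.items()}


# Hypothetical two-amplicon example aiming at 2.5 µg (2500 ng) in total.
masses = equimolar_masses({"amplicon_A": 3000, "amplicon_B": 6000}, 2500.0)
vols = volumes_ul(masses)
print(masses, vols)
```

Under this reading, an amplicon twice as long contributes twice the mass, and therefore twice the volume, at a common 25 ng/µl.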
Massively parallel sequencing
The final mega pool was subjected to Illumina GA sequencing (Illumina Inc., San Diego,
CA). PCR product fragmentation and library preparation were carried out according to the
manufacturer’s instructions. Fragments of approximately 200 bp were
selected for sequencing and 4 pmol of the library was loaded onto one flowcell.
SNP detection and data analysis
Data analysis, including filtering, trimming and mapping to the reference sequences, was
performed with the CLC bio Workbench 4 using 17 reference sequences with the specified
coordinates, extracted from GenBank (Table 3). The CLC Workbench general parameters
were set as follows: conflict resolution was changed to all four nucleotides (vote A,
C, G, T), with non-specific matches and masking references ignored. The read parameters were left at their defaults:
the min-max distance, mismatch cost, length fraction and similarity were 100-1000, 3, 0.9
and 0.9, respectively, for both single and paired-end reads. This set of parameters was selected
in order to minimise read alignment ambiguities as well as to detect rare SNPs. The minimum
coverage and minimum variant frequency were set at 20 and 0.5%, respectively,
which meant all variations at or above 0.5% were considered as SNPs.
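The two calling thresholds can be expressed as a simple filter. This is a minimal sketch of the rule as described in the text, not a reproduction of the CLC software's internal logic; the function name and the read counts in the examples are hypothetical.

```python
def passes_filter(coverage, variant_reads, min_coverage=20, min_freq_percent=0.5):
    """Return True if a site meets the minimum coverage and variant frequency."""
    if coverage < min_coverage:
        return False
    frequency_percent = 100.0 * variant_reads / coverage
    return frequency_percent >= min_freq_percent


# Hypothetical sites: (coverage, reads supporting the variant).
print(passes_filter(20000, 100))  # exactly 0.5 % of reads carry the variant
print(passes_filter(20000, 99))   # just below the 0.5 % threshold
print(passes_filter(15, 5))       # coverage below the minimum of 20
```

At the coverage depths obtained in pooled sequencing, a 0.5% threshold still corresponds to many supporting reads, which is what makes rare variants detectable.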
Total polymorphism rate and functional SNPs
The total polymorphism rate was calculated as $(\mathrm{TSI} / \mathrm{TL}) \times 100$, where TSI is the total number of
SNPs and Indels and TL is the total length of each candidate gene. The functional or non-synonymous SNP rate was calculated as $(\mathrm{NS} / \mathrm{TL}) \times 1000$, where NS is the number of non-synonymous SNPs in each locus and TL is the total length of each candidate gene.
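The two rates can be illustrated with a short calculation. The sketch below uses the counts reported for GPT1 in the Results (16 SNPs, 8 Indels, one non-synonymous SNP, 4,073 bp); the function names and the `scale` argument are illustrative additions, not part of the original analysis.

```python
def total_polymorphism_rate(n_snps, n_indels, length_bp, scale=100):
    """(TSI / TL) x scale, where TSI = SNPs + Indels and TL = gene length."""
    return (n_snps + n_indels) / length_bp * scale


def nonsynonymous_snp_rate(n_nonsyn, length_bp):
    """(NS / TL) x 1000, i.e. non-synonymous SNPs per kb."""
    return n_nonsyn / length_bp * 1000


# GPT1: 16 SNPs, 8 Indels, 1 non-synonymous SNP, 4,073 bp.
rate_percent = total_polymorphism_rate(16, 8, 4073)
rate_per_kb = total_polymorphism_rate(16, 8, 4073, scale=1000)
ns_rate = nonsynonymous_snp_rate(1, 4073)
print(round(rate_percent, 3), round(rate_per_kb, 3), round(ns_rate, 4))
```

With `scale=100` the GPT1 rate is about 0.589, while `scale=1000` gives about 5.892, which is the figure listed for GPT1 in Table 3.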
Results
Number of reads and average coverage
Sequencing of LR-PCR products of all 17 studied loci generated ~60.9 million reads of which
54.8 million (90%) mapped to the reference sequences. Table 1 shows the summary statistics
of the mapping report. The average coverage differed among loci and ranged from 12,708× to
38,300× for SSIIa and SSIIIb, respectively (Fig 1). This difference may be related to factors
such as concentration of amplicons and PCR efficiency, number of non-specific products and
contamination with external PCR products. For example, LR-PCR products of the SSIIa gene
revealed a number of non-specific bands on agarose gel which led to higher unmapped reads
and lower coverage. The highest and lowest number of reads was counted for SSIIIa and
SSIIa, with 5,920,785 and 876,986 reads, respectively.
Polymorphism discovery and SNP/Indel detection
Starch quality loci of 233 breeding lines were successfully sequenced to great depth and
coverage. SNPs and single/multiple-base Indels were discovered in a total length of 116,403
bp assembled by Genome Analyser (GA). In total, 501 SNPs and 113 Indels were detected
across the 17 starch related loci (Appendix 1). The total number of polymorphisms was then
compared to SNPs available at OryzaSNP MSU database
(http://oryzasnp.plantbiology.msu.edu/) (Table 2). A total of 399 SNPs for the targeted loci
had already been reported in this database for 20 rice cultivars. As expected, the total number
of polymorphisms in this experiment was significantly higher than that reported in the rice
database and the confidence score was significantly higher due to huge read coverage. On
average, the SNP rate was 4.31 SNPs/kb and 0.97 Indels/Kb. Previous data have reported an
average rate of one SNP every 170 bp and one Indel every 540 bp (Goff et al., 2002; Yu et
al., 2002).
Figure 1. The average coverage in starch related genes. [Plot not reproduced; y-axis: average coverage (0 to 45,000); x-axis: name of gene/enzyme (AGPS2b, SPHOL, GPT1, GBSSI, GBSSII, SSI, SSIIa, SSIIb, SSIIIa, SSIIIb, SSIVa, BEI, BEIIa, BEIIb, ISA1, ISA2, PUL).]
The data indicate one SNP every 232 bp and one Indel every 1030 bp within this set of
germplasm and for these candidate genes. Although the average rate of SNPs is gene specific
and depends on the species and structure of the studied population, these results are similar to
previous reports (Nasu et al., 2002; Yu et al., 2002). Of the 501 identified SNPs, 75
(~14.9%) caused an amino acid change, making them potentially functional. All Indels
resided in intronic regions and were thus not responsible for any stop codons, frameshift
mutations or amino acid changes. The Indel rate was slightly higher than previously
reported (Goff et al., 2002; Yu et al., 2002), which may have been due to the lower stringency
mapping criteria applied to the short reads in CLC Workbench (Min Cost 2, Min Insert 2 and
Similarity 0.7). The largest and smallest Indels were 8 bp and 1 bp, respectively.
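The spacing figures quoted above follow directly from the totals reported for the 17 loci (501 SNPs and 113 Indels in 116,403 bp, of which 75 SNPs were non-synonymous). The short calculation below is a sketch with illustrative variable names.

```python
TOTAL_LENGTH_BP = 116_403   # total assembled length across the 17 loci
N_SNPS = 501
N_INDELS = 113
N_NONSYN = 75               # SNPs causing an amino acid change

snps_per_kb = N_SNPS / TOTAL_LENGTH_BP * 1000
indels_per_kb = N_INDELS / TOTAL_LENGTH_BP * 1000
bp_per_snp = TOTAL_LENGTH_BP / N_SNPS        # average spacing between SNPs
bp_per_indel = TOTAL_LENGTH_BP / N_INDELS    # average spacing between Indels
nonsyn_percent = 100 * N_NONSYN / N_SNPS

print(round(bp_per_snp), round(bp_per_indel))
print(round(snps_per_kb, 2), round(indels_per_kb, 2), round(nonsyn_percent, 1))
```

Expressing the same totals as a per-kb rate and as an average spacing makes the comparison with published genome-wide estimates easier to follow.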
Table 1. Summary statistics of mapping report, generated by Illumina Genome Analyser sequencing

| Statistics | Number of reads (count) | Average length of reads (bp) | Total bases |
|---|---|---|---|
| Reads | 60,985,472 | 64.4 | 3,927,761,087 |
| Mapped to reference | 54,813,065 | 64.76 | 3,549,512,020 |
| Unmapped | 6,172,407 | 61.28 | 378,249,067 |
| Reference sequence* | 17 (count) | 6,415 | 109,067 |
| Paired reads | 42,720,746 | 256.68 | |

*Reference sequence was taken from NCBI database.
SNP variation across the starch related candidate loci
To evaluate the capacity of MPS to detect new variants in pools of starch synthesis genes,
a comprehensive experiment was conducted on the Illumina GA platform with 17 different rice starch
related genes. Table 2 summarises the newly discovered
variation in the studied genes. Seven classes of starch related enzymes which affect
starch structure and quality, namely ADP-glucose pyrophosphorylase (AGPase), granule
bound starch synthase (GBSS), starch synthase (SS), branching enzyme (BE), debranching
enzyme (DBE), starch phosphorylase (PHO) and glucose phosphate translocator (GPT), were
pool sequenced. The details of each gene member and its detected polymorphisms are as
follows:
ADP-glucose pyrophosphorylase (AGPase), Starch phosphorylase (PHO) and Glucose
phosphate translocator (GPT) gene families
These enzymes/genes act at the top of the starch biosynthesis pathway and are regarded
as the starting point of grain starch production. Glucose is first activated by the addition of
ADP by AGPase and then becomes the substrate for starch synthase enzymes. There are
several genes/isozymes in this class but AGPS2b has the highest expression level in
rice endosperm (Hirose et al., 2006).
AGPS2b (small subunit)
The role of this subunit in starch granule synthesis has been identified by way of its
association with rice shrunken mutants (Kawagoe et al., 2005). A dramatic inhibition of
starch synthesis has been observed in AGPase-deficient rice mutants and some other species
and results in increased soluble sugars, a large number of underdeveloped granules, small
grains and pleomorphic amyloplasts (Rolletschek et al., 2002).
In total, 30 SNPs and 4 Indels were found across the population for this gene. None of them
caused an amino acid change, suggesting this gene has little impact on starch quality in this
population. However, the number of SNPs found was significantly higher than those
previously reported in rice databases (Table 2).
SPHOL (alpha 1,4 glucan starch phosphorylase)
This gene is generally considered to be involved in starch degradation but recent studies
suggest some important roles in starch biosynthesis. Although its precise mechanism and
influence is still not well known, the mechanism appears to be associated with
phosphorylation of some starch-related enzymes and proteins such as starch branching
enzymes (SBEs) and starch synthase (SSIIa) (Tetlow et al., 2004). In total, five and seven
non-functional SNPs and Indels were found in this gene, respectively. The SNP rate was
lower than that reported in databases (Table 2 and 3).
GPT1 (Glucose-6-phosphate translocator)
GPT1 is strongly expressed in endosperm. This gene is believed to be responsible for the import of
essential carbon substrates such as Glc6P into the plastids during grain development
(Fischer and Weber, 2002; Jiang et al., 2003). There were 16 SNPs, one of which
causes an amino acid change, and 8 Indels. A ´C/T´ SNP at reference position 1188
changes an amino acid from Leu to Phe (Leu42Phe). This is a conservative non-polar amino
acid substitution (L→F) and therefore might not significantly alter protein activity. However,
this is a new functional SNP which has not previously been reported in databases.
Table 2. Total polymorphism detected across the 17 starch quality related genes.

| No | NC-Number (Genbank No#) | Gene ID in NCBI | Gene ID in OryzaSNP@MSU database | Gene | Chr No | Average coverage× | Length assembled by Illumina (bp) | SNPs detected by GA | In/dels* detected by GA | SNPs in OryzaSNP@MSU database (†High quality) | SNPs in OryzaSNP@MSU database (†Perlegen + Machine learning) | Number of functional (amino acid changes) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NC_008401 | Os08g0345800 | LOC_Os08g25734 | AGPS2b [ADP-glucose pyrophosphorylase (small unit)] | 8 | 19,203 | 5,635 | 30 | 4 | 4 | 12 | N/A |
| 2 | NC_008396 | Os03g0758100 | LOC_Os03g55090 | SPHOL (alpha 1,4-glucan phosphorylase) | 3 | 29,561 | 7,489 | 5 | 7 | 11 | 18 | N/A |
| 3 | NC_008401-2 | Os08g0187800 | LOC_Os08g08840 | GPT1 (Glucose-6-phosphate/phosphate translocator) | 8 | 21,042 | 4,073 | 16 | 8 | 3 | 6 | 1 |
| 4 | NC_008399-1 | Os06g0133000 | LOC_Os06g04200 | GBSSI | 6 | 32,486 | 3,480 | 1 | 1 | 8 | 13 | 1 |
| 5 | NC_008400 | Os07g0412100 | LOC_Os07g22930 | GBSSII (expressed in leaf) | 7 | 31,260 | 8,049 | 4 | 8 | 1 | 1 | 1 |
| 6 | NC_008399-2 | Os06g0160700 | LOC_Os06g06560 | SSI | 6 | 32,707 | 7,750 | 73 | 8 | 46 | 67 | N/A |
| 7 | NC_008399-3 | Os06g0229800 | LOC_Os06g12450 | SSIIa | 6 | 12,708 | 4,420 | 31 | 1 | 5 | 12 | 22 |
| 8 | NC_008395-1 | Os02g0744700 | LOC_Os02g51070 | SSIIb (expressed in leaf) | 2 | 22,494 | 5,323 | 3 | 2 | 4 | 16 | N/A |
| 9 | NC_008401-3 | Os08g0191500 | LOC_Os08g09230 | SSIIIa | 8 | 19,203 | 11,263 | 83 | 15 | 20 | 52 | 23 |
| 10 | NC_008397-2 | Os04g0624600 | LOC_Os04g53310 | SSIIIb | 4 | 38,300 | 8,624 | 26 | 11 | 9 | 24 | 7 |
| 11 | NC_008394-1 | Os01g0720500 | LOC_Os01g52250 | SSIVa | 1 | 16,497 | 10,480 | 27 | 6 | 19 | 25 | 5 |
| 12 | NC_008399 | Os06g0726400 | LOC_Os06g51084 | BEI | 6 | 30,255 | 7,258 | 18 | 6 | 3 | 14 | 1 |
| 13 | NC_008397 | Os04g0409200 | LOC_Os04g33460 | BEIIa | 4 | 15,958 | 2,265 | 6 | N/A | N/A | N/A | 1 |
| 14 | NC_008395 | Os02g0528200 | LOC_Os02g32660 | BEIIb (Amylose extender) | 2 | 25,491 | 10,900 | 53 | 17 | 4 | 5 | 3 |
| 15 | NC_008401-1 | Os08g0520900 | LOC_Os08g40930 | Isoamylase1 (DBE) | 8 | 37,526 | 6,592 | 0 | 9 | 2 | 16 | N/A |
| 16 | AC132483 | OSJNBa0014C03.3 | LOC_Os05g32710 | Isoamylase2 (DBE) | 5 | 20,373 | 2,403 | 16 | N/A | 7 | 10 | 9 |
| 17 | NC_008397-1 | Os04g0164900 | LOC_Os04g08270 | Pullulanase (DBE) | 4 | 30,067 | 10,399 | 109 | 10 | 54 | 109 | 1 |
| Total | | | | | | | 116,403 bp | 501 | 113* | 200 | 399 | 75 |

* The lower stringency used for in/dels was as follows: Min Cost 2; Min Insert 2; Similarity 0.7.
†SNPs in the OryzaSNP@MSU database were detected and analysed using the Perlegen model-based method as well as a machine learning method. In total, over 158,000 high
quality SNPs have been identified in the rice genome by these two technologies.
Granule bound starch synthase (GBSS) gene family
This family of genes is responsible for production of the amylose component of starch in
plants.
GBSSI (Granule bound starch synthase I)
GBSSI or the waxy gene is one of the most important genes involved in starch synthesis and
influences cereal grain quality, particularly in rice. The major role of GBSSI on amylose
content is well known and several SNPs associated with starch quality have been
characterized in rice (Chen et al., 2008b). Previous studies have shown that three SNPs, one
each at the intron/exon 1 boundary, exon 6 and 10 have the most significant impact on starch
quality (Cai et al., 1998). Only one functional ´A/C´ SNP was detected, at position 1086 of the
reference sequence. It corresponds to the previously reported exon 6 SNP and causes a
Tyr→Ser substitution at amino acid position 224. This substitution is non-conservative and
changes the polarity of the amino acid, the function of GBSSI, enzyme activity and
amylose content. Larkin and Park (2003) have suggested this SNP affects amylose content.
One non-functional Indel was also found in this gene. The single SNP detected in GBSSI in
this population, compared to the eight SNPs retrieved from the OryzaSNP database, indicates
there has been significant selection pressure imposed on this locus in this population during
the course of breeding. A multiplex SNP verification experiment was conducted to validate
the data (Masouleh et al., 2009). The results showed that only this SNP, with very low
frequency, exists in this population. The breeding selection criteria applied to this population
appear to have restricted the polymorphism of the GBSSI gene. The Ka/Ks
data also suggest GBSSI is a gene under purifying selection in this population. A
Ka/Ks ratio of 2.00, among the highest observed, was calculated for this gene (Table 3).
GBSSII (Granule bound starch synthase II)
This gene/enzyme is predominantly expressed in leaf, leaf sheaths, culm, and pericarp tissue
at a low level, particularly during pre-heading and 1-3 days after flowering (Ohdan et al.,
2005). The impact of GBSSII on elongation of amylose in non-storage tissues of cereals has
been confirmed (Vrinten and Nakamura, 2000). GBSSII is found exclusively bound to starch
granules in green tissues and synthesises amylose which is subsequently consumed by the
plant or accumulated in the endosperm (Dian et al., 2003).
There were 4 SNPs and 8 Indels identified. One SNP occurred at coordinate 1638 of the
reference sequence and changed Leu to Ser at position 523 of GBSSII. This A/G SNP
changes the polarity of the amino acid and hence may affect the activity and function of the
protein. All Indels were detected in introns.
Starch synthase (SS) family
This gene/enzyme family is primarily involved in the production of the amylopectin
component of starch in plants.
SSI
This protein is presumed to be expressed in the endosperm and leaf of rice (Fujita et al.,
2006). The transcript level of SSI is higher in endosperm than leaf sheaths and blades and has
therefore been classified as an endosperm and non-endosperm expressing gene (Hirose et al.,
2006). The measurement of SSI transcript levels at different seed developmental stages found
high expression at 1-3 DAF, peaking at 5 DAF, and remaining almost constant during starch
synthesis in endosperm, suggesting SSI is the major SS form in cereals (Cao et al., 1999).
Table 3. The polymorphism analysis of starch-related candidate genes in rice

| No | Gene/Enzyme symbol | Gene/Enzyme name | Gene coordinates in Genbank | Length assembled by Illumina (bp) | Total polymorphism rate | Non-synonymous SNP rate | SNP per kb | Indel per kb | Ka/Ks ratio |
|---|---|---|---|---|---|---|---|---|---|
| 1 | AGPS2b | ADP-glucose pyrophosphorylase (small unit) | 15,760,599-15,754,206 | 5,635 | 6.033 | 0.000 | 5.323 | 0.709 | 1.00 |
| 2 | SPHOL | alpha 1,4-glucan phosphorylase | 32,183,093-32,190,581 | 7,489 | 1.602 | 0.000 | 0.667 | 0.934 | 1.00 |
| 3 | GPT1 | Glucose-6-phosphate/phosphate translocator | 5,138,640-5,142,712 | 4,073 | 5.892 | 0.245 | 3.928 | 1.964 | 0.50 |
| 4 | GBSSI | Granule Bound Starch Synthase I (Waxy gene) | 1,764,623-1,769,657 | 3,480 | 0.574 | 0.287 | 0.287 | 0.287 | 2.00 |
| 5 | GBSSII (expressed in leaves) | Granule Bound Starch Synthase II | 13,584,483-13,576,435 | 8,049 | 1.490 | 0.124 | 0.496 | 0.993 | 2.00 |
| 6 | SSI | Starch Synthase I | 3,078,060-3,085,809 | 7,750 | 10.451 | 0.000 | 9.419 | 1.032 | 0.11 |
| 7 | SSIIa | Starch Synthase IIa | 6,747,562-6,751,981 | 4,420 | 7.239 | 1.357 | 7.013 | 0.226 | 1.00 |
| 8 | SSIIb (expressed in leaves) | Starch Synthase IIb | 32,125,071-32,119,749 | 5,323 | 0.939 | 0.000 | 0.563 | 0.375 | 0.25 |
| 9 | SSIIIa | Starch Synthase IIIa | 5,351,108-5,362,370 | 11,263 | 8.701 | 2.04 | 7.369 | 1.331 | 2.40 |
| 10 | SSIIIb | Starch Synthase IIIb | 32,149,493-32,158,120 | 8,624 | 4.290 | 0.811 | 3.014 | 1.275 | 1.33 |
| 11 | SSIVa | Starch Synthase IVa | 31,786,842-31,797,321 | 10,480 | 3.148 | 0.477 | 2.576 | 0.572 | 1.20 |
| 12 | BEI | Branching Enzyme I | 31,775,431-31,782,688 | 7,258 | 3.306 | 0.137 | 2.480 | 0.826 | 0.400 |
| 13 | BEIIa | Branching Enzyme IIa | 20,260,837-20,265,349 | 2,265 | 2.649 | 0.441 | 2.649 | 0.000 | 2.00 |
| 14 | BEIIb (Amylose extender) | Branching Enzyme IIb | 20,213,965-20,224,864 | 10,900 | 6.422 | 0.275 | 4.862 | 1.559 | 0.285 |
| 15 | ISA1 (DBE) | Debranching Enzyme- Isoamylase 1 | 25,981,756-25,988,347 | 6,592 | 1.378 | 0.000 | 0.000 | 1.365 | 1.00 |
| 16 | ISA2 (DBE) | Debranching Enzyme- Isoamylase 2 | 23,596-25,998 | 2,403 | 6.658 | 3.745 | 6.658 | 0.000 | 0.70 |
| 17 | PUL (DBE) | Debranching Enzyme- Pullulanase | 4,399,980-4,410,318 | 10,399 | 11.443 | 0.096 | 10.481 | 0.961 | 1.00 |

Ka/Ks ratio: The proportion of non-synonymous (Ka) relative to synonymous (Ks) can reveal whether a gene has been under purifying, neutral or diversifying selection. The data for calculation of Ka/Ks (number of
Ka and Ks) can be found in columns R and S of Appendix 1. Values in column R shows the SNPs exist in the coding region (Marked as CDS or mRNA) and S column shows the number on nsSNPs. The total
polymorphism rate calculated as: $(\mathrm{TSI} / \mathrm{TL}) \times 100$ where, TSI=Total number of SNPs and Indels and TL is the total length of each candidate gene. The functional or non-synonymous SNPs rate calculated as:
$(\mathrm{NS} / \mathrm{TL}) \times 1000$ where, NS= Number of non-synonymous SNPs in each locus and TL is the total length of each candidate gene.
A comprehensive analysis of mutant rice with a retrotransposon inserted into the SSI
encoding gene revealed SSI has a capacity for the synthesis of chains with DP8-12 with the
extension of smaller chains (Nakamura, 2002). This gene has a very small phenotypic effect
on rice eating quality although a significant negative correlation between the ratio of short
chains (DP 6-12) and gelatinization temperature has been reported (Umemoto et al., 2008).
There were 73 SNPs and 8 Indels detected in this gene. No functional SNP/Indels were
found, in comparison with two amino acid changes that have been reported in the OryzaSNP
database.
SSIIa
SSIIa is known to have a major effect on starch quality. This gene is expressed in the
endosperm at very high levels and presumably affects amylopectin structure (Craig et al.,
1998; Morell et al., 2003). The effect of this gene on cooking quality and starch texture has
been clearly demonstrated (Umemoto et al., 2008; Umemoto et al., 2004). The gelatinisation
temperature (GT), alkali disintegration and eating quality of rice starch have been explained
by polymorphism at two SNPs, [A/G] and [GC/TT], within exon 8 of the alk locus (Umemoto
and Aoki, 2005; Waters et al., 2006). In total, 31 SNPs and 1 Indel were detected in this gene,
which was significantly higher than the number reported in the OryzaSNP database (12 SNPs).
Surprisingly, 22 of the 31 SNPs were functional and introduced an amino acid change as
determined by CLC Workbench. SNP distribution analysis revealed 80% of these low
frequency SNPs (25) were located at the beginning of the reference sequence, between
coordinates 13 and 553, and caused 17 amino acid changes. This high SNP rate may be
associated with inefficient PCR and consequent low coverage (45×-129×) of this GC rich
region (Appendix 1). Re-sequencing verified only four SNPs between coordinates 13 and 553,
with a minimum frequency of 8-10% for the minor allele. Taking the high false positive rate
into account, a total of nine SNPs and one Indel (six amino acid changes) were identified in
this gene (Appendix 1). Three tri-allelic SNPs [G/T/A] and a G/T SNP at
positions 72, 77, 81 and 87 of the reference sequence, respectively, are new polymorphisms.
Of the three tri-allelic SNPs, one ´G/T/A´ SNP is presumed to cause the
most critical amino acid substitution, Arg26Met/Lys, which induces a polar to non-polar
alteration in the protein.
SSIIb
SSIIb is believed to be expressed early and at a low level, primarily in sink
and source leaf blades and sheaths (leaf specific) at an early stage of grain filling (Hirose and
Terao, 2004). However, a recent study presented evidence that it acts together with six other starch
genes to alter some RVA (Rapid Visco Analyser) parameters in glutinous rice (Yan et al.,
2010). Only three SNPs and two Indels were found, all of which were non-functional,
indicating this gene does not affect phenotypic variation in this population.
SSIIIa
The highest rate of polymorphism in terms of amino acid changes was observed in this
gene. In total, 83 SNPs and 15 Indels, including 23 non-synonymous and nine synonymous
substitutions, were detected, indicating this is the most diverse gene in our population.
Previous studies have detected 52 SNPs. Of the 23 non-synonymous substitutions, 10 are amino
acid changes which alter polarity and may produce significant changes in protein
structure (Appendix 1). The SSIIIa encoding gene is highly expressed in the endosperm,
although some reports have revealed its expression in green tissues (Dian et al., 2005). A recent
study of amylopectin chain length in an SSIIIa deficient mutant suggests SSIIIa plays an
important role in the elongation of amylopectin B2 to B4 chains. Furthermore, in these
mutants, the amylose content and the extra long chains of amylopectin increased by 1.3- and
12-fold, respectively, due to an increase in GBSSI activity (Fujita et al., 2007). Conversely, no functional
effect of SSIIIa differentiation was observed on RVA parameters, at least among glutinous
cultivars (Yan et al., 2010).
SSIIIb
SSIIIb is mainly expressed in endosperm but transient expression in leaf sheaths and leaves
has also been reported (Hirose et al., 2006). This might be due to the existence of two
divergent groups of SSIIIb in rice that are expressed in different tissues (Dian et al., 2005). It
has also been classified into two different categories on the basis of the timing of expression in
the developing seed. In the late expression category the gene is expressed in the mid to later stages
of grain filling (Hirose and Terao, 2004), and in the early expression category the transcript level
usually increases to a peak at 3-5 days after flowering (Ohdan et al., 2005). An association
study of glutinous rice near-isogenic lines suggested SSIIIb has a significant impact on RVA
parameters such as peak time and pasting temperature (Yan et al., 2010).
In total, 26 SNPs and 11 Indels were found in this gene. No functional Indels were detected
in this gene and, of the seven amino acid changes, three changed the polarity of the amino
acids: Thr1176Ala, Glu634Gly and Ser756Ile.
SSIVa
SSIVa is one of the least well known starch genes in plants. Like most starch synthase genes,
SSIVa is exclusively involved in amylopectin biosynthesis. Expression analysis with reverse
transcription PCR has indicated SSIVa is preferentially expressed in rice endosperm and to a
degree in leaf blades as a late or steady expresser gene during grain filling (Hirose and
Terao, 2004). QTL mapping and expression profile analysis have shown that high
temperature during grain filling can considerably increase the transcription level of SSIVa,
up to 1.11-fold, which is considerably higher than the other starch synthase genes, with a
general expression level range of 0.8 to 1.2 (Yamakawa et al., 2007), and may contribute to
grain chalkiness (Yamakawa et al., 2008). SSIVa may also affect some secondary RVA
parameters such as breakdown and setback (Yan et al., 2010). Of the 27 SNPs identified, five
were non-synonymous; six intronic Indels were also detected. Only one SNP modified amino acid polarity: a
'C/T' SNP at coordinate 4048 of the gene nucleotide sequence induced a Gly708Asp
substitution.
Starch Branching enzymes (SBEs)
Starch branching enzymes (SBEs) determine the structure of amylopectin by breaking α-(1→4)-linkages in existing chains and attaching the released reducing ends to C6 hydroxyls,
forming the elongated and branched glucan, amylopectin (Tetlow et al., 2004). The
nucleotide polymorphisms of different isoforms of branching enzymes were studied and the
results are as follows.
BEI
BEI is mainly expressed in the endosperm. Biochemical observations with purified BEI from
maize endosperm indicate that BEI preferentially branches amylose-type polyglucans and has
a high capacity for branching less branched α-glucans (Takeda et al., 1993).
Analysis of the catalytic properties of BEI has indicated the N- and C-termini play a critical
role in chain length transfer and substrate preference (Kuriki et al., 1997). BEI transcript
levels increase rapidly 3-5 days after flowering. A rice BEI deficient mutant induced by
mutagenesis exhibited modified amylopectin structure and grain morphology but the same
quantity of starch as the wild type (Satoh et al., 2003), and the BEI encoding gene also affects
the RVA profile (Yan et al., 2010). The maize sugary gene arises from a mutation in the
maize BEI encoding ortholog (Boyer and Preiss, 1978). In total, 18 SNPs and 6 Indels were
found in this gene. One of the SNPs is a non-synonymous ´C/T´ SNP which causes a Gly607Asp
substitution and is potentially very important as it changes the polarity of the amino acid.
BEIIa
BEIIa is a leaf expressed gene involved in amylopectin synthesis. BEIIa is also expressed in
the endosperm but at levels 10-fold lower than in leaf tissue (Gao et al., 1997). Variation in
this gene/enzyme may have a significant influence on rice starch properties, considering that
BEIIa is preferentially expressed along with at least one important starch synthesis gene
expressed in both leaf and endosperm (Hirose et al., 2006). An
association study of this gene and RVA properties demonstrated a low F value (6.60)
with a very slight influence in glutinous rice (Yan et al., 2010). Application of a BEIIa-specific antibody has demonstrated this protein is present in both soluble and granule bound
forms in developing wheat endosperm (Rahman et al., 2001). In total, six SNPs were detected,
including a non-synonymous ´T/G´ SNP which causes a Tyr140Ser substitution, with no polarity
alteration. No SNP/Indel has previously been reported for this gene, suggesting BEIIa might
be one of the most conserved starch-related genes in rice.
BEIIb
A relatively high variation rate of 6.422 was detected for this important gene (Table 3), which
is also known as amylose extender (ae) in maize and other cereals (Yun and Matheson,
1993). Many studies have reported the significance of this gene for starch properties in
various plant species (Fisher et al., 1993; Sun et al., 1997; Sun et al., 1998). This is a granule- and soluble-associated enzyme which is only expressed in the endosperm. Expression of
three different functional maize SBE genes in BE-deficient yeast strains demonstrated that
the presence of BEIIb is necessary to activate BEI and BEIIa (Seo et al., 2002). A recent
association study has determined a very high F value of 11.12 between SSIIb and RVA
properties in rice (Yan et al., 2010). Additionally, a 0.5- to 0.7-fold decrease in the expression
of BEIIb (amylose extender) during grain filling creates chalky rice (Tanaka et al., 2004).
There were 53 SNPs, three of which were non-synonymous, and 17 Indels found in
amylose extender. No functional polymorphisms were recorded in the available databases but
three non-synonymous SNPs, ´C/T´ (Val403Ile), ´C/T´ (His196Arg) and ´C/A´ (Leu97Val),
were detected here, none of which changed amino acid polarity.
Debranching Enzymes (DBEs)
DBEs belong to the α-amylase family, of which two classes exist in plants, isoamylase and
pullulanase. These enzymes debranch (hydrolyse) α-(1→6)-linkages in amylopectin and
pullulan. Defective DBEs in plants are thought to be responsible for the accumulation of
phytoglycogen rather than starch and, in turn, change the phenotypic appearance of the
endosperm (Bustos et al., 2004).
ISA1
In wheat, the expression of ISA1 cDNA was highest in the developing endosperm and
undetectable in mature grains. This suggests a fundamental biosynthetic role of Isoamylase 1
in plant starch, although the precise roles of DBEs are not yet known (Tetlow et al., 2004).
The regulation of the ISA1 gene at the transcriptional level during grain filling of rice in response
to high temperatures has been reported (Yamakawa et al., 2008). In rice endosperm, antisense
inhibition of Isoamylase 1 altered the structure of amylopectin and the physicochemical
properties of starch (Fujita et al., 2003). The ISA genes are also presumed to
contribute to the degree of setback in glutinous rice cultivars (Yan et al., 2010). No
functional polymorphism was found in this gene in the studied population. Only 9 Indels in
the intronic regions were detected. This suggests that this gene has little or no effect on
variation in starch properties in this population.
ISA2
The existence of this type of isoamylase was first reported in maize endosperm (Doehlert and
Knutson, 1991). It was suggested that two isoamylase isoforms, I and II, exist in maize endosperm
and are distinguishable by anion-exchange chromatography. On the basis of enzymic
characteristics, the sugary1 (su1) protein corresponds to the isoamylase II form in the maize
endosperm (Beatty et al., 1999). The association between ISA2 and rice grain quality is
unknown. There is no intron in this relatively small gene (2625 bp), thus each detected
SNP/Indel is potentially important. The polymorphism rate was high, at about
0.66. There were 16 SNPs, including nine non-synonymous SNPs, and no Indels in the ISA2
gene. Three of the non-synonymous SNPs altered the polarity of amino acids: T/C,
C/A and T/G at coordinates 960, 1712 and 2067 of the reference sequence, which cause
Thr482Ala, Arg231Leu and Thr113Pro substitutions, respectively.
Pullulanase (PUL)
In rice endosperm a defect in pullulanase-type DBE activity triggers and modulates some
phenotypic effects (Nakamura et al., 1998). In maize endosperm, it is believed that
pullulanase has a dual role, contributing to either starch synthesis or degradation (Dinges et
al., 2003). Kubo et al. (1999) suggest pullulanase plays a predominant and essential role in
amylopectin synthesis and compensates for shortages of isoamylase activity in the construction
of the multiple cluster structure of amylopectin. A recent association study between pullulanase
and RVA profile parameters in glutinous rice has shown strong relationships of this gene with
peak viscosity, hot paste viscosity, breakdown viscosity and peak time (Yan et al., 2010). The
highest polymorphism rate (1.14) was seen in pullulanase, where in total 109 SNPs and 10
Indels were detected. This number of SNPs exactly equals the number already reported in
the OryzaSNP database. In our population, only one non-synonymous SNP was detected, at
coordinate 2319 of the reference, which changes Ser to Asn at position 217 of the protein.
This alteration might not be very influential as it does not change the polarity of the
molecule.
Distribution of SNPs across the loci
The distribution of detected polymorphisms and the coverage patterns of short reads across
the length of the candidate genes showed no specific correlation among the 17 studied loci
(Fig. 2). Some genes, such as SPHOL and GBSSI, exhibited similar distribution patterns, but
there was no general association among the patterns of different genes. Based on the
distribution patterns, most candidate genes showed a higher polymorphism rate in the
central intron/exon regions than at the UTR ends.
Ka/Ks ratio ("purifying" vs "diversifying" genes)
The proportion of non-synonymous (Ka) relative to synonymous (Ks) SNPs can reveal
whether a gene has been under purifying, neutral or diversifying selection. The Ka ⁄ Ks ratio
was used to classify candidate genes into two main categories, ‘purifying’ and
‘diversifying’ genes. Under neutral evolution at the amino acid level, Ka should equal Ks,
and hence Ka ⁄ Ks = 1. Any deviation from this value indicates selection pressure on the
genetic structure of the population or candidate genes. A Ka ⁄ Ks ratio < 1
indicates negative (purifying) selection, while positive (adaptive) selection is indicated when
Ka ⁄ Ks > 1. This indicator was applied to assess diversity of samples in the database (20
diverse cultivars) and the Australian population of 233 genotypes.
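The classification rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: it uses raw counts of non-synonymous and synonymous SNPs, as the text describes, rather than per-site normalised rates, and the tolerance around 1 and the example counts are assumptions introduced here, not values from this study.

```python
def classify_selection(ka, ks, tolerance=0.05):
    """Label a gene from its non-synonymous (ka) and synonymous (ks) SNP counts.

    `tolerance` is an assumed band around Ka/Ks = 1 that is still called neutral.
    """
    if ks == 0:
        # The ratio is undefined without synonymous changes; flag it instead of dividing.
        return "undefined"
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if ratio < 1.0 else "diversifying"


# Hypothetical counts for illustration only; they are not taken from the thesis tables.
examples = {"geneA": (2, 8), "geneB": (5, 5), "geneC": (6, 2)}
for gene, (ka, ks) in examples.items():
    print(gene, classify_selection(ka, ks))
```

Genes with no synonymous SNPs are reported as undefined rather than forced into a category, since the ratio cannot be computed for them.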
GBSSI and GBSSII were classified as highly conserved genes that are passing through an
adaptive phase under the artificial selection pressure applied by rice breeders, as they
showed a low polymorphism rate and a high Ka/Ks ratio. Some genes, such as AGPS2b, SSIIa
and SPHOL, had a Ka/Ks ratio of one, which means they have probably not been under
significant selection pressure.
Discussion
This MPS analysis of rice starch metabolism candidate genes identified a relatively high
SNP/Indel variation at all loci. In total, 501 SNPs and 113 Indels were detected in
comparison with 399 SNPs that are already available in the public domain. No Indels are
recorded in public databases such as OryzaSNP. Out of 501 SNPs, 75 SNPs (~14.9%) were
non-synonymous leading to amino acid changes. All Indels resided in the intronic regions and
so no obvious functional Indels were found. The highest and lowest polymorphism rates were
observed in Pullulanase (11.443) and GBSSI (0.574), respectively (Table 3). The low
polymorphism rate in the important GBSSI gene is one of the surprising results of this study.
A possible cause is the source of this population, an Australian rice breeding population.
Indica cultivars do not grow in temperate Australia because the day length is not suitable,
and so indica cultivars are rarely used as parents. Consequently, one of the waxy gene
(GBSSI) SNPs had a very low frequency (<0.05), which was confirmed by Sequenom
resequencing (Chapter 6). The low frequency of SNPs in some genes can also be attributed to
the negative (purifying) selection pressure imposed by breeders within this population.
Massively parallel sequencing combined with LR-PCR ensured high sequence depth in terms
of the number of candidate genes and number of samples at all studied loci (Pettersson et al.,
2009). Among the numerous elements involved in the MPS, amplification efficiency and
pooling strategy are the most important parameters.
The error rate of Bio-Rad iProof™ High-Fidelity DNA polymerase is low (4.4 × 10⁻⁷),
approximately 50-fold lower than that of standard polymerases, and its extension rate is high
(5-30 s/kb), about four times faster, which shortens the PCR and made it the enzyme of choice.
However, establishing an efficient long-range PCR for large genomic regions can be costly
and time consuming (Ingman and Gyllensten, 2008). To solve this problem, semi-long-range
PCR (SLR-PCR), which is generally more robust and saves time and primer cost, can be used.
An optimal SLR-PCR will increase the performance of MPS and must be established before
pooling genomic DNA samples.
The error rate of the Illumina GA is reportedly about 0.5-1.0%. In this experiment, 233 DNA
samples were pooled. This means that a variant present in only one of the 233 samples has a
frequency of ~0.43%, which is lower than the reported error rate. A two-step pooling strategy
and high coverage depth reduce the risk of false SNP detection. Together, these two
strategies largely overcome the effect of Illumina GA platform error.
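The arithmetic behind this comparison can be sketched as follows. The sketch assumes, as the 1/233 figure in the text implies, that each inbred line contributes a single allele to the pool; the function names are introduced here for illustration, and the pool size of 20 and the 12000× coverage are the values quoted elsewhere in this Discussion.

```python
def single_carrier_frequency(pool_size):
    """Allele frequency (%) when exactly one pooled sample carries the variant.

    Assumes one allele per inbred line, as implied by the 1/233 figure in the text.
    """
    return 100.0 / pool_size


def expected_reads(coverage, percent):
    """Expected number of reads at one site showing a base present at `percent`."""
    return coverage * percent / 100.0


ERROR_RATE_PERCENT = (0.5, 1.0)  # reported Illumina GA error range from the text

for pool in (233, 20):  # full population and the smaller pools
    freq = single_carrier_frequency(pool)
    above_error = freq > ERROR_RATE_PERCENT[1]
    print(f"pool of {pool}: {freq:.2f}% (above error range: {above_error})")

# At 12000x coverage a 1/233 variant is still expected in about 51.5 reads.
print(f"{expected_reads(12000, single_carrier_frequency(233)):.1f}")
```

A singleton in the full pool falls below the error range, while a singleton in a pool of 20 lies well above it, which is the rationale for combining smaller pools with deep coverage.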
Figure 1 displays the coverage for all loci, starting from 12000×. The raw data show that the
maximum coverage in some regions reaches 240,000× (data not shown). This high
coverage has largely neutralised the effect of the error rate.
Out et al. (2009) discussed the relationships between allele frequencies, pool size,
coverage depth and error rates in the Illumina GA. They demonstrated that a coverage depth of
25000× would be enough to detect SNP frequencies at or above 0.3%.
Currently, there is much interest in applying the Illumina GA platform to targeted sequencing
of specific candidate genes, particularly for finding SNPs in a large number of individuals in
the targeted populations (Hodges et al., 2007). An incorrect pooling strategy is another
important issue that may be encountered in generating and analysing data. DNA samples
from different individuals may not be amplified with similar efficiency in PCRs, creating
random bias. To rectify this issue, smaller pools (20) were used, which minimised the chance
of biased amplification of target regions. Using this strategy, rare SNPs which occurred at a
frequency lower than 1% were detected.
Coverage is also critical. It is believed 20-fold coverage is sufficient for accurate SNP
detection (Dohm et al., 2008). Even coverage is also highly desirable. Given this is the case,
with average coverage of 90 ×, reasonable SNP data from the beginning of SSIIaH1 fragment
should have been obtained, but only 18.5% of the observed SNPs in this region were
validated. This can be attributed to the difficulties encountered in amplifying this high GC
region.
The main reason for different coverage patterns is still unknown (Mardis, 2008b). The
highest peak was observed at position 4885 of BEIIa, with 239,019× coverage. However,
coverage only affects the accuracy of SNP frequency estimates and not the number of
discovered SNPs (Morozova and Marra, 2008). Ingman and Gyllensten (2008) studied the
effect of different pooling strategies and coverage levels on estimated SNP frequencies in
pooled and un-pooled individuals across a ~17 kb region, and found that all SNPs, including
low-frequency SNPs (down to 0.4%), can be detected at coverage levels above 500×. They
suggested that for pooled PCR products, 50× coverage would be sufficient for SNP
frequencies at or above 4%. The very high coverage obtained here enabled discovery of rare
SNPs with frequencies lower than 0.5%. Sequencing errors are common in NGS and are
easily confounded with low-frequency SNPs if the minimum number of reads is too low
(Futschik and Schlotterer, 2010). The high level of coverage for all candidate genes enabled
us to recognise rare SNPs effectively.
It has been demonstrated that allelic variation of amino acids and structure of proteins
correlate with the effect of natural selection seen as an excess of rare SNPs which affect
actual phenotypes (Sunyaev et al., 2000). The distribution of genetic variation in 17 starch
candidate genes indicates they have been selected for starch properties. The pressure of
natural selection can significantly influence the extant pattern of genetic variation (Akey et
al., 2002; Barreiro et al., 2008). In this study, the total polymorphism rate and distribution
pattern indicate that the candidate genes have been subjected to selection by breeders, as
some important genes with high impact on starch properties such as GBSSI and ISA1 have
shown unusually low levels of polymorphism. Artificial selection during the breeding
program has had a major influence on the genetic variation of the population studied. These
changes in population structure occur mainly through narrowing of the gene pool and
changes in the balance between genetic drift and population size during the breeding process.
Appendices: Chapter 2
Appendix 1: Full list of SNPs/Indels discovered in the 17 studied starch-related genes.
Appendix 2: Full list of Australian breeding lines (population) and their pedigree
information.
Appendix 3: Target genes and sequence of gene-specific LR-PCR primers.
Appendix 4: SNP/Indel distribution and short read co
CHAPTER 3
Bioinformatic tools assist screening of functional SNPs in plants: rice
GBSSI as a model
Summary
Granule Bound Starch Synthase I (GBSSI) influences cereal grain quality and is one of the
most important plant genes. Using GBSSI as a model, a number of different computational
tools and programs were used to explore the functional SNPs and the possible relationships
between genetic mutation and phenotypic variation. A total of 51 SNPs/indels were retrieved
from databases, including three non-synonymous SNPs, namely those in exons 6, 9 and 10.
Sorting Intolerant from Tolerant (SIFT) results showed that a candidate [C/A] SNP (ID:
OryzaSNP2) in exon 6 (coordinate 2494) is probably the non-synonymous SNP with the
highest phenotypic impact on GBSSI. This SNP changes a tyrosine to a serine at position 224
of GBSSI. Computational simulation of GBSSI with Geno3D suggested that this mutant SNP
creates a bigger loop on the surface of GBSSI and results in a shape different from that of
native GBSSI. According to Transcriptional Factor (TF) Search analysis, a potential
transcription factor binding site (TFBS8) carrying one [C/T] SNP [rs53176842] at coordinate
2777, at the intron 7/exon 8 boundary, might affect the regulation and function of GBSSI.
Combining SNP mining data with in silico structural analysis of GBSSI provides a
computational pathway that can be applied to other plant genes.
Introduction
Single nucleotide polymorphisms (SNPs) are the most common and simplest type of genetic
variation in organisms. SNPs occur at a frequency of approximately one in a thousand base
pairs in the human genome (Brookes, 1999) and one in every 170 bp in rice (Yu et al., 2002).
Although SNPs can be found throughout the genome, including gene promoter regions,
coding sequences and intronic sequences, most of them are probably located in intergenic
regions, where they are believed to be stable and without deleterious effect. The
occurrence of human disease and evolution (Shastry, 2002), as well as many important traits
in plants (Bryan et al., 2000; Kennedy et al., 2006; Edwards et al., 2007), can be attributed to
the presence of SNPs.
SNPs can be categorized and named based on their location and function. For example, SNPs
within the coding regions (cSNPs) of functional genes which introduce amino acid sequence
variations are called non-synonymous SNPs (nsSNPs) and are of major interest. Those SNPs
which occur in the coding sequences, but do not change amino acids are called synonymous
SNPs. However, most SNPs occur in intronic regions. Study of these SNPs is also important
because of their influence on gene expression which can occur through different molecular
pathways such as changing regulatory elements, splicing patterns, up and down regulation of
exonic splice enhancers (ESE), intronic splice enhancers (ISE) and so forth (El Sharawy et
al., 2006).
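The distinction between synonymous and non-synonymous coding SNPs can be illustrated with a short sketch based on the standard genetic code. The codons used in the examples are hypothetical and chosen only for illustration; they are not the actual GBSSI codons, and the function name is an assumption introduced here.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO one-to-one.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_coding_snp(ref_codon, alt_codon):
    """Return (class, reference amino acid, alternative amino acid) for a coding SNP."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous", ref_aa, alt_aa
    if alt_aa == "*":
        return "stop-gained", ref_aa, alt_aa
    return "non-synonymous", ref_aa, alt_aa


# Hypothetical codons for illustration only.
print(classify_coding_snp("TAT", "TCT"))  # tyrosine to serine
print(classify_coding_snp("GCT", "GCC"))  # alanine unchanged
print(classify_coding_snp("TAT", "TAA"))  # premature stop codon
```

Only substitutions in the first or third class change the encoded protein; the synonymous class leaves the amino acid sequence intact but may still matter for splicing or regulation, as discussed above.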
Understanding the functional effect of SNPs is a major challenge. SNPs that lead to a single
amino acid substitution, stop codon or frame shift mutation are normally recognized as
functional and are easily detected. An experimental-based approach can provide the strongest
evidence for the functional role of genetic variations. Consequently, many different types of
SNP assays have been applied for experimental prioritization of SNPs (Chen and Sullivan,
2003). However, owing to the lack of reliable genotype and phenotypic data, these
experiments are not always easy to set up for characterizing the real effect of SNPs. For
example, functional analysis of SNPs in important plant genes needs a segregating population
or breeding lines, such as near isogenic lines (NILs) (Umemoto et al., 2008; Mikami et al.,
2008). On the other hand, many genes may have a vast number of intronic SNPs that cannot
be easily associated with in vivo variation of plant populations.
Previous studies have focused on non-synonymous SNPs of human disease genes (George
Priya Doss et al., 2008; Rajasekaran et al., 2008). Here, a plant-relevant computational
pipeline was developed which covers most of the important functional elements at the DNA level
in addition to non-synonymous SNPs. The model gene chosen to identify and prioritize
substitutions was Granule Bound Starch Synthase I (GBSSI), a major and well characterised
gene affecting the amylose content of rice grain (Chen et al., 2004). Different computational
algorithm tools including Sorting Intolerant from Tolerant (SIFT), Exonic Splicing Enhancer
Finder (ESE Finder), Transcriptional Factor search (TF search) and Exonic Splicing Silencer
Search (FAS-ESS), were used to prioritize the candidate SNPs most likely to affect the
encoded protein and, subsequently, amylose content and rice grain quality.
Materials and methods
GBSSI encoding gene as a case study
Granule Bound Starch Synthase I (GBSSI) was analysed as an example to provide a guide for
further confirmatory experimental studies of this or other genes. This gene is well known in
most cereals for its effect on amylose content (Chen et al., 2008a), pasting properties (Chen et
al., 2008b) and eating quality (Umemoto et al., 2008).
Sequence alignment
GBSSI genomic DNA and mRNA sequences for Oryza sativa L. were downloaded from
GenBank (http://www.ncbi.nlm.nih.gov/nuccore/297423). The GenBank accession numbers
are X65183.1 for genomic DNA and NM_001063239.1 for mRNA.
Nucleotide coordinates of 1765922 - 1769401 on chromosome 6 (LOC_Os06g04200) were
extracted from the Rice Genome Annotation Project at Michigan State University
(http://rice.plantbiology.msu.edu/LocusNameSearch.shtml).
The sequence lengths were 5035 bp for genomic DNA, 1830 bp for cDNA and 609 amino acids
for the GBSSI protein.
SNP dataset
The SNP dataset for the GBSSI gene (Figure 1) was retrieved from the NCBI database (Sherry et
al., 2001) at (http://www.ncbi.nlm.nih.gov/sites/entrez?db=snp&TabCmd=Limits) for the
relevant chromosome range (gene coordinates) and then checked against the SNP dataset of the
OryzaSNP Consortium (http://oryzasnp.plantbiology.msu.edu/cgi-bin/find_snps_in_genes.pl)
using the TIGR gene ID LOC_Os06g04200. An additional 2 kbp of DNA at each end of the
coding region was also searched for SNPs in the 3´ and 5´ UTRs. Final alignment was carried
out with Sequencher 4.6 software (Ann Arbor, MI) and ClustalW2
(http://www.ebi.ac.uk/Tools/clustalw2/index.html) to identify the exact location of SNPs in
UTR or intronic/exonic regions. Five functional classes of SNPs were selected to cover
the entire gene region: (1) non-synonymous coding, (2) intronic, (3) coding synonymous,
(4) locus region and (5) 5´ and 3´ UTRs.
Computational tools for SNP analysis
Several computational software programs were applied to predict the actual or possible
impact of SNPs on plant phenotypes: (1) UTRScan; (2) TF Search; (3) SIFT (Sorting
Intolerant from Tolerant); (4) GeneSplicer; (5) SEE ESE; (6) FAS-ESS; (7) Geno3D;
(8) Swiss PDB viewer; and (9) RasMol.
3D Modelling of GBSSI and comparative study
The native and mutant structures of GBSSI were modelled with Geno3D software
(http://geno3d-pbil.ibcp.fr/cgi-bin/geno3d_automat.pl?page=/GENO3D/geno3d_home.html).
This program predicts the 3D structures of proteins and enzymes from their amino acid
sequences. It extracts the 3D structures of very similar proteins from different databases
(specifically, PDB) and then models the query sequence on the available structure, which,
for GBSSI, has the PDB identification number 3D1J.
The modelled structure can be validated by PROCHECK (Laskowski et al., 1993).
Table 1. List of SNPs extracted from dbSNP and the OryzaSNP consortium, sorted by
chromosome base position from the 5´ UTR region.

SNP No  SNP ID      SNP/indel  *Symbol  Position   †Coordinate (bp)  Functional class
1       rs20225948  [A/C]      M        5´ UTR     157               UTR
2       rs53176842  [C/T]      Y        Intron 7   2777              Intronic
3       rs54208215  [G/T]      K        Exon 8     2845              Coding-synonymous
4       rs52851700  [A/T]      W        Intron 8   2902              Intronic
5       rs53723422  [A/G]      R        Intron 8   2910              Intronic
6       rs53846809  [A/T]      W        Intron 8   2912              Intronic
7       rs53666816  [A/C]      M        Intron 8   2913              Intronic
8       rs53675396  [C/T]      Y        Intron 8   2927              Intronic
9       rs54192177  [-/ATAT]   in/del   Intron 8   2931              Intronic
10      rs53646676  [-/T]      in/del   Intron 8   2958              Intronic
11      rs54248596  [-/C]      in/del   Intron 8   2971              Intronic
12      rs54113008  [A/G]      R        Intron 8   2973              Intronic
13      rs53274252  [C/T]      Y        Intron 8   2983              Intronic
14      rs54167626  [C/T]      Y        Intron 8   2987              Intronic
15      rs53460774  [C/T]      Y        Intron 8   2989              Intronic
16      rs53001101  [A/G]      R        Intron 8   2992              Intronic
17      rs53728166  [-/A]      in/del   Intron 8   2994              Intronic
18      rs53922988  [A/T]      W        Intron 8   3004              Intronic
19      rs54148480  [C/T]      Y        Intron 8   3007              Intronic
20      rs53616087  [A/G]      R        Exon 9     3023              Coding-synonymous
21      rs54179168  [C/T]      Y        Exon 9     3056              Coding-synonymous
22      rs54262002  [A/T]      W        Exon 9     3059              Coding-synonymous
23      rs53810836  [A/G]      R        Exon 9     3065              Coding-synonymous
24      rs53561561  [C/T]      Y        Exon 9     3235              Coding-ns
25      rs53120489  [A/C]      M        Intron 9   3264              Intronic
26      rs52893679  [C/T]      Y        Intron 9   3266              Intronic
27      rs53682342  [-/TT]     in/del   Intron 9   3268              Intronic
28      rs54228853  [-/TTC]    in/del   Intron 9   3269              Intronic
29      rs54192742  [-/G]      in/del   Intron 9   3280              Intronic
30      rs53932573  [G/T]      K        Intron 9   3282              Intronic
31      rs53273159  [A/G]      R        Intron 9   3285              Intronic
32      rs53746900  [A/T]      W        Intron 9   3291              Intronic
33      rs53843894  [-/C]      in/del   Intron 9   3292              Intronic
34      rs54317579  [A/C]      M        Intron 9   3294              Intronic
35      rs54396762  [G/T]      K        Intron 9   3295              Intronic
36      rs53235554  [-/GA]     in/del   Intron 9   3297              Intronic
37      rs53562846  [-/C]      in/del   Intron 9   3298              Intronic
38      rs53208426  [A/T]      W        Intron 9   3300              Intronic
39      rs53304553  [-/GAA]    in/del   Intron 9   3311              Intronic
40      rs53124355  [A/G]      R        Intron 9   3350              Intronic
41      rs53134907  [A/G]      R        Intron 9   3352              Intronic
42      rs54228170  [A/C]      M        Intron 9   3357              Intronic
43      rs53693780  [A/C]      M        Intron 9   3360              Intronic
44      rs54363195  [A/T]      W        Intron 9   3365              Intronic
45      rs53781551  [G/T]      K        Exon 10    3422              Coding-synonymous
46      OryzaSNP1   [G/T]      K        Intron 1   246               Intronic
47      OryzaSNP2   [C/A]      M        Exon 6     2494              Coding-ns
48      OryzaSNP3   [C/T]      Y        Exon 9     3212              Coding-synonymous
49      OryzaSNP4   [A/-]      in/del   Intron 9   3304              Intronic
50      OryzaSNP5   [G/-]      in/del   Intron 9   3309              Intronic
51      OryzaSNP6   [C/T]      Y        Exon 10    3486              Coding-ns

* IUPAC nucleotide symbol.
† Coordinates from the beginning of genomic DNA (GenBank accession: X65183).
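The *Symbol column of Table 1 follows the IUPAC two-base ambiguity codes. As a minimal sketch, the mapping from an allele pair to its symbol can be written as below; the function name and the handling of gaps and multi-base alleles as "in/del" are assumptions made here to mirror the table.

```python
# IUPAC ambiguity codes for the six possible two-base combinations.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def iupac_symbol(alleles):
    """Map an allele pair such as '[A/C]' to its IUPAC symbol, or 'in/del'."""
    left, right = alleles.strip("[]").split("/")
    # A gap or a multi-base allele marks an insertion/deletion, not a substitution.
    if "-" in (left, right) or len(left) != 1 or len(right) != 1:
        return "in/del"
    return IUPAC[frozenset((left, right))]


for alleles in ("[A/C]", "[C/A]", "[G/T]", "[-/ATAT]", "[A/-]"):
    print(alleles, iupac_symbol(alleles))
```

Because the lookup uses unordered sets, [A/C] and [C/A] map to the same symbol, which matches how the dbSNP and OryzaSNP entries are coded in the table.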
Figure 1: The structure of GBSSI encoding gene. Blue boxes represent exons.
Figure 2. Distribution and percentage of SNPs in the GBSSI encoding gene (pie chart,
"Distribution of SNPs in Waxy gene": intronic 78%, csSNPs 14%, nsSNPs 6%, UTR 2%).
csSNPs: coding synonymous SNPs; nsSNPs: coding non-synonymous SNPs; UTR: 5´ untranslated
region.
Comparative studies were also carried out by Swiss PDB viewer
(http://spdbv.vital-it.ch/download.html) and RasMol (http://openrasmol.org) based on
superimposed structure and homology analysis of native and mutant protein (Rajesh et al. 2008).
Functional flow chart
A flow chart was prepared for computational analysis and prioritization of SNPs based on
their functionality and possible impact on plant phenotypes (Figure 3).
Figure 3. Computational pipeline for in silico analysis of functional SNPs.
Results
SNPs in GBSSI and comparative study
A total of 51 SNPs and in/dels were extracted from the databases: one in the 5´ UTR, three
coding non-synonymous, seven coding synonymous and forty in the intronic sequences
(Table 1; Figure 2).
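As a quick check, the percentages shown in Figure 2 follow directly from these counts. The snippet below is only an illustrative calculation; the dictionary keys and the function name are labels chosen here.

```python
def class_shares(counts):
    """Return each functional class as a whole-number percentage of all SNPs/indels."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Counts as reported in the Results text (Table 1).
GBSSI_COUNTS = {"intronic": 40, "csSNPs": 7, "nsSNPs": 3, "UTR": 1}

print(sum(GBSSI_COUNTS.values()))
print(class_shares(GBSSI_COUNTS))
```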
Computational algorithm tools
The following computational tools were used consecutively for comprehensive functional
analysis of the GBSSI encoding gene:
UTR Scan
UTR Scan (http://www.ba.itb.cnr.it/BIG/UTRScan/) identifies patterns of regulatory region
motifs from the UTR database and gives information about important elements in the 5´ and
3´ UTRs, including whether the matched pattern is damaged (Pesole and Liuni 1999).
One regulatory element was found in the 3´ UTR. No functional element was recognized by
UTR Scan in the 5´ UTR region of the GBSSI gene. One ´A/C´ SNP [rs20225948] was found
in a non-regulatory part of the 5´ UTR. Since only one SNP was found in the UTR regions of
the GBSSI gene, and none in the regulatory elements, it may be presumed that SNPs in the
GBSSI UTRs do not change the protein expression level.
TF Search
Two of the most important functional elements in plant genomes are transcriptional factors
(TFs) and transcriptional factor binding sites (TFBSs). These sites are usually short DNA
sequences, around 5-15 bp, to which TFs bind to begin the transcriptional
process involving RNA polymerase and the promoter. Occurrence of any mutation in these
regions can alter motifs and possibly transcriptional patterns (Bulyk, 2004).
Table 2. Transcriptional factor binding sites in the GBSSI gene as distinguished by the TF
Search program

No  TFBS sequence    Coordinates (gDNA)  Score  *SNP/Indel
1   TTCTAATTATTTGA   560-573             86.2   N/A
2   TCCAACCAA        741-749             85.1   N/A
3   GCGGTCGGT        1591-1599           85.1   N/A
4   GAGGTAGGA        1939-1947           86.0   N/A
5   ATGGTTGGA        2358-2366           85.6   N/A
6   AGCTACCTG        2639-2647           86.5   N/A
7   AACTACCAG        2654-2662           87.4   N/A
8   CAGGTTGCT        2777-2785           85.6   C/T
9   TCCTACCAG        2804-2812           89.6   N/A
10  TCAAATAATTAGAA   3652-3665           86.2   -/TAA
11  TCATTGTTAAATAT   4566-4579           86.8   N/A
12  ATATTTAACCAAAT   5198-5211           86.8   N/A

* SNPs/indels which may affect TFBSs and their function.
Various experimental and computational approaches have been used to identify genomic
locations of transcription factor binding sites, particularly in higher eukaryotic genomes
(Sinha and Tompa, 2002). TF Search recognizes transcription factor binding sites in a gene
and can therefore show whether a non-coding SNP falls within, and potentially alters, such a
site (Heinemeyer et al., 1998). This program can be reached at:
(http://www.cbrc.jp/research/db/TFSEARCH.html).
A total of twelve TFBSs were recognized in the GBSSI gene. The scoring scheme in this
version of TF Search (v 1.3) is straightforward: scores here ranged from 85.1 to 89.6, with
higher scores indicating more important TFBSs. A [C/T] SNP (rs53176842) was
found in TFBS number 8 (TFBS8), which begins at position 2777 at the junction of intron 7
and exon 8 (end of intron 7 and beginning of exon 8) (Table 2). This [C/T] SNP at position
2777 (5´ end of the TFBS8 sequence) potentially has the highest impact on transcriptional factor
binding sites in this gene. It should be noted that, in human, the strongest selective pressure
was detected for proteins involved in transcription regulation (Ramensky et al., 2002).
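The overlap between SNP coordinates and predicted binding sites can be checked with a simple interval test. The sketch below uses the TFBS coordinates from Table 2 and a subset of the SNP coordinates from Table 1; the function name and the choice of SNPs are illustrative assumptions, and this is not the TF Search algorithm itself.

```python
# TFBS number -> (start, end) on genomic DNA, taken from Table 2.
TFBS = {
    1: (560, 573), 2: (741, 749), 3: (1591, 1599), 4: (1939, 1947),
    5: (2358, 2366), 6: (2639, 2647), 7: (2654, 2662), 8: (2777, 2785),
    9: (2804, 2812), 10: (3652, 3665), 11: (4566, 4579), 12: (5198, 5211),
}

# A subset of SNP coordinates from Table 1, chosen here for illustration.
SNPS = {
    "rs20225948": 157, "OryzaSNP1": 246, "OryzaSNP2": 2494,
    "rs53176842": 2777, "rs54208215": 2845, "rs53561561": 3235,
    "OryzaSNP6": 3486,
}


def snps_in_sites(sites, snps):
    """List (site number, SNP ID, 1-based offset within the site) for each overlap."""
    hits = []
    for site_no, (start, end) in sorted(sites.items()):
        for snp_id, pos in snps.items():
            if start <= pos <= end:
                hits.append((site_no, snp_id, pos - start + 1))
    return hits


print(snps_in_sites(TFBS, SNPS))
```

For these inputs the only overlap is rs53176842 at the first base of TFBS8, consistent with the position described in the text.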
SIFT (Sorting Intolerant from Tolerant)
Each amino acid substitution has the potential to affect protein function. SIFT is a Web-based
program which predicts whether an amino acid substitution affects protein function based on
sequence homology and the physical properties of amino acids (Ng and Henikoff, 2003).
SIFT focuses on non-synonymous SNPs and can be applied to naturally occurring and
laboratory-induced point mutations. SIFT is based on the premise that important amino
acids will be conserved among sequences in a protein family. As a consequence, changes at
amino acids conserved in the family should affect protein function (Ng and Henikoff, 2002).
SIFT assigns a tolerance index to each possible substitution at each amino acid position, with
a standard threshold of 0.05. Substitutions with values at or above this threshold are
predicted to be tolerated and are assumed to have a lower impact on the plant, whereas those
with values below 0.05 are predicted to be not tolerated, with a higher phenotypic impact.
This program can be found at:
http://sift.jcvi.org/www/SIFT_seq_submit2.html.
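The threshold rule can be expressed as a one-line decision. The following sketch applies it to the three SIFT scores reported in Table 3; the labels and function name are chosen here for illustration and do not reproduce SIFT's own scoring, only the interpretation of its output.

```python
SIFT_THRESHOLD = 0.05  # standard tolerance threshold described in the text


def sift_call(score, threshold=SIFT_THRESHOLD):
    """Interpret a SIFT tolerance score: below the threshold is 'not tolerated'."""
    return "not tolerated" if score < threshold else "tolerated"


# Substitutions and scores as listed in Table 3.
TABLE3_SCORES = {"Y224S": 0.00, "A370V": 1.00, "P415S": 1.00}

for substitution, score in TABLE3_SCORES.items():
    print(substitution, score, sift_call(score))
```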
Of the 51 investigated SNPs, two non-synonymous SNPs (in exons 6 and 10) have already been
shown to have a functional impact on the waxy protein (Chen et al., 2008 a, b; Larkin and
Park, 2003). The impact of the other nsSNP, in exon 9, has not been characterized, and its
functional impact needs to be confirmed by association analysis (Tables 1 and 3). These
nsSNPs, at positions 2494, 3486 and 3235, cause substitutions at amino acids 224, 415 and
370, respectively (Table 3). Among these, the tyrosine to serine (Y→S) substitution
at amino acid 224, which arises from the C/A SNP in exon 6, has the highest impact on GBSSI
(Figure 4) and corresponds to the lowest possible SIFT score, 0.00, well below the 0.05
threshold (Table 3).
The other two nsSNPs, in exons 9 and 10, were predicted to be tolerated by SIFT analysis,
with an index of 1.00, which indicates that they have smaller effects, such as a reduction (or
increase) in amylose content and endosperm quality.
Table 3. nsSNPs predicted to have functional significance by SIFT.

SNP No  SNP ID      Position  Coordinate (gDNA)  SNP  Coordinate (protein)  Amino acid substitution  SIFT score (tolerance)  Impact on protein  †Confidence of prediction
47      OryzaSNP2   Exon 6    2494               C/A  224                   Y→S                      *0.00                   High               High
24      rs53561561  Exon 9    3235               C/T  370                   A→V                      1.00                    Low                High
51      OryzaSNP6   Exon 10   3486               C/T  415                   P→S                      1.00                    Low                High

* Substitutions with scores under 0.05 are predicted to be not tolerated.
† The confidence of predictions was calculated based on the default median conservation value of 3.0.
GeneSplicer
Splicing is a post-transcriptional modification of RNA in which introns are removed and
exons are joined to form mature mRNA. SNPs or in/dels at splice sites may lead to a truncated
or mutant protein (El Sharawy et al., 2006). GeneSplicer
(http://www.cbcb.umd.edu/software/GeneSplicer/gene_spl.shtml) is a computational tool
which predicts splice sites in DNA sequence (Pertea et al., 2001). Although splice sites in the
GBSSI encoding gene have been identified, this tool was used to identify the exact location
of exon-intron boundaries and the possible existence of SNPs/indels in unusual
donor-acceptor sites which might change GBSSI structure. A maximum deviation of 2 bp was
found in the splice sites predicted by this software. This deviation probably results from the
tendency of this software to recognize alternative splicing patterns. Only one putative SNP
was recognized, at the exon 1-intron 1 junction at position 246. The functional effect of this
SNP on waxy protein has already been reported (Cai et al., 1998).
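Once exon boundaries are known, assigning a SNP coordinate to an exon or intron is a sorted-interval lookup. The sketch below assumes that the two coordinate columns of Table 4 delimit the first and last base of each exon; that reading, and the function name, are assumptions made here for illustration.

```python
from bisect import bisect_right

# (first base, last base) of exons 1-14, read from the two coordinate columns of Table 4.
EXONS = [
    (120, 246), (1408, 1749), (1861, 1942), (2048, 2150), (2245, 2335),
    (2433, 2502), (2593, 2692), (2781, 2896), (3016, 3256), (3372, 3550),
    (3789, 3981), (4086, 4174), (4285, 4416), (4772, 4889),
]
STARTS = [start for start, _ in EXONS]


def locate(pos):
    """Return the exon or intron that contains genomic coordinate `pos`."""
    i = bisect_right(STARTS, pos)  # number of exons starting at or before pos
    if i == 0:
        return "upstream of exon 1"
    _, end = EXONS[i - 1]
    if pos <= end:
        return f"exon {i}"
    if i == len(EXONS):
        return f"downstream of exon {i}"
    return f"intron {i}"


for pos in (2494, 2777, 2845, 3235, 3264, 3486):
    print(pos, locate(pos))
```

For these coordinates the lookup reproduces the Position column of Table 1. Positions within a base or two of a boundary, such as 246, should be treated with care given the 2 bp deviation noted above.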
SEE ESE (Sequence Evaluator of Exonic Splicing Enhancers)
Exonic Splicing Enhancers (ESE) are prevalent in plant sequences and normally promote
exon recognition and inclusion. These sequences have been identified in several plant genes
and reside at variable distances from splice sites. Although such splicing enhancers have been
identified in both exons and introns, exon splicing enhancers are generally better
characterized and are probably more common.
Table 4. Exon/intron boundaries in the rice GBSSI gene recognized by GeneSplicer
and existence of possible SNPs/Indels

ID             5´ Donor  Acceptor 3´  Confidence  *Confidence score  Deviation (bp)  SNP/Indel position
Exon/Intron1   120       246          High        14.09              0               [G/T] 246
Exon/Intron2   1408      1749         High        12.22              0               N/A
Exon/Intron3   1861      1942         High        14.45              1               N/A
Exon/Intron4   2048      2150         High        13.75              1               N/A
Exon/Intron5   2245      2335         High        16.98              0               N/A
Exon/Intron6   2433      2502         High        16.79              2               N/A
Exon/Intron7   2593      2692         High        18.47              1               N/A
Exon/Intron8   2781      2896         High        21.16              2               N/A
Exon/Intron9   3016      3256         High        19.92              0               N/A
Exon/Intron10  3372      3550         High        14.13              0               N/A
Exon/Intron11  3789      3981         High        18.43              1               N/A
Exon/Intron12  4086      4174         High        14.07              1               N/A
Exon/Intron13  4285      4416         High        20.53              0               N/A
Exon/Intron14  4772      4889         High        23.96              1               N/A

* The confidence score must be higher than 12.
Most of the ESE candidates are hexamers, and the most important candidates are highlighted
by this software whenever they overlap the three 9-mers GAAGAAGAA, CGATCAACG
and TGCTGCTGG, which have been found to be very effective ESEs in plants (Tacke and
Manley, 1999). Occurrence of SNPs in these regions may generate aberrant mRNAs that are
either unstable or code for defective, truncated or deficient protein isoforms.
Sequence Evaluator for ESEs (SEE ESE)
(http://www.cbcb.umd.edu/software/SeeEse/index.html) was applied to locate conserved
motifs represented by these hexamers in exonic regions near splice sites in the GBSSI gene.
Although a total of 17 potential ESE motifs were found, no SNP/indels were distinguished in
these regions.
FAS-ESS (Systematic identification and analysis of exonic splicing silencers)
Exonic splicing silencers (ESSs) are cis-regulatory sequences (elements) in exons or introns
that either inhibit the use of adjacent splice sites, often contributing to alternative splicing
(AS), or promote exon skipping. Bioinformatics analyses suggest that these ESS motifs play
important roles in suppression of pseudo-exons, in splice site definition, and in AS (Wang et
al., 2004).
Table 5. List of Exonic Splicing Silencers in the GBSSI gene recognized by the FAS-ESS program

ESS ID  *ESS coordinates  ESS sequence  SNP ID      *SNP position  SNP
ESS1    244-250           AGGTATA       OryzaSNP1   246            [G/T]
ESS2    2492-2497         TTATGG        OryzaSNP2   2494           [C/A]
ESS3    2986-2992         TCGTTCA       rs54167626  2987           [C/T]
ESS4    2986-2992         TCGTTCA       rs53001101  2992           [A/G]
ESS5    3061-3066         TCCTGG        rs53810836  3065           [G/A]

* All coordinates and positions calculated based on GBSSI genomic DNA (GenBank accession
number X65183).
Underline-highlighted nucleotides show the position of SNPs.
Using the FAS-hex2 set (http://genes.mit.edu/fas-ess/), 113 predicted domains were found, of
which 77 were located in the SNP high-density regions, exons/introns 8, 9 and 10. FAS-ESS
analysis identified five exonic splicing silencer sites in the GBSSI gene that contain SNPs
(Table 5). The most important silencer elements are probably ESS2, ESS1 and ESS3, in that
order, because they contain SNPs in coding regions or at exon-intron splice sites. ESS2, in
exon 6, contains the putative non-synonymous [C/A] SNP (ID: OryzaSNP2) responsible for the
Y→S change, which may have significant effects on GBSSI protein characteristics.
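The position of each SNP within its silencer motif can be recovered from the coordinates in Table 5. The sketch below marks the polymorphic base with square brackets; the bracket notation and the function name are conventions chosen here for illustration, not part of the FAS-ESS output.

```python
# (ESS ID, motif start on gDNA, motif sequence, SNP ID, SNP position), from Table 5.
ESS_SITES = [
    ("ESS1", 244, "AGGTATA", "OryzaSNP1", 246),
    ("ESS2", 2492, "TTATGG", "OryzaSNP2", 2494),
    ("ESS3", 2986, "TCGTTCA", "rs54167626", 2987),
    ("ESS4", 2986, "TCGTTCA", "rs53001101", 2992),
    ("ESS5", 3061, "TCCTGG", "rs53810836", 3065),
]


def mark_snp(motif, motif_start, snp_pos):
    """Return the motif with the base at the SNP position enclosed in brackets."""
    offset = snp_pos - motif_start
    if not 0 <= offset < len(motif):
        raise ValueError("SNP lies outside the motif")
    return motif[:offset] + "[" + motif[offset] + "]" + motif[offset + 1:]


for ess_id, start, motif, snp_id, pos in ESS_SITES:
    print(ess_id, snp_id, mark_snp(motif, start, pos))
```

In each case the bracketed base is one of the two alleles listed for that SNP in Table 5, which is a useful consistency check on the coordinates.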
Simulation for finding functional, constructive changes of ns-coding SNPs
Protein secondary and tertiary structure of molecules can alter function and activity (Fersht,
1985; Hrmova and Fincher, 2001). Modelling of GBSSI is the final stage of functional
analysis. These computational algorithms can recognize the impact of different nsSNPs by
simulation and comparison of native and mutant molecular structures. Results from SIFT
analysis were included at this level. Out of three SNPs which were non-synonymous, only
´C/A´ SNP at exon 6 [OryzaSNP2] was recognized as important and the other two SNPs in
exons 9 and 10 were found to have lower impact by SIFT analysis (Table 3). This nsSNP is
associated with the Y→S amino acid substitution at position 224 of GBSSI. The 3D
modelling was performed by Geno3D. This software identifies the most similar structure to
the query amino acid sequence and simulates a 3D protein automatically. Based on a Geno3D
search of different protein databases, the structure for GBSSI has a PDB id: 3D1J. The 3D1J
is a glycogen synthase, and the 3D crystal structure of this protein has been elucidated in E.
coli (Buschiazzo et al. 2004; Sheng et al. 2009). It is thought that synthesis of storage
polysaccharides in bacteria and plants is fulfilled by a similar ADP glucose-based pathway
(Ball and Morell, 2003). The exact location of the exon 6 SNP was detected by SWISS PDB
viewer. Figures 5a and 5c show the exact position of this high-impact substitution in the
modelled protein. The Y→S substitution at residue 224 may change the shape and stability
and, in turn, the activity of the protein (Figures 5b and 5d).
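The link between the [C/A] nucleotide change and the Y→S substitution can be illustrated with a small translation sketch. The codon used here is an assumption: TAT is inferred from the ESS2 motif TTATGG in Table 5, where the SNP falls on the central A, and the reading frame is not confirmed by the text. The codon table is deliberately reduced to the tyrosine and serine codons needed for the example, and the function names are hypothetical.

```python
# Reduced codon table: only tyrosine (Y) and serine (S) codons are included.
CODON_TABLE = {
    "TAT": "Y", "TAC": "Y",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S",
}


def mutate_codon(codon, index, new_base):
    """Return the codon with the base at the 0-based index replaced."""
    if len(codon) != 3 or not 0 <= index < 3:
        raise ValueError("expected a 3-base codon and an index of 0, 1 or 2")
    return codon[:index] + new_base + codon[index + 1:]


def classify_change(ref_codon, alt_codon):
    """Return (reference residue, alternative residue, type of change)."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


# Assumed native codon TAT (tyrosine); the A -> C change at the second base gives TCT.
native = "TAT"
mutant = mutate_codon(native, 1, "C")
print(native, mutant, classify_change(native, mutant))
```

Under this assumption an A to C change at the second codon position is enough to replace tyrosine with serine, whereas a change at the third position (TAT to TAC) would be synonymous.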
Figure 4. ClustalW2 alignment of native (WxN) and mutant (WxM) GBSSI. Y→S, A→V
and P→S substitutions were found at residues 224, 370 and 415, corresponding to SNP
numbers 47, 24 and 51 at exons 6, 9 and 10, respectively. SIFT analysis found the [C/A] SNP
at exon 6 had the highest impact on protein structure.
Discussion
Computational algorithms are useful and cost-effective tools for analysis of SNPs and genes.
Since the emergence of new high-throughput technologies to sequence the whole genome of
plants (Henry, 2008), it is not possible to recognize all functional SNPs in a pool of
sequencing data which contains neutral SNPs. Assessment of functional SNPs can be
performed by phylogenetic comparison (George Priya Doss et al., 2008), such as the study of
statistical correlation with residue substitution. Recently, SNP-linkage disequilibrium and
association studies, which need accurate phenotypic data of appropriate populations, have
gained acceptance as procedures to assess functional SNPs (Carlson et al., 2003). However,
these populations can be difficult to generate (Gupta et al., 2005), and they must have high
variation in the studied traits. The efficiency of computational tools for identification of
functional SNPs in human cancer-related genes such as BRCA1 and BRCA2 has been reported
by a number of authors (Rajasekaran et al., 2007; Rajasekaran et al., 2008). Shen et al. (2006)
demonstrated application of in silico analysis tools like SIFT, Polyphen and UTRScan to
recognize SNPs in a cytokine gene that has a known role in human, immune-related diseases.
Based on the success of the latter, a computational screening pathway to prioritize and rank
plant SNPs to recognize their functionality and impact on plant phenotypes was developed.
The results here show there are significant numbers of important elements in the GBSSI gene
and that SNPs are found within these regions; such correlations have already been
demonstrated by Soussi et al. (2006). Four of these elements were found to have the highest
functional effects and these effects appeared to result from the existence of SNPs in these
regions. The non-synonymous [C/A] SNP at exon 6 [OryzaSNP2], for example, has the most
significant effect on amylose content according to SIFT and this has been previously shown
to be the case experimentally (Chen et al. 2008 b).
[Figure 5 panels: (a) and (b) tyrosine; (c) and (d) serine]
Figure 5. The 3D molecular modelling of GBSSI generated by Geno3D and viewed by
Swiss PDB viewer and RasMol software. Arrows indicate the exact location of Y→S
(tyrosine to serine) substitution at residue 224 derived from the most important SNP [C/A] in
waxy protein gene at exon 6 (OryzaSNP2). (5a and 5b) The native GBSSI protein contains
´A´ nucleotide. (5c and 5d). The mutant protein carries ´C´ instead of ´A´. The arrows in
Figures 5b and 5d also indicate a significant difference in the structural loop that occurs in the
substitution region.
SIFT assumes important amino acids will be conserved in the protein family, and so changes
at well conserved, charged or polar residues are predicted to be high impact, or to affect
protein function. If a position in an alignment contains only hydrophobic amino acids, then
SIFT assumes this position can only tolerate amino acids with hydrophobic character; such
substitutions are predicted to have a low-level effect on protein function. The quantitative
SIFT score allows amino acid changes to be prioritised and their possible functional effects
to be ranked.
An important feature of this algorithm is the confidence value. Confidence in a predicted
high-impact substitution depends on the diversity of the aligned protein sequences and how
closely the sequences are related. If the aligned sequences are too closely related, many
amino acid residues will appear conserved and SIFT will predict most substitutions to affect
protein function, which leads to a high false positive error. In fact, a number of
functionally neutral substitutions may be predicted as high impact (false positive) or vice
versa (false negative). To alert the user to these situations, SIFT calculates the median
conservation value, which measures the diversity of the sequences in the alignment.
Conservation is calculated for each position in the alignment and the median of these values
is taken. By default, SIFT builds alignments with a median conservation value of 3.0.
Alignments with higher median conservation values are less diverse, and predictions based on
them will have a higher false positive error (Ng and Henikoff, 2003). As the default median
conservation value of 3.0 was used and few homologous sequences were available for
alignment, the SIFT predictions in this study (Table 3) were made with the highest
confidence attainable.
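The median conservation value can be made concrete with a short sketch. It assumes that conservation at a position is the information content in bits, that is, log2(20) minus the Shannon entropy of the residues observed in that alignment column, so that a fully conserved position scores about 4.32 and a position containing all 20 amino acids equally scores 0. This is a simplification: the real SIFT calculation also applies sequence weighting and pseudocounts, which are not reproduced here, and the function names are illustrative.

```python
import math
from collections import Counter
from statistics import median

MAX_BITS = math.log2(20)  # information content of a fully conserved position


def column_conservation(column):
    """Information content (bits) of one alignment column; gaps ('-') are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(residues).values())
    return MAX_BITS - entropy


def median_conservation(alignment):
    """Median per-column conservation of equal-length aligned sequences."""
    columns = ["".join(col) for col in zip(*alignment)]
    return median(column_conservation(col) for col in columns)


# Toy alignment (hypothetical sequences): the second column varies between Y and S.
toy = ["AYG", "AYG", "ASG", "AYG"]
print(round(median_conservation(toy), 2))
```

An alignment of near-identical sequences gives a median close to 4.32, which is the situation the text warns about: almost every position looks conserved, so almost every substitution is flagged.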
Based on SIFT analysis, the [C/A] SNP at exon 6 [OryzaSNP2] is located in a conserved
region and affects a charged or polar residue. Larkin and Park (2003) found two other coding
SNPs, ´C/T´ [OryzaSNP3] and ´C/T´ [OryzaSNP6] at exons 9 and 10 of GBSSI, which have
non-functional and functional effects, respectively (Tables 1 and 3). They also verified that
haplotypes composed of SNP at the exon/intron1 boundary site, exon 6 and exon 10 regulate
the GBSSI function. Chen et al. (2008 a, b) have also confirmed that these SNPs can alter the
apparent amylose content and pasting properties of rice.
Since the [C/A] SNP at exon 6 [OryzaSNP2] had the highest possible impact on GBSSI,
both native (Y) and mutant (S) GBSSI proteins (Y→S = tyrosine → serine at residue 224)
were simulated (Fig 5). The superimposed structures of the two proteins showed a distinctive
deformed loop at the mutation position in comparison with the native structure. This deformed
loop is located at the outer layer (surface) of GBSSI and alters the 3D shape, structure and
function of the protein, possibly owing to a change in the accuracy of the protein binding site.
Sequence similarity confers structural similarity (Chothia and Lesk 1986; Hegyi and
Gerstein, 1999), but unfortunately, the relationship between a protein’s sequence similarity
and functional similarity is not straightforward (Bork and Koonin, 1998). Exonic Splicing
Silencers have relatively major effects on splicing pattern by recognition of splicing sites
(Wang et al., 2006). Application of FAS-ESS suggested OryzaSNP2 also has silencing action
because it is located in an exonic splicing silencer region although there are no reports of
experimental evidence of alternatively spliced mRNAs or altered protein size in rice plants
with regard to this SNP (Table 5).
The effect of the [C/A] SNP at exon 6 on amylose content and grain quality has been
confirmed by many authors (Sano, 1984; Larkin and Park, 2003; Chen et al., 2008a).
FAS-ESS analysis has also suggested another important silencer (ESS1) at the splice site of
exon/intron 1, which has a [G/T] SNP [OryzaSNP1]. The significance of this SNP in reducing
amylose content has already been reported by experimental analysis, which found this SNP
decreases activity of GBSSI by alteration of the mRNA splice site (Cai et al., 1997).
Application of TF Search has also identified an important TFBS, including one [C/T] SNP
[rs53176842] at coordinate 2777, which may potentially have a major effect on GBSSI
function.
Conclusion
Most genetic analysis software has been designed for human or animal genetic studies.
Application of a number of programs allowed construction of a computational pathway for
SNP analysis in plants. There is a significant relationship between in silico and experimental
results, thus confirming that computational tools can help identify and characterize functional
SNPs. Following Transcriptional Factor (TF) Search analysis, a new [C/T] SNP [rs53176842]
at coordinate 2777 near the boundary site of intron 7/exon 8 was predicted which may have a
major impact on GBSSI and related phenotypes.
CHAPTER 4
SNP in starch biosynthesis genes associated with the nutritional and
functional properties of domesticated rice
Summary
Starch is a major component of human diets. The physio-chemical properties of starch
influence the nutritional value of starch and the functional properties of starch containing
foods. Many of these traits have been under strong selection in domestication of rice as a
food. A population of 233 breeding lines of rice was analysed for variation at 110 functional
SNP loci in exonic regions of 18 starch-related genes and the results related to rice pasting
and cooking quality. Associations of 65 functional SNPs were detected. Five genes, AGPL2a,
Isoamylase1, SPHOL, SSIIb and SSIVb, showed no polymorphism. The GBSSI (waxy gene)
and SSIIa had a major influence on starch properties and the other genes had minor
associations. The ´G/T´ SNP at the boundary site of exon/intron1 in GBSSI showed the
strongest association with retrogradation and amylose content. The TT allele has been
selected in much of the domesticated japonica genepool providing rice with a desirable
texture but less resistant starch with associated human health advantages. The GC/TT SNP at
exon 8 of SSIIa showed a very significant association with pasting temperature (PT),
gelatinization temperature (GT) and peak time. No significant association was found
between SSIIa and retrogradation. Other genes contributing to retrogradation were SSI, BEI
and SSIIIa. The highest level of polymorphism was observed in SSIIIa with 22 SNPs but only
limited associations were observed with starch phenotypic values. None of the SNPs were
found to be strongly associated with chalkiness except for a weak link with a ´T/C´ SNP at
position 960 (Thr482 to Ala) in Isoamylase2. These associations provide new tools for
deliberate selection of rice genotypes for specific functional and nutritional outcomes.
Introduction
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation.
Many important plant traits and human genetic diseases are attributed to these sequence
variations (Shastry, 2002; Bryan et al., 2000) either through influencing gene expression or
protein function (Kennedy et al., 2006). Identifying SNPs associated with grain starch quality
advances our understanding of the starch bio-synthesis pathway and highlights potential ways
to generate crops with higher yield and better quality, which directly impacts human nutrition
and health.
Starch is mainly composed of amylose and amylopectin (Miles et al., 1985). Seven classes
of starch related enzymes with high impact on grain starch structure and quality are known,
including ADP-glucose pyrophosphorylase (AGPase), granule bound starch synthase
(GBSS), starch synthase (SS), branching enzyme (BE), debranching enzyme (DBE), starch
phosphorylase (PHO) and glucose phosphate translocator (GPT in chapter 5). These
genes/enzymes contribute directly or indirectly to the production of starch granules composed
of amylose and amylopectin.
The Rapid Visco Analyser (RVA) is one of the most important means of measuring grain
quality parameters (Limpisut and Jindal, 2002). Using 43 gene-specific molecular markers,
Yan et al. (2010) analysed the association of 17 starch synthesis-related genes with RVA
profile parameters in a collection of 118 glutinous rice accessions. They found that 10 of 17
starch-related genes are involved in controlling RVA profile parameters, with the most
significant being Pullulanase, which plays an important role in the control of peak viscosity
(PKV), hot paste viscosity (HPV), cool paste viscosity (CPV), breakdown viscosity (BDV),
peak time (PKT), and paste temperature (PT) while seven other starch genes had minor
impacts on a few RVA profile parameters. The RVA parameters are controlled by a complex
genetic system involving many starch-related genes (Tester et al., 1995). This complexity can
be attributed to many factors such as genetic, epigenetic, environmental and G×E interaction
in studied population (Tester et al., 1995; Morell, 2003).
Granule bound starch synthase (GBSSI) is the most important starch synthesis gene in rice
and other cereal grains. A number of SNPs in rice GBSSI (waxy gene), at the intron/exon 1
junction site, exon 6 and exon 10, have a significant impact on starch quality (Chen et al.,
2008a, b; Cai et al., 1998; Larkin et al., 2003) via their impact on amylose content. Starch
synthase IIa (SSIIa) has a major effect on starch quality through its impact on amylopectin
structure (Craig et al., 1998; Morell, 2003). The effect of this gene on cooking quality and
starch texture has been extensively studied as measured by alkali spreading values,
gelatinisation temperature (GT) and eating quality of rice starch by polymorphism of two
SNPs, [A/G] and [GC/TT] in alk, the gene which codes for SSIIa (Umemoto et al., 2008;
Umemoto et al., 2004; Umemoto and Aoki, 2005; Waters et al., 2006).
Except for GBSSI and SSIIa, there are no reports of associations between SNPs and starch
quality parameters, with most studies focusing on the gene level rather than the SNP level by
undertaking comparisons of gene-deficient mutants (Fujita et al., 2006).
Massively parallel sequencing (MPS) technology is a flexible and high-throughput platform
for genetic analysis and functional genomics which is based on ultra deep sequencing of short
read lengths and a huge number of sequencing reactions (Imelfort et al., 2009).
Kharabian-Masouleh et al. (2011) discovered more than 501 SNPs and 113 In/dels in 17 starch related
genes in an Australian rice breeding population using a combination of a target-pooled long
range PCR and MPS approach, clearly indicating the capacity of high-throughput MPS
technology to discover new SNP variants in plant populations. This technology can be used
in combination with multiplexed-MALDI-TOF (Sequenom) to quickly identify genetic
variation within plant populations and then assign this variation to individual plants and
phenotypes, improving the efficiency of marker assisted selection (MAS) (Masouleh et al.,
2009).
Resistant starch has a significant impact on human health (Sajilata et al., 2006). The
incomplete digestion-absorption of non-digestible resistant starch in the small intestine leads
to starch fractions with the physiological functions similar to dietary fibre with significant
beneficial impact (Asp and Björck, 1992). Starch retrogradation describes the hardening of
cooked starch after cooling due to re-crystallization of gelatinized starch components (Fan
and Marks, 1998). There is a significant association between retrograded and resistant starch
and hence, in this study the term retrograded-resistant starch is used. The in vivo digestion
ability and structural features of resistant-retrograded starch with high amylose content in
maize, bean and potato flakes were assessed using the ileal contents of four human
populations (Faisant et al., 1993). The main resistant starch fraction consisted primarily of
retrograded amylose with degree of polymerization of approximately 35 glucose units and a
melting temperature of 150 °C. Likewise, retrograded amylose in peas, maize, wheat, and
potatoes was found to be highly resistant to amylolysis (Ring et al., 1988), suggesting this
fraction had high amylose content. Other characters which may have an influence on the rate of
retrogradation, firmness and resilience of rice starch after cooking are protein and lipid
contents (Philpot et al., 2006).
High amylose rice cultivars are characterized by low RVA parameters, high resistant starch
(RS) content and lower estimated glycemic score (EGS), and highly retrograded rice starch
tends to reduce the hydrolysis index (HI) and glycemic index (Hu et al., 2004). Waxy and
low amylose rice starch is more quickly and completely hydrolysed relative to intermediate
and high amylose rice (Chung et al., 2006; Hu et al., 2004).
In this study, a novel SNP in the Glucose-6-phosphate translocator 1 (GPT1) gene, which is
highly associated with amylose content and retrogradation rate of resistant starch (Chapter 5),
is reported. In addition, an explicit gene-by-gene approach is established to
unveil the association of 18 starch-related genes and their SNP polymorphisms with the
physiochemical properties of rice starch.
Materials and methods
Plant materials
Plant material was supplied by Industry and Investment NSW, Yanco Agricultural Research
Institute, Australia. A population of 233 F6 lines was selected from the Australian temperate
(japonica-type) rice breeding program (Appendix 5). Selection was primarily based on
capacity to flower and set seed and the morphological traits of plant height, grain size and
shape. No selection took place for quality traits such as gelatinization temperature and RVA
curve characteristics.
Physiochemical properties
In total, 13 physiochemical traits including four phenotypic and RVA characteristics were
measured. The phenotypic traits consisting of apparent amylose content (AC), gelatinization
temperature (GT), grain chalkiness and retrogradation rate [scored by the Martin test (Philpot
et al., 2006)], were quantified according to standard methods.
RVA characteristics such as peak viscosity (PKV), trough viscosity (TV), final viscosity
(FV), breakdown, setback, peak time (PKT) and pasting temperature (PT) were measured by
a Rapid Visco Analyser (Model, City, country) according to the manufacturer’s instructions.
Designation of starch-synthesis genes involved in starch metabolism
The available literature was used to identify the most likely candidate genes associated with
rice starch quality (Ohdan et al., 2005; Waters and Henry, 2007; Nakamura, 2002; Hirose et
al., 2006; Rahman et al., 2000). The genetic map and approximate location of genes on
chromosomes are shown in Appendix 8.
The general entries of nucleotide sequences (gDNA) and full-length cDNAs of important
gene classes which are involved in starch biosynthesis were retrieved from the NCBI
(http://www.ncbi.nlm.nih.gov/) and the Rice Genome Annotation Project
(http://rice.plantbiology.msu.edu/cgi-bin/putative_function_search.pl) databases and resequenced using long range PCR and massively parallel sequencing (Illumina® GAII) to find
novel SNPs/Indels in the studied population (Kharabian-Masouleh et al., 2011).
Amplification primers were designed based on consensus sequence alignment of each
candidate gene.
Candidate genes/enzymes for SNP genotyping
In total, eighteen genes representing seven groups of enzymes, namely ADP-glucose
pyrophosphorylase (AGPase), granule bound starch synthases (GBSSI and GBSSII), starch
synthases (SSI, SSIIa, SSIIb, SSIIIa, SSIIIb, SSIVa, SSIVb), branching enzymes (BEI,
BEIIa, BEIIb), debranching enzymes (ISA1, ISA2, Pullulanase), starch phosphorylase
(SPHOL) and glucose-6-phosphate translocator (GPT1) (in Chapter 5) were selected for SNP
genotyping.
SNP dataset
SNPs for each gene were retrieved from the previous SNP discovery approach
(Kharabian-Masouleh et al., 2011). The total number of functional polymorphisms discovered in the
population was then compared to SNPs available at the OryzaSNP MSU database
(http://oryzasnp.plantbiology.msu.edu/) and 59 extra SNPs were harvested to ensure that all
known non-synonymous SNPs (nsSNPs) were assayed.
In total, 110 nsSNPs with possible functional effects (amino acid change) were chosen for
primer design and genotyping, of which 65 were successfully genotyped with different status
as polymorphic or non-polymorphic (Appendix 6). In total, 45 SNPs/primer sets either failed
to genotype individuals or did not exist in the population (such as those extracted from
OryzaSNP databases) and were therefore disregarded in the analysis.
Primer design and SNP genotyping
Several multiplexed assays were designed by Sequenom® MassARRAY® Assay design 3.1
software to cover all available SNPs. The optimal amplicon size containing the polymorphic
site in the software was set to 80–120 bp. A 10-mer tag (5′-ACGTTGGATG-3′) was added to
the 5′ end of each amplification primer to avoid confusion in the mass spectrum and to
improve PCR performance (Masouleh et al., 2009).
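The two design constraints described above, the 10-mer tag and the 80–120 bp amplicon window, can be expressed as a brief sketch. This is not the Sequenom Assay Design software; the primer sequence shown is invented for illustration, and whether the reported amplicon size includes the tag is an assumption left to the caller.

```python
TAG = "ACGTTGGATG"  # 10-mer tag quoted in the text, written 5' to 3'
AMPLICON_MIN, AMPLICON_MAX = 80, 120  # optimal amplicon size window in bp


def add_tag(primer):
    """Prepend the 10-mer tag to the 5' end of an amplification primer."""
    return TAG + primer.upper()


def amplicon_in_range(length, lo=AMPLICON_MIN, hi=AMPLICON_MAX):
    """True if the amplicon length falls inside the inclusive size window."""
    return lo <= length <= hi


# Hypothetical gene-specific primer, for illustration only.
example_primer = "ccgtaagctgatcgatgc"
tagged = add_tag(example_primer)
print(tagged, len(tagged), amplicon_in_range(100))
```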
Capture PCR protocol, primer extension and mass spectrometry
The steps of capture PCR, primer extension, resin cleanup and mass spectrometry were
undertaken according to the manufacturer’s (Sequenom® MassARRAY®) instructions.
Association analysis
Assays were constructed for 110 polymorphisms defining each of the alleles of 18 genes
controlling starch quality traits and retrogradation. SNP call data for genotyped polymorphic
alleles, along with phenotypic data, were then transferred into TASSEL v2.1 (Bradbury et al.,
2007) to find SNP associations with physiochemical properties. A gene-by-gene approach
was employed to understand the association of each individual gene/SNP with target traits. A
comprehensive association analysis including all SNPs significantly associated with starch
properties was also carried out. The latter analysis shows the impact of significant SNPs on
starch quality traits when combined with one another, representing the possible compensating
or balancing effects of polymorphism on the final phenotypic value of individuals.
Statistical parameters
Critical statistics such as the F-test, p-value, adjusted p-value and R2 were calculated to
measure associations, with the number of permutations set to 1000.
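The statistics named above can be illustrated for a single SNP and a single trait. The sketch below is not TASSEL's implementation; it assumes a simple one-way analysis of variance in which lines are grouped by genotype, with F computed from the between-group and within-group sums of squares, R2 as the proportion of the total sum of squares explained by genotype, and a permutation p-value obtained by shuffling trait values 1000 times. The genotype calls and trait values are invented, and each genotype class needs at least some within-class variation for F to be defined.

```python
import random
from statistics import mean


def anova_f_r2(genotypes, values):
    """One-way ANOVA of trait values grouped by genotype: returns (F, R2)."""
    groups = {}
    for genotype, value in zip(genotypes, values):
        groups.setdefault(genotype, []).append(value)
    grand_mean = mean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(vs) * (mean(vs) - grand_mean) ** 2
                     for vs in groups.values())
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, ss_between / ss_total


def permutation_p(genotypes, values, n_perm=1000, seed=0):
    """Share of shuffled data sets whose F is at least the observed F."""
    rng = random.Random(seed)
    observed, _ = anova_f_r2(genotypes, values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        f_value, _ = anova_f_r2(genotypes, shuffled)
        if f_value >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented example: eight lines, two genotype classes, one amylose-like trait.
calls = ["GG"] * 4 + ["TT"] * 4
trait = [25, 26, 24, 27, 15, 16, 14, 17]
f_value, r2 = anova_f_r2(calls, trait)
print(round(f_value, 2), round(r2, 3), permutation_p(calls, trait))
```

In this toy data set genotype explains about 95% of the trait variation (R2 = 200/210), which shows how F and R2 move together when a single SNP separates the phenotypic classes cleanly.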
Results
Sixty-five SNP assays were designed and 233 individuals genotyped. To avoid complications in
the association study, a gene-by-gene approach was applied to find the impact and possible
linkage of individual genes on physiochemical and quality-related properties of rice grain.
Appendices 6 and 7 present the identification code, coordinate and association of all studied
SNPs.
AGPS2b (small subunit)
No functional polymorphism was found in AGPS2b suggesting that this gene does not have
any impact on physiochemical properties of grain quality in this population.
SPHOL (alpha 1,4 glucan starch phosphorylase)
Five SNPs were retrieved from the OryzaSNP database and genotyped, but no functional
polymorphism was recognised in this population. Therefore, this gene had no effect on the
studied traits.
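Whether a genotyped SNP is polymorphic in a population can be decided from allele counts. The following is a minimal sketch assuming biallelic SNPs and diploid calls written as two-letter strings, with `None` marking a failed call; the call lists are invented and the function names are illustrative rather than part of any genotyping software.

```python
from collections import Counter


def allele_counts(calls):
    """Count alleles over diploid calls such as 'GG' or 'GT'; None is skipped."""
    counts = Counter()
    for call in calls:
        if call:
            counts.update(call)
    return counts


def is_polymorphic(calls):
    """True if more than one allele is observed in the population."""
    return len(allele_counts(calls)) > 1


def minor_allele_frequency(calls):
    """Frequency of the rarer allele of a biallelic SNP (0.0 if monomorphic)."""
    counts = allele_counts(calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no successful genotype calls")
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / total


# Invented calls: the first locus is monomorphic, the second is polymorphic.
locus_a = ["GG", "GG", None, "GG"]
locus_b = ["GG", "GT", "TT", "TT"]
print(is_polymorphic(locus_a), is_polymorphic(locus_b),
      minor_allele_frequency(locus_b))
```

A monomorphic locus carries no information for association analysis in that population, which is why such genes are reported as having no effect on the studied traits.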
GBSSI (Granule bound starch synthase I)
This gene is the most important gene involved in starch synthesis of rice and other cereal
grains. Association study showed a strong correlation between WAXYEXIN1 (G/T) SNP at
the junction site of Exon1/Intron1 and RVA curve characteristics such as Peak Viscosity
(PKV) and Breakdown. The highest F-value in this experiment (223.29) was observed for this
SNP, which shows a significant link to retrogradation rate (Martin test) and amylose content
(F-value=121.52). The R2 values for retrogradation and amylose content were 0.66 and 0.51,
respectively. The second SNP in GBSSI associated with grain properties was
WAXYEX10. This ´C/T´ SNP at coordinate 3486 of the gene creates a P→S substitution and has
a very significant association with Trough, Final Viscosity (FV), setback, retrogradation and
amylose content, although with lower linkage than WAXYEXIN1. The R2 values for
retrogradation and amylose content were 0.39 and 0.16, respectively.
The third SNP, WAXYEX6, also revealed some significant associations according to
calculated p-values ≤ 0.01 but did not show any remarkable F or R2 values, which suggests
only minor control of critical pasting properties. In total, the results indicate that this gene
alone can explain a significant portion of the variation in retrograded-resistant starch in rice.
Section 3 of Appendix 7 shows the comprehensive results of the association study for GBSSI.
The data suggest that SNPs WAXYEXIN1 and WAXYEX10 contribute closely together,
while WAXYEX6 has less value in controlling starch properties.
GBSSII (Granule bound starch synthase II)
GBSSII is found exclusively bound to starch granules in green tissues and synthesises
amylose. The synthesised amylose is subsequently consumed by the plant or accumulated in the
endosperm (Dian et al., 2003). During pre-heading, about 1-3 days after flowering, this
gene/enzyme is expressed in leaf, leaf sheaths, culm, and pericarp tissue at a low level
(Ohdan et al., 2005). Vrinten and Nakamura (2000) confirmed the role of
GBSSII in elongation of amylose in non-storage tissues of cereals. One nsSNP was found at
position 1638 of this gene and tested for association with starch physiochemical traits
(Appendix 6). Only one considerable association, with pasting temperature (PT) with an R2
value of 0.20, was observed for this SNP, although some minor associations were also found
with GT and peak time (sect.4 Appendix 7).
SSI
Only one ´T/C´ SNP at position 5153 of this gene showed minor associations with FV, SB
and Martin test (MT), with R2 values of 0.16, 0.11, 0.16, respectively (Appendix 7).
SSIIa
Starch synthase IIa (SSIIa) gene has effects on starch quality, presumably by affecting
amylopectin structure. Two SNPs at positions 631 and 4827-4828 (ALKSSIIA4) were tested
for association (Appendix 6). The effect of [GC/TT] on alkali disintegration and eating
quality of rice starch is already known (Umemoto and Aoki, 2005; Waters and Henry, 2007).
Highly significant associations were found between SNPs of SSIIa and important
physiochemical properties such as pasting temperature (PT), peak time (PKT), GT and
breakdown viscosity (BDV). The highest F-test value of 199.65 was observed for the ALKSSIIA4
[GC/TT] SNP and PT, suggesting this SNP controls PT, PKT and BDV with R2 values of
0.642, 0.323 and 0.168, respectively. This SNP has one of the strongest associations with the
physiochemical properties of rice studied in this population (R2=0.642). This suggests the
[GC/TT] SNP at position 4827-4828 of SSIIa is one of the most influential SNPs among the
assayed polymorphisms. The other G/T SNP, at position 631, showed no significant
association with any traits.
SSIIb
It is believed SSIIb is a low level early expressed gene, which is primarily expressed in green
tissues and at an early stage of grain filling (Hirose and Terao, 2004). In total, 6 different
SNPs were genotyped in this population (Appendix 6) and no polymorphism was found,
suggesting that this gene has no effect in our study.
SSIIIa
The highest polymorphism was observed in this gene with 22 SNPs in the coding region
causing an amino acid change. Polymorphism in this gene showed association with
a number of studied properties such as FV, SBV, PT, M-test, AC, Predicted N, Dif, GT and
chalkiness. However, most of them revealed very low R2 values of less than 0.1, indicating that
although they are associated, they do not have a highly significant effect on physiochemical
properties (sect 8. Appendix 7). Apparently, some SNPs in SSIIIa are highly associated with
GT and M-test. The highest R2 values of 0.243, 0.156, 0.130, 0.113 and 0.113 were observed for
GT, M-test, Dif, AC and predicted N, respectively (Appendix 6).
SSIIIb
The main effect of SSIIIb was observed on pasting temperature (PT). Strong associations with
PT were found for the ´T/G´ and ´C/A´ SNPs at positions 7232 and 4543 of SSIIIb, with R2
values of 0.315 and 0.225, respectively.
These relatively high R2 values suggest an influence of SNPs in the coding regions of this gene
on PT, although minor associations were found with peak viscosity (PKV) and difference as
well. These SNPs cause Lys→Asn and Ser→Ile substitutions at positions 207 and 756 of the
corresponding amino acid sequence, respectively. This gene can be classified as a major
contributor to pasting
temperature, as some of its other SNPs also exhibited significant associations with PT (sect 9.
Appendix 7). Hence, this gene is called the pasting temperature (PT) gene.
SSIVa
SSIVa is one of the least known starch genes expressed in rice endosperm. Our study showed
the impact of this gene on PT and GT. Five SNPs were examined in this gene (Appendix 6),
of which four showed significant association with PT (sect 10. Appendix 7). A relatively high
R2 of 0.259 was observed for the ´A/G´ SNP at position 7160 which influences PT. In
addition, four other SNPs, with R2 values ranging from 0.198-0.222, had an influence on this
property. Considering all the influential SNPs in SSIVa, a large portion of phenotypic
variation of PT in this population of rice is explained by variation within SSIVa. Some minor
associations were also observed with GT, PKT, AC and PN. Together, these data suggest
SSIIIb and SSIVa in combination have a very strong contribution to PT.
SSIVb
No variation was observed in two SNPs assayed in this population (sect 11. Appendix 7).
BEI
Only one C/T SNP at position 1558 of this gene was discovered (Kharabian-Masouleh et al.,
2011). Nine of 13 studied physiochemical traits were associated with this SNP at medium
level, with the highest R2 values observed for AC, MT, SBV and FV, respectively. The
relatively high R2 values of 0.260 and 0.238 for AC and MT suggest there is a significant
contribution of this gene to amylose content and retrogradation. Minor associations were also
found between this SNP and PV, BDV and FV (sect 12. Appendix 7).
BEIIb
BEIIb is coded by the amylose extender (ae) gene in maize and other cereals (Yun and Matheson,
1993). Two SNPs were examined in this gene (Appendix 6) but no significant association
was found with starch properties. Previous biochemical analysis of the amylose-extender (ae)
mutant of rice (Oryza sativa) revealed the influence of mutation in this
gene on gelatinization properties through structural alteration of amylopectin by reducing
short chains and degree of polymerization (Nishi et al., 2001). No pleiotropic effect with
other genes such as BEIIa and SSI was found, suggesting this is a neutral gene in this
population. The main reason for this inconsistency may be the nature and minor
significance of the SNPs. Most studies of this gene have focused on mutant populations, where
a large segment of the gene has been deleted. Therefore, the results of these experiments are
not comparable; they must only be interpreted at the gene level and cannot be extended to
naturally occurring SNPs (sect 14. Appendix 7).
Debranching Enzymes (DBEs)
ISA1 (Isoamylase 1)
Two SNPs were retrieved from databases and genotyped. No polymorphism was detected in this
population, indicating no association with physiochemical properties of rice (sect.15
Appendix 7).
ISA2 (Isoamylase 2)
Variation of two SNPs was assessed in this gene and very minor associations with BDV, PT
and chalkiness were found. All R2 values were less than 0.1, which indicates very low
association with the variability of the associated traits (sect.16 Appendix 7).
Pullulanase (PUL)
A recent association study between pullulanase and RVA profile parameters in glutinous rice
has shown strong relations of this gene with peak viscosity, hot paste viscosity, breakdown
viscosity and peak time (Yan et al., 2010). In this study there were only weak associations
between two SNPs in pullulanase and PT, GT and CHK, with R2 values of 0.174, 0.167 and
0.066, respectively. The values above 0.1 represent a low degree of association and can explain
a portion of the current variability in this population of rice (sect.17 Appendix 7).
Discussion
SSI transcript level has been measured at different seed developmental stages. A high
expression level was reported at 1-3 DAF, peaking at 5 DAF, and remaining almost constant
during starch synthesis in endosperm. This suggests SSI is a major SS form in cereals (Cao et
al., 1999).
Neutral genes with no polymorphism or association
In total, 65 SNPs were successfully genotyped in 233 breeding lines (Appendix 6). No
polymorphism was detected for AGPS2b, SPHOL, SSIIb, SSIVb and ISA1. Moreover, there
was no association between BEIIa and BEIIb and any physiochemical properties in rice.
Therefore, seven of the eighteen genes did not contribute to the physiochemical properties of this
population. Although there have been no reports of associations between naturally occurring
SNPs in these genes and quality properties, many studies have reported the importance of
these genes for the physiochemical properties and quality of starch granules. For example,
Kawagoe et al. (2005) reported that the AGPS2b subunit plays an important role in starch granule
synthesis and is associated with rice shrunken mutants. SPHOL is also thought to be involved in
starch degradation and biosynthesis. The mechanism appears to be associated with
phosphorylation of some starch-related enzymes and proteins such as starch branching
enzymes (SBEs) and starch synthase (SSIIa) (Tetlow et al., 2004). Almost all of these
studies have been based on deficient mutants (Rolletschek et al., 2002), so it can be
concluded that major mutations, such as indels, which abolish gene function have an
impact on soluble sugar content, the structure and appearance of starch granules and the quality of
the endosperm in rice and other species, whereas SNPs may not have any impact on starch quality.
Despite the reported impact of BEIIb (amylose extender) and ISA1 on physiochemical
properties in several cereal species (Fisher et al., 1993; Sun et al., 1997; Sun et al., 1998;
Yamakawa et al., 2008), this study found no association with any
physiochemical properties in rice starch. In rice endosperm, antisense inhibition of ISA1 has
altered the structure of amylopectin and the physiochemical properties of starch (Fujita et al.,
2003). The ISA genes are also presumed to have some sort of contribution to the degree of
setback on glutinous rice cultivars (Yan et al., 2010). No significant association was found
between two detected SNPs in BEIIb and quality traits in this population. The contradictory
results can be attributed to the composition or structure of the populations. Not all alleles which
affect a given trait may be represented in this, or any particular, population, and
different minor genes might have specific regulatory roles and impacts which are mediated by
different genetic backgrounds.
Major genes with highly significant associations
GBSSI and SSIIa are major genes involved in some of the most important grain quality
properties such as amylose content and gelatinization temperature. Highly significant
associations were found between GBSSI and retrogradation and amylose content in addition
to more significant relationships with RVA properties such as BDV, SBV and FV. A number
of authors have reported the importance of this gene for the starch physiochemical properties of
rice and other cereals, with SNPs at the intron/exon 1 junction site, exon 6 and exon 10 of rice
GBSSI (waxy gene) having the most significant impact on starch quality (Chen et al., 2008a, b;
Cai et al., 1998). Larkin and Park (2003) have already reported that a SNP in exon 6 affects
amylose content. This study confirms the T/G SNP at the intron/exon 1 junction site has a
major influence on a number of physiochemical properties.
SSIIa presented very high association with pasting temperature, gelatinization temperature
and peak time. The effect of this gene on cooking quality and starch texture has been
extensively studied (Umemoto et al., 2004; Umemoto et al., 2008). Umemoto and Aoki
(2005) explained the alkali disintegration and eating quality of rice starch by polymorphism
at two SNPs, [A/G] and [GC/TT]. These SNPs within exon 8 of the alk locus are significantly
associated with gelatinisation temperature (GT) (Waters et al., 2006), and here it has been
confirmed there is a very significant association between the SSIIa exon 8 GC/TT SNP and
pasting temperature (R2=0.642).
Contributory genes with low-medium associations
In this study, six genes, GBSSII, SSI, SSIIIa, SSIIIb, SSIVa, BEI had low to medium effects
on the final phenotypic variation of individuals. In fact, SNPs in these genes have shown
significant association with a number of studied characters with low to medium R2 values and
here these genes are termed contributory, because the sum of their effects can, in part or in full,
account for the phenotypic values. Some of these genes might work with one another to reach a
certain level of phenotypic expression. The effects of contributory genes, and how they
interact, have been widely studied at the gene level (Dian et al., 2003; Fujita et al.,
2006; Hirose et al., 2006; Umemoto et al., 2008). SSIIIb and SSIVa are PT-associated genes
with relatively medium to high level of association with pasting and/or gelatinization
temperatures.
Minor genes with very low associations
Debranching enzymes showed minor influences in this population. Isoamylase was first
reported in maize endosperm (Doehlert and Knutson, 1991). ISA2 is a relatively small gene
(2625 bp) with no introns. It is therefore presumed that each detected SNP/indel could
potentially be important in this gene. However, no strong association between the two SNPs in
this gene and any physiochemical property was found. Another debranching
enzyme, pullulanase, had low associations with PT, GT and chalkiness (CHK). A recent
association study between pullulanase and RVA profile parameters in glutinous rice has
shown strong relations of this gene with peak viscosity, hot paste viscosity, breakdown
viscosity and peak time (Yan et al., 2010). Our results differ from these, which may be
attributed to the structure of the population. Minor genes are very population-specific and in
each population, different minor genes might contribute to the final phenotypic variability of
physiochemical properties.
Appendices: Chapter 4
Appendix 5: Full list of 233 studied Australian rice genotypes and their pedigree
information.
Appendix 6: Name and characteristics of SNPs genotyped in the rice population.
Appendix 7: The results of association study among 13 physiochemical traits and SNPs of 18
different starch-related genes.
Appendix 8: Linkage map of 17 starch-related genes, showing the approximate chromosomal
location of each gene.
CHAPTER 5
Rice GPT1 SNP associated with resistant-retrograded starch
Summary
Resistant-retrograded starch is widely associated with human health. The highly retrograded
starches of cereals usually have a lower glycemic index (GI) which may be beneficial in
many human diets. Presented here is evidence that the GPT1 gene, which acts early in the biochemical
pathway of starch synthesis and encodes the glucose-6-phosphate translocator, has a
major influence on resistant starch production in rice. A ´T/C´ SNP at position 1188 of the
GPT1 gene alters Leu42 to Phe and is associated with resistant-retrograded starch and
amylose content. The ´T´ and ´C´ alleles produce high and low levels of retrograded starch,
respectively. An association study of 233 genotypes demonstrated a significant correlation
(R2) of 0.57 and 0.36 (P=0.00099) between this SNP and retrogradation degree and apparent
amylose content, respectively. Haplotype and association analysis of this SNP and another
´G/T´ SNP at the boundary site of exon/intron1 in GBSSI gene can explain most of the
variability of retrogradation degree and amylose content in this rice population. These two
alleles, ´T´ in GPT1 and ´G´ in GBSSI, combine to produce higher levels of
resistant-retrograded starch and may provide a new tool for deliberate selection of rice genotypes for
specific functional and nutritional outcomes such as resistant-retrograded starch and
high-amylose, non-sticky rices.
Introduction
Resistant starch is a major contributor to starch quality with a significant impact on human
health (Sajilata et al., 2006). The incomplete digestion-absorption of resistant starch in the
small intestine leads to non-digestible starch fractions with a physiological function similar to
the beneficial impact of dietary fiber in food (Asp and Björck, 1992). On the other hand, the
formation of resistant starch due to retrogradation results in the hardening of cooked starch
after cooling due to re-crystallization of gelatinized starch components leading to loss of
desirable food texture during the storage of some starch containing foods (Fan and Marks,
1998). The staling of bread or the hardening of pasta or rice on refrigeration after cooking are
examples of this process.
The concept of starch retrogradation and appropriate methods to measure and score its rate in
rice have already been described (Philpot et al., 2006). It is believed there is a significant
association between retrograded and resistant starches (Sajilata et al., 2006) and so in this
study the term resistant-retrograded starch is used for rice. The in vivo digestibility and
structural features of resistant-retrograded starch with high amylose content (in maize, bean
and potato flakes) were assessed using the ileal contents of different human populations
(Faisant et al., 1993) and it was found resistant starch consisted mainly of retrograded
amylose with a degree of polymerization of approximately 35 glucose units and a melting
temperature of 150 °C. Retrograded amylose in peas, maize, wheat, and potatoes was found
to be highly resistant to amylolysis and digestion (Ring et al., 1988). The factors which might
have a direct or indirect influence on the rate of retrogradation, firmness and resilience of rice
starch after cooking are amylose, protein and lipid contents (Philpot et al., 2006). Highly
retrograded cooked rice has a low hydrolysis index (HI) and glycemic index (Hu et al., 2004)
while waxy and low amylose rice shows more rapid and complete hydrolysis (Chung et al.,
2006; Hu et al., 2004).
The Rapid Visco Analyser (RVA) has been widely used to measure grain quality parameters
(Limpisut and Jindal, 2002). Hu et al. (2004) reported that the high amylose rice cultivars are
normally characterized by low RVA parameters, such as peak viscosity (PKV), hot paste
viscosity (HPV) and cool paste viscosity (CPV), with higher resistant starch (RS) content and
lower estimated glycemic score (EGS). Yan et al. (2010) analysed the association of 17
starch synthesis-related genes with the rapid visco analyzer (RVA) profile parameters in a
collection of 118 glutinous rice accessions using 43 gene-specific molecular markers. They
concluded that 10 of 17 starch-related genes are involved in controlling RVA profile
parameters. The association analysis revealed that pullulanase plays an important role in
control of peak viscosity (PKV), hot paste viscosity (HPV), cool paste viscosity (CPV),
breakdown viscosity (BDV), peak time (PeT), and paste temperature (PaT) in glutinous rice.
Alleles associated with starch quality have been characterized. Granule bound starch synthase
(GBSSI) is the most important gene involved in starch synthesis in rice and other cereal
grains. A number of SNPs, one at the intron/exon 1 junction site, exon 6 and 10 in rice
GBSSI (waxy gene) with significant impact on starch quality have been characterized (Chen
et al., 2008a, b; Cai et al., 1998; Larkin and Park, 2003). Starch synthase IIa (SSIIa) is also
known to have a major effect on starch quality and is exclusively expressed in the endosperm
at very high levels. SSIIa affects the amylopectin structure of starch (Craig et al., 1998;
Morell, 2003). The influence of this gene on cooking quality and starch texture has been
studied extensively (Umemoto et al., 2008; Umemoto et al., 2004). Umemoto and Aoki
(2005) explained the alkali disintegration and eating quality of rice starch by polymorphism
of two SNPs, [A/G] and [GC/TT], in SSIIa. These SNPs within exon 8 of SSIIa are also
significantly associated with gelatinisation temperature (GT) (Waters et al., 2006).
Although the effect of many starch-related genes on grain quality has been widely studied,
little is known about how polymorphisms in starch-related genes influence starch quality
parameters, except for those reported for GBSSI and SSIIa. In fact, most studies have focused
on comparison of gene-deficient mutants (Fujita et al., 2006) at the gene level rather than the
DNA sequence level, probably due to lack of high-throughput technologies to discover new
variants in widely diverse populations. Emergence of new technologies such as next
generation sequencing and multiplexed-MALDI-TOF technologies has removed the
limitations of traditional sequencing and genotyping methods and improved the efficiency of
SNP-trait analysis in plants.
The Glucose-6-phosphate translocator (GPT) was first isolated from plastid envelope
membranes of maize (Zea mays) endosperm (Kammerer et al., 1998). GPT is a key enzyme
found early in the starch biosynthesis pathway and controls the production of precursors for
starch and fatty acid biosynthesis. Plant genomes normally contain two functional
homologous GPT genes, GPT1 and GPT2, both of which have glucose 6-phosphate
translocator activity in the plastids of non-green tissues and can import carbon in the form of
glucose 6-phosphate. Mutations in the GPT genes of Arabidopsis disrupt starch
synthesis and the oxidative pentose phosphate cycle, which in turn affects fatty
acid biosynthesis and oil accumulation (Niewiadomski et al., 2005; Wakao et al., 2008).
Sequencing 17 starch-related genes in rice using a long-range PCR protocol combined with
massively parallel sequencing discovered a number of novel SNPs in the GPT1 gene
indicating that this gene potentially has an influence on rice starch quality properties. This
study reports a novel SNP in the rice glucose-6-phosphate translocator 1 (GPT1) gene
closely associated with resistant-retrograded starch and amylose content and identifies an
allelic combination with the waxy gene which explains most of the variability in
retrogradation degree and amylose content in rice.
Materials and methods
Plant materials
A population of 233 F6 lines from the Australian temperate (japonica-type) rice breeding
program was supplied by Industry and Investment NSW, Yanco Agricultural Research
Institute, Australia. Selections for the capacity to flower and set seed and the morphological
traits of plant height, grain size and shape had been made on this population. No selection had
taken place for quality traits.
Physiochemical properties
Thirteen physiochemical traits, comprising four phenotypic traits and RVA characteristics, were
measured. The phenotypic traits, namely apparent amylose content (AC), gelatinization
temperature (GT), grain chalkiness and retrogradation rate (scored by the Martin test; Philpot
et al., 2006), were quantified according to standard methods.
RVA characteristics such as peak viscosity (PKV), trough viscosity (TV), final viscosity
(FV), breakdown, setback, peak time (PKT) and pasting temperature (PT) were measured by
a Rapid Visco Analyser (Perten RVA 4500, Segeltorp, Sweden) according to the
manufacturer’s instructions.
Designation of starch-synthesis genes involved in starch metabolism
The available literature was used to identify the most likely candidate genes associated with
rice starch quality (Ohdan et al., 2005; Waters and Henry, 2007; Nakamura, 2002; Hirose et
al., 2006; Rahman et al., 2000).
The general entries of nucleotide sequences (gDNA) and full-length cDNAs of gene classes
involved in starch biosynthesis were retrieved from the NCBI (http://www.ncbi.nlm.nih.gov/)
and the Rice Genome Annotation Project
(http://rice.plantbiology.msu.edu/cgi-bin/putative_function_search.pl) databases.
Amplification primers were designed based on
consensus sequence alignment of each candidate gene. SNP/Indels were identified by long
range PCR and massively parallel sequencing (Illumina® GAII) of the starch biosynthesis
genes in this population (Kharabian-Masouleh et al., 2011).
Discovery of novel SNP in GPT1 and SNP genotyping
A ´C/T´ SNP at reference position 1188 of GPT1 was found in this breeding population,
which changes an amino acid from Leu to Phe (Leu42Phe). The position and function of a
SNP at the boundary of intron/exon1 of GBSSI (waxy gene) has been well characterized (Cai
et al., 1998; Isshiki et al., 1998). A specific multiplexed mass spectrometry assay
(Sequenom® MassARRAY) was designed for simultaneous genotype analysis of each of
these SNPs according to Masouleh et al. (2009), with modified capture and
extension primer sequences (Table 1).
Association analysis
SNP data and phenotypic data were analysed in TASSEL v2.1 (Bradbury et al., 2007)
to identify SNPs associated with physiochemical properties. The input genotypic and
phenotypic files were prepared according to Bradbury et al. (2007) and then imported into the
software. The general linear model (GLM) was used for the association analysis, with 1000
permutations.
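As a rough illustration of the permutation step, the sketch below computes the R2 of a single-SNP one-way model and an empirical p-value from 1000 shuffles of the trait values. It is not TASSEL's implementation: the function names, the toy genotype and trait values, and the (hits + 1)/(n + 1) convention for the empirical p-value are all assumptions made for this example.

```python
import random


def r_squared(genotypes, values):
    """Share of trait variance explained by genotype class (one-way model)."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    by_class = {}
    for g, v in zip(genotypes, values):
        by_class.setdefault(g, []).append(v)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_class.values()
    )
    return ss_between / ss_total


def permutation_p(genotypes, values, n_perm=1000, seed=1):
    """Observed R2 and an empirical p-value from shuffling the trait values."""
    rng = random.Random(seed)
    observed = r_squared(genotypes, values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Count shuffles that explain at least as much variance as the real data.
        if r_squared(genotypes, shuffled) >= observed - 1e-12:
            hits += 1
    # Assumed convention: add one to numerator and denominator so p is never 0.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical, not taken from the thesis).
toy_genotypes = ["C"] * 6 + ["T"] * 6
toy_trait = [0.3, 0.4, 0.2, 0.5, 0.3, 0.4, 2.6, 2.8, 2.7, 2.9, 2.5, 2.7]
r2, p_adj = permutation_p(toy_genotypes, toy_trait)

# Smallest p-value this convention can report with 1000 permutations.
smallest_p = 1 / (1000 + 1)
```

Under this convention the smallest reportable value with 1000 permutations is 1/1001, about 9.99E-04, which may be why every adjusted p-value in Tables 2 and 3 takes that value.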
Results
Two SNPs were genotyped in 233 individuals. To avoid complications in the association
study, a gene by gene approach was applied. The results of the association study for each
gene were then related to the physio-chemical properties to find the combinations which
cause the highest and lowest retrogradation degree and amylose content.
Table 1. MassARRAY primers for GPT1 and GBSSI.

Primer                    Sequence 5´→3´
GPT1_GA_Ref1188ER   F     *ACGTTGGATGGCTTCGGTTTCATCTGTCTC
                    R     *ACGTTGGATGTAGTGGTGCAAGGTAGAGTG
                    E†    AAGGTAGAGTGGTCTGA
GBSSI_EXIN1         F     *ACGTTGGATGGATCGATCTGAATAAGAGGG
                    R     *ACGTTGGATGCTGCTTGTGTTGTTCTGTTG
                    E†    AGGAAGAACATCTGCAAG

*A 10-mer tag, sequence 5´-ACGTTGGATG-3´, was added to the 5´ end of each
amplification primer to avoid confusion in the mass spectrum and improve polymerase chain
reaction performance.
†Extension primer.
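The tag construction described in the Table 1 footnote can be checked mechanically. The sketch below, offered only as an illustration, confirms that each amplification primer carries the 10-mer tag and recovers the gene-specific portion; the dictionary layout and the helper name `strip_tag` are assumptions introduced here, while the sequences are copied from Table 1.

```python
TAG = "ACGTTGGATG"  # 10-mer tag given in the Table 1 footnote

# Amplification primers as listed in Table 1 (tag included).
AMPLIFICATION_PRIMERS = {
    ("GPT1_GA_Ref1188ER", "F"): "ACGTTGGATGGCTTCGGTTTCATCTGTCTC",
    ("GPT1_GA_Ref1188ER", "R"): "ACGTTGGATGTAGTGGTGCAAGGTAGAGTG",
    ("GBSSI_EXIN1", "F"): "ACGTTGGATGGATCGATCTGAATAAGAGGG",
    ("GBSSI_EXIN1", "R"): "ACGTTGGATGCTGCTTGTGTTGTTCTGTTG",
}


def strip_tag(primer, tag=TAG):
    """Return the gene-specific part of a tagged amplification primer."""
    if not primer.startswith(tag):
        raise ValueError("primer does not start with the expected tag")
    return primer[len(tag):]


# Gene-specific sequence of every amplification primer, keyed by (assay, role).
gene_specific = {key: strip_tag(seq) for key, seq in AMPLIFICATION_PRIMERS.items()}
```

The extension primers are not tagged, so they are left out of the dictionary.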
GPT1 (Glucose-6-phosphate translocator)
GPT1 is found early in the starch biosynthesis pathway (Fig 2). Theoretically, any
polymorphism in the coding regions or critical domains of genes can influence the starch
properties. GPT1 is strongly expressed in the endosperm and imports the essential carbon
substrates such as Glc6P into plastids during grain development (Fischer and Weber, 2002;
Jiang et al., 2003). A number of SNP/Indels in GPT1 and a novel non-synonymous ´C/T´
SNP at position 1188 of the gene were detected. This SNP generates two alleles that encode
either a Leu or a Phe. The results of this association study revealed a significant association
between this SNP and some physiochemical properties of rice starch. The C/T SNP showed
an association with amylose content, predicted N, difference and set back (Table 2).
However, the strongest association was found between this SNP and retrogradation degree.
The R2 value for this important starch property was 0.58. Apparent amylose content, which is
one of the other critical components of starch, has a very strong association with this SNP
with an R2 value of 0.365. The ´C´ allele in GPT1 results in the lowest degree of
retrogradation of about 0.34 while the ´T´ conversely renders the highest value of 2.74 (Fig
1a). Highly retrograded resistant starch releases glucose monomers very slowly, which is
highly desirable in human diets. Fig 1b also shows how different alleles of this gene
influence the production of amylose: the ´C´ and ´T´ alleles generate the lowest and
highest amylose contents of 17.8 and 24.7%, respectively.

[Figure 1: four panels. Panels (a) and (b), titled "GPT1 gene", plot retrogradation degree and
amylose content (%) against the GPT1 alleles C and T. Panels (c) and (d), titled "Haplotype
combinations of SNPs in GPT1 and GBSSI", plot the same two traits against GPT1:GBSSI allele
combinations.]

Figure 1. Effect of GPT1 and GBSSI SNPs on retrogradation degree and amylose content
(%). (a) Allele ´C´ in GPT1 gives the low retrogradation rate and (b) low amylose
content, whereas ´T´ produces the highest values of both traits (±SD=0.34 and 1.57,
respectively). (c) Haplotype combinations of the studied SNPs in GPT1 and GBSSI which create
high and low retrogradation degree and (d) amylose content (%) (±SD=0.34 and 1.57,
respectively).

Table 2. Physiochemical properties associated with the ´C/T´ SNP in GPT1. The R2 values show
the portion of total variation explained by the SNP in the GPT1 gene.

Trait                                  F-test     p-value    p-value adjusted†   R2
Peak viscosity (PV)                    21.1979    7.10E-06   9.99E-04            0.090
Break down (BD)                        37.1798    4.97E-09   9.99E-04            0.148
Final viscosity (FV)                   31.1074    7.32E-08   9.99E-04            0.126
Set back viscosity (SB)                83.2826    5.40E-17   9.99E-04            0.280
Retrogradation degree (Martin-test)    292.1427   7.10E-42   9.99E-04            0.577
Amylose content (AC)                   123.0474   6.97E-23   9.99E-04            0.365
Predicted N                            122.5425   8.19E-23   9.99E-04            0.364
Difference                             103.1431   4.97E-20   9.99E-04            0.325

*Values <0.01 are significant. †p-values adjusted for multiple tests with the number of
permutations set to 1000.

Figure 2. Simplified biochemical pathway of starch synthesis in cereals. GPT1 directly
affects the structural fatty acids of amylose and leads to high and low resistant-retrograded
starch with the ´T´ and ´C´ alleles, respectively.
GBSSI (Granule bound starch synthase I)
GBSSI is probably the most important gene involved in starch synthesis in rice and other
cereal grains. This association study showed a strong correlation between the WaxyIN1 (G/T)
SNP at the junction site of Exon1/Intron1 and important RVA curve characteristics such as
peak viscosity (PKV), set back and breakdown. This SNP has an influence similar to GPT1
on the physiochemical properties of rice starch (Table 3). The association study gave
significant F-values of 223.29 and 121.53 for this ´G/T´ SNP, indicating a significant
association with retrogradation degree (Martin test) and amylose content, respectively.
The R2 values were 0.66 and 0.51 for retrogradation and amylose content, respectively,
suggesting this SNP also controls these important traits and can explain a substantial portion
of the variability in this rice population. Individuals with the ´T´ allele exhibited the lowest
retrogradation degree of 0.730, whereas the ´G´ allele gave highly retrograded resistant
starch (2.60) with high amylose content. The amylose contents for the ´T´ and ´G´ alleles
were 17.5 and 24.5%, respectively (Fig 1c and 1d).
Allelic combination of SNPs in GPT1 and GBSSI
An association was detected between allelic combinations of GPT1 and GBSSI and
retrogradation degree and amylose content. The T:G GPT1:GBSSI allelic
combination produces the highest amylose content and amount of retrograded resistant starch
(Fig 1c and 1d). Conversely, the C:T allelic combination (GPT1:GBSSI) produces the lowest
retrogradation and amylose content. Other combinations of these alleles resulted in
values of 0.73-2.0 and 14.1-22.1% for retrogradation and amylose content, respectively.
Some other starch related genes such as BEI, SSI and SSIIa may also play a complementary
role.
Table 3. Physiochemical properties associated with the ´G/T´ SNP in GBSSI. The R2 values
show the portion of total variation explained by the waxy1 SNP in the GBSSI gene.

Trait                                  F-test     p-value    p-value adjusted†   R2
Peak viscosity (PV)                    34.346     8.87E-14   9.99E-04            0.231
Break down (BD)                        35.189     4.64E-14   9.99E-04            0.234
Final viscosity (FV)                   15.050     7.18E-07   9.99E-04            0.115
Set back (SB)                          76.273     3.89E-26   9.99E-04            0.398
Retrogradation degree (Martin-test)    223.294    1.29E-54   9.99E-04            0.660
Amylose content (AC)                   121.529    9.63E-37   9.99E-04            0.513
Predicted N                            121.542    9.56E-04   9.99E-04            0.513
Difference                             54.612     3.92E-04   9.99E-04            0.322

*Values <0.01 are significant. †p-values adjusted for multiple tests with the number of
permutations set to 1000.
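For a one-way model the F statistic and R2 are linked by R2 = df1·F / (df1·F + df2), where df1 and df2 are the model and error degrees of freedom, so the F-test and R2 columns of Tables 2 and 3 can be cross-checked. The sketch below does this for the retrogradation rows. The degrees of freedom are not reported in the text; the values used here are assumptions, chosen only because they reproduce the tabulated R2.

```python
def r2_from_f(f_value, df_model, df_error):
    """R2 implied by a one-way F statistic and its degrees of freedom."""
    return df_model * f_value / (df_model * f_value + df_error)


# Table 3, retrogradation degree: F = 223.294, reported R2 = 0.660.
# Assumed df: 2 (three genotype classes) and 230 (233 lines minus 3 classes).
r2_gbssi = r2_from_f(223.294, 2, 230)

# Table 2, retrogradation degree: F = 292.1427, reported R2 = 0.577.
# Assumed df: 1 (two genotype classes) and 214 error df.
r2_gpt1 = r2_from_f(292.1427, 1, 214)
```

If these assumed degrees of freedom are right, the GBSSI analysis used three genotype classes across all 233 lines, while the GPT1 analysis used two classes and somewhat fewer lines. That reading is a guess from the numbers, not something the text states.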
Discussion
Resistant starch can play an important role in human nutrition and health. Resistant starch is
digested more slowly than non-resistant starch and releases glucose slowly into the blood
stream, resulting in a low glycemic index (GI). An in vitro enzymatic starch digestion study
showed that there should be a close relationship between resistant starch, amylose content,
retrogradation and glycemic index (Hu et al., 2004). The study revealed that high amylose
rice cultivars, characterized by low major RVA parameters, such as peak viscosity, hot paste
viscosity, and cool paste viscosity, had more resistant starch content and resulted in a lower
estimated glycemic index (Hu et al., 2004). When the retrogradation degree is higher, the
starch is more resistant to digestion and the GI is lower. In this study, a significant association
was found between a ´T/C´ SNP at position 1188 of GPT1, which alters Leu42 to Phe, and the
presence of resistant-retrograded starch and high amylose content; the ´T´ and ´C´ alleles
produce a high and low retrogradation rate, respectively.
It is believed amylose content has a significant influence on retrogradation rate (Hu et al.,
2004) but some studies show that these two important starch properties might work
independently from one another.
Table 4. Allelic combinations of GPT1 and GBSSI represent different classifications of
amylose content and retrogradation degree.

Classification        GPT1   GBSSI   Retrogradation    Amylose content   *Status                   No of lines
                                     degree (M-test)   (%)                                         in each class
Group 1-High          T:T    G:G     2.840±0.73        24.45±1.63        High AC and Ret           15
Group 2-High medium   C:C    G:G     1.577±0.45        22.85±1.08        High-medium AC and Ret    1
Group 3-Low medium    C:C    G:T     1.032±0.22        20.55±1.16        Low-medium AC and Ret     5
Group 4-Low           C:C    T:T     0.679±0.35        17.5±3.12         Low AC and Ret            205

*AC=Amylose content; Ret=Retrogradation; M-test=Martin test. Values are presented as Mean±SD.
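The classification in Table 4 amounts to a lookup from the two genotypes to a class. The sketch below is a minimal illustration of that lookup; the dictionary and function names are hypothetical, and genotype pairs that do not appear in Table 4 are returned as "unclassified" rather than guessed.

```python
# Genotype pairs (GPT1, GBSSI) and classes exactly as listed in Table 4.
HAPLOTYPE_CLASSES = {
    ("T:T", "G:G"): "Group 1-High",
    ("C:C", "G:G"): "Group 2-High medium",
    ("C:C", "G:T"): "Group 3-Low medium",
    ("C:C", "T:T"): "Group 4-Low",
}

# Number of lines per class, from the last column of Table 4.
LINES_PER_CLASS = {
    "Group 1-High": 15,
    "Group 2-High medium": 1,
    "Group 3-Low medium": 5,
    "Group 4-Low": 205,
}


def classify(gpt1_genotype, gbssi_genotype):
    """Return the Table 4 class for a GPT1/GBSSI genotype pair."""
    return HAPLOTYPE_CLASSES.get((gpt1_genotype, gbssi_genotype), "unclassified")


# Total number of lines covered by the four classes.
total_classified = sum(LINES_PER_CLASS.values())
```

The four classes together cover 226 lines, so a small number of the 233 genotyped lines fall outside the combinations listed in the table.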
Panlasigui et al. (1991) showed that rice cultivars with very similar amylose content have
different digestibility and glycemic index in humans, suggesting that some other mechanism
such as retrogradation must be involved in the process. In spite of a
correlation coefficient of 0.70 between amylose content and retrogradation degree in this
study, the conclusion here is that these two traits act independently but have some
influence on each other.
Major genes such as GBSSI and SSIIa and their functional SNPs have a major influence on
amylose and amylopectin content in cereals (Nakamura et al., 2005; Umemoto et al., 2004b;
Yamamori et al., 2006). Glucose 6-phosphate/phosphate translocator (GPT1) imports carbon
resources into non-photosynthetic plastids (Kammerer et al., 1998) and it appears to be a key
gene controlling retrogradation degree in rice.
Andriotis et al. (2010) conducted an experiment to determine the importance of GPT1 in
embryo development in Arabidopsis. They reported a major influence of GPT1 on seed
development, where a strong reduction in activity of this gene resulted in abortion of the
embryo due to ultra-structural and biochemical defects including proliferation of starch
granules. It was proposed GPT1 is necessary for early embryo development because it
catalyses the import into plastids of glucose 6-phosphate as the substrate for NADPH generation via
the oxidative pentose phosphate pathway (Andriotis et al., 2010). Loss of GPT1 activity in
developing bean embryos has large effects on storage product synthesis (Rolletschek et al.,
2007). A similar loss of, or variation in, GPT1 activity caused by SNPs can
change the composition of starch, particularly the amylose content, which normally
accounts for 20-30% of starch content.
GPT1 is important in fatty acid synthesis of oilseed where oil can account for up to 30-40%
of dry matter. In Brassica species, application of exogenous Glc6P changed the activity of
GPT1, in which uptake and metabolization of Glc6P to fatty acids was altered significantly
through plastidial glycolysis (Eastmond and Rawsthorne, 2000; Hutchings et al., 2005). The
influence of GPT1 on starch retrogradation may be explained by its role in fatty acid
synthesis.
The role of fatty acids and lipids in the helical structure of amylose has long been studied
(Tester and Morrison, 1990). It is thought lipids play a structural role as a core centre
scaffold in holding together the helical architecture of amylose and it has been suggested
amylose content is correlated with lipid content (Morrison, 1988). Philpot et al. (2006)
reported removal of lipids significantly increased retrogradation rate and the firmness of rice
starch gels. They found O. sativa cv Koshihikari grown in Japan had a lower retrogradation
rate relative to O. sativa cv Koshihikari grown in Australia, despite the fact that flour from
both origins contained 18% amylose. This can be attributed to the amount of long amylose chains
complexed with lipids. Apparently, the amount of long amylose chains associated with lipid
is greater for the Japanese rice, and the higher lipid content linked to long amylose chains
explains the lower retrogradation in the Japanese rice.
It is possible then that GPT1 influences retrogradation degree via its influence on fatty acid
content rather than directly influencing amylose content. Lipids complex with long-chain
amylose, and relatively high concentrations of lipid disrupt recrystallisation, lowering the
extent of starch retrogradation.
CHAPTER 6
A high-throughput assay for rapid and simultaneous analysis of perfect
markers for important quality and agronomic traits in rice using
multiplexed MALDI-TOF Mass Spectrometry
Summary
The application of single nucleotide polymorphisms (SNPs) in plant breeding involves the
analysis of a large number of samples, and therefore requires rapid, inexpensive and highly
automated multiplex methods to genotype the sequence variants. A high-throughput
multiplexed SNP assay for eight polymorphisms which explain two agronomic and three
grain quality traits in rice was optimised. Gene fragments coding for the agronomic traits
plant height (semi-dwarf, sd-1) and blast disease resistance (Pi-ta) and the quality traits
amylose content (waxy), gelatinization temperature (alk) and fragrance (fgr) were amplified
in a multiplex polymerase chain reaction. A single base extension reaction carried out at the
polymorphism responsible for each of these phenotypes within these genes generated
extension products which were quantified by a matrix-assisted laser desorption
ionization-time of flight system. The assay detects both SNPs and indels and is co-dominant,
simultaneously detecting both homozygous and heterozygous samples in a multiplex system.
This assay analyses eight functional polymorphisms in one 5 μL reaction, demonstrating the
high-throughput and cost-effective capability of this system. At this conservative level of
multiplexing, 3072 assays can be performed in a single 384-well microtitre plate, allowing
the rapid production of valuable information for selection in rice breeding.
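The throughput figure quoted above follows from simple multiplication, sketched here for clarity. The function name is hypothetical; the inputs are the eight polymorphisms per reaction and the 384 wells per plate stated in the text.

```python
def assays_per_plate(polymorphisms_per_well, wells_per_plate):
    """Number of individual genotype assays obtained from one plate."""
    return polymorphisms_per_well * wells_per_plate


# Eight polymorphisms per 5 uL reaction, one reaction per well of a 384-well plate.
total_assays = assays_per_plate(8, 384)
```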
Introduction
Single nucleotide polymorphisms (SNPs) are the most abundant class of sequence variation
and explain the occurrence of human genetic disease (Shastry, 2002) and many important
traits in plants (Bryan et al., 2000; Kennedy et al., 2006). The high frequency of SNPs in
many plant species, including rice, where comparison of data from japonica and indica
cultivars identified one SNP every 170 bp and one indel every 540 bp (Yu et al., 2002), in
combination with their genome-wide distribution (Garg et al., 1999; Drenkard et al., 2000;
Nasu et al., 2002; Batley et al., 2003), means that they have the capacity to generate
high-resolution genetic maps (Bhattramakki et al., 2002). The capacity for high resolution means
SNP markers are an attractive tool for gene identification. When identified, causal SNPs are
the perfect markers within marker-assisted selection programs (Gupta et al., 2001; Rafalski,
2002; Batley et al., 2003).
Several techniques have been developed to assay SNPs, including SNP microarray
hybridization-based methods (Rapley and Harbron, 2004) and enzyme-based methods
including those involving the use of DNA ligase, polymerase and nuclease (McGuigan and
Ralston, 2002; Olivier, 2005; Costabile et al., 2006; Gunderson et al., 2006). Other methods,
such as Pyrosequencing (Ahmadian et al., 2000), and PCR-based approaches (Hayashi et al.,
2004) including TaqMan® (Livak, 1999) have been designed for SNP and indel detection;
however, they are generally not cost- or time-effective per sample. PCR-based markers are
preferable because they are efficient, cost-effective and require only a small quantity of
genomic DNA for genotyping, and are thus suitable at all stages of plant growth, including
early seedling stages.
An increasing number of genes controlling important traits in plants are being discovered,
and the underlying polymorphisms can be converted into perfect molecular markers. Some
recent examples of perfect markers for important traits in plants include rice fragrance
(Bradbury et al., 2005), wheat grain hardness (Morris, 2002), rice blast resistance (Kennedy
et al., 2006) and a range of other disease resistance genes (Jeong et al., 2002); however, each
of these has been a single-trait, uniplex assay. Plant breeders often track and select for more
than one trait within any one cross, and as the number of genes which control important traits
expands, the need for rapid, simple, inexpensive, reliable multiplex genotyping methods will
become more urgent (Hayashi et al., 2004).
The objective of this study was to investigate the capability of the multiplex matrix-assisted
laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry system
(Sequenom® MassARRAY®, San Diego, CA, USA) as a high-throughput platform for the
rapid, simultaneous and robust multiplex assay of SNPs responsible for important agronomic
and grain quality traits in rice. In this article, an assay for distinguishing between eight
different important polymorphisms simultaneously in a single 5 μL reaction is reported.
Materials and methods
Genotypes
All plant material was supplied by the Australian Plant DNA Bank
(http://www.biobank.com). Twenty-five commercial rice cultivars were analysed: Amaroo,
Amber, Basmati 370, BL24, Calrose, Calmochi 202, Dawn, Della, Dellmont, Domsorkh,
Doongara, Dragon Eye Ball, Goolarah, Jarrah, Jasmine, Kyeema, Khao Dawk Mali 105,
L202, Langi, Millin, M7, Nipponbare, Opus, Teqing and YRF204.
DNA extraction
Total plant DNA was extracted from individual seedlings at 10 days after germination using a
Qiagen (Valencia, CA, USA) DNeasy Plant Kit, according to the manufacturer’s instructions.
Primer design/generation of SNP markers
Capture and extension primers were designed by Sequenom® MassARRAY® Assay design
3.1 software, with the exception of the sd-1DEL primers which were designed by Primer3
(http://frodo.wi.mit.edu). The optimal amplicon size containing the polymorphic site in the
software was set to 80–120 bp. A 10-mer tag (5′-ACGTTGGATG-3′) was added to the 5′ end
of each amplification primer to avoid confusion in the mass spectrum and to improve PCR
performance.
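The tagging step can be illustrated with a short sketch. The tag and the sd-1SNP capture primers are taken from the text and Table 1; the helper name `tag_primer` is hypothetical, and sequences are assumed to be written 5′ to 3′:

```python
TAG = "ACGTTGGATG"  # 10-mer tag given in the text


def tag_primer(primer: str, tag: str = TAG) -> str:
    """Return the primer with the tag attached to its 5' end."""
    seq = primer.replace(" ", "").upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError(f"not a plain DNA sequence: {primer!r}")
    return tag + seq


# sd-1SNP capture primers as listed in Table 1 (without the tag).
forward = tag_primer("CGATGTTGATGACCATGGCG")
reverse = tag_primer("CATCCTCCTCCAGGACGAC")
print(forward, len(forward))
print(reverse, len(reverse))
```

Every capture primer in Table 1 marked with an asterisk would be extended in the same way before synthesis.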
Capture PCR protocol
Platinum® Taq DNA Polymerase (Invitrogen, Carlsbad, CA, USA) in a final volume of 5 μL
was used for all capture PCRs. The eight-plex reaction was optimized by testing a number of
capture primer and MgCl2 concentrations in the ranges 0.2–1 μM and 1–3.5 mM,
respectively. Uniplex assays using identical PCR conditions confirmed the results of all
eight-plex experiments. The optimal eight-plex capture PCR consisted of 3–5 ng of template
DNA, 0.5 μL of 10 × PCR buffer (Invitrogen), 3 mM MgCl2, 2.5 mM of each deoxynucleoside
triphosphate (dNTP), 5 μM of each primer and 1 unit of Taq polymerase (5 U/μL). The
reactions were heated to 94 °C for 15 min, followed by 45 cycles of amplification at 94 °C
for 20 s, 56 °C for 30 s and 72 °C for 1 min, followed by a final extension at 72 °C for 3 min.
As the sd-1Del deletion is relatively large, the amplification protocol was modified as follows: 3.75
μL of 10 × PCR buffer (50 mM), 2.25 μL of MgCl2 (50 mM), 2.1 μL of 10 μM primers
(each), 6 μL of dNTPs (2.5 mM), 12 μL of 2 × Enhancer [6% glycerol + 10% dimethyl
sulphoxide (DMSO)], 0.3 μL of Platinum® Taq polymerase (5 U/μL) and 1.5 μL of template.
The thermocycling program was 94 °C for 5 min, followed by 45 cycles of amplification at
94 °C for 30 s, 55 °C for 30 s and 72 °C for 1 min, followed by a final extension of 72 °C for
3 min. Finally, 1 μL of PCR product was added to the multiplex test tubes.
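For planning purposes, the programmed hold time of the two thermocycling protocols above can be estimated as follows. This is a rough sketch that sums the stated hold times only; ramping between temperatures is instrument-dependent and is ignored, so real run times will be longer. The function name is illustrative.

```python
def program_seconds(initial_s, cycle_steps_s, cycles, final_s):
    """Sum the programmed hold times of a PCR protocol, in seconds."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s


# Eight-plex capture PCR: 94 C 15 min; 45 x (20 s, 30 s, 1 min); 72 C 3 min.
eight_plex_s = program_seconds(15 * 60, [20, 30, 60], 45, 3 * 60)

# Modified sd-1Del PCR: 94 C 5 min; 45 x (30 s, 30 s, 1 min); 72 C 3 min.
sd1_del_s = program_seconds(5 * 60, [30, 30, 60], 45, 3 * 60)

print(eight_plex_s / 60)  # 100.5 minutes of hold time
print(sd1_del_s / 60)     # 98.0 minutes of hold time
```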
Shrimp alkaline phosphatase (SAP) incubation
Unincorporated dNTPs were removed by SAP incubation according to the manufacturer’s
(Sequenom, San Diego, CA, USA) instructions.
Table 1. MassARRAY markers for eight different functional polymorphisms

Polymorphism  Trait                Capture primers            Extension primer           Expected polymorphism
sd-1SNP       Semi-dwarf           F:*CGATGTTGATGACCATGGCG    AGGACGACGTCGGCGGC          [C/T]
                                   R:*CATCCTCCTCCAGGACGAC
sd-1Del       Semi-dwarf**         F:*CACGCACGGGTTCTTCCAG     GCGACAGCTCCTTCATCTCCTCGC   [C/T/A]
                                   R:*AGGAGTTCCATGATCGTCAG
Pi-ta         Blast resistance     F:*GCTTCTTTCTTTCTCTGCCG    AAGTCAGGTTGAAGATGCATAG     [G/T]
                                   R:*CAAACAATCATCAAGTCAGG
waxyIN1       Amylose content      F:*GATCGATCTGAATAAGAGGG    CAGGAAGAACATCTGCAAG        [G/T]
                                   R:*CTGCTTGTGTTGTTCTGTTG
waxyEX6       Amylose content      F:*ACCTCAACAACAACCCATAC    CCCATACTTCAAAGGAACTT       [C/A]
                                   R:*GATCATCATGGATTCCTTCG
alk3          Gelatinization temp  F:*TGTCCTCGAACGGGTCGAAC    CTTCTGCGGGCTGAGGGACACC     [A/G]
                                   R:*CTCAACCAGCTCTACGCCAT
alk4          Gelatinization temp  F:*TGACAAGGACCTCCTCGTAG    AAGGAGAGCTGGAGGGG          [GC/TT]
                                   R:*CGCAAGTACAAGGAGAGCTG
fgr           Fragrance            F:*ACCTCAACAACAACCCATAC    TGGGAGTTATGAAACTGGTA       [TATAT/AAAAGATTATGGC]
                                   R:*GTTAGGTTGCATTTACTGGG

* A 10-mer tag, sequence 5′-ACGTTGGATG-3′, was added to the 5′ end of each amplification primer to avoid confusion in the mass spectrum and improve PCR
performance. ** A modified method was applied to amplify this allele.
Primer extension and mass spectrometry
The remaining assay steps of primer extension, resin cleanup and mass spectrometry were
undertaken according to the manufacturer’s (Sequenom® MassARRAY®) instructions.
Results
Analysis of PCR products
Assays were constructed for eight polymorphisms defining each of the alleles of five genes
controlling five important commercial traits. The traits and genes were semi-dwarf (sd-1, two
alleles) (Sasaki et al., 2002; Spielmeyer et al., 2002), blast disease resistance (Pi-ta, one allele)
(Bryan et al., 2000), amylose content (waxy, two SNPs) (Cai et al., 1998; Larkin and Park, 1999;
Chen et al., 2008), gelatinization temperature (alk, two SNPs) (Umemoto, 2005; Waters et al.,
2006) and fragrance (fgr, one allele) (Bradbury et al., 2005) (Table 1).
Optimal capture primer concentration
The optimal primer concentration for the amplification of each target polymorphism in uniplex and
eight-plex was 0.3 μM. Polymorphism detection at eight-plex was consistent with uniplex data.
Increasing the uniplex primer concentration to 0.5 μM led to PCR products of higher concentration,
except for waxyIN1, in which there was nonspecific amplification at this concentration. The
concentrations of PCR products, as measured by a Bioanalyser 2100 (Agilent Technologies, Palo
Alto, CA, USA) DNA 500 LabChip® Kit, ranged from 7.8 ng/μL (sd-1 SNP) to 12.2 ng/μL (alk4)
in uniplex (Figure 1a,b), and were relatively lower in eight-plex, ranging from 6.40 ng/μL (sd-1
SNP) to 11.21 ng/μL (alk4), which was sufficient to produce an excellent mass spectrum (Figure
1c,d).
Figure 1. Concentration of PCR products in uni-plex and 8-plex: (a) Concentration of sd-1SNP = 7.8 ng/µl
(major peak); minor peaks correspond to the size standard. (b) Concentration of alk4 = 12.2 ng/µl (major peak); minor peaks
correspond to the size standard. (c) Concentration of PCR products in 8-plex. (d) Concentration of PCR products
in an 8-plex which has been analysed individually (all in ng/µl).
MgCl2 concentration
MgCl2 concentration is one of the most important factors for accurate concurrent amplification of
different loci in a multiplex system. The optimal concentration for the amplification of all loci in
uniplex and multiplex was 3 mM. At this MgCl2 concentration, all target loci were amplified free
from nonspecific amplicons and primer dimers. At lower MgCl2 concentrations of 2 and 2.5 mM,
no target DNA was amplified and there were a surprising number of nonspecific bands and primer
dimers. At concentrations higher than 3 mM, nonspecific bands were present in addition to the
target loci. These results were consistent and reproducible in both uniplex and eight-plex.
Identification of SNPs and polymorphisms in agronomic and quality loci
All eight loci were amplified in 25 cultivars and genotyped by multiplex MALDI-TOF analysis of
single-base extension products, and the polymorphisms were compared (Table 2). Of these, three
were responsible for important agronomic traits and five for grain quality traits, including six
nucleotide substitutions and two insertions/deletions (indels). Polymorphisms were distinguished at
all agronomic and quality loci, as described below.
sd-1
The semi-dwarf phenotype is caused by a loss of function of the enzyme gibberellin 20-oxidase
(GA 20-oxidase). Plants carrying the non-functional form of the gene, sd-1, have a diminished
capacity to produce gibberellin, resulting in a reduced plant height and enhanced grain yield. Two
alleles of sd-1 were assayed. One sd-1 allele, here called sd-1SNP, contains a C/T SNP in exon 2 of
the gene (CTC = leucine/TTC = phenylalanine), which does not affect phenotype as it causes a
synonymous mutation (Spielmeyer et al., 2002; Monna et al., 2002). The other allele, here called
sd-1Del, is characterized by a 280-bp (Spielmeyer et al., 2002) or 278-bp (Sasaki et al., 2002)
deletion of part of exon 1 and exon 2 and 102–105 bp of the intron sequence, a 380–383-bp deletion
in total (Figure 2).
Table 2. Polymorphisms (SNP) in 25 commercial rice cultivars at eight different functional loci.

Cultivar            sd-1SNP  sd-1Del  Pi-ta  waxyIN1  waxyEX6  alk3  alk4  fgr      Semi-dwarf/Tall
Amaroo              T        A        T      T        A        G     TT    AAAGATT  Semi-dwarf
Amber               C        A        T      G        C        G     GC    TATAT    Semi-dwarf
Basmati 370         C        A        T      G        C        G     GC    TATAT    Semi-dwarf
BL24                C        T        G      G        A        G     GC    AAAGATT  Tall
Calrose             C        A        T      T        A        G     TT    AAAGATT  Semi-dwarf
Calmochi 202        T        A        T      T        A        G     TT    AAAGATT  Semi-dwarf
Dawn                C        A        T      G        C        G     GC    AAAGATT  Semi-dwarf
Della               C        A        T      G        C        G     GC    TATAT    Semi-dwarf
Dellmont            C        T        T      G        C        G     GC    TATAT    Tall
Domsorkh            C        A        T      G        C        G     GC    TATAT    Semi-dwarf
Doongara            C        T        T      G        C        G     GC    AAAGATT  Tall
Dragon Eye Ball     C        A        T      T        A        A     GC    TATAT    Semi-dwarf
Goolarah            C        A        T      T        A        G     GC    TATAT    Semi-dwarf
Jarrah              T        A        T      T        A        G     TT    AAAGATT  Semi-dwarf
Jasmine             T        T        T      T        A        G     TT    TATAT    Tall
Kyeema              C        A        T      T        A        G     GC    TATAT    Semi-dwarf
Khao Dawk Mali 105  C        A        T      T        A        G     TT    TATA     Semi-dwarf
L 202               C        T        T      T        A        G     GC    AAAGATT  Tall
Langi               T        A        T      T        A        G     GC    AAAGATT  Semi-dwarf
Millin              T        A        T      T        A        A     GC    AAAGATT  Semi-dwarf
M7                  C        T        T      T        A        G     GC    AAAGATT  Tall
Nipponbare          C        A        T      T        A        A     GC    AAAGATT  Semi-dwarf
Opus                T        A        T      T        A        A     GC    AAAGATT  Semi-dwarf
Teqing              C        T        G      G        A        G     GC    AAAGATT  Tall
YRF 204             C        T        T      T        A        G     GC    TATAT    Tall
Figure 2. Determination of the sd-1Del allele on a 2% agarose gel. Fragments of around 300 bp
indicate the 383-bp deletion in the sd-1 gene which is responsible for the semi-dwarf
phenotype. Fragments of approximately 700 bp are the intact sd-1 gene of tall plants.
Lanes from left to right: 100-bp ladder, negative control, and rice varieties
Nipponbare, Kyeema, Doongara, Amaroo, BL24, Della and Domsorkh.
Although a large deletion, such as sd-1Del, can be determined by the size difference of
amplification products on a simple 2% agarose gel (Figure 2), the suitability of MALDI-TOF
for the identification of large indels was assessed. In a single-base extension reaction, only one base
(a terminator) is added at the SNP site immediately downstream of the extension primer. Accurate gene
sequence information, particularly for the flanking regions just before and after the indel, is
therefore necessary, because single-base extension recognizes one base either inside or outside
the indel. Theoretically, the indel can be determined by the ddNTP which terminates the
extension reaction (Figure 3). However, when using the assay designed by Sequenom®
(MassARRAY® Assay design 3.1) in both uniplex and eight-plex, no logical genotype call was obtained
and all genotypes showed ‘A’, which corresponds to the sd-1Del allele. Modification of the
method substantially improved the analysis of this allele, reducing missing data from 43.7% to 5%
(Table 3). The modification involved amplification of the region containing the deletion in
uniplex using PCR primers designed by Primer 3 (http://frodo.wi.mit.edu), followed by the
addition of these uniplex amplicons to the other loci which had been amplified in seven-plex
for all subsequent manipulations.
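The principle of calling a large deletion by single-base extension can be sketched with a toy model. The sequences below are invented for illustration and are not the real sd-1 sequence; the function simply reports the base that follows the extension primer on the strand being read, which differs depending on whether the deleted segment is present.

```python
from typing import Optional


def extended_base(template: str, primer: str) -> Optional[str]:
    """Return the single base added after the primer, or None if no call."""
    start = template.find(primer)
    if start < 0:
        return None  # primer does not anneal
    end = start + len(primer)
    return template[end] if end < len(template) else None


# Hypothetical sequences: the primer sits immediately upstream of the breakpoint.
PRIMER = "GGCATCGT"
DELETED_SEGMENT = "TACCGGTTAGC"  # present only in the intact allele
DOWNSTREAM = "ACTGA"             # first base read when the segment is absent

intact_allele = PRIMER + DELETED_SEGMENT + DOWNSTREAM
deletion_allele = PRIMER + DOWNSTREAM

CALLS = {"T": "intact allele (tall)", "A": "deletion allele (semi-dwarf)"}
for name, template in [("intact", intact_allele), ("deleted", deletion_allele)]:
    base = extended_base(template, PRIMER)
    print(name, base, CALLS.get(base, "no call"))
```

The sketch also shows why accurate flanking sequence matters: if the assumed breakpoint is off by even one base, the expected terminator for each allele changes.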
Figure 3. Determination of the sd-1Del allele by MALDI-TOF. There is a 383-bp deletion in
semi-dwarf plants; the extended single base (mass-modified terminator) therefore matches the ‘C’
or ‘A’ located just after the deletion, whereas tall plants give a ‘T’ peak.
Pi-ta
Pi-ta is a major blast resistance gene in rice. Pi-ta encodes a 928-amino-acid polypeptide
with a molecular mass of 105 kDa. A [G/T] SNP distinguishes susceptible and resistant
genotypes (Bryan et al., 2000); amino acid 918 differs between resistant and susceptible
genotypes: all susceptible genotypes have a serine (T) at this position, whereas resistant
plants have alanine (G). Most of the cultivars in this study carried the ‘T’ allele that translates
to serine (susceptible), whereas BL24 and Teqing contained the resistant ‘G’ allele (alanine).
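The relationship between the Pi-ta allele, the residue at position 918 and the phenotype can be written as a small lookup. This is a sketch under stated assumptions: the source gives only serine (T) and alanine (G), so the code assumes the SNP is the first base of the codon as read on the coding strand, and the remaining two codon bases are a hypothetical placeholder. In the standard genetic code every TCN codon encodes serine and every GCN codon encodes alanine, so the placeholder does not affect the result.

```python
# Standard genetic code, restricted to the two codon families needed here.
CODON_TO_AA = {}
for third_base in "ACGT":
    CODON_TO_AA["TC" + third_base] = "Ser"
    CODON_TO_AA["GC" + third_base] = "Ala"

PHENOTYPE = {"Ser": "susceptible", "Ala": "resistant"}


def pita_call(allele: str, rest_of_codon: str = "CT"):
    """Translate the Pi-ta [G/T] allele into residue 918 and a blast phenotype.

    rest_of_codon is a hypothetical placeholder; any TCN/GCN pair behaves alike.
    """
    amino_acid = CODON_TO_AA[allele.upper() + rest_of_codon]
    return amino_acid, PHENOTYPE[amino_acid]


# Alleles for three cultivars as listed in Table 2.
for cultivar, allele in [("Nipponbare", "T"), ("BL24", "G"), ("Teqing", "G")]:
    print(cultivar, allele, *pita_call(allele))
```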
waxy
The waxy gene encodes the enzyme granule-bound starch synthase, which is one of the key
factors influencing rice starch quality by affecting apparent amylose content (Sano, 1984;
Webb, 1991; Chen et al., 2008). The [G/T] SNP at the intron 1/exon 1 splice site (waxyIN1)
differentiates between varieties of high and low amylose content (Cai et al., 1998) and, in
combination with the exon 6 [C/A] SNP (waxyEX6), differentiates between varieties of high,
intermediate and low amylose content in southern US germplasm (Chen et al., 2008).
Cultivars with ‘T’ in waxyIN1 and ‘A’ in waxyEX6 have the lowest amylose content or even
glutinous starch. High polymorphism was found at waxyIN1 in the studied cultivars: Amber,
Basmati 370, BL24, Dawn, Della, Dellmont, Domsorkh, Doongara and Teqing contained
the ‘G’ allele, whereas the remaining cultivars, including Jasmine, Nipponbare, Langi and M7, carried the ‘T’ allele (Figure 4). At
waxyEX6, 18 of the 25 cultivars displayed ‘A’, which suggests low amylose content.
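The allele counts quoted for the waxy gene can be checked against Table 2. The sketch below encodes the waxyIN1 and waxyEX6 columns in the cultivar order of that table (Amaroo first, YRF 204 last) and tallies single-locus alleles and two-locus haplotypes; the variable names are illustrative.

```python
from collections import Counter

# Alleles in Table 2 order, one character per cultivar (25 cultivars).
waxy_in1 = "T" + "GGG" + "TT" + "GGGGG" + "T" * 12 + "G" + "T"
waxy_ex6 = "A" + "CC" + "AAA" + "CCCCC" + "A" * 14

assert len(waxy_in1) == len(waxy_ex6) == 25

in1_counts = Counter(waxy_in1)
ex6_counts = Counter(waxy_ex6)
haplotypes = Counter(zip(waxy_in1, waxy_ex6))

print(in1_counts)   # alleles at the intron 1 splice site
print(ex6_counts)   # alleles at exon 6
print(haplotypes)   # (waxyIN1, waxyEX6) combinations
```

In this data set every cultivar carrying ‘T’ at waxyIN1 also carries ‘A’ at waxyEX6, the combination the text associates with the lowest amylose content.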
alk
The major gene regulating alkali disintegration in rice grains, alk (Gao et al., 2003), encodes
the enzyme starch synthase IIa (Umemoto et al., 2004). Alkali disintegration is a convenient
indirect measure of the gelatinization temperature of rice starch, which is, in turn, associated
with rice cooking and eating quality. Two polymorphisms within exon 8 of alk, [A/G] (alk3)
and [GC/TT] (alk4), are associated with gelatinization temperature class (Umemoto, 2005;
Waters et al., 2006).
Figure 4. Sequenom® MassARRAY® waxyIN1 uni-plex spectrum for cv. Langi, which shows
a peak for ‘T’.
Figure 5. An 8-plex Sequenom® MassARRAY® spectrum for cv. Langi.
A combination of alk3 ‘G’ and alk4 ‘GC’ is found within varieties of high gelatinization
temperature and low alkali spreading, whereas varieties with either alk3 ‘A’ or alk4 ‘TT’ are
low gelatinization temperature varieties. Both the [GC/TT] (alk4) and [A/G] (alk3) polymorphisms
were determined in all cultivars.
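The haplotype rule described above translates directly into a small classifier. This is a minimal sketch of the stated rule only; the function name is illustrative, and heterozygous or missing calls are deliberately rejected rather than guessed.

```python
def gelatinization_class(alk3: str, alk4: str) -> str:
    """Classify gelatinization temperature from the alk3 and alk4 alleles."""
    alk3, alk4 = alk3.upper(), alk4.upper()
    if alk3 not in {"A", "G"} or alk4 not in {"GC", "TT"}:
        raise ValueError(f"unexpected genotype: alk3={alk3!r}, alk4={alk4!r}")
    if alk3 == "G" and alk4 == "GC":
        return "high"
    return "low"  # either alk3 'A' or alk4 'TT'


# Genotypes for three cultivars as listed in Table 2.
for cultivar, alk3, alk4 in [("Doongara", "G", "GC"),
                             ("Amaroo", "G", "TT"),
                             ("Nipponbare", "A", "GC")]:
    print(cultivar, gelatinization_class(alk3, alk4))
```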
fgr
A recessive gene (fgr) on chromosome 8 controls rice fragrance. The intact Fgr allele
encodes a betaine aldehyde dehydrogenase (BADH2) in non-fragrant rice, whereas fragrant
rice contains an 8-bp deletion and three SNPs which prematurely terminate the translation of
BADH2. This changes the biosynthetic pathways in which BADH2 is active, resulting in
the accumulation of 2-acetyl-1-pyrroline, which is responsible for fragrance (Bradbury et al.,
2005). The eight-plex assay identified 11 varieties with the fragrant allele fgr.
Missing data and heterozygosity
The highest rate of missing data belonged to sd-1Del in eight-plex, which suggests that this
allele is not compatible with the multiplex system (Table 3). No missing data were found in
waxyIN1, Pi-ta, alk4 and fgr. The apparent heterozygosity values were 3.9% and 3.1% in sd-1SNP and alk4, respectively.
Discussion
This report demonstrates that DNA polymorphisms can be efficiently confirmed and analysed
in rice using a MALDI-TOF mass spectrometry system (Ding and Cantor, 2003). These
assays can be used as a marker-assisted selection tool in conventional breeding programs.
Rice has been at the forefront of the application of genomics and genomics tools to plant
breeding and serves as a model for other crops. A whole rice genome sequence has been
available for several years (Goff et al., 2002; Yu et al., 2002), and a comprehensive DNA
polymorphism database has recently become available online (http://irfgc.irri.org/index.php).
The availability of these resources has accelerated the rate at which gene function has been
elucidated. Emerging DNA sequencing technologies are revolutionizing the field of
genomics, bringing the reality of relatively inexpensive comparative genome sequencing of
all the major crops much closer. MALDI-TOF mass spectrometry, in combination with
comparative genome sequence data, will become increasingly useful in marker-assisted
breeding as more genes that control important traits are identified. An efficient PCR is the
most important predictor for producing a reliable and consistent assay on this platform
(Figure 5). The uniform simultaneous amplification of all loci will resolve the most
commonly encountered problems (Siebert and Larrick, 1992). The number and intensity of
correct SNP calls are increased with higher PCR product concentrations.

Table 3. Percentage of missing data in uni-plex and 8-plex and apparent heterozygosity in 8-plex

Assays/SNPs              Plex level                  sd-1SNP  sd-1Del  Pi-ta  waxyIN1  waxyEX6  alk3  alk4  fgr
Missing data             uni-plex *                  0%       8%       0%     0%       0%       1.1%  0%    0%
Missing data             8-plex *                    4.5%     43.7%    0%     0%       4.2%     3.1%  0%    0%
Missing data             modified 8-plex †           4.5%     5%       0%     0%       4.2%     3.1%  0%    0%
Apparent heterozygosity  8-plex and modified 8-plex  3.9%     0%       0%     0%       0%       0%    3.1%  0%

* Assays designed with Sequenom® MassARRAY® Assay design 3.1
† Assays designed with Sequenom® MassARRAY® Assay design 3.1 except for sd-1Del where PCR primers
were designed by Primer 3, sd-1Del amplified in uni-plex and extended and analysed in 8-plex.

The minimum concentration of PCR product is 4 ng/μL for loci that fall within the default size range of 80–
120 bp; however, longer PCR products require a higher concentration as measured by mass to
maintain the molar concentration at acceptable levels for iPlex extension reactions. The
concentration of PCR products differs between uniplex and eight-plex systems, which may
have an effect on peak height calls. These differences are a result of competition between
the individual PCRs in multiplex, and show 5.7%–17.8% reductions in the final eight-plex PCR assay
compared with the uniplex assay.
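The point about longer amplicons needing more mass to reach the same molar concentration can be made concrete. The sketch below assumes an average mass of 660 g/mol per base pair of double-stranded DNA, a common approximation that is not taken from the source; the 700-bp example length is the approximate size of the intact sd-1 fragment in Figure 2, and the function names are illustrative.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation, assumed)


def nanomolar(ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molar concentration (nM)."""
    return ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def ng_per_ul_needed(target_nm: float, length_bp: int) -> float:
    """Mass concentration (ng/uL) required to reach a target molarity (nM)."""
    return target_nm * AVG_BP_MASS * length_bp / 1e6


# Molarity of the 4 ng/uL minimum for a 100-bp amplicon.
reference_nm = nanomolar(4.0, 100)
print(round(reference_nm, 1))

# Mass concentration a 700-bp amplicon would need to match that molarity.
print(round(ng_per_ul_needed(reference_nm, 700), 1))
```

Under these assumptions the required mass concentration scales linearly with amplicon length, so a product seven times longer needs seven times the ng/μL.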
Even spectral peak heights (Figure 5) are critical for accurate genotype calls using MALDI-TOF
mass spectrometry, and this is achieved by increasing the concentration of individual
extension primers, not by modifying capture PCR conditions, because the latter does not have a
significant effect on the final spectra. PCR yield is intrinsic to the PCR conditions which, when
optimized, should be adhered to; increasing the concentration of template, primer and Taq
enzyme above the recommended concentration may increase yield in uniplex; however, in
multiplex, it may lead to the generation of dimers and spurious PCR products.
Accurate DNA sequence data for each polymorphism represent the most important
prerequisite for accurate assay design. However, public domain databases and published
papers can have conflicting data for each locus. For example, three different sequences for
sd-1Del appear in the public domain: the deletion has been reported to be 382 bp (Spielmeyer
et al., 2002) or 383 bp (Monna et al., 2002; Sasaki et al., 2002), and differs by the length of
intron and the exact location of the deletion. In cases such as this, re-sequencing the target
region is necessary for accurate primer design, which ultimately leads to an accurate,
consistent assay. The capture PCR stage is important in uniplex reactions, but it is critical in
the multiplex system because of the high rate of competition between primers consuming
templates and enzyme. Some primers worked well in uniplex, but had missing calls in
multiplex, suggesting that there were interactions between primers in eight-plex (Table 3).
For example, interactions between waxyIN1 and fgr increased the number of missing calls in
eight-plex. There was, however, a high correlation of more than 98% between uniplex and
eight-plex calls, and missing calls were around 0.15% and 1.68% (not including sd-1Del) for
uniplex and eight-plex respectively, which compares favourably with other sequencing
methods (Jones et al., 2007).
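The text does not say how the overall missing-call rates were derived. One plausible reading, shown here only as an assumption, is an unweighted mean of the per-locus missing data in Table 3 with sd-1Del excluded; this gives values close to the figures quoted above.

```python
# Percent missing data per locus, copied from Table 3.
MISSING = {
    "uni-plex": {"sd-1SNP": 0.0, "sd-1Del": 8.0, "Pi-ta": 0.0, "waxyIN1": 0.0,
                 "waxyEX6": 0.0, "alk3": 1.1, "alk4": 0.0, "fgr": 0.0},
    "8-plex": {"sd-1SNP": 4.5, "sd-1Del": 43.7, "Pi-ta": 0.0, "waxyIN1": 0.0,
               "waxyEX6": 4.2, "alk3": 3.1, "alk4": 0.0, "fgr": 0.0},
}


def mean_missing(plex: str, exclude=("sd-1Del",)) -> float:
    """Unweighted mean percent missing data across loci, skipping `exclude`."""
    values = [v for locus, v in MISSING[plex].items() if locus not in exclude]
    return sum(values) / len(values)


uniplex_mean = mean_missing("uni-plex")
eight_plex_mean = mean_missing("8-plex")
print(f"{uniplex_mean:.3f}% {eight_plex_mean:.3f}%")
```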
Multiplex MALDI-TOF is a powerful tool for the detection and confirmation of SNPs in rice.
It has been suggested that this platform has the capability of determining more than 40 SNPs
in multiplex (Sequenom, 2006) and, given that the platform can process ten 384-well plates
per day, users can theoretically analyse in excess of 153 000 SNPs daily (Perkel, 2008). This
technique can be applied to segregating populations in the early stages of breeding programs
to positively select desired polymorphisms and traits, and is a co-dominant system, having
the ability to detect alleles in hybrids, heterozygotes (Jones et al., 2007) and polyploids
(Henry et al., 2008). The capacity of the system to accurately identify haplotypes at one or
more loci, alk and waxy for example, allows for the efficient selection of target phenotypes
within breeding programs.
CHAPTER 7
General discussion - Characterisation of starch traits and genes in
Australian rice germplasm
Background principles
Starch is a carbohydrate consisting of a large number of glucose units. A significant number of
enzyme isoforms and activities contribute to starch synthesis in cereals including rice.
Therefore, a substantial number of genes are involved in the process of starch synthesis. A
simplified pathway diagram of starch bio-synthesis in Figure 1 (Chapter 1) shows how starch
is synthesised by different enzymes and genes in plant green tissues and then deposited in the
grain of cereals. Starch consists of two major components, amylose (~20-25%) and
amylopectin (~75-80%). Variation in the genes and enzymes involved in synthesis of starch
can change the composition and structure of starch considerably which can significantly
affect the quality and palatability of rice. The variations normally occur at the DNA level due
to spontaneous or induced mutations. The most abundant type of variation in all organisms
is the SNP (Bryan et al., 2000; Kennedy et al., 2006). The main hypothesis of this thesis
emerges from the fact that SNPs can change a gene, leading to alteration of the encoded enzyme, which
in turn modifies the biochemical and physicochemical properties of starch (quality). To test this
hypothesis, a diverse set of Australian rice germplasm was obtained, the variation of starch-related
genes at the SNP level was studied, and a comprehensive association study was pursued to
ascertain the effect of each gene and its alleles on starch quality.
Search of SNP databases and discovery of polymorphisms
First, a range of databases such as OryzaSNP (http://oryzasnp.plantbiology.msu.edu/) were
interrogated with BLAST (http://blast.ncbi.nlm.nih.gov/) to identify the previously reported
SNPs in the targeted genes. In total, 399 SNPs were detected in 18 starch-related genes in
database records. In contrast, sequencing 233 Australian rice breeding lines resulted in the
detection of 501 SNPs and 113 Indels, 102 more SNPs than were available in the public domain.
One of the advantages of this approach was the capacity to detect Indels, none of which were
recorded in public databases. However, all Indels detected resided in introns and therefore
had no obvious impact on gene function. Of 501 SNPs, only 75 (~14.9%) were non-synonymous, leading to amino acid changes.
This study clearly demonstrated that massively parallel sequencing (MPS) in combination with
long-range PCR (LR-PCR) allows analysis of many candidate genes and ensures high
sequence depth at all loci (Pettersson et al., 2009). There are a number of available databases
which curate DNA polymorphism data which can be converted to DNA markers. Much time,
money and intellectual energy has been expended in building and maintaining these databases
which are now being rendered redundant by MPS technologies. MPS allows markers to be
easily developed for any population of interest at reasonable cost. This means questions can
now be tailored for each population/species and answers provided with much greater
precision than was possible with marker information derived from unrelated germplasm.
The error rate of the Illumina GAIIx and bias in coverage are challenges for this method and
accuracy of data. The error rate is reportedly about 0.5-1.0% (Out et al., 2009). In this
experiment, 233 DNA samples were pooled which means the detection of one SNP (variant)
out of 233 corresponds to a SNP frequency of ~0.43% which is lower than the reported error
rate. However, this error rate corresponds to single reads and does not take into account high
coverage and the creation of a consensus sequence. The coverage at each base pair reported
in this thesis was so high, generally ranging from 12000× to 38000× and in one gene reaching
240000×, that errors could be identified and screened out by imposing a minimum SNP
requirement. This has been discussed by Out et al. (2009) who reported there are strong
correlations between allele frequencies, pool size, coverage and error rates in Illumina GAIIx
sequencing. They demonstrated coverage of 25000× would be sufficient for detection of SNP
with frequencies at or above 0.3%. High coverage ensured Illumina GAIIx platform error was
neutralised in this experiment.
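The pooling arithmetic above can be reproduced in a few lines. The sketch assumes, as the text does, that a variant carried by one line out of 233 equally represented samples appears at a frequency of 1/233; the function names are illustrative, and the expected read counts ignore sampling noise and coverage bias.

```python
POOL_SIZE = 233  # DNA samples pooled in this experiment


def pooled_frequency(carriers: int, pool_size: int = POOL_SIZE) -> float:
    """Expected allele frequency in the pool for `carriers` variant samples."""
    return carriers / pool_size


def expected_variant_reads(coverage: int, frequency: float) -> float:
    """Expected number of reads supporting a variant at a given depth."""
    return coverage * frequency


freq = pooled_frequency(1)
print(f"{freq:.2%}")  # 0.43%

# Depths spanning the coverage range reported in the text.
for coverage in (12000, 25000, 38000):
    print(coverage, round(expected_variant_reads(coverage, freq), 1))
```

Even at the lower end of the reported coverage, a variant present in a single line is expected to be supported by dozens of reads, which is why a minimum threshold can separate it from isolated read errors.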
Screening of functional SNPs
Since the emergence of high-throughput whole genome sequencing technologies (Henry,
2008), it has become difficult to recognize functional SNPs in a pool of DNA sequence data which also
contains neutral SNPs (George Priya Doss et al., 2008). Computational algorithms are useful
and cost-effective tools for analysis of SNPs and genes. Recently, SNP-linkage
disequilibrium and association studies, which need accurate phenotypic data of appropriate
populations, have gained acceptance as procedures to assess functional SNPs (Carlson et al.,
2003). However, these populations can be difficult to generate (Gupta et al., 2005), and they
must have high variation in the studied traits.
In this thesis, a computational screening pathway was developed to prioritize and rank plant
SNPs to predict their functionality and impact on plant phenotypes. This showed there are
significant numbers of important elements in the GBSSI gene, some of which have a strong
association with starch physicochemical properties (Soussi et al., 2006).
Based on computational analysis, the [C/A] SNP at exon 6 [OryzaSNP2] and the [C/T] SNPs
[OryzaSNP3] and [OryzaSNP6] at exons 9 and 10 of GBSSI have been the most
influential SNPs in this population. Larkin and Park (2003) verified that haplotypes
composed of SNPs at the exon 1/intron1 boundary site, exon 6 and exon 10 regulate GBSSI
function. Chen et al. (2008a and 2008b) have also confirmed these SNPs can alter apparent
amylose content and pasting properties of rice.
The effect of the [C/A] SNP at exon 6 on amylose content and grain quality has been
confirmed by many authors (Sano, 1984; Larkin and Park, 2003; Chen et al., 2008a). In silico
analysis with FAS-ESS has also suggested another important silencer (ESS1) at the splice site
of exon 1/intron 1 which has a [G/T] SNP [OryzaSNP1]. The significance of this SNP which
reduces amylose content was confirmed by Cai et al. (1997) and in the association study
(Chapter 4). With the advent of whole genome sequencing, this suggests a computational
analysis of whole genome data is a means by which identification of important
polymorphisms can be accelerated.
Gene copy number in the rice genome
In polyploid species which have two or more genomes such as wheat and brassica, there are
as many copies of each gene as there are haploid genomes. Tetraploids have at least two
copies of each gene while hexaploids have three copies and so on. Rice is a diploid species
with mostly one copy of each gene. Some genes exist as gene families such as the starch
synthases. The rice genome has been fully sequenced and annotated, and an explicit
list of genes is available. In this study, there were 17 different genes, some of which were
similar in sequence, such as the starch synthase genes (SSI, SSIIa, SSIIb, etc.).
However, this gene family has been extensively characterised at both the gene and enzyme
level and so each member of the gene family is now uniquely identified. In addition, these
genes are located on different chromosomes, a means by which they are further separated.
Multiplexed MALDI-TOF Mass Spectrometry markers help to genotype individuals in
a cost effective manner
Multiplex MALDI-TOF is a powerful tool for the detection and confirmation of SNPs in rice.
It has been suggested that this platform has the capability of determining more than 40 SNPs
in multiplex (Sequenom, 2006) and, given that the platform can process ten 384-well plates
per day, users can theoretically analyse in excess of 153 000 SNPs daily (Perkel, 2008). This
technique can be applied to segregating populations in the early stages of breeding programs
to select desired polymorphisms and traits, and is a co-dominant system, having the ability to
detect alleles in hybrids, heterozygotes (Jones et al., 2007) and polyploids (Henry et al.,
2008).
After recognition and prioritization of important SNPs it was essential to find an appropriate
way to genotype 110 functional SNPs in 233 individuals. Based on regular sequencing
methods, at least 25630 assays (110 × 233) would have been needed, which would be too elaborate and
expensive. In this thesis, it was demonstrated that DNA polymorphisms can be efficiently
confirmed and analysed in rice using a MALDI-TOF mass spectrometry system. These
assays can be used as a marker-assisted selection tool in conventional breeding programs. For
this reason, an 8-plex assay was designed to check the suitability of some multiplexed
MALDI-TOF SNP-specific markers for the first time in plants. The results showed that an
optimal condition can be achieved and the method can be efficiently used in the genotyping
of rice individuals in association studies (Chapter 4).
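The saving from multiplexing can be estimated with a short calculation. The figures of 110 SNPs and 233 individuals come from the text; the layout below (SNPs packed into wells at a fixed plex level, 384-well plates) is a hypothetical scheme used only to illustrate the scale, since the thesis does not state how the SNPs were distributed across wells.

```python
import math


def uniplex_assays(n_snps: int, n_individuals: int) -> int:
    """One reaction per SNP per individual."""
    return n_snps * n_individuals


def multiplex_reactions(n_snps: int, n_individuals: int, plex_level: int) -> int:
    """Reactions needed when up to `plex_level` SNPs share one well."""
    wells_per_individual = math.ceil(n_snps / plex_level)
    return wells_per_individual * n_individuals


def plates_needed(reactions: int, wells: int = 384) -> int:
    """Number of plates required to hold the given number of reactions."""
    return math.ceil(reactions / wells)


single = uniplex_assays(110, 233)
eight_plex = multiplex_reactions(110, 233, plex_level=8)
print(single, plates_needed(single))          # 25630 reactions, 67 plates
print(eight_plex, plates_needed(eight_plex))  # 3262 reactions, 9 plates
```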
The only drawback of this system was the inefficient recognition of large indels. However,
this study found no Indels in protein coding sequences suggesting Indels are of minor
importance in terms of trait determination generally. In the specific case of this rice breeding
population, there were no functional indels and they therefore did not influence these data.
Association between SNPs in starch biosynthesis genes and the nutritional and
functional properties of domesticated rice
The main aim of this thesis was to find associations among SNPs in different starch-related
genes and rice physiochemical properties. For this reason, 110 functional SNPs derived from
database searches and direct sequencing (Chapter 2) were chosen and then genotyped in 233
Australian rice lines using Multiplexed MALDI-TOF (Chapter 6).
In total, 65 SNPs were successfully genotyped in 233 breeding lines. No polymorphism was
detected for AGPS2b, SPHOL, SSIIb, SSIVb and ISA1 (Yan et al., 2010). Moreover, there
was no association between BEIIa or BEIIb (Fisher et al., 1993; Sun et al., 1997; Sun et al.,
1998; Yamakawa et al., 2008) and any physicochemical property of rice. Therefore, seven
genes out of eighteen did not contribute to the physicochemical properties of this population.
Although some reports at the gene level point to the importance of these genes, no
association between any of them and quality properties was found (Kawagoe et al.,
2005; Tetlow et al., 2004). This may be because almost all of these studies were based on
artificially induced mutants that abolish enzyme activity (Rolletschek et al., 2002). The data
here suggest that artificially induced mutations, such as indels, which abolish gene function
may have utility in understanding the role of particular genes and enzymes in starch
biosynthesis but provide little guidance as to which genes are important in the natural system.
In contrast, GBSSI and SSIIa are major determinants of the important grain quality properties
amylose content and gelatinization temperature. Highly significant associations were found
between GBSSI and retrogradation and amylose content, in addition to further significant
relationships with RVA properties such as BDV, SBV and FV (Chen et al., 2008a, b; Cai et
al., 1998). Larkin and Park (2003) have already reported that a SNP in exon 6 affects
amylose content. This study confirms that the T/G SNP at the exon 1/intron 1 junction site
has a major influence on a number of physicochemical properties.
SSIIa showed a very strong association with pasting temperature, gelatinization temperature
and peak time (Umemoto et al., 2004; Umemoto et al., 2008). Umemoto and Aoki (2005)
explained the alkali disintegration and eating quality of rice starch by polymorphism at two
SNPs, [A/G] and [GC/TT]. These SNPs within exon 8 of the alk locus are also significantly
associated with gelatinisation temperature (GT) (Waters et al., 2006). We also confirmed
a highly significant association between the GC/TT SNP in exon 8 of SSIIa and pasting
temperature (R² = 0.642).
In this thesis, six genes, GBSSII, SSI, SSIIIa, SSIIIb, SSIVa and BEI, had low to medium
effects on the final phenotypic variation of individuals, so these genes were called
contributory (Dian et al., 2003; Fujita et al., 2006; Hirose et al., 2006; Umemoto et al.,
2008). Here, for the first time, SSIIIb and SSIVa were identified as PT-associated genes with
medium to high levels of association with pasting and/or gelatinization temperature. A
relatively high R² value of 0.315 was calculated between a T/G SNP at position 7232 of
SSIIIb and pasting temperature.
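The R² values quoted above measure the share of trait variance explained by a single marker. As a minimal sketch of how such a value can be obtained, the Python below regresses a trait on a 0/1 allele code and returns R². It is an illustration only: the function name `marker_r2`, the allele coding (which assumes homozygous inbred lines) and the example numbers are hypothetical, and this is not the association model actually used in this thesis.

```python
from statistics import mean


def marker_r2(genotypes, trait):
    """R^2 of a simple linear regression of `trait` on a numeric allele code.

    For a single predictor this equals the squared Pearson correlation
    between the allele code and the trait values.
    """
    if len(genotypes) != len(trait) or len(genotypes) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    g_bar, t_bar = mean(genotypes), mean(trait)
    sxy = sum((g - g_bar) * (t - t_bar) for g, t in zip(genotypes, trait))
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    syy = sum((t - t_bar) ** 2 for t in trait)
    if sxx == 0 or syy == 0:
        raise ValueError("marker is monomorphic or trait is constant")
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: allele code (0 = one allele, 1 = the other) and an
# invented pasting temperature in degrees C for six lines.
alleles = [0, 0, 0, 1, 1, 1]
pasting_temp = [68.0, 69.0, 70.0, 74.0, 75.0, 76.0]

print(f"R^2 = {marker_r2(alleles, pasting_temp):.3f}")
```

A monomorphic marker has no variance to regress on, which is why the sketch raises an error in that case rather than returning zero.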
Analysis of this population identified a mixture of genes previously known to have a major
impact on rice starch quality, GBSSI and SSIIa, and a set of genes which play relatively
minor roles. It is possible the minor genes may make a contribution to starch quality in the
Australian rice growing environment only, and other sets of genes are important in other
environments. This may be a blueprint for future approaches to developing molecular
markers for plant breeding. In the past, the relative paucity of data has meant only genes of
major effect could be pursued in plant breeding. It is now easier to identify
environmentally affected genes of small effect which, in combination, may have a significant
impact on traits of interest.
The glucose 6-phosphate/phosphate translocator (GPT1) may contribute to resistant starch
The glucose 6-phosphate/phosphate translocator (GPT1) imports carbon into non-photosynthetic
plastids (Kammerer et al., 1998) and it appears to be a key gene controlling retrogradation
degree in rice. The results of this study revealed that high amylose rice cultivars,
characterized by low major RVA parameters, such as peak viscosity, hot paste viscosity and
cool paste viscosity, had more resistant starch and a lower estimated glycaemic index (GI).
When the retrogradation degree is higher, the starch is more resistant to digestion and the GI
is lower (Hu et al., 2004). In this study, a significant association was found between a 'T/C'
SNP at position 1188 of GPT1, which determines Leu42Phe, and resistant-retrograded starch
and amylose content.
It is believed amylose content has a significant influence on retrogradation rate (Hu et al.,
2004), but some studies show that these two important starch properties might work
independently of one another. Despite a correlation coefficient of 0.70 between amylose
content and retrogradation degree in this study, the conclusion here is that these two traits
work independently but have some influence on each other.
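One way to read a correlation of this size is through the variance the two traits share, which is the square of the correlation coefficient. The short sketch below is added for illustration and is not part of the thesis analysis; the function name `shared_variance` is hypothetical.

```python
def shared_variance(r: float) -> float:
    """Return the proportion of variance two traits share, given Pearson r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r * r


r = 0.70  # correlation between amylose content and retrogradation degree
shared = shared_variance(r)
print(f"shared: {shared:.2f}, not shared: {1 - shared:.2f}")
```

With r = 0.70 about half of the variation in one trait is not accounted for by the other, which is consistent with the view that the traits are related but partly independent.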
Conclusion and future directions
I conclude that three different categories of genes affect starch quality in Australian
rice germplasm. The first contains genes with major effects, such as GBSSI and SSIIa, which
greatly impact starch characteristics. Any new SNP found in these genes can significantly
influence starch phenotype. The second category contains genes with an intermediate impact on
starch properties. In this thesis, I named these contributory genes, because they act together
with the major genes to shape the final starch composition. Any variation in these genes has a
low to intermediate impact on starch. I suggest six genes, GBSSII, SSI, SSIIIa, SSIIIb, SSIVa
and BEI, reside in this category. Finally, there are genes, such as the debranching enzyme
genes, which had minor or no impact on starch physicochemical properties. It should be noted
that contributory
and minor genes may affect starch phenotypic variation differently in different germplasm
and environments. Environmental factors may have a major influence on plant growth,
starch genes, enzyme activities and traits and, therefore, on the results of any association
study.
Do all roads lead to GBSSI and SSIIa? This study suggests that the primary determinants of
rice starch quality are GBSSI and SSIIa. However, there might be other genes in the
genome that have a significant impact on starch quality. In this thesis, I suggest GPT1 is
one of these novel important contributing genes. Expanding the collection of known alleles
of starch synthesis structural genes through whole genome sequencing and associating these
with starch traits will improve resolution of the interactions in the starch gene network.
Whole genome sequencing of up to 3000 cultivars is underway. When complete, associations
between starch gene alleles and starch quality parameters will help us to reach an
understanding of the role of natural variation in starch genes in determining starch quality.
Protein quantity influences rice eating quality. However, does protein composition influence
rice eating quality? Protein bodies (PBs) may be an important factor influencing rice
composition and quality. PBs reside among starch granules and their distribution may have
a significant impact on starch quality. There are two types of protein bodies: PB-I are
prolamins and constitute 18-20% of grain protein, with several subunits of known
amino acid/gene sequence; PB-II are glutelins and constitute 70-80% of grain protein.
There has been much activity investigating the mechanism of storage protein synthesis and
relatively little investigating the quantity and amino acid composition of the different
storage protein classes in mature grain. A possible path for assessing the role of proteins in rice quality may
involve assembling a panel of rice genotypes with known differences in eating quality
parameters as measured by taste panel and associating these differences with protein body
subunit concentration (proteomics). Protein body subunit composition could be confirmed
through whole genome sequencing and assembly to known subunit gene sequence and
correlations/association of protein body subunit concentration and composition with eating
quality parameters tested.
Regardless of whether the target for further investigation of rice grain quality is starch or
protein, high throughput sequencing applied to structured genetic populations will prove to be
a powerful tool which will be invaluable in determining the contribution of each of these
entities to grain quality.
References
Ahmadian A, Gharizadeh B, Gustafsson AC, Sterky F, Nyrén P, Uhlén M and Lundeberg J
(2000) Single-nucleotide polymorphism analysis by pyrosequencing. Anal. Biochem.
280, 103–110.
Akey JM, Zhang G, Zhang K, Jin L and Shriver MD (2002) Interrogating a high-density SNP
map for signatures of natural selection. Genome. Res. 12, 1805-1814.
Andriotis VME, Pike MJ, Bunnewell S, Hills MJ and Smith AM (2010) The plastidial glucose
6-phosphate/phosphate antiporter GPT1 is essential for morphogenesis in Arabidopsis
embryos. Plant. J. 64 (1) 128-139
Andriotis VME, Pike MJ, Kular B, Rawsthorne S and Smith AM (2010) Starch turnover in
developing oilseed embryos. New Phytol. 187:791-804.
Asp NG and Björck I (1992) Resistant starch. Trends. Food. Sci. Tech. 3:111-114.
Baldwin PM (2001) Starch Granule Associated Proteins and Polypeptides: A Review.
Starch-Stärke 53:475-503.
Ball SG and Morell MK (2003) From bacterial glycogen to starch: Understanding the
biogenesis of the plant starch granule. Annu. Rev. Plant. Biol. 54(1): 207-233.
Barreiro LB, Laval G, Quach H, Patin E and Quintana-Murci L (2008) Natural selection has
driven population differentiation in modern humans. Nat. Genet. 40, 340-345.
Batley J, Mogg R, Edwards D, O’Sullivan H and Edwards KJ (2003) A high-throughput SNuPE
assay for genotyping SNPs in the flanking regions of Zea mays sequence tagged simple
sequence repeats. Mol. Breed. 11, 111–120.
Beatty MK, Rahman A, Cao H, Woodman W, Lee M, Myers AM and James MG (1999)
Purification and molecular genetic characterization of ZPU1, a pullulanase-type
starch-debranching enzyme from maize. Plant. Physiol. 119, 255-266.
Bentley DR (2006) Whole-genome re-sequencing. Curr. Opin. Genet. Dev. 16, 545-552.
Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP,
Evers DJ, Barnes CL and Bignell HR (2008) Accurate whole human genome sequencing
using reversible terminator chemistry. Nature. 456:53-59.
Bhattramakki D, Dolan M, Hanafey M, Wineland R, Vaske D, Register JC, Tingey SV and
Rafalski A (2002) Insertion–deletion polymorphisms in 3′ regions of maize genes occur
frequently and can be used as highly informative genetic markers. Plant Mol. Biol. 48,
539–547.
Bodmer W and Bonilla C (2008) Common and rare variants in multifactorial susceptibility to
common diseases. Nat. Genet. 40, 695-701.
Bork P and Koonin EV (1998) Predicting functions from protein sequences—where are the
bottlenecks? Nat. Genet. 18(4): 313-318
Boyer CD and Preiss J (1978) Multiple forms of starch branching enzyme of maize: evidence
for independent genetic control. Biochem. Biophys. Res. Commun. 80, 169-175.
Bradbury LMT, Fitzgerald TL, Henry RJ, Jin Q and Waters DLE (2005) The gene for
fragrance in rice. Plant Biotechnol. J. 3, 363–370.
Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y and Buckler ES (2007)
TASSEL: software for association mapping of complex traits in diverse samples.
Bioinformatics. 23:2633-2635.
Brookes AJ (1999) The essence of SNPs. Gene. 234 (2): 177-186
Bryan GT, Wu KS, Farrall L, Jia Y, Hershey HP, McAdams SA, Faulk KN, Donaldson GK,
Tarchini R, and Valent B (2000) A single amino acid difference distinguishes resistant and
susceptible alleles of the rice blast resistance gene Pi-ta. Plant. Cell. Online. 12(11):
2033-2046
Buléon A, Colonna P, Planchot V and Ball S (1998) Starch granules: structure and biosynthesis.
Int. J. Biological. Macromol. 23:85-112.
Bulyk ML (2004) Computational prediction of transcription-factor binding site locations.
Genome. Biol. 5(1): 201-201
Buschiazzo A, Ugalde JE, Guerin ME, Shepard W, Ugalde RA and Alzari PM (2004) Crystal
structure of glycogen synthase: homologous enzymes catalyze glycogen synthesis and
degradation. EMBO. J. 23(16): 3196
Bustos R, Fahy B, Hylton CM, Seale R, Nebane NM, Edwards A, Martin C and Smith AM
(2004) Starch granule initiation is controlled by a heteromultimeric isoamylase in potato
tubers. Proc. Natl. Acad. Sci. USA. 101, 2215-2220.
Cai XL, Wang ZY, Zheng FQ and Hong MM (1997) Regulation-related Intron in
5' Untranslated Region of Rice Waxy Gene. Acta. Phytophysio. Sini. 23: 257-261
Cai XL, Wang ZY, Xing YY, Zhang JL and Hong MM (1998) Aberrant splicing of intron 1
leads to the heterogeneous 5' UTR and decreased expression of waxy gene in rice cultivars
of intermediate amylose content. Plant. J. 14(4): 459-465
Cao H, Imparl-Radosevich J, Guan H, Keeling PL, James MG and Myers AM (1999)
Identification of the soluble starch synthase activities of maize endosperm. Plant. Physiol.
120, 205-216.
Carlson CS, Eberle MA, Rieder MJ, Smith JD, Kruglyak L and Nickerson DA (2003)
Additional SNPs and linkage-disequilibrium analyses are necessary for whole-genome
association studies in humans. Nat. Genet. 33(4): 518-521
Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, Warthmann N, Hu TT,
Fu G and Hinds DA (2007) Common sequence polymorphisms shaping genetic diversity in
Arabidopsis thaliana. Science. 317:338
Craig J, Lloyd JR, Tomlinson K, Barber L, Edwards A, Wang TL, Martin C, Hedley CL and
Smith AM (1998) Mutations in the gene encoding starch synthase II profoundly alter
amylopectin structure in pea embryos. Plant. Cell. Online. 10, 413-426.
Chen X and Sullivan PF (2003) Single nucleotide polymorphism genotyping: biochemistry,
protocol, cost and throughput. Pharmacogenomics. J. 3(2): 77-96
Chen MH, Bergman C and Fjellstrom R (2004) Waxy locus genetic variation associated with
amylose content in international rice germplasm. In: 30th Proceedings of the Rice Technical
Working Group Meeting. Beaumont, Texas, USA
Chen MH, Bergman C, Pinson S and Fjellstrom R (2008a) Waxy gene haplotypes: Associations
with apparent amylose content and the effect by the environment in an international rice
germplasm collection. J. Cereal. Sci. 47(3): 536-545
Chen MH, Bergman CJ, Pinson S and Fjellstrom R (2008b) Waxy gene haplotypes:
Associations with pasting properties in an international rice germplasm collection. J.
Cereal. Sci. 48(3): 781-788
Chothia C and Lesk AM (1986) The relation between the divergence of sequence and structure
in proteins. EMBO. J. 5(4): 823-826
Chung HJ, Lim HS and Lim ST (2006) Effect of partial gelatinization and retrogradation on the
enzymatic digestion of waxy rice starch. J. Cereal. Sci. 43:353-359.
Commuri PD and Keeling PL (2001) Chain-length specificities of maize starch synthase I
enzyme: studies of glucan affinity and catalytic properties. Plant. J. 25:475-486.
Costabile M, Quach A and Ferrante A (2006) Molecular approaches in the diagnosis of
primary immunodeficiency diseases. Hum. Mutat. 27, 1163–1173.
Dayong X, Jun J, Suyun H, Xiehong W, Yun G and Qingsen Z (2004) Effects of N, P and K
fertilizer amount on rice grain amylose content and starch viscosity properties. Chinese.
Agri. Sci. Bull. 20:99-99.
Debet MR and Gidley MJ (2007) Why do gelatinized starch granules not dissolve completely?
Roles for amylose, protein, and lipid in granule “ghost” integrity. J. Agri. Food. Chem.
55:4752-4760.
Dian W, Jiang H, Chen Q, Liu F and Wu P (2003) Cloning and characterization of the
granule-bound starch synthase II gene in rice: gene expression is regulated by the nitrogen level,
sugar and circadian rhythm. Planta. 218, 261-268.
Dian W, Jiang H and Wu P (2005) Evolution and expression analysis of starch synthase III and
IV in rice. J. Exp. Bot., 56, 623-632.
Ding C and Cantor CR (2003) A high-throughput gene expression analysis technique using
competitive PCR and matrix-assisted laser desorption ionization time-of-flight MS. Proc.
Natl. Acad. Sci. USA. 100:3059-3064.
Dinges JR, Colleoni C, James MG and Myers AM (2003) Mutational analysis of the
pullulanase-type debranching enzyme of maize indicates multiple functions in starch
metabolism. Plant. Cell. Online. 15, 666-680.
Doehlert DC and Knutson CA (1991) Two classes of starch debranching enzymes from
developing maize kernels. Jahresheft der Albrecht-Thaer-Gesellschaft (Germany).138(5)
566-572
Dohm JC, Lottaz C, Borodina T and Himmelbauer H (2008) Substantial biases in ultra-short
read data sets from high-throughput DNA sequencing. Nucleic. Acids. Res. 36 (16) e105
Domon E, Saito A and Takeda K (2002) Comparison of the waxy locus sequence from a
non-waxy strain and two waxy mutants of spontaneous and artificial origins in barley. Genes.
Genet. Sys. 77:351-359.
Druley TE, Vallania FLM, Wegner DJ, Varley KE, Knowles OL, Bonds JA, Robison SW,
Doniger SW, Hamvas A and Cole FS (2009) Quantification of rare allelic variants from
pooled genomic DNA. Nat. Methods. 6:263-265.
Drenkard E, Richter BG, Rozen S, Stutius LM, Angell NA, Mindrinos M, Cho RJ, Oefner PJ,
Davis RW and Ausubel FM (2000) A simple procedure for the analysis of single nucleotide
polymorphisms facilitates map-based cloning in Arabidopsis. Plant. Physiol. 124, 1483–
1492.
Eagles HA, Cane K, Appelbee M, Kuchel H, Eastwood RF and Martin PJ (2012) The storage
protein activator gene Spa-B1 and grain quality traits in southern Australian wheat breeding
programs. Crop and Pasture Sci. 63(4) 311-318.
Eastmond PJ and Rawsthorne S (2000) Coordinate changes in carbon partitioning and plastidial
metabolism during the development of oilseed rape embryos. Plant. Physiol. 122:767-774.
Edwards D, Forster JW, Cogan NOI, Batley J and Chagné D (2007) Chapter 4: Single
nucleotide polymorphism discovery in plants. Association mapping in plants Springer, New
York: 53–76
ElSharawy A, Manaster C, Teuber M, Rosenstiel P, Kwiatkowski R, Huse K, Platzer M, Becker
A, Nurnberg P and Schreiber S (2006) SNPSplicer: systematic analysis of SNP-dependent
splicing in genotyped cDNAs. Hum. Mutat. 27(11) 1129-1134
Fairbrother WG and Chasin LA (2000) Human genomic sequences that inhibit splicing. Mol.
Cell. Biol. 20(18): 6816-6825
Fairbrother WG, Yeh RF, Sharp PA and Burge CB (2002) Predictive identification of exonic
splicing enhancers in human genes. Science. 297:1007-1013.
Faisant N, Champ M, Colonna P, Buleon A, Molis C, Langkilde A, Schweizer T, Flourie B and
Galmiche J (1993) Structural features of resistant starch at the end of the human small
intestine. Europ. J. Clin. Nutr. 47:285.
Fan J and Marks B (1998) Retrogradation kinetics of rice flours as influenced by cultivar.
Cereal. Chem. 75:153-155.
Fersht AR (1985) Enzyme structure and function New York, Freeman and Co. USA
Fischer K and Weber A (2002) Transport of carbon in non-green plastids. Trends. Plant. Sci., 7,
345-351.
Fisher DK, Boyer CD and Hannah LC (1993) Starch branching enzyme II from maize
endosperm. Plant. Physiol. 102, 1045-1046.
Fitzgerald M (2004) Starch. Rice Chemistry and Technology. 109–141.
Fujita N, Kubo A, Suh DS, Wong KS, Jane JL, Ozawa K, Takaiwa F, Inaba Y and Nakamura Y
(2003) Antisense inhibition of isoamylase alters the structure of amylopectin and the
physicochemical properties of starch in rice endosperm. Plant. Cell. Physiol., 44, 607-618.
Fujita N, Yoshida M, Asakura N, Ohdan T, Miyao A, Hirochika H and Nakamura Y (2006)
Function and characterization of starch synthase I using mutants in rice. Plant. Physiol.
140:1070.
Fujita N, Yoshida M, Kondo T, Saito K, Utsumi Y, Tokunaga T, Nishi A, Satoh H, Park JH and
Jane JL (2007) Characterization of SSIIIa-deficient mutants of rice: the function of SSIIIa
and pleiotropic effects by SSIIIa deficiency in the rice endosperm. Plant. Physiol. 144,
2009-2023.
Futschik A and Schlotterer C (2010) The next generation of molecular markers from massively
parallel sequencing of pooled DNA samples. Genetics. 186, 207-218.
Gao M, Fisher DK, Kim KN, Shannon JC and Guiltinan MJ (1997) Independent genetic control
of maize starch-branching enzymes IIa and IIb (Isolation and characterization of a Sbe2a
cDNA). Plant. Physiol. 114, 69-78.
Gao ZY, Zeng DL, Cui X, Zhou YH, Yan MX, Huang DN, Li JY and Qian Q (2003)
Map-based cloning of the ALK gene, which controls the gelatinization temperature of rice.
Sci. Chin. Ser. C, Life Sci. 46, 661–668.
Garg K, Green P and Nickerson DA (1999) Identification of candidate coding region
single nucleotide polymorphisms in 165 human genes using assembled expressed
sequence tags. Genome. Res. 9, 1087–1092.
George Priya Doss C, Sudandiradoss C, Rajasekaran R, Choudhury P, Sinha P, Hota P and
Batra UP (2008) Applications of computational algorithm tools to identify functional SNPs.
Funct. Integr. Genomic. 8(4) 309-316
Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P
and Varma H (2002) A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica).
Science 296:92-100.
Gunderson KL, Steemers FJ, Ren H, Ng P, Zhou L, Tsan C, Chang W, Bullis D, Musmacker
J and King C (2006) Whole-genome genotyping. Methods. Enzymol. 410, 359–376.
Gupta PK, Roy JK and Prasad M (2001) Single nucleotide polymorphisms: a new
paradigm for molecular marker technology and DNA polymorphism detection with
emphasis on their use in plants. Curr. Sci. 80, 524–535.
Gupta PK, Rustgi S and Kulwal PL (2005) Linkage disequilibrium and association studies in
higher plants: present status and future prospects. Plant Mol Biol 57(4): 461-485
Hanashiro I, Itoh K, Kuratomi Y, Yamazaki M, Igarashi T, Matsugasako J and Takeda Y (2008)
Granule-bound starch synthase I is responsible for biosynthesis of extra-long unit chains of
amylopectin in rice. Plant. Cell. Physiol. 49:925.
Harismendy O and Frazer K (2009) Method for improving sequence coverage uniformity of
targeted genomic intervals amplified by LR-PCR using Illumina GA
sequencing-by-synthesis technology. Biotechniques. 46, 229-231.
Harn C, Knight M, Ramakrishnan A, Guan H, Keeling PL and Wasserman BP (1998) Isolation
and characterization of the zSSIIa and zSSIIb starch synthase cDNA clones from maize
endosperm. Plant. Mol Biol. 37:639-649.
Hayashi K, Hashimoto N, Daigen M and Ashikawa I (2004) Development of PCR-based
SNP markers for rice blast resistance genes at the Piz locus. Theor. Appl. Genet. 108,
1212–1220.
Hebsgaard SM, Korning PG, Tolstrup N, Engelbrecht J, Rouze P and Brunak S (1996) Splice
site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence
information. Nucleic. Acids. Res. 24(17): 3439-3452
Hegyi H and Gerstein M (1999) The relationship between protein structure and function: a
comprehensive survey with application to the yeast genome. J. Mol. Biol. 288(1) 147-164
Heinemeyer T, Wingender E, Reuter I, Hermjakob H, Kel AE, Kel OV, Ignatieva EV, Ananko
EA, Podkolodnaya OA and Kolpakov FA (1998) Databases on transcriptional regulation:
TRANSFAC, TRRD and COMPEL. Nucleic. Acids. Res. 26(1): 362-367
Henry RJ (2008) Future prospects for plant penotyping', (ed), Plant genotyping II: SNP
technology, CABI Publishing, Wallingford, UK, pp. 272-280.
Henry RJ, Pattemore JA, Waters DLE, Kharabian-Masouleh A, Bundock PC and Eliott FG
(2008) Applications of the sequenom platform to SNP analysis in plants. In: Plant and
Animal Genomes XVI Conference. San Diego, CA: Town and Country Convention
Center.
Hillier LDW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, Fox P, Glasscock JI,
Hickenbotham M and Huang W (2008) Whole-genome sequencing and variant discovery in
C. elegans. Nat. Methods. 5:183-188.
Hirose T and Terao T (2004) A comprehensive expression analysis of the starch synthase gene
family in rice (Oryza sativa L.). Planta. 220, 9-16.
Hirose T, Ohdan T, Nakamura Y and Terao T (2006) Expression profiling of genes related to
starch synthesis in rice leaf sheaths during the heading period. Physiol. Plant. 128, 425-435.
Hodges E, Xuan Z, Balija V, Kramer M, Molla MN, Smith SW, Middle CM, Rodesch MJ,
Albert TJ and Hannon GJ (2007) Genome-wide in situ exon capture for selective
resequencing. Nat. Genet. 39, 1522-1527.
Hovenkamp-Hermelink JHM, Jacobsen E, Ponstein AS, Visser RGF, Vos-Scheperkeuter GH,
Bijmolt EW, Vries JN, Witholt B and Feenstra WJ (1987) Isolation of an amylose-free
starch mutant of the potato (Solanum tuberosum L.). Theor. Appl. Genet. 75:217-221.
Hrmova M and Fincher GB (2001) Structure-function relationships of ß-D-glucan endo-and
exohydrolases from higher plants. Plant. Mol. Biol. 47(1): 73-91
Hu P, Zhao H, Duan Z, Linlin Z and Wu D (2004) Starch digestibility and the estimated
glycemic score of different types of rice differing in amylose contents. J. Cereal. Sci.
40:231-237.
Hutchings D, Rawsthorne S and Emes MJ (2005) Fatty acid synthesis and the oxidative pentose
phosphate pathway in developing embryos of oilseed rape (Brassica napus L.). J. Exp. Bot.
56:577
Imelfort M, Duran C, Batley J and Edwards D (2009) Discovering genetic polymorphisms in
next generation sequencing data. Plant Biotechnol J. 7:312-317.
Imparl-Radosevich JM, Nichols DJ, Li P, McKean AL, Keeling PL and Guan H (1999)
Analysis of purified maize starch synthases IIa and IIb: SS isoforms can be distinguished
based on their kinetic properties. Arch. Biochem. Biophys. 362:131-138.
Imparl-Radosevich JM, Gameon JR, McKean A, Wetterberg D, Keeling PL and Guan H (2003)
Understanding catalytic properties and functions of maize starch synthase isozymes. J. Appl.
Glycoscience. 50:177-182.
Ingman M and Gyllensten U (2008) SNP frequency estimation using massively parallel
sequencing of pooled DNA. Eur. J. Hum. Genet. 17, 383-386.
Ishikawa N, Ishihara J and Itoh M (1995) Artificial induction and characterization of
amylose-free mutants of barley. Barley. Genet. Newsl. 24:49-53.
Isshiki M, Morino K, Nakajima M, Okagaki RJ, Wessler SR, Izawa T and Shimamoto K (1998)
A naturally occurring functional allele of the rice waxy locus has a GT to TT mutation at the
5' splice site of the first intron. Plant. J. 15:133-138.
James MG, Denyer K and Myers AM (2003) Starch synthesis in the cereal endosperm. Curr.
Opin. Plant Biol. 6, 215-222.
Jeong SC, Kristipati S, Hayes AJ, Maughan PJ, Noffsinger SL, Gunduz I, Buss GR and
Maroof MAS (2002) Genetic and sequence analysis of markers tightly linked to the
soybean mosaic virus resistance gene, Rsv 3. Crop. Sci. 42, 265–270.
Jiang H, Dian W, Liu F and Wu P (2003) Cloning and characterization of a glucose
6-phosphate/phosphate translocator from Oryza sativa. J. Zhejiang. Univ-Sci. A., 4, 331-335.
Jones ES, Sullivan H, Bhattramakki D and Smith JSC (2007) A comparison of simple sequence
repeat and single nucleotide polymorphism marker technologies for the genotypic analysis
of maize (Zea mays L.). Theor. Appl. Genet. 115:361-371.
Juliano BO, Onate LU and Del Mundo AM (1965) Relation of starch composition, protein
content, and gelatinization temperature to cooking and eating qualities of milled rice. Food.
Technol. 19.
Kaiser J (2008) DNA sequencing: A plan to capture human diversity in 1000 Genomes.
Science. 319:395.
Kammerer B, Fischer K, Hilpert B, Schubert S, Gutensohn M, Weber A and Flügge UI (1998)
Molecular characterization of a carbon transporter in plastids from heterotrophic tissues: the
glucose 6-phosphate/phosphate antiporter. Plant. Cell. Online. 10:105.
Kawagoe Y, Kubo A, Satoh H, Takaiwa F and Nakamura Y (2005) Roles of isoamylase and
ADP glucose pyrophosphorylase in starch granule synthesis in rice endosperm. Plant. J. 42,
164-174.
Kennedy BG, Waters DLE, Henry RJ (2006) Screening for the rice blast resistance gene Pi-ta
using LNA displacement probes and real-time PCR. Mol Breeding 18(3): 185-193
Kharabian-Masouleh A, Waters DLE, Reinke RF and Henry RJ (2011) Discovery of
polymorphisms in starch related genes in rice germplasm by amplification of pooled DNA
and deeply parallel sequencing. Plant. Biotechnol. J. 9 (9):1074-1085.
Kiesselbach TA (1944) Character, field performance, and commercial production of waxy corn.
J. Am. Soc. Agron. 36:668-682.
Kim JI, Ju YS, Park H, Kim S, Lee S, Yi JH, Mudge J, Miller NA, Hong D and Bell CJ (2009)
A highly annotated whole-genome sequence of a Korean individual. Nature.
460:1011-1015.
Kubo A, Fujita N, Harada K, Matsuda T, Satoh H, Nakamura Y (1999) The starch-debranching
enzymes isoamylase and pullulanase are both involved in amylopectin biosynthesis in rice
endosperm. Plant. Physiol. 121(2): 399-410
Kuipers AGJ, Jacobsen E and Visser RGF (1994) Formation and deposition of amylose in the
potato tuber starch granule are affected by the reduction of granule-bound starch synthase
gene expression. Plant. Cell. Online. 6:43.
Kuriki T, Stewart DC and Preiss J (1997) Construction of chimeric enzymes out of maize
endosperm branching enzymes I and II. J. Biol. Chem. 272, 28999-29004.
Larkin PD and Park WD (2003) Association of waxy gene single nucleotide polymorphisms
with starch characteristics in rice (Oryza sativa L.). Mol. Breeding. 12(4): 335-339
Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993) PROCHECK: a program to
check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26(2): 283-291
Li Z, Rahman S, Kosar-Hashemi B, Mouille G, Appels R and Morell MK (1999) Cloning and
characterization of a gene encoding wheat starch synthase I. Theor. Appl. Genet.
98:1208-1216.
Li Z, Chu X, Mouille G, Yan L, Kosar-Hashemi B, Hey S, Napier J, Shewry P, Clarke B and
Appels R (1999) The localization and expression of the class II starch synthases of wheat.
Plant. Physiol. 120:1147.
Li Z, Mouille G, Kosar-Hashemi B, Rahman S, Clarke B, Gale KR, Appels R and Morell MK
(2000) The structure and expression of the wheat starch synthase III gene. Motifs in the
expressed gene define the lineage of the starch synthase III gene family. Plant. Physiol.
123:613.
Libessart N, Maddelein ML, Koornhuyse NV, Decq A, Delrue B, Mouille G, D'Hulst C and
Ball S (1995) Storage, photosynthesis, and growth: the conditional nature of mutations
affecting starch synthesis and structure in Chlamydomonas. Plant. Cell. Online. 7:1117.
Limpisut P and Jindal VK (2002) Comparison of rice flour pasting properties using Brabender
Viscoamylograph and Rapid Visco Analyser for evaluating cooked rice texture.
Starch-Stärke. 54:350-357.
Livak KJ (1999) Allelic discrimination using fluorogenic probes and the 5′ nuclease assay.
Genetic analysis: Biomol. Eng. 14, 143 –149.
Lumdubwong N and Seib P (2000) Rice starch isolation by alkaline protease digestion of
wet-milled rice flour. J. Cereal. Sci. 31:63-74.
Mardis ER (2008a) The impact of next-generation sequencing technology on genetics. Trends.
Genet. 24, 133-141.
Mardis ER (2008b) Next-generation DNA sequencing methods. Annu. Rev. Genom. Hum. G. 9
387-402.
Marshall W, Normand F and Goynes W (1990) Effects of lipid and protein removal on starch
gelatinization in whole grain milled rice. Cereal. Chem. 67:458-463.
Masouleh AK, Waters DLE, Reinke RF and Henry RJ (2009) A high-throughput assay for rapid
and simultaneous analysis of perfect markers for important quality and agronomic traits in
rice using multiplexed MALDI-TOF mass spectrometry. Plant. Biotechnol. J. 7, 355-363.
McGuigan FE and Ralston SH (2002) Single nucleotide polymorphism detection: allelic
discrimination using TaqMan. Psychiatr. Genet. 12, 133–136.
McNally KL, Childs KL, Bohnert R, Davidson RM, Zhao K, Ulat VJ, Zeller G, Clark RM,
Hoen DR and Bureau TE (2009) Genomewide SNP variation reveals relationships among
landraces and modern varieties of rice. Proceedings of the National Academy of Sciences
106:12273.
Mikami I, Uwatoko N, Ikeda Y, Yamaguchi J, Hirano HY, Suzuki Y and Sano Y (2008) Allelic
diversification at the wx locus in landraces of Asian rice. Theor. Appl. Genet. 116(7):
979-989
Miles MJ, Morris VJ, Orford PD and Ring SG (1985) The roles of amylose and amylopectin in
the gelation and retrogradation of starch. Carbohydrate. Res. 135:271-281.
Monna L, Kitazawa N, Yoshino R, Suzuki J, Masuda H, Maehara Y, Tanji M, Sato M, Nasu S
and Minobe Y (2002) Positional cloning of rice semidwarfing gene, sd-1: ‘Rice green
revolution gene’ encodes a mutant enzyme involved in gibberellin synthesis. DNA. Res. 9,
11–17.
Morell MK, Kosar-Hashemi B, Cmiel M, Samuel MS, Chandler P, Rahman S, Buleon A, Batey
IL and Li Z (2003) Barley sex6 mutants lack starch synthase IIa activity and contain a
starch with novel properties. Plant. J. 34, 173-185.
Morozova O and Marra MA (2008) Applications of next-generation sequencing technologies in
functional genomics. Genomics. 92, 255-264.
Morrison WR (1988) Lipids in cereal starches: A review. J. Cereal. Sci. 8:1-15.
Morris CF (2002) Puroindolines: the molecular genetic basis of wheat grain hardness. Plant.
Mol. Biol. 48, 633–647.
Nakamura T, Vrinten P, Hayakawa K and Ikeda J (1998) Characterization of a granule-bound
starch synthase isoform found in the pericarp of wheat. Plant. Physiol. 118, 451-459.
Nakamura Y (2002) Towards a better understanding of the metabolic system for amylopectin
biosynthesis in plants: rice endosperm as a model tissue. Plant. Cell. Physiol. 43, 718-725.
Nakamura Y, Francisco PB, Hosaka Y, Sato A, Sawada T, Kubo A and Fujita N (2005)
Essential amino acids of starch synthase IIa differentiate amylopectin structure and starch
quality between japonica and indica rice varieties. Plant Mol. Biol. 58:213-227.
Nasu S, Suzuki J, Ohta R, Hasegawa K, Yui R, Kitazawa N, Monna L and Minobe Y (2002)
Search for and analysis of single nucleotide polymorphisms (SNPs) in rice (Oryza sativa,
Oryza rufipogon) and establishment of SNP markers. DNA Res. 9, 163-171.
Ng PC and Henikoff S (2002) Accounting for human polymorphisms predicted to affect protein
function. Cold Spring Harbor Laboratory Press, pp. 436-446.
Ng PC and Henikoff S (2003) SIFT: Predicting amino acid changes that affect protein function.
Nucleic. Acids. Res. 31(13): 3812
Niewiadomski P, Knappe S, Geimer S, Fischer K, Schulz B, Unte US, Rosso MG, Ache P,
Flügge UI and Schneider A (2005) The Arabidopsis plastidic glucose 6-phosphate/phosphate translocator GPT1 is essential for pollen maturation and embryo sac
development. Plant. Cell. Online. 17:760.
Nishi A, Nakamura Y, Tanaka N and Satoh H (2001) Biochemical and Genetic Analysis of the
Effects of Amylose-Extender Mutation in Rice Endosperm. Plant. Physiol. 127:459.
Novaes E, Drost DR, Farmerie WG, Pappas GJ, Grattapaglia D, Sederoff RR and Kirst M
(2008) High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized
genome. BMC. Genomics. 9:312.
Ohdan T, Francisco Jr PB, Sawada T, Hirose T, Terao T, Satoh H and Nakamura Y (2005)
Expression profiling of genes involved in starch synthesis in sink and source organs of rice.
J. Exp. Bot. 56, 3229-3244.
Olivier M (2005) The Invader® assay for SNP genotyping. Mutat. Res. 573, 103–110.
Out AA, van Minderhout I, Goeman JJ, Ariyurek Y, Ossowski S, Schneeberger K, Weigel D,
van Galen M, Taschner PEM and Tops CMJ (2009) Deep sequencing to reveal new variants
in pooled DNA samples. Hum. Mutat. 30, 1703-1712.
Panlasigui L, Thompson L, Juliano B, Perez C, Yiu S and Greenberg G (1991) Rice varieties
with similar amylose content differ in starch digestibility and glycemic response in humans.
Am. J. Clin. Nutr. 54:871.
Patron NJ, Smith AM, Fahy BF, Hylton CM, Naldrett MJ, Rossnagel BG and Denyer K (2002)
The altered pattern of amylose accumulation in the endosperm of low-amylose barley
cultivars is attributable to a single mutant allele of granule-bound starch synthase I with a
deletion in the 5'-non-coding region. Plant. Physiol. 130:190.
Peng S, Huang J, Sheehy JE, Laza RC, Visperas RM, Zhong X, Centeno GS, Khush GS and
Cassman KG (2004) Rice yields decline with higher night temperature from global
warming. Proc. Natl. Acad. Sci. USA. 101:9971.
Perkel J (2008) SNP genotyping: six technologies that keyed a revolution. Nat.
Methods. 5, 447–453.
Pertea M, Lin X, Salzberg SL (2001) GeneSplicer: a new computational method for splice site
prediction. Nucleic. Acids. Res. 29(5): 1185-1190
Pesole G and Liuni S (1999) Internet resources for the functional analysis of 5' and 3'
untranslated regions of eukaryotic mRNAs. Trends. Genet. 15(9): 378-378
Pettersson E, Lundeberg J and Ahmadian A (2009) Generations of sequencing technologies.
Genomics. 93, 105-111.
Philpot K, Martin M, Butardo Jr V, Willoughby D and Fitzgerald M (2006) Environmental
factors that affect the ability of amylose to contribute to retrogradation in gels made from
rice flour. J. Agri. Food. Chem. 54:5182-5190.
Raemakers K, Schreuder M, Suurs L, Furrer-Verhorst H, Vincken JP, Vetten N, Jacobsen E and
Visser RGF (2005) Improved cassava starch by antisense inhibition of granule-bound starch
synthase I. Mol. Breeding. 16:163-172.
Rafalski A (2002) Applications of single nucleotide polymorphisms in crop genetics.
Curr. Opin. Plant. Biol. 5, 94–100.
Ragaee S and Abdel-Aal ESM (2006) Pasting properties of starch and protein in selected
cereals and quality of their food products. Food. Chem. 95:9-18.
Rahman S, Li Z, Batey I, Cochrane MP, Appels R and Morell M (2000) Genetic alteration of
starch functionality in wheat. J. Cereal Sci. 31, 91-110.
Rahman S, Regina A, Li Z, Mukai Y, Yamamoto M, Kosar-Hashemi B, Abrahams S and
Morell MK (2001) Comparison of starch-branching enzyme genes reveals evolutionary
relationships among isoforms. Characterization of a gene for starch-branching enzyme IIa
from the wheat D genome donor Aegilops tauschii. Plant. Physiol. 125, 1314-1324.
Rajasekaran R, Sudandiradoss C, Doss CGP, Sethumadhavan R (2007) Identification and in
silico analysis of functional SNPs of the BRCA1 gene. Genomics. 90(4): 447-452
Rajasekaran R, George Priya Doss C, Sudandiradoss C, Ramanathan K, Rituraj P and Rao S
(2008) Computational and structural investigation of deleterious functional SNPs in breast
cancer BRCA2 gene. Chinese. J. Biotech. 24(5): 851-856
Rajesh S, Raveendran M and Manickam A (2008) Prediction of 3-dimensional structure of
EMV1, a group 1 late embryogenesis abundant protein of Vigna radiata Wilczek. Plant.
Omics. J. 1(1): 17-25
Ramensky V, Bork P and Sunyaev S (2002) Human non-synonymous SNPs: server and survey.
Nucleic. Acids. Res. 30(17): 3894-3900
Rapley R and Harbron SE (2004) Molecular analysis and genome discovery. Sussex, UK:
Wiley.
Ring SG, Gee JM, Whittam M, Orford P and Johnson IT (1988) Resistant starch: its chemical
form in foodstuffs and effect on digestibility in vitro. Food. Chem. 28:97-109.
Roth C and Liberles DA (2006) A systematic search for positive selection in higher plants
(Embryophytes). BMC. Plant. Biol. 6, 12.
Rowland-Bamford AJ, Allen LH, Baker JT and Boote K (1990) Carbon dioxide effects on
carbohydrate status and partitioning in rice. J. Exp. Bot. 41:1601.
Rolletschek H, Hajirezaei MR, Wobus U and Weber H (2002) Antisense-inhibition of ADP-glucose pyrophosphorylase in Vicia narbonensis seeds increases soluble sugars and leads to
higher water and nitrogen uptake. Planta. 214, 954-964.
Rolletschek H, Nguyen TH, Häusler RE, Rutten T, Göbel C, Feussner I, Radchuk R, Tewes A,
Claus B and Klukas C (2007) Antisense inhibition of the plastidial glucose
6-phosphate/phosphate translocator in Vicia seeds shifts cellular differentiation and promotes
protein storage. Plant. J. 51:468-484.
Saito M, Konda M, Vrinten P, Nakamura K and Nakamura T (2004) Molecular comparison of
waxy null alleles in common wheat and identification of a unique null allele. Theor. Appl.
Genet. 108:1205-1211.
Sajilata M, Singhal RS and Kulkarni PR (2006) Resistant starch–a review. Comprehensive. Rev.
Food. Sci. F. 5:1-17.
Sano Y (1984) Differential regulation of waxy gene expression in rice endosperm. Theor. Appl.
Genet. 68(5): 467-473
Sasaki A, Ashikari M, Ueguchi-Tanaka M, Itoh H, Nishimura A, Swapan D, Ishiyama K,
Saito T, Kobayashi M and Khush GS (2002) A mutant gibberellin-synthesis gene in
rice. Nature. 416, 701–702.
Sato K, Inaba K and Tozawa M (1973) High temperature injury of ripening in rice plant. I. The
effects of high temperature treatments at different stages of panicle development on the
ripening, pp 207-213.
Sato K (1984) Starch granules in tissues of rice plants and their changes in relation to plant
growth. Jap. Agric. Res. Quarter. 18:78-86.
Satoh H, Nishi A, Yamashita K, Takemoto Y, Tanaka Y, Hosaka Y, Sakurai A, Fujita N and
Nakamura Y (2003) Starch-branching enzyme I-deficient mutation specifically affects the
structure and properties of starch in rice endosperm. Plant. Physiol. 133, 1111-1121.
Schofield J and Greenwell P (1987) Wheat starch granule proteins and their technological
significance. Cereals in a European context. First European Conference on Food Science
and Technology, Morton ID (ed.). New York, NY (USA): VCH, 1987. ISBN 08-95735237. pp. 407-420.
Schuster SC (2008) Next-generation sequencing transforms today’s biology. Nat. Methods. 5(1),
16-18.
Seo B, Kim S, Scott MP, Singletary GW, Wong K, James MG and Myers AM (2002)
Functional interactions between heterologously expressed starch-branching enzymes of
maize and the glycogen synthases of brewer's yeast. Plant. Physiol. 128, 1189-1199.
Sequenom I (2006) iPLEX™ Gold Assay for SNP Genotyping. Biotechniques.
Protocol. Guide. 41. San Diego, CA: Sequenom.
Shapter FM, Eggler P, Lee LS and Henry RJ (2009) Variation in Granule Bound Starch
Synthase I (GBSSI) loci amongst Australian wild cereal relatives (Poaceae). J. Cereal. Sci.
49:4-11.
Shastry BS (2002) SNP alleles in human disease and evolution. J. Hum. Genet. 47(11): 561-566
Shen J, Deininger PL and Zhao H (2006) Applications of computational algorithm tools to
identify functional SNPs in cytokine genes. Cytokine. 35(1-2) 62-66
Sheng F, Jia X, Yep A, Preiss J and Geiger JH (2009) The crystal structures of
the open and catalytically competent closed conformation of Escherichia coli glycogen
synthase. J. Biol. Chem. 284(26): 17796-17807
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K (2001)
dbSNP: the NCBI database of genetic variation. Nucleic. Acids. Res. 29(1): 308-311
Siebert PD and Larrick JW (1992) Competitive PCR. Nature 359:557-558.
Singh N, Pal N, Mahajan G, Singh S and Shevkani K (2011) Rice grain and starch properties:
effects of nitrogen fertilizer application. Carbohyd. Polym. 86 (1) 219-225
Sinha S and Tompa M (2002) Discovery of novel transcription factor binding sites by statistical
overrepresentation. Nucleic. Acids. Res. 30(24): 5549–5560
Smith AM (1999) Making starch. Curr. Opin. Plant. Biol. 2:223-229.
Soussi T, Asselain B, Hamroun D, Kato S, Ishioka C, Claustres M and Beroud C (2006) Meta-analysis of the p53 mutation database for mutant p53 biological activity reveals a
methodologic bias in mutation detection. Clin. Cancer. Res. 12:62-69
Spielmeyer W, Ellis MH and Chandler PM (2002) Semidwarf (sd-1), “green revolution” rice,
contains a defective gibberellin 20-oxidase gene. Proc. Natl. Acad. Sci. USA 99:9043-9048.
Sun C, Sathish P, Ahlandsberg S, Deiber A and Jansson C (1997) Identification of four starch
branching enzymes in barley endosperm: partial purification of forms I, IIa and IIb. New.
Phytol. 137, 215-222.
Sun C, Sathish P, Ahlandsberg S and Jansson C (1998) The two genes encoding starch-branching enzymes IIa and IIb are differentially expressed in barley. Plant. Physiol. 118,
37-49.
Sunyaev SR, Lathe WC, Ramensky VE and Bork P (2000) SNP frequencies in human genes-an
excess of rare alleles and differing modes of selection. Trends. Genet. 16, 335-337.
Tacke R and Manley JL (1999) Determinants of SR protein specificity. Curr. Opin. Cell. Biol.
11(3): 358-362
Takeda Y, Guan HP and Preiss J (1993) Branching of amylose by the branching isoenzymes of
maize endosperm. Carbohydr. Res. 240, 253-263.
Tanaka N, Fujita N, Nishi A, Satoh H, Hosaka Y, Ugaki M, Kawasaki S and Nakamura Y
(2004) The structure of starch can be manipulated by changing the expression levels of
starch branching enzyme IIb in rice endosperm. Plant. Biotech. J. 2, 507-516.
Tashiro T and Wardlaw I (1991) The effect of high temperature on kernel dimensions and the
type and occurrence of kernel damage in rice. Aust. J. Agric. Res. 42:485-496.
Tester R, Morrison W, Ellis R, Piggo J, Batts G, Wheeler T, Morison J, Hadley P and Ledward
D (1995) Effects of elevated growth temperature and carbon dioxide levels on some
physicochemical properties of wheat starch. J. Cereal. Sci. 22:63-71.
Tester RF and Morrison WR (1990) Swelling and gelatinization of cereal starches. I. Effects of
amylopectin, amylose, and lipids. Cereal. Chem. 67:551-557.
Tester RF, Karkalas J and Qi X (2004) Starch--composition, fine structure and architecture. J.
Cereal. Sci. 39:151-165.
Tetlow IJ, Wait R, Lu Z, Akkasaeng R, Bowsher CG, Esposito S, Kosar-Hashemi B, Morell
MK and Emes MJ (2004) Protein phosphorylation in amyloplasts regulates starch branching
enzyme activity and protein-protein interactions. Plant. Cell. Online. 16, 694-708.
Thomas RK, Nickerson E, Simons JF, Jänne PA, Tengs T, Yuza Y, Garraway LA, LaFramboise
T, Lee JC and Shah K (2006) Sensitive mutation detection in heterogeneous cancer
specimens by massively parallel picoliter reactor sequencing. Nature. Med. 12, 852-855.
Umemoto T, Yano M, Satoh H, Shomura A and Nakamura Y (2002) Mapping of a gene
responsible for the difference in amylopectin structure between japonica-type and indica-type rice varieties. Theor. Appl. Genet. 104:1-8.
Umemoto T, Aoki N, Lin H, Nakamura Y, Inouchi N, Sato Y, Yano M, Hirabayashi H and
Maruyama S (2004) Natural variation in rice starch synthase IIa affects enzyme and starch
properties. Funct. Plant Biol. 31, 671-684.
Umemoto T and Aoki N (2005) Single-nucleotide polymorphisms in rice starch synthase IIa
that alter starch gelatinisation and starch association of the enzyme. Funct. Plant Biol. 32,
763-768.
Umemoto T, Horibata T, Aoki N, Hiratsuka M, Yano M and Inouchi N (2008) Effects of
variations in starch synthase on starch properties and eating quality of rice. Plant Prod. Sci.
11, 472-480.
Varley KE and Mitra RD (2008) Nested Patch PCR enables highly multiplexed mutation
discovery in candidate genes. Genome Res. 18, 1844-1850.
Velicer GJ, Raddatz G, Keller H, Deiss S, Lanz C, Dinkelacker I and Schuster SC (2006)
Comprehensive mutation identification in an evolved bacterial cooperator and its cheating
ancestor. Proc. Natl. Acad. Sci. USA. 103 (21).8107-8112.
Visser RGF, Somhorst I, Kuipers GJ, Ruys NJ, Feenstra WJ and Jacobsen E (1991) Inhibition
of the expression of the gene for granule-bound starch synthase in potato by antisense
constructs. Mol. General. Genet. 225:289-296.
Vrinten PL and Nakamura T (2000) Wheat granule-bound starch synthase I and II are encoded
by separate genes that are expressed in different tissues. Plant. Physiol. 122, 255-264.
Wakao S, Andre C and Benning C (2008) Functional analyses of cytosolic glucose-6-phosphate
dehydrogenases and their contribution to seed oil accumulation in Arabidopsis. Plant.
Physiol. 146:277.
Wang YJ, White P, Pollak L and Jane J (1993) Characterization of starch structures of 17 maize
endosperm mutant genotypes with Oh43 inbred line background. Cereal Chem. 70:171-171.
Wang Z, Rolish ME, Yeo G, Tung V, Mawson M and Burge CB (2004) Systematic
identification and analysis of exonic splicing silencers. Cell. 119(6): 831-845
Wang Z, Xiao X, Van Nostrand E and Burge CB (2006) General and specific functions of
exonic splicing silencers in splicing control. Mol. Cell. 23(1): 61-70
Waters DLE, Henry RJ, Reinke RF and Fitzgerald MA (2006) Gelatinization temperature of
rice explained by polymorphisms in starch synthase. Plant. Biotechnol. J. 4, 115-122.
Waters DLE and Henry RJ (2007) Genetic manipulation of starch properties in plants: patents
2001-2006. Recent. Pat. Biotechnol. 1(3): 252-259.
Webb BD (1991) Rice quality and grades. In: Rice (Bor S. Luh ed.), pp. 89–93. College
Station, TX: USDA, Rice Quality Laboratory, Texas A&M University.
Yamakawa H, Hirose T, Kuroda M and Yamaguchi T (2007) Comprehensive expression
profiling of rice grain filling-related genes under high temperature using DNA microarray.
Plant. Physiol. 144, 258-277.
Yamakawa H, Ebitani T and Terao T (2008) Comparison between locations of QTLs for grain
chalkiness and genes responsive to high temperature during grain filling on the rice
chromosome map. Breed. Sci. 58, 337-343.
Yamamori M, Kato M, Yui M and Kawasaki M (2006) Resistant starch and starch pasting
properties of a starch synthase IIa-deficient wheat with apparent high amylose. Aust J Agri
Res. 57:531-536.
Yan CJ, Tian ZX, Fang YW, Yang YC, Li J, Zeng SY, Gu SL, Xu CW, Tang SZ and Gu MH
(2010) Genetic analysis of starch paste viscosity parameters in glutinous rice (Oryza sativa
L.). Theor. Appl. Genet. 122(1) 63-76.
Yu J, Hu S, Wang J, Wong GKS, Li S, Liu B, Deng Y, Dai L, Zhou Y and Zhang X (2002) A
draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 296, 79-92.
Yun SH and Matheson NK (1993) Structures of the amylopectins of waxy, normal, amylose-extender, and wx: ae genotypes and of the phytoglycogen of maize. Carbohydr. Res. 243,
307-321.
Zakaria S, Matsuda T, Tajima S and Nitta Y (2002) Effect of high temperature at ripening stage
on the reserve accumulation in seed in some rice cultivars. Plant Prod. Sci. 5:160-168.
Appendices 1-8: Attached
Appendices: Chapter 2
Appendix 1: Full list of SNPs/Indels discovered in the 17 starch-related genes studied.
Appendix 2: Full list of Australian breeding lines (population) and their pedigree
information.
Appendix 3: Target genes and sequence of gene-specific LR-PCR primers.
Appendix 4: SNP/Indel distribution and short read coverage pattern across candidate loci.
Appendices: Chapter 4
Appendix 5: Full list of 233 studied Australian rice genotypes and their pedigree information.
Appendix 6: Name and characteristics of SNPs genotyped in the rice population.
Appendix 7: The results of association study among 13 physicochemical traits and SNPs of 18
different starch-related genes.
Appendix 8: Linkage map of 17 starch-related genes, showing the approximate chromosomal
location of each gene.
Appendix 1:
Full list of discovered SNPs/Indels in 17 starch related genes
Gene | Reference position | Consensus position | Variation type | Length | Reference | Variants | Allele variations | Frequencies | Counts | Coverage | Variant #1 | Frequency of #1 | Count of #1 | Variant #2 | Frequency of #2 | Count of #2 | Overlapping annotations | Amino acid change | Quality
AGPS2b | 2533 | 161 | SNP | 1 | C | 2 | C/T | 94.7/5.3 | 22557/1252 | 23813 | C | 94.72557007 | 22557 | T | 5.257632386 | 1252 | Gene: Os08g0345800 | N/A | High
AGPS2b | 2644 | 277 | SNP | 1 | G | 2 | G/C | 95.2/4.7 | 65324/3253 | 68588 | G | 95.24115006 | 65324 | C | 4.742812154 | 3253 | Gene: Os08g0345800 | N/A | High
AGPS2b | 2713 | 347 | SNP | 1 | C | 2 | C/A | 99.0/1.0 | 27197/275 | 27474 | C | 98.99177404 | 27197 | A | 1.000946349 | 275 | Gene: Os08g0345800 | N/A | High
AGPS2b | 2799 | 436 | SNP | 1 | C | 2 | C/T | 94.8/5.2 | 33604/1828 | 35435 | C | 94.83279244 | 33604 | T | 5.158741357 | 1828 | Gene: Os08g0345800 | N/A | High
AGPS2b | 2886 | 530 | SNP | 1 | T | 2 | T/C | 98.9/1.1 | 36781/412 | 37195 | T | 98.88694717 | 36781 | C | 1.107675763 | 412 | Gene: Os08g0345800 | N/A | High
AGPS2b | 3380 | 1029 | SNP | 1 | A | 2 | A/G | 94.9/5.1 | 37725/2013 | 39740 | A | 94.92954202 | 37725 | G | 5.065425264 | 2013 | Gene: Os08g0345800 | N/A | High
AGPS2b | 3497 | 1146 | SNP | 1 | G | 2 | G/A | 94.8/5.2 | 35569/1941 | 37517 | G | 94.80768718 | 35569 | A | 5.17365461 | 1941 | Gene: Os08g0345800 | N/A | High
AGPS2b | 3585 | 1234 | SNP | 1 | A | 2 | A/G | 99.0/1.0 | 27517/281 | 27799 | A | 98.98557502 | 27517 | G | 1.010827728 | 281 | Gene: Os08g0345800 | N/A | High
AGPS2b | 3609 | 1258 | SNP | 1 | C | 2 | C/A | 98.3/1.7 | 24298/413 | 24717 | C | 98.30481045 | 24298 | A | 1.670914755 | 413 | Gene: Os08g0345800 | N/A | High
AGPS2b | 3685 | 1334 | SNP | 1 | T | 2 | T/A | 95.0/5.0 | 19394/1014 | 20411 | T | 95.01739258 | 19394 | A | 4.967909461 | 1014 | Gene: Os08g0345800 | N/A | High
AGPS2b | 3919 | 1568 | SNP | 1 | T | 2 | T/G | 98.9/1.1 | 90/1 | 91 | T | 98.9010989 | 90 | G | 1.098901099 | 1 | Gene: Os08g0345800 | N/A | High
AGPS2b | 4259 | 1908 | SNP | 1 | C | 2 | C/T | 97.6/2.4 | 2685/65 | 2750 | C | 97.63636364 | 2685 | T | 2.363636364 | 65 | Gene: Os08g0345800 | N/A | High
AGPS2b | 4260 | 1909 | SNP | 1 | C | 2 | C/T | 99.3/0.7 | 3463/25 | 3488 | C | 99.28325688 | 3463 | T | 0.716743119 | 25 | Gene: Os08g0345800 | N/A | Low
AGPS2b | 4309 | 1958 | SNP | 1 | G | 2 | G/T | 98.6/1.4 | 31625/447 | 32075 | G | 98.59703819 | 31625 | T | 1.39360873 | 447 | Gene: Os08g0345800 | N/A | High
AGPS2b | 4355 | 2004 | SNP | 1 | T | 2 | T/A | 99.3/0.7 | 30364/229 | 30593 | T | 99.25146275 | 30364 | A | 0.748537247 | 229 | Gene: Os08g0345800 | N/A | Low
AGPS2b | 4359 | 2008 | SNP | 1 | C | 2 | C/T | 99.2/0.8 | 29463/244 | 29711 | C | 99.16529232 | 29463 | T | 0.821244657 | 244 | Gene: Os08g0345800 | N/A | Low
AGPS2b | 4361 | 2010 | SNP | 1 | C | 2 | C/A | 99.2/0.8 | 29349/228 | 29580 | C | 99.21906694 | 29349 | A | 0.770791075 | 228 | Gene: Os08g0345800 | N/A | Low
AGPS2b | 4372 | 2021 | SNP | 1 | A | 2 | A/G | 98.6/1.4 | 28229/411 | 28641 | A | 98.56150274 | 28229 | G | 1.435005761 | 411 | Gene: Os08g0345800 | N/A | High
AGPS2b | 4460 | 2109 | SNP | 1 | A | 2 | A/G | 95.7/4.3 | 28708/1283 | 29991 | A | 95.72204995 | 28708 | G | 4.277950052 | 1283 | Gene: Os08g0345800 | N/A | High
AGPS2b | 4486 | 2135 | SNP | 1 | T | 2 | T/A | 95.6/4.4 | 39609/1819 | 41431 | T | 95.60232676 | 39609 | A | 4.390432285 | 1819 | Gene: Os08g0345800 | N/A | High
AGPS2b | 4499 | 2148 | SNP | 1 | C | 2 | C/A | 97.5/2.5 | 48230/1237 | 49469 | C | 97.49540116 | 48230 | A | 2.500555904 | 1237 | Gene: Os08g0345800 | N/A | High
AGPS2b | 4501 | 2150 | SNP | 1 | A | 2 | A/G | 95.8/4.2 | 48626/2145 | 50771 | A | 95.77514723 | 48626 | G | 4.22485277 | 2145 | Gene: Os08g0345800 | N/A | High
AGPS2b | 4694 | 2343 | SNP | 1 | A | 2 | A/T | 98.6/1.4 | 34913/507 | 35423 | A | 98.56025746 | 34913 | T | 1.431273466 | 507 | Gene: Os08g0345800 | N/A | High
AGPS2b | 4787 | 2436 | SNP | 1 | A | 2 | A/G | 98.6/1.4 | 44438/632 | 45072 | A | 98.59336173 | 44438 | G | 1.402200923 | 632 | Gene: Os08g0345800 | N/A | High
AGPS2b | 4826 | 2475 | SNP | 1 | A | 2 | A/G | 96.1/3.9 | 34982/1403 | 36390 | A | 96.13080517 | 34982 | G | 3.855454795 | 1403 | Gene: Os08g0345800 | N/A | High
AGPS2b | 4952 | 2601 | SNP | 1 | A | 2 | A/T | 97.5/2.5 | 31775/822 | 32603 | A | 97.46035641 | 31775 | T | 2.521240377 | 822 | Gene: Os08g0345800 | N/A | High
AGPS2b | 5026 | 2675 | SNP | 1 | T | 2 | T/G | 96.4/3.6 | 35926/1341 | 37273 | T | 96.386124 | 35926 | G | 3.597778553 | 1341 | Gene: Os08g0345800 | N/A | High
AGPS2b | 5035 | 2684 | SNP | 1 | C | 2 | C/T | 98.6/1.4 | 38005/534 | 38546 | C | 98.59648213 | 38005 | T | 1.385357754 | 534 | Gene: Os08g0345800 | N/A | High
AGPS2b | 5087 | 2736 | SNP | 1 | A | 2 | C/A | 77.1/22.9 | 22765/6747 | 29514 | C | 77.13288609 | 22765 | A | 22.86033747 | 6747 | Gene: Os08g0345800 | N/A | High
AGPS2b | 5477 | 3126 | SNP | 1 | G | 2 | G/C | 99.2/0.8 | 95747/781 | 96562 | G | 99.15598268 | 95747 | C | 0.808806777 | 781 | Gene: Os08g0345800 | N/A | Low
BEI | 193 | 193 | SNP | 1 | G | 2 | G/A | 89.2/10.8 | 15320/1847 | 17170 | G | 89.22539313 | 15320 | A | 10.75713454 | 1847 | Gene: Os06g0726400, CDS: Os06g0726400, mRNA: Os06g0726400 | N/A | High
BEI | 810 | 810 | SNP | 1 | C | 2 | C/T | 88.9/11.1 | 20419/2556 | 22976 | C | 88.87099582 | 20419 | T | 11.12465181 | 2556 | Gene: Os06g0726400 | N/A | High
BEI | 828 | 828 | SNP | 1 | G | 2 | G/A | 88.7/11.3 | 20882/2652 | 23536 | G | 88.72365738 | 20882 | A | 11.267845 | 2652 | Gene: Os06g0726400 | N/A | High
BEI | 1218 | 1218 | SNP | 1 | C | 2 | C/T | 88.3/11.7 | 20787/2765 | 23553 | C | 88.25627309 | 20787 | T | 11.73948117 | 2765 | Gene: Os06g0726400 | N/A | High
BEI | 1268 | 1268 | SNP | 1 | C | 2 | C/A | 88.9/11.1 | 21991/2743 | 24739 | C | 88.89203282 | 21991 | A | 11.08775617 | 2743 | Gene: Os06g0726400 | N/A | High
BEI | 1341 | 1341 | SNP | 1 | A | 2 | A/G | 89.0/11.0 | 21364/2644 | 24008 | A | 88.98700433 | 21364 | G | 11.01299567 | 2644 | Gene: Os06g0726400 | N/A | High
BEI | 1558 | 1558 | SNP | 1 | C | 2 | C/T | 94.5/5.4 | 19230/1108 | 20341 | C | 94.53812497 | 19230 | T | 5.447126493 | 1108 | Gene: Os06g0726400, CDS: Os06g0726400, mRNA: Os06g0726400 | Gly607Asp | High
BEI | 1902 | 1902 | SNP | 1 | C | 2 | C/T | 88.9/11.1 | 19821/2479 | 22308 | C | 88.85153308 | 19821 | T | 11.11260534 | 2479 | Gene: Os06g0726400 | N/A | High
BEI | 2410 | 2410 | SNP | 1 | G | 2 | G/A | 94.3/5.7 | 21442/1297 | 22744 | G | 94.2754133 | 21442 | A | 5.702602884 | 1297 | Gene: Os06g0726400 | N/A | High
BEI | 3178 | 3178 | SNP | 1 | G | 2 | G/T | 93.9/6.1 | 26643/1726 | 28372 | G | 93.90596363 | 26643 | T | 6.083462569 | 1726 | Gene: Os06g0726400, CDS: Os06g0726400, mRNA: Os06g0726400 | N/A | High
BEI | 3610 | 3610 | SNP | 1 | G | 2 | G/A | 88.5/11.5 | 35853/4638 | 40502 | G | 88.52155449 | 35853 | A | 11.45128636 | 4638 | Gene: Os06g0726400 | N/A | High
BEI | 4480 | 4480 | SNP | 1 | G | 2 | G/C | 87.3/12.7 | 32896/4768 | 37668 | G | 87.3314219 | 32896 | C | 12.65795901 | 4768 | Gene: Os06g0726400 | N/A | High
BEI | 6386 | 6386 | SNP | 1 | G | 2 | G/A | 86.8/13.2 | 13151/2003 | 15155 | G | 86.77664137 | 13151 | A | 13.21676015 | 2003 | Gene: Os06g0726400 | N/A | High
BEI | 6403 | 6403 | SNP | 1 | A | 2 | A/G | 97.8/2.1 | 16105/352 | 16459 | A | 97.84920105 | 16105 | G | 2.138647548 | 352 | Gene: Os06g0726400 | N/A | High
BEI | 6554 | 6554 | SNP | 1 | T | 2 | T/G | 91.8/8.1 | 3257/288 | 3546 | T | 91.8499718 | 3257 | G | 8.121827411 | 288 | Gene: Os06g0726400 | N/A | High
BEI | 6887 | 6887 | SNP | 1 | G | 2 | G/A | 86.8/13.2 | 8414/1276 | 9692 | G | 86.81386711 | 8414 | A | 13.16549732 | 1276 | Gene: Os06g0726400 | N/A | High
BEI | 7130 | 7130 | SNP | 1 | A | 2 | A/G | 99.4/0.6 | 20181/117 | 20298 | A | 99.42358853 | 20181 | G | 0.576411469 | 117 | Gene: Os06g0726400 | N/A | Low
BEI | 7214 | 7214 | SNP | 1 | G | 2 | G/T | 99.0/1.0 | 1271/13 | 1284 | G | 98.98753894 | 1271 | T | 1.012461059 | 13 | Gene: Os06g0726400, CDS: Os06g0726400, mRNA: Os06g0726400 | N/A | High
BEIIa | 1690 | 531 | SNP | 1 | N | 1 | T | 100 | 27778 | 27780 | T | 99.99280058 | 27778 |  |  |  | Gene: Os04g0409200 | N/A | Low
BEIIa | 1925 | 766 | SNP | 1 | G | 1 | T | 100 | 29024 | 29028 | T | 99.9862202 | 29024 |  |  |  | Gene: Os04g0409200 | N/A | Low
BEIIa | 1941 | 782 | SNP | 1 | G | 1 | T | 100 | 24302 | 24305 | T | 99.98765686 | 24302 |  |  |  | Gene: Os04g0409200 | N/A | Low
BEIIa | 1942 | 783 | SNP | 1 | G | 1 | T | 100 | 24331 | 24334 | T | 99.98767157 | 24331 |  |  |  | Gene: Os04g0409200 | N/A | Low
BEIIa | 2048 | 889 | SNP | 1 | A | 1 | T | 100 | 22534 | 22543 | T | 99.9600763 | 22534 |  |  |  | Gene: Os04g0409200, CDS: Os04g0409200, mRNA: Os04g0409200 | N/A | Low
BEIIa | 3266 | 2018 | SNP | 1 | T | 2 | T/G | 97.2/2.8 | 1795/51 | 1846 | T | 97.23726977 | 1795 | G | 2.762730228 | 51 | Gene: Os04g0409200, CDS: Os04g0409200, mRNA: Os04g0409200 | Tyr140Ser | High
BEIIb | 538 | 541 | SNP | 1 | T | 2 | G/T | 92.1/7.9 | 27333/2350 | 29685 | G | 92.07680647 | 27333 | T | 7.916456123 | 2350 | Gene: Os02g0528200 | N/A | High
BEIIb | 1365 | 1368 | SNP | 1 | T | 2 | T/C | 99.2/0.8 | 34460/279 | 34741 | T | 99.19115742 | 34460 | C | 0.803085691 | 279 | Gene: Os02g0528200 | N/A | Low
BEIIb | 1449 | 1452 | SNP | 1 | A | 2 | C/A | 90.5/9.5 | 31003/3254 | 34258 | C | 90.49856968 | 31003 | A | 9.498511297 | 3254 | Gene: Os02g0528200 | N/A | High
BEIIb | 1658 | 1661 | SNP | 1 | C | 2 | T/C | 90.5/9.5 | 27136/2863 | 30001 | T | 90.45031832 | 27136 | C | 9.543015233 | 2863 | Gene: Os02g0528200 | N/A | High
BEIIb | 2874 | 2877 | SNP | 1 | C | 2 | T/C | 90.0/10.0 | 26475/2946 | 29421 | T | 89.98674416 | 26475 | C | 10.01325584 | 2946 | Gene: Os02g0528200 | N/A | High
BEIIb | 2875 | 2878 | SNP | 1 | A | 2 | G/A | 89.8/10.2 | 26541/3000 | 29542 | G | 89.84158148 | 26541 | A | 10.15503351 | 3000 | Gene: Os02g0528200 | N/A | High
BEIIb | 3137 | 3140 | SNP | 1 | T | 2 | T/A | 99.5/0.5 | 21745/114 | 21859 | T | 99.47847569 | 21745 | A | 0.521524315 | 114 | Gene: Os02g0528200 | N/A | Low
BEIIb | 3138 | 3141 | SNP | 1 | T | 2 | T/A | 99.1/0.9 | 18785/169 | 18956 | T | 99.09791095 | 18785 | A | 0.891538299 | 169 | Gene: Os02g0528200 | N/A | Low
BEIIb | 3794 | 3797 | SNP | 1 | A | 2 | C/A | 89.7/10.3 | 18492/2119 | 20613 | C | 89.71037695 | 18492 | A | 10.27992044 | 2119 | Gene: Os02g0528200 | N/A | High
BEIIb | 3957 | 3960 | SNP | 1 | C | 2 | G/C | 89.9/10.0 | 23577/2634 | 26220 | G | 89.91990847 | 23577 | C | 10.04576659 | 2634 | Gene: Os02g0528200 | N/A | High
BEIIb | 4099 | 4102 | SNP | 1 | T | 2 | C/T | 86.8/13.2 | 24411/3717 | 28133 | C | 86.76998543 | 24411 | T | 13.21224185 | 3717 | Gene: Os02g0528200 | N/A | High
Gene | Reference position | Consensus position | Variation type | Length | Reference | Variants | Allele variations | Frequencies | Counts | Coverage | Variant #1 | Frequency of #1 | Count of #1 | Variant #2 | Frequency of #2 | Count of #2
BEIIb | 4288 | 4291 | SNP | 1 | A | 2 | C/A | 90.1/9.9 | 33436/3659 | 37097 | C | 90.13127746 | 33436 | A | 9.863331267 | 3659
BEIIb | 4293 | 4296 | SNP | 1 | C | 2 | A/C | 90.0/10.0 | 32840/3638 | 36479 | A | 90.0243976 | 32840 | C | 9.972861098 | 3638
BEIIb | 4357 | 4360 | SNP | 1 | C | 2 | T/C | 90.6/9.4 | 30362/3141 | 33504 | T | 90.62201528 | 30362 | C | 9.375 | 3141
BEIIb | 4538 | 4541 | SNP | 1 | A | 2 | G/A | 89.9/10.1 | 31063/3499 | 34570 | G | 89.85536592 | 31063 | A | 10.12149262 | 3499
BEIIb | 5016 | 5019 | SNP | 1 | A | 2 | A/G | 99.4/0.6 | 66816/415 | 67232 | A | 99.38124703 | 66816 | G | 0.617265588 | 415
BEIIb | 5259 | 5262 | SNP | 1 | A | 2 | C/A | 90.4/9.6 | 114671/12210 | 126895 | C | 90.36683872 | 114671 | A | 9.622128531 | 12210
BEIIb | 5280 | 5283 | SNP | 1 | A | 2 | A/G | 99.1/0.9 | 108792/953 | 109752 | A | 99.12530068 | 108792 | G | 0.868321306 | 953
BEIIb | 5349 | 5352 | SNP | 1 | G | 2 | T/G | 85.0/15.0 | 2914/513 | 3428 | T | 85.00583431 | 2914 | G | 14.96499417 | 513
BEIIb | 5641 | 5644 | SNP | 1 | T | 2 | C/T | 86.5/13.5 | 14612/2285 | 16897 | C | 86.47688939 | 14612 | T | 13.52311061 | 2285
BEIIb | 5828 | 5830 | SNP | 1 | C | 2 | A/C | 86.4/13.6 | 15779/2477 | 18260 | A | 86.41292442 | 15779 | C | 13.56516977 | 2477
BEIIb | 5867 | 5869 | SNP | 1 | G | 2 | A/G | 86.1/13.9 | 15582/2511 | 18096 | A | 86.10742706 | 15582 | G | 13.87599469 | 2511
BEIIb | 5933 | 5935 | SNP | 1 | T | 2 | C/T | 87.0/13.0 | 15158/2273 | 17432 | C | 86.95502524 | 15158 | T | 13.03923818 | 2273
BEIIb | 6119 | 6121 | SNP | 1 | T | 2 | A/T | 86.2/13.8 | 15951/2548 | 18501 | A | 86.21696125 | 15951 | T | 13.77222853 | 2548
BEIIb | 6159 | 6161 | SNP | 1 | G | 2 | A/G | 84.3/15.6 | 14548/2698 | 17249 | A | 84.34112122 | 14548 | G | 15.64148646 | 2698
BEIIb | 6167 | 6169 | SNP | 1 | T | 2 | C/T | 84.2/15.8 | 15838/2972 | 18814 | C | 84.18199213 | 15838 | T | 15.7967471 | 2972
BEIIb | 6375 | 6377 | SNP | 1 | T | 2 | T/C | 99.5/0.5 | 25803/136 | 25940 | T | 99.47185813 | 25803 | C | 0.524286816 | 136
BEIIb | 6385 | 6387 | SNP | 1 | C | 2 | C/G | 93.4/6.6 | 20636/1449 | 22088 | C | 93.42629482 | 20636 | G | 6.560123144 | 1449
BEIIb | 6429 | 6431 | SNP | 1 | C | 2 | C/T | 98.4/1.6 | 18045/296 | 18343 | C | 98.37540206 | 18045 | T | 1.613694597 | 296
BEIIb | 6457 | 6459 | SNP | 1 | A | 2 | A/G | 98.1/1.9 | 16662/317 | 16980 | A | 98.12720848 | 16662 | G | 1.866902238 | 317
BEIIb | 6727 | 6729 | SNP | 1 | A | 2 | T/A | 86.1/13.9 | 13187/2124 | 15311 | T | 86.12762066 | 13187 | A | 13.87237934 | 2124
BEIIb | 6905 | 6907 | SNP | 1 | T | 2 | T/C | 96.1/3.9 | 18473/741 | 19214 | T | 96.14343708 | 18473 | C | 3.856562923 | 741
BEIIb | 6908 | 6910 | SNP | 1 | C | 2 | C/T | 96.1/3.9 | 18278/735 | 19018 | C | 96.10894942 | 18278 | T | 3.864759701 | 735
BEIIb | 6920 | 6922 | SNP | 1 | T | 2 | T/C | 92.7/7.3 | 19151/1513 | 20666 | T | 92.66911836 | 19151 | C | 7.32120391 | 1513
BEIIb | 6932 | 6934 | SNP | 1 | G | 2 | G/A | 89.9/10.1 | 18875/2125 | 21002 | G | 89.87239311 | 18875 | A | 10.11808399 | 2125
BEIIb | 6941 | 6943 | SNP | 1 | A | 2 | A/T | 89.9/10.1 | 19086/2146 | 21235 | A | 89.87991523 | 19086 | T | 10.10595715 | 2146
BEIIb | 6959 | 6961 | SNP | 1 | G | 2 | G/A | 96.4/3.6 | 18930/711 | 19646 | G | 96.35549221 | 18930 | A | 3.619057314 | 711
BEIIb | 6962 | 6964 | SNP | 1 | T | 2 | T/A | 96.4/3.6 | 18937/711 | 19649 | T | 96.37640592 | 18937 | A | 3.618504759 | 711
BEIIb | 6968 | 6970 | SNP | 1 | C | 2 | C/T | 99.1/0.9 | 19054/165 | 19223 | C | 99.12084482 | 19054 | T | 0.858346772 | 165
BEIIb | 7051 | 7053 | SNP | 1 | C | 2 | T/C | 84.8/15.2 | 13272/2374 | 15649 | T | 84.81053102 | 13272 | C | 15.17029842 | 2374
BEIIb | 7207 | 7211 | SNP | 1 | T | 2 | C/T | 85.1/14.9 | 16485/2881 | 19367 | C | 85.11901688 | 16485 | T | 14.87581969 | 2881
BEIIb | 7394 | 7398 | SNP | 1 | C | 2 | T/C | 85.5/14.5 | 17518/2970 | 20489 | T | 85.49953634 | 17518 | C | 14.495583 | 2970
BEIIb | 7826 | 7830 | SNP | 1 | A | 2 | T/A | 87.7/12.3 | 10001/1404 | 11406 | T | 87.6819218 | 10001 | A | 12.30931089 | 1404
BEIIb | 8139 | 8143 | SNP | 1 | C | 2 | T/C | 85.2/14.8 | 12817/2225 | 15044 | T | 85.19675618 | 12817 | C | 14.78994948 | 2225
BEIIb | 8272 | 8276 | SNP | 1 | T | 2 | C/T | 60.3/39.7 | 4666/3076 | 7743 | C | 60.2608808 | 4666 | T | 39.72620431 | 3076
BEIIb | 8310 | 8314 | SNP | 1 | A | 2 | A/T | 57.2/42.7 | 2930/2188 | 5119 | A | 57.23774175 | 2930 | T | 42.74272319 | 2188
BEIIb | 8334 | 8338 | SNP | 1 | G | 2 | A/G | 84.0/16.0 | 4810/919 | 5729 | A | 83.95880607 | 4810 | G | 16.04119393 | 919
BEIIb | 8775 | 8779 | SNP | 1 | T | 2 | A/T | 84.4/15.6 | 15578/2889 | 18468 | A | 84.35131037 | 15578 | T | 15.64327485 | 2889
BEIIb | 9035 | 9039 | SNP | 1 | T | 2 | C/T | 83.6/16.4 | 13341/2612 | 15958 | C | 83.60070184 | 13341 | T | 16.36796591 | 2612
BEIIb | 9294 | 9298 | SNP | 1 | A | 2 | C/A | 84.8/15.2 | 14371/2582 | 16954 | C | 84.76465731 | 14371 | A | 15.22944438 | 2582
BEIIb | 9761 | 9761 | SNP | 1 | T | 2 | C/T | 87.0/13.0 | 13612/2030 | 15645 | C | 87.00543305 | 13612 | T | 12.9753915 | 2030
BEIIb | 10068 | 10068 | SNP | 1 | A | 2 | C/A | 85.1/14.9 | 11744/2062 | 13807 | C | 85.05830376 | 11744 | A | 14.93445354 | 2062
BEIIb | 10561 | 10561 | SNP | 1 | C | 2 | C/A | 99.2/0.8 | 125/1 | 126 | C | 99.20634921 | 125 | A | 0.793650794 | 1
Isoamylase 2 | 136 | 136 | SNP | 1 | A | 2 | A/G | 54.2/45.8 | 8106/6843 | 14949 | A | 54.22436283 | 8106 | G | 45.77563717 | 6843
Isoamylase 2 | 799 | 799 | SNP | 1 | A | 2 | A/G | 53.3/46.7 | 10794/9440 | 20234 | A | 53.34585351 | 10794 | G | 46.65414649 | 9440
Isoamylase 2 | 960 | 960 | SNP | 1 | T | 2 | T/C | 51.9/48.1 | 34842/32345 | 67193 | T | 51.8536157 | 34842 | C | 48.13745479 | 32345
Isoamylase 2 | 1120 | 1120 | SNP | 1 | C | 2 | C/T | 59.7/40.3 | 66247/44769 | 111022 | C | 59.67015546 | 66247 | T | 40.3244402 | 44769
Isoamylase 2 | 1462 | 1462 | SNP | 1 | G | 2 | G/A | 65.8/34.2 | 5351/2786 | 8137 | G | 65.7613371 | 5351 | A | 34.2386629 | 2786
Isoamylase 2 | 1712 | 1712 | SNP | 1 | C | 2 | C/A | 66.1/33.9 | 3097/1590 | 4687 | C | 66.07638148 | 3097 | A | 33.92361852 | 1590
Isoamylase 2 | 2040 | 2040 | SNP | 1 | G | 2 | G/T | 65.3/34.7 | 673/358 | 1031 | G | 65.27643065 | 673 | T | 34.72356935 | 358
Isoamylase 2 | 2067 | 2067 | SNP | 1 | T | 2 | T/G | 85.7/14.3 | 240/40 | 280 | T | 85.71428571 | 240 | G | 14.28571429 | 40
Isoamylase 2 | 2122 | 2122 | SNP | 1 | C | 2 | C/T | 99.5/0.5 | 183/1 | 184 | C | 99.45652174 | 183 | T | 0.543478261 | 1
Isoamylase 2 | 2130 | 2130 | SNP | 1 | C | 2 | C/A | 99.5/0.5 | 196/1 | 197 | C | 99.49238579 | 196 | A | 0.507614213 | 1
Isoamylase 2 | 2161 | 2161 | SNP | 1 | C | 2 | C/A | 98.3/1.7 | 58/1 | 59 | C | 98.30508475 | 58 | A | 1.694915254 | 1
Isoamylase 2 | 2163 | 2163 | SNP | 1 | C | 2 | C/A | 98.3/1.7 | 59/1 | 60 | C | 98.33333333 | 59 | A | 1.666666667 | 1
Isoamylase 2 | 2169 | 2169 | SNP | 1 | C | 2 | C/T | 97.6/2.4 | 41/1 | 42 | C | 97.61904762 | 41 | T | 2.380952381 | 1
Isoamylase 2 | 2170 | 2170 | SNP | 1 | C | 2 | C/T | 97.8/2.2 | 45/1 | 46 | C | 97.82608696 | 45 | T | 2.173913043 | 1
Isoamylase 2 | 2214 | 2214 | SNP | 1 | C | 2 | C/A | 98.6/1.4 | 72/1 | 73 | C | 98.63013699 | 72 | A | 1.369863014 | 1
Isoamylase 2 | 2217 | 2217 | SNP | 1 | C | 2 | C/A | 98.0/2.0 | 50/1 | 51 | C | 98.03921569 | 50 | A | 1.960784314 | 1
GBSSI | 1086 | 1086 | SNP | 1 | A | 2 | A/C | 99.1/0.9 | 47013/447 | 47463 | A | 99.05189305 | 47013 | C | 0.941786233 | 447
GBSSII | 670 | 585 | SNP | 1 | T | 2 | T/A | 97.3/2.7 | 33112/909 | 34021 | T | 97.32812087 | 33112 | A | 2.671879133 | 909
GBSSII | 1638 | 1553 | SNP | 1 | A | 2 | A/G | 97.5/2.5 | 39384/1003 | 40390 | A | 97.50928448 | 39384 | G | 2.483287943 | 1003
GBSSII | 5170 | 5085 | SNP | 1 | C | 2 | C/G | 97.9/2.1 | 33987/727 | 34718 | C | 97.89446397 | 33987 | G | 2.094014632 | 727
GBSSII | 5174 | 5089 | SNP | 1 | C | 2 | C/G | 98.0/1.9 | 35723/709 | 36436 | C | 98.04314414 | 35723 | G | 1.945877703 | 709
GPT1 | 386 | 331 | SNP | 1 | G | 2 | G/A | 95.2/4.8 | 4605/232 | 4837 | G | 95.20363862 | 4605 | A | 4.796361381 | 232
GPT1 | 429 | 374 | SNP | 1 | C | 2 | C/G | 75.0/24.9 | 4213/1400 | 5615 | C | 75.03116652 | 4213 | G | 24.9332146 | 1400
GPT1 | 499 | 444 | SNP | 1 | A | 2 | A/G | 95.1/4.9 | 26198/1338 | 27537 | A | 95.13745143 | 26198 | G | 4.858917093 | 1338
GPT1 | 513 | 459 | SNP | 1 | G | 2 | G/T | 97.9/2.1 | 12441/270 | 12712 | G | 97.86815607 | 12441 | T | 2.123977344 | 270
GPT1 | 1094 | 1040 | SNP | 1 | C | 2 | C/T | 94.9/5.1 | 16845/901 | 17755 | C | 94.87468319 | 16845 | T | 5.074626866 | 901
GPT1 | 1188 | 1134 | SNP | 1 | C | 2 | C/T | 95.8/4.2 | 13206/574 | 13781 | C | 95.82758871 | 13206 | T | 4.165154923 | 574
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
Low
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
Low
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
Low
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
Val403Ile
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
Low
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
His196Arg
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200
N/A
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
Leu94Val
High
Gene: Os02g0528200, CDS: Os02g0528200, mRNA: Os02g0528200
N/A
Low
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
N/A
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
N/A
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
Thr482Ala
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
N/A
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
N/A
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
Arg231Leu
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
Leu122Met
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
Thr113Pro
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
N/A
Low
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
Gly92Cys
Low
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
N/A
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
Gly81Trp
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
Glu79Lys
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
N/A
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
Gly64Trp
High
Gene: OSJNBa0014C03.3, CDS: OSJNBa0014C03.3
Gly63Cys
High
Gene: Os06g0133000, CDS: Os06g0133000, mRNA: Os06g0133000
Tyr224Ser
Low
Gene: Os07g0412100, mRNA: Os07g0412100
N/A
High
Gene: Os07g0412100, CDS: Os07g0412100, mRNA: Os07g0412100
Leu523Ser
High
Gene: Os07g0412100
N/A
High
Gene: Os07g0412100
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os08g0187800, CDS: Os08g0187800, mRNA: Os08g0187800
N/A
High
Gene: Os08g0187800, CDS: Os08g0187800, mRNA: Os08g0187800
Leu42Phe
High
GPT1
GPT1
GPT1
GPT1
GPT1
GPT1
GPT1
GPT1
GPT1
GPT1
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
1680
2395
2479
2688
2736
2890
3325
3475
3507
3574
901
1021
1052
1098
1268
1271
1302
1356
1388
1390
1564
1595
1620
1625
1635
1642
1715
1742
1763
1792
1804
1813
1910
2023
2042
2125
2154
2159
2225
2319
2365
2374
2408
2429
2471
2478
2490
2606
2624
2631
2691
2742
2746
2799
2822
2849
2886
2911
2925
3013
3075
3168
3199
3256
3296
3332
3367
3392
3394
1626
2341
2425
2634
2682
2836
3271
3421
3453
3520
206
326
357
403
573
576
607
661
693
695
869
900
925
930
940
947
1020
1047
1068
1097
1109
1118
1215
1328
1347
1430
1459
1464
1530
1624
1670
1679
1713
1734
1776
1783
1795
1911
1929
1936
1996
2047
2051
2104
2127
2154
2191
2216
2230
2318
2380
2473
2504
2561
2601
2637
2672
2697
2699
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
G
C
A
C
T
G
A
A
T
C
C
T
C
A
T
T
T
T
A
G
A
T
C
C
C
G
G
A
G
G
G
C
T
T
A
C
A
G
G
G
G
G
T
A
T
T
G
C
T
T
T
C
T
C
A
C
C
G
T
G
C
A
G
T
C
T
C
T
T
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
G/A
C/G
A/G
C/T
T/G
G/A
A/G
A/G
T/C
C/A
C/T
T/G
C/T
A/C
T/C
T/C
T/C
T/A
A/G
G/C
A/G
T/A
C/T
C/T
C/T
G/T
G/T
A/G
G/A
G/A
G/A
C/T
T/C
T/C
A/T
C/T
A/G
G/T
G/A
G/A
G/A
G/C
T/A
G/A
T/C
T/G
G/A
C/T
T/A
T/C
T/C
C/T
T/C
C/T
A/G
C/T
C/T
G/A
T/G
G/A
C/G
A/C
G/A
T/C
C/G
T/A
C/G
T/C
T/C
95.6/4.4
96.2/3.8
68.6/31.4
73.5/26.5
95.9/4.1
95.7/4.3
73.0/27.0
95.8/4.2
96.8/3.2
97.0/3.0
99.0/1.0
98.8/1.1
98.8/1.1
99.1/0.9
98.9/1.1
98.9/1.1
98.9/1.1
99.0/1.0
99.1/0.9
99.1/0.9
98.9/1.1
98.7/1.2
98.7/1.3
98.7/1.3
98.8/1.2
98.7/1.3
99.3/0.7
98.8/1.2
98.7/1.2
98.7/1.3
98.7/1.3
98.7/1.3
98.4/1.5
99.1/0.9
99.4/0.6
98.2/1.8
98.8/1.1
99.4/0.6
98.5/1.5
99.1/0.9
98.4/1.6
98.3/1.7
98.2/1.8
64.7/35.3
98.4/1.6
98.5/1.4
98.5/1.5
98.9/1.1
98.5/1.5
98.6/1.4
98.2/1.8
98.4/1.6
98.4/1.6
99.4/0.6
99.5/0.5
99.3/0.6
98.0/2.0
99.3/0.7
98.9/1.1
99.4/0.6
99.2/0.8
97.9/2.0
98.2/1.8
99.0/1.0
99.2/0.8
99.4/0.6
98.7/1.3
98.3/1.7
98.3/1.7
18024/829
20520/818
12564/5754
17374/6259
24255/1027
22880/1027
14302/5295
21595/952
20097/655
17760/550
75204/779
43188/501
102730/1191
76276/700
34504/389
33664/384
41282/465
41726/412
37356/346
37209/346
45550/517
44787/562
42075/540
41551/528
40330/487
39111/499
39700/282
42595/521
44421/560
46344/599
47243/617
47570/614
36053/567
45060/399
40103/241
33016/616
31371/360
32002/181
41557/634
38785/354
41880/671
42169/729
45905/844
27149/14815
36288/598
36932/543
37148/548
43040/464
41713/617
43670/631
31735/566
33436/546
34317/550
42312/256
43328/230
44396/288
30765/636
34113/245
36011/411
44995/262
43947/347
41109/859
42926/803
44869/448
41967/327
36667/228
38002/484
37643/662
37830/662
18854
21340
18319
23636
25284
23909
19600
22547
20752
18311
75996
43693
103936
76979
34894
34048
41750
42140
37704
37558
46069
45356
42625
42082
40820
39613
39986
43119
44988
46945
47864
48188
36621
45462
40346
33634
31736
32200
42198
39143
42557
42899
46752
41965
36887
37478
37699
43509
42333
44302
32303
33990
34867
42573
43559
44690
31403
34364
36427
45267
44305
41971
43733
45320
42298
36897
38494
38305
38494
G
C
A
C
T
G
A
A
T
C
C
T
C
A
T
T
T
T
A
G
A
T
C
C
C
G
G
A
G
G
G
C
T
T
A
C
A
G
G
G
G
G
T
G
T
T
G
C
T
T
T
C
T
C
A
C
C
G
T
G
C
A
G
T
C
T
C
T
T
95.59775114
96.1574508
68.58452972
73.50651548
95.93023256
95.69618135
72.96938776
95.77770879
96.84367772
96.9908798
98.95783989
98.84420845
98.83967057
99.08676392
98.88232934
98.87218045
98.87904192
99.01756051
99.07702101
99.07077054
98.87342899
98.7454802
98.70967742
98.73817784
98.79960804
98.73273925
99.28474966
98.78475846
98.73966391
98.71977846
98.70257396
98.71752303
98.44897736
99.11574502
99.39770981
98.16257359
98.84988656
99.38509317
98.48097066
99.0854048
98.40919238
98.29832863
98.1883128
64.69438818
98.37612167
98.54314531
98.53842277
98.9220621
98.53542154
98.57342784
98.24164938
98.37010886
98.42257722
99.38693538
99.4696848
99.34213471
97.96834697
99.26958445
98.85798995
99.39912077
99.19196479
97.94620095
98.15471155
99.00485437
99.21745709
99.37664309
98.72187873
98.27176609
98.27505585
18024
20520
12564
17374
24255
22880
14302
21595
20097
17760
75204
43188
102730
76276
34504
33664
41282
41726
37356
37209
45550
44787
42075
41551
40330
39111
39700
42595
44421
46344
47243
47570
36053
45060
40103
33016
31371
32002
41557
38785
41880
42169
45905
27149
36288
36932
37148
43040
41713
43670
31735
33436
34317
42312
43328
44396
30765
34113
36011
44995
43947
41109
42926
44869
41967
36667
38002
37643
37830
A
G
G
T
G
A
G
G
C
A
T
G
T
C
C
C
C
A
G
C
G
A
T
T
T
T
T
G
A
A
A
T
C
C
T
T
G
T
A
A
A
C
A
A
C
G
A
T
A
C
C
T
C
T
G
T
T
A
G
A
G
C
A
C
G
A
G
C
C
4.396944945
3.833177132
31.41001146
26.48079201
4.061857301
4.295453595
27.01530612
4.222291214
3.156322282
3.003659003
1.02505395
1.146636761
1.145897475
0.909338911
1.114804838
1.127819549
1.113772455
0.977693403
0.917674517
0.921241813
1.122229699
1.239086339
1.26686217
1.254693218
1.193042626
1.259687476
0.705246836
1.208284051
1.244776385
1.275961231
1.289069029
1.274176143
1.548291964
0.877656064
0.597333069
1.83148005
1.134358457
0.562111801
1.502440874
0.904376261
1.576708885
1.699340311
1.805270363
35.30322888
1.621167349
1.448849992
1.453619459
1.066446023
1.457491791
1.424314929
1.752159242
1.60635481
1.577422778
0.601320086
0.528019468
0.644439472
2.025284209
0.712955418
1.128283965
0.57878808
0.783207313
2.046651259
1.836142044
0.988526037
0.773086198
0.617936418
1.257338806
1.728233912
1.719748532
829
818
5754
6259
1027
1027
5295
952
655
550
779
501
1191
700
389
384
465
412
346
346
517
562
540
528
487
499
282
521
560
599
617
614
567
399
241
616
360
181
634
354
671
729
844
14815
598
543
548
464
617
631
566
546
550
256
230
288
636
245
411
262
347
859
803
448
327
228
484
662
662
Gene: Os08g0187800, CDS: Os08g0187800, mRNA: Os08g0187800
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os08g0187800, CDS: Os08g0187800, mRNA: Os08g0187800
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os08g0187800
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900, CDS: Os04g0164900, mRNA: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900, CDS: Os04g0164900, mRNA: Os04g0164900
Ser217Asn
Low
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
Low
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Gene: Os04g0164900
N/A
High
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
SPHOL
SPHOL
SPHOL
SPHOL
SPHOL
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
3419
3513
3608
3612
3634
3666
3719
3878
3891
4030
4081
4082
4088
4131
4163
4181
4190
4198
4284
4309
4327
4347
4612
4703
4720
4985
5020
5062
5100
5127
5161
5259
5266
5295
5306
5319
5409
5418
5425
5441
5727
5729
5735
6160
7176
7837
7845
7859
8136
8708
3645
3683
3770
3785
6447
919
1132
1212
1231
2268
2334
2611
2712
3544
3773
4021
4335
4340
4399
2724
2818
2913
2917
2939
2971
3024
3183
3196
3335
3386
3387
3393
3436
3468
3486
3495
3503
3589
3614
3632
3652
3917
4008
4025
4290
4325
4367
4405
4432
4466
4564
4571
4600
4611
4624
4714
4723
4730
4746
5032
5034
5040
5465
6481
7142
7150
7164
7441
8013
3539
3577
3664
3679
6341
709
922
1002
1021
2058
2124
2401
2502
3334
3563
3811
4125
4130
4189
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
C
A
C
A
A
G
C
T
C
T
T
A
G
T
T
T
T
C
G
C
G
T
C
G
A
T
G
T
T
T
A
C
C
C
A
A
G
G
A
A
T
C
G
T
C
G
A
G
T
C
T
G
C
A
C
A
T
G
C
G
C
C
G
A
A
T
T
T
A
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
C/A
A/G
C/T
A/C
A/G
G/C
C/T
T/C
C/T
T/G
T/C
A/C
G/A
T/C
T/C
T/G
T/A
C/T
G/A
C/G
G/T
T/A
C/T
G/C
A/G
T/C
G/A
T/C
T/C
T/G
A/T
C/T
C/T
C/G
A/G
A/T
G/A
G/T
A/G
A/T
T/A
C/A
G/A
A/T
T/C
G/A
A/G
G/C
T/A
T/C
A
A
A
T
C/G
A/G
T/A
G/A
C/T
G/T
C/A
C/A
G/A
A/C
A/G
T/C
T/C
T/A
A/T
98.5/1.5
99.4/0.6
99.0/1.0
99.4/0.6
98.5/1.5
98.9/1.1
99.1/0.9
98.9/1.1
98.9/1.1
98.8/1.2
99.0/1.0
99.0/1.0
98.8/1.1
98.8/1.2
98.8/1.1
98.8/1.1
99.0/1.0
98.8/1.2
98.6/1.3
98.6/1.4
98.7/1.3
98.8/1.2
99.4/0.6
98.7/1.3
98.9/1.1
98.6/1.4
99.5/0.5
99.3/0.7
98.6/1.4
97.7/2.3
97.8/2.2
99.2/0.8
99.2/0.8
99.1/0.9
98.9/1.1
98.6/1.4
98.7/1.3
98.8/1.2
98.4/1.6
99.3/0.7
99.4/0.6
99.4/0.6
99.3/0.7
67.2/32.8
69.2/30.8
99.4/0.6
99.4/0.6
99.4/0.6
99.5/0.5
68.3/31.7
37581/555
49128/303
36594/374
38365/221
39035/584
42292/461
43624/400
42999/476
43089/487
34510/416
40136/422
40054/411
40780/474
41352/510
46158/536
46654/536
45224/472
47443/577
37601/510
39186/553
40404/542
43039/522
39027/236
40860/530
40899/456
32932/478
35331/186
36342/262
40252/577
37613/891
34930/772
39071/325
40306/335
41503/385
42190/465
41905/590
31268/416
31067/392
30639/503
32769/235
129483/779
127023/775
114076/802
10477/5105
13740/6119
19591/121
19142/119
18033/101
18870/98
11092/5153
100
100
100
100
98.9/1.1
99.5/0.5
99.5/0.5
99.5/0.5
99.4/0.5
99.3/0.7
99.1/0.9
99.4/0.6
99.5/0.5
99.4/0.6
93.5/6.5
99.4/0.6
98.1/1.9
98.2/1.8
98.5/1.5
69231
30446
19556
20255
89/1
36518/193
37891/205
37864/199
36320/199
32010/211
30531/284
34113/190
37591/193
42375/255
18944/1313
4922/28
83965/1607
87233/1627
33814/507
38140
49431
36975
38590
39624
42768
44028
43478
43581
34928
40558
40469
41261
41863
46696
47200
45698
48024
38116
39749
40946
43566
39266
41394
41355
33413
35525
36605
40830
38510
35705
39404
40648
41898
42655
42498
31687
31460
31142
33011
130270
127810
114886
15582
19859
19717
19262
18139
18971
16246
69244
30450
19561
20261
90
36711
38097
38071
36523
32225
30820
34304
37789
42635
20258
4950
85576
88866
34322
C
A
C
A
A
G
C
T
C
T
T
A
G
T
T
T
T
C
G
C
G
T
C
G
A
T
G
T
T
T
A
C
C
C
A
A
G
G
A
A
T
C
G
A
T
G
A
G
T
T
A
A
A
T
C
A
T
G
C
G
C
C
G
A
A
T
T
T
A
98.53434714
99.38702434
98.96957404
99.4169474
98.51352716
98.88701833
99.08240211
98.89829339
98.87106767
98.8032524
98.95951477
98.97452371
98.83425026
98.77935169
98.84786705
98.84322034
98.96275548
98.79018824
98.64886137
98.58361217
98.67630538
98.79034109
99.39133092
98.70995796
98.89735219
98.56044055
99.4539057
99.28151892
98.58437423
97.67073487
97.82943565
99.15490813
99.15863019
99.05723424
98.90985816
98.60464022
98.67769117
98.75079466
98.38481793
99.26691103
99.39587012
99.38424223
99.29495326
67.23783853
69.18777381
99.36095755
99.37701173
99.41562379
99.46760846
68.27526776
99.98122581
99.98686371
99.97443893
99.97038646
98.88888889
99.47427202
99.45927501
99.45627906
99.44418586
99.33281614
99.06229721
99.44321362
99.476038
99.39017239
93.51367361
99.43434343
98.11746284
98.16240182
98.51989977
37581
49128
36594
38365
39035
42292
43624
42999
43089
34510
40136
40054
40780
41352
46158
46654
45224
47443
37601
39186
40404
43039
39027
40860
40899
32932
35331
36342
40252
37613
34930
39071
40306
41503
42190
41905
31268
31067
30639
32769
129483
127023
114076
10477
13740
19591
19142
18033
18870
11092
69231
30446
19556
20255
89
36518
37891
37864
36320
32010
30531
34113
37591
42375
18944
4922
83965
87233
33814
A
G
T
C
G
C
T
C
T
G
C
C
A
C
C
G
A
T
A
G
T
A
T
C
G
C
A
C
C
G
T
T
T
G
G
T
A
T
G
T
A
A
A
T
C
A
G
C
A
C
1.455165181
0.612975663
1.011494253
0.572687225
1.47385423
1.077908717
0.908512765
1.094806569
1.117459443
1.19102153
1.040485231
1.015592182
1.148784567
1.218259561
1.147849923
1.13559322
1.032867959
1.201482592
1.338020779
1.391229968
1.323694622
1.198182069
0.60102888
1.280378799
1.102647806
1.430580912
0.523574947
0.715749215
1.413176586
2.313684757
2.162162162
0.824789361
0.82414879
0.918898277
1.090141836
1.388300626
1.312841228
1.246026701
1.615182069
0.711883917
0.597988793
0.606368829
0.698083317
32.76216147
30.81222619
0.613683623
0.617796698
0.556811291
0.516577935
31.71857688
555
303
374
221
584
461
400
476
487
416
422
411
474
510
536
536
472
577
510
553
542
522
236
530
456
478
186
262
577
891
772
325
335
385
465
590
416
392
503
235
779
775
802
5105
6119
121
119
101
98
5153
G
G
A
A
T
T
A
A
A
C
G
C
C
A
T
1.111111111
0.525727983
0.538100113
0.522707573
0.544862142
0.65477114
0.921479559
0.553871269
0.510730636
0.598100152
6.481390068
0.565656566
1.877862952
1.830846443
1.477186644
1
193
205
199
199
211
284
190
193
255
1313
28
1607
1627
507
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os03g0758100, mRNA: Os03g0758100
N/A
Gene: Os03g0758100
N/A
Gene: Os03g0758100
N/A
Gene: Os03g0758100
N/A
Gene: Os03g0758100
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700, mRNA: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700, mRNA: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
High
Low
High
Low
High
High
Low
High
High
High
High
High
High
High
High
High
High
High
High
High
High
High
Low
High
High
High
Low
Low
High
High
High
Low
Low
Low
High
High
High
High
High
Low
Low
Low
Low
High
High
Low
Low
Low
Low
High
Low
Low
Low
Low
High
Low
Low
Low
Low
Low
Low
Low
Low
Low
High
Low
High
High
High
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
4421
4487
4563
4612
4619
4666
4675
4696
4712
4715
4833
4875
4876
4908
4975
5127
5188
5200
5249
5271
5307
5361
5379
5437
5439
5538
5545
5579
5613
5636
5656
5723
5822
5846
5859
5872
5895
5939
6087
6090
6110
6443
6674
6727
6759
6828
6842
6939
6988
7028
7052
7100
7101
7115
7171
7177
7253
7254
7255
13
21
33
67
71
72
77
78
80
81
4211
4277
4353
4402
4409
4456
4465
4486
4502
4505
4623
4665
4666
4698
4765
4917
4978
4990
5039
5061
5097
5151
5169
5227
5229
5328
5335
5369
5403
5426
5446
5513
5612
5636
5649
5662
5685
5729
5877
5880
5900
6233
6464
6517
6549
6618
6632
6729
6778
6818
6842
6890
6891
6905
6961
6967
7043
7044
7045
13
21
33
67
71
72
77
78
80
81
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
Complex SNP
Complex SNP
Complex SNP
SNP
SNP
Complex SNP
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
G
C
G
C
T
C
C
C
T
T
A
G
C
C
G
A
G
G
G
G
A
T
C
G
A
G
C
C
G
A
C
G
T
C
G
T
A
A
T
G
A
C
A
C
T
A
T
C
A
C
A
T
T
T
A
G
C
C
A
G
G
G
G
G
G
G
G
G
G
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
3
3
3
2
2
3
G/A
C/T
G/A
C/T
T/C
C/T
C/T
C/T
T/G
T/G
A/C
G/A
C/T
C/T
G/T
A/T
G/A
G/A
G/A
G/T
A/G
T/C
C/G
G/A
A/C
G/T
C/A
C/T
G/A
A/T
C/T
G/T
T/C
C/T
G/A
T/C
A/G
A/G
T/C
G/A
A/G
C/T
A/C
C/G
T/G
A/G
T/C
C/T
A/G
C/A
A/C
T/A
T/C
T/C
A/G
G/C
C/T
C/T
A/G
G/T
G/T
G/T
G/T
G/C/T
G/T/A
G/T/A
G/T
G/A
G/T/A
98.3/1.7
98.1/1.9
98.0/2.0
98.1/1.8
98.1/1.9
98.1/1.9
98.1/1.8
98.1/1.9
98.2/1.8
98.1/1.9
98.2/1.8
98.4/1.6
98.4/1.6
98.0/2.0
97.9/2.1
98.0/2.0
97.6/2.4
97.7/2.3
98.3/1.7
98.0/1.8
98.0/2.0
97.7/2.3
97.5/2.5
98.0/2.0
98.0/2.0
98.2/1.8
98.1/1.9
97.5/2.5
97.7/2.3
97.8/2.2
97.8/2.2
98.0/2.0
97.9/2.1
97.8/2.2
97.8/2.2
97.8/2.2
98.0/2.0
98.1/1.8
98.2/1.8
98.2/1.8
98.3/1.7
98.2/1.8
98.8/1.2
98.7/1.3
98.1/1.9
97.3/2.7
97.9/2.1
98.5/1.5
98.6/1.4
97.9/2.1
97.7/2.3
98.3/1.7
98.3/1.7
97.4/2.6
98.0/2.0
92.8/7.2
97.7/2.2
97.8/2.2
97.8/2.2
97.8/2.2
98.4/1.6
98.9/1.1
97.9/2.1
95.5/2.2/2.2
90.8/8.0/1.1
87.2/11.5/1.3
96.2/3.8
98.8/1.2
93.8/4.9/1.2
21836/382
28376/558
34697/710
34464/648
34265/654
35320/666
34568/651
34719/655
34877/647
33983/646
35883/667
33423/535
33573/541
33040/662
34187/728
29893/601
32098/774
33341/768
35903/630
32696/610
31586/633
30340/701
31297/795
34033/680
33044/682
36486/678
36651/709
37012/939
38556/911
36868/831
36553/830
40860/830
36753/779
36993/833
37409/842
37961/855
38508/799
36385/685
31938/572
32045/581
29831/523
23554/432
23938/301
23066/295
24674/477
24897/692
24700/534
17105/260
14880/207
38472/806
30825/712
14591/254
15157/260
7899/211
2050/41
1669/129
44447/1016
44786/1001
44769/1020
44/1
61/1
92/1
95/2
85/2/2
79/7/1
68/9/1
77/3
79/1
76/4/1
22220
28937
35407
35116
34920
35990
35224
35378
35525
34634
36551
33962
34116
33712
34919
30497
32875
34115
36542
33368
32222
31042
32097
34716
33730
37167
37366
37958
39473
37702
37386
41693
37533
37827
38257
38816
39309
37071
32510
32631
30358
23988
24240
23366
25153
25590
25234
17367
15087
39283
31538
14847
15417
8111
2091
1799
45474
45803
45791
45
62
93
97
89
87
78
80
80
81
G
C
G
C
T
C
C
C
T
T
A
G
C
C
G
A
G
G
G
G
A
T
C
G
A
G
C
C
G
A
C
G
T
C
G
T
A
A
T
G
A
C
A
C
T
A
T
C
A
C
A
T
T
T
A
G
C
C
A
G
G
G
G
G
G
G
G
G
G
98.27182718
98.06130559
97.9947468
98.1432965
98.12428408
98.13837177
98.13763343
98.13726044
98.17593244
98.12034417
98.17241662
98.4129321
98.40837144
98.00664452
97.90372004
98.01947733
97.6365019
97.73120328
98.25132724
97.98609446
98.02619328
97.73854777
97.50755522
98.03260744
97.96620219
98.16772944
98.08649574
97.50777175
97.67689307
97.78791576
97.77189322
98.0020627
97.92182879
97.79522563
97.78341219
97.79730008
97.96229871
98.14949691
98.24054137
98.20416169
98.26404902
98.19076205
98.75412541
98.7160832
98.09565459
97.2919109
97.88380756
98.49139172
98.62795784
97.93549373
97.73923521
98.27574594
98.31354998
97.38626557
98.03921569
92.7737632
97.74156661
97.77962142
97.76812037
97.77777778
98.38709677
98.92473118
97.93814433
95.50561798
90.8045977
87.17948718
96.25
98.75
93.82716049
21836
28376
34697
34464
34265
35320
34568
34719
34877
33983
35883
33423
33573
33040
34187
29893
32098
33341
35903
32696
31586
30340
31297
34033
33044
36486
36651
37012
38556
36868
36553
40860
36753
36993
37409
37961
38508
36385
31938
32045
29831
23554
23938
23066
24674
24897
24700
17105
14880
38472
30825
14591
15157
7899
2050
1669
44447
44786
44769
44
61
92
95
85
79
68
77
79
76
A
T
A
T
C
T
T
T
G
G
C
A
T
T
T
T
A
A
A
T
G
C
G
A
C
T
A
T
A
T
T
T
C
T
A
C
G
G
C
A
G
T
C
G
G
G
C
T
G
A
C
A
C
C
G
C
T
T
G
T
T
T
T
C
T
T
T
A
T
1.719171917
1.928327055
2.005253199
1.845312678
1.872852234
1.850514032
1.848171701
1.851433094
1.821252639
1.865219149
1.824847473
1.57529003
1.585766209
1.963692454
2.08482488
1.970685641
2.354372624
2.251209146
1.724043566
1.828098777
1.964496307
2.258230784
2.476866997
1.958751008
2.021938927
1.824198886
1.897446877
2.473786817
2.30790667
2.204127102
2.220082384
1.990741851
2.075506887
2.202130753
2.20090441
2.202699918
2.032613396
1.847805562
1.759458628
1.780515461
1.722774886
1.80090045
1.241749175
1.262518189
1.896394068
2.704181321
2.116192439
1.497092186
1.372042155
2.051778123
2.257594014
1.710783323
1.686450023
2.601405499
1.960784314
7.170650361
2.234243744
2.185446368
2.227511956
2.222222222
1.612903226
1.075268817
2.06185567
2.247191011
8.045977011
11.53846154
3.75
1.25
4.938271605
382
558
710
648
654
666
651
655
647
646
667
535
541
662
728
601
774
768
630
610
633
701
795
680
682
678
709
939
911
831
830
830
779
833
842
855
799
685
572
581
523
432
301
295
477
692
534
260
207
806
712
254
260
211
41
129
1016
1001
1020
1
1
1
2
2
7
9
3
1
4
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700, mRNA: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700, mRNA: Os06g0160700
N/A
High
Gene: Os06g0160700, mRNA: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700, mRNA: Os06g0160700
N/A
High
Gene: Os06g0160700, mRNA: Os06g0160700
N/A
High
Gene: Os06g0160700, mRNA: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0160700
N/A
High
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
N/A
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
N/A
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Gly23Stp
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Gly24Ala,Val
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
N/A
High
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Arg26Met,Lys
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Arg26Ser
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Arg27Lys
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Arg27Ser,Arg
Low
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIa
SSIIb
SSIIb
SSIIb
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
82
83
84
87
90
91
93
94
95
100
107
128
129
131
174
533
553
2524
4196
4327
4328
675
695
703
97
433
590
659
901
1058
1134
1357
1379
1457
1615
1680
1708
1722
1834
2024
2080
2276
2488
2618
2758
3073
3135
3136
3179
3274
3391
3481
3559
3779
4384
4493
4496
4742
4857
4926
5021
5047
5097
5110
5308
5330
5466
5515
5614
82
83
84
87
90
91
93
94
95
100
107
128
129
131
174
533
553
2524
4196
4327
4328
40
60
68
97
433
590
659
901
1058
1134
1357
1379
1457
1615
1680
1708
1722
1834
2024
2080
2276
2488
2618
2758
3073
3135
3136
3179
3274
3391
3481
3559
3779
4384
4493
4496
4742
4857
4926
5021
5047
5097
5110
5308
5330
5466
5515
5614
SNP
SNP
Complex SNP
SNP
Complex SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
G
G
G
G
G
G
G
G
G
G
C
G
C
G
T
G
G
G
A
G
C
C
G
G
G
G
A
C
C
T
A
G
A
A
C
G
G
G
C
G
C
T
C
C
G
G
C
G
C
G
T
G
T
A
G
A
A
C
G
T
T
A
C
A
A
C
G
G
T
2
2
3
2
3
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
G/T
G/T
G/T/A
G/T
G/T/A
G/T
G/T
G/T
G/T
G/T
C/A
G/A
C/A
G/T
T/C
G/T
G/T
G/C
G/A
G/T
C/T
C/A
G/A
G/T
G/A
G/A
A/G
C/T
C/T
T/A
A/G
G/A
A/C
A/C
C/T
G/A
G/A
G/A
C/T
G/A
C/T
T/C
C/T
C/T
G/A
G/A
C/A
G/A
C/T
G/A
T/A
G/A
T/A
A/T
G/A
A/T
A/G
C/T
G/A
T/A
T/C
A/G
C/T
A/C
A/C
C/T
G/A
G/A
T/C
96.3/3.7
95.5/4.5
95.3/3.5/1.2
90.9/9.1
93.1/4.6/2.3
97.6/2.4
97.6/2.4
97.5/2.5
97.4/2.6
95.7/4.3
98.8/1.2
99.2/0.8
99.2/0.8
99.2/0.8
98.6/1.4
99.4/0.6
99.1/0.9
96.2/3.8
87.9/12.1
63.9/36.1
55.6/44.4
99.5/0.5
61.7/38.3
97.6/2.4
87.3/12.7
87.5/12.5
80.7/19.3
94.7/5.2
78.4/21.5
94.8/5.2
95.0/5.0
87.6/12.4
94.7/5.3
87.7/12.3
79.9/20.1
94.3/5.6
94.2/5.8
94.7/5.3
79.8/20.2
87.0/13.0
86.9/13.1
62.6/37.4
87.7/12.3
87.2/12.8
87.3/12.7
87.0/13.0
80.0/20.0
87.2/12.8
86.9/13.1
87.1/12.9
88.8/11.2
88.2/11.8
62.8/37.2
92.0/8.0
89.6/10.3
58.2/41.8
57.7/42.3
89.7/10.3
91.2/8.8
56.1/43.9
58.1/41.9
92.6/7.4
72.2/27.8
56.4/43.6
71.8/28.2
95.5/4.5
95.3/4.7
56.9/43.0
95.8/4.2
79/3
84/4
82/3/1
80/8
81/4/2
83/2
80/2
78/2
76/2
67/3
84/1
126/1
128/1
124/1
71/1
169/1
109/1
25504/1015
3119/428
195/110
179/143
30150/152
3240/2011
1202/30
27329/3982
31199/4461
23183/5550
27668/1532
19741/5423
29861/1624
29780/1565
31457/4449
34679/1938
25359/3556
24247/6083
29003/1737
28942/1773
29504/1641
19948/5052
23646/3538
23232/3502
15456/9219
21796/3053
26402/3888
27977/4064
26439/3948
23988/5982
26441/3884
29015/4379
22993/3412
30877/3883
25143/3354
2011/1190
11550/1010
32666/3770
20437/14656
20230/14817
34312/3946
36783/3553
20884/16322
18861/13618
39441/3156
30372/11693
24651/19025
30418/11945
40705/1911
51381/2531
22510/17017
41044/1807
82
88
86
88
87
85
82
80
78
70
85
127
129
125
72
170
110
26522
3549
305
322
30305
5252
1232
31313
35667
28734
29206
25166
31485
31345
35910
36617
28918
30335
30745
30717
31148
25006
27186
26740
24676
24850
30292
32043
30390
29972
30330
33396
26406
34764
28500
3201
12561
36442
35099
35050
38262
40346
37212
32483
42599
42067
43679
42374
42627
53918
39531
42853
G
G
G
G
G
G
G
G
G
G
C
G
C
G
T
G
G
G
G
G
C
C
G
G
G
G
A
C
C
T
A
G
A
A
C
G
G
G
C
G
C
T
C
C
G
G
C
G
C
G
T
G
T
A
G
A
A
C
G
T
T
A
C
A
A
C
G
G
T
96.34146341
95.45454545
95.34883721
90.90909091
93.10344828
97.64705882
97.56097561
97.5
97.43589744
95.71428571
98.82352941
99.21259843
99.2248062
99.2
98.61111111
99.41176471
99.09090909
96.1616771
87.88391096
63.93442623
55.59006211
99.48853325
61.69078446
97.56493506
87.27684987
87.47301427
80.6814227
94.73395878
78.44313757
94.84198825
95.00717818
87.59955444
94.70737636
87.6927865
79.93077303
94.33403805
94.2214409
94.72197252
79.77285451
86.97859192
86.88107704
62.63575944
87.71026157
87.15832563
87.31080111
86.99901283
80.03469905
87.17771184
86.88166247
87.07490722
88.81889311
88.22105263
62.82411746
91.95127776
89.6383294
58.22673011
57.71754636
89.67644138
91.16888911
56.12168118
58.06421821
92.58668044
72.19911094
56.43673161
71.78458489
95.49112065
95.29470678
56.9426526
95.77859193
79
84
82
80
81
83
80
78
76
67
84
126
128
124
71
169
109
25504
3119
195
179
30150
3240
1202
27329
31199
23183
27668
19741
29861
29780
31457
34679
25359
24247
29003
28942
29504
19948
23646
23232
15456
21796
26402
27977
26439
23988
26441
29015
22993
30877
25143
2011
11550
32666
20437
20230
34312
36783
20884
18861
39441
30372
24651
30418
40705
51381
22510
41044
T
T
T
T
T
T
T
T
T
T
A
A
A
T
C
T
T
C
A
T
T
A
A
T
A
A
G
T
T
A
G
A
C
C
T
A
A
A
T
A
T
C
T
T
A
A
A
A
T
A
A
A
A
T
A
T
G
T
A
A
C
G
T
C
C
T
A
A
C
3.658536585
4.545454545
3.488372093
9.090909091
4.597701149
2.352941176
2.43902439
2.5
2.564102564
4.285714286
1.176470588
0.787401575
0.775193798
0.8
1.388888889
0.588235294
0.909090909
3.827011538
12.05973514
36.06557377
44.40993789
0.501567398
38.29017517
2.435064935
12.71676301
12.50735974
19.3150971
5.245497501
21.5489152
5.158011752
4.992821822
12.3893066
5.292623645
12.29683934
20.05274435
5.649699138
5.772048052
5.268396045
20.20315124
13.01405135
13.09648467
37.36018804
12.28571429
12.83507197
12.68295728
12.9911155
19.95862805
12.80580284
13.11234878
12.92130576
11.16960074
11.76842105
37.17588254
8.040761086
10.34520608
41.75617539
42.27389444
10.31310439
8.806325286
43.86219499
41.92346766
7.408624616
27.79613474
43.5564001
28.1894558
4.483074108
4.694165214
43.04722876
4.216740952
3
4
3
8
4
2
2
2
2
3
1
1
1
1
1
1
1
1015
428
110
143
152
2011
30
3982
4461
5550
1532
5423
1624
1565
4449
1938
3556
6083
1737
1773
1641
5052
3538
3502
9219
3053
3888
4064
3948
5982
3884
4379
3412
3883
3354
1190
1010
3770
14656
14817
3946
3553
16322
13618
3156
11693
19025
11945
1911
2531
17017
1807
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Gly28Trp
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Gly28Val
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
N/A
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Arg29Ser
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
N/A
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Val31Leu
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
N/A
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Gly32Cys
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Gly32Val
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Ala34Ser
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Pro36Gln
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Gly43Asp
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
N/A
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Arg44Leu
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
N/A
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Gly135Val
Low
Gene: Os06g0229800, CDS: Os06g0229800, mRNA: Os06g0229800
Ala142Ser
Low
Gene: Os06g0229800
N/A
High
Gene: Os06g0229800, Gene: Os06g0229900, CDS: Os06g0229800, CDS: Os06g0229900, mRNA: Os06g0229800, mRNA: Os06g0229900
Met737Val
High
Gene: Os06g0229800, Gene: Os06g0229900, CDS: Os06g0229800, CDS: Os06g0229900, mRNA: Os06g0229800, mRNA: Os06g0229900
Pro32Thr
High
Gene: Os06g0229800, Gene: Os06g0229900, CDS: Os06g0229800, CDS: Os06g0229900, mRNA: Os06g0229800, mRNA: Os06g0229900
Leu781Phe
High
Gene: Os02g0744700, Gene: Os02g0744800, mRNA: Os02g0744700, mRNA: Os02g0744800
N/A
Low
Gene: Os02g0744700, Gene: Os02g0744800, mRNA: Os02g0744700, mRNA: Os02g0744800
N/A
High
Gene: Os02g0744700, Gene: Os02g0744800, mRNA: Os02g0744700, mRNA: Os02g0744800
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
Low
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
5898
5925
6086
6204
6242
6414
6421
6507
6678
6699
6752
6795
6858
6886
6895
6998
7004
7395
7443
7444
7522
7660
7810
7869
7920
8102
8159
8456
8491
8519
8888
9076
9366
9467
9517
10336
10761
11041
1315
2155
2348
2493
3683
3686
3933
4543
5032
5057
5321
5333
5384
5451
5927
5945
5949
5967
5969
6475
7232
7255
7437
8079
8530
8544
1349
1748
2222
2874
3246
5898
5925
6086
6204
6242
6414
6421
6507
6678
6699
6752
6795
6858
6886
6895
6998
7004
7377
7425
7426
7504
7642
7792
7851
7902
8084
8141
8438
8473
8501
8870
9058
9348
9449
9499
10318
10743
11023
316
1170
1363
1508
2698
2701
2948
3558
4047
4072
4336
4348
4399
4466
4941
4959
4963
4981
4983
5489
6246
6269
6451
7093
7544
7558
141
542
1022
1674
2046
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
C
A
A
C
T
A
A
T
A
C
T
G
C
A
C
C
T
C
C
G
A
A
T
G
G
T
T
C
T
G
G
C
A
T
T
G
C
T
T
A
A
A
A
A
A
C
T
C
T
C
G
T
C
C
G
C
T
C
T
C
A
A
C
A
T
A
A
A
T
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
C/T
A/G
A/G
C/T
T/C
A/T
A/G
T/C
A/G
C/T
T/C
G/A
C/T
A/G
C/T
C/T
T/C
C/T
C/A
A/G
A/G
A/T
T/C
G/T
A/G
T/A
G/T
C/T
A/T
A/G
A/G
C/A
A/T
C/T
C/T
A/G
T/C
A/T
T/C
A/T
A/G
A/G
A/G
A/T
A/G
C/A
T/C
C/T
T/G
C/T
G/A
T/C
A
A
A
A
A
C/T
T/G
C/A
A/C
A/T
C/T
A/G
T/C
A/G
A/G
A/T
T/G
89.0/11.0
56.9/43.1
89.6/10.3
89.6/10.4
89.5/10.5
89.8/10.2
95.4/4.6
96.0/4.0
99.0/0.9
90.7/9.3
96.0/4.0
92.7/7.3
59.6/40.4
58.0/42.0
56.9/43.1
92.6/7.4
93.0/7.0
61.2/38.8
99.4/0.6
59.8/40.2
95.4/4.6
52.5/47.5
95.0/5.0
94.9/5.1
57.9/42.1
94.8/5.2
61.5/38.4
97.4/2.6
62.1/37.9
63.2/36.8
61.8/38.2
94.9/5.1
94.9/5.1
59.1/40.9
59.5/40.5
54.3/45.7
51.4/48.6
58.0/42.0
92.7/7.3
90.1/9.9
90.6/9.3
90.2/9.8
90.8/9.2
91.2/8.8
91.2/8.8
90.0/10.0
90.8/9.2
91.4/8.6
90.0/10.0
90.5/9.5
90.2/9.8
90.6/9.4
100
100
100
100
100
94.7/5.3
89.2/10.7
90.3/9.7
90.6/9.4
90.7/9.3
77.7/22.3
79.2/20.8
86.7/13.3
87.6/12.4
87.9/12.1
87.7/12.3
87.9/12.1
45236/5592
27586/20925
44693/5159
42505/4917
36574/4300
36697/4155
41615/1999
39114/1627
58954/565
49110/5048
25338/1051
22616/1779
23914/16215
24363/17639
20295/15387
18662/1488
21015/1588
1337/847
5756/33
3494/2346
17729/863
11967/10843
33289/1747
33544/1809
16585/12049
27061/1470
16935/10581
38756/1015
21756/13254
17456/10182
20575/12720
31344/1686
35309/1904
15656/10851
17021/11575
10401/8753
7587/7177
12822/9303
50816/3984
36870/4035
50981/5256
54436/5904
38648/3893
38783/3725
48003/4622
36879/4090
16420/1660
41687/3933
22468/2492
26146/2738
29827/3253
31071/3236
27741
31417
31395
30019
29935
25219/1424
31919/3841
37021/3996
33029/3437
23640/2431
3450/990
6541/1716
21321/3279
9650/1367
12246/1692
12149/1698
11799/1627
50832
48515
49857
47427
40875
40856
43617
40747
59522
54160
26391
24398
40135
42006
35685
20151
22604
2184
5793
5841
18592
22811
35038
35358
28636
28532
27519
39772
35012
27640
33300
33033
37216
26511
28598
19156
14765
22125
54800
40913
56241
60351
42544
42510
52633
40974
18081
45629
24961
28892
33085
34309
27744
31422
31400
30022
29940
26644
35764
41020
36466
26071
4441
8257
24602
11017
13938
13849
13427
C
A
A
C
T
A
A
T
A
C
T
G
C
A
C
C
T
C
C
A
A
A
T
G
A
T
G
C
A
A
A
C
A
C
C
A
T
A
T
A
A
A
A
A
A
C
T
C
T
C
G
T
A
A
A
A
A
C
T
C
A
A
C
A
T
A
A
A
T
88.99118665
56.86076471
89.6423772
89.62194531
89.47767584
89.82034463
95.41004654
95.99234299
99.04573099
90.67577548
96.01000341
92.69612263
59.58390432
57.99885731
56.87263556
92.61078855
92.97027075
61.21794872
99.36129812
59.81852423
95.35821859
52.46153172
95.00827673
94.86961932
57.91660846
94.84438525
61.53930012
97.445439
62.1386953
63.15484805
61.78678679
94.88693125
94.87585985
59.054732
59.51814812
54.29630403
51.38503217
57.95254237
92.72992701
90.11805539
90.64739247
90.1990025
90.84242196
91.23265114
91.20323751
90.00585737
90.8135612
91.36075741
90.01241937
90.49563893
90.15263715
90.56224314
99.98918685
99.98408758
99.98407643
99.99000733
99.98329993
94.65170395
89.24896544
90.25109703
90.57478199
90.67546316
77.68520603
79.21763352
86.66368588
87.59190342
87.86052518
87.72474547
87.87517688
45236
27586
44693
42505
36574
36697
41615
39114
58954
49110
25338
22616
23914
24363
20295
18662
21015
1337
5756
3494
17729
11967
33289
33544
16585
27061
16935
38756
21756
17456
20575
31344
35309
15656
17021
10401
7587
12822
50816
36870
50981
54436
38648
38783
48003
36879
16420
41687
22468
26146
29827
31071
27741
31417
31395
30019
29935
25219
31919
37021
33029
23640
3450
6541
21321
9650
12246
12149
11799
T
G
G
T
C
T
G
C
G
T
C
A
T
G
T
T
C
T
A
G
G
T
C
T
G
A
T
T
T
G
G
A
T
T
T
G
C
T
C
T
G
G
G
T
G
A
C
T
G
T
A
C
11.00094429
43.13099042
10.34759412
10.36751218
10.51987768
10.16986489
4.583075406
3.992931995
0.949228857
9.320531758
3.982418249
7.291581277
40.40114613
41.99162024
43.11895755
7.384248921
7.025305256
38.78205128
0.56965303
40.16435542
4.641781411
47.53408443
4.986015184
5.116239606
42.07640732
5.152109912
38.44979832
2.552046666
37.85559237
36.83791606
38.1981982
5.103986922
5.116079106
40.93017993
40.47485838
45.69325538
48.60819506
42.04745763
7.270072993
9.862390927
9.345495279
9.782770791
9.150526514
8.762644084
8.781562898
9.981939767
9.180908136
8.619518289
9.983574376
9.476671743
9.832250264
9.431927483
5592
20925
5159
4917
4300
4155
1999
1627
565
5048
1051
1779
16215
17639
15387
1488
1588
847
33
2346
863
10843
1747
1809
12049
1470
10581
1015
13254
10182
12720
1686
1904
10851
11575
8753
7177
9303
3984
4035
5256
5904
3893
3725
4622
4090
1660
3933
2492
2738
3253
3236
T
G
A
C
T
T
G
C
G
G
T
G
5.344542861
10.73985013
9.741589469
9.425218011
9.324536842
22.29227651
20.78236648
13.3281847
12.40809658
12.13947482
12.26081306
12.11737544
1424
3841
3996
3437
2431
990
1716
3279
1367
1692
1698
1627
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
Low
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
Low
N/A
High
N/A
High
N/A
High
N/A
Low
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
N/A
High
Gene: Os08g0191500, mRNA: Os08g0191500
N/A
High
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
Thr1176Ala
High
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
N/A
High
Gene: Os04g0624600
N/A
High
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
N/A
High
Gene: Os04g0624600
N/A
High
Gene: Os04g0624600
N/A
High
Gene: Os04g0624600
N/A
High
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
Ser756Ile
High
Gene: Os04g0624600
N/A
High
Gene: Os04g0624600
N/A
High
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
N/A
High
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
N/A
High
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
N/A
High
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
Glu643Gly
High
Gene: Os04g0624600
N/A
Low
Gene: Os04g0624600
N/A
Low
Gene: Os04g0624600
N/A
Low
Gene: Os04g0624600
N/A
Low
Gene: Os04g0624600
N/A
Low
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
Glu460Lys
High
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
Lys207Asn
High
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
Val200Phe
High
Gene: Os04g0624600, CDS: Os04g0624600, mRNA: Os04g0624600
Phe139Cys
High
Gene: Os04g0624600
N/A
High
N/A
High
N/A
High
N/A
High
Gene: Os01g0720600, mRNA: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
3423
3575
4048
4394
4604
5720
6389
6800
6901
7160
7289
7401
7506
7702
7744
7823
8383
8772
9016
9109
10020
10411
2223
2375
2848
3194
3404
4520
5189
5600
5701
5960
6089
6201
6306
6502
6544
6623
7183
7572
7816
7909
8820
9211
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
SNP
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
List of Indels
Mapping
AGPS2b
AGPS2b
AGPS2b
AGPS2b
BEI
BEI
BEI
BEI
BEI
BEI
BEI
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
BEIIb
Isoamylase1
Isoamylase1
Isoamylase1
Isoamylase1
Isoamylase1
T
T
C
G
T
G
C
T
C
A
T
A
A
T
T
T
C
G
A
C
G
A
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
T/C
T/A
C/T
G/A
T/A
G/A
C/T
T/C
C/T
A/G
T/G
A/T
A/T
T/C
T/A
T/C
C/A
G/A
A/G
C/T
G/A
A/C
88.9/11.1
88.2/11.7
88.3/11.7
86.8/13.2
87.8/12.2
87.7/12.3
86.7/13.3
87.0/13.0
87.9/12.1
74.1/25.9
87.1/12.9
87.4/12.6
86.8/13.2
87.5/12.5
87.7/12.2
87.1/12.9
86.9/13.1
86.1/13.9
87.5/12.5
86.7/13.3
88.5/11.5
88.5/11.5
9943/1242
8572/1141
9640/1279
12943/1972
11431/1586
10685/1504
22821/3487
25532/3807
22567/3104
15409/5383
23450/3474
18706/2692
24225/3676
22936/3270
23961/3341
21389/3176
22602/3406
22505/3632
21481/3066
24682/3776
5112/662
8892/1152
11186
9715
10919
14915
13018
12190
26312
29340
25676
20794
26924
21399
27903
26207
27307
24565
26013
26138
24549
28462
5775
10046
T
T
C
G
T
G
C
T
C
A
T
A
A
T
T
T
C
G
A
C
G
A
88.88789558
88.23468863
88.28647312
86.778411
87.80918728
87.6538146
86.73228945
87.02113156
87.89141611
74.10310667
87.09701382
87.41529978
86.81862165
87.5186019
87.74673161
87.07103603
86.88732557
86.1006963
87.50254593
86.71913428
88.51948052
88.51284093
9943
8572
9640
12943
11431
10685
22821
25532
22567
15409
23450
18706
24225
22936
23961
21389
22602
22505
21481
24682
5112
8892
C
A
T
A
A
A
T
C
T
G
G
T
T
C
A
C
A
A
G
T
A
C
11.10316467
11.74472465
11.71352688
13.221589
12.18313105
12.33798195
13.25250836
12.97546012
12.08911045
25.88727518
12.90298618
12.5800271
13.17421066
12.47758233
12.23495807
12.92896397
13.09345327
13.89547785
12.4893071
13.26681189
11.46320346
11.46725065
1242
1141
1279
1972
1586
1504
3487
3807
3104
5383
3474
2692
3676
3270
3341
3176
3406
3632
3066
3776
662
1152
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600, CDS: Os01g0720600, mRNA: Os01g0720600
Gly708Asp
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600, CDS: Os01g0720600, mRNA: Os01g0720600
Val480Ala
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600, CDS: Os01g0720600, mRNA: Os01g0720600
Ser469Thr
High
Gene: Os01g0720600, CDS: Os01g0720600, mRNA: Os01g0720600
N/A
High
Gene: Os01g0720600, CDS: Os01g0720600, mRNA: Os01g0720600
N/A
High
Gene: Os01g0720600, CDS: Os01g0720600, mRNA: Os01g0720600
His363Arg
High
Gene: Os01g0720600, CDS: Os01g0720600, mRNA: Os01g0720600
Leu176Phe
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600
N/A
High
Gene: Os01g0720600, mRNA: Os01g0720600
N/A
High
List of Insertion/Deletion
Reference position
Consensus position
Variation type
3926
1579 INDEL
4245
1898 INDEL
4294
1947 INDEL
4305
1958 INDEL
5733
5733 INDEL
5733
5733 INDEL
6401
6401 INDEL
6574
6574 INDEL
6579
6579 INDEL
6772
6772 INDEL
6789
6789 INDEL
156
156 INDEL
408
409 INDEL
612
615 INDEL
612
615 INDEL
3127
3130 INDEL
3127
3130 INDEL
4058
4061 INDEL
4058
4061 INDEL
5378
5381 INDEL
5378
5381 INDEL
5813
5816 INDEL
7056
7058 INDEL
7056
7058 INDEL
7148
7151 INDEL
7724
7728 INDEL
8297
8301 INDEL
8309
8313 INDEL
8310
8314 INDEL
8311
8315 INDEL
8311
8315 INDEL
8311
8315 INDEL
9548
9552 INDEL
9600
9603 INDEL
9600
9603 INDEL
9642
9645 INDEL
3530
3530 INDEL
3539
3539 INDEL
3786
3786 INDEL
3970
3970 INDEL
4533
4533 INDEL
Length
1
1
4
1
1
1
1
1
4
2
1
1
2
1
1
1
1
1
1
1
1
1
2
1
1
8
2
2
1
1
3
5
1
1
1
3
1
1
2
1
1
Reference
Variants
2
2
AACT
2
2
2
A
2
2
2
---2
GA
2
2
2
-2
A
2
2
T
2
2
2
A
2
A
2
2
T
2
-2
2
2
GGGGGGGG 3
AT
2
AA
3
2
2
--2
----2
T
2
2
T
2
TTA
2
2
2
TA
2
2
A
2
Allele variations
Frequencies
Counts
-/G
99.1/0.9
108/1
-/A
98.9/1.1
939/10
AACT/----
98.7/1.1
11484/131
-/T
99.1/0.9
14049/121
-/A
97.9/2.1
12716/267
A/-
95.1/4.8
13121/665
-/G
92.0/8.0
4234/368
-/A
99.2/0.8
1159/9
----/CCTG
99.2/0.8
1553/12
GA/--
99.4/0.5
3220/17
-/A
99.3/0.7
2630/19
G/-
81.4/18.6
5342/1218
CA/--
69.0/31.0
3731/1675
A/-
96.6/3.3
14474/500
-/A
99.3/0.7
14574/101
T/-
96.9/2.9
6201/184
-/T
99.0/1.0
5923/57
-/A
98.1/1.9
11607/220
A/-
98.9/1.0
12075/128
A/-
98.9/1.1
2468/28
-/A
99.1/0.9
2344/21
-/T
73.1/26.8
5192/1902
--/AA
41.7/1.9
2455/110
A/-
56.4/41.7
3323/2455
T/-
81.0/19.0
7810/1829
GGGGGGGG/GGTGGGGG/--------
97.2/1.4/0.7
138/2/1
AT/--
98.7/1.3
3220/43
AA/AT/--
64.9/33.9/1.1
1352/706/23
-/T
65.1/34.9
1356/726
-/T
64.7/15.2
1328/313
---/TAT
64.7/16.2
1328/332
-----/TATAT
64.7/3.8
1328/79
-/T
76.1/23.8
4778/1493
-/T
98.6/1.4
8114/112
T/-
96.4/3.5
8035/292
---/TTA
76.7/23.2
4224/1276
-/A
99.3/0.7
16623/113
-/T
98.2/1.8
17273/312
TA/--
99.4/0.6
20518/124
-/A
99.0/0.9
17704/169
A/-
98.5/1.5
15699/240
Coverage
109
949
11639
14170
12991
13794
4602
1168
1565
3239
2649
6560
5408
14982
14676
6401
5981
11829
12205
2496
2366
7098
5894
5894
9642
142
3263
2082
2082
2053
2053
2053
6279
8232
8333
5505
16736
17585
20652
17877
15943
Variant #1
Frequency of #1
Count of #1
Variant #2
Frequency of #2
Count of #2
Variant #3
99.08256881
108 G
0.917431193
1
98.94625922
939 A
1.05374078
10
AACT
98.66827047
11484 ---1.125526248
131
99.14608327
14049 T
0.853916725
121
97.88314987
12716 A
2.055269032
267
A
95.12106713
13121 4.820936639
665
92.00347675
4234 G
7.996523251
368
99.22945205
1159 A
0.770547945
9
---99.23322684
1553 CCTG
0.766773163
12
GA
99.4133992
3220 -0.52485335
17
99.28274821
2630 A
0.717251793
19
G
81.43292683
5342 18.56707317
1218
CA
68.99038462
3731 -30.97263314
1675
A
96.60926445
14474 3.337338139
500
99.30498774
14574 A
0.688198419
101
T
96.8754882
6201 2.874550851
184
99.0302625
5923 T
0.95301789
57
98.1232564
11607 A
1.859835996
220
A
98.93486276
12075 1.048750512
128
A
98.87820513
2468 1.121794872
28
99.07016061
2344 A
0.887573964
21
73.14736546
5192 T
26.79628064
1902
-41.65252799
2455 AA
1.866304717
110
A
56.37936885
3323 41.65252799
2455
T
80.99979257
7810 18.96909355
1829
GGGGGGGG
97.18309859
138 GGTGGGGG1.408450704
2 -------AT
98.6821943
3220 -1.3178057
43
AA
64.93756004
1352 AT
33.90970221
706 -65.129683
1356 T
34.870317
726
64.68582562
1328 T
15.24598149
313
--64.68582562
1328 TAT
16.17145641
332
----64.68582562
1328 TATAT
3.848027277
79
76.09491957
4778 T
23.7776716
1493
98.56656948
8114 T
1.360544218
112
T
96.42385695
8035 3.504140166
292
--76.73024523
4224 TTA
23.17892825
1276
99.3248088
16623 A
0.675191205
113
98.22576059
17273 T
1.774239409
312
TA
99.35115243
20518 -0.600426109
124
99.03227611
17704 A
0.945348772
169
A
98.46954776
15699 1.505362855
240
Frequency of #3
Count of #3
Variant #4
0.704225352
1
1.104707012
23
Frequency of #4
Count of #4
Overlapping annotations
Amino acid change
Gene: Os08g0345800
N/A
Gene: Os08g0345800
N/A
Gene: Os08g0345800
N/A
Gene: Os08g0345800
N/A
Gene: Os06g0726400
N/A
Gene: Os06g0726400
N/A
Gene: Os06g0726400
N/A
Gene: Os06g0726400
N/A
Gene: Os06g0726400
N/A
Gene: Os06g0726400
N/A
Gene: Os06g0726400
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os02g0528200
N/A
Gene: Os08g0520900
N/A
Gene: Os08g0520900
N/A
Gene: Os08g0520900
N/A
Gene: Os08g0520900
N/A
Gene: Os08g0520900
N/A
Isoamylase1
Isoamylase1
Isoamylase1
Isoamylase1
Isoamylase1
Isoamylase1
Isoamylase1
Isoamylase1
GBSSI
GBSSI
GBSSII
GBSSII
GBSSII
GBSSII
GBSSII
GBSSII
GBSSII
GBSSII
GPT1
GPT1
GPT1
GPT1
GPT1
GPT1
GPT1
GPT1
GPT1
GPT1
GPT1
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
Pullulanase
SPHOL
SPHOL
SPHOL
SPHOL
SPHOL
SPHOL
SPHOL
SPHOL
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSI
SSIIa
SSIIb
SSIIb
SSIIb
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
4533
5685
5685
5685
5685
5871
5871
6331
2208
2208
291
491
3358
3447
3447
3529
3529
5173
315
315
315
326
326
365
518
852
3623
3623
3657
2133
2133
2169
2584
2683
2686
5166
5368
5662
6656
8651
8651
411
411
1613
2552
3567
3620
4569
4579
1712
3773
4884
4987
5365
5422
6578
7231
3196
703
3213
3213
199
448
4520
4520
5011
5011
5170
5196
4533
5685
5685
5685
5685
5871
5871
6331
2208
2208
212
412
3279
3368
3368
3450
3450
5094
260
260
260
271
271
310
463
797
3568
3568
3602
1438
1438
1474
1889
1988
1991
4471
4673
4967
5961
7956
7956
305
305
1507
2446
3461
3513
4463
4474
1501
3562
4673
4776
5154
5211
6367
7020
3196
68
2578
2578
199
448
4520
4520
5011
5011
5170
5196
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
1
2
3
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
2
1
3
1
2
1
1
4
1
1
1
1
1
2
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
1
1
1
1
1
1
1
1
2
1
1
1
1
AA
AAA
A
A
A
A
A
A
A
A
T
C
T
TT
----T
TTAC
T
-G
-A
T
T
T
A
A
C
T
C
A
A
A
G
GGT
G
A
T
-T
C
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
3
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
-/A
AA/-AAA/--A/-/A
-/A
A/A/-/A
A/A/A/A/A/-/A
T/-/T
C/T/TT/--/T
--/TA
-/A
---/GCC
-/A
--/TG
-/T
T/TTAC/---T/-/T
-/T
-/A
-/G
--/GA
G/--/GC
A/T/T/-/T
T/-/T
A/A/-/C
C/T/-/T
C/A/-/G
A/-/T
A/G/GGT/--G/-/T
-/T
A/-/A
-/A
T/-/A
--/AA
-/T
T/-/T
C/-
99.2/0.8
57.9/33.8
56.7/4.0
59.1/4.3
98.8/1.2
98.9/1.1
96.9/3.0
97.8/2.2
97.1/2.7
96.1/3.7
99.3/0.7
99.0/1.0
97.6/2.4
99.2/0.8
99.5/0.5
96.5/3.4
98.5/1.5
98.0/2.0
75.7/22.6
75.1/1.7
99.0/1.0
96.6/2.4
96.6/0.8
96.8/3.2
99.4/0.6
75.0/24.9
94.3/5.6
96.6/3.3
98.1/1.5
98.8/1.2
98.6/1.3
99.3/0.7
98.7/1.3
99.5/0.5
98.6/1.4
97.8/2.2
98.7/1.3
99.2/0.8
98.4/1.3
97.7/2.2
98.8/1.2
99.4/0.6
99.4/0.6
98.7/1.2
83.0/17.0
96.5/3.5
96.1/3.9
92.2/7.7
92.4/7.6
99.2/0.8
81.7/9.6/8.3
99.4/0.6
97.9/2.1
97.9/2.0
97.5/2.5
98.8/1.1
96.4/3.5
93.3/6.6
99.4/0.6
99.4/0.6
99.3/0.7
79.7/20.3
64.0/36.0
73.8/25.4
73.8/0.8
98.3/1.7
98.7/1.3
97.0/3.0
99.1/0.9
15529/123
6393/3731
6107/426
6733/490
11089/132
4179/48
4438/137
1088/24
6607/186
6967/270
26301/185
8488/82
22410/553
15182/126
14686/80
13208/464
13219/202
17655/359
8013/2394
7744/174
10293/104
7238/182
7238/63
1075/36
2535/16
5216/1734
6430/381
6679/230
5930/93
10170/122
9973/134
14079/104
16977/226
13502/70
13091/182
12509/277
13498/176
22397/171
9251/126
7532/172
7548/92
7412/46
7299/47
11876/147
13202/2696
3046/109
17705/717
6227/522
6872/569
10416/81
5135/602/522
12595/71
14704/311
13265/276
15682/395
8013/88
4016/146
10588/747
166/1
22033/125
21721/163
6615/1684
9456/5313
9839/3381
9839/109
12638/218
12963/171
17265/537
15444/134
15653
11050
10764
11386
11225
4227
4582
1113
6801
7249
26490
8573
22967
15311
14767
13681
13424
18020
10582
10315
10400
7492
7492
1111
2551
6952
6822
6916
6045
10295
10111
14183
17203
13572
13276
12788
13674
22577
9404
7712
7641
7458
7346
12027
15900
3155
18425
6751
7441
10503
6286
12667
15015
13544
16084
8113
4164
11343
167
22159
21885
8300
14772
13335
13335
12862
13135
17802
15588
AA
AAA
A
A
A
A
A
A
A
A
T
C
T
TT
----T
TTAC
T
-G
-A
T
T
T
A
A
C
T
C
A
A
A
G
GGT
G
A
T
-T
C
99.20781959
57.85520362
56.73541434
59.13402424
98.78841871
98.86444287
96.85726757
97.75381851
97.14747831
96.10980825
99.28652322
99.00851511
97.57478121
99.15746849
99.45147965
96.54265039
98.47288439
97.97447281
75.72292572
75.0751333
98.97115385
96.60971703
96.60971703
96.75967597
99.37279498
75.0287687
94.25388449
96.57316368
98.09760132
98.78581836
98.63514984
99.26672777
98.68627565
99.48423224
98.60650798
97.81826713
98.71288577
99.20272844
98.37303275
97.6659751
98.78288182
99.38321266
99.36019603
98.74449156
83.03144654
96.5451664
96.09226594
92.23818694
92.35317834
99.17166524
81.68946866
99.43159391
97.92873793
97.94004725
97.50062174
98.76741033
96.44572526
93.34391255
99.4011976
99.43138228
99.25062828
79.69879518
64.01299756
73.78327709
73.78327709
98.2584357
98.69052151
96.983485
99.07621247
15529
6393
6107
6733
11089
4179
4438
1088
6607
6967
26301
8488
22410
15182
14686
13208
13219
17655
8013
7744
10293
7238
7238
1075
2535
5216
6430
6679
5930
10170
9973
14079
16977
13502
13091
12509
13498
22397
9251
7532
7548
7412
7299
11876
13202
3046
17705
6227
6872
10416
5135
12595
14704
13265
15682
8013
4016
10588
166
22033
21721
6615
9456
9839
9839
12638
12963
17265
15444
A
---A
A
A
A
T
-T
TA
A
GCC
A
TG
T
---T
T
A
G
GA
GC
T
T
C
T
T
--T
T
A
A
A
AA
T
T
-
0.785791861
33.76470588
3.957636566
4.303530652
1.175946548
1.135557133
2.989960716
2.156334232
2.734891928
3.724651676
0.698376746
0.95649131
2.407802499
0.822937757
0.541748493
3.391564944
1.50476758
1.992230855
22.62332262
1.686863791
1
2.429257875
0.840896957
3.240324032
0.627205018
24.9424626
5.584872471
3.325621747
1.538461538
1.185041282
1.325289289
0.733272227
1.31372435
0.515767757
1.370894848
2.166093212
1.287114231
0.757407982
1.339855381
2.230290456
1.204030886
0.616787342
0.639803975
1.222249938
16.95597484
3.454833597
3.891451832
7.732187824
7.646821664
0.771208226
9.576837416
0.560511565
2.071262071
2.037802717
2.455856752
1.08467891
3.506243996
6.585559376
0.598802395
0.564104878
0.744802376
20.28915663
35.96669374
25.35433071
0.817397825
1.694915254
1.301865246
3.016514998
0.859635617
123
3731
426
490
132
48
137
24
186
270
185
82
553
126
80
464
202
359
2394
174
104
182
63
36
16
1734
381
230
93
122
134
104
226
70
182
277
176
171
126
172
92
46
47
147
2696
109
717
522
569
81
602 G
71
311
276
395
88
146
747
1
125
163
1684
5313
3381
109
218
171
537
134
8.304167992
522
Gene: Os08g0520900
N/A
Gene: Os08g0520900
N/A
Gene: Os08g0520900
N/A
Gene: Os08g0520900
N/A
Gene: Os08g0520900
N/A
Gene: Os08g0520900
N/A
Gene: Os08g0520900
N/A
Gene: Os08g0520900
N/A
Gene: Os06g0133000
N/A
Gene: Os06g0133000
N/A
Gene: Os07g0412100, mRNA: Os07g0412100
N/A
Gene: Os07g0412100, mRNA: Os07g0412100
N/A
Gene: Os07g0412100
N/A
Gene: Os07g0412100
N/A
Gene: Os07g0412100
N/A
Gene: Os07g0412100
N/A
Gene: Os07g0412100
N/A
Gene: Os07g0412100
N/A
Gene: Os08g0187800
N/A
Gene: Os08g0187800
N/A
Gene: Os08g0187800
N/A
Gene: Os08g0187800
N/A
Gene: Os08g0187800
N/A
Gene: Os08g0187800
N/A
Gene: Os08g0187800
N/A
Gene: Os08g0187800
N/A
Gene: Os08g0187800
N/A
Gene: Os08g0187800
N/A
Gene: Os08g0187800
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os04g0164900
N/A
Gene: Os03g0758100
N/A
Gene: Os03g0758100
N/A
Gene: Os03g0758100
N/A
Gene: Os03g0758100
N/A
Gene: Os03g0758100, mRNA: Os03g0758100
N/A
Gene: Os03g0758100, mRNA: Os03g0758100
N/A
Gene: Os03g0758100, mRNA: Os03g0758100
N/A
Gene: Os03g0758100, mRNA: Os03g0758100
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0160700
N/A
Gene: Os06g0229800
N/A
Gene: Os02N/A
Gene: Os02g0744700
N/A
Gene: Os02g0744700
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIa
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIIIb
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
SSIVa
6030
6914
6914
7619
7622
7658
7660
7757
7757
7757
10220
10852
1076
2957
2957
2957
3521
4720
4720
5529
5926
7722
7722
7862
7862
8436
8431
8439
4151
4492
4492
5509
5768
5774
9939
6030
6914
6914
7593
7596
7632
7634
7731
7731
7731
10194
10826
45
1943
1943
1943
2507
3706
3706
4515
4912
6707
6707
6847
6847
7421
7416
7424
2951
3292
3292
4309
4568
4574
8739
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
INDEL
Stringency Criteria for INDEL detection (DIP)
Mismatch cost 2
Similarity 0.7
Min variant frequency 0.5%
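The minimum variant frequency criterion can be related to the frequency columns of these tables, which are consistent with each allele's read count divided by the coverage at that position (for example, 108 of 109 reads gives 99.08%). The following is a minimal Python sketch of that arithmetic only. The function names are hypothetical, and it is an illustration of the threshold rather than the variant-detection software used to produce the tables.

```python
MIN_VARIANT_FREQUENCY = 0.5  # percent, taken from the criteria listed above


def allele_frequencies(counts, coverage):
    """Return each allele's share of the coverage, in percent."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return [100.0 * count / coverage for count in counts]


def alleles_passing(alleles, counts, coverage, threshold=MIN_VARIANT_FREQUENCY):
    """Return (allele, frequency) pairs whose frequency meets the threshold."""
    freqs = allele_frequencies(counts, coverage)
    return [(a, f) for a, f in zip(alleles, freqs) if f >= threshold]


if __name__ == "__main__":
    # First row of the indel table: alleles -/G, counts 108/1, coverage 109.
    for allele, freq in alleles_passing(["-", "G"], [108, 1], 109):
        print(f"{allele}\t{freq:.2f}%")
```

Under these assumptions, an allele seen once in 109 reads (about 0.92%) is retained, whereas one seen once in 1000 reads (0.1%) would fall below the 0.5% threshold.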
1
2
1
4
1
1
1
1
2
1
3
1
2
3
1
2
2
2
2
1
1
1
1
1
1
3
1
1
1
1
1
1
1
1
2
T
----A
-T
--TC
AAA
A
AA
AT
-GA
A
G
A
A
CCT
C
A
A
A
--
2
2
2
2
2
2
3
2
2
2
2
2
2
2
2
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
T/--/AA
-/A
----/CTTA
-/A
-/T
A/T/-/T
--/TT
T/---/CCT
-/T
TC/-AAA/--A/AA/-AT/---/GA
GA/-A/A/-/A
-/A
A/CCT/--C/-/C
-/A
-/A
A/A/-/C
A/--/CG
95.7/4.3
93.0/6.3
93.0/0.6
70.6/29.2
90.2/9.8
98.5/1.4
53.7/42.1/4.0
63.7/34.5
63.7/1.7
98.7/1.2
54.0/45.9
91.6/8.4
89.3/10.6
96.5/0.6
97.0/1.6
96.8/0.7
91.2/8.6
97.8/2.2
98.2/1.8
90.3/9.6
100
94.8/5.0
96.4/3.5
98.3/1.7
98.8/1.1
98.9/1.1
99.3/0.7
97.9/2.1
88.8/11.2
99.2/0.7
99.1/0.9
98.4/1.6
99.4/0.6
98.0/1.9
92.6/7.4
23530/1057
11219/761
11219/72
3382/1400
5548/605
10887/157
5911/4636/445
5078/2751
5078/136
8329/105
5151/4375
6537/598
18803/2237
7131/41
8201/133
7663/59
11174/1057
31903/704
30606/555
11203/1195
15265
9147/483
9023/325
6241/109
6759/73
370/4
432/3
374/8
5936/746
6970/51
7089/61
5383/87
4773/28
5026/98
2430/194
24590
12063
12063
4789
6154
11048
11013
7976
7976
8435
9532
7135
21052
7388
8456
7919
12249
32607
31174
12403
15272
9644
9356
6352
6841
374
435
382
6684
7023
7151
5471
4801
5126
2625
T
----A
-T
--TC
AAA
A
AA
AT
-GA
A
A
A
CCT
C
A
A
A
--
95.6893046
93.00339882
93.00339882
70.62017123
90.15274618
98.54272266
53.67293199
63.66599799
63.66599799
98.74333136
54.03902644
91.61878066
89.31692951
96.52138603
96.98438978
96.76726859
91.22377337
97.8409544
98.17796882
90.32492139
99.95416448
94.84653671
96.44078666
98.25251889
98.80134483
98.93048128
99.31034483
97.90575916
88.80909635
99.24533675
99.13298839
98.39151892
99.41678817
98.04916114
92.57142857
23530
11219
11219
3382
5548
10887
5911
5078
5078
8329
5151
6537
18803
7131
8201
7663
11174
31903
30606
11203
15265
9147
9023
6241
6759
370
432
374
5936
6970
7089
5383
4773
5026
2430
AA
A
CTTA
A
T
T
T
TT
CCT
T
-----GA
--
4.298495323
6.308546796
0.596866451
29.23366047
9.831004225
1.421071687
42.09570508
34.49097292
1.705115346
1.244813278
45.8980277
8.381219341
10.62606878
0.554953979
1.572847682
0.745043566
8.629275859
2.159045604
1.780329762
9.634765782
1057
761
72
1400
605
157
4636
2751
136
105
4375
598
2237
41
133
59
1057
704
555
1195
A
A
--C
A
A
C
CG
5.008295313
3.473706712
1.715994962
1.067095454
1.069518717
0.689655172
2.094240838
11.16098145
0.726185391
0.853027549
1.590202888
0.583211831
1.911822083
7.39047619
483
325
109
73
4
3
8
746
51
61
87
28
98
194
4.040679197
445
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
Gene: Os08g0191500,
N/A
mRNA: Os08g0191500
Gene: Os04g0624600,
N/A
mRNA: Os04g0624600
Gene: Os04g0624600
N/A
Gene: Os04g0624600
N/A
Gene: Os04g0624600
N/A
Gene: Os04g0624600
N/A
Gene: Os04g0624600
N/A
Gene: Os04g0624600
N/A
Gene: Os04g0624600
N/A
Gene: Os04g0624600
N/A
Gene: Os04g0624600
N/A
Gene: Os04g0624600
N/A
Gene: Os04g0624600
N/A
Gene: Os04g0624600
N/A
Gene: Os04g0624600,
N/A
mRNA: Os04g0624600
Gene: Os04g0624600,
N/A
mRNA: Os04g0624600
Gene: Os04g0624600,
N/A
mRNA: Os04g0624600
Gene: Os01g0720600
N/A
Gene: Os01g0720600
N/A
Gene: Os01g0720600
N/A
Gene: Os01g0720600
N/A
Gene: Os01g0720600
N/A
Gene: Os01g0720600
N/A
Gene: Os01g0720600
N/A
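The frequency columns of the INDEL table above appear to be each allele's read count divided by the coverage at that position, expressed as a percentage. The following is a minimal sketch of that arithmetic together with the 0.5% minimum variant frequency stated in the stringency criteria. The function names are hypothetical, and treating the threshold as inclusive is an assumption; this is not the variant-calling software used in the thesis.

```python
# Illustrative sketch only: names and structure are assumptions,
# not the pipeline used in the thesis.
MIN_VARIANT_FREQUENCY = 0.5  # percent; taken from the stated stringency criteria


def allele_frequencies(counts, coverage):
    """Return each allele's share of the reads at one position, in percent."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return [100.0 * count / coverage for count in counts]


def alleles_passing(counts, coverage, min_frequency=MIN_VARIANT_FREQUENCY):
    """Return (index, frequency) pairs for alleles at or above the threshold.

    Whether the original threshold was inclusive is not stated; >= is assumed.
    """
    freqs = allele_frequencies(counts, coverage)
    return [(i, f) for i, f in enumerate(freqs) if f >= min_frequency]


# Read counts and coverage copied from two rows of the table above.
two_allele = allele_frequencies([23530, 1057], 24590)
three_allele = allele_frequencies([5911, 4636, 445], 11013)
print([round(f, 1) for f in two_allele])
print([round(f, 1) for f in three_allele])
```

Rounded to one decimal place, these reproduce the tabulated frequency entries 95.7/4.3 and 53.7/42.1/4.0. The allele counts in a row do not always add up to the coverage, presumably because a few reads support other calls at that position.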
Appendix 2. The full list of breeding lines (studied population) and their pedigree information.
barcode 07 | barcode 08 | pedigree | cross
*YRR07=01-01* | *YRR08=01-03* | ILLABONG/SARA | YC 01008-0-0-56-B
*YRR07=01-15* | *YRR08=01-05* | ILLABONG/VIALONE NANO-Y4 | YC 01011-0-0-3-B
*YUR07=08-19* | *YRR08=01-07* | ILLABONG///M102//M201/YRM3 | YC 99160-0-0-4-10-B
*YRR07=01-19* | *YRR08=01-08* | YRB4 | YC 92013-1-108
*YRR07=03-14* | *YRR08=01-08* | YRB4 | YC 92013-1-108
*YUR07=11-18* | *YRR08=01-09* | ILLABONG/4/YRB3///YRM2//M7/RINGO | YC 00067-0-0-51-B
*YUR07=02-17* | *YRR08=01-10* | ILLABONG/MILLIN | YC 97205-0-22-B
*YRR07=02-10* | *YRR08=02-11* | ILLABONG///YR83/M9//M7 | YC 99159-0-0-29-B
*YRR07=02-20* | *YRR08=02-13* | ILLABONG/VIALONE NANO-Y4 | YC 01011-0-0-5-B
*YRR07=02-16* | *YRR08=02-16* | ILLABONG///YR83/M9//M7 | YC 99159-0-0-13-B
*YRE07=10-02* | *YRR08=02-19* | YRM67 | YC 94002-1-62
*YRR07=02-13* | *YRR08=02-20* | JARRAH | YC 82003-14-2
*YRR07=04-17* | *YRR08=02-20* | JARRAH | YC 82003-14-2
*YRR07=02-06* | *YRR08=03-02* | ILLABONG/YRM39 | YC 94208-0-12-B
*YRR07=02-07* | *YRR08=03-09* | M103///YRM34//YRM3/HUNG.NO.1 | YC 00085-0-0-95-B
*YUR07=02-16* | *YRR08=04-02* | ILLABONG/4/YRB3///YRM2//M7/RINGO | YC 00067-0-0-46-B
*YRR07=01-03* | *YRR08=04-12* | ILLABONG/IR65600-27-1-2-2 | YC 97157-0-1-13-B
*YRR07=01-08* | *YRR08=07-01* | YRB3/ARBORIO//MILLIN/WC1043 | YC 00038-0-0-60-B
*YRR07=01-05* | *YRR08=07-06* | ILLABONG | YC 81116-1-5
*YRR07=03-11* | *YRR08=07-06* | ILLABONG | YC 81116-1-5
*YRI07=04-08* | *YRI08=01-01* | 71048.200/H1//INGA///YRL113 | YC 98023-0-44-B
*YUD07=14-15* | *YRI08=01-02* | YRL118///INGA/M9//213D.25 | YC 99046-0-0-11-S-7
*YRI07=01-04* | *YRI08=01-04* | 71048.200/H1//INGA///YRL113 | YC 98023-0-63-B
*YUD07=03-24* | *YRI08=01-05* | BBL//M9/PELDE///YRL30/4/YRL113 | YC 98054-0-53-S-6
*YRI07=01-03* | *YRI08=01-06* | YRL39/IR65587-134-2-2//YRL113 | YC 01084-0-0-8-B
*YRI07=02-05* | *YRI08=01-07* | YRL113/WAB450-11-1-P31-1-HB | YC 00006-0-0-30-B
*YRI07=01-02* | *YRI08=01-08* | YRL39/IR65587-134-2-2//YRL113 | YC 01084-0-0-67-B
*YRI07=03-08* | *YRI08=02-01* | BBL//M9/PELDE///YRL30/4/YRL101 | YC 98095-0-49-B
*YRI07=03-03* | *YRI08=01-09* | YRL113///RD91V55//P/GO(4)/D.10 | YC 98149-0-6-B
*YUI07=08-12* | *YRI08=01-10* | YRL113/WAB450-11-1-P31-1-HB | YC 00006-0-0-40-S-12
*YUI07=14-10* | *YRI08=02-03* | YRL113//L203/YRL34 | YC 00064-0-0-82-S-6
*YRI07=02-03* | *YRI08=02-06* | BBL//M9/PELDE///YRL30/4/LANGI | YC 98084-0-44-7-B
*YRI07=02-08* | *YRI08=02-08* | YRL113 | YC 89045J-0-17
*YRI07=02-04* | *YRI08=03-02* | 71011//73/M7///P/4/YRL34/5/BBL//M9/P/4/YRL34 | YC 99221-0-0-3-B
*YRI07=01-07* | *YRI08=03-03* | 71048.200/H1//INGA///YRL113 | YC 98023-0-74-B
*YRI07=03-02* | *YRI08=04-04* | YRL113//L203/YRL34 | YC 00064-0-0-3-B
*YUI07=12-12* | *YRI08=04-09* | BBL//M9/PELDE///YRL30/4/YRL113 | YC 98054-0-53-S-9
*YUI07=12-10* | *YRI08=04-10* | YRL113//L203/YRL34 | YC 00064-0-0-82-S-10
*YRI07=04-05* | *YRI08=05-09* | BBL//M9/PELDE///YRL30/4/YRL113 | YC 98054-0-114-B
*YUI07=06-10* | *YRI08=05-10* | RIZABELL/YRL113 | YC 99293-0-0-25-S-14
*YRI07=02-02* | *YRI08=06-02* | YRL39/IR65587-134-2-2//YRL113 | YC 01084-0-0-69-B
*YRI07=01-06* | *YRI08=06-05* | YRL113//L203/YRL34 | YC 00064-0-0-147-B
*YRJ07=05-15* | *YRJ08=02-33* | M103///M201//YR196/ARDITO | YC 92243-0-11-B
*YRJ07=01-08* | *YRJ08=02-30* | M103/OEIRAS | YC 99089-0-38-B
*YRJ07=04-08* | *YRJ08=03-28* | M103/YRK2 | YC 94161S-2-0-10-B
*YUJ07=09-22* | *YRJ08=03-26* | YRM49///ILLABONG/YRM54 | YC 02041-B-2S-9
*YRJ07=03-13* | *YRJ08=03-35* | AKIHIKARI//KOSHIHIKARI (T)/YRK4 | YC 00238A-0-0-42-B
*YRJ07=03-12* | *YRJ08=03-30* | M103/YRM44 | YC 95050S-5-0-B
*YRI07=04-06* | *YRI08=06-07* | BBL//M9/PELDE///YRL30/4/LANGI | YC 98084-0-44-10-B
*YRI07=04-03* | *YRI08=06-08* | YRL113//L203/YRL34 | YC 00064-0-0-29-B
*YRI07=04-01* | *YRI08=07-03* | BBL//M9/PELDE///YRL30/4/YRL113 | YC 98054-0-65-B
*YRI07=03-07* | *YRI08=07-07* | BBL//M9/PELDE///YRL30/4/LANGI | YC 98084-0-44-12-B
*YRI07=03-01* | *YRI08=08-10* | LANGI/LAGRUE//YRL113 | YC 01087-0-0-22-B
*YRI07=01-08* | *YRI08=09-01* | YRL101///PELDE/M9//M101 | YC 94118-368-59-B
*YRI07=02-07* | *YRI08=09-10* | BBL//M9/P///YRL30/4/LANGI/INGA | YC 99222-0-0-96-B
*YRI07=01-01* | *YRI08=10-04* | YRL113//L203/YRL34 | YC 00064-0-0-76-B
*YUD07=07-19* | *YRI08=14-09* | L203//YRL101/LANGI | YC 02056-B-2S-10
*YRJ07=01-02* | *YRJ08=01-13* | M103///M201//196/ARD/4/M103/YRM31 | YC 99140-0-117-B
*YRJ07=04-13* | *YRJ08=01-17* | M103///M201//196/ARD/4/M103/YRM31 | YC 99140-0-105-B
*YRJ07=03-09* | *YRJ08=01-20* | M103/YRM54 | YC 99114-0-9-1-B
*YUJ07=19-23* | *YRJ08=01-22* | M103//M201/YRM3///IR65600-42-5-/MILLIN | YC 00039-0-0--38-S-6
*YRJ07=05-08* | *YRJ08=01-24* | M103///M201/EIKO//CALROSE | YC 92212-0-73-B
*YRJ07=03-04* | *YRJ08=01-23* | M103/YRM54 | YC 99114S-26-B
*YRJ07=04-01* | *YRJ08=01-27* | M103///YRM3/HUNG.NO.1//M401/4/YRM54 | YC 00268-0-0-5-B
*YRJ07=04-03* | *YRJ08=01-30* | ECHUCA/SHIMUZI MOCHI//MILLIN | YC 97035-0-309-B
*YRJ07=02-14* | *YRJ08=01-32* | M103///M201//196/ARD/4/M103/YRM31 | YC 99140-0-3-B
*YRJ07=05-01* | *YRJ08=01-33* | M103/YRM54 | YC 99114S-30-B
*YRJ07=01-20* | *YRJ08=01-35* | M103/YRM54 | YC 99114-0-12-B
*YRJ07=02-15* | *YRJ08=02-16* | M103///M201//196/ARD/4/M103/YRM31 | YC 99140-0-135-B
*YRJ07=04-02* | *YRJ08=02-19* | M102/M103//M103 | YC 97063-0-106-B
*YRJ07=01-12* | *YRJ08=02-20* | ECHUCA/80023-TR166-2-1-4 | YC 96041T-12-12-B
*YRJ07=01-07* | *YRJ08=02-21* | M103///YRM3/HUNG.NO.1//M401/4/YRM54 | YC 00268-0-0-7-B
*YRJ07=05-14* | *YRJ08=02-22* | M103///M201//196/ARD/4/M103/YRM31 | YC 99140-0-50-B
*YUJ07=19-25* | *YRJ08=02-24* | M103//M201/R///M201/YRM3//BOG. | YC 98064-0-8-S-6
*YRJ07=04-19* | *YRJ08=02-26* | M103///M201//YR196/ARDITO | YC 92243J-0-5-B
*YRJ07=01-18* | *YRJ08=02-27* | M103/YRM18 | YC 90019J-50-0-B
*YRJ07=02-09* | *YRJ08=02-28* | M103//M401/CALROSE | YC 96106-39-2-B
*YRJ07=03-15* | *YRJ08=02-35* | M103///YRM3/HUNG.NO.1//M401/4/YRM54 | YC 00268-0-0-15-B
*YRJ07=01-03* | *YRJ08=03-12* | M102/M103 | YC 92076S-301-0-B
*YUR07=01-16* | *YRJ08=03-13* | VIALONE NANO | Y01/008
*YRJ07=05-19* | *YRJ08=03-14* | M103///M201//YR196/ARDITO | YC 92243J-0-11-B
*YRJ07=02-04* | *YRJ08=03-15* | M103///YRM3/HUNG.NO.1//M401/4/YRM54 | YC 00268-0-0-30-B
*YRJ07=02-19* | *YRJ08=03-17* | M103///YRM3/HUNG.NO.1//M401/4/YRM54 | YC 00268-0-0-23-B
*YRJ07=02-03* | *YRJ08=03-18* | M104 | Y03/009
*YRJ07=04-09* | *YRJ08=03-16* | JARRAH | YC 82003-14-2
*YRJ07=02-12* | *YRJ08=03-25* | M103/YRM54 | YC 99114-0-80-9-B
*YRJ07=05-06* | *YRJ08=03-23* | LIMAN | Y98/001
*YRE07=10-04* | *YRA08=01-03* | YRM64/NORIN PL11 | YC 01012-0-0-9-B
*YRA07=03-10* | *YRA08=01-06* | REIZIQ | YC 86003S-12-0
*YRE07=19-01* | *YRA08=01-05* | M401/YRM42//YRM54 | YC 01038-0-0-118-B
*YRE07=15-01* | *YRA08=01-09* | NAMAGA///M201/YRM3//BOGAN | YC 98140S-89-B
*YUA07=04-28* | *YRA08=01-10* | KOSHIHIKARI(T)/M202//BOGAN | YC 00018-0-0-6-S-10
*YUA07=05-27* | *YRA08=02-01* | YRM65//JARRAH/AMAROO | YC 02035-B-2S-19
*YRA07=04-05* | *YRA08=02-02* | M201//YR196/ARDITO///YRM54 | YC 99113S-57-B
*YRA07=03-04* | *YRA08=02-03* | PARAGON | YC 92061-0-13
*YRA07=02-10* | *YRA08=02-05* | M201//YR196/ARDITO///YRM54 | YC 99113-0-0-8-B
*YRA07=04-14* | *YRA08=02-06* | M7/M201//M103 | YC 94014-1-59-B
*YRA07=02-16* | *YRA08=02-07* | YRM54/YRM61 | YC 97073S-93-0-B
*YRA07=03-05* | *YRA08=02-08* | YRM54/YRM61 | YC 97073S-93-0-12-B
*YUA07=05-20* | *YRA08=02-12* | YRM65//JARRAH/AMAROO | YC 02035-B-2S-1
*YRA07=04-09* | *YRA08=03-01* | YRM66 | YC 92175-1-30
*YUA07=17-19* | *YRA08=03-02* | IR65600-42-5-2/MILLIN//YRM63 | YC 01088-0-0-42-B
*YRA07=05-03* | *YRA08=03-03* | YRM68 | YC 92086-1-15
*YRA07=05-02* | *YRA08=03-04* | NAMAGA | YC 84019-43-3
*YRA07=03-03* | *YRA08=03-05* | M201//YR196/ARDITO///YRM54 | YC 99113-0-0-57-B
*YRA07=04-12* | *YRA08=03-06* | M201//YR196/ARDITO///YRM54 | YC 99113-0-0-19-B
*YRA07=03-12* | *YRA08=03-08* | QUEST_CT19 | YC 86008-96-3
*YUA07=17-22* | *YRA08=03-10* | KOSHIHIKARI/M102//YRM43 | YC 96102-1-94-4-B
*YRA07=03-06* | *YRA08=03-11* | YRK4/SR13925-13-1 | YC 97053-1-112-7-B
*YRA07=01-06* | *YRA08=04-02* | M201//YR196/ARDITO///YRM54 | YC 99113-0-0-100-B
*YRA07=05-10* | *YRA08=04-04* | OPUS/MATSURIBARE | YC 99009-0-0-47-B
*YRE07=14-04* | *YRA08=04-05* | YRM64/NORIN PL11 | YC 01012-0-0-47-B
*YUA07=05-21* | *YRA08=04-07* | M201/YRM3//BOGAN///YRM33 | YC 98039-0-44-9-B
*YRA07=03-07* | *YRA08=04-10* | M201//YR196/ARDITO///YRM54 | YC 99113-0-0-141-B
*YRA07=02-08* | *YRA08=04-11* | YRM54/YRM61 | YC 97073S-93-0-4-B
*YRA07=02-01* | *YRA08=04-12* | QUEST | YC 86008-96-3
*YRA07=05-08* | *YRA08=05-02* | M201//YR196/ARDITO///YRM54 | YC 99113-0-0-84-B
*YRA07=01-07* | *YRA08=05-03* | MILLIN | YC 82003-28-4
*YRA07=01-16* | *YRA08=05-04* | JARRAH | YC 82003-14-2
*YRA07=04-03* | *YRA08=05-08* | CALHIKARI | Y03/004
*YRA07=02-04* | *YRA08=05-11* | YRM42//BOGAN/M302 | YC 97048S-28-0-B
*YRA07=01-02* | *YRA08=06-01* | OPUS/MATSURIBARE | YC 99009-0-0-14-B
*YRA07=01-03* | *YRA08=06-02* | YRM64 | YC 92061-0-59
*YRA07=05-06* | *YRA08=06-04* | AMAROO | YC 79011S-0-32
*YRA07=01-11* | *YRA08=06-05* | YRM54//ECHUCA/SHIMUZI MOCHI | YC 97086-0-43-B
*YUA07=17-25* | *YRA08=06-06* | M204/YRM43 | YC 94227-B-2S-13-6B-S-5
*YRA07=03-16* | *YRA08=06-07* | ILLABONG/YRM54 | YC 96027-1-33-B
*YUA07=01-28* | *YRA08=06-08* | M401/YRM42//YRM54 | YC 01038-0-0-95-B
*YUA07=20-22* | *YRA08=06-10* | YRM64/NORIN PL11 | YC 01012-0-0-21-B
*YRA07=01-13* | *YRA08=06-11* | OPUS | YC 87332-27-7
*YRA07=05-05* | *YRA08=06-12* | KOSHIHIKARI | YC 82003-28-4
*YRB07=03-08* | *YRB08=01-03* | YRF205/LANGI | YC 98086-0-18-B
*YRB07=04-07* | *YRB08=01-06* | DELLMONT/LANGI | YC 98081-0-7-B
*YRB07=05-08* | *YRB08=01-09* | DOONGARA/YRL38 | YC 95096S-172-0-4-B
*YUB07=14-24* | *YRB08=01-10* | LANGI///&(DAWN/K//IR579/K)/P//DOONGARA | YC 02112-B-2S-4
*YUB07=08-26* | *YRB08=01-11* | YRL39/IR65597-134-2-2//YRL123 | YC 02093B-B-2S-13
*YRB07=04-02* | *YRB08=01-16* | YRL34//INGA/M9(5)/PEL///DOONGARA | YC 99225-0-0-11-B
*YRB07=05-03* | *YRB08=02-01* | YRL125 | YC 92108-1-11
*YRB07=05-01* | *YRB08=02-03* | YRL122/4/71011//M9/PEL//YRL29 | YC 98090-0-251-B
*YUB07=13-20* | *YRB08=02-04* | YRL39/IR65597-134-2-2//YRL123 | YC 02093B-B-2S-31
*YRB07=01-09* | *YRB08=02-05* | PELDE/GOPALBHOG(4)/YR71048-10//YRL101 | YC 95200-34-9
*YRB07=04-15* | *YRB08=02-08* | YRF205/LANGI | YC 97074-1-38-B
*YRB07=01-16* | *YRB08=02-10* | PELDE/GOPALBHOG(4)/YR71048-10//YRL101 | YC 95200-47-3
*YRD07=04-10* | *YRB08=02-12* | 71011//73/M7///PEL/4/YRL34/5/IR20 | YC 99231-0-0-2-B
*YRB07=05-12* | *YRB08=02-16* | YRM54/CN892-874/1 | YC 98003-0-42-B
*YRD07=04-02* | *YRB08=03-03* | INGA/L201//DOONGARA///L202 | YC 00184-0-0-37-B
*YRB07=04-03* | *YRB08=03-06* | KYEEMA | YC 83110-3-4
*YUE07=08-07* | *YRE08=05-11* | M102/M103//YRM42/SR11327-22-3-2 | YC 02058-B-2S-4
*YRB07=02-12* | *YRB08=03-08* | DOONGARA/YRL38 | YC 95096S-146-0-B
*YRB07=05-11* | *YRB08=03-09* | DOONGARA/SERATUS MALAM | YC 00128T-0-2-B
*YRB07=02-15* | *YRB08=03-12* | YRL123 | YC 91063-1-11
*YRB07=03-17* | *YRB08=03-13* | LANGI | YC 82079-66-4
*YRD07=04-11* | *YRB08=04-01* | INGA/L201//DOONGARA///L202 | YC 00184-0-0-15-B
*YRB07=04-13* | *YRB08=04-02* | YRF205/LANGI | YC 98086-0-18-B
*YRE07=12-02* | *YRE08=05-07* | QUEST | YC 86008-96-3
*YUB07=02-21* | *YRB08=04-03* | YRL39/IR65597-134-2-2//YRL123 | YC 02093B-B-2S-2
*YRB07=03-10* | *YRB08=04-06* | YRF205/LANGI | YC 98086-0-21-B
*YRB07=05-09* | *YRB08=04-07* | INGA/L201//DOONGARA///L202 | YC 00184-0-0-52-B
*YRB07=02-07* | *YRB08=04-10* | DOONGARA | YC 71048-111-9
*YRB07=02-01* | *YRB08=04-11* | YRL118 | YC 89198J-1-1
*YRB07=03-04* | *YRB08=04-16* | YRF207/L202 | YC 99092-0-0-24-B
*YRB07=05-10* | *YRB08=05-02* | YRL125_S_CT18 | YC 92108-1-11
*YRB07=04-05* | *YRB08=05-11* | LANGI/JOJUTLA 4 | YC 94070-0-2-B
*YRE07=17-02* | *YRE08=01-01* | YRM64/TOYONISHIKI | YC 01013-0-0-58-B
*YUE07=05-02* | *YRE08=01-11* | OPUS/4/M7/KITAKOGANE///M201//EIKO/H.NO.1 | YC 00248-0-0-36-B
*YRE07=14-02* | *YRE08=02-01* | YRM65 | YC 92061-0-51
*YUE07=05-03* | *YRE08=05-06* | M103///YRM3/HUNG.NO.1//M401/4/YRM54 | YC 00268-0-0-9-B
*YRE07=11-02* | *YRE08=02-02* | M103/YRM54 | YC 99114S-38-B
*YRE07=22-05* | *YRE08=02-03* | M103 | YU87/001
*YRE07=22-01* | *YRE08=02-12* | YRM49//IR65600-42-5-2/MILLIN | YC 01029-0-0-38-B
*YRE07=18-02* | *YRE08=02-09* | M103/YRM54 | YC 99114S-25-B
*YUE07=15-01* | *YRE08=05-08* | OPUS//KOSHIHIKARI (T)/M202 | YC 00002-0-0-18-S-5
*YRE07=22-03* | *YRE08=02-10* | M103///YRM3/HUNG.NO.1//M401/4/YRM54 | YC 00268-0-0-9-B
*YUE07=11-04* | *YRE08=02-11* | HITOMEBORE//YRM39/AKITAKOMACHI | YC 00056-0-0-98-S-1
*YUE07=01-12* | *YRE08=02-05* | YRM49///ILLABONG/YRM54 | YC 02041-B-2S-5
*YRE07=09-05* | *YRE08=05-05* | M102//M201/BOGAN///YRM54 | YC 99104S-9-B
*YRE07=20-02* | *YRE08=03-03* | MILLIN | YC 82003-28-4
*YRE07=23-04* | *YRE08=03-06* | JARRAH | YC 82003-14-2
*YUE07=20-13* | *YRE08=03-05* | M103/YRM54 | YC 99114S-11-B
*YRE07=14-03* | *YRE08=03-09* | ECHUCA | YC 81121DS
*YUE07=14-11* | *YRE08=05-03* | M201/YRM3//BOGAN///OPUS | YC 99226-0-0-33-10-B
*YUE07=10-03* | *YRE08=03-10* | M103/YRM49 | YC 99219S-7-B
*YRE07=10-03* | *YRE08=03-11* | QUEST_CT19 | YC 86008-96-3
*YRE07=09-02* | *YRE08=04-01* | YRM64/TOYONISHIKI | YC 01013-0-0-47-B
*YUE07=08-02* | *YRE08=04-07* | M103/HITOMEBORE | YC 98052-0-51-2-S-11
*YUE07=01-10* | *YRE08=04-09* | YRM54//YRK4/KOSHIHIKARI (TYNAN) | YC 02020-B-2S-5
*YUE07=17-11* | *YRE08=04-10* | OPUS//KOSHIHIKARI (T)/M202 | YC 00002-0-0-18-S-8
*YRE07=20-03* | *YRE08=04-11* | YRM54/M202 | YC 97027S-22-0-B
*YUE07=02-01* | *YUE08=02-14* | MILLIN | YC 82003-28-4
*YUE07=04-09* | *YUE08=02-18* | JARRAH | YC 82003-14-2
*YRJ07=04-06* | *YUJ08=13-20* | SPRINT | Y98/005
*YUD07=02-14* | *YUD08=01-22* | L205 | Y03/008
*YUD07=02-19* | *YUD08=01-15* | YRL113 | YC 89045J-0-17
*YUD07=01-16* | *YUD08=01-20* | YRL118 | YC 89198J-1-1
*YRD07=04-03* | *YRD08=01-03* | YRL111 | YC 89097-0-55
*YUD07=05-19* | *YRD08=01-04* | L202///BASMATI 370/PELDE//BASMATI 370 | YC 01102-0-0-40-B
*YRD07=04-01* | *YRD08=01-08* | YRL113//H263-9-1-1/YRL34 | YC 00158-0-0-19-B
*YRD07=02-05* | *YRD08=01-09* | LANGI/IR66167-27-5-1-6 | YC 97181-0-28-8-B
*YUD07=14-22* | *YRD08=01-10* | THAIBONNET/YRL101 | YC 98073-0-28-4-S-6
*YRD07=01-11* | *YRD08=01-07* | BBL//M9/P///YRL30/4/LANGI/INGA | YC 99222-0-0-65-B
*YRD07=01-07* | *YRD08=02-03* | L205 | Y03/008
*YRD07=03-04* | *YRD08=02-04* | YRL118 | YC 89198J-1-1
*YRD07=02-08* | *YRD08=02-05* | LANGI/IR 65600-38-1-2 | YC 95341-1-95-4-B
*YRD07=05-12* | *YRD08=02-06* | 213D.25/83//M7/IRR.ING///YRL38 | YC 95146-2-0-6-B
*YRD07=05-08* | *YRD08=02-08* | BBL//M9/P///YRL30/4/LANGI/INGA | YC 99222-0-0-35-B
*YUD07=08-15* | *YRD08=02-09* | DELLMONT//BASMATI 370/PELDE | YC 00210-0-24-S-16
*YRD07=02-02* | *YRD08=03-01* | BBL//M9/P///YRL30/4/LANGI/INGA | YC 99222-0-0-13-B
*YRD07=02-07* | *YRD08=03-02* | LANGI/INGA//PELDE | YC 99217-0-0-23-3-B
*YRD07=02-04* | *YRD08=03-04* | LANGI/INGA//PELDE | YC 99217-0-0-23-5-B
*YRD07=05-11* | *YRD08=03-05* | LANGI/LAGRUE | YC 95073-0-8-B
*YRD07=01-03* | *YRD08=03-06* | (PELDE*2/CALROSE76)*2//DOONGARA | YC 99248S-10-B
*YRD07=01-04* | *YRD08=03-07* | YRL101//IR72/YRL39 | YC 97098-0-151-B
*YRD07=03-05* | *YRD08=03-10* | YRL122/THAIBONNET | YC 98110-0-42-3-B
*YUD07=03-22* | *YRD08=04-01* | M103/DOONGARA DH1 | YC 03370DH-15
*YRD07=03-07* | *YRD08=04-02* | L203/YRL39//YRL101 | YC 97107-0-23-14-B
*YRD07=03-08* | *YRD08=04-03* | BBL//M9/P///YRL30/4/LANGI/INGA | YC 99222-0-0-72-B
*YUD07=06-17* | *YRD08=04-04* | YRL39/IR65587-134-2-2//YRL113 | YC 01084-0-0-46-B
*YRD07=02-03* | *YRD08=04-05* | YRL113 | YC 89045J-0-17
*YRD07=01-05* | *YRD08=04-08* | LANGI/IR 65600-134-2-2 | YC 95342-1-1-7-B
*YUD07=03-20* | *YRD08=04-09* | YRF205/LANGI | YC 98086-0-4-B
*YRD07=03-10* | *YRD08=05-01* | L202/DOONGARA | YC 99247-0-0-17-B
*YRD07=05-04* | *YRD08=05-03* | BBL//M9/P///YRL30/4/LANGI/INGA | YC 99222-0-0-84-B
*YRD07=05-02* | *YRD08=05-05* | YRL101/4/YRL39///213D.25/YR83//M7/IRR.INGA | YC 00153-0-0-25-B
*YUD07=11-14* | *YRD08=05-06* | M103/DOONGARA DH1 | YC 03370DH-24
*YRD07=01-10* | *YRD08=05-07* | YRB90 V31/YRL34 | YC 90041J-1-24-B
*YUD07=05-18* | *YRD08=05-09* | I/M9(5)/3/M101/73//P(2)/4/I/5/YRL101 | YC 99186-0-0-17-S-6
*YRD07=03-11* | *YRD08=05-10* | LANGI/IR65600.27.1.2.2.2 | YC 97182A-0-2-4-B
*YRD07=05-07* | *YRD08=06-01* | GULFMONT//YRL39/IR65597-134-2-2 | YC 1080-0-0-24-B
*YRD07=01-02* | *YRD08=06-02* | YRL39/IR65587-134-2-2//YRL113 | YC 01084-0-0-42-B
*YRD07=05-10* | *YRD08=06-04* | 71048.166//R/IR36///I/M9(5)/P | YC 92164-3-B
*YRD07=02-10* | *YRD08=06-06* | LANGI/IR66167-27-5-1-6 | YC 97181-0-28-10-B
*YRD07=03-02* | *YRD08=06-09* | L202///BASMATI 370/PELDE//BASMATI 370 | YC 01102-0-0-32-B
*YRD07=02-06* | *YRD08=06-10* | L203 | YU85/001
Appendix 3. Target genes and sequence of gene-specific LR-PCR primers.
No | Gene | Fragment | Length (bp) | Forward primer (5'-3') | Reverse primer (5'-3')
1 | AGPS2b | H1 | 3144 | AATCTTGACCGCAGTGTCG | AAGTGTTGCCTGTGCATTAC
1 | AGPS2b | H2 | 3028 | TTGTAATGCACAGGCAACAC | ATCTCGACTGCCCAGTTAAG
2 | SPHOL | H1 | 3119 | TGCTGGTTGGTCGTAATGTG | AGCCTCATTCCAGCTTAACC
2 | SPHOL | H2 | 3362 | CACTGTGCATTCCTGAGTTG | TTGTCCATAGCTGCAGGTAG
3 | GPT1 | C | 3940 | TATCAGATTCCGAGGGCTTG | GTTACCTTCCCACACCCAGA
4 | GBSSI | H1 | 2164 | CCAACTAGCTCCACAAGATG | CATTGGGCTGGTAGTTGTTC
4 | GBSSI | H2 | 2475 | CCTTCCGGTTTGTTACTGAC | CACACCCAGAAGAGTACAAC
5 | GBSSII | C | 5405 | CCATCGCATAGGATGAGTGA | TGGAACACAACCCAGATGAA
6 | SSI | H1 | 3201 | AGCCCGATCTAGAAGGTACG | TGATAGGCTCAAACCTGATG
6 | SSI | H2 | 3888 | ACATCAGGTTTGAGCCTATC | GACACTTGACATCGCAGTAG
7 | SSIIa | H1 | 2681 | AGAAGAAACGGCTACTCCTC | TCATGCGCTTCATGATTTCC
7 | SSIIa | H2 | 2316 | AGGAGACGCAAATTCCTTAG | CATTGGTACTTGGCCTTGAC
8 | SSIIb | H1 | 2229 | ACACCTCCGGCAGATCTTTC | CAACAGCAGCTTTGCAGAAC
8 | SSIIb | H2 | 2322 | TCTGCAAAGCTGCTGTTGAG | ACTTCCACCGTTGCTCCTAC
9 | SSIIIa | H1 | 3833 | GGTTCTCAGTGTGGTGTTTG | CATCCTTCGGAGTTCTTGTG
9 | SSIIIa | H2 | 3666 | GTCACCACAGGACAATATCG | ACCCTGCATCTTAGGCTTAG
9 | SSIIIa | H3 | 3863 | GTTCCTGTCGAGTACAAGAG | AGCCATAGTCCAGATGTAGC
10 | SSIIIb | H1 | 3709 | GGAGCCTTTCTTTCTCCTTC | CTCCACTTGGGTTTCATGTC
10 | SSIIIb | H2 | 3880 | AGCAACCTTGGGTAGGAATG | CAATGTAGAAGCCGGGATTG
11 | SSIVa | H1 | 4549 | GTCCCGATACTGTTGTCTTG | ATTGCCAGCAGACTACTTTC
11 | SSIVa | H2 | 4767 | AGAAGGTGCTAGGTTTGTTG | CTTAGCCACCCATTCTTCTC
12 | BEI | H1 | 4303 | AATCGCCGCCGATTTCGAAG | TTGCCAGGCGGAAGTCAAAC
12 | BEI | H2 | 4159 | TCTGGGATAGTCGTCTGTTC | TACTTGTCTGGTGCTAGGAG
13 | BEIIa | H1 | 1511 | GGTGCTCTTCAGGAGGAAGG | TACCTGCGGGTGAATCCAAG
13 | BEIIa | H2 | 1754 | GCGCTGAAGGCATTACCTAC | AGTTGAACAGGCGAGAATCC
14 | BEIIb | H1 | 5437 | GGTAAATCGTCGTGATCTTC | AAGGGAAGTAGCGATTAACG
14 | BEIIb | H2 | 5581 | CATCATTGGATGTGGGATTC | GATGTACAGAAGTGCAGAAC
15 | ISA1 | H1 | 4027 | GTCAATTTCGCCGTCTACTC | CAGTAGCCCTGAGAAATAGC
15 | ISA1 | H2 | 2979 | GGTCAGAATGGAATGGAAAG | TCATCTTTCGTCCACTCAAC
16 | ISA2 | H1 | 1256 | GCGGCGGAAGAGTTGTAGCG | GCTTCTGAGTCACCGGATGG
16 | ISA2 | H2 | 1369 | CGTGGCATTGATAATTCCTC | TCAGGGAACATGAAGGTAAC
17 | PUL | H1 | 5150 | TAACCCAGATGGTCCTAGTC | ACCAGTGGTCAACCTGTATG
17 | PUL | H2 | 4857 | CTATTGGTTTCCAGCCTAGC | CCTTACGGAGATGACAAAGC
Total | | 33 | 109,067 | |
H1 = first half, H2 = second half, H3 = third fragment, C = complete fragment
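Several of the H1 reverse primers in Appendix 3 look like near reverse complements of the matching H2 forward primers, which would be expected if the two halves of a gene were designed to overlap. A minimal sketch for checking this on any primer pair is given below. The helper names are hypothetical and the code is only an illustration, not part of the thesis methods.

```python
# Illustrative helpers (assumed names): compare the reverse primer of one
# LR-PCR fragment with the forward primer of the next fragment.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_shared_run(a: str, b: str) -> str:
    """Return the longest substring of a that also occurs in b."""
    best = ""
    for i in range(len(a)):
        # Only try candidates longer than the best run found so far.
        for j in range(i + len(best) + 1, len(a) + 1):
            if a[i:j] in b:
                best = a[i:j]
            else:
                break
    return best


# Primer sequences copied from Appendix 3 (AGPS2b).
agps2b_h1_reverse = "AAGTGTTGCCTGTGCATTAC"
agps2b_h2_forward = "TTGTAATGCACAGGCAACAC"

shared = longest_shared_run(reverse_complement(agps2b_h1_reverse), agps2b_h2_forward)
print(shared, len(shared))
```

For the AGPS2b pair the shared run is the 18-nt sequence GTAATGCACAGGCAACAC, so the H2 forward primer sits on the same stretch of template as the H1 reverse primer, shifted by two bases.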
Appendix 4. SNP/Indel distribution and short-read coverage pattern across candidate loci. The name of
each gene is indicated at the top. X-Y plots (top): the X-axis indicates the length of the sequenced
region (gene) in kb and the Y-axis shows the number of detected SNPs/Indels. The plots show the
distribution of SNPs across the gene (values below zero must be regarded as zero). The graphics in
the middle show the structure of the relevant gene (blue = introns, yellow = exons). The graphics at
the bottom (pink) show the coverage pattern of each gene.
Appendix 5. The full list of breeding lines (studied population) and their pedigree information.
barcode
barcode 08
pedigree
Cross
*YRR07=01-01* *YRR08=01-03* ILLABONG/SARA
YC 01008-0-0-56-B
*YRR07=01-15* *YRR08=01-05* ILLABONG/VIALONE NANO-Y4 YC 01011-0-0-3-B
*YUR07=08-19* *YRR08=01-07* ILLABONG///M102//M201/YRM3 YC 99160-0-0-4-10-B
*YRR07=01-19* *YRR08=01-08* YRB4
YC 92013-1-108
*YRR07=03-14* *YRR08=01-08* YRB4
YC 92013-1-108
*YUR07=11-18* *YRR08=01-09* ILLABONG/4/YRB3///YRM2//M7/RINGO
YC 00067-0-0-51-B
*YUR07=02-17* *YRR08=01-10* ILLABONG/MILLIN
YC 97205-0-22-B
*YRR07=02-10* *YRR08=02-11* ILLABONG///YR83/M9//M7
YC 99159-0-0-29-B
*YRR07=02-20* *YRR08=02-13* ILLABONG/VIALONE NANO-Y4 YC 01011-0-0-5-B
*YRR07=02-16* *YRR08=02-16* ILLABONG///YR83/M9//M7
YC 99159-0-0-13-B
*YRE07=10-02* *YRR08=02-19* YRM67
YC 94002-1-62
*YRR07=02-13* *YRR08=02-20* JARRAH
YC 82003-14-2
*YRR07=04-17* *YRR08=02-20* JARRAH
YC 82003-14-2
*YRR07=02-06* *YRR08=03-02* ILLABONG/YRM39
YC 94208-0-12-B
*YRR07=02-07* *YRR08=03-09* M103///YRM34//YRM3/HUNG.NO.1 YC 00085-0-0-95-B
*YUR07=02-16* *YRR08=04-02* ILLABONG/4/YRB3///YRM2//M7/RINGO
YC 00067-0-0-46-B
*YRR07=01-03* *YRR08=04-12* ILLABONG/IR65600-27-1-2-2
YC 97157-0-1-13-B
*YRR07=01-08* *YRR08=07-01* YRB3/ARBORIO//MILLIN/WC1043 YC 00038-0-0-60-B
*YRR07=01-05* *YRR08=07-06* ILLABONG
YC 81116-1-5
*YRR07=03-11* *YRR08=07-06* ILLABONG
YC 81116-1-5
*YRI07=04-08*
*YRI08=01-01* 71048.200/H1//INGA///YRL113
YC 98023-0-44-B
*YUD07=14-15* *YRI08=01-02* YRL118///INGA/M9//213D.25
YC 99046-0-0-11-S-7
*YRI07=01-04*
*YRI08=01-04* 71048.200/H1//INGA///YRL113
YC 98023-0-63-B
*YUD07=03-24* *YRI08=01-05* BBL//M9/PELDE///YRL30/4/YRL113YC 98054-0-53-S-6
*YRI07=01-03*
*YRI08=01-06* YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-8-B
*YRI07=02-05*
*YRI08=01-07* YRL113/WAB450-11-1-P31-1-HB YC 00006-0-0-30-B
*YRI07=01-02*
*YRI08=01-08* YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-67-B
*YRI07=03-08*
*YRI08=02-01* BBL//M9/PELDE///YRL30/4/YRL101YC 98095-0-49-B
*YRI07=03-03*
*YUI07=08-12*
*YUI07=14-10*
*YRI07=02-03*
*YRI07=02-08*
*YRI07=02-04*
*YRI07=01-07*
*YRI07=03-02*
*YUI07=12-12*
*YUI07=12-10*
*YRI07=04-05*
*YUI07=06-10*
*YRI07=02-02*
*YRI07=01-06*
*YRJ07=05-15*
*YRJ07=01-08*
*YRJ07=04-08*
*YUJ07=09-22*
*YRJ07=03-13*
*YRJ07=03-12*
*YRI07=04-06*
*YRI07=04-03*
*YRI07=04-01*
*YRI07=03-07*
*YRI07=03-01*
*YRI07=01-08*
*YRI07=02-07*
*YRI07=01-01*
*YUD07=07-19*
*YRJ07=01-02*
*YRI08=01-09*
*YRI08=01-10*
*YRI08=02-03*
*YRI08=02-06*
*YRI08=02-08*
*YRI08=03-02*
*YRI08=03-03*
*YRI08=04-04*
*YRI08=04-09*
*YRI08=04-10*
*YRI08=05-09*
*YRI08=05-10*
*YRI08=06-02*
*YRI08=06-05*
*YRJ08=02-33*
*YRJ08=02-30*
*YRJ08=03-28*
*YRJ08=03-26*
*YRJ08=03-35*
*YRJ08=03-30*
*YRI08=06-07*
*YRI08=06-08*
*YRI08=07-03*
*YRI08=07-07*
*YRI08=08-10*
*YRI08=09-01*
*YRI08=09-10*
*YRI08=10-04*
*YRI08=14-09*
*YRJ08=01-13*
YRL113///RD91V55//P/GO(4)/D.10 YC 98149-0-6-B
YRL113/WAB450-11-1-P31-1-HB YC 00006-0-0-40-S-12
YRL113//L203/YRL34
YC 00064-0-0-82-S-6
BBL//M9/PELDE///YRL30/4/LANGI YC 98084-0-44-7-B
YRL113
YC 89045J-0-17
71011//73/M7///P/4/YRL34/5/BBL//M9/P/4/YRL34
YC 99221-0-0-3-B
71048.200/H1//INGA///YRL113
YC 98023-0-74-B
YRL113//L203/YRL34
YC 00064-0-0-3-B
BBL//M9/PELDE///YRL30/4/YRL113YC 98054-0-53-S-9
YRL113//L203/YRL34
YC 00064-0-0-82-S-10
BBL//M9/PELDE///YRL30/4/YRL113YC 98054-0-114-B
RIZABELL/YRL113
YC 99293-0-0-25-S-14
YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-69-B
YRL113//L203/YRL34
YC 00064-0-0-147-B
M103///M201//YR196/ARDITO
YC 92243-0-11-B
M103/OEIRAS
YC 99089-0-38-B
M103/YRK2
YC 94161S-2-0-10-B
YRM49///ILLABONG/YRM54
YC 02041-B-2S-9
AKIHIKARI//KOSHIHIKARI (T)/YRK4
YC 00238A-0-0-42-B
M103/YRM44
YC 95050S-5-0-B
BBL//M9/PELDE///YRL30/4/LANGI YC 98084-0-44-10-B
YRL113//L203/YRL34
YC 00064-0-0-29-B
BBL//M9/PELDE///YRL30/4/YRL113YC 98054-0-65-B
BBL//M9/PELDE///YRL30/4/LANGI YC 98084-0-44-12-B
LANGI/LAGRUE//YRL113
YC 01087-0-0-22-B
YRL101///PELDE/M9//M101
YC 94118-368-59-B
BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-96-B
YRL113//L203/YRL34
YC 00064-0-0-76-B
L203//YRL101/LANGI
YC 02056-B-2S-10
M103///M201//196/ARD/4/M103/YRM31
YC 99140-0-117-B
*YRJ07=04-13*
*YRJ07=03-09*
*YUJ07=19-23*
*YRJ07=05-08*
*YRJ07=03-04*
*YRJ07=04-01*
*YRJ07=04-03*
*YRJ07=02-14*
*YRJ07=05-01*
*YRJ07=01-20*
*YRJ07=02-15*
*YRJ07=04-02*
*YRJ07=01-12*
*YRJ07=01-07*
*YRJ07=05-14*
*YUJ07=19-25*
*YRJ07=04-19*
*YRJ07=01-18*
*YRJ07=02-09*
*YRJ07=03-15*
*YRJ07=01-03*
*YUR07=01-16*
*YRJ07=05-19*
*YRJ07=02-04*
*YRJ07=02-19*
*YRJ07=02-03*
*YRJ07=04-09*
*YRJ07=02-12*
*YRJ07=05-06*
*YRE07=10-04*
*YRJ08=01-17*
*YRJ08=01-20*
*YRJ08=01-22*
*YRJ08=01-24*
*YRJ08=01-23*
*YRJ08=01-27*
*YRJ08=01-30*
*YRJ08=01-32*
*YRJ08=01-33*
*YRJ08=01-35*
*YRJ08=02-16*
*YRJ08=02-19*
*YRJ08=02-20*
*YRJ08=02-21*
*YRJ08=02-22*
*YRJ08=02-24*
*YRJ08=02-26*
*YRJ08=02-27*
*YRJ08=02-28*
*YRJ08=02-35*
*YRJ08=03-12*
*YRJ08=03-13*
*YRJ08=03-14*
*YRJ08=03-15*
*YRJ08=03-17*
*YRJ08=03-18*
*YRJ08=03-16*
*YRJ08=03-25*
*YRJ08=03-23*
*YRA08=01-03*
M103///M201//196/ARD/4/M103/YRM31
YC 99140-0-105-B
M103/YRM54
YC 99114-0-9-1-B
M103//M201/YRM3///IR65600-42-5-/MILLIN
YC 00039-0-0--38-S-6
M103///M201/EIKO//CALROSE
YC 92212-0-73-B
M103/YRM54
YC 99114S-26-B
M103///YRM3/HUNG.NO.1//M401/4/YRM54
YC 00268-0-0-5-B
ECHUCA/SHIMUZI MOCHI//MILLINYC 97035-0-309-B
M103///M201//196/ARD/4/M103/YRM31
YC 99140-0-3-B
M103/YRM54
YC 99114S-30-B
M103/YRM54
YC 99114-0-12-B
M103///M201//196/ARD/4/M103/YRM31
YC 99140-0-135-B
M102/M103//M103
YC 97063-0-106-B
ECHUCA/80023-TR166-2-1-4
YC 96041T-12-12-B
M103///YRM3/HUNG.NO.1//M401/4/YRM54
YC 00268-0-0-7-B
M103///M201//196/ARD/4/M103/YRM31
YC 99140-0-50-B
M103//M201/R///M201/YRM3//BOG.YC 98064-0-8-S-6
M103///M201//YR196/ARDITO
YC 92243J-0-5-B
M103/YRM18
YC 90019J-50-0-B
M103//M401/CALROSE
YC 96106-39-2-B
M103///YRM3/HUNG.NO.1//M401/4/YRM54
YC 00268-0-0-15-B
M102/M103
YC 92076S-301-0-B
VIALONE NANO
Y01/008
M103///M201//YR196/ARDITO
YC 92243J-0-11-B
M103///YRM3/HUNG.NO.1//M401/4/YRM54
YC 00268-0-0-30-B
M103///YRM3/HUNG.NO.1//M401/4/YRM54
YC 00268-0-0-23-B
M104
Y03/009
JARRAH
YC 82003-14-2
M103/YRM54
YC 99114-0-80-9-B
LIMAN
Y98/001
YRM64/NORIN PL11
YC 01012-0-0-9-B
*YRA07=03-10*
*YRE07=19-01*
*YRE07=15-01*
*YUA07=04-28*
*YUA07=05-27*
*YRA07=04-05*
*YRA07=03-04*
*YRA07=02-10*
*YRA07=04-14*
*YRA07=02-16*
*YRA07=03-05*
*YUA07=05-20*
*YRA07=04-09*
*YUA07=17-19*
*YRA07=05-03*
*YRA07=05-02*
*YRA07=03-03*
*YRA07=04-12*
*YRA07=03-12*
*YUA07=17-22*
*YRA07=03-06*
*YRA07=01-06*
*YRA07=05-10*
*YRE07=14-04*
*YUA07=05-21*
*YRA07=03-07*
*YRA07=02-08*
*YRA07=02-01*
*YRA07=05-08*
*YRA07=01-07*
*YRA08=01-06*
*YRA08=01-05*
*YRA08=01-09*
*YRA08=01-10*
*YRA08=02-01*
*YRA08=02-02*
*YRA08=02-03*
*YRA08=02-05*
*YRA08=02-06*
*YRA08=02-07*
*YRA08=02-08*
*YRA08=02-12*
*YRA08=03-01*
*YRA08=03-02*
*YRA08=03-03*
*YRA08=03-04*
*YRA08=03-05*
*YRA08=03-06*
*YRA08=03-08*
*YRA08=03-10*
*YRA08=03-11*
*YRA08=04-02*
*YRA08=04-04*
*YRA08=04-05*
*YRA08=04-07*
*YRA08=04-10*
*YRA08=04-11*
*YRA08=04-12*
*YRA08=05-02*
*YRA08=05-03*
REIZIQ
M401/YRM42//YRM54
NAMAGA///M201/YRM3//BOGAN
KOSHIHIKARI(T)/M202//BOGAN
YRM65//JARRAH/AMAROO
M201//YR196/ARDITO///YRM54
PARAGON
M201//YR196/ARDITO///YRM54
M7/M201//M103
YRM54/YRM61
YRM54/YRM61
YRM65//JARRAH/AMAROO
YRM66
IR65600-42-5-2/MILLIN//YRM63
YRM68
NAMAGA
M201//YR196/ARDITO///YRM54
M201//YR196/ARDITO///YRM54
QUEST_CT19
KOSHIHIKARI/M102//YRM43
YRK4/SR13925-13-1
M201//YR196/ARDITO///YRM54
OPUS/MATSURIBARE
YRM64/NORIN PL11
M201/YRM3//BOGAN///YRM33
M201//YR196/ARDITO///YRM54
YRM54/YRM61
QUEST
M201//YR196/ARDITO///YRM54
MILLIN
YC 86003S-12-0
YC 01038-0-0-118-B
YC 98140S-89-B
YC 00018-0-0-6-S-10
YC 02035-B-2S-19
YC 99113S-57-B
YC 92061-0-13
YC 99113-0-0-8-B
YC 94014-1-59-B
YC 97073S-93-0-B
YC 97073S-93-0-12-B
YC 02035-B-2S-1
YC 92175-1-30
YC 01088-0-0-42-B
YC 92086-1-15
YC 84019-43-3
YC 99113-0-0-57-B
YC 99113-0-0-19-B
YC 86008-96-3
YC 96102-1-94-4-B
YC 97053-1-112-7-B
YC 99113-0-0-100-B
YC 99009-0-0-47-B
YC 01012-0-0-47-B
YC 98039-0-44-9-B
YC 99113-0-0-141-B
YC 97073S-93-0-4-B
YC 86008-96-3
YC 99113-0-0-84-B
YC 82003-28-4
*YRA07=01-16*
*YRA07=04-03*
*YRA07=02-04*
*YRA07=01-02*
*YRA07=01-03*
*YRA07=05-06*
*YRA07=01-11*
*YUA07=17-25*
*YRA07=03-16*
*YUA07=01-28*
*YUA07=20-22*
*YRA07=01-13*
*YRA07=05-05*
*YRB07=03-08*
*YRB07=04-07*
*YRB07=05-08*
*YUB07=14-24*
*YUB07=08-26*
*YRB07=04-02*
*YRB07=05-03*
*YRB07=05-01*
*YUB07=13-20*
*YRB07=01-09*
*YRB07=04-15*
*YRB07=01-16*
*YRD07=04-10*
*YRB07=05-12*
*YRD07=04-02*
*YRB07=04-03*
*YUE07=08-07*
*YRA08=05-04*
*YRA08=05-08*
*YRA08=05-11*
*YRA08=06-01*
*YRA08=06-02*
*YRA08=06-04*
*YRA08=06-05*
*YRA08=06-06*
*YRA08=06-07*
*YRA08=06-08*
*YRA08=06-10*
*YRA08=06-11*
*YRA08=06-12*
*YRB08=01-03*
*YRB08=01-06*
*YRB08=01-09*
*YRB08=01-10*
*YRB08=01-11*
*YRB08=01-16*
*YRB08=02-01*
*YRB08=02-03*
*YRB08=02-04*
*YRB08=02-05*
*YRB08=02-08*
*YRB08=02-10*
*YRB08=02-12*
*YRB08=02-16*
*YRB08=03-03*
*YRB08=03-06*
*YRE08=05-11*
JARRAH
YC 82003-14-2
CALHIKARI
Y03/004
YRM42//BOGAN/M302
YC 97048S-28-0-B
OPUS/MATSURIBARE
YC 99009-0-0-14-B
YRM64
YC 92061-0-59
AMAROO
YC 79011S-0-32
YRM54//ECHUCA/SHIMUZI MOCHIYC 97086-0-43-B
M204/YRM43
YC 94227-B-2S-13-6B-S-5
ILLABONG/YRM54
YC 96027-1-33-B
M401/YRM42//YRM54
YC 01038-0-0-95-B
YRM64/NORIN PL11
YC 01012-0-0-21-B
OPUS
YC 87332-27-7
KOSHIHIKARI
YC 82003-28-4
YRF205/LANGI
YC 98086-0-18-B
DELLMONT/LANGI
YC 98081-0-7-B
DOONGARA/YRL38
YC 95096S-172-0-4-B
LANGI///&(DAWN/K//IR579/K)/P//DOONGARA
YC 02112-B-2S-4
YRL39/IR65597-134-2-2//YRL123 YC 02093B-B-2S-13
YRL34//INGA/M9(5)/PEL///DOONGARA
YC 99225-0-0-11-B
YRL125
YC 92108-1-11
YRL122/4/71011//M9/PEL//YRL29 YC 98090-0-251-B
YRL39/IR65597-134-2-2//YRL123 YC 02093B-B-2S-31
PELDE/GOPALBHOG(4)/YR71048-10//YRL101
YC 95200-34-9
YRF205/LANGI
YC 97074-1-38-B
PELDE/GOPALBHOG(4)/YR71048-10//YRL101
YC 95200-47-3
71011//73/M7///PEL/4/YRL34/5/IR20 YC 99231-0-0-2-B
YRM54/CN892-874/1
YC 98003-0-42-B
INGA/L201//DOONGARA///L202
YC 00184-0-0-37-B
KYEEMA
YC 83110-3-4
M102/M103//YRM42/SR11327-22-3-2YC 02058-B-2S-4
*YRB07=02-12*
*YRB07=05-11*
*YRB07=02-15*
*YRB07=03-17*
*YRD07=04-11*
*YRB07=04-13*
*YRE07=12-02*
*YUB07=02-21*
*YRB07=03-10*
*YRB07=05-09*
*YRB07=02-07*
*YRB07=02-01*
*YRB07=03-04*
*YRB07=05-10*
*YRB07=04-05*
*YRE07=17-02*
*YUE07=05-02*
*YRE07=14-02*
*YUE07=05-03*
*YRE07=11-02*
*YRE07=22-05*
*YRE07=22-01*
*YRE07=18-02*
*YUE07=15-01*
*YRE07=22-03*
*YUE07=11-04*
*YUE07=01-12*
*YRE07=09-05*
*YRE07=20-02*
*YRE07=23-04*
*YRB08=03-08*
*YRB08=03-09*
*YRB08=03-12*
*YRB08=03-13*
*YRB08=04-01*
*YRB08=04-02*
*YRE08=05-07*
*YRB08=04-03*
*YRB08=04-06*
*YRB08=04-07*
*YRB08=04-10*
*YRB08=04-11*
*YRB08=04-16*
*YRB08=05-02*
*YRB08=05-11*
*YRE08=01-01*
*YRE08=01-11*
*YRE08=02-01*
*YRE08=05-06*
*YRE08=02-02*
*YRE08=02-03*
*YRE08=02-12*
*YRE08=02-09*
*YRE08=05-08*
*YRE08=02-10*
*YRE08=02-11*
*YRE08=02-05*
*YRE08=05-05*
*YRE08=03-03*
*YRE08=03-06*
DOONGARA/YRL38
YC 95096S-146-0-B
DOONGARA/SERATUS MALAM YC 00128T-0-2-B
YRL123
YC 91063-1-11
LANGI
YC 82079-66-4
INGA/L201//DOONGARA///L202
YC 00184-0-0-15-B
YRF205/LANGI
YC 98086-0-18-B
QUEST
YC 86008-96-3
YRL39/IR65597-134-2-2//YRL123 YC 02093B-B-2S-2
YRF205/LANGI
YC 98086-0-21-B
INGA/L201//DOONGARA///L202
YC 00184-0-0-52-B
DOONGARA
YC 71048-111-9
YRL118
YC 89198J-1-1
YRF207/L202
YC 99092-0-0-24-B
YRL125_S_CT18
YC 92108-1-11
LANGI/JOJUTLA 4
YC 94070-0-2-B
YRM64/TOYONISHIKI
YC 01013-0-0-58-B
OPUS/4/M7/KITAKOGANE///M201//EIKO/H.NO.1
YC 00248-0-0-36-B
YRM65
YC 92061-0-51
M103///YRM3/HUNG.NO.1//M401/4/YRM54
YC 00268-0-0-9-B
M103/YRM54
YC 99114S-38-B
M103
YU87/001
YRM49//IR65600-42-5-2/MILLIN
YC 01029-0-0-38-B
M103/YRM54
YC 99114S-25-B
OPUS//KOSHIHIKARI (T)/M202
YC 00002-0-0-18-S-5
M103///YRM3/HUNG.NO.1//M401/4/YRM54
YC 00268-0-0-9-B
HITOMEBORE//YRM39/AKITAKOMACHI
YC 00056-0-0-98-S-1
YRM49///ILLABONG/YRM54
YC 02041-B-2S-5
M102//M201/BOGAN///YRM54
YC 99104S-9-B
MILLIN
YC 82003-28-4
JARRAH
YC 82003-14-2
*YUE07=20-13*
*YRE07=14-03*
*YUE07=14-11*
*YUE07=10-03*
*YRE07=10-03*
*YRE07=09-02*
*YUE07=08-02*
*YUE07=01-10*
*YUE07=17-11*
*YRE07=20-03*
*YUE07=02-01*
*YUE07=04-09*
*YRJ07=04-06*
*YUD07=02-14*
*YUD07=02-19*
*YUD07=01-16*
*YRD07=04-03*
*YUD07=05-19*
*YRD07=04-01*
*YRD07=02-05*
*YUD07=14-22*
*YRD07=01-11*
*YRD07=01-07*
*YRD07=03-04*
*YRD07=02-08*
*YRD07=05-12*
*YRD07=05-08*
*YUD07=08-15*
*YRD07=02-02*
*YRD07=02-07*
*YRE08=03-05*
*YRE08=03-09*
*YRE08=05-03*
*YRE08=03-10*
*YRE08=03-11*
*YRE08=04-01*
*YRE08=04-07*
*YRE08=04-09*
*YRE08=04-10*
*YRE08=04-11*
*YUE08=02-14*
*YUE08=02-18*
*YUJ08=13-20*
*YUD08=01-22*
*YUD08=01-15*
*YUD08=01-20*
*YRD08=01-03*
*YRD08=01-04*
*YRD08=01-08*
*YRD08=01-09*
*YRD08=01-10*
*YRD08=01-07*
*YRD08=02-03*
*YRD08=02-04*
*YRD08=02-05*
*YRD08=02-06*
*YRD08=02-08*
*YRD08=02-09*
*YRD08=03-01*
*YRD08=03-02*
M103/YRM54
YC 99114S-11-B
ECHUCA
YC 81121DS
M201/YRM3//BOGAN///OPUS
YC 99226-0-0-33-10-B
M103/YRM49
YC 99219S-7-B
QUEST_CT19
YC 86008-96-3
YRM64/TOYONISHIKI
YC 01013-0-0-47-B
M103/HITOMEBORE
YC 98052-0-51-2-S-11
YRM54//YRK4/KOSHIHIKARI (TYNAN)
YC 02020-B-2S-5
OPUS//KOSHIHIKARI (T)/M202
YC 00002-0-0-18-S-8
YRM54/M202
YC 97027S-22-0-B
MILLIN
YC 82003-28-4
JARRAH
YC 82003-14-2
SPRINT
Y98/005
L205
Y03/008
YRL113
YC 89045J-0-17
YRL118
YC 89198J-1-1
YRL111
YC 89097-0-55
L202///BASMATI 370/PELDE//BASMATI 370
YC 01102-0-0-40-B
YRL113//H263-9-1-1/YRL34
YC 00158-0-0-19-B
LANGI/IR66167-27-5-1-6
YC 97181-0-28-8-B
THAIBONNET/YRL101
YC 98073-0-28-4-S-6
BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-65-B
L205
Y03/008
YRL118
YC 89198J-1-1
LANGI/IR 65600-38-1-2
YC 95341-1-95-4-B
213D.25/83//M7/IRR.ING///YRL38 YC 95146-2-0-6-B
BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-35-B
DELLMONT//BASMATI 370/PELDE YC 00210-0-24-S-16
BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-13-B
LANGI/INGA//PELDE
YC 99217-0-0-23-3-B
*YRD07=02-04*
*YRD07=05-11*
*YRD07=01-03*
*YRD07=01-04*
*YRD07=03-05*
*YUD07=03-22*
*YRD07=03-07*
*YRD07=03-08*
*YUD07=06-17*
*YRD07=02-03*
*YRD07=01-05*
*YUD07=03-20*
*YRD07=03-10*
*YRD07=05-04*
*YRD07=05-02*
*YUD07=11-14*
*YRD07=01-10*
*YUD07=05-18*
*YRD07=03-11*
*YRD07=05-07*
*YRD07=01-02*
*YRD07=05-10*
*YRD07=02-10*
*YRD07=03-02*
*YRD07=02-06*
*YRD08=03-04*
*YRD08=03-05*
*YRD08=03-06*
*YRD08=03-07*
*YRD08=03-10*
*YRD08=04-01*
*YRD08=04-02*
*YRD08=04-03*
*YRD08=04-04*
*YRD08=04-05*
*YRD08=04-08*
*YRD08=04-09*
*YRD08=05-01*
*YRD08=05-03*
*YRD08=05-05*
*YRD08=05-06*
*YRD08=05-07*
*YRD08=05-09*
*YRD08=05-10*
*YRD08=06-01*
*YRD08=06-02*
*YRD08=06-04*
*YRD08=06-06*
*YRD08=06-09*
*YRD08=06-10*
LANGI/INGA//PELDE
YC 99217-0-0-23-5-B
LANGI/LAGRUE
YC 95073-0-8-B
(PELDE*2/CALROSE76)*2//DOONGARA
YC 99248S-10-B
YRL101//IR72/YRL39
YC 97098-0-151-B
YRL122/THAIBONNET
YC 98110-0-42-3-B
M103/DOONGARA DH1
YC 03370DH-15
L203/YRL39//YRL101
YC 97107-0-23-14-B
BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-72-B
YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-46-B
YRL113
YC 89045J-0-17
LANGI/IR 65600-134-2-2
YC 95342-1-1-7-B
YRF205/LANGI
YC 98086-0-4-B
L202/DOONGARA
YC 99247-0-0-17-B
BBL//M9/P///YRL30/4/LANGI/INGA YC 99222-0-0-84-B
YRL101/4/YRL39///213D.25/YR83//M7/IRR.INGA
YC 00153-0-0-25-B
M103/DOONGARA DH1
YC 03370DH-24
YRB90 V31/YRL34
YC 90041J-1-24-B
I/M9(5)/3/M101/73//P(2)/4/I/5/YRL101 YC 99186-0-0-17-S-6
LANGI/IR65600.27.1.2.2.2
YC 97182A-0-2-4-B
GULFMONT//YRL39/IR65597-134-2-2
YC 1080-0-0-24-B
YRL39/IR65587-134-2-2//YRL113 YC 01084-0-0-42-B
71048.166//R/IR36///I/M9(5)/P
YC 92164-3-B
LANGI/IR66167-27-5-1-6
YC 97181-0-28-10-B
L202///BASMATI 370/PELDE//BASMATI 370
YC 01102-0-0-32-B
L203
YU85/001
Appendix 6. Name and characteristics of SNPs genotyped in the rice population.
No | Gene | SNP ID* | Coordinates on gDNA | Expected SNP | SNP Assayed† | Association with Physiochemical traits | Status
1 | AGPS2b | TBGU388647 | 233 | G/T | G/G | N/A | No polymorphism
2 | AGPS2b | TBGI050742 | 1507 | T/C | T/T | N/A | No polymorphism
3 | SPHOL | TBGU168031 | 2501 | G/T | G/G | N/A | No polymorphism
4 | SPHOL | TBGU168032 | 2920 | C/T | C/C | N/A | No polymorphism
5 | SPHOL | TBGU168027 | 1001 | C/A | C/C | N/A | No polymorphism
6 | SPHOL | TBGU168024 | 176 | G/T | G/G | N/A | No polymorphism
7 | SPHOL | TBGU168039 | 5514 | G/T | G/G | N/A | No polymorphism
8 | GBSSI | WAXYEXIN1 | 246 | T/G | T/G | P1, BD, FV, SB, MT, AC, PN, Dif | Highly associated
9 | GBSSI | WAXYEX6 | 2494 | A/C | A/C | SB, BD, MT, AC | Highly associated
10 | GBSSI | WAXYEX10 | 3486 | C/T | C/T | T1, FV, SB, MT, AC, PN, Dif | Highly associated
11 | GBSSII | GBSSII_GA_1638 | 1638 | G/A | G/A | PT, GT | Low-Medium association
12 | SSI | TBGU272768 | 5153 | T/C | T/C | FV, SBV, MT | Low-Medium association
13 | SSIIa | SSIIa_GA_Ref631 | 631 | G/T | G/T | N/A | No association
14 | SSIIa | ALKSSIIA4 | 4827-4828 | GC/TT | GC/TT | BDV, SB, PKT, PT, GT, CHK | Highly associated
15 | SSIIb | TBGU116115 | 3416 | A/G | A/A | N/A | No polymorphism
16 | SSIIb | TBGU116120 | 3948 | G/C | G/G | N/A | No polymorphism
17 | SSIIb | TBGU116121 | 3979 | T/C | T/T | N/A | No polymorphism
18 | SSIIb | TBGU116109 | 330 | G/A | A/A | N/A | No polymorphism
19 | SSIIb | TBGU116119 | 3946 | C/T | C/C | N/A | No polymorphism
20 | SSIIb | TBGU116116 | 3487 | T/G | T/T | N/A | No polymorphism
21 | SSIIIa | GA_Ref1058 | 1058 | T/A | T/A | PT, MT | Low-Medium association
22 | SSIIIa | GA_Ref1680 | 1680 | G/A | G/A | SBV, PT, MT, AC, PN, Dif, GT | Low associated
23 | SSIIIa | GA_Ref3136 | 3136 | G/A | G/A | N/A | No association
24 | SSIIIa | GA_Ref3391 | 3391 | T/A | T/A | N/A | No association
25 | SSIIIa | GA_Ref3559 | 3559 | T/A | T/A | CHK | Low association
26 | SSIIIa | GA_Ref4384 | 4384 | G/A | G/A | N/A | No association
27 | SSIIIa | GA_Ref1379 | 1379 | A/C | A/C | FV, SBV, PT, MT, AC, PN, Dif | Low-Medium association
28 | SSIIIa | GA_Ref1708 | 1708 | G/A | G/A | MT, AC, PN, Dif, GT | Low-Medium association
29 | SSIIIa | GA_Ref3274 | 3274 | G/A | G/A | N/A | No association
30 | SSIIIa | GA_Ref6242 | 6242 | T/C | T/C | N/A | No association
31 | SSIIIa | GA_Ref1457 | 1457 | A/C | A/C | N/A | No association
32 | SSIIIa | GA_Ref1615 | 1615 | C/T | C/T | N/A | No association
33 | SSIIIa | GA_Ref1834 | 1834 | C/T | C/T | N/A | No association
34 | SSIIIa | GA_Ref2758 | 2758 | G/A | G/A | N/A | No association
35 | SSIIIa | GA_Ref1722ER | 1722 | G/A | G/A | FV, SBV, PT, MT, AC, PN, Dif, GT | Low-Medium association
36 | SSIIIa | GA_Ref2488 | 2488 | C/T | C/T | N/A | No association
37 | SSIIIa | GA_Ref3073 | 3073 | G/A | G/A | N/A | No association
38 | SSIIIa | GA_Ref1357 | 1357 | G/A | G/A | MT | No association
39 | SSIIIa | GA_Ref2080 | 2080 | C/T | C/T | N/A | No association
40 | SSIIIa | GA_Ref3481 | 3481 | G/A | G/A | N/A | No association
41 | SSIIIa | GA_Ref5466 | 5466 | G/A | G/A | FV, SBV, PT, MT, AC, PN, Dif | Low-Medium association
42 | SSIIIa | GA_Ref10761 | 10761 | C/T | C/T | PT | Low association
43 | SSIIIb | GA_Ref1315 | 1315 | T/C | T/C | PT | Medium association
44 | SSIIIb | GA_Ref4543 | 4543 | C/A | C/A | PT | Medium association
45 | SSIIIb | GA_Ref5451 | 5451 | T/C | T/C | PT | Medium association
46 | SSIIIb | GA_Ref7232 | 3232 | T/G | T/G | PT | Medium-High association
47 | SSIIIb | GA_Ref7255ER | 7255 | C/A | C/A | PKV | Medium association
48 | SSIIIb | GA_Ref7437 | 7437 | A/C | A/C | PT, Dif | Low-Medium association
49 | SSIVa | GA_Ref4048 | 4048 | C/T | C/T | PT, GT | Low-Medium association
50 | SSIVa | GA_Ref7160 | 7160 | A/G | A/G | PKT, PT, AC, PN, GT | Low-Medium association
51 | SSIVa | GA_Ref7506 | 7506 | A/T | A/T | PT, GT | Low-Medium association
52 | SSIVa | GA_Ref7823 | 7823 | T/C | T/C | PT, GT | Low-Medium association
53 | SSIVa | GA_Ref8383 | 8383 | C/A | C/A | PT, GT | Medium association
54 | SSIVb | TBGU260749 | 5090 | G/C | G/G | N/A | No polymorphism
55 | SSIVb | TBGU260765 | 9525 | G/A | G/G | N/A | No polymorphism
56 | BEI | GA_Ref1558 | 1558 | C/T | C/T | PV, BDV, FV, SBV, PT, MT, AC, PN, Dif | Low-Medium association
57 | BEIIa | GA_Ref3266 | 3266 | T/G | T/G | N/A | No association
58 | BEIIb | GA_Ref9035 | 9035 | C/T | C/T | N/A | No association
59 | BEIIb | GA_Ref10068 | 10068 | C/A | C/A | N/A | No association
60 | ISA1 | TBGU362347 | 1748 | G/A | G/G | N/A | No polymorphism
61 | ISA1 | TBGU362346 | 1746 | C/G | C/C | N/A | No polymorphism
62 | ISA2 | Iso2_GA_Ref960 | 960 | T/C | T/C | BDV, PT, CHK | Low association
63 | ISA2 | Iso2_GA_Ref1712 | 1712 | C/A | C/A | BDV, PT, CHK | Low association
64 | Pullulanase | TBGU185983 | 1938 | G/A | G/A | PT, GT | Low association
65 | Pullulanase | TBGU185989 | 2380 | T/C | T/C | CHK | Low association
*SNP identifiers can be found in Kharabian-Masouleh et al., 2011 (IDs starting with a GA code) or in the OryzaSNP MSU database (http://oryzasnp.plantbiology.msu.edu/) (IDs starting with TBG or TBU codes).
†A homozygous SNP call means no polymorphism in the corresponding allele.
MT=Martin test (retrogradation), PN=Predicted N, Dif=Difference, CHK=Chalkiness (%).
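The rule in footnote † can be sketched in a few lines of Python. This is only an illustration of how the "SNP Assayed" column relates to the "No polymorphism" status; the function name `is_polymorphic` and the three sample rows (copied from Appendix 6) are chosen here for the example and are not part of the genotyping pipeline used in the thesis.

```python
def is_polymorphic(assayed_call: str) -> bool:
    """Return True when an assayed call such as 'T/G' carries two different alleles.

    A call such as 'G/G' (both sides identical) is treated as homozygous,
    i.e. no polymorphism was observed in the population for that SNP.
    """
    left, right = assayed_call.split("/")
    return left != right


# (SNP ID, expected SNP, SNP assayed) taken from rows 1, 8 and 14 of Appendix 6.
sample_rows = [
    ("TBGU388647", "G/T", "G/G"),
    ("WAXYEXIN1", "T/G", "T/G"),
    ("ALKSSIIA4", "GC/TT", "GC/TT"),
]

for snp_id, expected, assayed in sample_rows:
    status = "polymorphic" if is_polymorphic(assayed) else "no polymorphism"
    print(f"{snp_id}: expected {expected}, assayed {assayed} -> {status}")
```

Applied to the table, this reproduces the "No polymorphism" entries: every SNP whose assayed call has identical alleles on both sides of the slash carries that status.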
Appendix 7. Results of the association study between 13 physiochemical traits and SNPs of 18 genes. The most important columns are F-test and R2_Marker.
Trait | Locus/SNP | df_Marker | F-test | p-value | #perm_Marker | p-perm_Marker | p-adjusted value | df_Model | df_Error | MS_Error | R2_Model | R2_Marker
Section 1. AGPS2b
No Functional Polymorphism found in this gene
Section 2. SPHOL
No Functional Polymorphism found in this gene
Section 3. GBSSI
Peak1 | WAXYEXIN1 | 2 | 34.346 | 8.87E-14 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 230 | 38700.419 | 0.23 | 0.23
Trough1 | WAXYEX10 | 1 | 36.9498 | 4.99E-09 | 1000 | 9.99E-04 | 9.99E-04 | 1 | 230 | 16578.651 | 0.1384 | 0.1384
Breakdown | WAXYEXIN1 | 2 | 35.1893 | 4.64E-14 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 230 | 33032.8292 | 0.2343 | 0.2343
Breakdown | WAXYEX10 | 1 | 18.9223 | 2.05E-05 | 1000 | 0.002 | 9.99E-04 | 1 | 230 | 39440.6237 | 0.076 | 0.076
Final Viscosity | WAXYEXIN1 | 2 | 15.0534 | 7.18E-07 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 230 | 36055.0683 | 0.1157 | 0.1157
Final Viscosity | WAXYEX10 | 1 | 106.0684 | 1.06E-20 | 1000 | 9.99E-04 | 9.99E-04 | 1 | 230 | 27888.2408 | 0.3156 | 0.3156
Setback | WAXYEXIN1 | 2 | 76.2739 | 3.89E-26 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 230 | 41231.9601 | 0.3988 | 0.3988
Setback | WAXYEX10 | 1 | 59.8068 | 3.26E-13 | 1000 | 9.99E-04 | 9.99E-04 | 1 | 230 | 54241.7925 | 0.2064 | 0.2064
Martin_N | WAXYEXIN1 | 2 | 223.2942 | 1.29E-54 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 230 | 0.041 | 0.6601 | 0.6601
Martin_N | WAXYEX10 | 1 | 147.7825 | 1.37E-26 | 1000 | 9.99E-04 | 9.99E-04 | 1 | 230 | 0.0733 | 0.3912 | 0.3912
Martin_N | WAXYEX6 | 1 | 16.8014 | 5.76E-05 | 1000 | 0.028 | 9.99E-04 | 1 | 230 | 0.0974 | 0.0681 | 0.0681
AC_percent | WAXYEXIN1 | 2 | 121.5295 | 9.63E-37 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 230 | 1.2099 | 0.5138 | 0.5138
AC_percent | WAXYEX10 | 1 | 44.0661 | 2.26E-10 | 1000 | 9.99E-04 | 9.99E-04 | 1 | 230 | 2.0869 | 0.1608 | 0.1608
AC_percent | WAXYEX6 | 1 | 16.2252 | 7.64E-05 | 1000 | 0.026 | 9.99E-04 | 1 | 230 | 2.2559 | 0.0659 | 0.0659
predicted_N | WAXYEXIN1 | 2 | 121.5429 | 9.56E-37 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 230 | 0.0199 | 0.5138 | 0.5138
predicted_N | WAXYEX10 | 1 | 43.967 | 2.36E-10 | 1000 | 9.99E-04 | 9.99E-04 | 1 | 230 | 0.0344 | 0.1605 | 0.1605
predicted_N | WAXYEX6 | 1 | 16.3841 | 7.07E-05 | 1000 | 0.023 | 9.99E-04 | 1 | 230 | 0.0372 | 0.0665 | 0.0665
diff | WAXYEXIN1 | 2 | 54.612 | 3.92E-20 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 230 | 0.042 | 0.322 | 0.322
diff | WAXYEX10 | 1 | 97.1222 | 2.44E-19 | 1000 | 9.99E-04 | 9.99E-04 | 1 | 230 | 0.0436 | 0.2969 | 0.2969
Section 4. GBSSII
Past_temp | GBSSII_GA_Ref1638 | 2 | 27.8519 | 1.67E-11 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 219 | 15.5428 | 0.2028 | 0.2028
GT | GBSSII_GA_Ref1638 | 2 | 9.7254 | 9.57E-05 | 1000 | 0.004 | 9.99E-04 | 2 | 188 | 48.9948 | 0.0938 | 0.0938
Section 5. SSI
Trough1 | SSI_TBGU272768_5153 | 1 | 14.2713 | 2.02E-04 | 1000 | 9.99E-04 | 9.99E-04 | 1 | 227 | 16583.9218 | 0.0592 | 0.0592
FinalVisc | SSI_TBGU272768_5153 | 1 | 43.6138 | 2.80E-10 | 1000 | 9.99E-04 | 9.99E-04 | 1 | 227 | 28064.635 | 0.1612 | 0.1612
Setback | SSI_TBGU272768_5153 | 1 | 28.8805 | 1.91E-07 | 1000 | 9.99E-04 | 9.99E-04 | 1 | 227 | 54692.8998 | 0.1129 | 0.1129
Martin_N | SSI_TBGU272768_5153 | 1 | 45.7145 | 1.14E-10 | 1000 | 0.011 | 9.99E-04 | 1 | 227 | 0.0718 | 0.1676 | 0.1676
AC_percent | SSI_TBGU272768_5153 | 1 | 20.5891 | 9.23E-06 | 1000 | 0.009 | 9.99E-04 | 1 | 227 | 2.1071 | 0.0832 | 0.0832
predicted_N | SSI_TBGU272768_5153 | 1 | 20.4244 | 9.98E-06 | 1000 | 0.012 | 9.99E-04 | 1 | 227 | 0.0347 | 0.0825 | 0.0825
diff | SSI_TBGU272768_5153 | 1 | 22.1635 | 4.36E-06 | 1000 | 0.004 | 9.99E-04 | 1 | 227 | 0.042 | 0.089 | 0.089
Section 6. SSIIa
Breakdown | ALKSSIIA4 | 2 | 22.4536 | 1.32E-09 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 222 | 36814.9988 | 0.1682 | 0.1682
Setback | ALKSSIIA4 | 2 | 5.7025 | 0.0038 | 1000 | 0.007 | 0.0659 | 2 | 222 | 67276.7238 | 0.0489 | 0.0489
PeakTime | ALKSSIIA4 | 2 | 53.0867 | 1.44E-19 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 222 | 0.0136 | 0.3235 | 0.3235
Past_temp | ALKSSIIA4 | 2 | 199.6523 | 2.45E-50 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 222 | 6.9583 | 0.6427 | 0.6427
GT | ALKSSIIA4 | 2 | 32.806 | 5.55E-13 | 1000 | 0.004 | 9.99E-04 | 2 | 192 | 40.2842 | 0.2547 | 0.2547
Chalk% | ALKSSIIA4 | 2 | 8.9273 | 1.87E-04 | 1000 | 0.005 | 9.99E-04 | 2 | 222 | 91.0399 | 0.0744 | 0.0744
Section 7. SSIIb
No Functional Polymorphism found in this gene
Section 8. SSIIIa
FinalVisc | SSIIIa_GA_Ref1379 | 2 | 9.0413 | 1.68E-04 | 1000 | 0.013 | 9.99E-04 | 2 | 222 | 37968.6918 | 0.0753 | 0.0753
FinalVisc | SSIIIa_GA_Ref1722ER | 2 | 8.8028 | 2.08E-04 | 1000 | 0.006 | 9.99E-04 | 2 | 226 | 38124.2411 | 0.0723 | 0.0723
FinalVisc | SSIIIa_GA_Ref5466 | 2 | 8.9423 | 1.83E-04 | 1000 | 0.01 | 9.99E-04 | 2 | 225 | 38444.9081 | 0.0736 | 0.0736
Setback | SSIIIa_GA_Ref1680 | 2 | 7.8821 | 4.91E-04 | 1000 | 0.003 | 9.99E-04 | 2 | 225 | 57537.4747 | 0.0655 | 0.0655
Setback | SSIIIa_GA_Ref1379 | 2 | 11.6269 | 1.58E-05 | 1000 | 0.002 | 9.99E-04 | 2 | 222 | 62943.584 | 0.0948 | 0.0948
Setback | SSIIIa_GA_Ref1722ER | 2 | 10.1037 | 6.27E-05 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 226 | 63363.2621 | 0.0821 | 0.0821
Setback | SSIIIa_GA_Ref5466 | 2 | 9.0543 | 1.65E-04 | 1000 | 0.015 | 9.99E-04 | 2 | 225 | 64066.5547 | 0.0745 | 0.0745
Past_temp | SSIIIa_GA_Ref1058 | 2 | 8.7158 | 2.26E-04 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 224 | 17.9321 | 0.0722 | 0.0722
Past_temp | SSIIIa_GA_Ref1680 | 2 | 7.4574 | 7.31E-04 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 225 | 18.1252 | 0.0622 | 0.0622
Past_temp | SSIIIa_GA_Ref1379 | 2 | 7.9273 | 4.73E-04 | 1000 | 0.002 | 9.99E-04 | 2 | 222 | 17.9153 | 0.0667 | 0.0667
Past_temp | SSIIIa_GA_Ref1722ER | 2 | 9.3315 | 1.28E-04 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 226 | 17.9836 | 0.0763 | 0.0763
Past_temp | SSIIIa_GA_Ref10761 | 2 | 7.2026 | 9.35E-04 | 1000 | 0.002 | 9.99E-04 | 2 | 218 | 18.0475 | 0.062 | 0.062
Past_temp | SSIIIa_GA_Ref5466 | 2 | 8.756 | 2.18E-04 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 225 | 17.7556 | 0.0722 | 0.0722
Martin_N | SSIIIa_GA_Ref1058 | 2 | 13.2478 | 3.65E-06 | 1000 | 0.006 | 9.99E-04 | 2 | 224 | 0.0778 | 0.1058 | 0.1058
Martin_N | SSIIIa_GA_Ref1680 | 2 | 20.8545 | 4.91E-09 | 1000 | 0.002 | 9.99E-04 | 2 | 225 | 0.0727 | 0.1564 | 0.1564
Martin_N | SSIIIa_GA_Ref1379 | 2 | 27.7893 | 1.70E-11 | 1000 | 0.002 | 9.99E-04 | 2 | 222 | 0.0994 | 0.2002 | 0.2002
Martin_N | SSIIIa_GA_Ref1708 | 2 | 16.5211 | 2.06E-07 | 1000 | 0.002 | 9.99E-04 | 2 | 221 | 0.0766 | 0.1301 | 0.1301
Martin_N | SSIIIa_GA_Ref1722ER | 2 | 20.6652 | 5.73E-09 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 226 | 0.1035 | 0.1546 | 0.1546
Martin_N | SSIIIa_GA_Ref1357 | 2 | 7.6136 | 6.33E-04 | 1000 | 0.03 | 9.99E-04 | 2 | 223 | 0.1158 | 0.0639 | 0.0639
Martin_N | SSIIIa_GA_Ref5466 | 2 | 20.4182 | 7.10E-09 | 1000 | 0.003 | 9.99E-04 | 2 | 225 | 0.104 | 0.1536 | 0.1536
AC_percent | SSIIIa_GA_Ref1680 | 2 | 10.3167 | 5.17E-05 | 1000 | 0.002 | 9.99E-04 | 2 | 225 | 2.1222 | 0.084 | 0.084
AC_percent | SSIIIa_GA_Ref1379 | 2 | 14.2201 | 1.55E-06 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 222 | 2.2508 | 0.1136 | 0.1136
AC_percent | SSIIIa_GA_Ref1708 | 2 | 9.3351 | 1.28E-04 | 1000 | 0.003 | 9.99E-04 | 2 | 221 | 2.1493 | 0.0779 | 0.0779
AC_percent | SSIIIa_GA_Ref1722ER | 2 | 10.6866 | 3.68E-05 | 1000 | 0.002 | 9.99E-04 | 2 | 226 | 2.298 | 0.0864 | 0.0864
AC_percent | SSIIIa_GA_Ref5466 | 2 | 11.1556 | 2.40E-05 | 1000 | 0.006 | 9.99E-04 | 2 | 225 | 2.2906 | 0.0902 | 0.0902
predicted_N | SSIIIa_GA_Ref1680 | 2 | 10.2716 | 5.38E-05 | 1000 | 0.003 | 9.99E-04 | 2 | 225 | 0.035 | 0.0837 | 0.0837
predicted_N | SSIIIa_GA_Ref1379 | 2 | 14.2099 | 1.56E-06 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 222 | 0.0371 | 0.1135 | 0.1135
predicted_N | SSIIIa_GA_Ref1708 | 2 | 9.3091 | 1.31E-04 | 1000 | 0.007 | 9.99E-04 | 2 | 221 | 0.0354 | 0.0777 | 0.0777
predicted_N | SSIIIa_GA_Ref1722ER | 2 | 10.6615 | 3.76E-05 | 1000 | 0.004 | 9.99E-04 | 2 | 226 | 0.0379 | 0.0862 | 0.0862
predicted_N | SSIIIa_GA_Ref5466 | 2 | 11.1281 | 2.46E-05 | 1000 | 0.006 | 9.99E-04 | 2 | 225 | 0.0378 | 0.09 | 0.09
diff | SSIIIa_GA_Ref1680 | 2 | 9.6109 | 9.88E-05 | 1000 | 0.005 | 9.99E-04 | 2 | 225 | 0.0423 | 0.0787 | 0.0787
diff | SSIIIa_GA_Ref1379 | 2 | 16.6551 | 1.82E-07 | 1000 | 0.007 | 9.99E-04 | 2 | 222 | 0.0552 | 0.1305 | 0.1305
diff | SSIIIa_GA_Ref1708 | 2 | 7.166 | 9.65E-04 | 1000 | 0.012 | 9.99E-04 | 2 | 221 | 0.044 | 0.0609 | 0.0609
diff | SSIIIa_GA_Ref1722ER | 2 | 12.0666 | 1.05E-05 | 1000 | 0.003 | 9.99E-04 | 2 | 226 | 0.0566 | 0.0965 | 0.0965
diff | SSIIIa_GA_Ref5466 | 2 | 12.3189 | 8.38E-06 | 1000 | 0.005 | 9.99E-04 | 2 | 225 | 0.0566 | 0.0987 | 0.0987
GT | SSIIIa_GA_Ref1680 | 2 | 10.0271 | 7.21E-05 | 1000 | 0.024 | 9.99E-04 | 2 | 192 | 48.6731 | 0.0946 | 0.0946
GT | SSIIIa_GA_Ref1708 | 2 | 30.2791 | 3.99E-12 | 1000 | 0.006 | 9.99E-04 | 2 | 188 | 40.6703 | 0.2436 | 0.2436
GT | SSIIIa_GA_Ref1722ER | 2 | 15.2535 | 7.07E-07 | 1000 | 0.013 | 9.99E-04 | 2 | 193 | 46.254 | 0.1365 | 0.1365
Chalk% | SSIIIa_GA_Ref3559 | 2 | 8.9878 | 1.83E-04 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 201 | 95.3296 | 0.0821 | 0.0821
Section 9. SSIIIb
Peak Viscosity | SSIIIb_GA_Ref7255ER | 2 | 7.7442 | 5.64E-04 | 1000 | 0.002 | 9.99E-04 | 2 | 217 | 47808.4024 | 0.0666 | 0.0666
Past_temp | SSIIIb_GA_Ref4543 | 2 | 21.3553 | 7.21E-09 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 147 | 15.5668 | 0.2251 | 0.2251
Past_temp | SSIIIb_GA_Ref5451 | 2 | 23.0673 | 8.32E-10 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 216 | 15.4824 | 0.176 | 0.176
Past_temp | SSIIIb_GA_Ref1315 | 2 | 25.0653 | 1.55E-10 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 221 | 15.4957 | 0.1849 | 0.1849
Past_temp | SSIIIb_GA_Ref7232 | 1 | 41.4018 | 5.87E-09 | 1000 | 9.99E-04 | 9.99E-04 | 1 | 90 | 13.8698 | 0.3151 | 0.3151
Past_temp | SSIIIb_GA_Ref7255ER | 2 | 29.1937 | 5.91E-12 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 217 | 15.0332 | 0.212 | 0.212
Past_temp | SSIIIb_GA_Ref7437 | 2 | 21.0809 | 4.03E-09 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 226 | 16.2001 | 0.1572 | 0.1572
diff | SSIIIb_GA_Ref7437 | 2 | 7.338 | 8.17E-04 | 1000 | 0.01 | 9.99E-04 | 2 | 226 | 0.0591 | 0.061 | 0.061
Section 10. SSIVa
PeakTime | SSIva_GA_Ref7160 | 2 | 10.7899 | 3.35E-05 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 225 | 0.0183 | 0.0875 | 0.0875
Past_temp | SSIva_GA_Ref4048 | 2 | 27.6864 | 1.82E-11 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 223 | 15.3704 | 0.1989 | 0.1989
Past_temp | SSIva_GA_Ref7160 | 2 | 39.5053 | 1.97E-15 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 225 | 14.1948 | 0.2599 | 0.2599
Past_temp | SSIva_GA_Ref7823 | 2 | 30.2856 | 2.41E-12 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 220 | 15.0459 | 0.2159 | 0.2159
Past_temp | SSIva_GA_Ref8383 | 2 | 30.8007 | 1.73E-12 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 215 | 15.2886 | 0.2227 | 0.2227
Past_temp | SSIva_GA_Ref7506 | 2 | 29.3874 | 4.41E-12 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 228 | 15.3101 | 0.205 | 0.205
AC_percent | SSIva_GA_Ref7160 | 2 | 9.1222 | 1.55E-04 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 225 | 2.3238 | 0.075 | 0.075
predicted_N | SSIva_GA_Ref7160 | 2 | 9.077 | 1.62E-04 | 1000 | 0.002 | 9.99E-04 | 2 | 225 | 0.0383 | 0.0747 | 0.0747
GT | SSIva_GA_Ref4048 | 2 | 8.5371 | 2.82E-04 | 1000 | 0.019 | 9.99E-04 | 2 | 190 | 49.075 | 0.0825 | 0.0825
GT | SSIva_GA_Ref7160 | 2 | 19.7873 | 1.54E-08 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 192 | 44.2873 | 0.1709 | 0.1709
GT | SSIva_GA_Ref7823 | 2 | 10.209 | 6.18E-05 | 1000 | 0.015 | 9.99E-04 | 2 | 188 | 48.1588 | 0.098 | 0.098
GT | SSIva_GA_Ref8383 | 2 | 10.4426 | 5.05E-05 | 1000 | 0.01 | 9.99E-04 | 2 | 185 | 48.9202 | 0.1014 | 0.1014
GT | SSIva_GA_Ref7506 | 2 | 10.6137 | 4.21E-05 | 1000 | 0.012 | 9.99E-04 | 2 | 195 | 48.0344 | 0.0982 | 0.0982
Section 11. SSIVb
SSIvb_TBGU260749_5090, SSIvb_TBGU260765_9525: No polymorphism detected
Section 12. BEI
Peak Viscosity | BEI_GA_Ref1558 | 2 | 9.5546 | 1.05E-04 | 1000 | 0.006 | 9.99E-04 | 2 | 221 | 46937.7885 | 0.0796 | 0.0796
Breakdown Viscosity | BEI_GA_Ref1558 | 2 | 11.2003 | 2.33E-05 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 221 | 39184.4825 | 0.092 | 0.092
FinalViscosity | BEI_GA_Ref1558 | 2 | 13.0129 | 4.54E-06 | 1000 | 0.013 | 9.99E-04 | 2 | 221 | 37058.8842 | 0.1054 | 0.1054
Setback Viscosity | BEI_GA_Ref1558 | 2 | 32.1812 | 5.42E-13 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 221 | 53674.1793 | 0.2255 | 0.2255
Past_temp | BEI_GA_Ref1558 | 2 | 8.4131 | 3.01E-04 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 221 | 18.1687 | 0.0708 | 0.0708
Martin_N | BEI_GA_Ref1558 | 2 | 34.5608 | 8.71E-14 | 1000 | 0.004 | 9.99E-04 | 2 | 221 | 0.0951 | 0.2383 | 0.2383
AC_percent | BEI_GA_Ref1558 | 2 | 38.8652 | 3.44E-15 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 221 | 1.8804 | 0.2602 | 0.2602
predicted_N | BEI_GA_Ref1558 | 2 | 39.1031 | 2.89E-15 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 221 | 0.031 | 0.2614 | 0.2614
diff | BEI_GA_Ref1558 | 2 | 8.6861 | 2.34E-04 | 1000 | 0.005 | 9.99E-04 | 2 | 221 | 0.0592 | 0.0729 | 0.0729
Section 13. BEIIa
BEIIa_GA_Ref3266: No significant association with starch properties was observed for this SNP
Section 14. BEIIb
BEIIb_GA_Ref9035, BEIIb_GA_Ref10068: No significant association with starch properties was observed for this SNP
Section 15. Iso1
TBGU362347_1748ER, TBGU362346_1746EF: No polymorphism detected
Section 16. Iso2
Breakdown | Iso2_GA_Ref1712 | 2 | 8.2378 | 3.66E-04 | 1000 | 0.002 | 9.99E-04 | 2 | 198 | 39338.6699 | 0.0768 | 0.0768
Breakdown | Iso2_GA_Ref960 | 2 | 9.0028 | 1.75E-04 | 1000 | 0.002 | 9.99E-04 | 2 | 219 | 38822.3178 | 0.076 | 0.076
Past_temp | Iso2_GA_Ref1712 | 2 | 7.8355 | 5.31E-04 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 198 | 17.8716 | 0.0733 | 0.0733
Past_temp | Iso2_GA_Ref960 | 2 | 7.8341 | 5.18E-04 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 219 | 18.0195 | 0.0668 | 0.0668
Chalk% | Iso2_GA_Ref1712 | 2 | 7.2855 | 8.85E-04 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 198 | 100.3691 | 0.0685 | 0.0685
Chalk% | Iso2_GA_Ref960 | 2 | 8.2391 | 3.55E-04 | 1000 | 0.002 | 9.99E-04 | 2 | 219 | 96.6328 | 0.07 | 0.07
Section 17. Pullulanase
Past_temp | Pullu_TBGU185983_1938 | 2 | 23.5989 | 5.05E-10 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 223 | 15.7171 | 0.1747 | 0.1747
GT | Pullu_TBGU185983_1938 | 2 | 19.1496 | 2.63E-08 | 1000 | 9.99E-04 | 9.99E-04 | 2 | 191 | 23.0156 | 0.167 | 0.167
Chalk% | Pullu_TBGU185989_2380 | 2 | 7.5266 | 6.96E-04 | 1000 | 0.002 | 9.99E-04 | 2 | 211 | 96.2102 | 0.0666 | 0.0666
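Because Appendix 7 singles out F-test and R2_Marker as the key columns, it may help to see how the two relate. The sketch below assumes a model containing only the marker term, which is consistent with R2_Model equalling R2_Marker in every row; under that assumption R2 = (df_Marker × F) / (df_Marker × F + df_Error). The second helper assumes the permutation p-value is computed as (b + 1) / (n + 1), with b the number of permuted statistics at least as large as the observed one; this convention is not stated in the thesis and is used here only to show why 9.99E-04 would be the smallest attainable value with 1000 permutations. The function names are illustrative.

```python
def r2_from_f(f_stat: float, df_marker: int, df_error: int) -> float:
    """Recover R2 from an F statistic for a marker-only model.

    F = (SS_marker / df_marker) / (SS_error / df_error), so
    SS_marker / SS_error = df_marker * F / df_error and
    R2 = SS_marker / (SS_marker + SS_error).
    """
    scaled = df_marker * f_stat
    return scaled / (scaled + df_error)


def min_permutation_p(n_perm: int) -> float:
    """Smallest p-value under the assumed (b + 1) / (n + 1) convention (b = 0)."""
    return 1.0 / (n_perm + 1)


# (trait, locus, df_Marker, F-test, df_Error, reported R2_Marker) from Appendix 7.
examples = [
    ("Peak1", "WAXYEXIN1", 2, 34.346, 230, 0.23),
    ("Trough1", "WAXYEX10", 1, 36.9498, 230, 0.1384),
    ("Past_temp", "ALKSSIIA4", 2, 199.6523, 222, 0.6427),
]

for trait, locus, df_m, f_stat, df_e, reported in examples:
    computed = r2_from_f(f_stat, df_m, df_e)
    print(f"{trait:10s} {locus:10s} computed R2 = {computed:.4f}, reported = {reported}")

print(f"floor for 1000 permutations: {min_permutation_p(1000):.2E}")
```

The same relation can be used as a row-by-row sanity check when transcribing the table: a mismatch between the computed and reported R2 would point to a misaligned F-test, df_Error or R2 value.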
Appendix 8: The linkage map of 17 starch-related genes, showing the approximate location of each gene on the chromosomes (Chr). The red lines show the exact location of each gene on the chromosomes.
[Figure: linkage map. Genes labelled: SSIVa, SSIIb, BEIIb, SPHOL, Pullulanase, BEIIa, SSIIIb, ISA2, GBSSI, SSI, SSIIa, BEI, GBSSII, GPT1, SSIIIa, ISA1, AGPS2b]