MBE Advance Access published July 25, 2007

The nonsynonymous/synonymous substitution rate ratio versus the radical/conservative replacement rate ratio in the evolution of mammalian genes

Kousuke Hanada1,2, Shin-Han Shiu2 and Wen-Hsiung Li1*

1. Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
2. Department of Plant Biology, Michigan State University, East Lansing, MI 48824

Running head: Ka/Ks ratio vs radical/conservative replacement ratio

Key words: positive selection, radical substitution, conservative substitution, classification of amino acids, development.

*Corresponding author.
Wen-Hsiung Li, Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, IL 60637, USA.
Tel: +1-773-702-3104. Fax: +1-773-702-9740. E-mail: [email protected]

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Abstract
There are two common ways to infer selection pressures in the evolution of protein-coding genes: the ratio of the nonsynonymous to the synonymous substitution rate (KA/KS) and the ratio of the radical to the conservative amino acid replacement rate (KR/KC). Because the KR/KC ratio depends on how radical and conservative changes are defined by a classification of amino acids, we developed an amino acid classification that maximizes the correlation between KA/KS and KR/KC. An analysis of 3,375 orthologous gene groups among five mammalian species shows that our classification gives a significantly higher correlation coefficient between the two ratios than existing classifications do. However, many orthologous gene groups have a low KA/KS but a high KR/KC ratio. Examining the functions of these genes, we found an overrepresentation of functional categories related to development. To determine whether this overrepresentation is stage-specific, we examined the expression patterns of these genes at different developmental stages of the mouse. Interestingly, these genes are highly expressed in the early middle stage of development (Blastocyst to Amnion). Developmental genes are commonly thought to be conservative in evolution, but some molecular changes at developmental stages should have contributed to the morphological divergence of adult mammals. We therefore propose that the relaxed selection pressures indicated by KR/KC but not by KA/KS in the early middle stage of development may be important for the morphological divergence of mammals at the adult stage, while purifying selection, as detected by KA/KS, also acts at this stage.

Introduction
Selection pressure on protein-coding sequences is commonly estimated by the ratio of the nonsynonymous substitution rate (KA) to the synonymous substitution rate (KS) (Li and Gojobori 1983; Hughes and Nei 1988). If the KA/KS ratio is higher than 1, positive selection is inferred to have occurred during the evolution of the sequence. The ratio of the radical replacement rate (KR) to the conservative replacement rate (KC) has also been used to detect positive selection (Hughes, Ota, and Nei 1990). The KR/KC ratio is useful for examining selection pressure between distantly related protein-coding sequences, for which the KA/KS ratio cannot be estimated accurately because KS is saturated (Gojobori 1983; Smith and Smith 1996).
Given these two ways of inferring selection pressure on a sequence, an open question is whether they lead to the same conclusions. Using 47 mammalian and 25 Drosophila genes, Zhang (2000) and Smith (2003) found that KA/KS is correlated with KR/KC when radical and conservative changes are defined by an amino acid classification based on polarity and volume. However, several amino acid classifications exist, and it is not known which one yields a KR/KC measure that best correlates with the KA/KS ratio. Therefore, the general degree of correlation between the two ratios remains unknown.
In the present study, we searched for the amino acid classification that gives the best correlation between the two ratios. Such a classification is useful because a KR/KC ratio based on it can identify genes, between distantly related protein-coding sequences, that are under selection pressures similar to those that the KA/KS ratio would indicate.
Another issue is that the two ratios are unlikely to be perfectly correlated even under the classification that maximizes their correlation. To address the differences between the selection pressures inferred by KA/KS and KR/KC in the evolution of mammalian genes, we examined the functions of genes showing different selection pressures under the two ratios, using Gene Ontology (GO) categories and expression data from a representative mammal, the mouse.

Materials & Methods

Construction of orthologous groups
cDNA data of five mammalian species were retrieved from the Ensembl database (www.ensembl.org): Homo sapiens (NCBI35.may), Pan troglodytes (CHIMP1.may), Mus musculus (NCBIM33.may), Rattus norvegicus (RGSC3.4.may) and Canis familiaris (BROADD1.may). Reciprocal best hits between every pair of species were identified with Blastp (Altschul et al. 1997). Sequences that were reciprocal best hits in all pairwise species comparisons (Fig. 1A) were considered an orthologous group among the five species. This procedure yielded 3,533 putative orthologous groups. To further verify these groups, a phylogenetic tree was constructed for each group from the protein sequence alignment of its members by the neighbor-joining (NJ) method (Saitou and Nei 1987; Thompson, Higgins, and Gibson 1994). When the tree topology differed from the species tree (Fig. 1B), the group was removed from the orthologous data, which reduced the total number of orthologous groups to 3,375. Across these orthologous groups, the number of nucleotide sites used has an interquartile range (25%-75%) of 477.0-1175.0 and a median of 756.0.
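The reciprocal-best-hit criterion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a hypothetical, already-parsed input `best_hit[(sp_a, sp_b)]` that maps each gene of species `sp_a` to its top Blastp hit in species `sp_b`. A candidate group is kept only when every pair of its members are mutual best hits; the subsequent NJ topology check is not included, and the function name `orthologous_groups` is illustrative.

```python
from itertools import combinations


def orthologous_groups(species, best_hit):
    """Return groups of genes that are reciprocal best hits for every species pair.

    species  -- list of species labels, e.g. ["human", "chimp", "mouse", "rat", "dog"]
    best_hit -- hypothetical dict: best_hit[(sp_a, sp_b)][gene_a] -> best hit of gene_a in sp_b
    """
    def is_rbh(sp_a, gene_a, sp_b, gene_b):
        # Mutual best hits: gene_a's best hit in sp_b is gene_b, and vice versa.
        return (best_hit[(sp_a, sp_b)].get(gene_a) == gene_b
                and best_hit[(sp_b, sp_a)].get(gene_b) == gene_a)

    first, others = species[0], species[1:]
    groups = []
    for gene in best_hit[(first, others[0])]:
        # Seed a candidate group with the best hits of `gene` in every other species.
        group = {first: gene}
        for sp in others:
            hit = best_hit[(first, sp)].get(gene)
            if hit is None:
                break
            group[sp] = hit
        else:
            # Keep the group only if ALL pairs (not just pairs with `first`) are mutual best hits.
            if all(is_rbh(a, group[a], b, group[b]) for a, b in combinations(species, 2)):
                groups.append(group)
    return groups
```

Checking every pair, rather than only pairs involving the first species, mirrors the requirement that sequences be reciprocal best hits "among all species combinations"; a gene whose mouse and dog hits are not each other's best hits is rejected even if both are best hits of the human gene.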
The orthologous gene data were constructed carefully to reduce errors in estimating nucleotide and amino acid substitutions: only segments aligned among all five species without any gaps were used for the calculation of the KA/KS and KR/KC ratios.
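A minimal sketch of the gap-free filtering step is shown below. It assumes, as an illustration only, that the cDNA alignment is held as a dict of equal-length strings in reading frame and that gaps are removed codon by codon (so that the reading frame is preserved for KA and KS estimation); the text states only that gapped segments were excluded. The function name is hypothetical.

```python
def gap_free_codons(alignment, gap="-"):
    """Keep only codon columns that contain no gap character in any sequence.

    alignment -- dict mapping sequence name -> aligned cDNA string
                 (equal lengths, reading frame starting at position 0).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    length = lengths.pop()
    if length % 3:
        raise ValueError("alignment length must be a multiple of 3")
    # Indices of codons (steps of 3) that are gap-free in every sequence.
    keep = [i for i in range(0, length, 3)
            if all(gap not in seq[i:i + 3] for seq in alignment.values())]
    return {name: "".join(seq[i:i + 3] for i in keep) for name, seq in alignment.items()}
```

For example, with `{"human": "ATG---CCCTAA", "mouse": "ATGAAACC-TGA"}` the second and third codons are dropped because at least one sequence has a gap in them, leaving `ATGTAA` and `ATGTGA`.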

Estimation of KA/KS and KR/KC in each orthologous gene set
A phylogenetic tree was reconstructed for each orthologous gene group by the NJ method (Saitou and Nei 1987). The ancestral sequence at each node of the tree was inferred by the maximum likelihood method (Yang, Kumar, and Nei 1995). The transition/transversion ratio was estimated for each orthologous group and then used to estimate KA and KS for all branches of the tree by the modified Nei-Gojobori method (Zhang, Rosenberg, and Nei 1998). The KA/KS ratio of each orthologous gene group was determined from the sums of KA and KS over all branches.
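Read literally, the group-level ratio is a ratio of sums over branches, not an average of per-branch ratios. The following minimal Python sketch illustrates that reading; the branch labels and values are hypothetical, and the same function would apply to the KR/KC ratio computed from summed KR and KC branch lengths.

```python
def ratio_of_branch_sums(numerator, denominator):
    """Ratio of summed branch estimates, e.g. sum(KA) / sum(KS) over all tree branches.

    numerator, denominator -- dicts mapping branch label -> per-branch estimate.
    """
    if numerator.keys() != denominator.keys():
        raise ValueError("both dicts must cover the same branches")
    total_den = sum(denominator.values())
    if total_den == 0:
        raise ZeroDivisionError("denominator sums to zero")
    return sum(numerator.values()) / total_den
```

With hypothetical values KA = {b1: 0.01, b2: 0.03} and KS = {b1: 0.05, b2: 0.30}, the ratio of sums is 0.04/0.35 ≈ 0.114, whereas averaging the per-branch ratios (0.2 and 0.1) would give 0.15. Summing first weights each branch by its length, so short branches with noisy ratios have less influence.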
Radical and conservative changes were defined by the classification (A) that gave the best correlation between KR/KC and KA/KS, and also by three previous classifications based on chemical properties: (B) polarity and volume, (C) charge and aromaticity, and (D) charge and polarity (Zhang 2000; Hanada, Gojobori, and Li 2006) (Table 1). These physicochemical properties (aromaticity, charge, polarity, and volume) are thought to be relevant to the evolution of proteins (Grantham 1974; Miyata, Miyazawa, and Yasunaga 1979). Based on the ancestral sequences inferred at all nodes of the phylogenetic tree of each orthologous group, KR and KC were estimated for all branches of the tree by the method of Zhang (2000). The sums of the branch lengths measured in KR and in KC were used to determine the KR/KC ratio of each orthologous group. The average KA, KS, KR and KC for each branch of the species tree across the 3,375 orthologous groups are given in Supplement A.
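How a classification turns an amino acid replacement into a radical or conservative change can be sketched as below, using the groups transcribed from Table 1. Note that this sketch only counts observed radical (between-group) and conservative (within-group) differences along one branch; it is not the Zhang (2000) estimator, which also normalizes by the numbers of radical and conservative sites to obtain the rates KR and KC. Function names are illustrative.

```python
# Amino acid groups transcribed from Table 1 (one-letter codes).
CLASSIFICATIONS = {
    "A": ["ANCGPST", "ILMV", "RQHKFWY", "DE"],
    "B": ["C", "AGPST", "NDQE", "RHK", "ILMV", "FWY"],
    "C": ["DE", "QAVLICSTNGPM", "FYW", "KRH"],
    "D": ["STYCNQ", "DE", "KRH", "GAVLIFPMW"],
}


def group_index(groups):
    """Map each amino acid to the index of its group; check all 20 appear exactly once."""
    index = {aa: g for g, members in enumerate(groups) for aa in members}
    if len(index) != 20 or sum(len(m) for m in groups) != 20:
        raise ValueError("classification must place each of the 20 amino acids in one group")
    return index


def count_replacements(ancestor, descendant, groups):
    """Count radical (between-group) and conservative (within-group) differences
    between two aligned protein sequences, e.g. the two ends of one tree branch."""
    index = group_index(groups)
    counts = {"radical": 0, "conservative": 0}
    for a, b in zip(ancestor, descendant):
        if a == b:
            continue  # no replacement at this site
        kind = "conservative" if index[a] == index[b] else "radical"
        counts[kind] += 1
    return counts
```

The same pair of sequences can score differently under different classifications: for `MDIK` to `MKIQ`, the K-to-Q change is conservative under classification A (both in RQHKFWY) but radical under classification B (K in RHK, Q in NDQE).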

Construction of a new amino acid classification
To estimate the average KA/KS ratio for each type of amino acid replacement, we collected the amino acid replacements that had occurred in the orthologous gene groups. The average KA/KS ratio for a given type of amino acid replacement is defined as the average KA/KS ratio of the orthologous gene groups in which that replacement was found. Average KA/KS ratios were estimated for each of the 75 kinds of amino acid replacement that can occur by a single nucleotide substitution. Because an amino acid replacement with a low (high) KA/KS ratio should tend to be a conservative (radical) change under a classification that is highly associated with KA/KS, radical and conservative scores were assigned to the 75 types of amino acid replacement in descending (ascending) order of KA/KS (Supplement B). Using these radical and conservative scores, we calculated the total of the radical and conservative scores for each amino acid classification. To find the classification that gives the maximum correlation between KR/KC and KA/KS, we classified the amino acids into two to five groups in all possible combinations and identified the classification with the highest total score. This new classification is regarded as the amino acid classification that more adequately characterizes the relationship between KA/KS and KR/KC.
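The scoring and search can be sketched as below, under explicit assumptions. Because Supplement B is not reproduced here, the scoring rule is an assumption: replacement types are ranked by average KA/KS, a radical score that grows with KA/KS is credited when a classification puts the two amino acids in different groups, and a conservative score that shrinks with KA/KS is credited when they share a group. The toy KA/KS values in the test are hypothetical. Exhaustive enumeration grows with the Stirling numbers of the second kind (more than 10^11 ways to split 20 amino acids into exactly five groups), so this brute-force sketch is practical only for small alphabets and does not describe how the published search was organized.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a block of its own.
        yield [[first]] + partition


def rank_scores(avg_ratio):
    """Assumed scoring: rank replacement types by average KA/KS.

    avg_ratio -- dict mapping frozenset({aa1, aa2}) -> average KA/KS of that replacement.
    Radical scores increase with KA/KS; conservative scores decrease with it.
    """
    ordered = sorted(avg_ratio, key=avg_ratio.get)  # ascending KA/KS
    n = len(ordered)
    radical = {pair: rank + 1 for rank, pair in enumerate(ordered)}
    conservative = {pair: n - rank for rank, pair in enumerate(ordered)}
    return radical, conservative


def total_score(partition, radical, conservative):
    """Sum conservative scores of within-group pairs and radical scores of between-group pairs."""
    group_of = {aa: g for g, block in enumerate(partition) for aa in block}
    total = 0
    for pair in radical:
        a, b = tuple(pair)
        total += conservative[pair] if group_of[a] == group_of[b] else radical[pair]
    return total


def best_classification(alphabet, avg_ratio, min_groups=2, max_groups=5):
    """Exhaustively search partitions of `alphabet` for the highest total score."""
    radical, conservative = rank_scores(avg_ratio)
    candidates = (p for p in set_partitions(list(alphabet))
                  if min_groups <= len(p) <= max_groups)
    return max(candidates, key=lambda p: total_score(p, radical, conservative))
```

With hypothetical averages in which D-E and K-R replacements have the lowest KA/KS, the search returns the split {D, E} / {K, R}, that is, the low-KA/KS replacements become conservative (within-group) changes.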

Functional categories by Gene Ontology
Orthologous gene groups with the top and bottom 10% of KA/KS or KR/KC values were considered relaxed selection groups and purifying selection groups, respectively. Under this classification, there are four possible combinations for the orthologous gene groups: (1) relaxed selection inferred by both KA/KS and KR/KC (a high KA/KS and a high KR/KC); (2) purifying selection inferred by both KA/KS and KR/KC (a low KA/KS and a low KR/KC); (3) relaxed selection inferred by KA/KS but purifying selection inferred by KR/KC (a high KA/KS and a low KR/KC); and (4) purifying selection inferred by KA/KS but relaxed selection inferred by KR/KC (a low KA/KS and a high KR/KC).
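The four combinations above can be derived mechanically from the two ratios. The sketch below labels the top 10% of each ratio "relaxed" and the bottom 10% "purifying" and pairs the labels per group; names are illustrative, and breaking ties at the cut-offs by input order is an assumption, since the text does not say how ties were handled.

```python
def top_bottom_labels(values, fraction=0.1):
    """Label the top `fraction` of values 'relaxed' and the bottom `fraction` 'purifying'.

    Ties at the cut-offs are broken by input order (an assumption).
    """
    n = len(values)
    k = int(n * fraction)
    order = sorted(range(n), key=lambda i: values[i])  # indices by ascending value
    labels = [None] * n
    for i in order[:k]:
        labels[i] = "purifying"
    for i in order[n - k:]:  # empty slice when k == 0
        labels[i] = "relaxed"
    return labels


def selection_combinations(ka_ks, kr_kc, fraction=0.1):
    """Return a (KA/KS label, KR/KC label) pair per group, or None if either ratio is not extreme."""
    by_ka = top_bottom_labels(ka_ks, fraction)
    by_kr = top_bottom_labels(kr_kc, fraction)
    return [(a, r) if a and r else None for a, r in zip(by_ka, by_kr)]
```

The pairs ("relaxed", "relaxed"), ("purifying", "purifying"), ("relaxed", "purifying") and ("purifying", "relaxed") correspond to combinations (1) to (4); groups that are extreme for only one ratio, or for neither, fall outside all four.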
Gene Ontology (GO) assignments for the mouse genes were obtained from the mouse genome database (Hill et al. 2002). To simplify functional interpretation, we used the GO biological process categories from the top of the hierarchy down to the 4th level. For each GO category, the expected proportion (the proportion of all mouse genes assigned to that category) was compared by the chi-square test with the observed proportion among the mouse genes of orthologous gene groups under each type of selection pressure. When the observed proportion was significantly higher than the expected proportion for a GO category (P < 0.05), the hierarchical pathways from the root to the overrepresented category were drawn with the Graphviz software (www.graphviz.org).
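The text does not specify the exact form of the chi-square test. One plausible reading, sketched below, is a goodness-of-fit test with one degree of freedom that compares the numbers of selected genes inside and outside a GO category with the counts expected from the genome-wide proportion. The p-value uses the identity that, for one degree of freedom, P(χ² > x) = erfc(√(x/2)), so only the standard library is needed. Function and argument names are illustrative.

```python
import math


def go_overrepresentation(n_in_category, n_selected, background_prop):
    """Chi-square goodness-of-fit test (1 df) for one GO category.

    n_in_category   -- selected genes annotated to the category (observed)
    n_selected      -- all selected genes with GO annotation
    background_prop -- proportion of all mouse genes annotated to the category
    Returns (chi2, p_value, overrepresented).
    """
    if not 0 < background_prop < 1:
        raise ValueError("background proportion must lie strictly between 0 and 1")
    expected_in = n_selected * background_prop
    expected_out = n_selected - expected_in
    observed_out = n_selected - n_in_category
    chi2 = ((n_in_category - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # For one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, n_in_category > expected_in
```

For instance, 30 of 150 selected genes in a category that covers 10% of all genes (15 expected) gives χ² = 50/3 and P well below 0.05. The chi-square approximation is unreliable when expected counts are small (a common rule of thumb is fewer than five), where an exact binomial test would be safer.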

The expression pattern at a developmental stage
The mouse expression dataset covering various stages of mouse development (Ringwald et al. 2001) was used to determine the relationship between gene expression and the selection pressure inferred from the KA/KS and KR/KC measures. For each type of selection pressure, we measured the expression bias of genes at a developmental stage by the following equation:

$$R = \frac{N_{\mathrm{ob.}}}{N_{\mathrm{ex.}}} = \frac{N_{\mathrm{ob.}}}{P_{\mathrm{all}} \cdot N_{\mathrm{selected}}}$$

For a particular developmental stage, Nob. and Nex. are the observed and expected numbers of expressed genes that experienced purifying or relaxed selection pressure, Pall is the proportion of all mouse genes expressed at that stage, and Nselected is the total number of genes undergoing each of the four types of selection pressure. Thus, Nex. is obtained by multiplying Pall by Nselected.
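The expression-bias ratio R can be computed per stage as in the sketch below; R > 1 means that more genes under a given selection pressure are expressed at that stage than expected from the genome-wide proportion. The per-stage dictionaries and the numbers in the test are hypothetical.

```python
def expression_bias(n_observed, p_all, n_selected):
    """R = N_ob / N_ex, where N_ex = P_all * N_selected (expected count of expressed genes)."""
    n_expected = p_all * n_selected
    if n_expected == 0:
        raise ZeroDivisionError("no genes are expected to be expressed at this stage")
    return n_observed / n_expected


def expression_bias_by_stage(observed_by_stage, p_all_by_stage, n_selected):
    """Apply expression_bias to every developmental stage in `observed_by_stage`."""
    return {stage: expression_bias(n_obs, p_all_by_stage[stage], n_selected)
            for stage, n_obs in observed_by_stage.items()}
```

For example, if 20% of all mouse genes are expressed at a stage and 30 of 100 selected genes are expressed there, Nex. = 20 and R = 1.5.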

Results

A new classification of amino acids
To find a new classification that yields the maximum correlation between KA/KS and KR/KC, we first constructed all possible ways in which the 20 amino acids can be classified into two to five groups. Second, we constructed a table of the average KA/KS ratio for each type of amino acid replacement to see which kinds of amino acid replacement are associated with high or low KA/KS ratios (Supplement B). Based on this table, we constructed a new classification of amino acids under which radical and conservative changes are more highly correlated with the KA/KS ratio (Classification A in Table 1). In the new classification, amino acids are divided by charge into basic, acidic and neutral groups. The aromatic amino acids are placed in the basic group because one of the aromatic amino acids has a basic charge. The neutral amino acids are further divided by volume into distinct small and large groups. Thus, the new classification appears to reflect the chemical properties of charge, aromaticity and volume.

Correlation between KR/KC and KA/KS
Using three existing amino acid classifications and our new classification, we estimated four KR/KC ratios for each orthologous gene group. The four KR/KC ratios were significantly positively correlated with each other (P < 0.01; Table 2). Among the four classifications, the correlation between KR/KC and KA/KS was expected to be highest under the new classification (A; r = 0.48, Table 2), because classification A was constructed from the chemical properties associated with the KA/KS ratio. Indeed, the correlation coefficient between KA/KS and KR/KC based on the new classification is significantly higher than those based on the other three classifications (P < 0.01), although the other three KR/KC ratios are each also positively correlated with the KA/KS ratio (P < 0.01; Fig. 2).
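The correlation coefficients in Table 2 and Fig. 2 are reported without naming the estimator. Assuming Pearson's product-moment coefficient, r between the per-group KA/KS and KR/KC values can be computed as below. The text does not state which test showed that one correlation with KA/KS is significantly higher than another, so none is sketched here; note only that all four KR/KC ratios are compared against the same KA/KS values, so those correlations are not independent of one another.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Under this estimator, r = 0.48 for classification A corresponds to r² ≈ 0.23, that is, a linear fit on KA/KS accounts for less than a quarter of the variance in KR/KC.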
However, even under the new classification, which gives the highest correlation between the two ratios, the correlation coefficient is below 0.5, indicating that the selection pressures inferred by the KR/KC ratio and by the KA/KS ratio differ substantially. In particular, many orthologous gene groups have a low KA/KS but a high KR/KC ratio (Fig. 2). These orthologous gene groups have likely undergone relaxed selection on radical amino acid substitutions, as indicated by the KR/KC ratio, but purifying selection on nonsynonymous changes, as indicated by the KA/KS ratio.

Overrepresented functional categories of genes undergoing opposite selection pressures inferred by the two ratios
The orthologous gene groups fall into four types of selection pressure. The numbers of orthologous gene groups under relaxed or purifying selection according to each of the two ratios are shown in Table 3, and the gene lists are given in Supplement C. Because KA/KS was on the whole positively correlated with KR/KC in mammals, more groups were under the same selection pressure according to both ratios (116 + 147 = 263) than under opposite selection pressures (47). The groups under opposite selection pressures were found only in the combination of a high KR/KC and a low KA/KS ratio.
To assess the functions of groups that underwent different selection pressures, we examined the significantly overrepresented GO categories of mouse genes in orthologous gene groups under each type of selection pressure (Fig. 3, Supplement D). The overrepresented functions of genes with a high KR/KC and a high KA/KS ratio are related to "response to stimulus" and "physiological process"; in particular, several functions related to defense response are clearly overrepresented among these genes. Since genes related to defense response are generally accepted as genes undergoing positive selection, these results seem biologically reasonable. On the other hand, the overrepresented functions of genes with a low KA/KS ratio are related to development. This result is also reasonable because most genes related to development are subject to purifying selection according to the KA/KS ratio between distantly related species (Powell et al. 1993; Slack, Holland, and Graham 1993). However, it is unclear whether this holds true when the KR/KC ratio is used to evaluate the selection pressure on genes related to development. Among genes with a low KA/KS ratio, sex determination and cell differentiation are overrepresented in genes with a low and a high KR/KC ratio, respectively (Fig. 3). Sex determination is likely conserved among mammals, whereas cell differentiation may need to differ somewhat among mammals to allow their divergent evolution. Thus, relaxed selection pressure as indicated by the KR/KC ratio may be one of the important factors in the evolution of mammals.
To further examine the differences in gene function between high and low KR/KC ratios in mammalian development, we examined the expression of mouse genes under different selection pressures using the mouse expression dataset covering various stages of development (Fig. 4A, B). Genes subject to purifying selection according to both ratios are expressed at high levels in the early developmental stages (One cell egg to Blastocyst). In contrast, genes subject to purifying selection indicated by KA/KS but relaxed selection indicated by KR/KC are expressed predominantly in the early middle stage of development (Blastocyst to Amnion). The relaxed selection pressures indicated solely by the KR/KC ratio in the early middle stage of development may be important for the divergent evolution of mammals.

Discussion
The key finding of the present study is that KA/KS and KR/KC are positively correlated at a genomic scale under all four amino acid classifications, indicating that the two tests of selection pressure give broadly similar conclusions in mammalian evolution. In particular, the KR/KC ratio based on the new classification should be useful for estimating selection pressure between distantly related sequences (Gojobori 1983; Smith and Smith 1996). Because synonymous substitutions accumulate much faster than nonsynonymous substitutions, KS is often saturated between distant sequences. In contrast, the KR/KC ratio is estimated from amino acid replacements alone, which accumulate much more slowly than synonymous substitutions, so it can still be estimated for distant sequences. Thus, the new classification (A) can produce a useful KR/KC ratio for estimating the selection pressure between distant sequences. It should be noted that several studies have classified amino acid replacements into radical and conservative changes according to the likelihood of each amino acid replacement and have estimated selection pressures from such changes (Tang et al. 2004; Gojobori et al. 2007). In the present study, by contrast, we defined radical and conservative changes by their association with nonsynonymous and synonymous substitutions, that is, by the average KA/KS ratio of each type of replacement. Therefore, the selection pressures inferred from radical and conservative changes under our definition should be more likely to agree with those inferred from the KA/KS ratio.
However, a major limitation in substituting KR/KC for KA/KS is that, even under the new classification designed to maximize the correlation between KR/KC and KA/KS, the correlation coefficient is still below 0.5. There are two potential reasons why the two ratios are not highly correlated. One reason is biological: for some genes, KR/KC may not be related to the type of natural selection identified by KA/KS. The other reason is technical: in computing the KR/KC ratio, radical and conservative changes were defined as amino acid replacements between groups and within groups, respectively. Because each replacement is thus scored strictly as radical or conservative ("1" or "0"), with no intermediate values, the KR/KC ratio may not fully represent the selection pressure on amino acid replacements.
We note that many orthologous gene groups are outliers with a low KA/KS but a high KR/KC ratio. To understand these opposite selection pressures, we examined the functions of the corresponding mouse genes and found that functional categories related to development were overrepresented. We then examined the expression patterns of these genes at different developmental stages. The mouse genes under such selection pressures tend to be overexpressed in the early middle developmental stages. Richardson (1999) proposed that the early middle developmental stages are important for the speciation of mammals because many adult traits are specified at these stages, even though the stages themselves are conservative at the morphological level. Therefore, we propose that the relaxed selection pressures indicated by KR/KC but not by KA/KS in the early middle developmental stages may be important for the morphological divergence of mammals at the adult stage, while purifying selection detected by KA/KS also tends to occur at these stages. The differences between the selection pressures assessed by KA/KS and KR/KC suggest that, although genes involved in development are strongly constrained in amino acid substitution, radical changes among the substitutions that are permitted are likely important for the developmental divergence of adult mammals. Thus, opposite selection pressures as measured by the two ratios might play an important role in the evolution of genes related to development in mammals.
In summary, we inferred 3,375 orthologous gene groups in five mammalian species using stringent criteria. KR/KC is positively correlated with KA/KS. The correlation was observed under each of the four chemical classifications, which take into account aromaticity, charge, polarity or volume. In particular, the classification based on aromaticity, charge and volume gave the highest correlation between the two ratios. Moreover, genes with a high KR/KC but a low KA/KS were overrepresented among genes expressed at high levels in the early middle developmental stages. The selection pressures at these developmental stages may be important for the morphological diversification of mammals.

Acknowledgements
We thank the members of our laboratories for valuable comments and discussion. This study was supported by an NIH grant (GM30998) to W.-H. L. and an NSF grant (DBI-0638591) to S.-H. S.

Table 1. Four classifications of amino acids.

Classification A: maximum correlation with the KA/KS ratio
  Neutral & small: ANCGPST (MW*: 75-146)
  Neutral & large: ILMV (MW*: 146-204)
  Basic charge, aromaticity & relatively small: RQHKFWY (MW*: 117-149)
  Acidic charge & relatively large: DE (MW*: 133-147)
Classification B: polarity & volume
  Special: C
  Neutral & small: AGPST
  Polar & relatively small: NDQE
  Polar & relatively large: RHK
  Nonpolar & relatively small: ILMV
  Nonpolar & relatively large: FWY
Classification C: charge & aromaticity
  Acidic: DE
  Neutral & nonaromatic: QAVLICSTNGPM
  Neutral & aromatic: FYW
  Basic: KRH
Classification D: charge & polarity
  Neutral & polar: STYCNQ
  Acidic & polar: DE
  Basic & polar: KRH
  Nonpolar: GAVLIFPMW

*MW: Molecular weight

Table 2. Correlation coefficients between KR/KC and KA/KS.

                            KR/KC (A)   KR/KC (B)   KR/KC (C)   KR/KC (D)
KR/KC (Classification B)      0.77
KR/KC (Classification C)      0.67        0.73
KR/KC (Classification D)      0.35        0.35        0.52
KA/KS                         0.48        0.38        0.37        0.22

Table 3. The number of orthologous groups undergoing different selection pressures

                                                          KR/KC: relaxed selection    KR/KC: purifying selection
                                                          (top 10% of KR/KC ratio)    (bottom 10% of KR/KC ratio)
KA/KS: relaxed selection (top 10% of KA/KS ratio)                   116                          0
KA/KS: purifying selection (bottom 10% of KA/KS ratio)               47                        147
Literature Cited
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res
25:3389-3402.
Gojobori, J., H. Tang, J. M. Akey, and C. I. Wu. 2007. Adaptive evolution in humans revealed by the negative
correlation between the polymorphism and fixation phases of evolution. Proc Natl Acad Sci U S A
104:3907-3912.
Gojobori, T. 1983. Codon substitution in evolution and the "saturation" of synonymous changes. Genetics
105:1011-1027.
Grantham, R. 1974. Amino acid difference formula to help explain protein evolution. Science 185:862-864.
Hanada, K., T. Gojobori, and W. H. Li. 2006. Radical amino acid change versus positive selection in the evolution of
viral envelope proteins. Gene 385:83-88.
Hill, D. P., J. A. Blake, J. E. Richardson, and M. Ringwald. 2002. Extension and integration of the gene ontology
(GO): combining GO vocabularies with external vocabularies. Genome Res 12:1982-1991.
Hughes, A. L., and M. Nei. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci
reveals overdominant selection. Nature 335:167-170.
Hughes, A. L., T. Ota, and M. Nei. 1990. Positive Darwinian selection promotes charge profile diversity in the
antigen-binding cleft of class I major-histocompatibility-complex molecules. Mol Biol Evol 7:515-524.
Li, W. H., and T. Gojobori. 1983. Rapid evolution of goat and sheep globin genes following gene duplication. Mol
Biol Evol 1:94-108.
Miyata, T., S. Miyazawa, and T. Yasunaga. 1979. Two types of amino acid substitutions in protein evolution. J Mol
Evol 12:219-236.
Powell, J. R., A. Caccone, J. M. Gleason, and L. Nigro. 1993. Rates of DNA evolution in Drosophila depend on
function and developmental stage of expression. Genetics 133:291-298.
Ringwald, M., J. T. Eppig, D. A. Begley, J. P. Corradi, I. J. McCright, T. F. Hayamizu, D. P. Hill, J. A. Kadin, and J. E.
Richardson. 2001. The Mouse Gene Expression Database (GXD). Nucleic Acids Res 29:98-101.
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees.
Mol Biol Evol 4:406-425.
Slack, J. M., P. W. Holland, and C. F. Graham. 1993. The zootype and the phylotypic stage. Nature 361:490-492.
Smith, J. M., and N. H. Smith. 1996. Synonymous nucleotide divergence: what is "saturation"? Genetics
142:1033-1036.
Smith, N. G. 2003. Are radical and conservative substitution rates useful statistics in molecular evolution? J Mol Evol
57:467-478.
Tang, H., G. J. Wyckoff, J. Lu, and C. I. Wu. 2004. A universal evolutionary index for amino acid changes. Mol Biol
Evol 21:1548-1556.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix
choice. Nucleic Acids Res 22:4673-4680.
Yang, Z., S. Kumar, and M. Nei. 1995. A new method of inference of ancestral nucleotide and amino acid sequences.
Genetics 141:1641-1650.
Zhang, J. 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear
genes. J Mol Evol 50:56-68.
Zhang, J., H. F. Rosenberg, and M. Nei. 1998. Positive Darwinian selection after gene duplication in primate
ribonuclease genes. Proc Natl Acad Sci U S A 95:3708-3713.

Figure legends

Fig. 1. Construction of ortholog data.
(A) The similarity search was conducted by Blastp, and reciprocal best hits were identified between every pair of species; the number of reciprocal best hits is shown for each species pair. Sequences that were reciprocal best hits among all five species were considered an orthologous gene group. A phylogeny was then generated for each orthologous gene group. When the phylogeny of the orthologs from the five species differed from the topology of the species phylogeny, the putative orthologous group was removed from the ortholog data. (B) The species phylogeny.

Fig. 2. Correlation between KA/KS and KR/KC.
The X-axis is the KA/KS ratio and the Y-axis is the KR/KC ratio. The ratios were computed based on classification A (r = 0.48) (A), classification B (r = 0.38) (B), classification C (r = 0.37) (C), and classification D (r = 0.22) (D).

Fig. 3. Overrepresented functions in genes with a low KA/KS and a high KR/KC ratio and in genes with a low KA/KS and a low KR/KC ratio.
The arrowheads point to subcategories. (A) Categories overrepresented in genes with a low KA/KS and a high KR/KC ratio are shown as black circles (P < 0.05). (B) Categories overrepresented in genes with a low KA/KS and a low KR/KC ratio are shown as black circles (P < 0.05).

Fig. 4. Expression levels of genes with different selection pressures at each developmental stage.
(A) The X-axis indicates the developmental stage. The stages are: 1 (One cell egg), 2 (Beginning of cell division), 3 (Morula), 4 (Advanced division/segmentation), 5 (Blastocyst), 6 (Implantation), 7 (Formation of egg cylinder), 8 (Differentiation of egg cylinder), 9 (Advanced endometrial reaction; prestreak), 10 (Amnion; midstreak), 11 (Neural plate, presomite; no allantoic bud), 12 (First somites; late head fold), 13 (Turning), 14 (Formation & closure of anterior neuropore), 15 (Formation of posterior neuropore, forelimb bud), 16 (Closure of posterior neuropore, hindlimb & tail bud), 17 (Deep lens indentation), 18 (Closure of lens vesicle), 19 (Complete separation of lens vesicle), 20 (Earliest sign of fingers), 21 (Anterior footplate indented, marked pinna), 22 (Fingers separate distally), 23 (Toes separate), 24 (Reposition of umbilical hernia), 25 (Fingers and toes joined together), 26 (Long whiskers) and 28 (Postnatal development). The Y-axis indicates the normalized difference in expressed genes between genes undergoing a given selection pressure and all genes. (B) Sliding-window analysis (window of five stages) based on (A). The X-axis indicates the central stage of each five-stage window, and the Y-axis indicates the average normalized difference in each window.
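The five-stage sliding window of Fig. 4B can be sketched as overlapping, unweighted means that advance one stage at a time. The step size and the absence of weighting are assumptions, since the legend gives only the window width; the function name is illustrative.

```python
def sliding_window_means(values, window=5):
    """Unweighted means of consecutive, overlapping windows of `window` stages (step 1)."""
    if not 1 <= window <= len(values):
        return []
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
```

Applied to per-stage normalized differences, each output value summarizes five neighbouring stages, which smooths stage-to-stage noise at the cost of blurring sharp transitions.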

FIG 1 [image not reproduced]: (A) numbers of reciprocal best hits between each pair of species (Human, Chimpanzee, Mouse, Rat, Dog); (B) species phylogeny of the five mammals.
FIG 2 [image not reproduced]: panels A-D plot the KR/KC ratio (Classifications A-D) against the KA/KS ratio.
FIG 3 [image not reproduced]: GO biological-process hierarchies. (A) includes embryonic_development (sensu_Metazoa), axis_specification, pattern_specification, anterior/posterior pattern_formation, cell_differentiation, epidermal_cell_differentiation, regulation_of_development, regulation_of_epidermis_development and regulation_of_binding; (B) includes sex_determination, male_sex_determination, axis_specification, developmental_growth, blastocyst_growth and visual_behavior.
FIG 4 [image not reproduced]: normalized expression differences by developmental stage (A) and in five-stage sliding windows (B) for genes under purifying selection indicated by both KA/KS and KR/KC, genes under purifying selection indicated by KA/KS but relaxed selection indicated by KR/KC, and genes under relaxed selection indicated by both ratios.