D. Kahn, COSTEX model

The COSTEX model: a cost-benefit model
relating gene expression and selection
Daniel Kahn, Jean-François Gout & Laurent Duret
Laboratoire de Biométrie & Biologie Evolutive
Lyon 1 University, INRIA BAMBOO team
& INRA MIA Department
Whole genome duplications as a tool to
investigate dosage selection
Following whole-genome duplication (WGD)
• Relative gene dosage is initially unchanged
• Duplicated genes are gradually lost with probability inversely related to selective pressure
• This may be exploited to analyze selective pressure on gene dosage
Duplications in the Paramecium genome
Aury et al., 2006, Nature 444:171-178
Three successive rounds of WGD
Gene content: 2 × 2 × 2 ≠ 2
Fate of genes after WGD
A brief introduction to whole-genome duplications (WGDs)
• WGD creates identical copies of all genes (ohnologs)
• Mutations lead to pseudogenization of some ohnologs
• Finally, only a few pairs of genes are retained
[Figure: ohnologons after WGD, labelled "Ohnologon that lost one copy" and "Retained ohnologon"]
Relationship between gene retention
and gene expression
[Figure: frequency of gene retention as a function of expression level (log2)]
Data from the Paramecium post-genomics consortium, Jean Cohen & coll.
Model for expression-dependent selection
• Protein expression has a cost
  => Trade-off between cost and benefit
• The model assumes that expression was optimal before WGD
• In vitro evolution experiments have shown that an optimum can indeed be reached in only a few hundred generations (e.g. Dekel & Alon, 2005)
Modelling the cost of expression
Dekel & Alon, 2005, Nature 436:588-592
$$ C(X) \equiv \frac{kX}{M - X} $$
C(X): cost function
X: expression level
k: cost parameter
M: maximal capacity
[Figure: expression cost C(X) as a function of expression level X, with the maximal capacity M marked]
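The cost function C(X) = kX / (M - X) can be explored numerically. The following is a minimal sketch; the function name `expression_cost` and the parameter values are illustrative assumptions, not values from the slides.

```python
def expression_cost(X, k, M):
    """Expression cost C(X) = k*X / (M - X), defined for 0 <= X < M."""
    if not 0 <= X < M:
        raise ValueError("expression level X must satisfy 0 <= X < M")
    return k * X / (M - X)

# Assumed, purely illustrative parameter values.
k, M = 0.02, 100.0
for X in (1.0, 10.0, 50.0, 90.0, 99.0):
    print(f"X = {X:5.1f}   C(X) = {expression_cost(X, k, M):.4f}")
```

The cost is nearly linear in X at low expression and rises steeply as X approaches the maximal capacity M.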
Cost-benefit optimization
Benefit: B(X)
Cost: C(X)
[Figure: benefit B(X), expression cost C(X) and fitness as functions of expression level X, with the optimum marked Xo]
The COSTEX model
Express fitness as a function of expression x relative to the optimum level X0:
$$ x \equiv \frac{X}{X_0}, \qquad w(x) \equiv B(x) - \frac{k X_0 x}{M - X_0 x} $$
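The fitness w(x) = B(x) - k X0 x / (M - X0 x) can be sketched as below. The slides do not specify the benefit function, so the logarithmic `benefit` here is a hypothetical choice, scaled so that w(1) = 1 and dw/dx(1) = 0. The parameter values are also assumptions.

```python
import math

def cost(x, k, M, X0):
    """Cost at relative expression x = X/X0: k*X0*x / (M - X0*x)."""
    return k * X0 * x / (M - X0 * x)

def benefit(x, k, M, X0):
    """Hypothetical concave benefit B(x) = B(1) + slope*ln(x) (an assumption)."""
    slope = k * M * X0 / (M - X0) ** 2   # equals dC/dx at x = 1, so w'(1) = 0
    b1 = 1.0 + k * X0 / (M - X0)         # B(1) = 1 + C(1), so w(1) = 1
    return b1 + slope * math.log(x)

def fitness(x, k, M, X0):
    """COSTEX fitness w(x) = B(x) - C(x)."""
    return benefit(x, k, M, X0) - cost(x, k, M, X0)

# Assumed, purely illustrative parameter values.
k, M, X0 = 0.02, 100.0, 20.0
grid = [0.5 + 0.01 * i for i in range(101)]   # x from 0.5 to 1.5
best_x = max(grid, key=lambda x: fitness(x, k, M, X0))
print(f"fitness is maximal at x = {best_x:.2f}")
```

With this choice of benefit, fitness peaks at x = 1, which is the model's premise that expression was optimal before WGD.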
The COSTEX model
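Approximate fitness around the optimum X0 by Taylor expansion (the first-order term vanishes at the optimum x = 1, and fitness is scaled so that w(1) = 1):
$$ w(x) \approx 1 + \frac{1}{2} \frac{\partial^2 w}{\partial x^2}(1) \, (x - 1)^2 $$
Therefore selection on expression can be quantified by:
$$ \frac{\partial^2 w}{\partial x^2}(1) = \frac{d^2 B}{dx^2}(1) - \frac{2 k M X_0^2}{(M - X_0)^3} < 0 $$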
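The cost contribution to the curvature, 2 k M X0^2 / (M - X0)^3, can be checked against a finite-difference second derivative of the cost term k X0 x / (M - X0 x) at x = 1. This is a minimal sketch; the helper names and parameter values are assumptions.

```python
def cost(x, k, M, X0):
    """Cost at relative expression x = X/X0: k*X0*x / (M - X0*x)."""
    return k * X0 * x / (M - X0 * x)

def cost_curvature_closed_form(k, M, X0):
    """Second derivative of the cost at x = 1: 2*k*M*X0^2 / (M - X0)^3."""
    return 2 * k * M * X0 ** 2 / (M - X0) ** 3

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

# Assumed, purely illustrative parameter values.
k, M, X0 = 0.02, 100.0, 20.0
numeric = second_derivative(lambda x: cost(x, k, M, X0), 1.0)
exact = cost_curvature_closed_form(k, M, X0)
print(f"finite difference: {numeric:.6f}   closed form: {exact:.6f}")
```

The two values agree closely. The cost term enters w with a minus sign, so it makes the curvature of fitness more negative as X0 grows.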
Expression-dependent fitness
[Figure: fitness as a function of relative dosage or expression X/Xo, for low, medium and high X0, with the "Loss of duplication" point marked]
Fitness loss
Selection against loss of duplicated gene:
$$ s \approx \frac{k M X_0^2}{4 (M - X_0)^3} - \frac{1}{8} \frac{d^2 B}{dx^2}(1) $$
[Figure: plotted against optimal expression X0]
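The selection coefficient against loss of a duplicate, s ≈ k M X0^2 / (4 (M - X0)^3) - (1/8) d²B/dx²(1), can be evaluated for several optimal expression levels. In this sketch the benefit curvature `d2B` defaults to zero, which is an assumption that isolates the cost term. The parameter values are also illustrative.

```python
def selection_against_duplicate_loss(X0, k, M, d2B=0.0):
    """s ~ k*M*X0^2 / (4*(M - X0)^3) - d2B/8, where d2B = B''(1) <= 0."""
    return k * M * X0 ** 2 / (4 * (M - X0) ** 3) - d2B / 8

# Assumed, purely illustrative parameter values.
k, M = 0.02, 100.0
for label, X0 in (("low", 5.0), ("medium", 20.0), ("high", 50.0)):
    s = selection_against_duplicate_loss(X0, k, M)
    print(f"{label:6s} X0 = {X0:5.1f}   s = {s:.6f}")
```

Under these assumptions s increases with X0, so loss of a duplicate is more strongly counter-selected for highly expressed genes.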
Selection against pseudogene formation
Pseudogene formation after WGD entails a loss of fitness that
can be expressed in the COSTEX model:
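$$ s = B(1) - B(\tfrac{1}{2}) \approx \frac{1}{2} \frac{dB}{dx}(1) = \frac{k M X_0}{2 (M - X_0)^2} $$
where the last equality uses the optimum condition dB/dx(1) = dC/dx(1).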
Therefore the pseudogenization path to gene loss is also under
expression-dependent selection: the higher the gene is
expressed, the less likely is the fixation of disabling mutations.
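The expression dependence of selection against pseudogenization can be illustrated with the first-order expression s ≈ k M X0 / (2 (M - X0)^2). This is a minimal sketch; the function name and parameter values are assumptions.

```python
def selection_against_pseudogene(X0, k, M):
    """First-order fitness loss on pseudogenization: k*M*X0 / (2*(M - X0)^2)."""
    return k * M * X0 / (2 * (M - X0) ** 2)

# Assumed, purely illustrative parameter values.
k, M = 0.02, 100.0
for X0 in (5.0, 20.0, 50.0):
    s = selection_against_pseudogene(X0, k, M)
    print(f"X0 = {X0:5.1f}   s = {s:.6f}")
```

The printed values grow with X0, in line with the statement that disabling mutations are less likely to be fixed in highly expressed genes.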
Expression constrains evolutionary rates
More generally, mutations that decrease the benefit function by
a fraction a are counter-selected in an expression-dependent
manner in the COSTEX model:
$$ s(a) = a \, B(1) = a \left( 1 + \frac{k X_0}{M - X_0} \right) $$
Mutations with an equivalent effect on protein function are more
deleterious for highly expressed genes because of higher
expression cost, a price the organism had to ‘pay’ for their
function.
This relationship also applies for potentially suboptimal expression X ≠ X0
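As a worked example of s(a) = a (1 + k X0 / (M - X0)), the sketch below applies the same fractional loss of function a to genes with different optimal expression levels. The function name and all numerical values are illustrative assumptions.

```python
def selection_on_benefit_loss(a, X0, k, M):
    """s(a) = a * B(1) = a * (1 + k*X0 / (M - X0))."""
    return a * (1.0 + k * X0 / (M - X0))

# Assumed, purely illustrative values: a 1% reduction of the benefit function.
a, k, M = 0.01, 0.02, 100.0
for X0 in (5.0, 20.0, 50.0, 90.0):
    s = selection_on_benefit_loss(a, X0, k, M)
    print(f"X0 = {X0:5.1f}   s(a) = {s:.6f}")
```

For a fixed a, the selection coefficient rises with X0 because B(1) must offset a larger expression cost.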
Expression constrains evolutionary rates
Expression is the best predictor of evolutionary rates in coding sequences (Duret & Mouchiroud, 2000; Drummond et al., 2006)
Drummond et al., 2005, PNAS 102:14338
Expression-dependent selection
• The COSTEX model can explain the relationship between retention rate and gene expression
• The model is also supported by gene knockout experiments in yeast (measure of fitness in heterozygotes wt/KO)
• The model predicts that the higher the expression level, the more strongly it is conserved in evolution
• It also explains the observation that highly expressed genes have low rates of sequence evolution
Retention of metabolic genes
• Unexpected observation that metabolic genes are more retained than other genes following WGD
• However, little selective pressure is expected on the dosage of individual enzyme genes (Kacser & Burns, 1981)
• Is this a paradox?
Metabolic genes are more highly expressed
High retention of metabolic genes: why?
• Retention of metabolic genes is best explained by selection for gene expression
• Although the loss of individual enzyme genes should generally be neutral, each successive loss will be more and more counter-selected. For instance, in a linear pathway where the dosage of p enzyme genes has been halved:
$$ \frac{J}{J_0} = \frac{1}{1 + \sum_{i=1}^{p} C_i^{J_0}} $$
• Ultimately this would result in half of the flux, which should be strongly counter-selected in general
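The flux relation J/J0 = 1 / (1 + sum of C_i over the p enzymes whose dosage was halved) can be tabulated as duplicates are lost one by one. In this sketch the pathway has ten enzymes with equal control coefficients of 0.1, an assumption chosen only so that the coefficients sum to 1.

```python
def relative_flux(control_coefficients, halved):
    """J/J0 after halving the dosage of the enzymes whose indices are in `halved`."""
    return 1.0 / (1.0 + sum(control_coefficients[i] for i in halved))

# Assumed pathway: 10 enzymes with equal flux control coefficients.
n = 10
coefficients = [1.0 / n] * n
for p in range(n + 1):
    ratio = relative_flux(coefficients, range(p))
    print(f"{p:2d} enzyme genes reduced to one copy: J/J0 = {ratio:.3f}")
```

A single loss reduces the flux only slightly, while losing one copy of every enzyme gene brings the flux down to half of J0.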
• Metabolic fluxes are not proportional to enzyme activities
• They typically show a hyperbolic dependency
• Most enzymes have low control on flux
• Summation theorem:
$$ \sum_{i=1}^{n} C_i^{J_0} = 1 $$
Kacser & Burns 1981, Genetics 97:639-666
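The summation theorem can be checked numerically on a toy pathway. The sketch below assumes a linear pathway whose flux obeys 1/J = sum of a_i / E_i, where E_i are enzyme activities and a_i are constants. The model, the names `flux` and `control_coefficient`, and the numbers are illustrative assumptions. Control coefficients are estimated as the relative change in flux per relative change in enzyme activity.

```python
def flux(E, a):
    """Steady-state flux of the assumed linear pathway: J = 1 / sum(a_i / E_i)."""
    return 1.0 / sum(ai / Ei for ai, Ei in zip(a, E))

def control_coefficient(E, a, i, rel_step=1e-6):
    """Central-difference estimate of C_i = (E_i / J) * dJ/dE_i."""
    up, down = list(E), list(E)
    up[i] *= 1 + rel_step
    down[i] *= 1 - rel_step
    return (flux(up, a) - flux(down, a)) / (2 * rel_step * flux(E, a))

# Assumed, purely illustrative enzyme activities and constants.
E = [1.0, 2.0, 4.0, 0.5]
a = [1.0, 1.0, 1.0, 1.0]
coefficients = [control_coefficient(E, a, i) for i in range(len(E))]
print("control coefficients:", [round(c, 3) for c in coefficients])
print("sum:", round(sum(coefficients), 6))
```

In this toy model the flux depends hyperbolically on each enzyme activity, and the coefficients sum to 1, so no single enzyme can carry much control when the pathway has many steps.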
• Therefore little selective pressure is expected on the dosage of individual enzyme genes
• This is a classical explanation of the recessivity of metabolic defects
Ongoing dynamics of gene inactivation
• 49% loss of duplicated genes following the recent WGD
• Contrary to initial expectation, metabolic genes are more retained than other genes: 42% gene loss (n = 1,144 metabolic genes, P-value < 10^-3)
• Why?
Gout, Duret & Kahn 2009, Mol. Biol. Evol., in press
[Figure: gene frequency (0% to 50%) as a function of the number of genes within an ohnologon (1 to 8)]
[Figure: gene loss frequency for genes with 1 gene before WGD versus 2 genes or more. Panel a: Recent WGD. Panel b: Intermediary WGD.]
[Figure: relative fitness as a function of relative dosage or expression]
P. tetraurelia: the best model organism for studying WGDs

P. tetraurelia: 3 successive WGDs with different loss rates (Aury et al., 2006): 92%, 76% and 49%, from the oldest to the most recent WGD
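As a rough piece of arithmetic, the three loss rates can be turned into a net change in gene content. The sketch assumes that each loss rate is the fraction of duplicated pairs that returned to a single copy, so each WGD multiplies gene content by 1 + (1 - loss rate). That reading, and the labels assigned to the three events, are assumptions.

```python
# Loss rates quoted for the three WGDs; the labels are assumed.
loss_rates = {"old": 0.92, "intermediary": 0.76, "recent": 0.49}

fold = 1.0
for name, loss in loss_rates.items():
    retained = 1.0 - loss        # fraction of pairs still present in two copies
    fold *= 1.0 + retained       # each retained pair adds one extra gene
    print(f"after {name:12s} WGD: cumulative fold change = {fold:.3f}")
```

Under these assumptions the net gene content is far below the eightfold increase that three doublings would give without any loss.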