Causes and Effects of N-Terminal Codon Bias in Bacterial Genes

Causes and Effects of N-Terminal
Codon Bias in Bacterial Genes
Mikk Eelmets
Journal Club
21.02.2014
Introduction
•  Ribosomes were first observed in the
mid-1950s (Nobel Prize in 1974)
•  Nobel Prize in 2009 for determining
the detailed structure and
mechanism of the ribosome
There are still many questions without an answer
•  Many organism are enriched for poorly adapted codon at
the N terminus. Why?
•  Which mechanisms are causal for expression changes?
Strategies: study genome-wide sequence correlation,
study conservation among species,
use synthetic genes
Genetic code
Design of experiment
FlowSeq
Sharon E et al Nat Biotechnol. 2012 May 20;30(6):521-30
grew the entire pooled library to early exponential phase and
performed multiplexed measurements of the steady-state DNA,
RNA, and protein levels. We used sequencing, DNASeq and
RNASeq, to obtain steady-state DNA and RNA levels, respectively, across the library (12). For obtaining protein levels, we
used FlowSeq, which combines fluorescence-activated cell sorting and high-throughput DNA sequencing and is similar in design to recently published work (34, 35). Briefly, we sorted cells
into 12 log-spaced bins of varying GFP/mCherry ratios; isolated,
amplified, and barcoded DNA from each of the bins; and then
used high-throughput sequencing to count the number of constructs that fell into each bin (Fig. 1 A and D). Using the read
counts from each of the bins, we reconstructed the average
and S5). RNA level calculations also showed high concordance
between technical replicates (R2 = 0.995; Fig. S5). Overall, RNA
levels varied by three orders of magnitude, but within a single
promoter, the coefficient of variation was only 0.63 (Fig. 2, Left
and Fig. S6). RNASeq data also allowed us to identify dominant
transcriptional start sites for most promoters (Fig. S7). Eightyseven percent of all promoters had one dominant start position
(>60% of all mapped reads). Two promoters (marked with
asterisks in Fig. S7) had very few uniquely mapping reads, did
not show a strong start site, and showed unrealistic translation
efficiency calculations. These observations indicated that we
were missing most of the RNA (but not protein) reads from
these promoters, possibly because of transcription starting after
Design of experiment
FlowSeq
A
weak
sfGFP
mCherry
medium
+
114 promoters
111 ribosome
binding sites
+
sort on
sfGFP/mCherry ratio
transform,
DNAseq & RNAseq
library of 12,653
synthesized constructs
barcode
& sequence
FlowSeq
15
10
D
15
10
log(RFP)
count (x1000)
B
count (x1000)
strong
5
0
102
103
104
log(RFP)
105
5
0
102
100)
5
log(GFP)
104
105
15
Kosuri S et al. Proc Natl Acad Sci U S A. 2013 Aug20;110(34):14024-9
6
4
5
3
4
count (x1000)
C
103
10
estimate protein levels
for each construct
Design of experiment
Expression System and Library Design
ig. S1. Diagram
Expression
library the
construct
Each libraryofconstruct
contained aSystem
promoter, and
RBS, Library
and 11 codonDesign.
N-terminal Each
peptide (including
‘ATG’). The peptide sequences correspond to the N terminus of 137 natural E. coli genes. For
ontained ainitiating
promoter,
RBS, and 11 codon N-terminal peptide (including the initiating
each promoter/RBS/peptide combination, we encoded 13 codon variants including the most common
codons
(C), most
rare codons (R),
wild-type sequence
(wt), N
andterminus
codon variants
variable
secondary
ATG’). The
peptide
sequences
correspond
to the
ofwith
137
natural
E. coli
structures (ΔG). The library was cloned in-frame with superfolder GFP (sfGFP). The GFP expression
enes. For each
promoter/RBS/peptide
we encoded
13 codon variants
level is compared
via relative fluorescence tocombination,
a constitutively co-expressed
mCherry protein.
ncluding the most common codons (C), most rare codons (R), wild-type sequence (wt),
nd codon variants with variable secondary structures ( G). The library was cloned in-
Changing synonymous codon usage in the
11–amino acid N-terminal peptide resulted in a
mean 60-fold increase in protein abundance from
the most common or rare synonymous codon in
E. coli for every amino acid. The rare codon constructs displayed a mean 14-fold (median 4-fold)
C
Strong
High
Mid
High
Weak
High
WT
High
Strong
Low
Mid
Low
Promoter
BS
RBS
Protein expression of the reporter library
1x
4x
16x 64x 256x
nge in Expression from C to R
e reporter library. (A) N-terminal
odon variants show increased exs (C). (B) Fold change in expression
pendent of RBS strength. WT, wild
asured by the sfGFP:mCherry ratio)
on variant sets are grouped into
on variants include C, R, wild-type
condary structure (∆G). Not shown
ere mostly outside the quantitative
t data, and light gray squares cor-
Protein levels:
7327 (51.5%) constructs
They used UNAFOLD
software to predict free
energy of folding for
each RBS-codon variant
wt C R
***
K
Protein
Expression
(FlowSeq) 1
2
4
6
8
10 12
Protein expression of the library (as measured by the sfGFP:mCherry ratio) covers a ~200-fold range. The
13-member codon variant sets are grouped into columns by promoter/RBS combination (top). Codon
L
B C, R, wild-type sequence
C (wt), and 10 sequences with
variants
include
varying secondary structure (∆G).
3.0
ression
tion
I
inc ∆G
2x
2x
2
R = 0.41
1.5x
−16
1.5x
Usage (RSCU)
6x 1/4x
10-AA N-terminal peptides from 137 genes
S
RNA and DNA Abundance
DNARNA
abundances
for each
construct. Promoter
is the
first column
and RBS(ratio
identityof
is RNA
the
Fig. S4.
and DNA
Abundance.
(A) identity
Relative
RNA
abundances
to
column. DNA abundances varied due to differences in DNA synthesis efficiencies as well as
DNAsecond
contig
counts) for each construct are displayed grouped by promoter and RBS
lower growth rate for very highly expressed genes
identities. (B) Same plot as in (A) but displaying DNA abundances for each construct.
Labels at the right are the same as in Figure 1E, where promoter identity is the first
RNA and DNA Abundance
Relative RNA abundances (ratio of RNA to DNA contig counts) for each construct are displayed
grouped by promoter and RBS identities.
Gene expression measurements of
the reporter library
16x
4x
1x
1 4x
1 16x
R
C
Codon Variant
60
40
20
0
30
20
10
0
15
10
5
0
15
10
5
0
C
Strong RBS
Promoter
B
Number of N-terminal peptides
Fold Change in Protein Expression
A
Mid RBS
Weak RBS
WT RBS
1/64x 1/16x 1/4x 1x
RBS
Changing synonymous codon usage in the
11–amino acid N-terminal peptide resulted in a
mean 60-fold increase in protein abundance from
4x
16x 64x 256x
Fold Change in Expression from C to R
10-AA N-terminal peptides from 137 genes
termined by FlowSeq; 7327 (51.5%) constructs
were within the quantitative range of our assay
[coefficient of determination (R2) = 0.955, P < 2 ×
(A)  N-terminal
sequences
encoding the most
codon variants
show
expression
Fig.
1. Genepeptide
expression
measurements
of rare
the(R)
reporter
library.
(A)increased
N-terminal
whensequences
compared toencoding
the most common
onesrare
(C) (mean
14-fold;variants
median 4-fold).
peptide
the most
(R) codon
show increased ex(B)  (B) Fold change in expression between C and R codon variants is largely independent of RBS strength.
pression
when compared to the most common ones (C). (B) Fold change in expression
WT, wild type.
between C and R codon variants is largely independent of RBS strength. WT, wild
type. (C) Protein expression of the library (as measured by the sfGFP:mCherry ratio)
Strong
High
wt C R
inc ∆G
C
E
F
2x
2 ***
G
H
I
K
***
***
B
L
***
***
R = 0.41
***
p < 10
1x
−16
1.5x
***
0.75x
*
***
***
***
***
***
T
***
0.75x
1x
1 2x
1
log2 RSCU
***
***
R = 0.73
Y
***
***
1x
*
***
V
***
**
**
2
p = 10−9
***
***
***
***
CTG
TTA
TTG
CTC
CTT
CTA
S
AAA
AAG
ATT
ATC
ATA
CAT
CAC
R
***
1.5x
0.75x
GGC
GGT
GGG
GGA
Q
2x
0
TTT
TTC
P
1x
GAA
GAG
N
GAT
GAC
-1
C
1.5x
TGC
TGT
-2
Mean fold change in expression due to codon substitution
2x
D
**
4x
2x
*
*** ***
***
Mean fold change in expression
A
(RSCU)
Usage
Relative Synonymous Codon
due to codon
substitution
A
**
***
Codon
Frequency
Enrichment
***
***
at N-terminus of E. coli Genes
Protein
Expression
(FlowSeq) 1
2
4
C
2x
Relative
Synonymous
R = 0.41
1.5x
Codon
p < 10 Usage
2x
3.0
2
−16
1.5x
n = number of synonymous
codons
1x
(1 ≤ n ≤ 6) for the amino acid under
R2 = 0.73
study, Xi = number
0.75x of occurrences
p = 10−9
of codon i.
1x
0.75x
2.0
6
8
Relative Synonymous Codon Usage (RSCU)
wt C R
Choice of codon and Expression
GCG
GCC
GCA
GCT
-3
Protein
sequence (wt), and 10 sequences with varying
secondary structure (∆G). Not shown
Expression
are two additional inc
low-promoter
the quantitative
∆G panels, which were mostly1 outside
2 gray
4 squares
6
8 10 12
(FlowSeq)
FlowSeq range. Dark gray squares had insufficient
data, and light
correspond to duplicate constructs.
10 12
3.0
2.0
1.0
0.5
1x
4x
1 2x
1
If-1the 0synonymous
codons
of an 2x
Codon
Frequency
Enrichment
log
2 RSCU
amino
acid are used
equalof E. coli Genes
at with
N-terminus
frequencies, their RSCU values will
Fig. 2.1.0
Rare equal
codons1.generally increase expression levels. (A) The
-3
-2
average fold change in expression is correlated with the choice of codon.
The y axis is the slope of a linear model linking codon use to expression
change.0.5
Codons are sorted left to right by increasing genomic frequency
and colored according to their relative synonymous codon usage (RSCU)
in E. coli. (P values after Bonferroni correction: *P < 0.05, **P < 0.005,
***P < 0.001). (B) The individual codon slopes (y axis) as in (A) show an
inverse relationship with RSCU (x axis). (C) The individual codon slopes
correlate with enrichment of codons at the N terminus of genes in E. coli.
X
RSCU = n i
i 1 X
n∑ i
i=1
TAT
TAC
GTG
GTT
GTC
GTA
ACC
ACG
ACT
ACA
AGC
TCG
TCC
AGT
TCT
TCA
CGC
CGT
CGG
CGA
AGA
AGG
CAG
CAA
CCG
CCA
CCT
CCC
AAC
AAT
. Rare codons
generally
levels.
(A)
The The y axis is the slope of a
The average
fold changeincrease
in expressionexpression
is correlated with
the choice
of codon.
linear in
model
linking codon
to
change.
Codons
are
left to right by increasing genomic
e fold change
expression
isuse
correlated
with
the
choice
ofsorted
codon.
25expression
OCTOBER
2013
VOL
342 SCIENCE
www.sciencemag.org
476
frequency and colored according to their relative synonymous codon usage (RSCU) in E. coli. (P values
axis is the slope
of a linear
model
linking
codon
use
to expression
after Bonferroni
correction:
*P < 0.05,
**P < 0.005,
***P
< 0.001).
e. Codons are sorted left to right by increasing genomic frequency
lored according to their relative synonymous codon usage (RSCU)
squares
corthe quantitative
gray squares cor-
wt C R
wt C R
inc ∆G
inc ∆G
Protein
Expression
Protein
1 2
4 6
(FlowSeq)
Expression
(FlowSeq) 1 2 4 6
8
10 12
8
10 12
***
***
***
***
CTG
TTA
TTG
CTG
CTC
TTA
V Y
***
Y
C
C
2x
B2x
Mean fold change in expression
due to codon substitution
L
Mean fold change in expression
due to codon substitution
B
2x
2x
R2 = 0.41
2
R
p < 10=−160.41 1.5x
1.5x
p < 10−16
1.5x
1x
0.75x
1.5x
1x
1x
R2 2= 0.73
R = 0.73
p = 10−9−9
p = 10
0.75x
0.75x
0.75x
-3
1x
-2
-3
-1
-2
-1
0
0
1
log2log
RSCU
2 RSCU
1
1 2x
1 2x
1x
1x
2x
2x
4x
4x
(RSCU)
Usage
Codon
Synonymous
Relative
(RSCU)
Usage
Codon
Synonymous
Relative
Choice of codon and Expression
3.0
3.0
2.0
2.0
1.0
1.0
0.5
0.5
Codon
CodonFrequency
FrequencyEnrichment
Enrichment
at at
N-terminus
of
E.
N-terminus of E.coli
coliGenes
Genes
Fig. 2. Rare codons generally increase expression levels. (A) The
average
foldfold
change
in expression
is iscorrelated
average
change
in expression
correlatedwith
withthe
thechoice
choice of codon.
codon.
The The
y axis
is the
slope
of of
a linear
model
y axis
is the
slope
a linear
modellinking
linkingcodon
codonuse
use to
to expression
expression
change.
Codons
sorted
rightbybyincreasing
increasinggenomic
genomic frequency
frequency
change.
Codons
areare
sorted
leftleft
to to
right
(B) The individual codon slopes (y axis) as in (A) show an inverse relationship with RSCU (x axis).
Fig. 2. Rare codons generally increase expression levels. (A) The
(C) The individual codon slopes correlate with enrichment of codons at the N terminus of genes in E. coli.
mous codon with expression changes (Fig. 2A
and fig. S8). For most amino acids, we found a
link between the rarity of the codon and increased
expression (Fig. 2B). There is a strong correlation
0.00654), and did not significantly correlate with
codons that increased protein expression in our
data set (R2 = 0.024, P = 0.124). Others have proposed that slow ribosome progression at the N
Choice of codon and Expression
B
16x
E
16x
4x
4x
1x
1x
1 4x
1 4x
1 16x
∆∆ G from average codon variant
Fold Change in Protein Expression
A
1 16x
10
5
0
-5
-10
> -10% -8%
-4%
0%
+4%
+8% > +10%
> -6
n=277 803
1819
2431
2058
861
n=89 232
Change in %GC
from average variant
190
-6
-4
-2
0
+2
+4
675 1676 2800 2099 692
> +4
176
∆∆ G from average variant
(kcal/mol)
-15
R2 = 0.05
ssion
R2 = 0.87
ssion
n
G)
sion
n
Rare codons alter expression by reducing mRNA secondary structure. (A) Expression changes are
D boxplot includes ±2% of centered value.F(B)
correlatedCwith relative changes in %GC content. Each
Expression increases
correlate to relative increases in free2xenergy of folding at the front of the transcript
2x
(ΔΔG). Each boxplot includes ±2 kcal/mol of centered value.
16x
-0
p < 10−16
∆∆ G
1 16x
1 16x
-10
Choice
of codon and
Expression
∆∆ G from average variant
Change in %GC
> -10% -8%
-4%
0%
+4%
+8% > +10%
> -6
n=277 803
1819
2431
2058
861
n=89 232
190
-6
-4
0
+2
+4
> +4
675 1676 2800 2099 692
D
-15
F
-0.2
2x
2
R = 0.87
R2 = 0.05
1.5x
p < 10−16
1x
0.75x
∆tAI fro
∆tAI=0
Fold Change in Protein Expression
Mean fold change in expression
due to codon substitution
(After Controlling for ∆∆G)
Mean fold change in expression
due to codon substitution
2x
1.5x
176
(kcal/mol)
from average variant
C
-2
p = 0.197
1x
0.75x
16x
p < 10−16
R2 = 0.39
4x
1x
1 4x
1 16x
-1
0
1
2
Codon substitution, ∆∆G
(kcal/mol)
-3
-2
-1
0
log2 RSCU
1
-10
-5
0
∆∆G from average codo
linearslopes
regression,
there is no longe
3. alter
Rareexpression
codons alter
expression
reducing
mRNA (C)
secondary
RareFig.
codons
by reducing
mRNAby
secondary
structure.
Individual codon
correlate
slopes
and RSCUlinear
(compare with Fig.
are correlated(D)
withAfter
relative
changes in
with structure.
the ΔΔG (A)
per Expression
individual changes
codon substitution.
controlling
for%GC
ΔΔG with
a multiple
regression,
no longer
any relation
slopes
and RSCUplotted for all constructs within the qu
content.there
Eachisboxplot
includes
T2% ofbetween
centeredindividual
value. (B)codon
Expression
increases
correlate to relative increases in free energy of folding at the front of the
transcript (DDG). Each boxplot includes T2 kcal/mol of centered value. (C)
Individual codon slopes (same as Fig. 2A y axis) correlate with the DDG per
their relative fold change in expressi
the set. (F) Subsets of constructs corre
Points with constant codon adaptatio
imple linear regression model
se of each individual synonyh expression changes (Fig. 2A
most amino acids, we found a
arity of the codon and increased
B). There is a strong correlation
+2
E
NUPACK (24). We found that increased secondary structure was correlated with decreased expression, which explained more variation than any
other variable we measured (R2 = 0.34, P < 2 ×
10−16) (Fig. 3B). We made a similar linear regression model relating individual codon substitution
Choice of codon and Expression
10
∆∆ G from average codon variant
Subset
∆∆ G=0
5
∆ tAI=0
0
Fold Change in
Protein Level
< 1 16x
-5
1 16x
1 4x
1x
-10
+4
099 692
4x
> +4
16x
176
> 16x
ge variant
The classical translational
efficiency cTEi for each
codon is the tRNA
adaptation index (tAI)
-15
F
-0.2
0.0
∆tAI from average codon variant
Wi –absolute adaptiveness
ni - the number of tRNA
isoacceptors recognizing codon i
sij - a selective constraint on the
efficiency of the codon-anticodon
coupling
tGCNij – gene copy number of the
tRNA
0.2
in Expression
(E) The ΔΔG versus
change in tAI is plotted for
all constructs within the quantitative range. Constructs
∆tAI=0
∆∆G=0
are colored by their relative fold change in expression from the average codon variant within the set.
16x
4x
p < 10−16
p = 0.773
R2 = 0.39
R2 = 0
1x
-10
-6
9 232
-4
-2
0
+2
+4
675 1676 2800 2099 692
16x
176
> 16x
Choice of codon and Expression
-15
F
Fold Change in Protein Expression
p = 0.197
3
> +4
∆∆ G from average variant
(kcal/mol)
R2 = 0.05
4x
-0.2
F
16x
0.0
∆tAI from average codon variant
∆tAI=0
0.2
∆∆G=0
p < 10−16
p = 0.773
R2 = 0.39
R2 = 0
4x
1x
1 4x
1 16x
-2
-1
0
log2 RSCU
1
-10
-5
0
5 -0.2
-0.1
0.0
0.1
∆∆G from average codon variant ∆ tAI from average codon variant
regression,
there is notolonger
any relation
individual
codon
cing mRNA secondary
(F) Subsets linear
of constructs
corresponding
the shaded
boxes inbetween
(E). (Left)
Points with
constant codon
varied
structure,
points
withDDG
constant
secondary
slopes
and secondary
RSCU (compare
with (right)
Fig. 2B).
(E) The
versus
change instructure
tAI is and varied
relative changesadaptation
in %GC and
codon adaptation.
plotted for all constructs within the quantitative range. Constructs are colored by
. (B) Expression increases
ding at the front of the their relative fold change in expression from the average codon variant within
l of centered value. (C) the set. (F) Subsets of constructs corresponding to the shaded boxes in (E). (Left)
relate with the DDG per Points with constant codon adaptation and varied secondary structure, (right)
or DDG with a multiple points with constant secondary structure and varied codon adaptation.
left to uncover, and the extent to which codon
usage beyond the N-terminal region alters gene
expression remains unresolved (8, 14).
The N terminus of genes in almost all bacteria
displays reduced secondary structure, but enrichment of poorly adapted N-terminal codons is
only found in bacteria with GC content of at least
50% (3). Recent work further shows that AT-rich
would expect a disproportionate enrichment of A
over T due to G-U wobble pairing. Indeed, nucleotide triplets with A at the wobble position were
more consistently correlated with expression in our
data set and with enrichment at the N terminus of
E. coli genes (fig. S11).
Kudla et al. show that local RNA structure in
the region between –4 and +38 of translation start
References and Notes
1. J. B. Plotkin, G. Kudla, Nat. Rev. Genet. 12, 32–42 (2011).
2. T. Tuller et al., Cell 141, 344–354 (2010).
3. M. Allert, J. C. Cox, H. W. Hellinga, J. Mol. Biol. 402,
905–918 (2010).
4. S. Pechmann, J. Frydman, Nat. Struct. Mol. Biol. 20,
237–243 (2013).
5. K. Bentele, P. Saffert, R. Rauscher, Z. Ignatova,
N. Blüthgen, Mol. Syst. Biol. 9, 675 (2013).
6. G.-W. Li, E. Oh, J. S. Weissman, Nature 484, 538–541 (2012).
7. W. Gu, T. Zhou, C. O. Wilke, PLOS Comput. Biol. 6,
e1000664 (2010).
8. G. Kudla, A. W. Murray, D. Tollervey, J. B. Plotkin, Science
324, 255–258 (2009).
9. M. dos Reis, R. Savva, L. Wernisch, Nucleic Acids Res. 32,
5036–5044 (2004).
10. P. Shah, Y. Ding, M. Niemczyk, G. Kudla, J. B. Plotkin,
Cell 153, 1589–1601 (2013).
11. M. Welch et al., PLOS ONE 4, e7002 (2009).
12. M. Zhou et al., Nature 495, 111–115 (2013).
13. S. Navon, Y. Pilpel, Genome Biol. 12, R12 (2011).
14. T. Tuller, Y. Y. Waldman, M. Kupiec, E. Ruppin, Proc. Natl.
Acad. Sci. U.S.A. 107, 3645–3650 (2010).
15. A. R. Subramaniam, T. Pan, P. Cluzel, Proc. Natl. Acad.
Sci. U.S.A. 110, 2419–2424 (2013).
16. E. M. LeProust et al., Nucleic Acids Res. 38, 2522–2540
(2010).
17. J.-D.Pédelacq , S. E. P. Cabantous, T. Tran,
T. C. Terwilliger, G. S. Waldo, Nat. Biotechnol. 24, 79–88
(2006).
18. N. C. Shaner et al., Nat. Biotechnol. 22, 1567–1572 (2004).
19. S. Kosuri et al., Proc. Natl. Acad. Sci. U.S.A. 110,
14024–14029 (2013).
20. Y. Yamazaki, H. Niki, J.-I. Kato, Methods Mol. Biol. 416,
385–389 (2008).
21. D. L. Hartl, E. N. Moriyama, S. A. Sawyer, Genetics 138,
227–234 (1994).
22. M. Gouy, C. Gautier, Nucleic Acids Res. 10, 7055–7074
(1982).
23. P. M. Sharp, W. H. Li, Nucleic Acids Res. 15, 1281–1295
(1987).
24. J. N. Zadeh et al., J. Comput. Chem. 32, 170–173 (2011).
25. D. Voges, M. Watzele, C. Nemetz, S. Wizemann,
B. Buchberger, Biochem. Biophys. Res. Commun. 318,
601–614 (2004).
26. M. H. de Smit, J. van Duin, Proc. Natl. Acad. Sci. U.S.A.
87, 7668–7672 (1990).
27. J. R. Coleman et al., Science 320, 1784–1787 (2008).
28. A. A. Komar, Trends Biochem. Sci. 34, 16–24 (2009).
Hybridization probabilities and Expression
RBS
variable codons
GFP
+30%
Window Relative Hybridization Probability
+20%
+10%
A multiple linear regression
model that combines promoter
and RBS choice, as well as Nterminal secondary structure
and GC content, still explains
only 54% of variation in
expression levels
0%
-10%
-20%
Mean +/- 1 SD
Best 5% of constructs
(422 Constructs)
Worst 5% of constructs
(422 Constructs)
-30%
-log 10 (p-value)
Fig. 4. mRNA structure
downstream of start codon is most correlated
withreducedexpression.
Relative hybridization probabilities averaged in 10nuclotide windows are
plotted against their correlation with expression
change as a function of
position (–20 to +60 from
ATG). (Top) The best and
worst 5% of constructs—
as ranked by relative expression within a codon
variant set—are grouped
and plotted as blue and
red ribbons, respectively.
The ribbon tops and bottoms are one standard deviation from the mean,
which is shown as a solid
line. (Bottom) The P value
for linear regressions, correlating hybridization probabilities within each window
to expression fold change
in all constructs.
Hybridization probabilities was
calculated using UNAFOLD
Acknowledgments: We thank J. C. Way, E. R. Daugharthy,
and software
R. T. Sauer for comments. The research was supported
150
100
50
0
-20
0
20
40
10-bp Windows across transcript
60
by the U.S. Department of Energy (DE-FG02-02ER63445 to
G.M.C.), NSF SynBERC (SA5283-11210 to G.M.C.), Office of
Naval Research (N000141010144 to G.M.C. and S.K.), Agilent
mRNA structure downstream of start codon is most correlated with reduced expression. (Top) The best and worst 5% of
478
25 OCTOBER 2013 VOL 342 SCIENCE www.sciencemag.org
constructs
as ranked by relative
expression within a codon variant set are grouped and plotted as blue and red ribbons,
respectively. The ribbon tops and bottoms are one standard de- viation from the mean, which is shown as a solid line.
(Bottom) The p-value for linear regressions, correlating hybridization probabilities within each window to expression fold
change in all constructs.
Conclusion
•  Using N-terminal rare codons instead of common ones
increase expression by ~14 fold (median 4 fold)
•  Reduced RNA structure, not codon rarity itself, is
responsible for expression increases
•  This knowledge helps better predict how to
synthesize genes that make enzymes or drugs in
a bacterial cell.
Reference
Daniel B. Goodman, George M. Church, Sriram Kosuri
“Causes and Effects of N-Terminal Codon Bias in
Bacterial Genes“
Science. 2013 Oct 25;342(6157):475-9
THANK YOU
Flow Cytometry Controls
–16) with individually
Fig. S5.
All Flow
Cytometryratio
Controls.
fluorescence
ratio measured
FlowSeq
estimates
of fluorescence
correlateFlowSeq
well (R2 =estimates
0.955, p <of
2×10
2
–16
fluorescence
ratios(R
from=sequence-verified
51 individually
constructs thatmeasured
were outside
of the quantitative
correlate well
0.955, p < 2×10clones.
) with
fluorescence
ratios FlowSeq
range
aresequence-verified
shown in red.
from
clones. 51 constructs that were outside of the quantitative
FlowSeq range are shown in red.