Validation of multivariate optimization theories for directed

Supplementary Notes
PCK110700
Codexis, Confidential
Mutations Present in the Final Population of Variants
Figure 3 Sequences

Positions 1-50
WT        (1)    MSTAIVTNVKHFGGMGSALRLSEAGHTVACHDESFKQKDELEAFAETYPQ
Round 3   (1)    MSTAIVTNVKHFGGMGSALRLSEAGHTVACHDESFKHKDELEAFAETYPQ
Round 9   (1)    MSTAIVTNVKHFGGMGSALRLSEAGHTVACHDESFKHQDELEAFAETYPQ
Round 17  (1)    MSTAIVTNVKHFGGMGSALRLSEAGHTVACHDESFKHQDELEAFAETYPQ
Round 18  (1)    MSTAIVTNVKHFGGMGSALRLSEAGHTVACHDESFKHQDELEAFAETYPQ

Positions 51-100
WT        (51)   LKPMSEQEPAELIEAVTSAYGQVDVLVSNDIFAPEFQPIDKYAVEDYRGA
Round 3   (51)   LKPMSEQEPAELIEAVTSAFGQVDVLVSNDIFALEFRPIDKYAVEDYRGA
Round 9   (51)   LIPMSEQEPAELIEAVTSALGHVDVLVSNDIAPVEWRPIDKYAVEDYRDT
Round 17  (51)   LIPMSEQEPAELIEAVNSALGHVDILVSNDIAPVEWRPIDKYAVEDYRDT
Round 18  (51)   LIPMSEQEPAELIEAVTSALGHVDILVSNDIAPVEWRPIDEYAVEDYRDM

Positions 101-150
WT        (101)  VEALQIRPFALVNAVASQMKKRKSGHIIFITSATPFGPWKELSTYTSARA
Round 3   (101)  VEALQIRPFALVNAVASQMKKRKSGHIIFITSAAPFGPWKELSTYSSARA
Round 9   (101)  VEALQIKPFALVNAVASQMKKRKSGHIIFITSAAPFGPWKELSTYSSARA
Round 17  (101)  VEALQIKPFALANAVATQMKRRKSGHIIFITSAASFGPWKELSTYASARA
Round 18  (101)  VEALQIKPFALANAVASQMKRRKSGHIIFITSAASFGPWKELSTYASARA

Positions 151-200
WT        (151)  GACTLANALSKELGEYNIPVFAIGPNYLHSEDSPYFYPTEPWKTNPEHVA
Round 3   (151)  GASALANALSKELGEYNIPVFAIGPNYLHSEDSPYYYPTEPWKINPEHVA
Round 9   (151)  GASALANALSKELGEYNIPVFAIAPNYLHSGDSPYYYPSEPWKTSPEHVA
Round 17  (151)  GASALANALSKELGEYNIPVFAIAPNAVDSGDSPYYYPSEPWKTSPEHVA
Round 18  (151)  GASALANALSKELGEYNIPVFAIAPNAMDSGDSPYYYPSEPWKTSPEHVA

Positions 201-254
WT        (201)  HVKKVTALQRLGTQKELGELVAFLASGSCDYLTGQVFWLAGGFPMIERWPGMPE
Round 3   (201)  HVKKVTALQRLGTQKELGELVAFLASGSCDYLTGQVFWLAGGFPVIERWPGMPE
Round 9   (201)  HVRKVTALQRLGTQKELGELVTFLASGSCDYLTGQVFWLAGGFPVIERWPGMPE
Round 17  (201)  WVRKYTALQRLGTQKELGELVTFLASGSCDYLTGQVFWLAGGFPVVERWPGMPE
Round 18  (201)  WVRKYTALQRLGTQKELGELVTFLASGSCDYLTGQVFWFAGGFPVVERWPGMPE
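
As a reading aid, the substitutions that separate any two rows of the Figure 3 sequences can be listed mechanically. The Python sketch below is illustrative only: the helper name list_substitutions is hypothetical and not part of the original analysis, and it assumes the two sequences are already aligned without gaps. It compares residues 1-50 of the WT and Round 18 rows as printed in Figure 3.

```python
def list_substitutions(reference, variant, start=1):
    """Return substitutions such as 'Q37H' between two equal-length, pre-aligned sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{var}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=start)
        if ref != var
    ]


# Residues 1-50 copied from the WT and Round 18 rows of Figure 3.
wt_1_50 = "MSTAIVTNVKHFGGMGSALRLSEAGHTVACHDESFKQKDELEAFAETYPQ"
round18_1_50 = "MSTAIVTNVKHFGGMGSALRLSEAGHTVACHDESFKHQDELEAFAETYPQ"

print(list_substitutions(wt_1_50, round18_1_50))  # ['Q37H', 'K38Q']
```

For later blocks of the alignment, the start argument would be set to the first residue number of the block (51, 101, and so on).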
Detailed Description of a Round
To give a fuller picture of the decision-making process used in the ProSAR-driven
methodology, we describe here our 14th round of evolution. We had completed ProSAR
analysis on two previous libraries, 12-1 and 12-2 (round 13 was still under analysis).
The best variant from these libraries, with a 1.2-fold improvement over the round 12
parent, was chosen as the parent for the next set of libraries. The ProSAR model from
library 12-1 was of relatively low quality (r = 0.31, p = 9.37 × 10^-3, where r is the
leave-one-out cross-validated correlation coefficient and p is the frequency of observing
such a correlation by chance alone given the null hypothesis of no correlation), so
regression coefficients were not weighted heavily in decision making, and all mutations
that appeared potentially beneficial were included in the next round, giving seven
mutations of interest. (The magnitudes of the regression coefficients are particular to
each library and cannot be meaningfully compared across two models without
normalization.) Two mutations from this library were in the chosen backbone and appeared
potentially detrimental; these positions were allowed to mutate back to their original
residue (flip-out) in the next library. Two other mutations in the backbone appeared
positive, but because of the lack of confidence in this model they were also allowed to
mutate back to the original residue in the next library. All told, 12 of the initial 15
mutations in 12-1 were tested in the next library. Library 12-2 gave a better model
(r = 0.47, p = 2.51 × 10^-4) and revealed four mutations that were either neutral or
beneficial. At this point in the evolution we were running low on mutations of interest,
so we had completed multiple saturation mutagenesis (sat. mut.) libraries within the
binding pocket and at positions that had previously shown influence on activity. These
libraries gave us 18 mutations worth pursuing further. We had also hit-shuffled three of
our best variants and completed ProSAR analysis of this library (Hit Shuffle 15). This
analysis provided an additional five mutations of interest. In total, these libraries
provided 39 mutations to test in further combinatorial libraries, as shown in Tables 3
and 4.
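
The leave-one-out cross-validated correlation coefficient r quoted for each model can be illustrated with a small sketch. This is not the ProSAR implementation: it assumes a plain additive least-squares model on a 0/1 mutation matrix (with a tiny ridge term added only for numerical stability), the variant data are synthetic, and every function name is invented for the example.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(features, activity, ridge=1e-6):
    """Fit activity ~ intercept + sum(coefficient * mutation present) by least squares."""
    rows = [[1.0] + list(r) for r in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, activity)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    """Predicted activity for one variant given the fitted coefficients."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def loo_correlation(features, activity):
    """Refit with each variant held out in turn, then correlate predictions with measurements."""
    held_out = []
    for i in range(len(activity)):
        coef = fit(features[:i] + features[i + 1:], activity[:i] + activity[i + 1:])
        held_out.append(predict(coef, features[i]))
    return pearson(activity, held_out)


# Synthetic library (hypothetical): 40 variants, 4 mutations, additive effects plus noise.
random.seed(0)
true_effects = [0.4, -0.3, 0.2, 0.0]
features = [[random.randint(0, 1) for _ in true_effects] for _ in range(40)]
activity = [1.0 + sum(e * x for e, x in zip(true_effects, row)) + random.gauss(0, 0.1)
            for row in features]
print(round(loo_correlation(features, activity), 2))
```

Because every prediction comes from a model that never saw the held-out variant, r reflects predictive quality rather than goodness of fit, which is why a low value argues against weighting the coefficients heavily.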
We split these mutations into two libraries: 14-1 with 19 mutations and 14-2 with 20
mutations. The sequence of the backbone and the oligonucleotides used to construct the
libraries are listed after Tables 3 and 4. Both libraries were analyzed with ProSAR and
gave relatively high-quality models (14-1: r = 0.58, p = 7.9 × 10^-6; 14-2: r = 0.71,
p = 4.55 × 10^-10). The next library's parent came from 14-1 and had three mutations with
high regression coefficients. Four of the mutations in the parent had negative regression
coefficients and so were allowed to mutate back to the original residue in the next
library. Two more mutations were positive but not in the backbone, so they were included
in the next library. Library 14-2 provided 14 mutations that were neutral to beneficial.
All told, this resulted in three positive mutations fixed in the new backbone, 20
mutations to be tested in the next set of libraries, and 16 mutations removed from
consideration.
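
The mutation bookkeeping in this section can be checked with a few lines of arithmetic. The sketch below only restates counts given in the text; the variable names are arbitrary labels chosen for the example.

```python
# Mutations of interest by source, as stated in the text.
sources = {
    "library 12-1": 12,
    "library 12-2": 4,
    "saturation mutagenesis": 18,
    "Hit Shuffle 15": 5,
}
total = sum(sources.values())

# Split of those mutations across the two round 14 libraries.
library_sizes = {"14-1": 19, "14-2": 20}

# Mutations carried into the next set of libraries: four flip-outs and two
# positive non-backbone mutations from 14-1, plus 14 from 14-2.
carried_forward = 4 + 2 + 14

# Fate of all round 14 mutations after ProSAR analysis.
outcomes = {
    "fixed in new backbone": 3,
    "tested in next libraries": carried_forward,
    "removed from consideration": 16,
}

print(total, sum(library_sizes.values()), sum(outcomes.values()))  # 39 39 39
```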
Library 14-1

Mutation  Previous        Previous     Fold      Notes            Regression   In Next    In Next
          Library         Regression   Improved                   Coefficient  Backbone?  Library?
                          Coefficient
L10K      12-1            -            L, 1.20   mutated back     0.257        yes
D121K     12-1                         D, 1.20   mutated back     -0.27
T152A     12-1            +            T, 1.20   mutated back     -0.33
F177Y     12-1            +            F, 1.20   mutated back     0.152                   no, A was better than Y or F
Q38L      sat. mut.                    1.25                       -0.12        yes        flip-out
S78N      sat. mut.                    1.25                       -0.010
T100M     sat. mut.                    1.04                       0.169                   yes
V101I     sat. mut.                    1.70                       -0.31
F177A     sat. mut.                    1.70                       0.59         yes
W238R     sat. mut.                    1.25                       -0.13
T67N      sat. mut.                    1.20                       0.00         yes        flip-out
G181W     sat. mut.                    1.17                       -0.24
V205Y     sat. mut.                    1.16                       -0.11        yes        flip-out
A114Q     sat. mut.                    1.15                       -0.21
D99G      Hit Shuffle 15  0.07         0.98                       0.003                   yes
V112A     Hit Shuffle 15  0.05         0.98                       0.033        yes
W139D     Hit Shuffle 15  0.08         0.98                       -0.44
N176R     Hit Shuffle 15  0.06         0.98                       -0.02
W238C     Hit Shuffle 15  0.03         0.96                       -0.12        yes        flip-out

Fold Improved: fold improvement of the highest activity variant with the mutation.

Table 3 – 14-1 Library Design. The source of each mutation is given by the previous library it was
observed in along with any regression coefficient information from ProSAR analysis. In some cases
mutations present in the backbone were allowed to vary back to the previous residue (mutated back)
because we were unsure about their impact on function or believed the mutation may be deleterious.
The regression coefficient for the mutation in the context of the new library is given along with an
indication of its presence in the new backbone and whether it is part of the next round library design.
Library 14-2

Mutation  Previous        Previous     Fold      Notes            Regression   In Next    In Next
          Library         Regression   Improved                   Coefficient  Backbone?  Library?
                          Coefficient
T152A     12-1            +            T, 1.2    mutated back     -0.030                  yes
E95G      12-1            +            0.92                       -0.120
D121E     12-1            +            0.89                       0.323                   yes
V202L     12-1            +            1.17                       -0.500
V245A     12-1            +            1.01                       -0.990
P135S     12-2            0.001        1.14                       0.651
M252V     12-2            -0.001       1.00                       -0.050                  yes
E40V      12-2            0.066        1.12                       -0.260
A60V      12-2            0.090        1.02      random mutation  0.051                   yes
R87Q      12-1            +            0.93                       -0.320
S146A     12-1            +            0.93                       0.132                   yes
T100A     12-1            +            1.12      random mutation  0.068                   yes
S180T     sat. mut.                    1.29                       0.499                   yes
T144S     sat. mut.                    1.09                       0.166                   yes
G251E     sat. mut.                    1.04                       0.159                   yes
M54I      sat. mut.                    1.03                       -0.020                  yes
D121R     sat. mut.                    1.03                       0.119                   yes
G251S     sat. mut.                    1.18                       0.087                   yes
W238T     sat. mut.                    1.01                       1.259                   yes
I52T      sat. mut.                    1.01                       -0.260                  yes

Fold Improved: fold improvement of the highest activity variant with the mutation.

Table 4 – 14-2 Library Design. The source of each mutation is given by the previous library it was
observed in along with any regression coefficient information from ProSAR analysis. In some cases
mutations present in the backbone were allowed to vary back to the previous residue (mutated back)
because we were unsure about their impact on function or believed the mutation may be deleterious.
In some cases random mutations appeared in the combinatorial library and were included in the next
library design when they appeared potentially beneficial. The regression coefficient for the mutation
in the context of the new library is given along with an indication of its presence in the new backbone
and whether it is part of the next round library design.
Round 14 Backbone and Oligonucleotides Used in Library Construction
Each oligonucleotide listed covers a defined region of the backbone and the set of mutations desired in
that region. In some cases, multiple oligonucleotides were required to allow for all combinations of
mutations in a targeted region; e.g., V112A and A114Q are collectively coded by two oligos
(aagccatttgctctagyaaatgccgtcgcttcgcaaatg and aagccatttgctctagyaaatcaggtcgcttcgcaaatg). We do not
indicate which mutation is carried by a particular oligonucleotide, although this information can be
deduced by inspection.
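
The "deduction by inspection" can also be done mechanically: expand each degenerate base (for example y = c or t) and translate every resulting sequence. The sketch below is an illustration, not part of the original workflow. It uses the standard IUPAC nucleotide codes and the standard genetic code, and it assumes each oligo starts on a codon boundary. For the two V112A/A114Q oligos that assumption yields peptides matching the KPFAL...SQM stretch of the Figure 3 sequences. All function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes (lower case, as in the oligo listings).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}

# Standard genetic code, built from the conventional t/c/a/g ordering.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(oligo):
    """Return every unambiguous sequence encoded by a degenerate oligo."""
    return ["".join(choice) for choice in product(*(IUPAC[b] for b in oligo.lower()))]


def translate(dna):
    """Translate from the first base, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def encoded_peptides(oligo):
    """Sorted, de-duplicated peptides encoded by all expansions of the oligo."""
    return sorted({translate(seq) for seq in expand(oligo)})


# The two oligos quoted in the text for V112A and A114Q.
oligo_a = "aagccatttgctctagyaaatgccgtcgcttcgcaaatg"
oligo_b = "aagccatttgctctagyaaatcaggtcgcttcgcaaatg"
print(encoded_peptides(oligo_a))  # ['KPFALANAVASQM', 'KPFALVNAVASQM']
print(encoded_peptides(oligo_b))  # ['KPFALANQVASQM', 'KPFALVNQVASQM']
```

Read this way, the degenerate codon gya in both oligos supplies V or A at position 112, while the first oligo keeps A at position 114 and the second carries Q.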
Round 14 Backbone:
atgagcaccgctattgtcaccaacgtcctgcattttggaggtatgggtagcgctctgcgtctgagcgaagctggtcata
ccgtcgcttgccatgatgaaagctttaagcatcaggatgaactagaagcttttgctgaaacctacccacagctgatacc
aatgagcgaacaggaaccagctgaactgattgaagctgtcaccagcgcccttggtcatgtcgatatcctggtcagcaac
gatatcgcgcctgtggaatggcggccaatcgataaatacgctgtcgaggattacagggatactgtcgaagctctgcaga
tcaagccatttgctctagtgaatgctgtcgcttcgcaaatgaaggatcgaaagtcggggcacatcatcttcatcacttc
ggctgccccgttcgggccatggaaggagctatcgacttactcttcggctcgagctgggaccagtgcactagctaatgct
ctatcgaaggagctaggagagtacaatatcccggtgttcgctatcgctccgaattttctagactcgggggattcgccgt
actattacccctctgagccgtggaagacttctccggagcacgtggctcacgtgcgtaaggtgactgctctacaacgact
agggactcaaaaagagttgggggaattggtgacgtttttggcatctggctcttgtgattatttgactggccaggtgttt
tggttggcaggcggctttcccgttgtagagcgttggcccggcatgcccgaataatga
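
Translating the start of the backbone and comparing it with the WT row of Figure 3 is a quick consistency check. The sketch below is illustrative only: it uses the standard genetic code, takes just the first printed line of the backbone, and the helper name translate is hypothetical.

```python
# Standard genetic code, built from the conventional t/c/a/g ordering.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First printed line of the Round 14 backbone (79 bases, so 26 complete codons).
backbone_start = (
    "atgagcaccgctattgtcaccaacgtcctgcattttggaggtatgggtagcgctctgcgtctgagcgaagctggtcata"
)
# Residues 1-26 of the WT row in Figure 3.
wt_start = "MSTAIVTNVKHFGGMGSALRLSEAGH"

protein_start = translate(backbone_start)
differences = [
    f"{w}{pos}{b}"
    for pos, (w, b) in enumerate(zip(wt_start, protein_start), start=1)
    if w != b
]
print(protein_start)  # MSTAIVTNVLHFGGMGSALRLSEAGH
print(differences)    # ['K10L']
```

The single difference in this stretch, a leucine at position 10, corresponds to the position that the L10K oligo in library 14-1 allows to mutate back to lysine.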
14-1 Oligos:
attgtcaccaacgtcaagcattttggaggtatg (L10K)
gaaagctttaagcatctggatgaactagaagct (Q38L)
ctgattgaagctgtcaatagcgcccttggtcat (T67N)
gtcgatatcctggtcaacaacgatatcgcgcct (S78N)
gtcgaggattacagggrcaygrtcgaagctctgcagatc (D99G, T100M, V101I)
aagccatttgctctagyaaatgccgtcgcttcgcaaatg (V112A, A114Q)
aagccatttgctctagyaaatcaggtcgcttcgcaaatg (V112A, A114Q)
gcttcgcaaatgaagaaacgaaagtcggggcac (D121K)
gccccgttcgggccagataaggagctatcgact (W139D)
tcggctcgagctggggcgagtgcactagctaat (T152A)
ttcgctatcgctccgcgttwtctagactcgkgggattcgccgtactat (N176R, F177YA, G181W)
ttcgctatcgctccgcgtgccctagactcgkgggattcgccgtactat (N176R, F177YA, G181W)
ttcgctatcgctccgaactwtctagactcgkgggattcgccgtactat (N176R, F177YA, G181W)
ttcgctatcgctccgaacgccctagactcgkgggattcgccgtactat (N176R, F177YA, G181W)
gctcacgtgcgtaagtacactgctctacaacga (V205Y)
actggccaggtgtttygtttggcaggcggcttt (W238CR)
14-2 Oligos:
tttaagcatcaggatgtgctagaagcttttgct (E40V)
acctacccacagctgaytccaatkagcgaacaggaacca (I52T, M54I)
agcgaacaggaaccagttgaactgattgaagct (A60V)
gcgcctgtggaatggcaaccaatcgataaatac (R87Q)
atcgataaatacgctgtcggcgattacagggat (E95G)
gattacagggatgccgtcgaagctctgcagatc (T100A)
gcttcgcaaatgaaggaacgaaagtcggggcac (D121RE)
gcttcgcaaatgaagcgccgaaagtcggggcac (D121RE)
atcacttcggctgccagcttcgggccatggaag (P135S)
tggaaggagctatcgasttackcttcggctcgagctggg (T144S, S146A)
tcggctcgagctggggccagtgcactagctaat (T152A)
gagcacgtggctcacctgcgtaaggtgactgct (V202L)
actggccaggtgtttactttggcaggcggcttt (W238T)
gcaggcggctttcccgcggtagagcgttggccc (V245A)
gtagagcgttggcccrgcrtgcccgaataa (G251SE, M252V)
gtagagcgttggcccgaartgcccgaataa (G251SE, M252V)