Supplemental Information - David R. Liu

S1
Supplementary Information
An In Vitro Translation, Selection, and Amplification System for Peptide Nucleic Acids
Yevgeny Brudno, Michael E. Birnbaum, Ralph E. Kleiner, David R. Liu*
Department of Chemistry and Chemical Biology and the Howard Hughes Medical Institute
Harvard University, 12 Oxford Street, Cambridge, MA 02138
Supplementary Methods
General Methods. Unless otherwise noted, all commercially available reagents and solvents
were purchased from Aldrich and used without further purification. All PNA monomers were
purchased from ASM Research. Unless otherwise noted, all commercially available enzymes
and buffers were purchased from New England Biolabs (NEB). Unless otherwise specified, all
reactions were conducted at room temperature.
Preparation of Individual DNA Templates. DNA templates for polymerization experiments in
main text Figures 2 and 3 and Supplementary Figure 1 were synthesized with a 5’-MMT-aminodT-CE phosphoramidite (Glen Research, Cat No. 10-1932) at the 5’ terminus. Purification of
DNA templates was carried out by (i) deprotection with 1:1 ammonium hydroxide: methylamine
for 15 min at 65 ˚C; (ii) reverse-phase HPLC purification of the MMT-containing fraction using
a [8% acetonitrile in 0.1 M TEAA, pH 7] to [40% acetonitrile in 0.1 M TEAA, pH 7] solvent
gradient with a column temperature of 45˚C; (iii) MMT cleavage with 3% trifluoroacetic acid in
water for 10 min at room temperature; (iv) reverse-phase HPLC purification of the MMTdeprotected product. Samples were quantitated by UV spectroscopy using a Nanodrop ND1000
spectrophotometer.
DNA Templates in Supplementary Figure 1B:
gaat: NGCGACGGTATACCGTCGCAATTCATTCATTCATTCATTCATTCATTCATTCATTCATTC
gact: NGCGACGGTATACCGTCGCAAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTC
gagt: NGCGACGGTATACCGTCGCAACTCACTCACTCACTCACTCACTCACTCACTCACTCACTC
gcgt: NGCGACGGTATACCGTCGCAACGCACGCACGCACGCACGCACGCACGCACGCACGCACGC
ggat: NGCGACGGTATACCGTCGCAATCCATCCATCCATCCATCCATCCATCCATCCATCCATCC
Nature Chemical Biology: doi: 10.1038/nchembio.280
S2
ggct: NGCGACGGTATACCGTCGCAAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCCAGCC
gggt: NGCGACGGTATACCGTCGCAACCCACCCACCCACCCACCCACCCACCCACCCACCCACCC
gcct: NGCGACGGTATACCGTCGCAAGGCAGGCAGGCAGGCAGGCAGGCAGGCAGGCAGGCAGGC
DNA Templates in Supplementary Figure 1C:
– lane (25 °C): NGCGACGGTTCCCCGTCGCAACCTTACCTTACCTTACCTTACCTTACCTTACCTT
aaggt: NGCGACGGTTCCCCGTCGCAACCTTACCTTACCTTACCTTACCTTACCTTACCTT
agagt: NGCGACGGTTCCCCGTCGCAACTCTACTCTACTCTACTCTACTCTACTCTACTCT
caagt: NGCGACGGTTCCCCGTCGCAACTTGACTTGACTTGACTTGACTTGACTTGACTTG
cttgt: NGCGACGGTTCCCCGTCGCAACAAGACAAGACAAGACAAGACAAGACAAGACAAG
agcat: NGCGACGGTTCCCCGTCGCAATGCTATGCTATGCTATGCTATGCTATGCTATGCT
cacat: NGCGACGGTTCCCCGTCGCAATGTGATGTGATGTGATGTGATGTGATGTGATGTG
cagtt: NGCGACGGTTCCCCGTCGCAAACTGAACTGAACTGAACTGAACTGAACTGAACTG
ggatt: NGCGACGGTTCCCCGTCGCAAATCCAATCCAATCCAATCCAATCCAATCCAATCC
ccatt: NGCGACGGTTCCCCGTCGCAAATGGAATGGAATGGAATGGAATGGAATGGAATGG
ctact: NGCGACGGTTCCCCGTCGCAAGTAGAGTAGAGTAGAGTAGAGTAGAGTAGAGTAG
DNA Templates in Figure 2:
acact: NGCGACGGTTCCCCGTCGCAAGTGTAGTGTAGTGTAGTGTAGTGTAGTGTAGTGTAGTGTAGTGTAGTGT
accat: NGCGACGGTTCCCCGTCGCAATGGTATGGTATGGTATGGTATGGTATGGTATGGTATGGTATGGTATGGT
acgtt: NGCGACGGTTCCCCGTCGCAAACGTAACCAAACGTAACCAAACGTAACCAAACGTAACCAAACGTAACCA
caact: NGCGACGGTTCCCCGTCGCAAGTTGAGTTGAGTTGAGTTGAGTTGAGTTGAGTTGAGTTGAGTTGAGTTG
cacat: NGCGACGGTTCCCCGTCGCAATGTGATGTGATGTGATGTGATGTGATGTGATGTGATGTGATGTGATGTG
cagtt: NGCGACGGTTCCCCGTCGCAAACTGAACTGAACTGAACTGAACTGAACTGAACTGAACTGAACTGAACTG
gtact: NGCGACGGTTCCCCGTCGCAAGTACAACCAAGTACAACCAAGTACAACCAAGTACAACCAAGTACAACCA
gtcat: NGCGACGGTTCCCCGTCGCAATGACAACCAATGACAACCAATGACAACCAATGACAACCAATGACAACCA
gtgtt: NGCGACGGTTCCCCGTCGCAAACACAACACAACACAACACAACACAACACAACACAACACAACACAACAC
tgact: NGCGACGGTTCCCCGTCGCAAGTCAAGTCAAGTCAAGTCAAGTCAAGTCAAGTCAAGTCAAGTCAAGTCA
tgcat: NGCGACGGTTCCCCGTCGCAATGCAAACCAATGCAAACCAATGCAAACCAATGCAAACCAATGCAAACCA
tggtt: NGCGACGGTTCCCCGTCGCAATGCAAACCAAACCAAACCAAACCAAACCAAACCAAACCAAACCAAACCA
DNA Templates in Figure 3:
top-left gel:
NGCGACGGTTCCCCGTCGCAAGTGTATGGTAACGTAGTTGATGTGAACTG
top-right gel:
NGCGACGGTTCCCCGTCGCAAGTTGAGTGTAACGTAACTGATGGTATGTG
bottom-left gel:
NGCGACGGTTCCCCGTCGCAAGTACATGACAACACAGTCAATGCAAACCA
bottom-right gel: NGCGACGGTTCCCCGTCGCAATGCAATGACAACCAAGTACAACACAGTCA
Nature Chemical Biology: doi: 10.1038/nchembio.280
S3
Synthesis of PNA Aldehyde Building Blocks. PNA pentamer aldehyde building blocks were
synthesized using preloaded H-Thr-Gly-NovaSyn TG resin (Novabiochem) and standard
automated Fmoc solid-phase peptide synthesis on an Applied Biosystems 433A peptide
synthesizer by adapting previously described protocols1,2. PNA pentamer aldehydes were
purified by reverse-phase high-pressure liquid chromatography (HPLC) using a C18 stationary
phase and an acetonitrile/0.1% trifluoroacetic acid (TFA) gradient, characterized by ESI or
MALDI-TOF mass spectrometry and quantitated on a Nanodrop ND1000 UV
spectrophotometer.
Building Block
Expected Mass (Da)
Observed Mass (Da)
aaggt
1400.6
1400.8
agagt
1400.6
1400.8
caagt
1360.6
1361.0
cttgt
1342.6
1342.8
agcat
1360.6
1361.0
cacat
1320.6
1320.8
cagtt
1351.6
1351.8
ggatt
1391.6
1391.8
ccatt
1311.6
1311.8
ctact
1311.6
1311.8
gaat
1109.4
1109.4
gagt
1125.4
1125.4
ggat
1125.4
1125.4
gggt
1141.4
1141.4
ggct
1101.4
1101.4
gcgt
1101.4
1101.4
gact
1085.4
1085.4
gcct
1061.4
1061.4
gaat
1109.4
1109.4
Building Block
Expected Mass (Da)
Observed Mass (Da)
acact
1320.6
1322.5
accat
1320.6
1322.9
Nature Chemical Biology: doi: 10.1038/nchembio.280
S4
acgtt
1351.6
1352.8
caact
1320.6
1322.0
cacat
1320.6
1322.9
cagtt
1351.6
1352.3
tgact
1351.6
1352.9
tgcat
1351.6
1352.9
tggtt
1382.6
1383.6
gtact
1351.6
1352.0
gtcat
1351.6
1355.8
gtgtt
1382.6
1385.3
biotin-ggatt
1618.6
1618.5
DNA Template Library Synthesis. The library of DNA templates was synthesized with intact
5’-DMT groups using the following sequence:
TCCGACCGAATTCCTGGCTCGGAAA68A68A68A68A68A68A68A68AAAAAACCGAGT
GGGCACGCC where 6 was an equimolar mixture of AC, GT, and TG dinucleotide
phosphoramidites (Chemgenes Corporation) and 8 was an equimolar mixture of AC, GT, TG,
and CA dinucleotide phosphoramidites (Chemgenes Corporation).
The positive control template was similarly synthesized with the sequence:
CGAATTCCTGGCTCGGAAAGTGTAGTGTAGTGTAGTGTAGTGTAGTGTAGTGTAATC
CGGAAAACCGAGTGGGCACGCC. Purification of DNA templates was carried out by (i)
deprotection with 1:1 ammonium hydroxide: methylamine for 15 min at 65 ˚C; (ii) reverse-phase
HPLC purification of the DMT-containing fraction using a [8% acetonitrile in 0.1 M TEAA pH
7] to [40% acetonitrile in 0.1 M TEAA pH 7] solvent gradient with a column temperature of
45˚C; (iii) DMT cleavage with 3% trifluoroacetic acid/ water for 10 min; and (iv) reverse-phase
HPLC purification of the DMT-deprotected product. Samples were quantitated by UV
spectroscopy.
PCR Amplification for Figure 5. The material to be amplified was diluted in 400 µL of
Thermopol buffer, 1 μM primer 1, 1 μM primer 2, 1 mM total dNTPs, and Vent (exo-)
Polymerase (16 U/ 400 μL). Primer 1: 5’ PCGAATTCCTGGCTCGGAAA where P is a 5’
phosphate (Glen Research, 10-1901); Primer 2: 5’ BGGCGTGCCCACTCGG where B is
installed with the 5’ Biotin phosphoramidite (Glen Research 10-5950). The PCR protocol is as
follows: 25 cycles of [30 sec at 94 °C, 30 sec at 55 °C, and 30 sec at 72 °C]. PCR reaction
Nature Chemical Biology: doi: 10.1038/nchembio.280
S5
products were purified using the Qiagen PCR purification kit to remove primers, resulting in 100
µL of purified PCR product per 400 µL of PCR reaction.
Strand Separation After PCR. Streptavidin-coated magnetic particles (200 µL, Roche) were
washed twice with wash buffer and then incubated with 3 mg/mL yeast RNA (Ambion) in wash
buffer for 30 min. To this mixture was added purified product DNA from PCR amplification in
wash buffer. The bead suspension was incubated for 1 hour at room temperature with agitation.
The beads were washed twice with 200 µL of wash buffer. The coding strand was eluted with
100 µL of 0.1 M NaOH and the eluant was neutralized with 25 µL 3 M sodium acetate (pH 5.2).
Samples were either precipitated with ethanol (Figure 5) or concentrated using a YM-30
concentrator column (Millipore, Figure 6). Samples were dissolved in water and quantitated
using UV spectroscopy. Typical yields were 10-20 pmoles of coding strand. The wash buffer
was 100 mM NaCl for the experiment in Figure 5 and 25mM Tris-HCl, 130 mM NaCl, pH 7.4
for selections in Figure 6 and in the Supplementary Figure 3 below.
Ligation of 3’ and 5’ Hairpins. The purified coding strand (~10 pmol) was dissolved in 20 µL
T4 ligase buffer with 1 µM hairpin 1 and 1 µM hairpin 2. Hairpin 1:
5’NTCCGAGCCAGGAATTCGCCCGGGXCTTCTTCCCGGG where N is from the 5’ AminodT-CE phosphoramidite (Glen Research 10-1932) and X is from the Cy3 phosphoramidite (Glen
Research 10-5913). Hairpin 2: 5’PCGAGTGGCTFTCCCACTCGGGCGTGC (for Figure 5) or
5’PCGAGTGGCTFTCCCACTCGTGGCGTGC (for Figure 6 and the Supplementary Figure 3)
where P is a 5’ phosphate (Glen Research, 10-1901) and F is from the fluorescein
phosphoramidite (Glen Research 10-1964). Samples were incubated for 2 hours at 16 ˚C and
purified either by concentration with YM-30 columns (Millipore, Figure 5) or by Minelute
Nucleotide Removal Kit (Qiagen, Figure 6). Library Sequencing. Library sequences from DNA synthesis were amplified by PCR in 100 μL
Amplitaq buffer (Bio-Rad) containing Hotstart Taq polymerase, 1 μM primer 1, and 1 μM
primer 2. Primer 1: 5’ GCCGACCGAATTCCTGGCTCGG; Primer 2: 5’
GGGCACACAGGAACAGCTATGACCATGGCGTGCCCACTC. An initial denaturation step
at 94 °C for 3 min was followed for 25 cycles of [30 sec at 94 °C, 30 sec at 55 °C, and 30 sec at
Nature Chemical Biology: doi: 10.1038/nchembio.280
S6
72 °C]. From the completed PCR reaction 0.5 µL was subjected to TOPO cloning (Stratagene)
using the pCR2.1-TOPO vector (Stratagene) and transformed into Mach 1 chemically competent
cells (Stratagene). Carbenicillin-resistant colonies were picked and subjected to the Templiphi
reaction (GE Healthcare) with the following sequencing primer:
GGGCACACAGGAACAGCTATGACCATGGCGTGCCCACTC
Selection Over Six Rounds of Translation, Selection, and Amplification. To a library of
DNA templates encoding 4.3 x 108 PNA 40-mers containing eight consecutive building blocks
was added 10-2, 10-4, or 10-6 equivalents of a DNA template uniquely containing an AATCC
codon (red) that translates into a biotinylated H2N-ggatt-CHO PNA building block. The positive
control DNA template, but not the other templates, contains an MspI cleavage site. The resulting
library of DNA templates was subjected to two to six iterated rounds of translation, PNA strand
displacement, in vitro selection for binding to immobilized streptavidin, and amplification by
PCR (see Figure 5). Enrichment was measure by MspI digestion of the PCR-amplified DNA
before and after selection. A control selection using 10-2 equivalents of the positive control
template but omitting the biotinylated building block during translation (-biotin lanes) was also
performed.
Nature Chemical Biology: doi: 10.1038/nchembio.280
S7
Supplementary Results
Analysis of Potential PNA Genetic Codes
GC and Pyrimidine Content. DNA codons with equal GC and pyrimidine content were
hypothesized to have similar affinity for their cognate PNA building blocks and thereby similar
polymerization efficiencies. Pools of candidate codons were analyzed for number of Gs and Cs
in their sequence as a fraction of the total length and the number of pyrimidines as a fraction of
the total length.
Sequence Specificity. Based on our previous studies2, we hypothesized that in order to exhibit
sequence-specific polymerization, codons and non-cognate building blocks must differ by at
least one full mismatch (i.e. not a G:T wobble) or two G:T wobbles. Note that because codons
that differ by one base (i.e. AACTT and AACCT) automatically have either different GC or
different pyrimidine contents, codons that meet the GC and pyrimidine content criteria
automatically meet this sequence-specificity criteria.
Out-of-Frame Hybridization. The genetic code should minimize the possibility that building
blocks polymerize out of frame to avoid mistranslation or inhibition of proper translation. From
a possible pool of codons, we implemented a computational screen to elucidate the largest
possible set of codons with no possible out-of-frame hybridization. A codon was chosen at
random from the pool and its cognate building block was checked for the ability to hybridize out
of frame to a template made containing the codon. If the chosen codon failed this test, it was
removed from analysis and another was chosen. If the first codon passed, another codon was
chosen at random and both cognate building blocks were tested for out-of-frame annealing to any
of the possible sequences generated by the two codons together. If possible out-of-frame
annealing was detected, the second codon was discarded and another chosen. If the sequence
analysis suggested no out-of-frame annealing, then another codon was chosen and the process
continued until no codons were left. The analysis was performed multiple times to assure that
the initial order of codons did not affect the analysis.
Nature Chemical Biology: doi: 10.1038/nchembio.280
S8
Ease of Template Library Synthesis. In addition to uniform polymerization efficiency and
sequence specificity, an ideal PNA genetic code enables corresponding template libraries to be
readily synthesized in a single column on a DNA synthesizer without the need to perform
laborious split-and-pool oligonucleotide synthesis. The synthesis of a library template
containing n occurrences of x codons by the split-and-pool method requires dividing synthesis
resin into x equal portions, performing the solid-phase synthesis of a codon on each of the x
portions, pooling the portions, and repeating this process for each of n codon positions in the
library. As n and x increase to even modest numbers, this strategy becomes difficult or
impractical.
In contrast, single-column library synthesis traditionally uses mixtures of A, C, G, and/or
T phosphoramidites to install degenerate bases at specific positions in a DNA strand. The
requirements that we identified above— identical GC and pyrimidine content, at least one nonG:T mismatch or two wobble mismatches between any two codons, and no out-of-frame
annealing— mathematically preclude the use of traditional degenerate bases. The only
degenerate bases that do not vary in pyrimidine content are Y (T or C), or R (A or G); the use of
either, however, necessarily changes GC content and introduces the possibility of two codons
that differ only by a single G:T wobble mismatch. Taken together, this analysis establishes that a
building block set of an acceptable size with predicted uniform and sequence-specific
polymerization, no out of frame annealing, and a readily synthesized template library cannot be
generated using mixtures of A, C, G, and T phosphoramidites.
We hypothesized that a PNA genetic code that supports single-column template library
synthesis while satisfying the requirements of constant GC content, constant pyrimidine content,
and the absence of any two codons that differ only by a single wobble mismatch could be
achieved by using mixtures of dinucleotide (Chemgenes Corporation, Supplementary Table 1) or
trinucleotide (Glen Research, Supplementary Table 2) phosphoramidites. The use of mixtures of
polynucleotide phosphoramidites allows for the single-column synthesis of template libraries
unavailable through other means and can assure conformity to the above rules of genetic code
design as long as each element in the mixture has the same GC and pyrimidine content.
Commercially Available Dinucleotide
Nature Chemical Biology: doi: 10.1038/nchembio.280
S9
Phosphoramidites
DNA codon
GC fraction
pyrimidine fraction
AA
0
0
AG
0.5
0
GA
0.5
0
GG
1
0
AT
0
0.5
TA
0
0.5
AC
0.5
0.5
CA
0.5
0.5
GT
0.5
0.5
TG
0.5
0.5
CG
1
0.5
GC
1
0.5
TT
0
1
CT
0.5
1
TC
0.5
1
CC
1
1
Supplementary Table 1. Commercially available dinucleotide phosphoramidites and their GC
and pyrimidine content.
Commercially Available Trinucleotide Phosphoramidites
DNA sequence
GC fraction
pyrimidine fraction
AAA
0
0
GAA
1/3
0
AAC
1/3
1/3
ATG
1/3
1/3
CAG
2/3
1/3
GAC
2/3
1/3
GGT
2/3
1/3
TGG
2/3
1/3
ACT
1/3
2/3
Nature Chemical Biology: doi: 10.1038/nchembio.280
S10
ATC
1/3
2/3
CAT
1/3
2/3
GTT
1/3
2/3
TAC
1/3
2/3
CGT
2/3
2/3
CTG
2/3
2/3
GCT
2/3
2/3
TGC
2/3
2/3
CCG
1
2/3
TCT
1/3
1
TTC
1/3
1
Supplementary Table 2. Commercially available trinucleotide phosphoramidites and their
respective GC and pyrimidine content.
Analysis of One-Base Codons
In principle, mononucleotide building blocks offer many attractive advantages such as no
possibility for out of frame hybridization, and the greatest possible library complexity for a
polymer of a given length. However, previous attempts to polymerize mononucleotide PNAs
through acylation chemistry revealed that they were prone to cyclize to form six-membered
rings3, which may also be a problem for reductive amination. In addition, mononucleotide PNAs
would violate criteria for uniform GC and pyrimidine content and would presumably differ
significantly in strength of their hybridization with a DNA template.
Analysis of Two-Base Codons
Two-base codons can be achieved through the use of dinucleotide phosphoramidites
(Table S1) all 16 of which are commercially available. One can separate the pool of 16 codons
into candidate sets of codons all of which have identical GC and pyrimidine content. Four
codons comprise their own sets (AA, TT, GG, and CC), and fail the out of frame annealing
analysis. Additionally six sets of two codons (AG/GA, AT/TA, AC/CA, TG/GT, CG/GC, and
CT/TC) reduce to a possible contribution of one building block each (for instance, the building
block for AG would bind out of frame on (GA)n templates, allowing the choice of only one or
the other of the codons). One candidate set contains four codons (AC, GT, TG, and CA).
Nature Chemical Biology: doi: 10.1038/nchembio.280
S11
However the two codons AC and CA become mutually incompatible as do the two codons GT
and TG. Therefore, the maximum possible genetic code comprising two-base codons consists of
two codons (for instance AC and GT).
Analysis of Three-Base Codons
Three-base codons can be achieved through the use of trinucleotide phosphoramidites, 20
of which are commercially available (Table S2). These 20 can separate into candidate sets of
codons all of which have identical GC and pyrimidine content. There are two candidate sets of
four codons (CAG/GAC/GGT/TGG and CGT/CTG/GCT/TGC) and one candidate set of five
codons (ACT/ATC/CAT/GTT/TAC) with the required sequence properties. The candidate set
CGT/CTG/GCT/TGC contains three codons (CTG, GCT and TGC) that are mutually
incompatible for out of frame annealing and of which two must be discarded, leaving a twomember candidate set. The candidate set CAG/GAC/GGT/TGG, contains two codons (GGT and
TGG) which are mutually incompatible and of which one must be discarded, leaving a threemember candidate set. The candidate set ACT/ATC/CAT/GTT/TAC, contains two pairs of
incompatible codons (ACT/TAC and ATC/CAT). As one must be discarded from each pair,
this leaves a three-member candidate set. This analysis demonstrates that for three-base codons
made from trinucleotide phosphoramidites, the largest possible codon set is three.
Three-base codons may also be achieved by using a YX’ or X’Y code where a mixture of
mononucleotide phosphoramidites (Y) is followed or preceded by a mixture of dinucleotide
phosphoramidites (X’). As discussed earlier, the mononucleotide component cannot be varied
since this would create unacceptable changes in the sequence composition. The dinucleotide
portion must conform to the sequence criteria and therefore mixtures of dinucleotide
phosphoramidites, of which the candidate set (AC, GT, TG, and CA) is the biggest. An analysis
of all possible three-base codons made up of a fixed base followed by or preceded by a mixture
of this four-member set gives four possible codons, one of which must be removed to account for
out of frame annealing. For example, in the set AAC, AGT, ATG, ACA the building block for
the ACA codon (tgt) can anneal out of frame to the sequence AACAGT. This analysis
demonstrates that for three-base codons made from dinucleotide phosphoramidites, the largest
possible codon set is three.
Nature Chemical Biology: doi: 10.1038/nchembio.280
S12
Analysis of Four-Base Codons
Four-base codons may be achieved by using a YX’ or X’Y code where a mixture of
mononucleotide phosphoramidites (Y) is followed or preceded by a mixture of trinucleotide
phosphoramidites (X’). As discussed earlier, the mononucleotide component cannot be varied
since this would create unacceptable changes in the sequence composition, which leaves
variations in the dinucleotide portion. The trinucleotide portion must conform to the sequence
criteria and therefore mixtures of trinucleotide phosphoramidites, of which the candidate sets
CAG/GAC/GGT/TGG, CGT/CTG/GCT/TGC, and ACT/ATC/CAT/GTT/TAC are the biggest.
Analyzing the possible codons formed by combining a fixed mononucleotide with a mixture of
trinucleotides one obtains several sets of five codons that do not suffer from out of frame
annealing (for instance: ACTG, CATG, TACG, GTTG, and ATCG).
Four-base codons can be also be achieved from two dinucleotides in sequence and
envisaged as Y’X’ where Y’ is one set of mutually-compatible dinucleotides and X’ is another.
Since all elements of Y’ and X’ must have identical GC and pyrimidine content, we analyzed all
possible X’Y’ combinations. The largest set was found to be X’=2, and Y’=2, giving a final
codon set of four building blocks (for example, ACTG, ACGT, CATG, CAGT).
This analysis demonstrates that for four-base codons made from mixtures of either trinucleotide
or dinucleotide phosphoramidites, the largest possible codon set is five.
Note that PNA building blocks longer than pentanucleotides (used in this work) would
also likely show efficient polymerization, but would decrease complexity for a library of a given
nucleotide length.
Nature Chemical Biology: doi: 10.1038/nchembio.280
S13
Supplementary Figure 1. Two PNA genetic codes that exhibit non-uniform DNA-templated
polymerization. (A) Representative polymerization of a PNA aldehyde building block on a
complementary DNA template. (B) Ten consecutive DNA-templated reductive amination couplings
result in PNA 40-mers containing secondary amine linkages between every fourth nucleotide. The
efficiency of this process is shown at 25 °C, 45 °C and 60 °C for eight PNA aldehydes of the form H2Ngvvt-CHO (where v = a, c, or g) using DNA templates containing ten repeats of the codon of interest.
Reaction conditions: 16 µM PNA aldehyde and 0.4 µM DNA templates were heated to 95 °C and cooled
to the specified temperature. NaBH3CN was added to 80 mM. After 30 min, the reactions were analyzed
by denaturing PAGE and visualized by staining with ethidium bromide. (C) Seven consecutive DNAtemplated reductive amination couplings result in PNA 35-mers containing secondary amine linkages
between every fifth nucleotide. The efficiency of this process is shown at 25 °C, 45 °C and 60 °C for the
ten PNA pentamer aldehydes. Reaction conditions were identical to those in (B). For both (B) and (C),
the building blocks that did not polymerize efficiently at 25 °C were not characterized at the higher
temperatures.
Nature Chemical Biology: doi: 10.1038/nchembio.280
S14
N40 Library
(x’y’t)8 Library
Supplementary Figure 2. Modeling of N40 and (x’y’t)8 libraries. 10,000 (x’y’t)8 library
members or 10,000 N40 library members were randomly generated and their secondary structure
folding energies were calculated as DNA using the Oligonucleotide Modeling Platform (OMP;
DNA Software, Inc.). All simulations to determine secondary structures were performed at 25
°C in 1.0 M NaCl with 100 nM template. The top histogram shows the results for the N40
library; the bottom histogram shows the results for the (x’y’t)8 library.
Nature Chemical Biology: doi: 10.1038/nchembio.280
S15
Supplementary Figure 3. Complete analysis of the model selection over six rounds of
translation, selection, and amplification. See also the main text and Figure 6. To a library of
DNA templates encoding 4.3 x 108 PNA 40-mers containing eight consecutive building blocks
was added 10-2, 10-4, or 10-6 equivalents of a DNA template uniquely containing an AATCC
codon (red) that translates into a biotinylated H2N-ggatt-CHO PNA building block. The positive
control DNA template, but not the other templates, contains an MspI cleavage site. The resulting
library of DNA templates was subjected to six iterated rounds of translation, PNA strand
displacement, in vitro selection for binding to immobilized streptavidin, and amplification by
PCR (see Figure 5). MspI digestion of the PCR-amplified DNA before and after selection
reveals that the selection resulted in > 1,000,000-fold enrichment for the positive control
sequence encoding the biotinylated PNA. A similar model selection using 10-2 equivalents of the
positive control template but omitting the biotinylated building block during translation (-biotin
lanes) results in no enrichment of the positive control sequence. Y’ = mixture of AC, GT, and
TG; X’ = mixture of AC, GT, TG, and CA. The images shown are from 2.5% agarose gels
stained with ethidium bromide.
Nature Chemical Biology: doi: 10.1038/nchembio.280
S16
Supplementary References Cited
1.
Kleiner, R.E., Brudno, Y., Birnbaum, M.E. & Liu, D.R. DNA-templated polymerization
of side-chain-functionalized peptide nucleic acid aldehydes. J. Am. Chem. Soc. 130,
4646-4659 (2008).
2.
Rosenbaum, D.M. & Liu, D.R. Efficient and sequence-specific DNA-templated
polymerization of peptide nucleic acid aldehydes. J. Am. Chem. Soc. 125, 13924-13925
(2003).
3.
Schmidt, J.G., Christensen, L., Nielsen, P.E. & Orgel, L.E. Information transfer from
DNA to peptide nucleic acids by template-directed syntheses. Nucleic Acids Res. 25,
4792-6 (1997).
Nature Chemical Biology: doi: 10.1038/nchembio.280