Identification of epitope-like consensus motifs using mRNA display

JOURNAL OF MOLECULAR RECOGNITION
J. Mol. Recognit. 2002; 15: 126–134
Published online in Wiley InterScience (www.interscience.wiley.com). DOI:10.1002/jmr.567
Identification of epitope-like consensus motifs
using mRNA display
Rick Baggio1*†, Petra Burgstaller2†, Stephen P. Hale1, Andrew R. Putney1, Meghan Lane1,
Dasa Lipovsek1, Martin C. Wright1, Richard W. Roberts3, Rihe Liu4, Jack W. Szostak4 and
Richard W. Wagner1
1
Phylos Inc., 128 Spring St, Lexington, MA 02421, USA
Noxxon Pharma AG, Gustav-Meyer-Allee 25, 13355 Berlin, Germany
3
Division of Chemistry and Chemical Engineering, California Technology Institute, Pasadena, CA 91125, USA
4
Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA
2
The mRNA display approach to in vitro protein selection is based upon the puromycin-mediated formation
of a covalent bond between an mRNA and its gene product. This technique can be used to identify peptide
sequences involved in macromolecular recognition, including those identical or homologous to natural
ligand epitopes. To demonstrate this approach, we determined the peptide sequences recognized by the
trypsin active site, and by the anti-c-Myc antibody, 9E10. Here we describe the use of two peptide libraries
of different diversities, one a constrained library based on the trypsin inhibitor EETI-II, where only the six
residues in the first loop were randomized (6.4 107 possible sequences, 6.0 1011 sequences in the library),
the other a linear-peptide library with 27 randomized amino acids (1.3 1035 possible sequences, 2 1013
sequences in the library). The constrained library was screened against the natural target of wild-type
EETI, bovine trypsin, and the linear library was screened against the anti-c-myc antibody, 9E10. The
analysis of selected sequences revealed minimal consensus sequences of PR(I,L,V)L for the first loop of
EETI-II and LISE for the 9E10 epitope. The wild-type sequences, PRILMR for the first loop of EETI-II and
QKLISE for the 9E10 epitope, were selected with the highest frequency, and in each case the complete wildtype epitope was selected from the library. Copyright # 2002 John Wiley & Sons, Ltd.
Keywords: molecular recognition; mRNA; protein; peptide, display; selection; 9E10; EETI-II; Myc; trypsin
INTRODUCTION
Specific interactions between proteins, such as between an
enzyme and its inhibitor or between an antibody and its
antigen, are mediated through molecular recognition of
subsites on both binding partners. To determine such
binding subsites, one strategy is to use randomized peptide
or protein libraries and a screening method to identify the
minimal binding domain for an interaction with a given
target. Armed with this information, it should in principle be
possible to search databases of protein sequences to identify
previously unknown protein:target binding pairs. A method
for determining the sequences of natural protein-binding
peptides would be a valuable tool in proteomic research.
Such a method would allow, for example, the identification
of the unknown binding partners of signaling proteins, and
the identification of the targets of antibodies with defined
biological properties but unknown antigens.
One example of a well-characterized complex between a
protein and a peptide is the complex between trypsin and its
*Correspondence to: R. Baggio, Phylos Inc., 128 Spring St, Lexington, MA
02421, USA.
E-mail: [email protected].
†Both authors contributed equally to this work.
Abbreviations used: EETI-II, Ecballium elaterium trypsin inhibitor two.
Copyright # 2002 John Wiley & Sons, Ltd.
inhibitor, the Ecballium elaterium trypsin inhibitor two
(EETI-II; Favel et al., 1989a,b). Trypsin is a 24 kDa
pancreatic serine protease and EETI-II is a 28-residue
trypsin inhibitor of the knottin family; like most knottins, it
contains a triple-stranded beta sheet and three disulfide
bonds (Heitz et al., 1989). Only the first loop of EETI-II
(residues 3–8) is involved in the interaction with trypsin
(Le-Nguyen et al., 1989, 1990).
Another example of a well-characterized complex
between a protein and a peptide is that of the c-Myc antigen
and the anti-c-Myc antibody 9E10, which recognizes the cMyc epitope between residues 408 and 439 (Evan et al.,
1985; Schiweck et al., 1997). Human c-Myc is a transcription factor of 65 kDa which, as a heterodimer with the Max
protein, binds to genes whose promoters contain the E box
motif (CACGTG, Amati et al., 2001).
A limitation of previous methods (Deroos and Muller,
2001) used to achieve the goal of finding epitope-like
consensus motifs is the overall size of the randomized
library that allows for adequate interaction space to be
sampled. In affinity-based peptide library selections (Mattheakis et al., 1994; Schatz et al., 1996), selection pressure
is applied on a random population, resulting in the
enrichment of specific binders. The association of phenotype with genotype is essential both to enrich binders at each
round of selection and to identify the selected molecules.
Among the technologies that take advantage of this link are
IDENTIFICATION OF EPITOPE-LIKE CONSENSUS MOTIFS
127
Figure 1. Ribbon model of a mRNA±peptide fusion molecule, to scale (drawn with Biopolymer2,
MSI). The double helix mRNA/DNA strand encodes for the tobacco mosaic virus untranslated
region and either the 28 amino acids of the EETI-II library or the 27 amino acids of the randomized
Ê extension of the mRNA
linear peptide library. The single-stranded poly-A linker is a covalent 100 A
strand. The three prime end of the poly-A linker terminates with puromycin, which forms the
carboxy-terminus of the peptide encoded by the mRNA.
in vivo technologies such as phage display (Hoogenboom et
al., 1998), E. coli display (Christmann et al., 1999; Wentzel
et al., 1999), yeast display (Kieke et al., 1997; Boder and
Wittrup, 2000) and in vitro technologies such as ribosome
display (He and Taussig, 1997), display technologies using
DNA-binding proteins (Fitzgerald, 2000; Speight et al.,
2001), and mRNA display (Roberts and Szostak, 1997;
Nemoto et al., 1997). mRNA display uses puromycin to
covalently attach an mRNA to its translated product (Fig. 1).
The advantages of this method are that the libraries are
generated in vitro, which allows large libraries (>1012
different sequences) to be made, and that the covalent link
between the genotype and the phenotype ensures that the
mRNA–peptide fusion remains intact under a range of
conditions. mRNA display has been used to select highaffinity streptavidin-binding peptides (Wilson et al., 2001)
and functional proteins (Keefe and Szostak, 2001) from a
randomized peptide library.
In this study, we applied the mRNA display technology to
test its ability to identify the binding domains for two
target:ligand interactions, bovine trypsin inhibitor and the
anti-myc 9E10 monoclonal antibody. We selected the
peptides that bound to these targets by iteratively selecting
and enriching for binders from randomized peptide libraries.
In addition to selecting epitope-like motifs from randomized
peptide libraries, our study sought to dissect the peptide–
macromolecular interactions and to determine the minimal
sequence required for the recognition of a peptide by a
macromolecule.
EXPERIMENTAL
Library construction
EETI-II. Three gel-purified oligonucleotides (Oligos Etc.
Inc.) were used to construct EETI-II with a randomized first
loop (residues 3–8). In a final volume of 100 ml, 1 nmol of
the template, 5' GAT AAA TGT TAA TGT TAC CCG
ACG (NNS)6 ACG TTC GTC CTG TCG CTG ACG GAG
CGG CCG ACG CAG ACG CCG GGG TTG CCG AAG
ACG CCG 3', N = A, G, C, T; S = C, G), was amplified for
Copyright # 2002 John Wiley & Sons, Ltd.
three rounds of PCR (94 °C, 1 min; 65 °C, 1 min; 72 °C, 2
min) using 1 mM of the 5' primer, 5' TAA TAC GAC TCA
CTA TAG GGA CAA TTA CTA TTT ACA ATT ACA
ATG GGC TGC 3' (Oligos Etc. Inc.), and the 3' primer, 5'
CGG CGT CTT CGG CAA CCC CGG CGT 3' (Oligos Etc.
Inc.) in the supplied buffer (Promega) with MgCl2 (2.5 mM),
NTP (250 mM) and 5 ml of Taq polymerase (Promega, 5
units/ml).
One nanomole of DNA was transcribed into RNA in a
0.5 ml reaction using the Megashortscript In vitro Transcription kit (Ambion). After incubation for 1 h at 37 °C, 20
units of DNAse I (Ambion) were added and the incubation
at 37 °C continued for an additional 15 min. The reaction
was twice purified with phenol/CHCl3/isoamylalcohol and
excess NTPs were removed by NAP-25 gel filtration
(Pharmacia). The RNA was precipitated by addition of
1/10 vol NaOAc pH 5.2 and 1 vol isopropanol and dissolved
in 200 ml H2O.
The puromycin-DNA oligonucleotide linker, 5'-A27CCPuromycin, was synthesized as described previously and
conjugated to the 3'-end of the library RNA by templatedirected enzymatic ligation (Roberts and Szostak, 1997). In
a 100 ml reaction volume, the purified RNA (1 nmol) was
ligated to 5'-A27CC-Puromycin linker (2 nmol) in the
presence of biotinylated DNA splint (2 nmol), 5'biotin-G
CAA CGA CCA ACT TTT TTT TTN GCC GCA GAA G 3'
(Oligos Etc. Inc.), with 10 ml of T4 DNA ligase (20 units/ml,
Promega) for 4 h at room temperature. Ligated mRNA
products that annealed to the biotinylated splint were
captured on 250 ml of neutravidin-agarose beads (Pierce)
suspended in 1 ml of PBS for 1 h at 35 °C. The beads were
washed three times with 1 ml of PBS to remove unbound
species, and mRNA-A27CC-Puromycin linker conjugates
were eluted by incubating the beads for one hour at 35 °C
with an excess (10 nmol) of antisplint, 5' CTT CTG CGG
CAA AAA AAA AAG TTG GTC GTT GC 3' (Oligos Etc.
Inc.). The mRNA-A27CC-Puromycin linker conjugates
were concentrated by isopropanol precipitation. The
purified ligated mRNA was then used to generate mRNApeptide fusion.
Purified ligated RNA (1 nmol) was translated using
1 ml of lysate from the Rabbit Reticulocyte IVT kit
J. Mol. Recognit. 2002; 15: 126–134
128
R. BAGGIO ET AL.
(Ambion) in the presence of 15 mCi 35S-methionine (1000
Ci/mmol). After a 30 min incubation at 30 °C, KCl and
MgCl2 were added to a final concentration of 530 mM and
150 mM, respectively. mRNA–protein fusion molecules
were isolated by incubation with 250 mg oligo dT25
cellulose (Pharmacia) in 10 ml of incubation buffer
(100 mM Tris–HCl pH 8.0, 10 mM EDTA, 1 M NaCl and
0.25% Triton X-100) for 1 h at 4 °C. Bound mRNA–protein
fusion molecules were isolated by filtration, washed with
incubation buffer and eluted with ddH2O. The concentration
of mRNA–protein fusion molecules was determined by
scintillation counting. To generate fusion molecules containing the corresponding cDNA strands, isolated mRNAprotein fusion molecules were reverse transcribed with 10 ml
of Superscript II Reverse Transcriptase (200 units/ml, Gibco
BRL) in the manufacturer’s buffer at 42 °C using a 5-fold
excess of ligation splint as primer. In order to promote
disulfide pairing of the EETI-II fusion library, DTT was
omitted for the reverse transcription reaction.
c-Myc. Three gel-purified oligonucleotides (Oligos Etc.)
were used to construct the linear library. One naromole of
the template [5'-AGC TTT TGG TGC TTG TGC ATC
(SNN)27 CTC CTC GCC CTT GCT CAC CAT-3', N = A,
G, C, T; S = C, G] was amplified by three rounds of PCR
(94 °C, 1 min; 65 °C, 1 min; 72 °C, 2 min) using 1 mM of the
5' primer (5'-AGC TTT TGG TGC TTG TGC ATC-3') and
the 3' primer (5'-TAA TAC GAC TCA CTA TAG GGA
CAA TTA CTA TTT ACA ATT ACA ATG GTG AGC
AAG GGC GAG GAG-3') in the supplied buffer (Promega)
with MgCl2 (2.5 mM), NTP (250 mM) and Taq polymerase
(Promega, 5 units) in a final volume of 5 ml. The reaction
mixture was extracted twice with phenol/CHCl3/isoamylalcohol, precipitated with ethanol, and dissolved in 100 ml TE.
The final construct contains a randomized region where
each codon has the sequence NNS, where ‘N’ represents
equal probability of A, G, C and T, and ‘S’ represents equal
probability of G and C, an upstream T7 transcription
promoter and a variant of the TMV translation initiation
sequence. Downstream from the randomized region is a
sequence designed to accommodate primers for PCR,
reverse transcription and ligation. Transcription followed
the protocols described in the EETI-II section.
For ligation, equal molar amounts (2 nmol) of RNA, 5'A27CC-puromycin, and splint (5'-TTT TTT TTT TNA GCT
TTT GGT GCT TG 3') were mixed in a 1.5 ml reaction with
T4 DNA ligase (1200 units, Promega) in buffer supplied by
the manufacturer. Following a 4 h incubation at room
temperature, the reaction was extracted twice with phenol/
CHCl3 isoamylalcohol and the nucleic acids were precipitated with isopropanol as described above.
Ligated RNA was separated from unligated RNA on a 6%
TBE/urea denaturing polyacrylamide gel, eluted from the
gel with 200 ml H2O. Ligated RNA (1.25 nmol) was
translated in a total volume of 7.5 ml using the Rabbit
Reticulocyte IVT kit (Ambion) in the presence of 150 mCi
35
S-methionine (1000 Ci/mmol). Subsequent steps in
preparing the library were similar to those described for
the low diversity, constrained peptide library construction.
The reverse transcription reaction, necessary to generate
fusion molecules containing the corresponding cDNA
strands used Superscript II Reverse Transcriptase (200
Copyright # 2002 John Wiley & Sons, Ltd.
units/ml, Gibco BRL) and followed the manufacturer’s
recommended protocol.
Selection
EETI-II. Potential column binders were removed from the
reverse transcribed EETI-II fusion library (1 pmol in 700 ml)
by incubating the library with 100 ml CL-sepharose beads
(Sigma) for 15 min at room temperature. The suspension
was spun at 1000 rpm for 30 s and the supernatant applied to
100 ml of trypsin-agarose bead slurry (Sigma) in selection
buffer (PBS pH 7.4, 1% Tween-20), and, in the instances
where high stringency conditions were used, 200 mM
benzamidine (Sigma). The mixture was incubated for 30 s
at room temperature. The beads were centrifuged at 1000
rpm and washed 4–8 times with 1 ml selection buffer.
Bound material was eluted with 4 100 ml of 0.33 N
NaOH. During the course of the selection, the enrichment of
the selected binders was determined by measuring the
amount of radio labeled protein fusion bound to a 10%
fraction of the trypsin–agarose bead slurry. All radiolabeled
measurements were performed on a scintillation counter
(Wallac 1409). The eluates were neutralized and the DNA
recovered by isopropanol precipitation.
c-Myc. The starting library (33 pmol, 2 1013 molecules)
was incubated with a 12-fold excess of the c-myc binding
antibody (Santa Cruz). An excess of protein A–sepharose
was used to precipitate the peptide fusion–antibody
complexes. After washing the sepharose with 5 vols of
selection buffer (1 PBS pH 7.4, 1 mg/ml BSA, 0.05%
Tween 20) to remove non-specific binders, bound species
were eluted by the addition of 15 mM acetic acid. The cDNA
portion of the eluted fusion molecules was amplified by
PCR and the resulting DNA was used to generate an
enriched population of fusion products for ensuing rounds
of selection.
Binding assay
EETI-II. Representative clones were subjected to further
analysis. Selected clones of the EETI-II selection were
prepared as mRNA-peptide fusions, the RNA in the fusion
was digested with RNAse One (Promega), and the target
binding of the remaining peptide was determined in the
following way. mRNA-peptide fusions (2–7 pmol) were
prepared as described in the EETI-II experimental section.
Following oligo dT25 cellulose purification, 1.5 ml of RNase
One2 (Promega) were added to, and incubated with, each
mRNA–peptide fusion for 1 h at room temperature.
Constant amounts of the rnase-treated mRNA–peptide
fusions were then separately incubated in 500 ml selection
buffer (PBS pH 7.4, 1% Tween-20) at room temperature for
20 min with various concentrations (20–200 nM) of
biotinylated bovine trypsin (Sigma). Streptavidin magnetic
beads (200 ml; Dynal) were added to each incubate. The
resulting suspensions were rotated for 10 min. The beads
were washed twice with 500 ml selection buffer; and the
total radioactive counts for the beads (bound radioligand
value), unbound material and washes (free radioligand
J. Mol. Recognit. 2002; 15: 126–134
IDENTIFICATION OF EPITOPE-LIKE CONSENSUS MOTIFS
value) was determined on a scintillation counter (Wallac).
Dissociation constants for the different EETI-II variants
were determined by plotting the data to the equation:
Fractional radioligand bound to trypsin ˆ
[trypsin concentration]/([trypsin concentration] ‡ KD †
The equation was derived from the law of mass action
model for a ligand receptor interaction:
Radioligand ‡ trypsin
!
Radioligand trypsin
c-Myc. 0.2 pmol aliquct 35S-labeled monoclonal RNApeptide fusion (R6-51, R6-52, R6-53, R6-55, R6-56, R6-58,
R6-60, R6-61, R6-63, R6-66, R6-67, R6-68) or wild-type
fusion construct (WT) were incubated with 100 pmol antimyc monoclonal antibody 9E10, with 100 pmol anti-myc
monoclonal antibody 9E10 in the presence of 2 nmol
synthetic myc peptide, or without monoclonal antibody,
respectively. The peptide fusion–antibody complexes were
precipitated by addition of protein A–sepharose. The values
represent the average percentage of fusion molecules that
bound to the antibody and could be eluted with 15 mM acetic
acid determined in duplicate binding reactions.
Sequence determination and analysis
DNA recovered from the fifth round of selection was PCR
amplified, gel-purified (Qiagen) and cloned into the pCR1
2.1-TOPO1 cloning system (Invitrogen). The resulting
plasmids were purified (Qiagen). Cycle sequencing was
performed on chosen samples and resolved on an ABI 310
sequencer. Sequences were analyzed with the MacVector
analysis package (Oxford Molecular) and aligned according
to amino acid sequence.
RESULTS
129
starting library consisted of 1011 molecules (1 pmol), which
exceeds the potential complexity of the EETI library. In
contrast to the linear library, its starting library consisted of
1013 molecules (33 pmol), which only samples 10 23 of the
potential complexity of the linear library.
EETI-II selection
In the first three rounds of selection the amount of
radiolabeled protein fusion bound to immobilized trypsin
increased from 1.8 to 22%, the latter being equivalent to the
binding of wild-type EETI. Since these measurements of
percent binding represent the ability of in vitro produced
radiolabeled fusion to bind to trypsin, a dramatic extension
of the c-terminus of the EETI scaffold by way of
puromycin-poly-A linker and mRNA/DNA stills yields a
construct capable of substantial binding to trypsin. In an
attempt to further enrich the population, 200 mM benzamidine, a competitive inhibitor of trypsin (KD = 18.4 mM;
Mares-Guia and Shaw, 1964), was introduced into the
incubation and wash buffers in the fourth round. Additionally, the binding time was reduced to 30 s. This increased
stringency decreased the amount of binding to 0.5% in the
fourth round. By the fifth round under the same set of
increased stringency conditions, the binding percentage of
radiolabeled protein fusions to immobilized trypsin increased to 2%, indicating a possible enrichment for higher
affinity binders.
Sequence analysis. The cDNA pool from the fifth round of
selection was cloned and 108 clones were sequenced. The
sequence analysis revealed that wild-type EETI-II represented 20% of the clones (Table 1). The proline at the P2
position (nomenclature from Schechter and Berger, 1967) is
conserved in all the analyzed sequences, and the arginine at
the P1 position is conserved except for one instance where it
is substituted by lysine. The hydrophobic residues isoleucine, leucine, or valine occupy the P1' and P2' positions.
The P3' and P4' positions were highly variable (Table 1).
Library complexity
The two libraries constructed were a constrained, lowdiversity library based on the EETI-II scaffold, which
consists of a randomly mutated first loop, and a highdiversity linear peptide library, which consists of 27
randomized residues within a linear peptide. EETI-II has
been shown to be tolerant of mutagenesis while retaining its
knottin fold (Favel et al., 1989a,b; Christmann et al., 1999;
Wentzel et al., 1999). The choice of these two libraries was
based upon the basic structural requirements for high
affinity binders to trypsin and the 9E10 anti-myc antibody.
Whereas a constrained peptide would be a more likely
trypsin inhibitor candidate than a linear peptide, the 9E10
anti-myc antibody recognizes a linear epitope. Therefore,
the library based upon the EETI-II scaffold was used against
trypsin, and the high diversity linear library was used
against the 9E10 anti-myc antibody.
The theoretical complexity of the EETI-II library (a
randomized 6-mer) is 206 or 6.40 107 different sequences;
that of the linear library (a randomized 27-mer), 2027 or
1.34 1035 different sequences. For the EETI-II library, the
Copyright # 2002 John Wiley & Sons, Ltd.
Characterization. Representative peptides were produced
in vitro as mRNA-protein fusions, treated with RNase and
Table 1. Representative clones from the EETI-II selection. Conserved residues are bold; residues that have
similar properties are italic. The motifs PRIL or PRLL
are present in most of the clones
wt EETI-II
20%
7.5%
3.75%
3.75%
3.75%
2.5%
2.5%
2.5%
2.5%
1.25%
Consensus
P
P
P
P
P
P
P
P
P
P
P
P
R
R
R
R
R
R
R
R
R
R
K
R
I
I
L
I
L
V
V
L
L
L
L
X
L
L
L
L
L
L
L
L
L
V
L
L
M
M
A
L
Q
A
A
V
H
Q
A
X
R
R
P
P
P
L
P
L
L
R
P
X
J. Mol. Recognit. 2002; 15: 126–134
130
R. BAGGIO ET AL.
Figure 2. Binding ef®ciency of selected EETI-II variants to trypsin-sepharose as determined by the percentage
of radiolabeled clone bound.
tested for binding to trypsin (Fig. 2). Each peptide was
therefore modified at the C-terminus due to the presence of
the puromycin-DNA linker. Selected clones were further
characterized by solution-based radioligand affinity analysis
(Table 2). The binding efficiency of each clone was
determined and, unexpectedly, a lower percentage of wildtype bound than any of the other screened clones; despite
the fact that the wild-type clone bound the tightest, followed
closely by the P1-lysine variant. The dissociation constant
obtained for the wild-type clone (16 3.2 nM) is close to
that obtained by Wentzel et al. (1999; 6.09 1.97 nM) and
Le-Nguyen et al. (1990; 1.8 nM) for wild-type EETI-II
constructs with an extended C-terminus. Under the conditions chosen for production, purification and binding, the
discrepancy between percentage binding and dissociation
constant for the wild-type clone may be a result of
differences in the percentage of active (i.e. folded) wildtype clone relative to similar percentages for the other
selected anti-trypsin clones.
Table 2. Af®nities of the selected EETI-II variants
measured in three different experiments. The clones
used for af®nity characterization have the puromycin±
DNA (30mer poly-dA) linker at their C-terminus
Amino acid sequence
(residues 3–8)
Kd (nM)
PRILMR
PRVLAL
PRLVQR
PRLLAP
PKLLAP
16 3.2
52 21.0
53 9.0
81 21.0
21 2.4
Copyright # 2002 John Wiley & Sons, Ltd.
c-Myc selection
Five rounds of selection were performed using the c-Myc
monoclonal antibody, 9E10, as the target. After four rounds
of selection, the amount of radiolabeled mRNA–peptide
fusion recovered increased (Fig. 3). After the fifth round,
34% of the linear peptide library bound to the c-Myc
antibody; this is comparable to the binding of wild-type cMyc in control experiments. No further increase in binding
was observed after round five. Eluants from the fifth and the
sixth round were used to characterize and sequence of the
resulting clones.
Sequence analysis. The cDNA pools from the fifth and
sixth rounds were cloned and 116 colonies were sequenced.
As presented in Table 3, two selected clones contained the
sequence of the wild-type c-Myc epitope (EQKLISEEDL).
A third clone differed from the wild-type by two point
mutations in the nucleotide sequence, only one of which
altered an amino acid (Ile to Val). The consensus sequence
selected [X(Q,E)XLISEXX(L,M)] included the full length
10 amino acid wild-type sequence; identical or homologous
sequences to the four of the core residues, LISE, were
conserved in 86% of the 116 clones examined.
Characterization. A series of experiments was performed
to determine if the pool of selected peptides bound to the
anti-c-Myc antibody 9E10. Fusion products from the sixth
round of selection were evaluated under different immunopreciptation conditions. In order to confirm that selected
fusion molecules interact with the antigen-binding site of
the anti-c-Myc antibody 9E10, synthetic c-Myc peptide was
introduced as a competitor for binding to the anti-c-Myc
antibody. The percentage of fusion bound to the antibody
J. Mol. Recognit. 2002; 15: 126–134
IDENTIFICATION OF EPITOPE-LIKE CONSENSUS MOTIFS
131
Figure 3. The enrichment of peptides binding to the c-Myc 9E10 antibody determined by measuring the increase in
bound and eluted radiolabeled methionine as incorporated into the peptide sequence. An increase in signal can be
observed emerging by round 4.
decreased with increasing amounts of c-Myc peptide, which
implies that the selected population binds to the c-Myc
antigen binding site on the 9E10 antibody. No binding was
detected in experiments without an antibody, which
indicates that the selected peptides do not bind to protein
A-agarose. The selected peptides were shown not to bind an
irrelevant antibody target, an anti-integrin monoclonal
antibody. The combined data suggests that the selected
Table 3. Representative clones from the selection for
binding to antibody 9E10. Conserved residues are
bold; residues that are less frequently conserved but
have similar properties are italic. The four-residue
motif, LISE, is conserved
c-Myc epitope E Q
R6-51
C A
R6-52
E E
R6-53
R Q
R6-55
L Q
R6-56
I V
R6-58
E E
R6-60
M Q
R6-61
T M
R6-63
E Q
R6-66
D M
R6-67
F Q
R6-68
Q R
consensus
X Q/E
K
S
Y
Y
R
R
Y
N
D
K
M
A
V
X
L
V
L
L
L
L
L
L
L
L
L
L
L
L
I
I
V
I
I
L
L
I
I
I
I
I
I
I
Copyright # 2002 John Wiley & Sons, Ltd.
S
S
S
S
S
S
S
S
P
S
S
A
S
S
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
R
Y
Y
Q
Y
Y
H
H
E
K
E
F
X
D L
E C
V M
E H
M F
H M
V M
E L
Y M
D L
E L
E L
W L
X L/M
population interacts specifically with the target molecule
against which it was raised.
Similar immunoprecipitation assays were performed on
12 selected sequences from the sixth round pool. These
experiments confirmed the specific binding of the RNA–
peptide fusions to the antigen-binding site of the anti-c-Myc
monoclonal antibody (Fig. 4). The selected sequences
bound the c-Myc antibody with an affinity similar to that
of the wild-type Myc fusion.
DISCUSSION
mRNA display was applied to selections against two targets,
the serine protease trypsin and the anti-c-Myc antibody
9E10, in order to determine the minimal binding sequence
of the peptide to its protein partner. Following multiple
rounds, each selection converged to a consensus binding
sequence. Remarkably, in both randomized peptide selections, the highest affinity selected sequences were identical
to the wild-type binding peptide, which suggests that the
selection of these peptides is driven almost entirely by
optimal binding interactions and not by other steps in the
selection protocol.
The EETI-II selection mapped the first loop of six amino
acids, which binds to the trypsin active-site. The trypsinbinding sequences from the EETI-II selection are highly
similar to the wild-type EETI-II first loop, including three
highly conserved residues and a fourth position with a
preference for hydrophobic amino acids. These results
J. Mol. Recognit. 2002; 15: 126–134
132
R. BAGGIO ET AL.
Figure 4. Competition between unlabeled myc peptide and radiolabeled fusion molecules (selected clones) for
binding to the anti-Myc antibody 9E10. The Myc peptide competes with the selected binders, indicating a common
binding site to the 9E10 antibody.
imply that the PR(I, L, V)L is essential for trypsin binding.
The observation that the P3' and P4' positions vary the most
in the selected EETI-II variants is in accord with the cocrystal structure of CMTI-trypsin (Helland et al., 1999). In
the crystal structure of the complex between CMTI and
trypsin, the two residues at the P3' and P4' positions are more
than 4 Å from the trypsin surface (Helland et al., 1999).
Since these positions are not making any direct contact with
the trypsin surface, we would expect that the selection
pressure at the P3' and P4' positions would be relatively low.
In other natural anti-trypsin knottins (Holak et al., 1989;
Wilusz et al., 1983; Ling et al., 1993; Hatakeyama et al.,
1991; Otlewski et al., 1987; Heitz et al., 1989; Wieczorek et
al., 1985), the PRILM sequence in the first loop is the most
common; this agrees with our EETI-II selection results,
where clones containing the PRILMR sequence were
dominant. The second most common sequence in this
selection was PRLLAP. A similar sequence (PRILMP), can
be found in the trypsin inhibitor knottin from Luffa
cylindrical (Ling et al., 1993).
Another interesting variant from the EETI-II selection
was found to have as its first loop sequence PKLLAP. This
Copyright # 2002 John Wiley & Sons, Ltd.
variant is similar in affinity to the wild-type sequence,
despite the substitution of Ile5 to Leu. NMR structural
studies of the wild-type Ile5 to Leu EETI-II variant showed
a first loop more disordered than the wild-type first loop,
resulting in reduced trypsin binding (Nielsen et al., 1994).
This decrease in binding relative to wild-type was not
apparent in our PKLLAP variant, possibly because
differences at other positions in the first loop compensated
for the detrimental effect of the Ile5 to Leu substitution.
The c-Myc antibody, Myc1-9E10, was originally raised
against the 32 amino acid C-terminus of the human c-myc
oncoprotein (Chan et al., 1987). The longest contiguous
selected peptide sequence with complete identity to c-myc
was EQKLISEEDL. Despite the low initial abundance of
peptides containing this 10-mer sequence in the 27-mer
linear peptide library, the selection pressure favored its
amplification and emergence. The sequence obtained is long
enough that a simple PIR (Barker et al., 2001) search would
have identified the c-Myc oncogene if the identity of the
antigen for the 9E10 antibody had not been known. In fact,
performing a PIR pattern search on the degenerative
consensus sequence, Q/E-X-L-I-S-E-X-X-L/M, yielded
J. Mol. Recognit. 2002; 15: 126–134
IDENTIFICATION OF EPITOPE-LIKE CONSENSUS MOTIFS
predominantly matches for c-Myc. Thirty two percent of the
matches (i.e. eight out of 25) from the PIR database were the
c-Myc oncogene. All human matches were c-Myc.
The report that the capacity of a monoclonal antibody
binding site may be as short as 5–9 residues (Dunn et al.,
1999) suggests that an antibody binding region can be
shorter than the selected 10 amino acid identity of the c-Myc
antigen. Indeed, the results of the selection against the 9E10
antibody show that enriched peptides converge on the
consensus sequence LISE. Nearly 61% of the clones contain
this sequence, and another 25% contain single homologous
substitutions within the LISE consensus. The remaining
14% had no more than two nonhomologous substitutions
within the LISE consensus. Two of the more frequent
nonhomologous substitutions were a proline for the serine
and a tryptophan for the glutamic acid of the LISE
consensus. Although flanking amino acids may contribute
to the affinity, the conservation of the LISE tetrapeptide
sequence within the majority of analyzed clones suggest that
LISE is the most important determinant of binding.
133
In summary, we used mRNA display with two peptide
libraries of different diversities: a low diversity, constrained
knottin library with six randomized residues in the first loop;
and a high diversity 27-mer, linear peptide. Both selections
converged to consensus sequences, and some clones
contained the precise wild-type sequence of their respective
ligands. In both cases, the sequence requirements for
molecular recognition of a peptide by a macromolecule
were mapped at a high resolution. The application of mRNA
display to identify native ligands and precise primary
sequence requirements of peptides binding to proteins may
become a useful tool both in proteomic applications and
peptidomimetic design.
Acknowledgements
The authors wish to thank Michael McPherson and Thomas Cujec for
critical reading of the manuscript.
REFERENCES
Amati B, Frank SR, Donjerkovic D, Taubert S. 2001. Function of
the c-Myc oncoprotein in chromatin remodeling and transcription. Biochim. Biophys. Acta. 1471: M135±M145.
Barker WC, Garavelli JS, Hou Z, Huang H, Ledley RS, McGarvey
PB, Mewes H-W, Orcutt BC, Pfeiffer F, Tsugita A, Vinayaka CR,
Xiao C, Yeh L-SL, Wu C. 2001. Protein Information Resource:
a community resource for expert annotation of protein data.
Nucl. Acids Res. 29: 29±32.
Boder ET, Wittrup KD. 2000. Yeast surface display for directed
evolution of protein expression. Meth. Enzymol. 328: 430±
444.
Chan S, Gabra H, Hill F, Evan G, Sikora K. 1987. A novel tumour
marker related to the c-myc oncogene product. Mol. Cell
Probes 1: 73±82.
Christmann A, Walter K, Wentzel A, Kratzner R, Kolmar H. 1999.
The cystine knot of a squash-type protease inhibitor as a
structural scaffold for Escherichia coli cell surface display of
conformationally constrained peptides. Protein Engng. 12:
797±806.
Deroos S, Muller CP. 2001. Antigenic and immunogenic phage
displayed mimotopes as substitute antigens: applications
and limitations. Comb. Chem. High Throughput Screen 4: 75±
110.
Dunn C, O'Dowd A, Randall RE. 1999. Fine mapping of the
binding sites of monoclonal antibodies raised against the PK
tag. J. Immunol. Meth. 224: 141±150.
Evan Gi, Lewis GK, Ramsay G, Bishop JM. 1985. Isolation of
monoclonal antibodies speci®c for human c-Myc protooncogene product. Mol. Cell Biol. 5: 3610±3616.
Favel A, Le-Nguyen D, Coletti-Previero MA, Castro B. 1989a.
Active-site chemical mutagenesis of Ecballium elaterium
trypsin inhibitor II: new microproteins inhibiting elastase and
chymotrypsin. Biochem. Biophys. Res. Commun. 162: 79±82.
Favel A, Mattras H, Coletti-Previero MA, Zwilling R, Robinson EA,
Castro B. 1989b. Protease inhibitors from Ecballium elaterium seeds. Int. J. Pept. Protein Res. 33: 202±208.
FitzGerald K. 2000. In vitro display technologiesÐnew tools for
drug discovery. Drug Discov. Today 5: 253±258.
Hatakeyama T, Hiraoka M, Funatsu G. 1991. Amino acid
sequences of the two smallest trypsin inhibitors from sponge
gourd seeds. Agric. Biol. Chem. 55: 2641±2642.
He M, Taussig MJ. 1997. Antibody-ribosome-mRNA (ARM)
complexes as ef®cient selection particles for in vitro display
Copyright # 2002 John Wiley & Sons, Ltd.
and evolution of antibody combining sites. Nucl. Acids Res.
25: 5132±5134.
Heitz A, Chiche L, Le-Nguyen D, Castro B. 1989. 1H 2D and
distance geometry study of the folding of Ecballium elaterium trypsin inhibitor, a member of the squash inhibitors
family. Biochemistry 28: 2392±2398.
Helland R, Berglund GI, Otlewski J, Apostoluk W, Andersen OA,
Willassen NP, Smalas AO. 1999. High-Resolution structures
of three new trypsin-squash-inhibitor complexes: a detailed
comparison with other trypsins and their complexes. Acta
Crystallogr. Sect. D D55: 139±148.
Holak TA, Bode W, Huber R, Otlewski J, Wilusz T. 1989. Nuclear
Magnetic resonance solution and x-ray structures of squash
trypsin inhibitor exhibit the same conformation of the
proteinase binding loop. J. Mol. Biol. 210: 649±654.
Hoogenboom HR, de Bruine AP, Hufton SE, Hoet RM, Arends JW,
Roovers RC. 1998. Antibody phage display technology and its
appilcations. Immunotechnology 4: 1±20.
Keefe AD, Szostak JW. 2001. Functional proteins from a randomsequence library. Nature 410: 715±718.
Kieke MC, Cho BK, Boder ET, Kranz DM, Wittrup KD. 1997.
Isolation of anti-T cell receptor scFv mutants by yeast surface
display. Protein Engng. 10: 1303±1310.
Le-Nguyen D, Mattras H, Coletti-Previero MA, Castro B. 1989.
Design and chemical synthesis of a 32 residue chimeric
microprotein inhibiting both trypsin and carboxypeptidase A.
Biochem. Biophys. Res. Commun. 162: 1425±1430.
Le-Nguyen D, Heitz A, Chiche L, Castro B, Boigegrain RA, Favel A,
Coletti-Previero MA. 1990. Molecular recognition between
serine proteases and new bioactive microproteins with a
knotted structure. Biochimie. 72: 431±435.
Ling MH, Qi HY, Chi CW. 1993. Protein, cDNA, and genomic DNA
sequences of the towel gourd trypsin inhibitor. A squash
family inhibitor. J. Biol. Chem. 268: 810±814.
Mares-Guia M, Shaw E. 1964. Studies on the active center of
trypsin. The binding of amidines and guanidines as models
of the substrate side chain. J. Biol. Chem. 240: 1579±1585.
Mattheakis LC, Bhatt RR, Dower WJ. 1994. An in vitro polysome
display system for identifying ligands from very large peptide
libraries. Proc. Natl. Acad. Sci. USA 91: 9022±9026.
Nemoto N, Miyamoto-Sato E, Husimi Y, Yanagawa H. 1997. In
vitro virus: bonding of mRNA bearing puroMycin at the 3'terminal end to the C-terminal end of its encoded protein on
J. Mol. Recognit. 2002; 15: 126–134
134
R. BAGGIO ET AL.
the ribosome in vitro. FEBS Lett. 414: 405±408.
Nielsen KJ, Alewood D, Andrews J, Kent SB, Craik DJ. 1994. An
1H NMR determination of mirror-image forms of a Leu-5
variant of the trypsin inhibitor from Ecballium elaterium
(EETI-II). Protein Sci. 3: 291±302.
Otlewski J, Whatley H, Polanowski A, Wilusz T. 1987. Amino-acid
sequences of trypsin inhibitors from watermelon (Citrullus
vulgaris) and red bryony (Bryonia dioica) seeds. Biol. Chem.
Hoppe-Seyler 368: 1505±1507.
Roberts RW, Szostak J. 1997. RNA-peptide fusions for the in vitro
selection of peptides and proteins. Proc. Natl. Acad. Sci. USA
94: 12297±12302.
Schatz PJ, Cull MG, Martin EL, Gates CM. 1996. Screening of
peptide libraries linked to lac repressor. Meth. Enzymol. 267:
171±191.
Schechter I, Berger A. 1967. On the size of the active site in
proteases. Biochem. Biophys. Res. Commun. 27: 157±162.
Schiweck W, Buxbaum B, Schatzlein C, Gunter Neiss H, Skerra A.
1997. Sequence analysis and bacterial production of the antic-Myc antibody 9E10: the VH domain has an extended CDRH3 and exhibits unusual solubility. FEBS Lett. 414: 33±38.
Copyright # 2002 John Wiley & Sons, Ltd.
Speight RE, Hart DJ, Sutherland JD, Blackburn JM. 2001. A new
plasmid display technology for the in vitro selection of
functional phenotype-genotype linked proteins. Chem. Biol.
8: 951±965.
Wentzel A, Christmann A, Kratzner, Kolmar H. 1999. Sequence
Requirements of the GPNG b-turn of the Ecballium elaterium
trypsin inhibitor II explored by combinatorial library screening. J. Biol. Chem. 274: 21037±21043.
Wieczorek M, Otlewski J, Cook J, Parks K, Leluk J, WilimowskaPelc A, Polanowski A, Wilusz T, Laskowski M Jr. 1985. The
squash family of serine proteinase inhibitors. Amino acid
sequences and association equilibrium constants of inhibitors from squash, summer squash, zucchini, and cucumber
seeds. Biochem. Biophys. Res. Commun. 126: 646±652.
Wilson DS, Keefe AD, Szostak JW. 2001. The use of mRNA
display to select high-af®nity protein-binding peptides. Proc.
Natl. Acad. Sci. 98: 3750±3755.
Wilusz T, W.ieczorek M, Polanowski A, Denton A, Cook J,
Laskowski M Jr. 1983. Amino-acid sequence of two trypsin
isoinhibitors, ITD I and ITD III from squash seeds (Curcurbita
maxima). Hoppe-Seyler's Z. Physiol. Chem. 364: 93±95.
J. Mol. Recognit. 2002; 15: 126–134