CDRs and B Cell Epitopes - The Journal of Immunology

This information is current as
of June 18, 2017.
Automated Identification of
Complementarity Determining Regions
(CDRs) Reveals Peculiar Characteristics of
CDRs and B Cell Epitopes
Yanay Ofran, Avner Schlessinger and Burkhard Rost
J Immunol 2008; 181:6230-6235; ;
doi: 10.4049/jimmunol.181.9.6230
http://www.jimmunol.org/content/181/9/6230
Subscription
Permissions
Email Alerts
This article cites 34 articles, 12 of which you can access for free at:
http://www.jimmunol.org/content/181/9/6230.full#ref-list-1
Information about subscribing to The Journal of Immunology is online at:
http://jimmunol.org/subscription
Submit copyright permission requests at:
http://www.aai.org/About/Publications/JI/copyright.html
Receive free email-alerts when new articles cite this article. Sign up at:
http://jimmunol.org/alerts
The Journal of Immunology is published twice each month by
The American Association of Immunologists, Inc.,
1451 Rockville Pike, Suite 650, Rockville, MD 20852
Copyright © 2008 by The American Association of
Immunologists All rights reserved.
Print ISSN: 0022-1767 Online ISSN: 1550-6606.
Downloaded from http://www.jimmunol.org/ by guest on June 18, 2017
References
The Journal of Immunology
Automated Identification of Complementarity Determining
Regions (CDRs) Reveals Peculiar Characteristics of CDRs and
B Cell Epitopes1
Yanay Ofran,2,3* Avner Schlessinger,1†‡§ and Burkhard Rost†‡§
I
nteractions between Abs and other proteins involve several
mechanisms of molecular recognition and protein-protein
interaction (1–5). The specific immunological response to
non-self molecular challenges is conducted by Abs that bind
specifically to the antigenic molecule. The specificity of an Ab
to its target is determined by the complementarity determining
regions (CDRs),4 a set of relatively small structural elements,
three on each of the two chains. The overall structure of each of
the Ab chains is a skeleton of ␤-sheets with six loops that constitute the CDRs. The main differences between the sequences
of Abs of a given individual are in the CDRs. Apart from their
ability to recognize and bind to Ags through the CDRs, Abs can
use other patches on their surface to form interactions with
proteins and other molecules. However, these interactions
*The Mina and Everard Goodman Faculty of Life Science, Bar Ilan University,
Ramat-Gan Israel; †CUBIC, Department of Biochemistry and Molecular Biophysics,
Columbia University, New York, NY 10032; ‡Columbia University Center for Computational Biology and Bioinformatics (C2B2), New York, NY 10032; and §NorthEast Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032
Received for publication June 23, 2008. Accepted for publication August 16, 2008.
The costs of publication of this article were defrayed in part by the payment of page
charges. This article must therefore be hereby marked advertisement in accordance
with 18 U.S.C. Section 1734 solely to indicate this fact.
1
This work was supported by U54-GM074958 granted to the Northeast Structural
Genomics consortium from the Protein Structure Initiative of the National Institutes
of Health and by R0-GM079767 from the National Institute of General Medical
Sciences at the National Institutes of Health.
2
Y.O. and A.S. contributed equally to this study.
3
Address correspondence and reprint requests to Dr. Yanay Ofran, The Mina and
Everard Goodman Faculty of Life Science, Bar Ilan University, Ramat-Gan 52900,
Israel. E-mail address: [email protected]
4
Abbreviations used in this paper: CDR, complementarity determining regions; PDB,
Protein Data Bank.
Copyright © 2008 by The American Association of Immunologists, Inc. 0022-1767/08/$2.00
www.jimmunol.org
should be considered “nonantigenic” as they do not involve the
CDRs and hence do not require the specific identification of an
antigenic epitope. While some studies use antigenic interaction
as a model for understanding the general phenomenon of molecular recognition (1–5), it is commonly assumed that the
mechanisms that govern general protein-protein interactions are
different than those that govern antigenic interactions (5, 6).
Thus, to study antigenic interactions, including the specific recognition of antigenic sites, one needs to sieve the nonantigenic
contacts from the antigenic ones. This could not be done without precise identification of the CDRs.
The term CDR is sometimes used interchangeably with hypervariable regions or segments assembled to form complete variable
(V)-region genes. However, some studies have used it to describe
the residues that participate in binding. We use the term to refer to
those elements on the Ab that actively participate in the antigenic
interaction. We acknowledge that this may not be the way that
many people in the field use it. Hereafter, we use the term CDR to
refer to the residues in the antigenic interface, namely the residues
on the Ab that contact the epitopes.
The first systematic attempts to identify the CDRs have been
based on analysis of the amino acid sequence of Abs. The underlying assumption of this approach was that CDRs are the most
variable residues in the Ab sequence (7, 8), and therefore they
could be identified through the analysis and comparison of the
sequences of various Abs. The Kabat variability plot (7) offers a
consensus sequence motif that could be aligned with sequences of
new Abs to identify their CDRs. CDRs often have very unusual
sequences that typically lack sequence motifs, consensus, profiles,
or other similar elements (9) and, therefore, they are hard to align.
Thus, any method based only on sequence information is prone to
misaligning, and therefore misidentifying, loopy CDRs. Chothia
and colleagues, therefore, based their CDR identification on
Downloaded from http://www.jimmunol.org/ by guest on June 18, 2017
Exact identification of complementarity determining regions (CDRs) is crucial for understanding and manipulating antigenic
interactions. One way to do this is by marking residues on the antibody that interact with B cell epitopes on the antigen. This, of
course, requires identification of B cell epitopes, which could be done by marking residues on the antigen that bind to CDRs, thus
requiring identification of CDRs. To circumvent this vicious circle, existing tools for identifying CDRs are based on sequence
analysis or general biophysical principles. Often, these tools, which are based on partial data, fail to agree on the boundaries of
the CDRs. Herein we present an automated procedure for identifying CDRs and B cell epitopes using consensus structural regions
that interact with the antigens in all known antibody-protein complexes. Consequently, we provide the first comprehensive analysis
of all CDR-epitope complexes of known three-dimensional structure. The CDRs we identify only partially overlap with the regions
suggested by existing methods. We found that the general physicochemical properties of both CDRs and B cell epitopes are rather
peculiar. In particular, only four amino acids account for most of the sequence of CDRs, and several types of amino acids almost
never appear in them. The secondary structure content and the conservation of B cell epitopes are found to be different than
previously thought. These characteristics of CDRs and epitopes may be instrumental in choosing which residues to mutate in
experimental search for epitopes. They may also assist in computational design of antibodies and in predicting B cell
epitopes. The Journal of Immunology, 2008, 181: 6230 – 6235.
The Journal of Immunology
Materials and Methods
Extraction of 3D structures and Ab-Ag contacts
The overall scheme for identifying the CDRs and the epitopes is described
in Fig. 1. To identify all structures in the PDB that contain at least one
Ab-Ag complex, we searched with BLAST (18) for a consensus sequence
of an Ab against the PDB. The rationale for using BLAST rather than
PSI-BLAST was to avoid capturing molecules such as T cell receptors that,
despite their similarity to Abs, participate in cell-mediated immune response and therefore represent a different type of antigenic interaction. We
then added to the database PDB structures that contain an Ig fold from the
Structural Classification of Proteins (SCOP) database (19), and using PDB
entries containing keywords (e.g., “antibody” and “Ag”) identifying them
as Ab-Ag complexes. Then, we manually discarded all complexes with T
cell receptors or MHC molecules, as these are formed during cell-mediated
immune response and are not relevant to the task at hand. In each complex,
we labeled residues as interacting if any of their respective atoms were
within ⱕ6 Å of one another (20). We thus created a list of interactions
between Abs and Ag.
Structural alignment
We located the CDRs in the known protein-Ab complexes through the
following knowledge-based approach. We began by creating multiple
structure alignment of Ab structures using SKA (21, 22). Since the light
and heavy chains have different CDRs, we conducted two different multiple structure alignments, one for all heavy chains and one for all light
chains. Additionally, since our dataset included several redundant sequences, we ran the structural alignment program on a sequence-unique
subset of all protein-Ab complexes. As Ab sequences are highly similar
to each other, the criteria for the redundancy of the complex set was
determined by the Ag sequences; sequence redundancy was reduced at
HSSP-values of 0 (corresponding to ⬍33% pairwise sequence identity
for long alignments) (23, 24). Then, we identified structurally aligned
positions that interact with a protein in ⬎10% of the complexes of the
alignment. The choice of the cutoff was based on analysis the number
of contacting positions that were structurally aligned in the non-CDR
part of the Abs. These highly populated positions were defined by us as
the boundaries of the CDRs.
After we identified the CDRs in the aligned Abs we transferred the
location of those CDRs to the Ab chains of the family that they represent
by structural pairwise alignment using combinatorial extension (25) (Fig.
1). Finally, we defined all the residues on the protein surface that are in
contact with the residues on the Ab CDRs as antigenic residues.
Content statistics
Overall, our analysis was based on 140 Ags from protein-Ab complex
structures with a current total of 10,180 interactions. From this set we
generated a sequence-unique set of Ags, that is, a list of Ags such that Ag
in the dataset has a level of sequence similarity that would enable coarsegrained homology modeling of another Ag in the dataset. All the data are
available at the Epitome database (26).
Secondary structure, solvent accessibility, and protein interfaces
Ag residue secondary structure state and solvent accessibility were computed using DSSP (Dictionary of Secondary Structure of Proteins) (27)
(one-letter code; G, H, and I correspond to helical structures, E and B to
strands, and T, S, and L to other). The protein interfaces were extracted
from PDB as described elswhere (20).
Results
Using the alignments of the Ab chains we found, both on the heavy
and on the light chains, three sequence stretches that contact the Ag in
all structures in the alignments. Individual positions in these stretches
were in contact with the Ag in up to 98% of the aligned Abs. This
figure was higher for positions closer to the center of the stretch and
lower toward the boundaries. We noticed that there was no position
outside of these stretches that created contacts with the Ag in ⬎7% of
the Abs. Therefore, to set a stringent definition of boundaries of the
CDRs, we defined the first and last position of a CDR as the position
where ⬍10% of the aligned Abs created contacts with the Ag.
Our automatically identified CDRs are typically only in partial agreement with other definitions of CDRs. Looking at all
available protein-Ab complexes we were able to characterize
Downloaded from http://www.jimmunol.org/ by guest on June 18, 2017
structural information (10, 11). Analyzing a small number of experimentally solved three-dimensional structures of Abs, they defined the hypervariable loops in terms of their secondary structure
and the sequence position where each of them starts and ends.
However, as more experimentally solved structures became available, the analysis was run again and the boundaries of the CDRs
were changed (10, 12). These new definitions were still based
mostly on manual analysis of structures and are expected to be
changed with the ever-growing availability of experimentally determined three-dimensional structures of Abs. The different definitions of CDRs are also based on different definitions of secondary structures, thereby increasing the inconsistency in defining
hypervariable loops. To overcome this problem, the contact definition of CDRs (13) uses complexes of Ag and Abs to identify the
residues that bind the Ag. A pitfall of this approach may be the fact
that Abs can create nonantigenic interaction with proteins. Thus, if
the interaction between the Ab and a protein is not part of a specific antigenic response, the residues that bind the protein may not
be part of the CDR. Additional analysis of the advantages and
disadvantages of sequence- and structure-based methods is available elsewhere (www.bioinf.org.uk/abs/).
Several studies have used the increasing number of protein-Abs
in the Protein Data Bank (PDB) (14) to characterize antigenic interactions (1, 5, 6, 12, 13, 15–17). Based on the current definitions
of CDRs, these studies have attempted to conclude the general
properties of CDRs and the general characteristics of antigenic
interactions. Two recent studies have suggested that the amino acid
composition and the length of CDRs determine the type of Ag it
can bind (6, 15).
Identifying the CDRs on the Ab side of the interaction is, naturally, only half of the story of the antigentic interaction. The
properties of epitopes, namely the structural elements on the Ag to
which the CDR bind, could be studied only once the CDRs are
identified. Several studies have taken this approach to characterize
the epitopes on the antigenic sites (5, 16, 17). The results of these
studies were rather inconsistent. Among the possible reasons for
these inconsistencies are the small datasets used by some studies
and the significant differences in the methodologies. Most importantly, the definitions of the CDRs often differ greatly; that is, if
two studies investigate the same PDB complex and use the same
methodology, they might disagree on which of the interactions are
antigenic (9, 16).
Chothia et al. (10) suggested that some of the residues within
CDRs may not create contacts with the Ag but are important to
maintain the “canonical backbone conformation” of the Ab. MacCallum and coworkers observed that the hypervariable loops of
CDRs only adopt a limited number of backbone conformations
that are determined by a few key residues (13). Analyzing several
antibody-Ag complexes, they defined different types of Ags (haptens, peptides, proteins) and the typical positions of each CDR to
which they bind.
In essence, every attempt to define CDRs is based on comparing
the common denominators of different Abs. The amount of structural data for Abs available today does not allow for careful, manual analysis. To benefit from all available Ab information we developed procedure for the identification of CDRs using all
available protein-Ab complexes. As described in Fig. 1, we structurally align all available Abs (separating heavy and light chains)
to each other. Then, we mark the positions that are in contact with
the Ag in each Ab structure. Next, we search for structurally
aligned residues that create contacts with the Ag in most known
complexes. By thus defining CDRs, the procedure also identifies
the epitopes on the antigenic protein in the complex, allowing for
large-scale analysis of B cell epitopes.
6231
6232
AUTOMATED IDENTIFICATION OF CDRs AND B CELL EPITOPES
the amino acid composition of CDRs. All experimental datasets, particularly those derived from PDB, are biased in more
than one way: PDB is biased toward proteins that could be
expressed, purified, crystallized, phased, and so forth. It is also
biased by the interest of the researchers and funding agencies
and by the simplicity and cost of certain proteins and systems.
In the case of Abs, this bias may include biases toward certain
species (it is easier to obtain rodent Abs than human ones).
These biases should be kept in mind when looking at the result
of any bioinformatics analysis. However, by pooling all the data
together, reducing sequence homology, and focusing only on
the CDRs and epitopes, we were hoping to identify signals that
are stronger than any specific bias.
To the best of our knowledge, this is the first study that offers
a comprehensive description of CDRs based on all available
structural data. Earlier studies have suggested that the amino
acid composition of CDRs is peculiar (6). We found that these
peculiarities are more striking than previously thought; that is,
four amino acids account for 55% of the CDR residues: tyrosine
(24%), serine (12%), asparagine (10%), and tryptophan (9%).
Ten other amino acids account for ⬍10% of the CDR (C, M, Q,
K, A, P, V, E, H, L). Fig. 2 shows under- and overrepresentation
of all types of residues in CDRs (with respect to their frequencies in proteins in general). Almost all residues are underrep-
resented in all CDRs. There are only two residues that are
present in all known CDRs: tryptophan and tyrosine. Interestingly, glycine, histidine, and aspartic acid show different preferences in CDRs on the L chain and in CDRs on the H chain.
This is most obvious in the case of aspartic acid, which is substantially underrepresented in all the CDRs on the L chain but
is overrepresented in all the CDRs on the H chain. This pattern
is very different than that of glutamic acid (universally underrepresented) or asparagine (mostly overrepresented). Phenylalanine is underrepresented in CDRs 1 and 2 on both types of
chains but is overrepresented in CDR 3 on both chains. The
reverse is true for arginine.
The meticulous identification of CDRs led, naturally, to a better
identification of the antigenic epitopes on the antigenic side of the
interaction. Thus, we were able to analyze the general characteristics of B cell epitopes. Fig. 3 shows under- and overrepresentation of all types of residues in epitopes found in a nonredundant
set of antigenic proteins in PDB. The results are compared with
under- and overrepresentation of residues in all protein-protein
interfaces. Epitopes are not a simple subset of all protein-protein interfaces: that is, other than histidine, no residue shows
similar behavior in both types of interfaces. Aliphatic-hydrophobic residues (A, I, L, V) are strongly underrepresented. However, glycine, which is aliphatic but not hydrophobic, and some
Downloaded from http://www.jimmunol.org/ by guest on June 18, 2017
FIGURE 1. Structure-based identification of antigenic contacts. A, We automatically extracted all complex structures containing immunoglobulins and
other proteins from the PDB. The contacts between the immunoglobulins and the antigenic proteins may be antigenic (i.e., involving the CDRs) or
nonantigenic. B, After manually removing complexes with T cell receptors we performed multiple structure alignments of Ab chains from a unique set of
complexes. C, We then mapped the interacting residues of the corresponding epitope residues. A residue in one of the Abs that was in contact with the
Ag in the original PDB complex is marked in this illustration with a filled circle. D, We searched for aligned positions that systematically create contacts
with the Ag in our dataset (light blue filled circles) and disregarded positions that contact the Ag only sporadically (red filled circles). Looking at a sequence
representation of the structural alignment, we found sequence stretches that are in contact with the Ag in all known complexes. We defined those as the
CDRs. We marked the boundaries of the CDRs as the positions that have at least 10% of their residues interacting with an Ab chain (see Results). E, Lastly,
we marked the locations of the CDRs on all Abs in PDB.
The Journal of Immunology
6233
FIGURE 2. Amino acids in CDRs.
For each amino acid we measured
whether it is over- or underrepresented with respect to its propensity in
proteins in general. A positive bar indicates overrepresentations and a negative bar indicates underrepresentation. This was done for three CDRs on
the light chains and three on the heavy
chains.
FIGURE 3. Amino acids in epitopes and in protein-protein interfaces.
For each amino acid we measured whether it is over- or underrepresented
in epitopes and in protein-protein interfaces with respect to its propensity
in protein structures in general. Similar to Fig. 2, the y-axis is log (frequency
in the interface/frequency in PDB). A positive bar indicates overrepresentation and negative bar indicates underrepresentation.
The most dramatic difference in this comparison is in the underrepresentation of helices and, although to a lesser extent, the
overrepresentation of strands in epitopes. Overall, it is possible
to generalize and say that with respect to the exposed residues,
CDRs prefer to bind strands and shun helices.
On average, residues involved in protein-protein interactions are
slightly more conserved than all other surface residues (28, 29).
However, as shown in Fig. 5, we found that this is not the case for
antigenic residues. Surface antigenic residues are less conserved
than other surface residues. This may be due to the fact that the
immune system avoids determinants that resemble self-proteins,
and this is more likely to bind to the nonconserved elements of the
Ag. In contrast, sometimes identifying conserved epitopes could
be an advantage if the epitope is conserved among different pathogens but have no homolog in the host. For example, Jun a1 is a
protein found in mountain cedar pollen (30). When humans are
infected by this protein an allergic reaction occurs and Abs bind
the protein. Interestingly, two-thirds of the epitopes of this protein
are conserved throughout the family and were also found to be
functional in its homolog, Cry j. In this example, the immune
system is efficient: by binding a conserved epitope it can be triggered when any of the proteins of this family penetrates the organism (30).
FIGURE 4. Secondary structure content of Ags. We compared the secondary structure content of antigenic residues to four other residue subsets:
a nonredundant version of the entire PDB, buried residues, surface residues, and residues in protein-protein interfaces. While Ags have the highest
loop content, they are also relatively enriched with strands compared with
surface residues and residues that are involved in other protein-protein
interactions.
Downloaded from http://www.jimmunol.org/ by guest on June 18, 2017
residues that are hydrophobic but not aliphatic (C, M, P, W) are
overrepresented. Radical differences between the composition of
protein-protein interfaces and epitopes are observed in several residues: proline is highly overrepresented, probably due to its tendency to break secondary structure elements (note that Ags are
highly enriched with loop regions). Tyrosine is overrepresented in
protein-protein interfaces but is underrepresented in epitopes,
while the reverse holds for cysteine.
Fig. 4 shows the secondary structure composition (i.e., what
fraction of the residues are in each secondary structure state) of
the epitope residues. We compare the secondary structure composition of epitopes to that of: 1) the general population of
protein-protein interfaces, 2) PDB in general, 3) residues that
are buried in the core of known structures, and 4) residues that
are exposed to the solvent. It is clear that epitopes are very
different from any of these sets of residues. It has been claimed
before that the intrinsic flexibility of surface loops plays an
important role in the ability of Abs to recognize and bind them
(17). Indeed, we see that residues that are neither in helices nor
in strands constitute the largest group of secondary structure
states in epitopes. However, this enrichment (52%) is only marginally higher than their enrichment in surface residues (49%).
6234
AUTOMATED IDENTIFICATION OF CDRs AND B CELL EPITOPES
Acknowledgments
Thanks to Jinfeng Liu and Guy Yachdav (Columbia University) for computer assistance, and to Guy Nimrod (Tel Aviv University) for helpful
comments on the manuscript.
Disclosures
Discussion
The structural elements within a protein to which Abs bind are
commonly dubbed B cell epitopes. As opposed to the T cell
epitopes, which are processed by the cell and are presented to the
immune system as short peptides, B cell epitopes are structural
elements of a whole protein, typically a folded one, and could be
continuous or discontinuous in sequence (31). While the study of
T cell epitopes is striving and has led to numerous tools for prediction of T cell epitopes (32), the study of B cell epitopes, let
alone prediction thereof, is lagging behind (33). Identifying B cell
epitopes is, in essence, a problem in structural biology. To learn
the general characteristics of CDRs one needs to explore the elements in the Abs that bind to the B cell epitopes. Since both the Ab
and the Ag may also create nonantigenic interactions, this may
seem like a circular task: to identify the epitopes one needs to first
identify the CDRs, but to identify the CDRs one needs to first
identify the epitopes. Our solution is based on two notions: 1) the
overall structure of all Abs is highly similar, and 2) the CDRs
occupy only a tiny part of this structure. Thus, by structurally
aligning the Ig scaffold of the Abs we could identify the CDRs.
Indeed, given the hypervariability of CDRs, one may hypothesize
that the CDRs themselves may be very different from each other
structurally. However, since the CDRs are only a small fraction of
the structure, using global structural alignment (i.e., seeking to
minimize the overall root mean square deviation rather than
searching local similarities) it is possible to align the CDRs without relying on their similarity to each other. This approach enables
The authors have no financial conflicts of interest.
References
1. Jones, S., and J. M. Thornton. 1997. Prediction of protein-protein interaction sites
using patch analysis. J. Mol. Biol. 272: 133–143.
2. Lo Conte, L., C. Chothia, and J. Janin. 1999. The atomic structure of proteinprotein recognition sites. J. Mol. Biol. 285: 2177–2198.
3. Chen, R., J. Mintseris, J. Janin, and Z. Weng. 2003. A protein-protein docking
benchmark. Proteins 52: 88 –91.
4. Jones, S., and J. M. Thornton. 1997. Analysis of protein-protein interaction sites
using surface patches. J. Mol. Biol. 272: 121–132.
5. Jones, S., and J. M. Thornton. 1996. Principles of protein-protein interactions.
Proc. Natl. Acad. Sci. USA 93: 13–20.
6. Collis, A. V., A. P. Brouwer, and A. C. Martin. 2003. Analysis of the antigen
combining site: correlations between length and sequence composition of the
hypervariable loops and the nature of the antigen. J. Mol. Biol. 325: 337–354.
7. Wu, T. T., and E. A. Kabat. 1970. An analysis of the sequences of the variable
regions of Bence Jones proteins and myeloma light chains and their implications
for antibody complementarity. J. Exp. Med. 132: 211–250.
8. Johnson, G., and T. T. Wu. 2000. Kabat database and its applications: 30 years
after the first variability plot. Nucleic Acids Res. 28: 214 –218.
9. Allcorn, L. C., and A. C. Martin. 2002. SACS: self-maintaining database of
antibody crystal structure information. Bioinformatics 18: 175–181.
10. Chothia, C., A. M. Lesk, A. Tramontano, M. Levitt. S. J. Smith-Gill, G. Air,
S. Sheriff, E. A. Padlan, D. Davies, and W. R. Tulip. 1989. Conformations of
immunoglobulin hypervariable regions. Nature 342: 877– 883.
11. Chothia, C., and A. M. Lesk. 1987. Canonical structures for the hypervariable
regions of immunoglobulins. J. Mol. Biol. 196: 901–917.
12. Al-Lazikani, B., A. M. Lesk, and C. Chothia. 1997. Standard conformations for
the canonical structures of immunoglobulins. J. Mol. Biol. 273: 927–948.
13. MacCallum, R. M., A. C. Martin, and J. M. Thornton. 1996. Antibody-antigen
interactions: contact analysis and binding site topography. J. Mol. Biol. 262:
732–745.
14. Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig,
I. N. Shindyalov, and P. E. Bourne. 2000. The Protein Data Bank. Nucleic Acids
Res. 28: 235–242.
Downloaded from http://www.jimmunol.org/ by guest on June 18, 2017
FIGURE 5. Antigenic residues are less conserved. We looked at the conservation scores of antigenic residues. A, The complex structure of an Ab and
the common ␤-chain receptor of IL-3, IL-5, and GM-CSF (PDB 1egj) is presented. We used Consurf server to assess the conservation of the Ag; conserved residues are colored in magenta and nonconserved residues are colored
in cyan. B, Plot of the residue conservation of antigenic residues (in blue)
against all other exposed residues (in red) of proteins from Epitome database.
a large-scale, continuous analysis of all known CDRs and all
known epitopes. A database in which all of these data are curated
is described elsewhere (26).
To our knowledge, this is the first method for objective and fully
automated identification of CDRs. It relies on protein structure and
not on sequence considerations. The results presented herein are
the most comprehensive large-scale quantification of the peculiarities of both CDRs (e.g., the fact that only four amino acids account for most of the CDRs) and B cell epitopes (e.g., their peculiar secondary structures), thus providing a firm ground for the
distinction between CDRs/epitopes and other molecular interfaces.
This knowledge has vast potential applications. It could be useful
when one attempts to identify the epitopes used by a specific Ab.
It provides the basis for selecting residues to mutate in mutation
analysis of epitopes. The distribution of amino acids in germline
sequences encoding the CDRs may be different than that of other
genomic sequences. This may account for part of the peculiarities
we report herein. However, the amino acids we identify as rare can
be found in the germline, but in reading frames that are typically
selected against (34). Therefore, to fully understand the unusual
composition of CDRs, one needs to understand the selective pressure against them. Attempts have been made to force use of alternative amino acids in CDRs, and immune responses appear to be
altered (34).
Furthermore, this dataset could serve as the basis for a computational method for predicting putative epitopes in a given protein,
a task that has been shown to be one of the hardest challenges of
molecular bioinformatics (35). Finally, the distinctions we made
herein may serve as the basis for tools that will improve the design
of Abs and epitopes.
The Journal of Immunology
26. Schlessinger, A., Y. Ofran, G. Yachdav, and B. Rost. 2006. Epitome: database of
structure-inferred antigenic epitopes. Nucleic Acids Res. 34: D777–D780.
27. Kabsch, W., and C. Sander. 1983. Dictionary of protein secondary structure:
pattern recognition of hydrogen-bonded and geometrical features. Biopolymers
12: 2577–2637.
28. Ofran, Y., and B. Rost. 2007. Protein-protein interaction hotspots carved into
sequences. PLoS Comput. Biol. 3: e119.
29. Neuvirth, H., R. Raz, and G. Schreiber. 2004. ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J. Mol.
Biol. 338: 181–199.
30. Midoro-Horiuti, T., C. H. Schein, V. Mathura, W. Braun, E. W. Czerwinski,
A. Togawa, Y. Kondo, T. Oka, M. Watanabe, and R. M. Goldblum. 2006. Structural basis for epitope sharing between group 1 allergens of cedar pollen. Mol.
Immunol. 43: 509 –518.
31. Barlow, D. J., M. S. Edwards, and J. M. Thornton. 1986. Continuous and discontinuous protein antigenic determinants. Nature 322: 747–748.
32. Brusic, V., V. B. Bajic, and N. Petrovsky. 2004. Computational methods for
prediction of T-cell epitopes: a framework for modelling, testing, and applications. Methods 34: 436 – 443.
33. Greenbaum, J. A., P. H. Andersen, M. J. Blythe, H. H. Bui, R. E. Cachau,
J. Crowe, M. Davis, A. S. Kolaskar, O. Lund, et al 2007. Towards a consesus on
datasets and evaluation metrics for developing B-cell epitope prediction tools.
J. Mol. Recognit. 20: 75– 82.
34. Ippolito, G. C., R. L. Schelonka, M. Zemlin, I. I. Ivanov, R. Kobayashi,
C. Zemlin, G. L. Gartland, L. Nitschke, J. Pelkonen, K. Fujihashi, et al. 2006.
Forced usage of positively charged amino acids in immunoglobulin CDR-H3
impairs B cell development and antibody production. J. Exp. Med. 203:
1567–1578.
35. Blythe, M. J., and D. R. Flower. 2005. Benchmarking B cell epitope prediction:
underperformance of existing methods. Protein Sci. 14: 246 –248.
Downloaded from http://www.jimmunol.org/ by guest on June 18, 2017
15. Almagro, J. C. 2004. Identification of differences in the specificity-determining
residues of antibodies that recognize antigens of different size: implications for
the rational design of antibody repertoires. J. Mol. Recognit. 17: 132–143.
16. Davies, D. R., and G. H. Cohen, 1996. Interactions of protein antigens with
antibodies. Proc. Natl. Acad. Sci. USA 93: 7–12.
17. van Regenmortel, M. H. V. 1992. Structure of Antigens. CRC Press, Boca
Raton, FL.
18. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and
D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389 –3402.
19. Murzin, A. G., S. E. Brenner, T. Hubbard, and C. Chothia. 1995. SCOP: a structural classification of proteins database for the investigation of sequences and
structures. J. Mol. Biol. 247: 536 –540.
20. Ofran, Y., and B. Rost. 2003. Analysing six types of protein-protein interfaces.
J. Mol. Biol. 325: 377–387.
21. Petrey, D., Z. Xiang, C. L. Tang, L. Xie, M. Gimpelev, T. Mitros, C. S. Soto,
S. Goldsmith-Fischman, A. Kernytsky, A. Schlessinger, et al. 2003. Using multiple structure alignments, fast model building, and energetic analysis in fold
recognition and homology modeling. Proteins 53 (Suppl. 6): 430 – 435.
22. Petrey, D., and B. Honig. 2003. GRASP2: visualization, surface properties, and
electrostatics of macromolecular structures and sequences. Methods Enzymol.
374: 492–509.
23. Mika, S., and B. Rost. 2003. UniqueProt: creating representative protein sequence
sets. Nucleic Acids Res. 31: 3789 –3791.
24. Rost, B. 1999. Twilight zone of protein sequence alignments. Protein Eng. 12:
85–94.
25. Shindyalov, I. N., and P. E. Bourne. 1998. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11:
739 –747.
6235