Antibody and Antigen Contact Residues Define Epitope and Paratope Size and Structure
J Immunol 2013; 191:1428-1435; Prepublished online 24
June 2013;
doi: 10.4049/jimmunol.1203198
http://www.jimmunol.org/content/191/3/1428
http://www.jimmunol.org/content/suppl/2013/06/24/jimmunol.1203198.DC1
This article cites 29 articles, 11 of which you can access for free at:
http://www.jimmunol.org/content/191/3/1428.full#ref-list-1
The Journal of Immunology is published twice each month by
The American Association of Immunologists, Inc.,
1451 Rockville Pike, Suite 650, Rockville, MD 20852
Copyright © 2013 by The American Association of
Immunologists, Inc. All rights reserved.
Print ISSN: 0022-1767 Online ISSN: 1550-6606.
Downloaded from http://www.jimmunol.org/ by guest on June 17, 2017
James W. Stave and Klaus Lindpaintner
James W. Stave and Klaus Lindpaintner1
Monoclonal Abs (mAbs) are important tools, diagnostic
reagents, and therapeutic drugs. High-value commercial
diagnostic, functional, and therapeutic mAbs must recognize full-length native proteins folded in native conformations as
they exist in patients or patient samples. Given their potential value,
strategies for developing mAbs to native protein Ags are of intense
interest; however, the underlying biology dictating performance
makes the task of developing such Abs both complicated and costly
(1). For an Ab to exert a desired biological effect, it is usually the
case that it must bind to a specific, predefined site. In such cases,
B cell epitope prediction algorithms are of limited value because
the Ab must bind to the predefined site regardless of its inherent
immunogenicity. Of paramount importance is binding at the target
site to the three-dimensional conformation the protein exists in
under the conditions where the Ab must function.
The most common method for developing Abs is to inject purified
protein into animals and isolate mAbs that bind to native proteins. A
difficulty with this approach is the preparation of protein of suitable
purity and functional conformation. For decades, researchers have
attempted to use peptides as surrogates of full-length native proteins
with little success (2). Anti-peptide Abs react well to their cognate
peptides and may be useful for detecting linear epitopes in techniques like Western blot, but Abs raised to unstructured peptides
more often than not do not recognize the same sequence in native
protein (3, 4).
Antibody Discovery Research and Development, SDIX, Inc., Newark, DE 19702
1
Current address: Thermo Fisher Scientific Inc., Analytical Technologies, Waltham,
MA.
Received for publication November 19, 2012. Accepted for publication May 23,
2013.
Address correspondence and reprint requests to Dr. James W. Stave, SDIX, Inc., 111
Pencader Drive, Newark, DE 19702. E-mail address: [email protected]
The online version of this article contains supplemental material.
Abbreviations used in this article: CR, contact residue; CRR, contact residue region;
CRS, contact residue span; PDB, Protein Data Bank.
Copyright © 2013 by The American Association of Immunologists, Inc. 0022-1767/13/$16.00
www.jimmunol.org/cgi/doi/10.4049/jimmunol.1203198
Purified, full-length native and recombinant proteins are useful
for eliciting Abs that bind native proteins, but are often difficult
to prepare in quantities and qualities sufficient for immunization
and selection of mAbs. In addition, immunization with full-length
proteins results in a polyclonal response where the majority of
Abs are directed to a limited number of immunodominant sites of
the protein (5, 6). In cases where the predefined site of interest
is nonimmunodominant, or even nonimmunogenic, it is very difficult to isolate mAbs with the required performance specifications.
Thus, there is a critical need for a means of directing Ab development to predefined functional epitopes of native protein Ags regardless of their inherent immunogenicity.
In the case of therapeutics, once candidate mAbs have been
isolated with the requisite binding characteristics, the CDR can be
taken from the isolated mAb (7) and inserted into engineered Ab
constructs having desired effector function and characterized immunogenicity and toxicity profiles. It has been reported for six
protein Ags that only a fraction of the CDR residues participate in
binding (8), and that Ab contact residues (CRs) in nine crystal
structures do not precisely correspond with residues modified by
somatic hypermutation in Vk (9). It is critical to identify specific
residues involved in contacting Ag and understand how they are
organized within CDR to engineer Ab reactivity. Thus, the native
three-dimensional structures of both Ag epitope and Ab paratope are
fundamental to the success of developing and engineering high-performance functional and therapeutic mAbs.
With these considerations in mind, we undertook the analysis of
a large number of protein–Ab x-ray crystal structures to better
understand both epitope and paratope size, structure, and interaction. A number of researchers have reported identifying CR of Abs
and Ags by analysis of publicly available crystal structures (10–12).
A large body of work in this field is aimed at identifying CR as a
means of developing algorithms for predicting B cell epitopes (10,
13–17), whereas others have focused on analyzing or predicting
CR in CDR of Abs (8, 18–20). The majority of the existing studies
define two residues to be in contact when they are within a specified
distance of each other. Many researchers use 4 Å as the definition of
A total of 111 Ag–Ab x-ray crystal structures of large protein Ag epitopes and paratopes were analyzed to inform the process of
eliciting or selecting functional and therapeutic Abs. These analyses illustrate that Ab contact residues (CR) are distributed in three
prominent CR regions (CRR) on L and H chains that overlap but do not coincide with Ab CDR. The number of Ag and Ab CRs per
structure are overlapping and centered around 18 and 19, respectively. The CR span (CRS), a novel measure introduced in this
article, is defined as the minimum contiguous amino acid sequence containing all CRs of an Ag or Ab and represents the size of
a complete structural epitope or paratope, inclusive of CR and the minimum set of supporting residues required for proper
conformation. The most frequent size of epitope CRS is 50–79 aa, which is similar in size to L (60–69) and H chain (70–79) CRS.
The size distribution of epitope CRS analyzed in this study ranges from ∼20 to 400 aa, similar to the distribution of independent
protein domain sizes reported in the literature. Together, the number of CRs and the size of the CRS demonstrate that, on average,
complete structural epitopes and paratopes are equal in size to each other and similar in size to intact protein domains. Thus,
independent protein domains inclusive of biologically relevant sites represent the fundamental structural unit bound by, and
useful for eliciting or selecting, functional and therapeutic Abs. The Journal of Immunology, 2013, 191: 1428–1435.
of predefined functional sites represent the basic structural unit
useful for immunization, screening, and selection of functional
and therapeutic mAbs.
Materials and Methods
A list of 110 unique Ab–Ag x-ray crystal structures representing 111 unique
Abs (structure 3LOH contains 2 different Abs bound to the same Ag at
different sites) was manually curated from the Protein Data Bank (PDB)
(24). To provide information regarding the full extent of Ag–Ab interaction,
only structures containing ≥85 Ag aa were included in the analysis. Each Ab
in the data set occurs only once. The resolution of all structures is ≤3.8 Å.
The amino acid sequence for each Ab was numbered using Abysis and the
Kabat convention (25) (http://www.bioinf.org.uk/abysis). CRs are defined as
Ag and Ab amino acid residues within 4 Å of each other (10). CRs were
identified from PDB x-ray crystal structures using the Cn3D tool and Molecular Modeling Database (21), found at the National Center for Biotechnology Information Web site (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml). In some analyses, CR at other distances are also presented. The
majority of the Abs analyzed in this article are described by terms denoting
specific function including “neutralizing” (e.g., 3GBN, 3SE8), “therapeutic”
(e.g., 1N8Z, 1YY9), “inhibitory/antagonistic” (e.g., 3L95, 3HI6), “blocking”
(e.g., 2QQN), “stimulating” (e.g., 3G04), “mitogenic” (e.g., 1YJD), and
“protective” (e.g., 3ETB).
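The distance-based CR definition can be illustrated with a minimal Python sketch. This is not the Cn3D workflow used in this study. The Atom record, the function name contact_residues, and the toy coordinates are assumptions introduced only for illustration; a real analysis would read atom coordinates from a PDB file.

```python
import math
from typing import Iterable, NamedTuple, Set, Tuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record (illustrative, not a PDB parser)."""
    chain: str
    resnum: int
    x: float
    y: float
    z: float


Residue = Tuple[str, int]  # (chain identifier, residue number)


def contact_residues(
    ag_atoms: Iterable[Atom], ab_atoms: Iterable[Atom], cutoff: float = 4.0
) -> Tuple[Set[Residue], Set[Residue]]:
    """Return (Ag CRs, Ab CRs): residues with any atom pair within `cutoff` angstroms."""
    ab_list = list(ab_atoms)
    ag_cr: Set[Residue] = set()
    ab_cr: Set[Residue] = set()
    for a in ag_atoms:
        for b in ab_list:
            if math.dist((a.x, a.y, a.z), (b.x, b.y, b.z)) <= cutoff:
                ag_cr.add((a.chain, a.resnum))
                ab_cr.add((b.chain, b.resnum))
    return ag_cr, ab_cr


# Toy coordinates (invented): only Ag residue A10 and Ab residue H100 are within 4 angstroms.
ag = [Atom("A", 10, 0.0, 0.0, 0.0), Atom("A", 11, 10.0, 0.0, 0.0)]
ab = [Atom("H", 100, 3.0, 0.0, 0.0), Atom("L", 32, 20.0, 0.0, 0.0)]
ag_cr, ab_cr = contact_residues(ag, ab)
print(sorted(ag_cr), sorted(ab_cr))
```

Passing cutoff=3.5 or cutoff=4.5 to the same function would correspond to the alternative distances presented in some analyses.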
Results
Ag and Ab CR
The list of 110 unique Ab–Ag x-ray crystal structures analyzed in this
work is presented in Table I. A summary of the amino acids analyzed
in this study is presented in Table II. The data set is composed of
78,666 aa, of which 5.3% are identified as CR. Ab amino acids represent 61% of the total and are divided almost equally between
heavy (IgH) and light (IgL) chains. About half of all CRs belong to
Ab (52%) and 65% of these lie within the H chain. Overall, there
are ∼7 CRs per L chain, 13 per H chain, and 18 per Ag; however, there
is significant variation between structures. The largest number of
CRs in a single structure, including both Ag and Ab, is 101 (1BGX),
whereas the smallest is 20 (2EXW).
The frequency distribution of numbers of CR in all 111 Ag–Ab
structures is shown in Fig. 1. CRs are distributed around ∼18–19 for
both Ag and Ab, and the distributions exhibit significant overlap.
Whereas Fig. 1 illustrates that the number of CRs across all 111
structures is distributed in a similar and overlapping manner for both
Ag and Ab, there is only a modest correlation between the number
of Ag and Ab CRs within individual structures (Pearson correlation
coefficient, r² = 0.5394).
The ratio of Ab to Ag CR for each of the 111 structures is shown in
Fig. 2 and is represented as the log(Ab CR/Ag CR). Negative values
result when there are more CR in the Ag (36 structures), and positive
values result when there are greater numbers of Ab CR (59 structures).
Sixteen structures have equal numbers of Ag and Ab CRs. The
structure having the highest ratio in the data set is 2J88 (21 Ab CRs,
10 Ag CRs), and the lowest is 1ADQ (9 Ab CRs, 16 Ag CRs).
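The log ratio plotted in Fig. 2 can be reproduced for the two extreme structures with a short Python sketch. The base of the logarithm is not stated in the text; base 10 is assumed here, and the function name log_cr_ratio is hypothetical. The sign of the result does not depend on the base.

```python
import math


def log_cr_ratio(ab_cr: int, ag_cr: int) -> float:
    """log10(Ab CR / Ag CR); base 10 is an assumption, the text says only 'log'."""
    return math.log10(ab_cr / ag_cr)


# CR counts (Ab, Ag) quoted in the text for the highest- and lowest-ratio structures.
counts = {"2J88": (21, 10), "1ADQ": (9, 16)}
for pdb_id, (ab, ag) in counts.items():
    print(pdb_id, round(log_cr_ratio(ab, ag), 3))
```

A structure with equal numbers of Ab and Ag CRs gives a value of zero, which is why the 16 such structures sit on the axis between the negative and positive groups.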
Table I. Ag–Ab crystal structures analyzed
PDB Structures Analyzed
1ADQ  1AFV  1BGX  1CZ8  1E6J  1EGJ  1EZV  1FJ1  1FNS  1FSK
1I9R  1IQD  1JPS  1JRH  1LK3  1N8Z  1NCD  1NSN  1OAZ  1OB1
1ORQ  1OSP  1QFU  1RJL  1S78  1V7M  1VFB  1WEJ  1YJD  1YNT
1YQV  1YY9  1Z3G  1ZTX  2ADF  2ARJ  2DD8  2EXW  2FD6  2GHW
2I9L  2J88  2JEL  2NY1  2NZ9  2Q8B  2QQL  2QQN  2R29  2R56
2VIS  2VXT  2W9E  2XQB  2XWT  2ZCH  3B9K  3BSZ  3C09  3CSY
3CVH  3D85  3D9A  3EOA  3ETB  3FMG  3G04  3G6J  3GBN  3GI8
3GRW  3H3B  3H42  3HB3  3HI6  3HMX  3I50  3IU3  3K2U  3KJ4
3L5X  3L95  3LDB  3LEV  3LIZ  3LOH  3LZF  3MA9  3MJ9  3MXW
3N85  3NH7  3NID  3NPS  3O2D  3OR6  3R1G  3RAJ  3RKD  3S35
3S37  3SE8  3SM5  3SOB  3T2N  3TT1  3TT3  3VG9  4D9R  4DAG
CR (10, 14–17), and the Molecular Modeling Database (21), at the
National Center for Biotechnology Information, defines protein interactions using a 4-Å cutoff. One group has published several
works using 6 Å (13, 20, 22) and reported that their conclusions
were unaffected by cutoffs ranging from 4 to 6 Å (20).
Haste Andersen et al. (14) identified epitope CR using a 4-Å cutoff
in 76 crystal structures composed of Ags >25 aa (29 of the structures
were variants of lysozyme). Using this approach, these researchers
were able to determine the number of CRs per epitope, the maximum
continuous sequence of CR within each epitope, and the distribution
of continuous CR per segment. These results highlighted the discontinuous nature of protein Ag epitopes.
Padlan et al. (8) determined Ab CR in six crystal structures with
protein Ags and compared the sequence positions with CDR defined
by nucleic acid sequence variability (23). MacCallum et al. (12)
determined Ab CR sequence positions for 7 protein–Ab structures
(5 of which are lysozyme), and Almagro (19) determined numbers
and sequence positions of VL and VH CR for 19 protein Ags. Besides analyzing only a limited number of protein–Ag structures, these
studies did not report on CR sequence positions outside established CDRs. Recently, Kunik and coworkers (20) identified Ab
CR in 69 Ag–Ab structures with proteins ≥5 aa using a 6-Å cutoff
and reported that ∼20% of CRs fall outside CDR. It is not clear
how many of these structures contained large protein Ags, and the
authors did not report sequence positions of the identified CR.
The existing studies define CR in different ways, and the number
of unique structures analyzed is small. In addition, the number of Ag
amino acids in the crystals is often not reported or is too small to
draw conclusions about the nature of Ab binding to large protein Ags.
Furthermore, some studies report only on Ag CR, whereas others
focus solely on Ab.
In this article we apply a uniform analysis to both Ag and Ab
proteins in a large number of x-ray crystal structures containing
relatively large numbers of Ag amino acids (≥85). The number of
structures analyzed is significantly greater than in previous studies,
revealing the extent of variability and enabling generalization of the
underlying structural architecture that supports epitopes and paratopes. For each structure, we identify numbers of CR and CR span
(CRS) for both Ag and Ab, and calculate CR ratios between Ag and
Ab, as well as H and L chains. In addition, we identify the sequence
positions of all Ab CRs and map them against CDR revealing CR
regions (CRRs) that are similar to, but do not coincide with, CDR.
A significant observation of these analyses is that CR of both
Ag and Ab are distributed in long, continuous linear sequences
(CRS) that are similar in size to each other, and similar to the size
of complete protein domains. For the first time, to our knowledge,
we define complete structural epitopes as the surface CR and the
minimum set of underlying contiguous amino acids required to
form the intact native structure. On average, complete epitopes are
the size of protein domains, and intact protein domains inclusive
Table II. Summary of amino acids analyzed

                          CR 4 Å
Structure   Total aa   Total   Mean   Median   Maximum   Minimum
IgL         23,470       751      7        6        27         0
IgH         24,384     1,397     13       12        26         4
All Ab      47,854     2,148     19       19        49         9
Ag          30,812     2,002     18       18        52         8
Ag+Ab       78,666     4,150     37       37       101        20

n = 111 Ag–Ab interactions.
Ag CRS
FIGURE 1. Frequency of CR in Ag–Ab structures. The number of CRs
in the entire data set per group is given in parentheses.
FIGURE 2. The distribution of the ratio of Ab to Ag CR for 111 x-ray
crystal structures.
longest contiguous sequence of CR being 5 (Fig. 4). These findings
are in good agreement with previous work (14). The 1FSK structure
is an example of an epitope containing a long, contiguous sequence
of 12 CRs (i.e., a linear peptide); however, the complete epitope
contains a total of 17 CR spread over a CRS of 56 residues. Thus,
Ag epitopes comprise discontinuous individual CRs and short contiguous stretches of CRs contained
within protein regions ranging from ∼20 to 400 aa.
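The longest contiguous sequence of CRs in an epitope, as summarized in Fig. 4, can be computed from a list of CR sequence positions. The following Python sketch assumes plain integer residue numbering within a single chain; the function name and the example positions are invented for illustration and are not taken from any structure in the data set.

```python
from typing import Iterable


def longest_contiguous_run(positions: Iterable[int]) -> int:
    """Length of the longest run of consecutive residue numbers among the CRs."""
    ordered = sorted(set(positions))
    if not ordered:
        return 0
    best = current = 1
    for prev, cur in zip(ordered, ordered[1:]):
        # Extend the run when residues are adjacent in sequence, otherwise restart it.
        current = current + 1 if cur == prev + 1 else 1
        best = max(best, current)
    return best


# Invented CR positions: runs are 5-7 (length 3), 20-21 (length 2), and 40 (length 1).
example_positions = [5, 6, 7, 20, 21, 40]
print(longest_contiguous_run(example_positions))
```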
The size and distribution of native mAb neutralization epitopes
are illustrated in Fig. 5. The numbers of CRs for 9 HIV-1 gp120
neutralizing Abs range from 13 to 39, and the CRs are contained within long
CRS from 194 to 384 aa. Many of these mAbs contact the same
amino acid positions and the SUM line is a “heat map” indicating
the number of different neutralizing mAbs contacting residues at
each position. HIV-1 gp120 residues 281, 365, 366, 367, 368, 419,
421, and 473 are contacted by five or six of the mAbs. Together,
these 8 CRs represent the core of an epitope recognized by multiple broadly neutralizing mAbs spread over a linear amino acid
sequence of 193 residues.
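The CRS arithmetic behind the 193-residue figure can be sketched in Python. The sketch assumes that sequence positions are plain integers on a single chain, so that the span is the last CR position minus the first plus one; the function name contact_residue_span is hypothetical.

```python
from typing import Sequence


def contact_residue_span(positions: Sequence[int]) -> int:
    """Minimum contiguous sequence length containing all CR positions."""
    if not positions:
        raise ValueError("at least one contact residue position is required")
    return max(positions) - min(positions) + 1


# The eight gp120 core CR positions listed in the text.
core_positions = [281, 365, 366, 367, 368, 419, 421, 473]
core_span = contact_residue_span(core_positions)
print(len(core_positions), "CRs within a span of", core_span, "residues")
```

Numbering schemes with insertion codes or chain breaks would need additional handling that is not shown here.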
Although each CR epitope is unique, the number of CRs and
pattern of sequence positions may be useful for grouping neutralization epitopes into classes. Structures 2NY1 and 3JWD have relatively few CRs located at almost identical positions and contained
within a 319-aa CRS. Similarly, structures 3NGB, 3SE9, 3SE8, and
FIGURE 3. The distribution of Ag epitope CRS determined at 3.5, 4.0,
and 4.5 Å. The number of structures included in each distribution is in
parentheses. Only crystal structures having resolution less than the value
used to determine CR are included. Figure annotations are based on the
4.0-Å distribution.
In addition to counting the number of CRs within each structure, the
amino acid sequence position of each CR was also recorded for both
Ag and Ab. From the sequence positions, the CRSs were determined
for all 111 structures. Fig. 3 shows the distribution of the size of the
CRS across all 111 Ags using 3.5, 4.0, and 4.5 Å as the definition of
CR. Although there appears to be a trend toward longer distances
between CRs resulting in longer CRS, the effect is small using the
distances examined in this study. The large majority of Ag structures (86.5%) have CRS ranging from 20 to 229 aa. Only 2 of the
111 structures analyzed have all of the Ag CRs contained within
a CRS of <20 aa (2J88 and 3BSZ). The most common class of
CRS in this data set is 50–79 residues. The average and median CRS
of all 111 Ag structures are 125 and 95 aa, respectively.
There is a significantly higher average number of CR for large Ag
structures ≥230 aa (19.8, n = 51), compared with smaller Ags <230
aa (16.6, n = 60, p < 0.01), and there is a weak but significant
correlation between Ag size and CRS (Pearson correlation coefficient, r² = 0.273). The relationship is readily seen in the distribution
of average CRS depicted in Supplemental Fig. 1. The epitope CRS
increases proportionally to the size of Ag in the crystal. Also, the
average CRS of Ags ≥230 aa (179) is significantly greater than that of Ags
<230 aa (79).
A small but significant percent of the structures (11.7%) have Ag
CRS >229 aa. The 442-aa CRS in structure 3TT3 is the longest in
the data set, whereas the 12-aa CRS in structure 2J88 is the shortest.
The 12-aa CRS containing the 2J88 epitope contains 10 Ag CRs. In
this respect, the 2J88 epitope is the most like a small, contiguous
peptide of all 111 structures. Although 2J88 is the only structure
where the number of CR ≈ CRS, the majority of the structures do
indeed have epitope CRs that are clustered together in short, contiguous stretches ranging from 2 to 12 aa, with the most frequent
FIGURE 4. Frequency distribution of the longest contiguous sequence
of Ag CR in each of the 111 structures analyzed.
Ab CR
The distribution of the number of CRs present within the H and L
chains of each Ab structure is shown in Fig. 6. Overall, 65% of the
Ab CRs identified here are contained within the H chain. The frequency of CRs is distributed around 5–7 per L chain and 11–13
per H chain. The minimum Ig structure retaining both IgH and IgL
variable domains (Fv) is ∼230 aa (∼110 aa VL and ∼120 aa VH).
The 2,148 Ab CRs identified here represent, on average, ∼8.4% of
an intact Fv fragment. The total number of CRs per Ab, however,
varies significantly between structures. Structures 1BGX and 3SE8
have 49 and 32 CRs per Ab, respectively, whereas structure 1ADQ
has only 9 Ab CRs.
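The ∼8.4% figure follows from the totals in Table II and the approximate Fv length given above. A short Python check, using only values quoted in the text (the variable names are illustrative):

```python
TOTAL_AB_CR = 2148        # all Ab CRs in the data set (Table II)
N_STRUCTURES = 111        # Ag-Ab interactions analyzed
FV_LENGTH = 110 + 120     # approximate VL + VH length in aa, as stated in the text

mean_cr_per_ab = TOTAL_AB_CR / N_STRUCTURES
fraction_of_fv = mean_cr_per_ab / FV_LENGTH
print(f"{mean_cr_per_ab:.1f} CRs per Ab, {100 * fraction_of_fv:.1f}% of a {FV_LENGTH}-aa Fv")
```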
The ratio of IgH to IgL CRs for each of the 111 structures is
depicted in Fig. 7. Three of the 111 structures have no CRs within
the L chain (1Z3G, 2I9L, and 3GBN). The overwhelming majority
of the structures (92) have more IgH CR than IgL; however, 11 Ab
structures have more IgL CRs than IgH. The structure having the
highest ratio in the data set is 3EOA (13 IgH and 1 IgL), and the
lowest is 3HB3 (5 IgH and 10 IgL).
A significant observation from these studies is the degree of variability in the number and ratio of CRs between individual structures
within an Ag–Ab complex. Although all the data taken together
suggest a model of Ag–Ab interaction involving equal numbers of
Ab and Ag CR, this is the case in only 16 of the 111 structures
analyzed. Similarly, the composite data indicate that 65% of Ab CRs
reside within H chains, but the IgH to IgL CR ratio within an individual Ab is highly variable ranging from no IgL CRs to twice as
many L chain CRs as H chain. It is commonly reported that the
majority of Ab CRs reside within IgH, and the analyses reported
in this article support these observations; however, there are 11 examples in this data set of Abs having more CRs in IgL than IgH.
Although the large number of structures analyzed in this study does
indeed enable construction of useful models of Ag–Ab interaction,
the significant variability within and between structures limits the
ability to predict individual CRs based on these composite data.
Ab CRS
As with Ag (Fig. 3), Ab CRs, comprising the Ab paratope, are
distributed along the IgH and IgL chains in a noncontiguous
manner. The CRS was determined for all 111 IgL and IgH structures, and the frequency was plotted in Fig. 8. The IgL CRs are
typically contained within a CRS of 60–69 aa, whereas the IgH
CRS is typically 70–79 residues. Abs make contact with Ag at three
discrete regions along each IgH and IgL chain known as CDR, and
the predominant CRS depicted in Fig. 8 represents Ig chains having
CR in both the first and third CDRs. Eleven IgH structures have
CRS ≥90 aa (e.g., 3VG9). In all 11 of these structures, there is a
CR in amino acid sequence position 1 or 2, which accounts for the
extraordinary length of these CRSs. Similarly, there are 3 IgL structures where amino acid sequence position 1 or 2 is a CR resulting in
CRS of 90–99 aa (e.g., 3SE8).
Ab CR sequence positions
The sequence positions of the Ab CR are depicted in Fig. 9. All
PDB Ab sequences were renumbered according to the Kabat
convention using Abysis. Abysis did not number structures 3SE8
FIGURE 5. CR epitope map of nine HIV-1 gp120 neutralizing mAbs. Red bars indicate gp120 amino acid sequence positions within 3.5 Å of any Ab
residue; yellow bars indicate amino acids within 4 Å. The number of CRs and length of CRSs are based on 4 Å data. The SUM line was constructed by
counting the number of structures having CRs at the indicated positions and applying a color scale (red > yellow > green) to the distribution.
NIH45-46 all have large numbers of CRs spread over very long CRS
and located at very similar sequence positions. All of the HIV-1 Abs
examined in this study bind and neutralize structurally intact infectious virus illustrating the general phenomenon that functional Abs
contact Ag residues in highly conformationally dependent structures
consisting of long CRS. The identification of CRs recognized by
multiple neutralizing mAbs illustrates the utility of applying a uniform analysis to multiple structures and lends credence to the use of
CR defined as a simple distance between amino acids as a means for
identifying residues of viral pathogens that may be critical determinants of vaccines.
FIGURE 8. The frequency distribution of the CRS of IgH and IgL
chains. Numbers in parentheses indicate the total number of CRs per group
in the data set of 111 structures.
and 3T2N, and these were omitted from further analysis. The IgH
sequence positions that are most frequently CRs are 100, 52, 31, 97,
and 98. The most frequent IgL CR sequence positions are 32, 92,
91, 30, and 50. IgL positions 32, 92, and 91 are in agreement with
previous findings (9). In this study, we add as predominant CR the
IgL positions 30 and 50 and the IgH positions above.
Further inspection of the data reveals three prominent regions of
CR for both IgH and IgL chains. These CRRs, defined by distance
between Ag and Ab residues, are organized in a manner similar to
established CDR defined by Ab gene sequence variability (23).
Although the great majority of the total number of Ab CRs identified in this study lie within these prominent regions, a small but
significant percentage of CRs have sequence positions that fall
outside of these regions.
A more detailed analysis of Ab CRR is presented in Fig. 10. The
upper panel is an analysis of IgL CRR and the lower panel is the
same analysis for IgH. No CRs were identified beyond IgL sequence position 96 and IgH sequence position 103. The heat maps
highlight sequence positions where multiple structures have CR.
Sixty-three IgL and 59 IgH amino acid sequence positions, representing ∼54% of all positions within an Ab V region fragment
(Fv 230 aa), are CRs in at least 1 of the 111 structures analyzed.
Many of these sequence positions, however, are CRs in only a
small number of structures.
CRRs are defined in this article as contiguous sequences where
>10% of the structures have CR. With these boundaries, 91.6% of
all IgL and 93.7% of all IgH CRs lie within the indicated CRR. IgH
CR sequence positions 1–2 and 73–79 lie outside of defined CDR/
CRR. Eleven structures have CRs within IgH region 73–79 (e.g.,
1S78) and 11 within region 1–2 (see also Fig. 9). Region 3 contains
more CR than regions 1 and 2 for both IgL and IgH. The third
region of IgH is the largest, containing 24.6% of all Ab CR. There
are many Ab structures in this data set that do not have CR in each
of the six CRRs. There are 7 IgH (e.g., 2QQN) and 64 IgL structures that do not make contact in all 3 CRRs. The sequence positions of all CRs in relation to CDR and CRR for all 111 structures
are shown in Supplemental Figs. 2 (IgL) and 3 (IgH).
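The CRR definition (contiguous sequence positions where more than 10% of structures have a CR) can be sketched in Python as follows. The exact boundary rule is not spelled out in the text. The sketch assumes that a single below-threshold position flanked by above-threshold positions is bridged into the region, which is an inference from the treatment of IgH position 29 and not a documented procedure. The function name, the max_gap parameter, and the per-position counts are all invented for illustration.

```python
from typing import Dict, List, Tuple


def contact_residue_regions(
    cr_counts: Dict[int, int],
    n_structures: int,
    threshold: float = 0.10,
    max_gap: int = 1,
) -> List[Tuple[int, int]]:
    """Group positions whose CR frequency exceeds `threshold` into (start, end) regions.

    Up to `max_gap` below-threshold positions between two qualifying positions
    are bridged (an assumed rule, see the lead-in).
    """
    hits = sorted(p for p, c in cr_counts.items() if c / n_structures > threshold)
    regions: List[List[int]] = []
    for pos in hits:
        if regions and pos - regions[-1][1] <= max_gap + 1:
            regions[-1][1] = pos          # extend the current region
        else:
            regions.append([pos, pos])    # start a new region
    return [(start, end) for start, end in regions]


# Invented counts: position -> number of structures (of 111) with a CR at that position.
toy_counts = {27: 15, 28: 20, 29: 1, 30: 40, 31: 60, 50: 30, 51: 3, 52: 45, 70: 5}
toy_regions = contact_residue_regions(toy_counts, n_structures=111)
print(toy_regions)
```

With max_gap=0 the same toy data would split at positions 29 and 51, which shows how much the region boundaries depend on the bridging assumption.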
In individual Abs, CRs generally do not occur as long, contiguous
sequences, but are scattered in a discontinuous manner within and
across regions. It is interesting to note that in IgH and IgL CDR2/
CRR2, there are large numbers of structures having CRs at positions
50 and 52, but only a small number at position 51. Similarly, IgH
position 29 is included within CRR1 even though only 1 Ab of 111
contacts Ag at that position because the positions on either side are
significantly above the threshold. These observations illustrate the
discontinuous nature of CR within CRR and CDR of individual Abs.
The fact that IgH positions 29 and 51 and IgL position 51 are in-
FIGURE 7. The ratio of IgH to IgL CR in 111 Ag–Ab structures.
Numbers in parentheses indicate the total CR per group in the entire set of
111 structures.
FIGURE 9. The distribution of Ab CRs along the IgH and IgL chains.
Ab amino acid sequence positions from PDB files were renumbered using
Abysis (25).
FIGURE 6. The frequency of CRs in Ab structures. Numbers in parentheses indicate the total number of CRs per group in the entire set of
111 Ag–Ab structures. IgH+L is the summation of IgH and IgL.
cluded within CDR/CRR and yet are only infrequently CRs suggests
the possibility of a specific role for these positions apart from directly
binding Ag, perhaps in properly positioning actual CRs.
Comparison of CRR with CDR
A comparison of the boundaries, number of sequence positions, and
CR in each CRR with the corresponding CDR is presented in Fig. 11.
Although CRR and CDR significantly overlap, the boundaries are
different. The differences between CDR and CRR boundaries are
greater for IgH than for IgL. There are more total sequence positions within CDR than CRR. IgH region 3 contains the largest
number of CRs for both CDR and CRR, followed by IgH region 2.
IgL region 2 contains the fewest CRs for both CDR and CRR.
Using the Kabat definitions (discussed in Ref. 25), 13.0% of all Ab
CRs lie outside of established CDR boundaries and 7.0% outside of
CRR. Recently, Kunik et al. (20) reported ∼20% of the Ab residues
that actually bind the Ag fall outside the CDR. Fig. 12 illustrates the
overlap between CDR and CRR. The greatest difference between
the two is the inclusion of IgH residues 26–30 in CRR1 compared
with CDR1. Another major difference between the two is inclusion
of IgL position 49 in CRR2 compared with CDR2.The CRRs defined in this article contain a larger number of CR and are more
uniform in number of sequence positions in a region, and yet comprise fewer total sequence positions compared with CDR; that is,
they are more densely populated with CR than CDR.
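The 7.0% of Ab CRs reported to lie outside CRR is consistent with the per-chain percentages reported for CRR (91.6% of IgL and 93.7% of IgH CRs) when these are weighted by the chain totals in Table II. The Python check below is approximate: it uses the full Table II totals, whereas structures 3SE8 and 3T2N were omitted from the sequence position analysis, so the true denominators are slightly smaller.

```python
IGL_CR, IGH_CR = 751, 1397               # CR totals per chain (Table II)
IGL_IN_CRR, IGH_IN_CRR = 0.916, 0.937    # fractions of CRs inside CRR, from the text

cr_inside = IGL_CR * IGL_IN_CRR + IGH_CR * IGH_IN_CRR
outside_pct = 100 * (1 - cr_inside / (IGL_CR + IGH_CR))
print(f"{outside_pct:.1f}% of Ab CRs fall outside CRR")
```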
The analyses presented in this article demonstrate a significant
overlap between CRR and established CDR defined using completely orthogonal methodologies. Indeed, the CRR boundaries reported in this article are even more closely aligned with refinements
FIGURE 11. Comparison of established CDR (Kabat)
and CRR. The total number of CRs in the data set for
each group is in parentheses.
of CDR reported by others (26). CRRs are defined by measuring
physical distance between atoms within crystal structures, whereas
CDRs are defined by Ig gene sequence variability. The overlap further
confirms the validity of both methodologies. Although the two approaches point to similar structural architecture, there are important
differences as well (Fig. 12, Supplemental Figs. 2 and 3). Ab engineers relying solely on definitions of CDR may not give adequate
consideration to sequence positions that are actually important CR.
Two examples are IgL sequence position 49 and IgH position 30.
These positions lie outside established CDR and yet frequently occur
as CR. Also, IgL positions 24, 25, 26, 90, and 97 and IgH positions 61
and 63 do not contact Ag in this data set but are included within CDR.
These observations point out the advantage of using multiple methodologies to characterize Ab binding residues and shed new light on
the organization of regions of Abs harboring important CR.
Discussion
These analyses were undertaken to better understand the nature of
Ab binding to large protein Ags, and in so doing provide insight to
strategies for developing high-performance, high-value functional
and therapeutic Abs. Abs are used in a variety of ways, and the
structure of an Ag is influenced by the assay or application in which
the Ab is used (4). In the Western blot, the Ag is typically boiled
in detergent and reducing agent, and the Ab must be capable of
recognizing the denatured form of the Ag. In sandwich immunoassay methods using minimally processed patient serum or plasma
samples, the Ab must be able to bind to the properly folded native
Ag structure. Therapeutic Abs must bind to native protein structures as they exist in the patient. Functional Abs must bind to
FIGURE 10. Analysis of IgL (upper) and IgH (lower) CRRs. Heat maps for IgL and IgH were constructed by counting the number of occurrences of each
position in all structures and applying a color scale (red > yellow > green) to the distribution. Sequence positions were numbered using Abysis and the
Kabat convention. CRRs are highlighted in gray.
FIGURE 12. Overlap of CDR and CRR. Amino acid
sequence position is numbered across the top using the
Kabat convention. The number of CRs in each region
(defined in Fig. 11) is indicated. The heat map was
constructed by applying a color scale to the number of
CRs (red > yellow > green).
CRs are contained within relatively long CRSs that come into close
spatial proximity because of complex folding inherent within native
protein structures. This is true of both Ag and Ab proteins. In the case
of Abs, the CRSs are highly uniform in length (60–69 for IgL and
70–79 for IgH), and CRs are organized into prominent CRRs. Ag
CR organization and CRS are more variable; however, the most
common CRS class, 50–79 residues, is very similar in size to IgL and
IgH CRS. Given that the function of all Abs is to bind, it seems reasonable that Ab CRs are organized into uniform CRRs and distributed
across CRSs that are highly homogeneous in length. In contrast,
epitopes reside on Ags that have a variety of different functions, so
it is perhaps not surprising that Ag CR and CRS are more variable.
The fact that the most frequent Ag, IgH, and IgL CRSs are very
similar in size suggests a common underlying structural organization. In addition, the similarity in size between the large majority of
Ag CRSs (20–229 residues), as well as the minimum Ab structure
retaining complete binding (Fv, ∼230 aa), further suggests common
structural elements.
The basic structural unit within a protein, whether Ag or Ab, is a domain.
Domains fold independently, mediate specific biological function,
and combine to form larger modular proteins. The smallest domains
are ∼35 aa and the largest can be several hundred (27). The size of
the most frequently occurring protein domain is ∼100 aa (28), very
similar in size to the average (125 aa) and median (95 aa) CRSs of
the 111 Ag structures analyzed in this article. In addition, the size
distribution of complete protein domains (28, 29) is strikingly
similar to the distribution of Ag CRS presented in Fig. 3. Together,
these observations suggest that complete structural epitopes, represented by CRS, are the size of intact protein domains.
It is widely recognized that Ab framework regions are required for
proper paratope structure and Ab binding, and complete functional
paratopes are formed only when CDR and framework regions are
structurally organized into VH (∼120 aa) and VL (∼110 aa) domains
that together form a functional binding unit (Fv). In a similar fashion,
functional protein Ags are made up of “frameworks” of domains (30),
and functional epitope conformation is determined by the underlying
domain structure. When viewed in composite, the observations
that the number of Ag CRs is equal to the number of Ab CRs, Ag
CRSs are similar in size to IgH and IgL CRSs, and the size of the
majority of Ag CRSs is less than or equal to Fv suggest that Abs
recognize structural epitopes that are similar in size to VL, VH, and
Fv, that is, one to two protein domains.
Large protein epitopes are a collection of not only CRs, but many
more residues that are necessary to present a native structure that
may be as large as, and larger than, the physical footprint of an
Ab. Attempting to elicit or select Abs that bind full-length, native
proteins with peptides and polypeptides smaller than an entire epitope
not only risks non-native Ag folding, but Abs elicited in this way,
although capable of binding native structure, may still not bind to the
full-length protein because of steric hindrance imposed by Ag residues
within the footprint of the Ab but not included in the immunogen.
In the analysis of the functional, neutralizing epitopes of HIV-1
presented in this article, the most broadly neutralizing epitopes are
composed of structures of ∼380 aa. Immunization with smaller polypeptides would likely result in Abs with fewer CRs and, arguably, less
native protein structures at specific sites in a way that influences
protein function.
In many applications, it is of greater interest to predict the sites on
an Ag that will result in functional modulation or therapeutic effect
than it is to predict sites that will elicit the production of Abs in an
immunized animal. Indeed, the site of interest may not be immunodominant or even immunogenic, and yet an Ab to the epitope is
highly desired. B cell epitope prediction algorithms are of little value
in such instances, and thus strategies for developing Abs to specific
sites of native proteins regardless of their inherent immunogenicity
are critical.
An important aspect of these analyses is that the majority of the
Abs are described in functional and therapeutic terms providing
information about this important class of molecules. Another important aspect of this work is that all of the Ags consist of ≥85 aa.
Structures with large Ags were selected in an attempt to reveal the
full extent of Ab interaction with large, natively folded proteins. In
this data set, there is a significantly higher average number of CRs
for large Ag structures compared with smaller Ags, and the epitope
CRS increases proportionally to the size of Ag in the crystal. Given
the earlier considerations and the fact that 92% of the 111 Ags
analyzed in this study are less than full-length, it seems likely that
the values reported in this article for CRS, and to a lesser extent CR,
may be underestimates of actual values, and previously reported
estimates of CR based on even smaller protein fragments are lower
still.
In this study, we identify Ag and Ab CRs, defined as amino acids within
a distance of 4 Å, in publicly available x-ray crystal structures.
Other researchers have performed similar analyses using this
methodology (10, 14, 15); however, prior studies have been more
limited in scope and have not focused exclusively on structures
comprising large protein Ags. In addition, the length of the linear
sequence containing all CR, the CRS, has not been previously reported. The large number of structures analyzed in this article
highlights the extent of variability associated with the number of
CRs and their relative ratios, the length of CRS, and the organization
of CR within prominent Ab CRRs. The large number of structures
also highlights specific differences between CDR and CRR. The
identification of viral Ag sequence positions contacted by multiple
neutralizing Abs, and the organization of Ab CR into CRR similar
to established CDR lend credence to distance between CR as
a useful means of characterizing Ag–Ab interactions.
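As a minimal sketch of the distance-based approach discussed above, the following Python fragment identifies CRs with a 4 Å cutoff and computes the CRS length. Several details are assumptions made for illustration rather than the procedure used in this study: atoms are supplied as hypothetical (chain, residue number, coordinates) tuples instead of being parsed from a structure file, any atom pair at or within the cutoff marks both residues as CRs, and CRS length is taken as the inclusive span from the first to the last CR on a single chain.

```python
import math
from typing import Iterable, List, Set, Tuple

# Hypothetical atom record: (chain ID, residue number, (x, y, z) in Å).
Atom = Tuple[str, int, Tuple[float, float, float]]

CUTOFF = 4.0  # Å, the contact distance stated in the text


def contact_residues(
    ag_atoms: List[Atom], ab_atoms: List[Atom], cutoff: float = CUTOFF
) -> Tuple[Set[int], Set[Tuple[str, int]]]:
    """Return (Ag CRs, Ab CRs) for one Ag-Ab complex.

    Assumption: a residue is a CR if any of its atoms lies at or within
    `cutoff` of any atom of the binding partner. Ag CRs are residue numbers
    of a single Ag chain; Ab CRs keep the chain ID (e.g. "H" or "L").
    """
    ag_cr: Set[int] = set()
    ab_cr: Set[Tuple[str, int]] = set()
    for _, ag_res, ag_xyz in ag_atoms:
        for ab_chain, ab_res, ab_xyz in ab_atoms:
            if math.dist(ag_xyz, ab_xyz) <= cutoff:
                ag_cr.add(ag_res)
                ab_cr.add((ab_chain, ab_res))
    return ag_cr, ab_cr


def crs_length(positions: Iterable[int]) -> int:
    """Length of the linear sequence containing all CRs of one chain.

    Assumption: inclusive span from the first to the last CR position.
    """
    positions = list(positions)
    if not positions:
        return 0
    return max(positions) - min(positions) + 1
```

The all-against-all loop is quadratic in the number of atoms, which is acceptable for a single complex; a spatial index would be a reasonable substitute for larger surveys.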
The observation that the distributions of the numbers of Ag and
Ab CR significantly overlap and are both centered around ∼18 or 19
CRs per structure suggests that, in general, Abs contact regions on
large protein Ags that are similar in size to their own physical
dimensions. Thus, one characteristic of an Ag epitope is that the
two-dimensional surface area containing all CRs is less than or
equal to the physical dimensions of the binding region of an Ab.
Davies et al. (11) estimated these dimensions from studies of lysozyme–Ab crystal structures as an oval with the approximate
dimensions of 20 × 30 Å. These dimensions represent the “footprint” of Ab paratopes and large protein Ag epitopes.
A unique aspect of this work is the characterization of both the Ag
and Ab CRS. The analysis of many Ag–Ab complexes indicates that
Acknowledgments
We thank Dr. Ross Chambers for insight and review of the manuscript and
Dr. Fenglin Yin for review of the database and Ab sequence numbering.
Disclosures
The authors have no financial conflicts of interest.
References
1. Strohl, W. R. 2009. Therapeutic monoclonal antibodies: past, present and future.
In Therapeutic Monoclonal Antibodies. Z. An, ed. John Wiley & Sons, Inc.,
Hoboken, NJ, p. 3–50.
2. van Regenmortel, M. H. V. 2009. What is a B-cell epitope? In Methods in
Molecular Biology, Epitope Mapping Protocols, Vol. 524. U. Reineke, and M.
Schutkowski, eds. Humana Press, New York, p. 3–20.
3. Irving, M. B., L. Craig, A. Menendez, B. P. Gangadhar, M. Montero, N. E. van
Houten, and J. K. Scott. 2010. Exploring peptide mimics for the production of
antibodies against discontinuous protein epitopes. Mol. Immunol. 47: 1137–
1148.
4. Brown, M. C., T. R. Joaquim, R. Chambers, D. V. Onisk, F. Yin, J. M. Moriango,
Y. Xu, D. A. Fancy, E. L. Crowgey, Y. He, et al. 2011. Impact of immunization
technology and assay application on antibody performance—a systematic
comparative evaluation. PLoS ONE 6: e28718.
5. Jemmerson, R. 1987. Multiple overlapping epitopes in the three antigenic
regions of horse cytochrome c1. J. Immunol. 138: 213–219.
6. Newman, M. A., C. R. Mainhart, C. P. Mallett, T. B. Lavoie, and S. J. Smith-Gill.
1992. Patterns of antibody specificity during the BALB/c immune response to
hen eggwhite lysozyme. J. Immunol. 149: 3260–3272.
7. Jones, P. T., P. H. Dear, J. Foote, M. S. Neuberger, and G. Winter. 1986.
Replacing the complementarity-determining regions in a human antibody with
those from a mouse. Nature 321: 522–525.
8. Padlan, E. A., C. Abergel, and J. P. Tipper. 1995. Identification of specificity-determining residues in antibodies. FASEB J. 9: 133–139.
9. Ramirez-Benitez, M. C., and J. C. Almagro. 2001. Analysis of antibodies of
known structure suggests a lack of correspondence between the residues in
contact with the antigen and those modified by somatic hypermutation. Proteins
45: 199–206.
10. Novotný, J., M. Handschumacher, E. Haber, R. E. Bruccoleri, W. B. Carlson,
D. W. Fanning, J. A. Smith, and G. D. Rose. 1986. Antigenic determinants in
proteins coincide with surface regions accessible to large probes (antibody
domains). Proc. Natl. Acad. Sci. USA 83: 226–230.
11. Davies, D. R., E. A. Padlan, and S. Sheriff. 1990. Antibody-antigen complexes.
Annu. Rev. Biochem. 59: 439–473.
12. MacCallum, R. M., A. C. R. Martin, and J. M. Thornton. 1996. Antibody-antigen
interactions: contact analysis and binding site topography. J. Mol. Biol. 262:
732–745.
13. Schlessinger, A., Y. Ofran, G. Yachdav, and B. Rost. 2006. Epitome: database of
structure-inferred antigenic epitopes. Nucleic Acids Res. 34(Database issue):
D777–D780.
14. Haste Andersen, P., M. Nielsen, and O. Lund. 2006. Prediction of residues in
discontinuous B-cell epitopes using protein 3D structures. Protein Sci. 15: 2558–
2567.
15. Ponomarenko, J. V., and P. E. Bourne. 2007. Antibody-protein interactions:
benchmark datasets and prediction tools evaluation. BMC Struct. Biol. 7:
64.
16. Ansari, H. R., and G. P. Raghava. 2010. Identification of conformational B-cell
epitopes in an antigen from its primary sequence. Immunome Res. 6: 6.
17. Zhao, L., and J. Li. 2010. Mining for the antibody-antigen interacting associations that predict the B cell epitopes. BMC Struct. Biol. 10(Suppl. 1): S6.
18. Collis, A. V. J., A. P. Brouwer, and A. C. R. Martin. 2003. Analysis of the
antigen combining site: correlations between length and sequence composition
of the hypervariable loops and the nature of the antigen. J. Mol. Biol. 325:
337–354.
19. Almagro, J. C. 2004. Identification of differences in the specificity-determining
residues of antibodies that recognize antigens of different size: implications
for the rational design of antibody repertoires. J. Mol. Recognit. 17: 132–
143.
20. Kunik, V., B. Peters, and Y. Ofran. 2012. Structural consensus among antibodies
defines the antigen binding site. PLOS Comput. Biol. 8: e1002388.
21. Madej, T., K. J. Addess, J. H. Fong, L. Y. Geer, R. C. Geer, C. J. Lanczycki,
C. Liu, S. Lu, A. Marchler-Bauer, A. R. Panchenko, et al. 2012. MMDB: 3D
structures and macromolecular interactions. Nucleic Acids Res. 40(Database
issue): D461–D464.
22. Ofran, Y., A. Schlessinger, and B. Rost. 2008. Automated identification of
complementarity determining regions (CDRs) reveals peculiar characteristics of
CDRs and B cell epitopes. J. Immunol. 181: 6230–6235.
23. Wu, T. T., and E. A. Kabat. 1970. An analysis of the sequences of the variable
regions of Bence Jones proteins and myeloma light chains and their implications
for antibody complementarity. J. Exp. Med. 132: 211–250.
24. Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig,
I. N. Shindyalov, and P. E. Bourne. 2000. The protein data bank. Nucleic Acids
Res. 28: 235–242.
25. Martin, A. C. R. 2010. Protein sequence and structure analysis of antibody
variable domains. In Antibody Engineering, Vol. 2. R. Kontermann, and
S. Dübel, eds. Springer-Verlag, Berlin, p. 33–51.
26. Kirkham, P. M., and H. W. Schroeder, Jr. 1994. Antibody structure and
the evolution of immunoglobulin V gene segments. Semin. Immunol. 6: 347–
360.
27. Boeckmann, B., M. C. Blatter, L. Famiglietti, U. Hinz, L. Lane, B. Roechert, and
A. Bairoch. 2005. Protein variety and functional diversity: Swiss-Prot annotation
in its biological context. C. R. Biol. 328: 882–899.
28. Wheelan, S. J., A. Marchler-Bauer, and S. H. Bryant. 2000. Domain size distributions can predict domain boundaries. Bioinformatics 16: 613–618.
29. Jones, S., M. Stewart, A. Michie, M. B. Swindells, C. Orengo, and
J. M. Thornton. 1998. Domain assignment for protein structures using a consensus approach: characterization and analysis. Protein Sci. 7: 233–242.
30. Savageau, M. A. 1986. Proteins of Escherichia coli come in sizes that are
multiples of 14 kDa: domain concepts and evolutionary implications. Proc. Natl.
Acad. Sci. USA 83: 1198–1202.
31. Zhang, W., Y. Xiong, M. Zhao, H. Zou, X. Ye, and J. Liu. 2011. Prediction of
conformational B-cell epitopes from 3D structures by random forests with
a distance-based feature. BMC Bioinformatics 12: 341.
affinity or perhaps less broadly neutralizing activity. In instances
where Abs bind to CR on multiple domains, the size of an immunogen
required to elicit such an Ab would be expected to be larger still,
to encompass all CRs and all supporting framework amino acids
required for maximum contact.
A great deal of effort is expended attempting to predict protein
epitopes. Zhang and coworkers (31) recently reported an improved
method of predicting conformational epitopes based on a “thick
surface patch” model including an “adjacent residue distance” feature. This concept of an epitope includes the idea of CRs layered on
top of supporting amino acids but fails to appreciate that the structure
of the immediately underlying amino acids is, in turn, dependent on
even more amino acids comprising the complete protein domain.
Currently, B cell epitope prediction is largely focused on physicochemical attributes of a protein Ag recognized by Abs such as surface
exposure, solvent accessibility, spatial configuration, and so on. It
is unlikely, however, that such characteristics will fully account
for issues of immunodominance and tolerance that influence Ab development and involve other components of the immune system such
as T cell help.
When attempting to develop functional and therapeutic Abs, it
is less important to identify the sites on a protein that may elicit
large amounts of Ab and is more important to direct Ab development
to sites where binding would modulate biological function, regardless of the site’s inherent immunogenicity. Given that there are
a limited number of immunodominant sites on a protein, immunization with full-length proteins, although ensuring proper conformation and maximizing the potential interaction between Ag and
Ab, may not elicit Abs directed to the biologically relevant site.
Immunization with peptides and polypeptides representing less
than a complete structural epitope has proved time and again to
elicit Abs that may be directed to the site of interest but either do
not bind to native protein or bind in a way that does not modulate
function. To develop functional and therapeutic Abs, it is critical
to focus Ab development to potentially weakly immunogenic, biologically relevant sites in a way that the resulting Abs bind to full-length native proteins and effect a desired function.
The analyses presented in this article demonstrate that functional
and therapeutic Abs bind structures composed of linear strings of
contiguous amino acids that are the size of protein domains. As
defined by the CRS, a structural epitope is significantly larger than
previous definitions of epitopes that are focused primarily on CR.
A structural epitope includes CRs as well as the minimum number
of contiguous amino acids required to yield a compact native
structure that is independent of other protein regions or even other
molecules. Thus, immunization with independent protein domains
representing complete structural epitopes is indicated for development of functional and therapeutic mAbs targeted to biologically
relevant and potentially weakly immunogenic sites.