Molecular flexibility in protein–DNA interactions

BioSystems 85 (2006) 126–136
Stefan Günther∗ , Kristian Rother, Cornelius Frömmel
Institute of Biochemistry Charité, Monbijoustrasse 2, 10117 Berlin, Germany
Received 23 July 2005; received in revised form 7 September 2005; accepted 13 December 2005
Abstract
In living cells protein–DNA interactions are fundamental processes. Here, we compare the 3D structures of several DNA-binding
proteins frequently determined with and without attached DNA. We studied the global structure (backbone-traces) as well as the
local structure (binding sites) by comparing pair-wise the related atoms. The DNA-interaction sites of uncomplexed proteins show
conspicuously high local structural flexibility. Binding to DNA results in specific local conformations, which are clearly distinct
from the unbound states. The adaptation of the protein’s binding site to DNA can never be described by the lock and key model but
in all cases by the induced fit model. Conformational changes in the seven protein backbone traces take place in different ways. Two
of them dock onto DNA without a significant change, while the other five proteins are characterized by a backbone conformation
change caused by DNA docking. In the case of three proteins of the latter group the DNA-complexed conformation also occurs in
a few uncomplexed structures. This behavior can be described by a conformational ensemble, which is narrowed down by DNA
docking until only one single DNA-complexed conformation occurs. Different docking models are discussed and each of the seven
proteins is assigned to one of them.
© 2006 Elsevier Ireland Ltd. All rights reserved.
Keywords: Conformational changes; Structure/function studies; Protein nucleic acid interactions; Computational analysis of protein structure;
Conformational equilibrium
Abbreviations: PDB, The Protein Data Bank; MetJ, methionine repressor; CAP, catabolite activator protein; CBF, core-binding factor; DtxR, diphtheria toxin repressor; PvuII, PvuII endonuclease
∗ Corresponding author. Tel.: +49 30 450 528 375; fax: +49 30 450 528 942.
E-mail address: [email protected] (S. Günther).
1. Introduction
At the atomic level protein–DNA binding is characterized by non-specific interactions with the DNA backbone and DNA-sequence-specific interactions with individual bases. Several aspects of these interactions have already been elucidated using the protein–DNA complexes collected in the Protein Data Bank (Berman et al., 2000). Examples include the following questions: How large are the interfaces and which hydrogen bonds occur (Nadassy et al., 1999)? How do protein mutations affect the binding specificity (Luscombe and Thornton, 2002)? How do the various chemical interactions influence protein–DNA binding (Mandel-Gutfreund and Margalit, 1998)? Which packing density do the atoms adopt in the interface (Nadassy et al., 2001)? However, there is no detailed analysis of how proteins adapt to DNA structure.
Several models have been proposed to describe
molecular recognition processes of proteins. In 1894,
Emil Fischer formulated the “lock and key principle”
(Fischer, 1894). It implies that the binding site is inflexible, and the appropriate ligand fits it perfectly. Although
it is incontestable that the function of a protein depends
on its structure, the lock and key principle fails to explain
several observations. Taking into account the flexibility
of proteins, Koshland introduced the “induced fit model”
(Koshland, 1958), which assumes the binding sites will be structurally influenced by the ligand. Later Monod et al. (1965) postulated two or more pre-existing conformational states for allosteric proteins. This model, based on the “equilibrium hypothesis” (Tsai et al., 1999), is related to the funnel-shaped energy landscape of a protein, used to describe protein folding (Frauenfelder et al., 1991). The protein has an ensemble of local and global conformations located at the bottom of the energy funnel. Low barriers allow switching through these conformations, one of which corresponds to the ligand complexed form and ligand binding shifts the equilibrium to this state (Fig. 1(a)).
0303-2647/$ – see front matter © 2006 Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.biosystems.2005.12.007
S. Günther et al. / BioSystems 85 (2006) 126–136
Fig. 1. (a) Schematic description of local structural alteration of the binding site. Three different models are illustrated: (I) “Lock and key model”:
The binding site of the free protein has a shape complementary to DNA. (II) Induced conformational change: The DNA docking is associated with a
structural adaptation of the binding site. (III) “Conformational Equilibrium model”: The free binding site appears in an ensemble of conformations.
DNA binds to the one that is best suited for docking. (b) Schematic illustration of different docking models describing conformational change of the
backbone. (I) The backbone remains unchanged during DNA docking. (II) The DNA docking is associated with an induced conformational change.
(III) Conformational diversity: The free protein appears in an ensemble of conformations. The DNA binds to the one that is best suited for docking.
(IV) “Dynamic shift model”: DNA binding causes a change in the probability distribution of the ensemble of native states.
An extension of the equilibrium hypothesis is the “dynamic population shift model” (Freire, 1999). It is based
on the assumption that ligand binding causes a change in
the probability distribution of the native global state ensemble. The stabilization of a distinct structure by ligand
binding results in a conformational change throughout
the protein. This model can adequately describe proteins
that exhibit allosteric behavior.
In protein–protein interactions as well as in the case of
DNA-binding proteins the different docking models for
local binding site structure (Fig. 1(a)) and global backbone conformation (Fig. 1(b)) do not contradict each
other, rather particular protein–ligand interactions relate
more or less to one of them.
Structural shifts induced by ligands or structural differences between pre-established states of an equilibrium are variable in scale. The differences encompass
small changes of side-chain atoms and complete movements of the backbone of a particular protein. In the
following we use “conformational change” to describe
changes of the protein backbone measured as rmsd of
C(α)-atoms and “local structural alteration” to describe
the adaptation of binding site atoms to DNA measured
as rmsd of atoms located in the interface area.
Structural shifts have been analyzed for protein–
protein interactions (Goh et al., 2004; Echols et al.,
2003). Distinct interactions could be assigned to the
induced fit model, others to the equilibrium and the
dynamic population shift model. The data support the
hypothesis that proteins exist in an ensemble of conformational states. A small population exists between
these states, which is active without the presence of a
second protein or ligand. This hypothesis can explain
a low level of activity of unphosphorylated proteins,
which are normally described as completely inactive
(Barak and Eisenbach, 1992).
Due to the large number of 3D structures of free
and complexed DNA-binding proteins, a similar comparative study was also possible. Here, we analyze
conformational diversity within seven DNA-binding
proteins that have frequently been crystallized in
DNA-complexed and free states. While several analyses
point out that DNA binding is often coupled with conformational changes at least within the binding domain
(Spolar and Record, 1994; Garvie and Wolberger,
2001), here local changes of the binding site are also
considered to estimate their flexibility.
The superposition results indicate that movements of
atoms in the binding site region as well as of the backbone
trace due to binding of DNA are detectable within all
seven proteins. We were able to assign each of the seven
proteins to one of the models given in Fig. 1(a) and (b).
2. Materials and methods
2.1. Dataset
First, we collected all protein–DNA complexes from the
PDB showing a resolution better than 3.2 Å. Second, using
the cd-hit algorithm (Li et al., 2001) and a sequence similarity threshold of at least 90% we searched for free homologues. Only complexes co-crystallized with double-stranded
DNA longer than four nucleotides were used in the analysis.
Small proteins with less than 20 amino acids were removed
from the dataset. Only sets of structures including at least six
DNA-bound and six DNA-free examples were considered. The
resulting seven protein groups are given in Table 1; for the complete list of PDB codes see Table 3, published as supporting information (http://bioinf.charite.de/protein DNA/index.html).
2.2. Definition of backbone trace and binding site
For superposition, two different atom subsets of each protein structure were selected: the relevant backbone trace (Cα-atoms) and the protein atoms involved directly in DNA interaction. The backbone trace was defined as all Cα-atoms of the protein chain containing the DNA-binding site. Only Cα-atoms present in all backbones of a structure group were considered. The DNA binding region is defined as all heavy atoms of the protein chain within a range of 5 Å around any atom of the DNA. This procedure yields binding sites with different sizes within the same protein group. The main cause of this is that within each group a few structures exist that contain an incomplete DNA molecule. The associated DNA-binding proteins show fewer atoms in direct contact with DNA. To allow pair-wise comparisons only those atoms present in all DNA-bound structures within the contact area were used for superposition. The selection of the largest common subset decreases the number of atoms, but comparing atom sets of identical size prevents statistical difficulties with 3D comparisons (Stark et al., 2003). The atomic sets for superimposition of the backbone traces were selected in an analogous manner. In five of the seven protein groups the restriction to the largest common subsets decreases the number of Cα-atoms slightly. The resulting data set and its characteristics are summarized in Table 1.
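The selection of the binding region and of its largest common subset can be illustrated with a minimal sketch. It assumes that each structure is available as a mapping from a hypothetical atom key (chain, residue number, atom name) to Cartesian coordinates; the function names and the toy coordinates are illustrative only and are not taken from the software used in this study.

```python
from math import dist


def contact_atoms(protein, dna, cutoff=5.0):
    """Keys of protein atoms lying within `cutoff` Å of any DNA atom.

    protein: dict mapping (chain, residue_number, atom_name) -> (x, y, z)
    dna:     list of (x, y, z) coordinates of the DNA atoms
    """
    return {key for key, xyz in protein.items()
            if any(dist(xyz, d) <= cutoff for d in dna)}


def common_subset(sites):
    """Atoms present in every binding site, returned in a fixed order."""
    if not sites:
        return []
    return sorted(set.intersection(*sites))


# Toy data (invented): the same protein in two complexes whose DNA differs.
protein = {
    ("A", 10, "NZ"): (0.0, 0.0, 3.0),
    ("A", 11, "CA"): (0.0, 0.0, 4.5),
    ("A", 50, "CA"): (0.0, 0.0, 20.0),
}
site_a = contact_atoms(protein, dna=[(0.0, 0.0, 0.0)])
site_b = contact_atoms(protein, dna=[(0.0, 0.0, -1.0)])
consensus = common_subset([site_a, site_b])
print(consensus)
```

In this toy case the second complex contacts fewer atoms, so the consensus set shrinks to the atoms shared by both binding sites, which mirrors the reduction reported in parentheses in Table 1.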
2.3. Determination of conformational diversity by
superposition
All superpositions were performed using the ‘pair_fit’ routine implemented in the molecular graphics system ‘PyMOL’.1 The algorithm calculates the minimal three-dimensional deviation (rmsd) of two superimposed sets of atom coordinates. The atoms from both sets were assigned pairwise to each other such that only superposition of the same amino acid and atom type was allowed. No atoms were omitted. Within the seven protein groups all Cα-sets and binding site related atoms were superimposed with all other members of the distinct group. In total, 5726 binding site as well as 5726 backbone superpositions were calculated. The measured pairwise rmsd values result in two similarity matrices per structure group: one for the similarity of binding regions and the other for the backbone traces. The elements of the 14 resulting similarity matrices were clustered using the ‘Ward Method’ (Ward, 1963). Besides the rmsd, the maximal deviation of two superimposed sets of atoms was calculated to test the robustness of the selected method of measurement. Analogous to the rmsd values the resulting maximal deviations were transferred to similarity matrices and are published as supporting information.
1 PyMOL version 0.93, DeLano Scientific, Castro City, CA.
Table 1
The seven structural groups of DNA-binding proteins with the size of the binding sites (in atoms) and the length of the backbones (in amino acid residues)
                                Number of available structures
Protein                         DNA-complexed   Uncomplexed   Binding site    Backbone
Transcription factors
  Methionine-repressor          20              8             45–82 (31)      104–104 (102)
  Catabolite-gene-activator     15              8             67–104 (30)     197–208 (193)
  Runx1 runt domain             7               13            73–89 (41)      110–125 (106)
  Diphtheria-toxin-repressor    12              16            85–100 (61)     118–219 (111)
Enzymes
  DNA-polymerase-β              82              11            82–215 (58)     241–331 (223)
  Fragment of DNA-polymerase I  7               6             175–290 (86)    525–828 (496)
  PvuII endonuclease            10              9             124–146 (62)    151–158 (148)
The numbers in parentheses indicate the size of the consensus subset common to all structures, which was used for superposition.
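The two measures described in this section, rmsd and maximal deviation after superposition, can be sketched as follows. The sketch uses the Kabsch algorithm with NumPy as a stand-in for the superposition step; it is not the PyMOL routine used in this study, and the function name superpose_rmsd and the test coordinates are hypothetical. The last lines count the all-against-all pairs implied by the group sizes listed in Table 1.

```python
import numpy as np


def superpose_rmsd(a, b):
    """Return (rmsd, maximal deviation) of paired atoms after optimal fit.

    a, b: (N, 3) coordinate arrays whose rows are already paired
          atom by atom (same residue, same atom type).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Remove the translation by centring both sets on their centroids.
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    # Kabsch: optimal rotation from the SVD of the covariance matrix.
    u, _, vt = np.linalg.svd(a_c.T @ b_c)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    dev = np.linalg.norm(a_c @ rot - b_c, axis=1)
    return float(np.sqrt((dev ** 2).mean())), float(dev.max())


# Invented coordinates: a rigidly rotated and shifted copy fits exactly.
pts = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], float)
t = 0.7
rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t), np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
moved = pts @ rz.T + np.array([5.0, -2.0, 1.0])
rmsd_rigid, max_rigid = superpose_rmsd(pts, moved)

# Structures per group (DNA-complexed + uncomplexed) as listed in Table 1.
group_sizes = [20 + 8, 15 + 8, 7 + 13, 12 + 16, 82 + 11, 7 + 6, 10 + 9]
total_pairs = sum(n * (n - 1) // 2 for n in group_sizes)
```

With these group sizes, total_pairs evaluates to 5726, the number of superpositions reported above for each of the two atom selections.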
The calculated structure similarities were the basis for assignment to the above-mentioned docking models. If the structural ensemble of a protein is separated into the DNA-complexed and the uncomplexed states, that protein was assigned to the induced fit model. If little or no structural change between the complexed and uncomplexed protein structures was detected, the assigned docking model is the lock and key principle. An assignment to the equilibrium hypothesis is characterised by two or more structural clusters. One of them consists of all bound protein instances, but contains at least one additional unbound state. All other unbound proteins are located within further clusters. A precondition for assignment to the last docking model, the dynamic population shift model, is the presence of at least two different DNA-bound conformations of the same protein.
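The assignment rules of this section can be written as a small decision function. This is a simplified sketch under stated assumptions: cluster labels are taken as given, a single cluster containing all structures stands in for "little or no structural change", and the order in which the rules are tested is our own choice for illustration. The function name assign_docking_model and the toy label lists are hypothetical.

```python
def assign_docking_model(clusters, bound):
    """Assign a docking model from cluster labels and DNA state.

    clusters: one cluster label per structure
    bound:    one boolean per structure, True if DNA-complexed
    """
    bound_clusters = {c for c, b in zip(clusters, bound) if b}
    free_clusters = {c for c, b in zip(clusters, bound) if not b}
    if not bound_clusters or not free_clusters:
        raise ValueError("need both DNA-complexed and uncomplexed structures")
    # At least two different DNA-bound conformations.
    if len(bound_clusters) >= 2:
        return "dynamic population shift"
    # Assumption: one shared cluster means no significant change.
    if bound_clusters == free_clusters:
        return "lock and key"
    # The bound cluster also holds unbound states; others lie elsewhere.
    if bound_clusters & free_clusters:
        return "equilibrium"
    # Bound and unbound structures are fully separated.
    return "induced fit"


# Toy label sets (invented), one per rule.
examples = {
    "shared cluster": ([0, 0, 0, 0], [True, True, False, False]),
    "separated": ([0, 0, 1, 1], [True, True, False, False]),
    "bound cluster plus free": ([0, 0, 0, 1, 2], [True, True, False, False, False]),
    "two bound clusters": ([0, 1, 2, 2], [True, True, False, False]),
}
for name, (labels, state) in examples.items():
    print(name, "->", assign_docking_model(labels, state))
```

The function only formalizes the criteria stated in the text; borderline cases in real cluster maps would still require inspection of the similarity matrices.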
3. Results and discussion
The two methods of measurement, rmsd and maximal deviation, resulted in similar cluster maps. As examples, the similarity matrices based on the rmsd values for two of the seven analyzed DNA-binding proteins (diphtheria toxin repressor and polymerase β) are shown in Fig. 2(a) and (b). The data for all seven analyzed proteins are given as supporting information (http://bioinf.charite.de/protein DNA/index.html).
3.1. Size of binding region and length of
Cα-backbone
Constraining the binding region to the largest common subset (see Section 2 above) of each protein group
notably reduces the number of atoms, but allows straightforward 3D comparisons and ensures that detected differences between DNA-complexed and free forms are
related to DNA binding (compare Table 1).
The Cα-traces of two of the seven protein groups,
diphtheria toxin repressor (DtxR) and polymerase I,
show large differences in the length of backbone traces.
In the case of DtxR most structures do not contain the
C-terminal SH3-like domain. Within the polymerase I
group some of the structures contain the whole enzyme
while others contain only the large polymerase fragment
(Klenow fragment).
3.2. Conformational diversity of protein backbones
3.2.1. Diphtheria toxin repressor
Diphtheria toxin repressor (DtxR) recognizes several iron-regulated promoter/operator sequences. Activation of DtxR occurs at high iron concentrations. Activated DtxR binds to the tox regulatory region, toxO, located upstream of the tox structural gene, preventing expression of diphtheria toxin. When iron becomes a growth-limiting nutrient, DtxR dissociates from DNA, allowing the transcription of toxin mRNA. Different forms of DtxR are available within the PDB: apo-DtxR, holo-DtxR and the holo-DtxR complexed with DNA. Activation of the N-terminal domain by metal ions results in a conformational change of the protein backbone that contains part of the DNA-binding site (White et al., 1998). Upon activation with the co-repressor, the six amino-terminal residues of DtxR undergo a helix-to-coil transition (Fig. 3). This conformational change is only partly reflected by clustering in the similarity matrix of the DtxR-superposition results (Fig. 2(a)). All protein structures attached to the DNA show a similar local and global conformation and are activated by metal ions. We found one DNA-free structure (2TDX), which shows a conformation similar to DNA-complexed forms, indicating that this conformation is not only DNA induced. The repressor’s other structures are divided into four distinct global conformational variants. All of these are observed with the ligand-free protein, while the metal-activated forms are limited to two of them. The results indicate that metal binding limits the ensemble of ligand-free conformational variants. Such behavior is described by the equilibrium model: the conformation suitable for DNA is not only induced by DNA, but also appears within the conformational ensemble of the DNA-free protein.
Fig. 2. (a) Diphtheria toxin repressor. Similarity matrices (rmsd values) of backbone and binding site structure. DNA-complexed structures are labeled green, the unbound structures are marked grey (top line). High similarity values are marked red, yellow cells indicate more discriminative conformations. (b) DNA-polymerase β.
Fig. 3. Structure of diphtheria toxin repressor. The activated state of the dimer is colored green and is complexed with DNA (PDB code: 1C0W, chains a and b). Cobalt ions (grey spheres) induce a conformation change of six amino-terminal residues (colored blue). For clear illustration of the conformation switch the superposition of an unactivated repressor (PDB code: 1BI2, chain b) is also shown, and is colored red. If there was no change of conformation the N-terminal part of the backbone would clash with a DNA phosphate group.
3.2.2. Methionine repressor
In the presence of the co-repressor S-adenosylmethionine (SAM) the methionine repressor (MetJ)
binds to tandem eight base pair DNA recognition sites of
the met regulon. SAM binds at a distance from the DNA
binding site of the repressor (shortest distance to DNA: 11.5 Å). Bound SAM increases the repressor’s affinity
for operator DNA by a factor of up to 1000. However, the
protein undergoes no obvious conformational change. It
is suggested that the co-repressor effect may be electrostatic (Phillips and Phillips, 1994). The conformational
diversity of the backbone (rmsd max = 2.67 Å) can be divided into two to five groups. Greater conformational deviations within this structure group are restricted
to two loop regions (residues nos. 13–19 and 76–84),
while most parts of the backbone trace retain a similar
conformation. SAM binding correlates with the position
of the two flexible loops. Within most ternary complexes
(MetJ:SAM:DNA) these two parts of the backbone have
an equivalent position, and binding of DNA and SAM
surely favors this conformation compared to the other
loop forms. Nevertheless, the low backbone movement
of the methionine repressor mediated by DNA binding
is best described by the lock and key model.
3.2.3. Catabolite gene activator
The catabolite activator protein (CAP) activates transcription of the catabolite gene by facilitating the binding of RNA polymerase (RNAP) to the promoter. CAP possesses two binding sites for the C- and N-terminal subunits of RNAP, αCTD and αNTD (Lawson et al., 2004). One single structure of a CAP:αCTD:DNA complex is present within the dataset (PDB code: 1LB2A), but the binding of αCTD has no effect on backbone conformation of CAP. Two distinct backbone variants occur amongst the structures of CAP. While one of them is only observed in DNA-free structures, the other conformational group arises in both DNA-complexed and uncomplexed structures. Fig. 4 illustrates the conformational ensemble by a two-sided separation of the rmsds resulting from backbone superpositions. All pair-wise rmsds of DNA-complexed structures fall below 1.5 Å. Superposition values involving additional uncomplexed states are partitioned to low and high deviations. Differences between both conformations are not limited to a few loop regions but involve the entire backbone. Each chain of the dimer consists of two domains, a small C-terminal DNA-binding domain and the larger N-terminal domain. Both are displaced relative to each other within the two conformational variants. CAP is activated by cyclic AMP molecules (cAMP), which bind to the larger N-terminal domain. The cAMP molecule induces an allosteric transition preceding the DNA binding process (Passner et al., 2000). All known structures of CAP are similar to the activated, cAMP-complexed states, so the conformational variances detected are not caused by cAMP binding. The two distinct conformations observed are well explained by reduction to a single conformation by DNA-binding. Such behavior explains why DNA-complexed and uncomplexed structures with a similar conformation of CAP occur within this structural group and can be assigned to the equilibrium docking model.
Fig. 4. Structural diversity of catabolite gene activator protein (CAP). All pair-wise superpositions within the structure group of CAP are divided into three groups: rmsd values between DNA-complexed and uncomplexed structures and rmsd values resulting from superpositions within both groups. The plot illustrates the correlation between structural diversity of the backbone traces (X-axis) and those of the binding sites (Y-axis).
3.2.4. Runt related transcription factor
The runt related transcription factor/core-binding factor (CBFα) regulates the transcription of different genes by binding to DNA in the proximal promoter region (Matsuo et al., 2003). The binding of CBFβ stabilizes a conformation suitable for the DNA docking process (Tahirov et al., 2001; Backstrom et al., 2002; Yan et al., 2004). The backbone superpositions of the transcription factor reveal three groups of conformations. Two of them are observed in DNA-free structures, the third one contains the DNA-complexed protein but is also observed in one distinct DNA-free structure (PDB-code: 1E50 A). The existence of this structure demonstrates that the conformation suitable for DNA-binding also exists without DNA. This structure is also complexed with CBFβ, supporting the theory of the subunit promoting DNA docking. The results again suggest a behavior described by the equilibrium model.
3.2.5. DNA-polymerase β
DNA-polymerase β (pol β) adds new complementary deoxynucleotides to a growing DNA chain. The enzyme has two domains. The nucleotidyl transfer reaction involves a large movement of the 8 kD domain and part of the 31 kD domain from a closed to an open conformation (Pelletier et al., 1994, 1996; Sawaya et al., 1997). Both types of conformation are present in the 82 enzyme–DNA complexes: 75 structures were crystallized in the open conformation and 7 represent the closed conformation. Both variants deviate from the 11 DNA-free structures (Fig. 2(b)), though the closed conformation more than the open state. The presence of two distinct DNA-complexed conformations indicates a bidirectional structural change between the two bound forms while catalysing the synthesis of DNA. The shift of the uncomplexed conformation to the different complexed states is best described by the dynamic population shift model.
3.2.6. Fragment of DNA-polymerase I
DNA-polymerase I catalyzes the addition of deoxynucleotides to a primer RNA chain. The reaction requires a template chain which directs the enzyme to select a specific nucleotide. The catalysis is coupled with
rotation of a subdomain of the polymerase or Klenow
fragment (Li and Waksman, 2001). The superposition
results show three distinct DNA-complexed backbone
conformations reflecting different stages of nucleotide
incorporation. One of them is represented by one single
structure (1TAU) and is quite similar to the uncomplexed
states (rmsd 1TAU ↔ 1TAQ = 0.84 Å). Analogous to pol
β, DNA-polymerase I is characterised by a structural
shift from the uncomplexed conformation to the DNA-complexed states. Therefore, DNA-polymerase I is also
assigned to the dynamic population shift model.
3.2.7. Restriction enzyme PvuII
The PvuII endonuclease recognizes the double-stranded DNA sequence 5′-CAGCTG-3′ and hydrolyzes
the phosphodiester bond in unmethylated duplex DNA
between the central GC nucleotides (Gingeras et al.,
1981). The restriction enzyme is a homodimer and metal
ions are essential cofactors for DNA cleavage (Pingoud
and Jeltsch, 1997). The structures of crystallized PvuII
endonucleases are clearly partitioned into two different clusters, one containing the DNA-bound enzyme
the other the uncomplexed structures. The presence of
metal ions has no obvious effect on enzyme conformation and DNA docking is also possible without them.
The clear separation of the DNA-complexed and uncomplexed conformation indicates a structural change
induced by DNA (“induced fit”).
3.3. Structural diversity of the binding sites
Protein–DNA interfaces contain an above-average proportion of polar and positively charged amino acids compared to protein–protein interfaces or protein surfaces (Jones et al., 1999). The residue composition of the seven binding sites analyzed corresponds to the interface amino acid composition specified by Jones et al. and reflects the polar character and negative charge of the DNA. The positively charged arginine residue is most frequent, followed by polar threonine, asparagine and serine residues, and the positively charged lysine residue. Except for one structure of the polymerase I group, all DNA-bound binding sites are dissimilar to the unbound states. This is indicated by a clear separation within all structure groups shown by the cluster analysis. In DNA-bound chains the atoms in direct contact with DNA are located in a similar position. The bound structures of DtxR, MetJ, CBFα and PvuII do not exceed an rms deviation of 1.1 Å within each group. The uncomplexed binding sites differ clearly from them and they are less similar amongst themselves. The observed local changes in binding site structure are assigned to structural adaptation of side-chains for perfect interaction with DNA. Often the side-chain conformation is changed significantly while the C(α) trace remains in the same conformation (Fig. 5). Comparing the binding site of CAP with and without DNA (1CGP and 1G6N), the residue Arg180 moves significantly (rmsd = 3.6 Å).
Structurally similar pairs of free binding sites are rarely present within the dataset even in proteins that are analyzed frequently. This observation reflects the wide range of possible local conformations while interacting with the solvent. Similarly, a DNA-adapted local conformation is hardly ever adopted without DNA interaction. An example is illustrated in Fig. 4 for the binding site structures of CAP. In contrast to the similarity values within both the DNA-complexed and uncomplexed structure groups, no rmsd between both groups falls below 1.2 Å.
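The three-way division of pairwise rmsd values used for this comparison can be sketched as follows. The sketch assumes a symmetric rmsd matrix and a list of DNA states per structure; the function name split_pairwise and all numerical values are invented for illustration and are not data from this study.

```python
from itertools import combinations


def split_pairwise(rmsd, bound):
    """Partition pairwise rmsd values by the DNA state of both structures.

    rmsd:  symmetric matrix (list of lists) of rmsd values in Å
    bound: one boolean per structure, True if DNA-complexed
    """
    groups = {"bound-bound": [], "free-free": [], "bound-free": []}
    for i, j in combinations(range(len(bound)), 2):
        if bound[i] and bound[j]:
            key = "bound-bound"
        elif not bound[i] and not bound[j]:
            key = "free-free"
        else:
            key = "bound-free"
        groups[key].append(rmsd[i][j])
    return groups


# Invented binding-site rmsd values for two bound and two free structures.
bound = [True, True, False, False]
rmsd = [
    [0.0, 0.4, 1.6, 2.1],
    [0.4, 0.0, 1.5, 1.9],
    [1.6, 1.5, 0.0, 0.9],
    [2.1, 1.9, 0.9, 0.0],
]
groups = split_pairwise(rmsd, bound)
# The bound and free states count as separated at a given threshold if
# no rmsd between the two groups falls below it.
separated = min(groups["bound-free"]) >= 1.2
```

Applied to real similarity matrices, the smallest value in the bound-free group is the quantity compared against the 1.2 Å limit quoted above for CAP.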
Substantial alterations of side-chain structures
located in the interaction site are also observed in
protein–protein associations (Betts and Sternberg,
1999). Significant side-chain movement was estimated
in all proteins considered by Betts and Sternberg, but
greater backbone conformational changes were only
observed in three of the eight complexes.
With the exception of the two polymerases, all other
five proteins analyzed bind to specific DNA target sites.
Sequence specific DNA binding implies that proteins
discriminate between, and thus have different affinities
for individual bases. All five sets contain variants of the
DNA target sites. A complete list of the DNA target
sites within each of the five protein groups is given in
Table 4, published as supporting information. There is
no conspicuous relationship to the local structure of the
binding site. Even if a small structural adaptation to different DNA target sites exists, the scale is small (e.g.
DtxR, rmsd: 0.15–0.99 Å) compared to the large conformational change from the free to the DNA-complexed
state (DtxR, rmsd: 1.17–2.29 Å).
There is one exception within the structure group of
polymerase I. The cluster algorithm does not pick out
a complexed binding site (PDB code: 1TAU A) from
the structures uninfluenced by DNA. However, this fact
can be explained by the strong deviation from the other
DNA-bound structures, which represent different activation states (see DNA-polymerase I section above).
Nevertheless, adaptation of binding site conformation
to DNA can also be observed within this structure. For
the local structure of the interaction site all proteins analyzed can be assigned to the induced fit model. Due to
the large number of possible isoenergetic binding site
structures (see below) we cannot exclude that a DNA-free protein adopts a local conformation similar to the
DNA-bound form from time to time.
3.4. Conformational change of DNA
In contrast to proteins the local flexibility of double-stranded DNA is more restricted. However, helical axis bending can occur due to external factors including small molecule ligands or proteins (Dickerson, 1998; Perez-Martin and de Lorenzo, 1997; Otwinowski et al., 1988). Among the seven proteins analyzed here, only CAP interacts with obviously bent double-stranded DNA. DNA complexed to CAP forms a right angle caused by two successive kinks (Dickerson, 1998).
Apart from this, only the single-stranded DNA regions within the polymerase complexes show conspicuous structural differences compared to DNA strands organized in a straight double helix.
However, the influence of a protein on DNA binding is often difficult to estimate because the structure of the unbound DNA is usually unknown. For the CAP-operator sequence it was estimated by MD simulations that 40% of the bending in the complex is intrinsic to the DNA sequence, whereas 60% is induced by protein binding (Dixit et al., 2005). DNA bending does not result in a new conformation of CAP, since this would only be seen in DNA-complexed structures of the protein (see Section 3.2.3).
Fig. 5. Example of a local conformation change. Two structures of the CAP are superimposed. One structure is colored green (PDB code: 1CGP) and is complexed with DNA (blue), the second structure is uncomplexed and is colored red (PDB code: 1G6N). One exemplary residue of the binding site (ARG 180) is shown in stick representation, the other parts of the protein are shown in ribbon representation.
Table 2
Assignment of the DNA-binding proteins to different docking models
Protein                            Docking model for the binding site   Docking model for backbone conformation
Methionine-repressor               Induced fit                          Inflexible docking
Runt-related-transcription factor  Induced fit                          Equilibrium
Catabolite-gene-activator          Induced fit                          Equilibrium
Diphtheria-toxin-repressor         Induced fit                          Equilibrium
DNA-polymerase β                   Induced fit                          Dynamic shift
Fragment of DNA-polymerase I       Induced fit                          Dynamic shift
PvuII endonuclease                 Induced fit                          Induced fit
The classification is based on the binding site and backbone similarity within each structural group.
4. Conclusions
4.1. Backbone conformation of the binding domain
Diversity in protein backbone conformation was observed within each of the seven proteins, but the degree
of conformational change varies and is not always related
to function. Low structural variation is observed within
the structure group of the methionine repressor protein.
A model describing the docking of MetJ should most
probably assume an inflexible backbone. The proteins
catabolite gene activator, runt related transcription factor and diphtheria toxin repressor are all characterized by
the fact that binding of the co-activators/repressor narrows down the number of alternative conformations, and
thus facilitates DNA-docking. This is best described by
the equilibrium model. DNA-polymerase I and DNA-polymerase β exhibit additional conformations while
docked with DNA. Therefore, the shift of the structural
ensemble is best reflected by the dynamic population
shift model. The DNA bound and unbound structures of
PvuII differ considerably. Binding of the co-activator has
no effect on backbone conformation, but is necessary for
cutting DNA. This behavior is described by the induced
fit model.
4.2. The local structure of DNA binding sites
The local structure of the DNA-binding sites of all
seven proteins is influenced by DNA. DNA binding is
combined with a change of local structure to a precise
DNA-complementary binding site shape. This conformation is hardly ever observed in proteins which are
not attached to DNA. The flexible side-chains located at
the protein surface adopt a number of alternative local
conformations. Similar to the backbone docking model
for the transcription factors, DNA-docking is associated
with a severe reduction in the protein’s conformational
ensemble. Finally it is reduced to nearly one variant that
allows discrimination between different bases. Although
some protein sets contain variant DNA target-sites, little
effect on local binding site structure was detected. Table
2 gives an overview of the assignments to the different
docking models.
4.3. Structure of DNA molecules
Structural change in the double-stranded DNA molecules
analyzed is mainly limited to a global bending of the axis
within the complexed structures of CAP. The bending is
partly induced by the protein. Local structural changes
of single nucleotides are prevented by the strong association of complementary bases.
4.4. Consequences for computer-based docking
The local flexibility of binding sites constitutes a frontier problem for shape-based prediction models. The
number of possible ligands increases substantially even
upon small conformational variations of the binding site
(Ferrari et al., 2004).
For small organic compounds, Yang et al. (2004)
showed that binding sites complexed with their ligands
assume a low energy conformation of the free protein.
This assumption could provide an opportunity to identify
binding sites by geometric criteria.
However, for DNA binding, the interactions are more
complex: larger interfaces are involved, and the negative charges of the DNA backbone constrain the binding region of the protein such that it no longer
corresponds to a low-energy conformation of the free state. The
side chains of the binding site adapt dynamically to the structure they bind. It follows that the conformational space to be considered for a potential binding
site is enormous: even if a binding site comprises only
20 residues with as few as five rotamers per residue, the
search space comprises 5^20, or roughly 10^14, combinations.
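The size of this search space can be checked with a short calculation. The sketch below assumes that each residue chooses its rotamer independently of the others, so that n residues with r rotamers each give r^n combinations; the function name rotamer_search_space is hypothetical and introduced only for illustration.

```python
import math


def rotamer_search_space(n_residues: int, rotamers_per_residue: int) -> int:
    """Count side-chain rotamer combinations for a binding site.

    Assumption (not from the paper): residues are treated as independent,
    so the count is rotamers_per_residue raised to the power n_residues.
    """
    return rotamers_per_residue ** n_residues


# Example from the text: 20 residues, five rotamers per residue.
size = rotamer_search_space(20, 5)
print(size)                                # 95367431640625
print(f"order of magnitude: 10^{round(math.log10(size))}")  # 10^14
```

Each additional residue multiplies the count by the number of rotamers, which is why even a modest binding site cannot be enumerated exhaustively without further constraints.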
Thus, a more promising way of predicting protein–DNA
interactions is to combine geometric criteria with
additional physical parameters that narrow down the conformational space by several orders of magnitude.
References
Backstrom, S., Wolf-Watz, M., Grundstrom, C., Hard, T., Grundstrom,
T., Sauer, U.H., 2002. The runx1 runt domain at 1.25 Å resolution:
a structural switch and specifically bound chloride ions modulate
DNA binding. J. Mol. Biol. 322, 259–272.
Barak, R., Eisenbach, M., 1992. Correlation between phosphorylation
of the chemotaxis protein CheY and its activity at the flagellar
motor. Biochemistry 31 (6), 1821–1826.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E., 2000. The protein data bank.
Nucleic Acids Res. 28, 235–242.
Betts, M.J., Sternberg, M.J., 1999. An analysis of conformational
changes on protein–protein association: implications for predictive docking. Protein Eng. 12 (4), 271–283.
Dickerson, R.E., 1998. DNA bending: the prevalence of kinkiness and
the virtues of normality. Nucleic Acids Res. 26 (8), 1906–1926.
Dixit, S.B., Andrews, D.Q., Beveridge, D.L., 2005. Induced fit and the
entropy of structural adaptation in the complexation of CAP and
lambda-repressor with cognate DNA sequences. Biophys. J. 88 (5),
3147–3157.
Echols, N., Milburn, D., Gerstein, M., 2003. Molmovdb: analysis and
visualization of conformational change and structural flexibility.
Nucleic Acids Res. 31, 478–482.
Ferrari, A.M., Wei, B.Q., Costantino, L., Shoichet, B.K., 2004. Soft
docking and multiple receptor conformations in virtual screening.
J. Med. Chem. 47, 5076–5084.
Fischer, E., 1894. Einfluss der configuration auf die wirkung der enzyme. Chem. Ber. 27, 2985–2993.
Frauenfelder, H., Sligar, S.G., Wolynes, P.G., 1991. The energy landscapes and motions of proteins. Science 254, 1598–1603.
Freire, E., 1999. The propagation of binding interactions to remote sites
in proteins: analysis of the binding of the monoclonal antibody d1.3
to lysozyme. Proc. Natl. Acad. Sci. USA 96, 10118–10122.
Garvie, C.W., Wolberger, C., 2001. Recognition of specific dna sequences. Mol. Cell 8, 937–946.
Gingeras, T.R., Greenough, L., Schildkraut, I., Roberts, R.J., 1981.
Two new restriction endonucleases from proteus vulgaris. Nucleic
Acids Res. 9, 4525–4536.
Goh, C.S., Milburn, D., Gerstein, M., 2004. Conformational changes
associated with protein–protein interactions. Curr. Opin. Struct.
Biol. 14, 104–109.
Jones, S., van Heyningen, P., Berman, H.M., Thornton, J.M., 1999.
Protein–DNA interactions: a structural analysis. J. Mol. Biol. 287,
877–896.
Koshland, D., 1958. Application of a theory of enzyme specificity to
protein synthesis. Proc. Natl. Acad. Sci. USA 44, 98–104.
Lawson, C., Swigon, D., Murakami, K.S., Darst, S.A., Berman, H.M.,
Ebright, R.H., 2004. Catabolite activator protein: DNA binding and
transcription activation. Curr. Opin. Struct. Biol. 14 (1), 10–20.
Li, W., Jaroszewski, L., Godzik, A., 2001. Clustering of highly homologous sequences to reduce the size of large protein databases.
Bioinformatics 17, 282–283.
Li, Y., Waksman, G., 2001. Crystal structures of a ddatp-, ddttp-, ddctp, and ddgtp- trapped ternary complex of klentaq1: insights into
nucleotide incorporation and selectivity. Protein Sci. 10, 1225–1233.
Luscombe, N.M., Thornton, J.M., 2002. Protein–DNA interactions:
amino acid conservation and the effects of mutations on binding
specificity. J. Mol. Biol. 320, 991–1009.
Mandel-Gutfreund, Y., Margalit, H., 1998. Quantitative parameters for
amino acid-base interaction: implications for prediction of protein–DNA
binding sites. Nucleic Acids Res. 26, 2306–2312.
Matsuo, N., Yu-Hua, W., Sumiyoshi, H., Sakata-Takatani, K., Nagato,
H., Sakai, K., Sakurai, M., Yoshioka, H., 2003. The transcription
factor ccaat-binding factor cbf/nf-y regulates the proximal promoter activity in the human alpha 1(xi) collagen gene (col11a1). J.
Biol. Chem. 278, 32763–32770.
Monod, J., Wyman, J., Changeux, J.P., 1965. On the nature of allosteric
transitions: a plausible model. J. Mol. Biol. 12, 88–118.
Nadassy, K., Tomas-Oliveira, I., Alberts, I., Janin, J., Wodak, S.J.,
2001. Standard atomic volumes in double-stranded DNA and packing in protein–DNA interfaces. Nucleic Acids Res. 29, 3362–3376.
Nadassy, K., Wodak, S.J., Janin, J., 1999. Structural features of protein–nucleic acid recognition sites. Biochemistry 38, 1999–2017.
Otwinowski, Z., Schevitz, R.W., Zhang, R.G., Lawson, C.L.,
Joachimiak, A., Marmorstein, R.Q., Luisi, B.F., Sigler, P.B., 1988.
Crystal structure of trp repressor/operator complex at atomic resolution. Nature 335 (6188), 321–329.
Passner, J.M., Schultz, S.C., Steitz, T.A., 2000. Modeling the cAMP-induced allosteric transition using the crystal structure of CAP-cAMP
at 2.1 Å resolution. J. Mol. Biol. 304, 847–859.
Pelletier, H., Sawaya, M.R., Kumar, A., Wilson, S.H., Kraut, J., 1994.
Structures of ternary complexes of rat dna polymerase beta, a dna
template-primer, and ddctp. Science 264, 1891–1903.
Pelletier, H., Sawaya, M.R., Wolfle, W., Wilson, S.H., Kraut, J., 1996.
Crystal structures of human dna polymerase beta complexed with
dna: implications for catalytic mechanism, processivity, and fidelity. Biochemistry 35, 12742–12761.
Perez-Martin, J., de Lorenzo, V., 1997. Clues and consequences of
DNA bending in transcription. Annu. Rev. Microbiol. 51,
593–628.
Phillips, K., Phillips, S.E., 1994. Electrostatic activation of escherichia
coli methionine repressor. Structure 2, 309–316.
Pingoud, A., Jeltsch, A., 1997. Recognition and cleavage of dna by
type-II restriction endonucleases. Eur. J. Biochem. 246, 1–2.
Sawaya, M.R., Prasad, R., Wilson, S.H., Kraut, J., Pelletier, H., 1997.
Crystal structures of human dna polymerase beta complexed with
gapped and nicked dna: evidence for an induced fit mechanism.
Biochemistry 36, 11205–11215.
Spolar, R.S., Record Jr., M.T., 1994. Coupling of local folding to site-specific binding of proteins to DNA. Science 263, 777–784.
Stark, A., Sunyaev, S., Russell, R.B., 2003. A model for statistical
significance of local similarities in structure. J. Mol. Biol. 326 (5),
1307–1316.
Tahirov, T.H., Inoue-Bungo, T., Morii, H., Fujikawa, A., Sasaki, M.,
Kimura, K., Shiina, M., Sato, K., Kumasaka, T., Yamamoto, M.,
Ishii, S., Ogata, K., 2001. Structural analyses of dna recognition by
the aml1/runx-1 runt domain and its allosteric control by cbfbeta.
Cell 104, 755–767.
Tsai, C.J., Kumar, S., Ma, B., Nussinov, R., 1999. Folding funnels,
binding funnels, and protein function. Protein Sci. 8, 1181–1190.
Ward, J.H., 1963. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244.
White, A., Ding, X., vanderSpek, J.C., Murphy, J.R., Ringe, D., 1998.
Structure of the metal-ion-activated diphtheria toxin repressor/tox
operator complex. Nature 394, 502–506.
Yan, J., Liu, Y., Lukasik, S.M., Speck, N.A., Bushweller, J.H., 2004.
Cbfbeta allosterically regulates the runx1 runt domain via a dynamic conformational equilibrium. Nat. Struct. Mol. Biol. 11, 901–906.
Yang, A.Y., Kallblad, P., Mancera, R.L., 2004. Molecular modelling
prediction of ligand binding site flexibility. J. Comput. Aided Mol.
Des. 18 (4), 235–250.