Quantitative Biology Problem Set #2 1. Equilibrium binding of two

Quantitative Biology
Problem Set #2
1. Equilibrium binding of two species: A protein (P) can bind with a short piece of DNA
(D) to form the complex PD. Assume that there is fixed total amount of P and D, with
concentrations [P]tot and [D]tot respectively. Find a general expression for the concentration
of the complex, [PD]. We will show that when [P]tot [D]tot , we recover the familiar form
given in class.
(a) Write down the dissociation constant Kd in terms of the free concentrations of P, D
and the complex PD.
Solution
Kd =
[Pf ] [Df ]
[PD]
(b) Write down the concentrations of total P and total D in terms of the free concentrations and the concentration of PD.
Solution
[Ptot ] = [Pf ] + [PD]
[Dtot ] = [Df ] + [PD]
(c) Use your results from (b) to express the expression for Kd from (a) in terms of [P]tot ,
[D]tot and [PD].
Solution
Kd =
([Ptot ] − [PD]) ([Dtot ] − [PD])
[PD]
(d) Solve the expression for Kd from (c) for [PD].
Solution
Multiplying through, we find
Kd [PD] = [Ptot ] [Dtot ] + [PD]2 − [PD] ([Ptot ] + [Dtot ])
Bringing everything to one side . . .
0 = [PD]2 − [PD] (Kd + [Ptot ] + [Dtot ]) + [Ptot ] [Dtot ]
This is a quadratic equation in [PD], the solution is
1
[PD] =
2
q
2
[Ptot ] + [Dtot ] + Kd − (Kd + [Ptot ] + [Dtot ]) − 4 [Ptot ] [Dtot ]
1
(e) Show that for [P]tot [D]tot , the fraction of DNA occupied, f ≡ [PD]/[D]tot , is given
approximately by f ≈ [P]tot /(Kd + [P]tot ).
(Hint: It may be useful to use Taylor’s expansion
√
1 − x ≈ 1 − x/2 for x 1.)
Solution
For ease of writing, we stop bracketing the concentrations, and just write,
PD, P, D. Since we’re taking the limit where [Ptot ] [Dtot ] let’s write Dtot
as to emphasize it is small. So we have
q
1
2
PD =
Ptot + + Kd − (Kd + Ptot + ) − 4Ptot 2
Now we pull the Ptot + + Kd out from the expression and have
!
s
4Ptot 1
PD = (Ptot + + Kd ) 1 − 1 −
2
(Kd + Ptot + )2
Since Ptot Dtot the quantity
4Ptot is small.
(Kd + Ptot + )2
We therefore use
it to make a Taylor expansion.
√
1 − x ≈ 1 − 12 x gives.
4Ptot 1
2
+O PD = (Ptot + + Kd ) 1 − 1 +
2
2 (Ptot + + Kd )2
Applying the Taylor series is
1
4Ptot (Ptot + + Kd )
2
2 (Ptot + + Kd )2
Ptot
Kd + Ptot
Restoring the original notation, we have the desired result
[PD] = [Dtot ]
[Ptot ]
Kd + [Atot ]
2. Protein-protein interactions: Two transcription factors A and B can bind in solution
or in cytoplasm to form the complex AB with a dissociation constant of KAB = 2 µM.
Transcription factor A is maintained at the level of 500 molecules per cell in the free form.
Only the complex AB can bind to a target site in the genome of E. coli, characterized by
an effective dissociation constant KAB−DN A = 6 nM.
(a) Write equilibrium constants KAB and KAB−DN A in terms of the relevant concentrations.
2
Solution
By definition, we have KAB =
[A][B]
[AB][DN A]
and KAB−DN A =
[AB]
[AB − DN A]
(b) What should the cellular level of proteins B be in order for the target site be occupied
5% and 95% of the time?
Solution
The occupation of genomic DNA is given by the ratio of occupied DNA to total
DNA:
−1
[AB − DN A]
1
[DN A]
O=
=
= 1+
[DN A]
[DN A] + [AB − DN A]
[AB − DN A]
1 + [AB−DN
A]
Using the definitions for the equilibrium constants in (a) we find
O=
=
1+
1+
KAB−DN A
[AB]
−1
KAB−DN A KAB
[A][B]
−1
Now, we can solve for [B]:
[A][B] = O · ([A][B] + KAB−DN A KAB )
[A][B](1 − O) = O · KAB−DN A KAB
O KAB−DN A KAB
[B] =
1−O
[A]
For the values provided in the problem
[B] = 24
KAB−DN A KAB
[A]
= 24, thus
O
1−O
Which gives [B] ≈ 1.3 molecules per cell for 5% occupancy and
[B] = 456 molecules per cell for 95% occupancy.
(c) What would be the synthesis rate of protein B if it is not degraded and the cells are
growing exponentially at 45 min per doubling. (You may neglect the occurrence of
multiple targets within a cell.)
Solution
3
Synthesis must balance dilution, so γsynth = µ[B]tot where µ = log 2/Td .
Further
[A][B]
[A][B][DN A]
[B]tot = [B] +
+
KAB
KAB KAB−DN A
Assuming one copy of the genome per cell [DN A] = 1 nM, we have [B]tot ≈
1.6 nM for 5% occupation and [B]tot ≈ 460 nM for 95% occupation.
For 1.6 molecules, we have γsynth ≈ 0.03 molecules/min .
For 460 molecules, we have γsynth ≈ 9 molecules/min .
(d) If instead, protein A can bind specifically to the target site by itself, with KA−DN A = 8
nM, but protein B can bind to protein A only when A is already bound to the target
site, what should the level of protein B be in order for the target site be occupied by
AB 5% and 95% of the time?
Solution
First, we write down the new equilibrium constants.
Also, KAB becomes KB−ADN A
[A][DN A]
= 8 nM
[A − DN A]
[B][A − DN A]
=
= 2 µM
[B − ADN A]
KA−DN A =
KB−ADN A
The occupancy O of the genomic target site is given by
[B−ADN A]
[DN A]+[A−DN A]+[B−ADN A]
O=
=
[B−ADN A]
[DN A]
[A−DN A]
[B−ADN A]
1+ [DN A] + [DN A]
=
[A][B]
KA−DN A KB−ADN A
[A]
[A][B]
1+ K
+K
A−DN A
A−DN A KB−ADN A
=
=
1
1+
KB−ADN A KA−DN A
K
A
+ AB−DN
[A][B]
[B]
[B]
K
A KA−DN A
[B]+ B−ADN[A]
+KAB−DN A
From here, we can solve for [B] in terms of [A] and the equilibrium constants.
O
[B] =
1−O
KB−ADN A KA−DN A
+ KB−ADN A
[A]
For 5% occupancy, [B] must be present at 107 molecules per cell.
4
For 95% occupancy, [B] must be present at 39000 molecules per cell.
3. Binding energy matrix: Mnt is a dimeric transcription factor which binds to a 17bp
DNA segment. The binding energy matrix Gi (b) was measured by the Stormo lab and is
reproduced below for the half site from position 10 to 17. The binding energies for the other
half (position 1-8) can obtained as the reverse complement of those shown here. Position
9 is a neutral position which does not affect Mnt-DNA binding.
Table 1: Binding energy matrix
position
A
C
G
T
10
1.8
2.4
0
3.0
11
2.4
1.9
1.6
0
12
1.6
4.2
0
2.2
13
1.0
2.1
0
2.2
14
0
0.3
1.2
0.6
15
2.1
0
3.2
2.2
16
0.8
0
1.0
0.7
17
1.1
0
1.2
0.3
(The numbers are expressed in units of kB T ≈ 0.6 kcal/mole)
e ∗ for the
(a) Given that Gns − G∗ ≈ 16kB T , find the effective dissociation constant K
strongest binder in the presence of genomic DNA that is 5 · 106 bp in length.
Hint: approximate the genomic DNA as a random string of nucleotides with equal distribution of {A, C, G, T }.
Solution
The following solution is a very detailed one, starting directly with the effective dissociation constant as derived in the lecture is also fine.
We can express the probability for site Sj to be occupied in terms of the
free energies of the the transcription factor (TF) binding events using the
Boltzmann distribution, which applies to systems that are at thermodynamic
equilibrium.
For DNA binding, we assume that the E. coli genome (of length N ) acts as
a kind of reservoir for the TFs. The TF can bind either to the target of
interest Sj or to any of the N other sites Si .
Each event has a probability of occurring, proportional by e−∆G(Si ) . This
gets weighted by the number of ways in which the binding event can occur.
Each binding event involves a TF binding to a target sequence Si of sequence
{b1i , b2i , . . . , bLi }. If the sequence is the target sequence Sj we have ∆G(Sj ) =
L
X
∗
∗
G , otherwise we have ∆G(Si ) = G +
Gi (bij ).
i=1
We assume that TF binding to any of the nonspecific sites is equivalent to
its binding to any of the other nonspecific sites, so that they are
indistinguishable events. The only distinguishable event is when a TF binds
to the target site Sj which has a lower free energy of binding. From now
5
on, we write the non-target Si as Sns .
This assumption is equivalent to approximating all N nonspecific binding
sites in the genome as perfectly scrambled random DNA of length 17 (length
of the TF binding site), with free energies given by the binding matrix.
When one of the P TFs bind to site Sj , it leaves the other P −1 TFs free
to bind to any of the N non-specific site in the genome.
If one TF is bound to Sj and the rest are bound non-specifically, there are
N!
ways for this to happen.
Ω(P − 1) = (P −1)!(N
−P +1)!
If Sj is unbound by a TF, there are Ω(P ) =
N!
P !(N −P )!
ways for this to happen.
The probability for binding to the target site Sj is then
fj =
Ω(P −1)e
Ω(P −1)e−∆G(Sj )/kB T
+Ω(P )he−∆G(Sns )/kB T i
−∆G(Sj )/kB T
1
=
*
Ω(P )
1+ Ω(P −1)
≈
So that fj = 1+
N
P
∆G(Sj )−∆G(Sns ))/kB T
e(
+
(∆G(S )−∆G(S ))/k T −1
ns
j
B
e
1
f∗ = N e(∆G(Sj )−∆G(Sns ))/kB T
with K
f∗ /P
1+K
f∗ = N e−G(Sns )/kB T =
From the Mnt matrix, G(Sj ) = 0 and so K
Now, G(Sns ) = G(b1 ) + G(b2 ) + . . . G(bL ), so
f∗ = N
K
17
Y
−∆G(bi )/kB T
e
i=1
=N
17 Y
1
i=1
4
−G(bA
i )/kB T
e
+e
−G(bC
i )/kB T
+e
−G(bG
i )/kB T
−G(bT
i )/kB T
+e
f∗ ≈ 6.2 TFs
Evaluating the expression numerically we find that K
X )/k
The bracketed expression is the average value of e−G(bi
over all possible bases X ∈ {A, C, G, T }.
Bt
at each position
(b) Approximate the non-zero entries of the binding energy matrix by one parameter, ,
e ∗.
and find the smallest value of that would result in the same K
From above, we have
6
f∗ = N
K
17 Y
1
i=1
4
−G(bA
i )/kB T
e
−G(bC
i )/kB T
+e
−G(bG
i )/kB T
+e
−G(bT
i )/kB T
+e
Now, if all non-zero entries of the Mnt binding matrix are made equal to
an effective energy , then we have G(bX
i ) = for all non-zero entries.
At each position in the target sequence, we will have 3 bases which have
the energy and one which has the energy 0 (except for position L = 9 which
has energy 0 for all 4 possible bases). Therefore, the term for the average
energy at a position becomes:
1
1 −G(bAi )/kB T
C
G
T
e
+ e−G(bi )/kB T + e−G(bi )/kB T + e−G(bi )/kB T ≈
1 + 3e−
4
4
f∗ becomes
and K
16
1
−
f∗ = N
K
1 + 3e
4
f∗ :
We can solve for in terms of K
1 + 3e
−
e
−
=
q
16
4
f∗
K
N
q
=
4
g
∗
16 K
−1
N
3
q3
= ln
4
g
∗
16 K
−1
N
f∗ = 6.2, N = 5 · 106 bp, gives ≈ 1.4 kB T .
Which for K
(c) The target sequence is located within the following segment of DNA
5’ - TCTACGATCCACTGTCGACTCGACTGCCGTAT - 3’
ns
Compute and plot the binding energy Gj = min(Gsp
j , G ) as a function of the position
j, the position in the sample sequence that the first position of the Mnt motif aligns
to. Repeat the plot for Gns − G∗ ≈ 30kB T . Attach your computer code (or show your
method if performed otherwise) and comment on your findings.
Solution
In this part, we don’t make the assumption that all non-target sites are equal
to a smeared average of all random DNA strings (which we did make above). We
7
go through the proposed 32 nt genome segment position by position and compute
the minimum of specific and non-specific binding energy at each location. At
locations where specific binding is more favorable than non-specific binding,
it is more likely that Mnt will bind.
The first step is to construct the full Mnt binding matrix by taking the reverse
complement of the Mnt matrix and adding in a row of 0’s for the inconsequential
9th position.
Once this is done, we proceed through the matrix, position by position, and
compute the energy of interaction between the target sequence the the Mnt dimer.
If this is less than the non-specific binding energy, we keep it, if it isn’t,
we keep the non-specific binding energy. We do this on the next page in the
functional language OCaml.
When the energy of a non-specific binder is 15 kB T above that of a specific
binder, the TF prefers to be in the non - specific conformation, which allows
it to perform a 1D search for the optimal alignment. There is a major kinetic
trap at position 5 which is likely a real binding site. When the non - specific
conformation is 30 kB T away from ideal, the TF is almost always more likely
to bind specifically and thus less able to perform a 1D search. All of these
specific binding sites act as kinetic traps in the search to find the optimal
one.
free energy landscape for G*-Gns=16 kB T
30
30
25
minHGsp,GnsL
minHGsp,GnsL
25
20
15
10
5
0
free energy landscape for G*-Gns = 30 kB T
20
15
10
5
0
5
10
Start position L
0
15
8
0
5
10
Start position L
15
Figure 1: Mnt energy calculator in OCaml
9
4. Multiple target sites: Suppose a transcription factor can bind to N0 distinguishable target sites with the same specific binding free energy G0 , and N additional distinguishable
background sites with the non-specific binding free energy Gns . Suppose that there are M
TF molecules in the cell, with N {M, N0 } , and assume all TFs are associated with the
DNA. Derive an expression for the probability PA (M ; N, N0 ) that a particular target site,
site A, is occupied by the following procedure.
*Note*: In what follows, we express the free energies Gi in dimensionless
form, i.e. we write Gi rather than Gi /kB T .
(a) Write down the Boltzmann weight W (M − m, m) that M − m proteins are bound to
the background sites and m proteins are bound to the target sites.
If there are m TFs bound to target sites, this releases mG0 of free energy.
If there are M − m bound to background sites, this releases (M − m)Gns
of free energy. Assuming that all M TFs bind to DNA, we have the following
statistical weight:
e−mG0 −(M −m)Gns = e−m(G0 −Gns ) e−M Gns
Also, we have to consider the number of distinct states that can result in
m of the N0 target sites being occupied. We do this by imagining the target
sites as squares on a checkerboard and the TFs as game pieces arranged on
the squares.
Figure 2: Cartoon of TF binding
There are N0 ! ways to order all of the squares. Since the TFs are identical,
we do not care about how the individual TFs are distributed about the specific
sites, so we get rid of the indistinguishable possibilities and divide by
m!. We also do not care about the identities of the empty sites. We get
rid of these through dividing by (N0 − m)!.
!
Thus, we find that there are m!(NN00−m)!
= Nm0 ways to arrange the m TFs on
the target sites. In the same way, there are MN−m ways to arrange the
10
leftover TFs on the N target sites.
N0
N
Putting this all together, we find PA (M ; N, N0 ) =
e−m(G0 −Gns ) e−M Gns
m
M −m
(b) Given that m proteins are bound to the N0 target sites, what is the probability,
f (m, N0 ), that the site A is occupied?
If there are m TFs bound to N0 target sites, then there is a chance f (m, N0 ) =
m/N0 that any given site is occupied.
(c) Argue that PA is given by
N0
P
PA (M ; N, N0 ) =
f (m, N0 )W (M − m, m)
m=0
N0
P
W (M − m, m)
m=0
Simplify this expression for M N0 and cast it into the form PA =
1
fA /M
1+K
Hint: For N n, N !/(N − n)! ≈ N n
Using the results from (a) and (b) we have:
N
P0
PA (M ; N, N0 ) =
m
N
m=0 0
N
P0
m=0
N
P0
=
m=0
N
P0
ns
G
e−M
(Nm0 )(MN−m)e−m(G0 −Gns )
ns
G
e−M
(Nm0 )(MN−m)e−m(G0 −Gns )
0 −1)( N )e−m(G0 −Gns )
(Nm−1
M −m
m=0
(1)
(Nm0 )(MN−m)e−m(G0 −Gns )
Now, we have N M N0 , m, so we can use
N!
≈ N n to simplify.
(N − n)!
n terms
z
}|
{
N!
We can see this by writing
= N (N − 1) . . . (N − (n − 1))
(N − n)!
Since N n we have N −n ≈ n.
to N n .
Therefore the n terms are roughly equal
11
Using this, we see that
N
M −m
=
N!
(M −m)!(N −M +m)!
=
M!
1
N!
(N −M +m)! (M −m)! M !
Mm
N M −m
M!
M m NM
N
M!
=
=
The reason that the M N0 condition is important is that it ensures the
sum over m will never be close to M and so the approximation on M !/(M −
m)! is justified.
Let γ = e−(G0 −Gns ) , we have:
N
P0
PA (M ; N, N0 ) =
m=0
N
P0
0 −1)( M )
(Nm−1
N
m=0
=
γM
N
m m
γ
m
(Nm0 )( M
N )
N
P0
m=0
N
P0
m=0
γm
0 −1)(γ M )
(Nm−1
N
m−1
m
(Nm0 )(γ M
N )
The sums are easy to do, if we expanded (x + y))N we find that each term xn y N −n
N
n N −n
P
N
is multiplied by Nn , so that (x + y)N =
x y
. Going backwards,
n
n=0
. (Also xy =
we see that the sums are of this form with y = 1 and x = γ M
N
0 for all y < 0). So we have:
N0
−1
M 1+γM
N
PA (M ; N, N0 ) = γ
N
0
N 1 + γ M N
=
1
N
1 + γ1 M
fA in terms of the other variables introduced above.
Find K
fA = N γ −1 = N e−(Gns −G0 )
From the above, we see that K
Compare it to the expression obtained in class for the case of a single target site and
fA is independent of N0 .
show that as long as M N0 , K
12
It is clearly N0 independent in this approximation.
In class ("Lecture 4") and in the solution to Problem 2 it was shown that
e is equal
in a background of N unique background sites, the effective K
N
P
to
e−Gn . In our approximation where the N sites have the same free energy
of binding, each term in the sum is the same and we just multiply the summand
by N , which matches to our solution here.
fA on N0 affect the “programmability” of the binding affin(d) Would the dependence of K
ity?
fA were dependent on N0 , then additional target sites would interfere
If K
with the binding to a single target site. The probability of binding to
a given site would be reduced.
(e) Finally, we’ll see just how large M needs to be before the above approximation becomes
fA starting from the full form of PA for N0 = 2.
valid. Write the exact expression for K
Writing out the sums in (1), we have:
N
M −1
N
M
+2
γ+
N
M −1
N
M −2
2
γ
γ+
N
M −2
γ2
N!
N!
γ + (M −2)!(N
γ2
(M −1)!(N −M +1)!
−M +2)!
N!
N!
N!
+ 2 (M −1)!(N
γ + (M −2)!(N
γ2
M !(N −M )!
−M +1)!
−M +2)!
M
γ
N −M +1
+
M
γ
1 + 2 N −M
+1
M (M −1)
γ2
(N −M +1)(N −M +2)
(M −1)
+ (N −MM+1)(N
γ2
−M +2)
M (N − M + 2)γ + M (M − 1)γ 2
(N − M + 1)(N − M + 2) + 2M (N − M + 2)γ + M (M − 1)γ 2
M (N − M + 2)γ + M (M − 1)γ 2
[M (N − M + 2)γ + M (M − 1)γ 2 ] + [(N − M + 1)(N − M + 2) + M (N − M + 2)γ]
1
1+
(N −M +1)(N −M +2)+M (N −M +2)γ
M (N −M +2)γ+M (M −1)γ 2
1
1+
h
(N −M +1)(N −M +2)+M (N −M +2)γ
(N −M +2)γ+(M −1)γ 2
fA is given by
Therefore, K
i
/M
(N − M + 1)(N − M + 2) + M (N − M + 2)γ
(N − M + 2)γ + (M − 1)γ 2
13
Show that it is dependent on M .
fA = f (M )
Clearly, K
fA as a function of M for N0 = 2. Find
For N = 107 bp, and Gns − G0 = 16kB T , plot K
fA approaches the M -independent limit.
the range of M where K
Here we are plotting to see just how big M has to be, for the case of two
fA is M -independent. K
fA is roughly M -independent
target sites, so that K
when M is ∼5-7, which is just more than twice N0 !
Ž
KA + vs. M for N0 = 2
10
Ž
KA
8
6
4
2
0
0
5
10
15
M HTFs per cellL
g
Figure 3: K
A vs. M
14
20
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
OCaml code for 3.3
( ∗ f u n c t i o n t o r e v e r s e an a r r a y ∗ )
let revArray a =
l e t l e n = Array . l e n g t h a − 1 in
Array . i n i t ( l e n +1) ( fun i −> a . ( l e n − i ) ) ; ;
(∗ d e f i n i t i o n s ∗)
l e t b i n d r e g i o n = ”TCTAGGATCCACTGTGGACTCGACTGACGTAT” ; ;
let regionlength = String . length bindregion ; ;
l e t delG = 1 5 . 0 ; ;
let halfmnt =
[| [|1.8;2.4;0.0;3.0|]; [|2.4;1.9;1.6;0.0|];
[|1.6;4.2;0.0;2.2|]; [|1.0;2.1;0.0;2.2|];
[|0.0;0.3;1.2;0.6|]; [|2.1;0.0;3.2;2.2|];
[|0.8;0.0;1.0;0.7|]; [|1.1;0.0;1.2;0.3|] |] ;;
( ∗ making Mnt m a t r i x : b u i l d s t h e f u l l Mnt m a t r i x from h a l f m a t r i x ∗ )
l e t f u l l m n t = Array . append [ | Array . c r e a t e 4 0 . 0 | ] h a l f m n t ; ;
l e t f u l l m n t = Array . append ( r e v A r r a y ( Array . map r e v A r r a y h a l f m n t ) ) f u l l m n t ; ;
l e t t a r g e t l e n g t h = Array . l e n g t h f u l l m n t ; ;
( ∗ match : g i v e n pos , i d o f base , r e t u r n s e n e r g y Mnt i n t e r a c t i o n ∗ )
let
|
|
|
|
|
l 2 e pos = function
( ’A’ | ’ a ’ ) −> f u l l m n t
( ’C’ | ’ c ’ ) −> f u l l m n t
( ’G’ | ’ g ’ ) −> f u l l m n t
( ’T’ | ’ t ’ ) −> f u l l m n t
−> r a i s e Not found ; ;
. ( pos ) . ( 0 )
. ( pos ) . ( 1 )
. ( pos ) . ( 2 )
. ( pos ) . ( 3 )
( ∗ e n e r g y : g i v e s t h e e n e r g y f o r Mnt t o i n t e r a c t w i t h a
sequence ∗)
l e t rec e n e r g y ? ( i = 0 ) ? ( value = 0 . 0 ) r e g i o n =
match i with
| i when i < t a r g e t l e n g t h −> ( e n e r g y ˜ i : ( i +1) ˜ value : ( value +. ( l 2 e i r e g i o n . [ i ] ) ) r e g i o n )
| t a r g e t l e n g t h −> min value delG
|
−> r a i s e Not found ; ;
( ∗ t a r g e t s : makes an a r r a y o f t h e p o t e n t i a l t a r g e t s e q u e n c e s ∗ )
l e t t a r g e t m a k e r = ( fun i −> S t r i n g . sub b i n d r e g i o n i t a r g e t l e n g t h ) ; ;
l e t i n d e x = Array . i n i t ( r e g i o n l e n g t h −t a r g e t l e n g t h +1) ( fun i −> i ) ; ;
l e t t a r g e t s = Array . map t a r g e t m a k e r i n d e x ; ;
( ∗ c a l c u l a t i o n : maps
energy function over
p o t e n t i a l t a r g e t s ∗)
Array . map e n e r g y t a r g e t s
15