Regulatory Dynamics in Engineered Gene Networks

The Physico-chemical Foundation of Transcriptional Regulation with Applications to Systems Biology
Mads Kærn
Boston University
Center for BioDynamics
Center for Advanced Biotechnology
Department of Biomedical Engineering
Disclaimer. This document contains material that has been reproduced from various sources
without permission from the copyright owners. As a result, the document may only be distributed to participants of the 4th International Systems Biology Conference, Washington University, St. Louis. All other material is © Mads Kærn, 2003. The document is intended for
educational purposes only. Should any copyrights have been infringed, please contact the author
and the material will be removed immediately.
Contents
1 The Biology of Gene Expression
  1.1 The Genetic Code
  1.2 Genes and Gene Expression
  1.3 Transcription and Translation
    1.3.1 Prokaryotic Cells
    1.3.2 Eukaryotic Cells
  1.4 Regulation of Gene Expression
    1.4.1 The Lactose Operon of E. coli
    1.4.2 The Genetic Switch in Bacteriophage λ
    1.4.3 The Galactose Regulon in S. cerevisiae
2 Engineered Gene Networks
  2.1 Some Tools of the Trade
    2.1.1 Cutting and Pasting DNA
    2.1.2 Plasmid Vectors
    2.1.3 Extracting DNA Sequences
  2.2 Engineering Regulatory Modules
    2.2.1 Genetic Switches in E. coli
    2.2.2 Genetic Switches in S. cerevisiae
    2.2.3 Mammalian Switches
  2.3 Engineering Regulatory Circuits
  2.4 How Transcriptional Regulation Works
3 Modeling Small Gene Networks
  3.1 Biochemical Reaction Kinetics
    3.1.1 Elementary Reactions
    3.1.2 Law of Mass Action
    3.1.3 Generalized Mass Action
    3.1.4 Chemical Equilibrium
    3.1.5 The Michaelis-Menten Reaction
    3.1.6 Hill-type Kinetics
  3.2 Modeling Gene Expression
  3.3 Modeling cis-Regulatory Systems
    3.3.1 Repressor-Operator Binding
    3.3.2 Alternative Reaction Paths
    3.3.3 Cooperative Binding of Dimers
    3.3.4 Synergism in RNA Polymerase Binding
    3.3.5 DNA looping
  3.4 Models of Gene Regulatory Systems
    3.4.1 The Lactose Operon in E. coli
    3.4.2 The Galactose Regulon in S. cerevisiae
  3.5 Models of Engineered Gene Networks
  3.6 Concluding Remarks
Foreword
The future success of Systems Biology requires the establishment of general principles
and the development of methodologies that can be used to link the behavior of individual molecules to system characteristics and functions. In order to achieve this goal, we
need to study systems that have been characterized in minute detail and are sufficiently
small to be manageable. The central theme in this tutorial is the use of engineered gene
networks to deduce principles that govern gene transcription and to develop reasonably
accurate system level models from qualitative molecular level information. The tutorial
consists of three parts: (1) The Biology of Gene Expression, (2) Genetic Network Engineering and (3) Modeling Small Gene Networks. No previous knowledge of molecular
biology is assumed.
The purpose of Part (1) is to provide a brief introduction to the fundamental biology of gene expression and a discussion of the current theories of gene regulation in
bacteria and in yeast. Part (2) will provide a basic introduction to some experimental
techniques. The main emphasis, however, is a discussion of how genetically engineered
systems have provided support for the theories of transcriptional regulation introduced
in Part (1) and how they are used to investigate system level characteristics and function.
Part (3) discusses the physico-chemical basis of gene regulatory systems and provides
a detailed and rigorous methodology that can be used to convert qualitative molecular
level models into quantitative system level descriptions. Particular emphasis will be
given to the limits and dangers of quantitative modeling of which any researcher in
Systems Biology should be aware.
Discussions and comments from Michael Driscoll and Michael Thompson have
been very valuable during the writing of these notes. The reader is kindly reminded that
the notes serve only as a brief introduction to a very large subject area. They include
material that I believe is most relevant to readers who are involved in the mathematical
and computational aspects of Systems Biology, and are looking for a brief summary of
important aspects from molecular biology and physical chemistry. I have attempted to
present this material in a way that is accessible to a non-specialist audience. Despite
my best efforts, this has undoubtedly resulted in descriptions that in many respects are
oversimplified. My sincere apologies go to the authors of the books and the articles that
for one reason or another did not make the Suggested Readings lists. Please report
errors and mistakes to [email protected]. Suggestions that can be used to improve the
quality of future versions are most welcome.
Tutorial Part 1
The Biology of Gene Expression
The sophistication of biological control systems is extraordinarily rich and regulation
takes place on many different levels simultaneously. Novel surprising details are constantly revealed as our experimental methods continue to improve and new technologies
are invented. As a result, it is difficult to organize a comprehensive presentation of general aspects without getting mired in details that may not appear to be important, but
probably are. Since most engineered cellular control systems currently involve manipulation of the information stored in the cell’s DNA, this tutorial will focus on regulatory
processes at the level of gene transcription. In this section, I will briefly summarize
some basic concepts from molecular level biology and then discuss some of the general principles involved in the regulation of gene transcription. This discussion will
be augmented by a walk-through of some of the best studied natural gene regulatory
systems.
1.1 The Genetic Code
Most regulatory processes that take place within cells involve proteins whose structure
and function are determined by information stored in the cell’s DNA. Genetic engineering and the engineering of gene networks involve the manipulation of this information
and of the conditions under which it is used to synthesize proteins. The DNA molecule
encodes information in the four nucleotides containing the bases adenine (A), guanine
(G), cytosine (C) and thymine (T). The molecular structure of the nucleotides is illustrated in Fig. 1.1. In the figure, carbon atoms are indicated as solid circles, lines
indicate covalent bonds between atoms and sticks indicate a covalent bond that ends
in a hydrogen. RNA and DNA differ in the identity of the atom bound to the carbon
at position 2’ in the sugar ring (marked by an “X” in Fig. 1.1A). RNA has a hydroxyl
group bound at this position while DNA has a hydrogen atom. Polynucleotide chains
are formed by individual nucleotides being linked to each other through a phosphodiester bond. This bond is between the phosphate group bound to the carbon at
position 5’ and the oxygen bound to the carbon at position 3’ and establishes the 5’→3’
directionality of the polymer chain. Under normal conditions, the DNA is in a double
stranded form that consists of the 5’-3’ strand and its complement where the direction
of the DNA backbone is reversed (Fig. 1.1B). Bases on opposite strands are paired with
each other through hydrogen bonds such that A pairs with T and C pairs with G. The
double stranded DNA forms a helical structure (Fig. 1.1C).
Figure 1.1: (A) The molecular structure of ribonucleic acid (RNA) and deoxyribonucleic acid
(DNA). RNA and DNA have a hydroxyl group and a hydrogen atom at position X, respectively.
In DNA, the base bound to the carbon at position 1’ is adenine (A), guanine (G), cytosine (C)
or thymine (T). In RNA, the thymine is replaced by uracil (U). (B) Double stranded DNA.
Hydrogen bonds (broken line) are formed between the bases A and T or G and C and link together two complementary single stranded DNA molecules. (C) The helical structure of double
stranded DNA.
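The base-pairing rule and the reversed direction of the complementary strand can be illustrated with a minimal Python sketch. The function name and the example sequences are illustrative assumptions, not part of the original notes.

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5'->3'.

    The two strands run in opposite directions, so the complement
    is reversed to keep the 5'->3' reading convention.
    """
    return "".join(PAIRING[base] for base in reversed(strand.upper()))


print(reverse_complement("AGT"))  # ACT
```

Applying the function twice returns the original strand, which is a quick way to check that the pairing table is self-consistent.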
The synthesis of a protein based on the DNA-encoded amino acid sequence requires
at least two steps. First, the genomic information must be transcribed from the DNA sequence into a messenger RNA molecule (mRNA). This is done by an RNA polymerase,
which, in analogy to DNA polymerase, catalyzes the formation of phosphodiester bonds
between individual nucleotides (Fig. 1.1B). The structure of RNA molecules is similar
to that of DNA molecules with the exception that the backbone consists of ribose rather
than deoxyribose and the base thymine is replaced by the base uracil (U). Furthermore,
the mRNA is usually single stranded.
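As a rough sketch of the sequence relationship between DNA and mRNA, the following Python function converts a DNA sequence into the corresponding RNA sequence. It assumes that the input is the strand whose sequence matches the mRNA (often called the coding strand), written 5'→3'; the function name and the example sequence are hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence for a DNA coding strand (both 5'->3').

    Assumption: the input is the strand with the same sequence as the
    mRNA, so the only change is uracil (U) in place of thymine (T).
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGAAACCA"))  # AUGAAACCA
```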
After transcription, the message contained in the mRNA must be translated into
a protein. This is done by the ribosome, which is a molecular machine made of both
RNA and protein. The process of translation involves two additional types of RNA
molecules, ribosomal RNA (rRNA) and transfer RNA (tRNA). The rRNA molecules
are components of the ribosome. The tRNAs provide the specificity that enables the
insertion of the correct amino acid into the protein that is being synthesized.
Figure 1.2: (A) The molecular structure of amino acids. The identity of the amino acid is determined by its side chain. (B) Peptide bond formed between the amino- and the carboxyl-groups
of two amino acids. (C) The correspondence between the DNA sequence, mRNA sequence and
the sequence of the first eight amino acids of the LacR repressor protein.
Proteins consist of a chain in which individual amino acid residues are linked to
each other through peptide bonds. The general structure of the amino acids is illustrated
in Fig. 1.2A. In analogy with DNA and RNA, they consist of a common element that
enables the formation of a polymer chain. The identity and the properties of the individual amino acids are determined by the side chain. There are 20 naturally occurring
amino acids. In the polymer chain that forms the backbone of proteins, the individual amino acids are linked to each other through peptide bonds formed between the
carboxyl-group of one amino acid and the amino-group of another (Fig. 1.2B). This
creates a chain that at one end has a free amino-group, the N-terminal (NH₃⁺), and at the
other end has a free carboxyl-group, the C-terminal (COO⁻).
The DNA molecule stores the information required to synthesize proteins in terms
of a string of codons. A codon consists of three nucleotides, each selected from the
four available bases (A, T, G or C), which are read from the DNA molecule in the 5’
to 3’ direction. In Fig. 1.1B, the codon encoded on the left strand is AGT while the
codon encoded on the right strand is ACT. Of the 64 possible codons, 61 encode for
one of 20 amino acids (Table 1.1). The genetic code is thus redundant and different
codons may identify the same amino acid. The last three codons (TAA, TAG and TGA)
are stop codons. They define the end of the protein encoding region of the DNA. In
addition, the order of the amino acids in the polypeptide chain is determined by the
sequence in which the codons appear in the DNA sequence. In most cases, there is a
linear relationship between the DNA sequence and the amino acid sequence within the
protein that the sequence encodes. This is illustrated in Fig. 1.2C, which shows the first
24 base pairs of the gene that encodes the LacR repressor protein, the corresponding
mRNA sequence and the sequence of the first 8 amino acids in the LacR repressor
polypeptide chain. The N- and C-terminal regions are encoded by the codons in the 5’
and the 3’ end of the DNA-encoding sequence, respectively.

1st (5’)   2nd: T          2nd: C      2nd: A          2nd: G       3rd (3’)
A          Isoleucine      Threonine   Lysine          Arginine     A
A          Isoleucine      Threonine   Asparagine      Serine       T
A          Isoleucine      Threonine   Asparagine      Serine       C
A          Methionine      Threonine   Lysine          Arginine     G
T          Leucine         Serine      STOP            STOP         A
T          Phenylalanine   Serine      Tyrosine        Cysteine     T
T          Phenylalanine   Serine      Tyrosine        Cysteine     C
T          Leucine         Serine      STOP            Tryptophan   G
C          Leucine         Proline     Glutamine       Arginine     A
C          Leucine         Proline     Histidine       Arginine     T
C          Leucine         Proline     Histidine       Arginine     C
C          Leucine         Proline     Glutamine       Arginine     G
G          Valine          Alanine     Glutamic acid   Glycine      A
G          Valine          Alanine     Aspartic acid   Glycine      T
G          Valine          Alanine     Aspartic acid   Glycine      C
G          Valine          Alanine     Glutamic acid   Glycine      G

Table 1.1: The correlation between the sequence of bases in the codons and the amino
acids. The codons TAA, TAG and TGA signal termination of translation.
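The codon assignments can be written down compactly and checked in code. The following Python sketch builds the standard codon table from a 64-character string of one-letter amino acid codes and translates a short sequence. The variable names, the `translate` helper and the example sequence are assumptions made for illustration; the example is not the LacR sequence of Fig. 1.2C.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for the 64 codons, with the first, second
# and third base each running through T, C, A, G. '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (5'->3') codon by codon.

    Reading starts at the first base and stops at the first stop codon.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE), stops)    # 64 ['TAA', 'TAG', 'TGA']
print(translate("ATGAAACCATAA"))  # MKP
```

Counting the distinct letters in the table (excluding `*`) gives 20 amino acids encoded by 61 codons, which is the redundancy described in the text.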
Once translation is completed and the full length DNA-encoded polypeptide has
been formed, the function of many proteins requires the completion of additional steps.
This may involve, for example, covalent modification, such as phosphorylation, acetylation or glycosylation, i.e., the addition of a phosphate, an acetyl or a glycosyl-group,
the incorporation of the protein into multi-protein complexes or the transportation of
the protein to its appropriate cellular location, for instance, in the cell membrane.
1.2 Genes and Gene Expression
The term gene is usually used to refer to the DNA sequence that is transcribed into
mRNA and subsequently translated into a protein. However, there are important exceptions to this rule. For example, DNA sequences that encode for molecules like rRNA
and tRNA are genes even though the RNA molecule is never translated into a protein.
Genes are usually carried on the cell’s chromosomes. Each chromosome carries at least
one origin of replication. These regions determine the location where the DNA polymerase initiates the duplication of the genetic material. The location of a specific gene
on the chromosome is called the gene’s locus. Haploid cells carry a single copy of each
chromosome and the locus thus uniquely determines the location of the gene. Diploid
cells have homologous chromosome pairs. Two different forms of the same gene are
known as alleles.
Figure 1.3: (A) Schematic illustration of DNA wrapped around a nucleosome. (B) The primary
component of the nucleosomes consists of four histone proteins H2A, H2B, H3 and H4. The
nucleosomes can be remodeled and rearranged spatially by covalent modification of the protruding histone tails. (C) Illustration of potential organizations of nucleosomes in spatial structures
(Reproduced without permission from Bednar et al., © National
Academy of Sciences).
The chromosomes are organized very differently in prokaryotic cells, which lack a cell
nucleus, and in eukaryotic cells. In bacteria (prokaryotes), such as Escherichia coli, all
of the genes are located on a single, circular chromosome while the genes in eukaryotic
cells are located on several linear chromosomes. There are 16 chromosomes in yeast. In
addition, the eukaryotic DNA is complexed with nuclear proteins and compacted into
a structure called chromatin. Central to this structure is the wrapping of approximately
200 base pairs of DNA around protein complexes known as nucleosomes (Fig. 1.3A).
The organization of chromatin and of the nucleosomes can be used as an instrument
to regulate which genes are accessible for transcription by RNA polymerase (discussed
in section 1.3.2). The primary constituent of the nucleosomes is the four histone proteins H2A, H2B, H3 and H4, which combine to form a histone tetramer (Fig. 1.3B). A
nucleosome consists of two histone tetramers. Each histone subunit has a protruding
N-terminal “tail” that serves important regulatory functions. There, covalent modifications, such as acetylation, can greatly influence the accessibility of the DNA. The
nucleosomes are, together with other nuclear proteins, arranged into chromatin fibers.
Examples of potential spatial arrangements of the nucleosomes are shown in Fig. 1.3C.
In addition to the chromosomes, genes can be carried on plasmids. Plasmids are in
many ways similar to the bacterial chromosome. They are circular pieces of DNA that
typically replicate independently of duplication of the chromosomal DNA prior to cell
division. As a result, plasmids are often present in multiple copies within each cell and
the plasmid copy number usually changes as cells progress through the cell division
cycle. The average plasmid copy number per cell depends on the type of origin
of replication that the plasmid carries. Some plasmids are stringently controlled and are present
only in a single copy while others are loosely regulated and present in 60 copies per
cell or more. Plasmids are used widely in genetic engineering (Tutorial Part 2).
Figure 1.4: Typical organization of a gene containing the information required for the synthesis of a protein. The promoter is the region where the RNA polymerase initially binds. The
terminator is the region where the RNA polymerase is released from the DNA. The DNA also
contains regions that, when transcribed into mRNA, control translation initiation (5’ UTR) and
termination (3’ UTR).
In addition to the sequences that encode for genes, the DNA contains regions that
are involved in the regulation of gene transcription. The RNA polymerase reads the
genetic code in the 5’ to 3’ direction and the location where it initially binds to the
DNA is located upstream of the gene, i.e., farther in the 5’ direction (Fig. 1.4). The
region where the RNA polymerase initially contacts the DNA is called the promoter of
the gene whose expression it facilitates. The expression of a gene may occur from more
than one promoter, i.e., the region upstream of the gene may contain distinct binding
sites for the RNA polymerase. The first nucleotide that is transcribed is usually labeled
+1 and nucleotides are counted relative to this transcription start site in the 5’ to 3’
direction of the DNA. The nucleotides in the gene-encoding region are thus labeled
with positive numbers while nucleotides within the promoter region are labeled with
negative numbers. In bacteria, the promoter region is about 60 base pairs in length and
spans roughly 40 base pairs upstream and roughly 20 base pairs downstream of the +1
site. In yeast, the promoter region spans roughly 200 base pairs.
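The numbering scheme can be made concrete with a short Python sketch that labels a nucleotide relative to the transcription start site. It assumes the common convention that there is no position 0, so the base immediately upstream of +1 is labeled -1; this convention, the function name and the example indices are assumptions not stated in the text above.

```python
def position_label(index: int, tss_index: int) -> int:
    """Label a nucleotide relative to the transcription start site (+1).

    index and tss_index are 0-based offsets into the 5'->3' sequence.
    Assumption: there is no position 0, so the base just upstream of
    the +1 site is labeled -1.
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


# Hypothetical sequence whose transcription start site is at 0-based index 40.
print(position_label(40, 40))  # 1
print(position_label(39, 40))  # -1
print(position_label(30, 40))  # -10
```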
Generally speaking, no two promoters are identical. Statistical analysis has, however, shown that there are regions that are highly conserved within different promoters.
In bacteria, one of these regions is located at position -10 and has the consensus sequence TATAAT. This region is called the TATA-box and is in many cases essential for
the proper alignment of the RNA polymerase holoenzyme with respect to the gene-encoding sequence. Mutations of the TATA-box sequence, i.e., the substitution of one
nucleotide with another, can greatly affect the rate at which the DNA is transcribed
into an mRNA. A sequence that is similar to the TATA-box is also important for the
transcription of many eukaryotic genes.
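One simple way to express similarity to a consensus sequence is to count mismatches. The Python sketch below slides a window along a sequence and reports the site closest to TATAAT. It is only an illustration: the mismatch count is a crude stand-in for binding affinity, and the function names and test sequences are hypothetical.

```python
CONSENSUS = "TATAAT"  # -10 consensus sequence quoted in the text


def mismatches(site: str, consensus: str = CONSENSUS) -> int:
    """Count positions where site differs from the consensus (Hamming distance)."""
    return sum(a != b for a, b in zip(site, consensus))


def best_match(sequence: str, consensus: str = CONSENSUS):
    """Return (start index, site, mismatch count) of the closest window.

    If several windows tie, the first (most 5') one is returned.
    """
    n = len(consensus)
    windows = [(i, sequence[i:i + n]) for i in range(len(sequence) - n + 1)]
    start, site = min(windows, key=lambda w: mismatches(w[1], consensus))
    return start, site, mismatches(site, consensus)


# Hypothetical fragments: a perfect match and a single-base substitution.
print(best_match("CCGTATAATGG"))  # (3, 'TATAAT', 0)
print(best_match("CCGTATGATGG"))  # (3, 'TATGAT', 1)
```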
In addition to the TATA-box, the promoter region often contains sites where transcription factor proteins can bind and directly or indirectly affect the rate of transcription. In bacteria, transcription factor binding sites are often referred to as operators.
However, such regulatory elements may also be located far from the promoter region
or even within the gene-encoding region of the DNA. In eukaryotes, it is quite common
to find enhancer sequences that affect the transcription from a promoter located very
far from it in the DNA sequence. This action-at-a-distance can arise from the rearrangement of chromatin structure and/or close spatial proximity of transcription factors
bound to the enhancer sequence due to bending and looping of the DNA. Transcription
factor binding sites are referred to as cis-regulatory elements while the transcription
factor proteins that bind to them are referred to as trans-regulatory elements.
In addition to the promoter and cis-regulatory elements, there are sequences within
the DNA that determine the termination of transcription (a terminator sequence) and,
for protein-encoding genes, sequences that determine the region of the mRNA that is
to be translated into protein (Fig. 1.4). The codon that indicates the location where
translation is to start, the translation start codon, is often ATG. The DNA sequence
located between the start site of transcription and start of translation is referred to as an
untranslated region (UTR). UTRs can greatly influence the efficiency of gene expression, for example by determining how well the ribosomes can bind to the mRNA and
initiate translation. The translation stop codon that indicates the location where translation is terminated is either TAA, TAG or TGA. The sequence of the DNA between
the stop codon and the site where transcription is terminated can also have an effect on
the efficiency of gene expression. This region is referred to as the 3’ UTR.
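The layout of a transcribed region can be sketched as a simple partition into 5’ UTR, protein-encoding region and 3’ UTR. The Python example below assumes, for simplicity, that translation starts at the first ATG in the transcript and ends at the first in-frame stop codon; real genes do not always follow this rule. The function name and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_transcript(transcript: str, start_codon: str = "ATG"):
    """Split a transcribed region (written as DNA, 5'->3') into
    (5' UTR, coding sequence including the stop codon, 3' UTR).

    Simplifying assumption: translation starts at the first occurrence
    of the start codon and ends at the first in-frame stop codon.
    """
    start = transcript.find(start_codon)
    if start < 0:
        raise ValueError("no start codon found")
    for i in range(start, len(transcript) - 2, 3):
        if transcript[i:i + 3] in STOP_CODONS:
            end = i + 3
            return transcript[:start], transcript[start:end], transcript[end:]
    raise ValueError("no in-frame stop codon found")


# Hypothetical transcript, for illustration only.
print(split_transcript("GGAGGCCATGAAACCATAAGCGC"))
# ('GGAGGCC', 'ATGAAACCATAA', 'GCGC')
```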
1.3 Transcription and Translation
Similarly to the definition of a gene, the meaning of gene expression is not always
clearly defined. Some use the term gene expression to refer to the biological manifestation in terms of alteration in phenotype, that is, an observable change in the characteristics of the cell. The gene that is responsible for a specific cellular trait can be said
to be expressed when the phenotype is observed and not expressed otherwise. In other
words, gene expression can be viewed as being a binary on/off process. Others use
gene expression to refer to the process that starts when the transcription of the DNA
that encodes the gene is initiated and ends when a biologically functional molecule
is formed, regardless of whether this is accompanied with a detectable change in the
cell’s phenotype. In this view, gene expression can be graded and quantified based on
measurements of the activity of the end product of the gene expression process.
Since many proteins require some post-translational modification to be fully functional, e.g., the attachment of a phosphate group or the incorporation of the protein into
a larger complex, it can be argued that such events are part of the process in which the
genomic information is expressed. Generally speaking, however, there will be a positive correlation between the rate at which a gene is transcribed and the abundance (and
hence the activity) of the end product of the gene expression process. Typically, if a
gene’s mRNA is abundant within a cell, there will be a high level of the corresponding protein product. Transcription is usually a prerequisite for gene expression and the
control of transcription is one of the most important regulatory instruments available
to the cells. In prokaryotes as well as eukaryotes, the transcription of a gene into a
corresponding mRNA occurs in three general steps: transcription initiation, elongation
of the mRNA and termination of transcription. Gene expression can be regulated on all
of these levels. Regulation of gene expression at the level of transcription initiation is,
however, the most common.
1.3.1 Prokaryotic Cells
The general steps involved in the transcription of prokaryotic genes are illustrated in
Fig. 1.5. The RNA polymerase core enzyme, which is a multi-component complex
consisting of two α, one β and one β′ subunit, must first bind to the DNA at an appropriate position relative to the gene that is to be transcribed. Molecules known as sigma
factors facilitate the appropriate positioning of the core enzyme. The sigma factor combines with the core enzyme to form the RNA polymerase holoenzyme (Fig. 1.5A). The
sigma factor provides specificity to the RNA polymerase holoenzyme and ensures that
transcription occurs only from promoters. In addition, sigma factors serve as global
regulators of gene expression and are used to direct transcription on a genome-wide
scale. For example, by recognizing specific cis-regulatory elements, the sigma factor σ54 can direct transcription of a set of genes that are not transcribed when the RNA
polymerase holoenzyme contains the sigma factor σ70. The discussion below addresses
transcription from promoters by the holoenzyme containing the most common sigma
factor σ70.
The binding of the holoenzyme to a promoter is usually considered to be reversible
with many association and dissociation reactions taking place before transcription is
initiated. The binding affinity depends on the promoter sequence and is, as mentioned,
particularly sensitive to variation in the bases in the TATA-box. The binding of the RNA
polymerase holoenzyme to the promoter leads to the formation of a closed complex in
which the DNA remains in its double-stranded duplex form (Fig. 1.5B). The initiation
of transcription involves a transition from the closed complex to an open complex. In the
open complex, the helical structure of the DNA is disrupted to expose a single-stranded
region of the DNA near the transcription initiation site (Fig. 1.5C). The transition from
closed to open complex is usually considered irreversible and transcription will usually
occur once the open complex has been established.
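The statement that many binding and unbinding events may precede initiation can be illustrated with a minimal kinetic sketch. Assume, as a simplification, that a closed complex either dissociates with rate constant `k_off` or isomerizes irreversibly to the open complex with rate constant `k_open`, both as first-order steps. The rate constants, their values and the function names below are illustrative assumptions, not measured quantities.

```python
def commit_probability(k_off: float, k_open: float) -> float:
    """Probability that a closed complex isomerizes to the open complex
    before the holoenzyme dissociates (two competing first-order steps)."""
    return k_open / (k_off + k_open)


def mean_binding_events(k_off: float, k_open: float) -> float:
    """Average number of binding events per initiation event.

    Each binding event succeeds independently with the commit probability,
    so the count follows a geometric distribution with mean 1/p.
    """
    return (k_off + k_open) / k_open


# Illustrative rate constants (per second), not measured values.
k_off, k_open = 9.0, 1.0
print(commit_probability(k_off, k_open))   # 0.1
print(mean_binding_events(k_off, k_open))  # 10.0
```

When dissociation is much faster than open complex formation, the commit probability is small and the holoenzyme binds and leaves many times before transcription starts, which matches the qualitative picture in the text.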
Once the open complex is formed, synthesis of the mRNA molecule begins with
the formation of a phosphodiester bond between the first two ribonucleotides that are
base-paired complementary to the DNA template. The result is the formation of a
ternary complex consisting of the holoenzyme, DNA and RNA (Fig. 1.5D). Further
ribonucleotides are then added to the RNA chain, up to a length of 9 bases. During this
stage, a transition back to the open complex (Fig. 1.5C) can occur by the release of the
short nucleotide chain from the ternary complex. This is known as abortive initiation.
In order to synthesize RNA chains longer than 9-10 bases, the RNA polymerase must
move along the DNA in the 5’ to 3’ direction. This requires the release of the sigma factor
from the holoenzyme and the formation of the elongation complex composed
of the RNA polymerase core enzyme, DNA and RNA (Fig. 1.5E). This complex can
move along the DNA and synthesize full-length mRNA. After transcription is initiated,
it usually takes some time (1-2 seconds) before the core enzyme clears the promoter
region and another holoenzyme can bind.
Figure 1.5: Prokaryotic transcription initiation. (A, B) The binding of the RNA polymerase
holoenzyme to the promoter is facilitated by the sigma factor. (C) The DNA is opened to
expose a region of single stranded DNA. (D) The single stranded DNA is used to synthesize
short RNAs. (E) The release of the sigma factor allows the RNA polymerase to move down the gene and
produce a full-length mRNA. The mRNA is translated into protein as soon as it emerges from
the elongating complex.
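The promoter clearance time puts a ceiling on how often transcription can start from a single promoter. As a back-of-the-envelope sketch in Python, and assuming that clearance is the only limiting step (binding and open complex formation are ignored), the upper bound is the reciprocal of the clearance time. The function name is hypothetical.

```python
def max_initiation_rate(clearance_time_s: float) -> float:
    """Upper bound on initiations per second if a new holoenzyme can bind
    only after the previous polymerase has cleared the promoter.

    Assumption: clearance is the only limiting step.
    """
    return 1.0 / clearance_time_s


for t in (1.0, 2.0):  # clearance times quoted in the text, in seconds
    per_second = max_initiation_rate(t)
    print(t, per_second, per_second * 60)  # seconds, per second, per minute
```

With the 1-2 second clearance time quoted above, this bound corresponds to roughly 30 to 60 initiation events per minute; actual rates are lower whenever binding or open complex formation is slow.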
Translation of the mRNA is facilitated by the ribosomes and, in bacteria, occurs
simultaneously with the elongation of the mRNA transcript (Fig. 1.5E). The translation start site is located downstream of the transcriptional start site and the sequence
between them defines the 5’ UTR. The 5’ UTR contains the site where the ribosomes
initially bind to the mRNA, the ribosome-binding site (RBS), and is part of the RNA
molecule that first emerges during transcription elongation. A ribosome binds to the
RBS as soon as it emerges from the elongating ternary complex and immediately starts
to synthesize the polypeptide chain encoded in the mRNA. The efficiency of translation
can vary substantially depending on the RBS sequence and/or the distance between the
transcription and the translation start sites.
As mentioned above, the RNA polymerase holoenzyme mediates transcription from
a different set of promoters when it contains the sigma factor σ54 rather than σ70. It
also initiates transcription in a manner that is different from the scenario illustrated in
Fig. 1.5. The holoenzyme is able to bind to the promoter region to form the closed
complex. However, the transition to the open complex does not happen spontaneously
unless the σ54 subunit is in the correct conformation. The required modification of
the holoenzyme can be provided by DNA binding proteins that have ATPase activity.
These activators bind to the DNA upstream of the closed complex and, subsequently,
make contact with the σ54 subunit through a DNA loop. ATPases are able to couple
the energy released by the hydrolysis of the energy-storage molecule ATP to a specific process, in this case causing a conformational change in the N-terminal region of
the σ54 subunit. Transcription mediated by the σ54-holoenzyme is thus more complex
than that mediated by the σ70-holoenzyme. In fact, the requirements of a transcriptional activator acting at a distance and of an activator-mediated conformational change
prior to open complex formation make σ54-mediated transcription appear as a hybrid
of prokaryotic and eukaryotic mechanisms of transcription initiation (see below). In
addition, transcription mediated by σ54 may involve a process known as transcription
reinitiation (discussed in section 1.3.2).

Complex        Subunits    Complex              Subunits
TFIIB          1           RNA polymerase II    12
TFIIA, TFIIE   2           TFIID                12*
TFIIF          3           Mediator             26
TFIIH          9           SAGA                 15

Table 1.2: The number of subunits for some common complexes in the transcriptional machinery. Spt-Ada-Gcn5 acetyltransferase is abbreviated as SAGA. * The 12 subunits of TFIID
include TBP (one subunit) and TAFs (11 subunits). Numbers are from Ptashne and Gann.
1.3.2 Eukaryotic Cells
Eukaryotes have three different RNA polymerases. RNA polymerases I and III transcribe genes that encode for rRNA and tRNA (and other small RNAs), respectively.
RNA polymerase II, which consists of 12 subunits, transcribes protein-encoding genes
(class II genes). The expression of eukaryotic genes is significantly more complex than
in prokaryotes and involves a large number of proteins, many with functions that are
not fully understood. The transcriptional machinery in yeast can involve as many as 50
different proteins in addition to the core polymerase. These include general transcription factors (TFs), TATA-box binding protein (TBP) and associated factors (TAFs), the
so-called Mediator complex, nucleosome remodelers, histone acetylases (HATs), histone deacetylases (HDACs) and others. The Mediator is a complex that is believed to
be one of the components that interacts with DNA-bound transcriptional activators, i.e.,
it mediates the activation signal. Most of these components are complexes that consist
of multiple protein subunits (Table 1.3.2).
It has proven difficult to determine the different steps involved in the initiation of
transcription in eukaryotes. Despite our somewhat blurred understanding of the details,
general parts of the picture are clear. A number of protein complexes, the general transcription factors (TFs), bind to the DNA and form a scaffold to which the polymerase
holoenzyme can bind. This group includes TFIIA and TFIID. The TFIID complex consists of the TATA binding protein (TBP) and TBP associated factors (TAFs) and binds
to TATA-box like sequences found about 30 base pairs upstream of the transcription
start site of many genes. TATA-box mediated expression is common for class II genes.
Transcription from promoters that do not contain a TATA-box is usually mediated by
a so-called initiator sequence located at the transcription start site. Some promoters
contain a TATA-box as well as an initiator site.
The general picture of the recruitment and proper positioning of the RNA polymerase II holoenzyme to a TATA-box-containing promoter is illustrated in Fig. 1.6.
The illustrated scenario is based on knowledge obtained from the major late (ML) promoter of adenovirus, and is believed to capture the basic logic of TATA-box mediated
transcription in eukaryotes. In the first step, the TFIIA complex and the TBP-containing
TFIID complex bind to the DNA. The TFIID complex, through the TBP protein that it contains, associates with the DNA near the TATA-box (Fig. 1.6B). The
binding of the TBP/TFIID is generally considered to be the rate-limiting step during
transcription initiation and often requires the presence of additional factors in the vicinity of the promoter. This is discussed further in section 4.3. The TFIIB complex is then
recruited (Fig. 1.6C) to form a scaffold that can bind the RNA polymerase II holoenzyme, including the Mediator complex, and its partner TFIIF (Fig. 1.6D). Then TFIIE
and TFIIH are added to form the closed complex (Fig. 1.6E). It is not entirely clear
which components act as individual complexes and which are part of the RNA polymerase II holoenzyme. In some cases, it may be that most of the factors are recruited
simultaneously with the polymerase, corresponding to a direct transition from
the pre-initiation scaffold (Fig. 1.6B) to the closed complex (Fig. 1.6E).
The TFIIH complex has helicase activity and can unwind the DNA. It also has kinase activity and can add phosphate groups to the C-terminal region of the largest subunit of RNA polymerase II. This phosphorylation is likely to be critical for the initiation
of transcription and appears to trigger open complex formation, the start of transcriptional elongation and RNA synthesis (Fig. 1.6F). The transition from transcriptional
initiation to elongation involves the release of TFIIE and TFIIH. TFIIF remains bound
to the RNA polymerase as it clears the promoter and moves down the gene. Interestingly, TFIIA and TFIID may remain bound to the promoter after the polymerase has
cleared (Fig. 1.6F). These complexes can act as a scaffold for the binding of the
transcriptional apparatus and may allow for repeated rounds of transcription. Since the
presence of the scaffold circumvents the rate-limiting binding of the TBP-containing
TFIID complex, transcription can take place at an increased rate. Transcription reinitiation has also been reported for genes that are transcribed by RNA polymerase I and
III. σ54-mediated transcription in E. coli may also involve reinitiation as the sigma factor seems to remain attached to the promoter when the polymerase core enzyme moves
down the gene.
Figure 1.6: Steps in eukaryotic transcription initiation. (A, B, C) The first factors to bind are
TFIIA, TFIID and TFIIB. The TFIID complex contains the TATA-box binding protein (TBP).
(D) The binding of the TFIIB complex facilitates the recruitment of the RNA polymerase
(RNAP) holoenzyme and its partner TFIIF. This is followed by the binding of TFIIE and TFIIH
to form the pre-initiation complex. (E) TFIIH can trigger open complex formation and the
initiation of transcription. (F) The elongating complex contains RNAP and TFIIF. TFIIA and
TFIID may remain on the promoter after transcription initiation. It is possible that the RNA
polymerase holoenzyme contains most of the complexes such that some of the steps after the
binding of TFIID are circumvented.
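The argument that a persistent scaffold speeds up transcription can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the text: it assumes that one round of initiation costs a mean waiting time t_tfiid for TFIID binding plus t_rest for all remaining steps, and that the TFIIA/TFIID scaffold is still in place for the next round with probability p_keep. The function name and all numbers are hypothetical.

```python
def mean_initiation_rate(t_tfiid, t_rest, p_keep):
    """Average number of initiations per unit time (illustrative only).

    t_tfiid: mean waiting time for TFIID/TBP binding (the slow step)
    t_rest:  mean time for all remaining steps of one round
    p_keep:  probability that the scaffold is still on the promoter
             when the next round starts (0 = never, 1 = always)
    """
    # The slow TFIID step is only paid when the scaffold was lost.
    mean_round_time = (1.0 - p_keep) * t_tfiid + t_rest
    return 1.0 / mean_round_time


# Hypothetical times in arbitrary units.
no_reinitiation = mean_initiation_rate(10.0, 1.0, 0.0)
with_reinitiation = mean_initiation_rate(10.0, 1.0, 0.9)
speedup = with_reinitiation / no_reinitiation
print(f"speed-up from reinitiation: {speedup:.2f}")
```

Under these assumed numbers the gain comes entirely from skipping the slow step; if t_tfiid were comparable to t_rest, the same scaffold would make little difference.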
Transcription from TATA-less promoters that contain an initiator site appears to be
very similar to the scenario illustrated in Fig. 1.6. While the TBP component of the
TFIID complex, for obvious reasons, is unable to assist in its binding to the DNA, the
TBP protein is still required for transcription. It is believed that the recruitment of
the TFIID complex is due to the subunits TAF250 and TAF150, which together can
recognize the initiator sequence. In addition, TFIID may be recruited to the DNA by
factors that bind to the initiator site and the RNA polymerase itself has some affinity
for the initiator sequence.
The histones and the nucleosomes have important implications for the transcription of eukaryotic genes. The RNA polymerase is approximately the same size as a
nucleosome and the latter must be replaced during transcription elongation. In addition, tight organization of the nucleosomes and the chromatin fibers may prevent DNA
binding proteins from binding to the DNA. The strength of the barrier, which depends
on how strongly the DNA and the nucleosome interact, might be modified by enzymes
that have histone acetylase (HAT) or deacetylase (HDAC) activity and by enzymes that
are able to alter the spatial organization of the nucleosomes. In addition, certain proteins that are part of the transcriptional apparatus are able to bind to acetylated histones
with a relatively high affinity and it is possible that modification of the nucleosomes may
assist in the recruitment of at least some of the components required for transcription
initiation.
Once transcription is complete, the mRNA must be modified and transported from
the nucleus to the cytosol where it can be translated into a protein. There are generally
three different types of mRNA processing. The first type of modification occurs shortly
after the RNA emerges from the elongating complex and leads to the addition of a
methylated guanine nucleotide in reverse linkage, i.e., 5’-5’ rather than 5’ to 3’, to the
protruding end of the RNA. This 5’ methylated cap assists in the further processing of
the RNA molecule and in translation. Eukaryotic genes often contain a mixture of non-coding regions, the introns, and coding regions, the exons, and the non-coding regions
must be eliminated to form a protein-encoding mRNA. This is done by a process called
RNA splicing, in which the introns are cut out and the protein-encoding exons of the
RNA are put back together. Splicing of the RNA occurs while the nascent RNA is
being transcribed. The last modification of the RNA is the addition of a chain of about 200
adenine nucleotides (the poly-A tail) to its 3’ end. This yields the mature mRNA that
is subsequently transported to the cytosol where it is translated by ribosomes.
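The three processing steps can be mimicked on a string as a toy illustration. In the sketch below the function name, the "m7G-" text marker for the cap, the example sequence and the intron coordinates are all made up for the purpose of the example; real splicing depends on sequence signals and the spliceosome, none of which is modelled here.

```python
def process_transcript(pre_mrna, introns, tail_length=200):
    """Return a toy 'mature mRNA' string.

    pre_mrna:    RNA sequence of the primary transcript
    introns:     list of (start, end) half-open index pairs to remove
    tail_length: number of adenines in the poly-A tail
    """
    # Splicing: keep only the stretches between the intron intervals.
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    spliced = "".join(exons)

    # Capping (symbolic marker at the 5' end) and polyadenylation (3' end).
    return "m7G-" + spliced + "A" * tail_length


# Hypothetical transcript: exon, intron, exon.
pre = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
mature = process_transcript(pre, introns=[(6, 18)], tail_length=5)
print(mature)
```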
1.4 Regulation of Gene Expression
The means employed to regulate gene expression are remarkable and many. The most
obvious method of control, and the one that can be most readily manipulated, is the
modulation of the frequency of transcription initiation. The next sections will discuss
how this method of gene expression control is utilized in three well-studied systems:
the lactose operon in E. coli, the λ CI repressor in bacteriophage λ and the galactose
utilization network in Saccharomyces cerevisiae.
Figure 1.7: (A) The genes lacZYA of the lactose operon share the same promoter, Plac, which is
repressed by the repressor encoded by the adjacently located lacI gene. (B) Regulatory elements
of the Plac promoter. The LacR repressor can bind to the three lacO operators O1, O2 and O3.
The CAP protein can bind to the CAP operator. (C) Activation of transcription by CAP. (D)
Repression involves DNA looping facilitated by LacR repressor tetramers bound to different
operator sites.
1.4.1 The Lactose Operon of E. coli
The lactose operon in E. coli consists of three genes, lacZ, lacY and lacA, whose transcription is initiated from a single promoter region, Plac (Fig. 1.7A). The rate of transcription of the lacZYA genes is regulated by the LacR repressor protein and by a protein
called CRP (cyclic AMP receptor protein) or CAP (catabolite activator protein). CAP can
act as a transcriptional activator. It binds as a dimer to an operator site centered at position -61 relative to the transcription start site (Fig. 1.7B). It affects the process of transcription initiation by interacting directly with the α-subunit of the RNA polymerase
holoenzyme. It has been observed that the presence of CAP increases the amount of
the open complex some 13-fold, but that its presence does not change the rate of the
transition between the closed and the open complex. This indicates that CAP may act
at the first step in transcription initiation (Fig. 1.5B) by increasing the rate at which the
holoenzyme binds to the promoter and/or by decreasing the rate at which the holoenzyme dissociates from the promoter.
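The reasoning above can be made concrete with a minimal two-step scheme: the holoenzyme binds reversibly to form the closed complex, which then isomerizes to the open complex. The sketch below assumes rapid-equilibrium binding described by a single constant K_B and a first-order transition with rate constant k_f; the function name and all parameter values are hypothetical, and CAP is assumed to act on K_B only.

```python
def open_complex_rate(K_B, R, k_f):
    """Rate of open complex formation per promoter (illustrative).

    K_B: equilibrium constant for holoenzyme binding (closed complex)
    R:   free holoenzyme concentration
    k_f: rate constant of the closed-to-open transition
    """
    # Fraction of promoters in the closed complex at binding equilibrium.
    closed_fraction = K_B * R / (1.0 + K_B * R)
    return k_f * closed_fraction


# Hypothetical values in arbitrary units; only the ratio matters here.
K_B, R, k_f = 0.001, 1.0, 1.0
without_cap = open_complex_rate(K_B, R, k_f)
with_cap = open_complex_rate(13 * K_B, R, k_f)  # CAP raises K_B, k_f unchanged
fold_change = with_cap / without_cap
print(f"fold change in open complex formation: {fold_change:.2f}")
```

When the promoter is rarely occupied (K_B R much smaller than 1), a 13-fold increase in K_B gives close to a 13-fold increase in open complex formation even though k_f is untouched, which is the pattern described in the text.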
Figure 1.8: Feedback regulation of the lactose operon. Allolactose inhibits the activity of the
repressor and relieves its effect on the transcription of the lactose operon genes. This causes up-regulation of lacZ and lacY, which, in turn, causes an increased rate of allolactose production
and lactose uptake, respectively.
The LacR repressor protein is, as the name implies, an inhibitor of transcription of
the genes in the lactose operon. It is expressed constitutively, i.e., at a constant rate,
from the Pi promoter of the lacI gene, which is located adjacent to the lactose operon (Fig. 1.7A). The
LacR protein binds as a tetramer to three lacO operators, O1, O2 and O3, centered at
positions +11, +410 and -82, respectively (Fig. 1.7B). The operators have nearly palindromic sequences and are composed of two half-sites that each make contact with one
LacR monomer in the tetrameric repressor complex. It is believed that the binding of
the LacR repressor to O1 prevents the binding of the RNA polymerase holoenzyme
to the promoter through steric hindrance; the repressor tetramer may simply act as
a space-excluding barrier for the incoming holoenzyme. Elimination of the auxiliary
lacO operators O2 and O3 does not abolish the inhibitory function of LacR, but reduces
its effect. While elimination of either O2 or O3 causes a 3-fold reduction in repression,
eliminating both causes a 70-fold reduction. Thus, the auxiliary operators appear to
serve redundant roles in the inhibition of transcription by the LacR protein. The efficient repression observed in the presence of two or three of the operators is believed to
be due to looping of the DNA. The binding of the repressor tetramer to a single operator
involves only two of its four subunits, which leaves two subunits capable of binding a
second operator site provided that the DNA is twisted into a loop structure (Fig. 1.7D).
These loop structures may act as barriers that limit the accessibility of the promoter
region to the polymerase and/or as roadblocks to its movement along the DNA.
The above discussion of the regulation of the lactose operon addresses the interaction between cis- and trans-regulatory elements in the promoter region. In addition to
this, the activity of the trans-factors, i.e., CAP and the LacR repressor, is extensively
regulated. First of all, the activity of CAP depends on the presence of cAMP. The concentration of cAMP in turn falls when glucose is present. As a result, the transcription of the
genes in the lactose operon is negatively correlated with the concentration of glucose
in the growth medium. CAP affects the transcription of a large number of genes and
is a central player in the global gene regulatory system known as catabolite repression.
This system ensures that the cell does not wastefully express the genes required for
metabolizing other sugars when the energy-rich glucose is available.
The activity of the lactose operon is modulated via a feedback loop involving the
proteins LacR, LacZ and LacY (Fig. 1.8). The genes lacZ and lacY encode the enzyme β-galactosidase and the membrane-bound lactose permease, respectively. While
the lactose permease enables the transport of extracellular lactose into the cell, the β-galactosidase converts intracellular lactose into glucose and galactose. It also converts
some of the lactose into allolactose. Allolactose in turn binds to the LacR tetramer and
causes a conformational change, or allosteric transition, to a state that has a significantly reduced affinity for the operator sites. As a result, the presence of small amounts
of allolactose, the inducer that inactivates LacR, causes an up-regulation of the expression of the
lacZYA genes in the lactose operon. This causes an increased rate of lactose uptake (by
LacY) and conversion of lactose into allolactose (by LacZ), which, in turn, lowers the
activity of LacR even further. The lactose operon is thus regulated through a positive
feedback loop and catabolite repression. This enables an energy-efficient switch. The
lacZYA genes are expressed at low (basal) levels when glucose is present and are only
activated when needed, i.e., when glucose is absent and lactose is present. Many other
operons are regulated in a manner that resembles that of the lactose operon and it is a
textbook example of a simple gene regulatory circuit.
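Why a positive feedback loop can behave as a switch may be seen in a one-variable caricature. The sketch below is a generic illustration and not a model from the text: x is a hypothetical lumped variable standing for the level of permease and inducer, production is a basal term plus a sigmoidal (Hill-type) self-activation term, and loss is first order. All parameter values are arbitrary.

```python
def net_rate(x, basal=0.05, vmax=1.0, K=0.5, n=4):
    """dx/dt for a self-activating gene: production minus first-order loss."""
    production = basal + vmax * x**n / (K**n + x**n)
    return production - x


def steady_states(f, lo=0.0, hi=2.0, steps=2000):
    """Locate zeros of f on [lo, hi] by grid scan followed by bisection."""
    roots = []
    dx = (hi - lo) / steps
    for i in range(steps):
        a, b = lo + i * dx, lo + (i + 1) * dx
        fa, fb = f(a), f(b)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            # Sign change inside (a, b): refine by bisection.
            for _ in range(60):
                m = 0.5 * (a + b)
                fm = f(m)
                if fa * fm <= 0.0:
                    b = m
                else:
                    a, fa = m, fm
            roots.append(0.5 * (a + b))
    return roots


roots = steady_states(net_rate)
print([round(r, 3) for r in roots])
```

With these assumed parameters the scan finds three steady states: a low (basal) state, a high (induced) state and an unstable threshold between them, so a sufficiently large transient push in x flips the system from one stable state to the other.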
1.4.2 The Genetic Switch in Bacteriophage λ
The λ CI repressor isolated from the bacteriophage λ of E. coli is indeed a remarkable
example of a transcription factor protein. Depending on the operator to which the protein binds, and depending on which promoter is considered, the λ CI repressor can act
both as a transcriptional repressor and as a transcriptional activator. The λ CI repressor
is part of the regulatory circuitry that enables the λ phage to change its "lifestyle" from
a dormant (lysogenic) state, where it co-exists with its bacterial host, to an active (lytic)
state, where the phage rapidly replicates, bursts the host cell and releases a massive
number of offspring into the environment. This switch is part of the survival strategy of
the virus. In the lysogenic state, the phage DNA is stably integrated into the chromosome of the host cell and is replicated every time the cell divides. However, the virus is
able to detect if the life of the host cell is threatened, for instance following DNA damage, and can switch to the lytic state where the host cell is abandoned and the released
phage particles go in search of suitable hosts to infect.
This switch from lysogenic to lytic growth is controlled at the level of gene expression, particularly at the two promoters PR and PRM depicted in Fig. 1.9. The PR and
PRM promoters regulate the expression of the genes cro and cI, which encode the Cro
transcriptional regulator and the λ CI repressor protein, respectively. The two promoters
share a common regulatory region, called the right operator (OR), and direct transcription in divergent directions along the DNA. Transcription of the cI gene is required for
maintenance of the lysogenic growth state. In fact, the “RM” in PRM refers to “repressor maintenance”. The transition from the lysogenic to the lytic state is caused by the
repression of cI expression and activation of cro expression. The phage DNA also contains
the left promoter, PL, and the left operator (OL), which is able to bind
the CI protein.
The OR region contains three adjacent sites OR1, OR2 and OR3, each of which
can bind both Cro and CI. The binding affinities of these sites are such that at increasing
concentrations CI initially binds to OR1, then OR2 and finally OR3. The binding affinities for Cro are such that it will first bind to OR3 and then to OR1 and OR2. It is the
occupancy of these sites that determines which of the two promoters is active (Figs. 1.9B and 1.9C). The binding of the dimeric Cro protein to OR3 (Fig. 1.9B)
causes the repression of cI transcription, but does not alter the transcription of the cro
gene. Autorepression by Cro only occurs when it is present in sufficiently high concentrations to occupy OR1 and/or OR2.
Figure 1.9: (A) The divergent PR and PRM promoters regulate the expression of cro and cI.
The shared regulatory OR region contains the three binding sites, OR1, OR2 and OR3, to which
the CI and Cro proteins can bind. (B) Repression of cI transcription in the lytic state by the
binding of the Cro dimer to OR3. (C) Repression of cro transcription by the binding of CI
dimers to OR1 and OR2. The transcription of cI is maintained by a positive feedback loop in
the lysogenic state.
The binding of the CI protein, which is also a dimer, to OR1 prevents the binding
of the RNA polymerase to the PR promoter, thus preventing the expression of the cro
gene (Fig. 1.9C). In addition, once the CI protein is bound to OR1, the affinity of the
OR2 site for CI is increased approximately 10-fold. This is due to a direct interaction
between the two CI dimers located next to each other with the appropriate relative
orientation. Binding of CI to OR2 also prevents the RNA polymerase from binding to
the PR promoter and thus causes a further repression of transcription of cro from PR.
Interestingly, two CI dimers bound to OR1 and OR2 may interact with two CI dimers
bound to the OL region within the PL promoter located several thousand base pairs
away and may facilitate efficient repression of PR and PL at intermediate concentrations
of CI.
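The effect of the cooperative interaction between CI dimers at OR1 and OR2 can be sketched with a simple equilibrium (statistical-weight) calculation. This is an illustrative sketch under stated assumptions: only OR1 and OR2 are considered, c is the CI dimer concentration in arbitrary units, the dissociation constants K1 and K2 are hypothetical, and the cooperativity factor omega = 10 encodes the roughly 10-fold affinity increase mentioned in the text.

```python
def or_state_probabilities(c, K1=1.0, K2=10.0, omega=10.0):
    """Equilibrium probabilities of the four OR1/OR2 occupancy states.

    c:      CI dimer concentration (arbitrary units)
    K1, K2: dissociation constants of OR1 and OR2 (hypothetical values)
    omega:  cooperativity factor applied when both sites are occupied
    """
    weights = {
        "empty": 1.0,
        "OR1": c / K1,
        "OR2": c / K2,
        "OR1+OR2": omega * c * c / (K1 * K2),
    }
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def or2_occupancy(c, omega):
    """Probability that OR2 is occupied, with or without OR1."""
    p = or_state_probabilities(c, omega=omega)
    return p["OR2"] + p["OR1+OR2"]


cooperative = or2_occupancy(1.0, omega=10.0)
independent = or2_occupancy(1.0, omega=1.0)
print(f"OR2 occupancy: {cooperative:.3f} (cooperative) vs {independent:.3f} (independent)")
```

Setting omega to 1 removes the dimer-dimer interaction, which makes it easy to compare how much of the OR2 occupancy at a given CI level is owed to cooperativity.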
Remarkably, when bound to the OR2 site the CI protein can interact with the σ-factor in the RNA polymerase holoenzyme and increase the rate of transcription from
PRM. This CI-induced activation of cI transcription facilitates an increase in the concentration of the CI protein. Hence, a positive feedback loop ensures that sufficiently
high levels of the CI protein are present to maintain the phage in the lysogenic state.
This feedback structure causes the state to be remarkably stable with fewer than 1 in 10
million lysogenic cells spontaneously switching to the lytic state per generation. When
present in high concentrations, the CI protein will bind to the low-affinity OR3 site and
repress the transcription of its own gene. Binding of CI to OR3 causes the transcription
from PRM to decrease. Efficient repression at high concentrations of CI is believed to
be due to the formation of a CI tetramer between a CI dimer bound to OR3 and a CI
dimer bound to the OL region, in addition to the CI tetramers linking OR1 and OR2 to OL.
This negative feedback loop could ensure that the rate of transcription of the cI gene is
never too high to respond to endogenous signals produced by the host cell.
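The combination of activation through OR2 and repression through OR3 implies that PRM activity first rises and then falls as the CI concentration increases. The sketch below illustrates this with a deliberately crude model: the two sites are treated as independent simple binding sites, and the constants K2 and K3 as well as the basal and activated rates are hypothetical numbers chosen only to show the shape of the response.

```python
def prm_activity(c, K2=1.0, K3=50.0, basal=1.0, activated=10.0):
    """Relative transcription rate from PRM at CI concentration c.

    OR2 occupancy raises the rate from `basal` to `activated`;
    OR3 occupancy (low affinity, K3 >> K2) shuts the promoter off.
    """
    p_or2 = c / (K2 + c)  # probability that OR2 is occupied
    p_or3 = c / (K3 + c)  # probability that OR3 is occupied
    rate_if_or3_free = basal + (activated - basal) * p_or2
    return rate_if_or3_free * (1.0 - p_or3)


for c in (0.0, 5.0, 500.0):
    print(c, round(prm_activity(c), 2))
```

The three sample concentrations correspond to the regimes described in the text: no CI (basal expression), intermediate CI (positive autoregulation) and high CI (negative autoregulation).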
The transition from the lysogenic to the lytic state occurs in response to damage to
the DNA of the host cell. This switch is very robust with an efficiency close to 100%.
A central regulatory protein in the DNA-damage response system of E. coli, the SOS
response system, is the RecA protein. This protein has protease activity and is
able to cleave certain proteins in the presence of single-stranded DNA. The activation of
the RecA co-protease is a key component of the SOS response system and is essential
to the survival of the cell. The λ phage ingeniously exploits the SOS response to its
own advantage, but with dire consequences for the host cell. The CI dimer contains a
site that is recognized by RecA and activation of RecA causes cleavage and inactivation
of the CI dimers. This causes a decrease in the concentration of the CI dimers and the
OR1 and OR2 sites become vacant. This in turn diminishes the expression of cI from
PRM and increases expression of cro from PR. Cro then binds to the sites within OR
which further represses cI expression and ensures a robust switching from the lysogenic
(cI on) state to the lytic (cro on) state.
1.4.3 The Galactose Regulon in S. cerevisiae
The galactose utilization pathway in the yeast Saccharomyces cerevisiae is one of the
best-studied eukaryotic gene regulatory circuits. Detailed investigations of one
of the key regulatory proteins in this system, the dimeric transcriptional activator Gal4,
have revealed many details of how transcription is regulated in eukaryotes. The Gal4
protein acts as a transcriptional activator for a large number of genes, including many
of the genes required for the cell to metabolize galactose. Genes whose expression is
regulated by the same set of transcriptional regulators are often said to belong to the
same regulon.
The sequence upstream of the gal1 gene contains two regulatory regions: an upstream activating sequence (UASG) that contains four binding sites for the Gal4 protein, and a binding site for the transcriptional repressor Mig1 (Fig. 1.10A). The gal1
gene is subject to glucose (catabolite) repression. The Gal4 protein is only active when
galactose is present and the Mig1 protein is only active when glucose is present.
The repression is stronger than the activation and the expression of the genes in the
galactose regulon diminishes when cells are grown in media containing galactose and
glucose.
Figure 1.10: (A) Regulatory elements affecting transcription of the gal1 gene. The region upstream of the gene contains a TATA-box, a binding site for the repressor protein Mig1 and an
upstream activating sequence (UASG ). The UASG contains four binding sites for the dimeric
form of the Gal4 transcriptional activator. (B) Speculative spatial arrangement of SAGA, Mediator, TBP and RNA polymerase recruited directly or indirectly by Gal4. (C) Time-course
of recruitment of components of the transcriptional apparatus following galactose induction
(Based on the experimental study by Bryant and Ptashne).
The Gal4 protein is active when cells are grown in the presence of galactose and
in the absence of glucose. In a manner similar to CAP and λ CI, the Gal4 protein
is modular with an activation region that operates independently of a separate DNA-binding region. These domains can be detected by elimination of their corresponding
DNA sequence in the gal4 gene. Removal of the DNA binding domain (BD) gives a
protein that fails to bind to the UASG . On the other hand, removal of the sequence
that encodes the activating region, or activation domain (AD), gives a protein that can
bind to the UASG , but fails to activate transcription. This property of Gal4 forms the
basis of the two-hybrid technology for detection of protein-protein interactions (see
section 2.2.2).
While a great deal is known about the Gal4 protein, its mode of action is far from
clear. The binding of the Gal4 protein to the UASG appears not to depend on factors
that modify the nucleosomes, such as the Gcn5 component of the histone acetylation
complex Spt-Ada-Gcn5 acetyltransferase (SAGA). This is supported by the observation
that a weakening of the Gal4 binding sites causes expression from the gal1 promoter
to depend on the presence of Gcn5. The interaction between the Gal4 protein and the
UASG region is probably stronger than the interaction between the histones and the
UASG region. However, even though Gal4 can bind to the DNA in the absence of
SAGA, the latter is still required for transcription from the gal1 promoter.
Figure 1.11: (A) Repression of Gal4 activation by the Gal80 protein in the absence of galactose.
Gal80 binds to Gal4 near its activation domain and prevents the recruitment of SAGA and
Mediator. The regulatory protein Gal3 is unable to bind Gal80. (B) Gal3 binds to Gal80 in the
presence of galactose. Gal4 can now recruit SAGA and Mediator causing the initiation of the
cascade (Fig. 1.6) leading to transcription.
Experiments done in vitro, i.e., in solutions outside the cellular environment, have
shown that the Gal4 protein may interact with a number of components in the transcriptional machinery. This includes RNA polymerase II, the transcription factors TFIIE,
TFIIH and TBP, various components of the Mediator and components of SAGA. A recent experimental study by Bryant and Ptashne investigated the temporal order in which
Gal4 recruits components of the transcriptional apparatus in vivo, i.e., in living cells.
The experiments indicate that SAGA and the Mediator complexes are the direct targets
of the Gal4 protein and that they are recruited to the promoter region independently
of each other. As illustrated in Fig. 1.10C, the SAGA complex is recruited first, then
the Mediator complex. The RNA polymerase II, TBP, TFIIE, TFIIH and TFIIF are
recruited last, and the temporal resolution of the experiments (0.5-1 minute) is too coarse
to determine the order in which these components are recruited to the promoter. Other
experiments have demonstrated that the binding of the TBP is required for the binding
of the polymerase holoenzyme. The Bryant-Ptashne experiment indicates that the polymerase (and the other factors required for transcription) is recruited very rapidly once
SAGA, Mediator and TBP are bound to the DNA.
In addition to the UASG , the region of sequence that regulates gal1 expression
contains a cis-regulatory element to which the transcriptional repressor protein Mig1
can bind. The Mig1 protein is a key regulatory factor in glucose repression and its
activity depends on the concentration of glucose in the growth medium. As for the
Gal4 protein, the Mig1 protein exerts its effect by recruiting complexes to the promoter
region. The complex that is recruited to the promoter by the Mig1 protein consists
of the components Ssn6 and Tup1. Artificial constructs in which either one of these
components is fused to a DNA binding domain (section 2.2.2) indicate that the Mig1
protein is not required for repression and simply acts to bring the Tup1 component into
the appropriate position on the promoter. Tup1 may then recruit complexes that remove
acetyl groups from the histones (histone deacetylases, or HDACs) to make the promoter
region less accessible to the transcription apparatus. It has also been suggested that
Tup1 interacts directly with components of the transcriptional apparatus and somehow
interferes either with the assembly of a transcription pre-initiation complex or with
transcription initiation.
Figure 1.12: Components of the positive feedback regulation of Gal4 activity. In the presence
of galactose, Gal4 activity is increased through Gal3-Gal80 (Fig. 1.11), which increases the
rate of Gal2-mediated galactose uptake and the concentration of Gal3. Gal80 expression is also
up-regulated by Gal4. This negative feedback is relatively weak.
In addition to the cis-regulation exerted at the UASG and the Mig1 binding sites,
the transcription of the gal1 gene is strongly influenced by interactions between trans-regulatory factors. When cells are grown in a medium that contains a non-repressive
sugar, such as raffinose, the Gal4 protein occupies the four binding sites within the
UASG of the Gal1 promoter. However, in the absence of galactose, the Gal4 protein is
unable to recruit components of the transcriptional apparatus to the vicinity of the promoter region. This repression is due to the protein encoded by the gal80 gene, which
binds to the Gal4 protein at a site that is partly overlapping with the activating domain.
The repression is very efficient and there is virtually no expression from the gal1 promoter when the Gal80 protein inhibits the Gal4 protein (Fig. 1.11A). This inhibition
is released in the presence of galactose through a third regulatory protein encoded by
the gal3 gene. The Gal3 protein, which becomes activated when galactose binds to it,
binds to the Gal80 protein and causes either the dissociation of Gal80 from Gal4 or the
movement of Gal80 away from the activating region of the Gal4 protein (Fig. 1.11B).
In a manner that is similar to the regulation of the lactose operon, the genes in
the galactose regulon are organized in a circuit containing a positive feedback loop
(Fig. 1.12). The expression of the galactose permease, which is encoded by the gal2
gene, and that of the Gal3 protein are both activated by the Gal4 protein. The removal of
the repression of the Gal4 protein will therefore cause an increased rate of galactose
uptake, an increased activity of the Gal3 protein and, subsequently, a further increase
in the activity of Gal4. Interestingly, the Gal4 protein also increases the expression of
the gal80 gene, though to a lesser extent than gal2 and gal3. The regulatory function of
this negative feedback is unknown. One possibility is that it enables a rapid shutdown
of the circuit when glucose is present in the growth medium and robust switching to
glucose repression.
The regulation of the transcription of the yeast genes that are required to metabolize
galactose is in many ways similar to that of the genes required to metabolize lactose in E. coli.
In both cases, the expression of the relevant genes is stimulated when the alternative carbon source is present and suppressed if the preferred carbon source glucose is present.
However, the activating and inhibiting signals are mediated in different ways in the two
systems. In the bacterial system, the alternative energy source lactose enhances transcription from Plac by suppression of the repressor LacR and the presence of glucose is
mediated through the suppression of the activator CAP. In yeast, the alternative energy
source galactose enhances transcription from the Gal1 promoter by stimulation of the
activator Gal4 and the presence of glucose is mediated through the stimulation of the
repressor Mig1. These differences may indicate a fundamental shift of gene regulatory
mechanisms from a default on state in bacteria to a default off state in higher organisms.
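The comparison can be summarized as Boolean logic. The sketch below is a deliberate simplification that treats every regulator as either fully active or fully inactive and ignores basal expression; the function names are made up for this illustration. It shows that the two systems compute the same input-output relation while wiring it differently.

```python
from itertools import product


def lac_expressed(lactose, glucose):
    """E. coli: the sugar signals act by switching regulators OFF."""
    lacr_active = not lactose  # the inducer inactivates the repressor LacR
    cap_active = not glucose   # glucose lowers cAMP, inactivating CAP
    return cap_active and not lacr_active


def gal_expressed(galactose, glucose):
    """S. cerevisiae: the sugar signals act by switching regulators ON."""
    gal4_active = galactose    # galactose relieves Gal80 inhibition of Gal4
    mig1_active = glucose      # glucose activates the repressor Mig1
    return gal4_active and not mig1_active


for sugar, glucose in product([False, True], repeat=2):
    print(sugar, glucose, lac_expressed(sugar, glucose), gal_expressed(sugar, glucose))
```

In both functions the genes are on only when the alternative sugar is present and glucose is absent; the difference lies in whether each input inactivates a regulator (bacterial case) or activates one (yeast case).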
Suggested Further Reading
Textbooks:
Alberts B. et al., The Molecular Biology of the Cell. Garland Science. New York,
New York (2002).
Latchman D. Gene Regulation: A Eukaryotic Perspective. Stanley Thornes.
Cheltenham, United Kingdom (1998).
Lewin B. Genes VII. Oxford University Press. Oxford, United Kingdom (2000).
Müller-Hill B. The lac Operon. de Gruyter. Berlin, Germany (1996).
Ptashne M., & Gann A. Genes & Signals. Cold Spring Harbor Laboratory Press.
Cold Spring Harbor, New York (2002).
White R. J. Gene Transcription: Mechanisms and Control. Blackwell Science.
Oxford, United Kingdom (2001).
Articles:
Bednar J. et al. Nucleosomes, linker DNA, and linker histone form a unique
structural motif that directs the higher-order folding and compaction of chromatin. Proc. Natl. Acad. Sci. U. S. A. 95, 14173-8 (1998).
Bryant G. O. & Ptashne M. Independent recruitment in vivo by Gal4 of two
complexes required for transcription. Mol. Cell 11, 1301-9 (2003).
Dieci G. & Sentenac A. Detours and shortcuts to transcription reinitiation. Trends
Biochem. Sci. 28, 202-9 (2003).
Hochschild A. The λ switch: cI closes the gap in autoregulation. Curr. Biol. 12,
R87-9 (2002).
Orphanides G. & Reinberg D. A unified theory of gene expression. Cell 108,
439-51 (2002).
Zhang X. et al. Mechanochemical ATPases and transcriptional activation. Mol.
Microbiol. 45, 895-903 (2002).
Tutorial Part
2
Engineered Gene
Networks
Natural gene networks can be described as circuits of interconnected functional modules, each consisting of specialized interactions between proteins, DNA, RNA, and
small molecules. The simplest element of a gene regulatory network consists of a promoter, the gene(s) expressed from that promoter, and the regulatory proteins (and their
cognate DNA binding sites) that affect the expression of that gene. While there are
several different ways by which regulatory proteins and small molecules can modulate gene expression, the regulation of the frequency of gene transcription is the most
prevalent control instrument employed in natural gene circuits. In the previous section,
it was discussed how the frequency of transcription is regulated in a number of natural
systems: expression from the Plac promoter is modulated by the trans-acting proteins
CAP and LacR, expression from the PR/PRM promoters by CI and Cro, and expression from numerous promoters in the
galactose regulon by Gal4 and Mig1.
The frequency of transcription is also the parameter that can most easily be manipulated in the laboratory. The nucleotide sequences contained in cis-regulatory elements,
the promoter region(s) and in the untranslated regions of the mRNA transcript can be
altered with relative ease to control the level of gene expression by altering the binding affinities of transcription factors and the various components of the transcriptional
and translational apparatus. In addition, cis-regulatory elements that make transcription controllable by one set of transcription factors can be substituted with other cis-regulatory elements. This ability to mix and match cis- and trans-regulatory elements
in living cells has allowed for the construction of artificial gene circuits with customizable properties and characteristics.
This part of the tutorial is intended to provide a brief introduction to the practical
aspects of genetic network engineering. Aside from the obvious biotechnological and
biomedical implications of this research, there are numerous applications to Systems
Biology. Simple engineered expression systems provide a framework for the deduction
and validation of the basic principles of transcriptional regulation and allow for the
testing of the methodologies discussed in Part 3 that are used to link the properties of
molecules to systems level functionality.
2.1 Some Tools of the Trade
The construction of artificial gene circuits is based on relatively recent advances in
molecular biology technologies and the availability of DNA sequence data for the cis-regulatory elements and their corresponding transcription factor proteins. Some common tools from genetic engineering include restriction enzymes that are used to cleave
DNA molecules at specific locations, DNA ligases that are used to “glue” DNA fragments together, and vectors that are used in vivo to express genes of interest or to
amplify DNA sequences contained in the vector. Another frequently used technology
is the polymerase chain reaction (PCR), which is used to amplify DNA sequences in
vitro.
2.1.1 Cutting and Pasting DNA
One of the most basic and powerful molecular biology technologies is the ability to cut
DNA at specific locations with restriction endonucleases and to paste DNA fragments
back together with DNA ligase. Restriction endonucleases are enzymes that bind specific DNA sequences, typically 4 to 6 base pairs long, and cleave the DNA within or near
their recognition sequences. Figure 2.1A illustrates how two commonly used restriction
enzymes, EcoRI and EcoRV, isolated from E. coli, cleave DNA sequences that contain the hexamers GAATTC and GATATC. Treatment of a purified DNA sample with
EcoRI gives DNA fragments with complementary “overhangs”. One fragment has a
TTAA overhang on the 3’-5’ strand, the other an AATT overhang on the 5’-3’ strand.
Treatment with EcoRV gives fragments that have “blunt” ends as the enzyme cleaves the
GATATC hexamer in the middle. Note that EcoRI and EcoRV cleave DNA sequences
that differ only in the two central base pairs of the hexamer. In the sequence that is cleaved
by EcoRI the central base pairs are AT. If the central base pairs were TA instead of AT,
the sequence would not be cleaved by EcoRI but by EcoRV. There are currently hundreds of commercially available restriction enzymes that can cleave DNA sequences
with a very high specificity and produce a variety of different overhangs.
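The cutting rules described above can be illustrated with a minimal Python sketch. It assumes a linear DNA molecule, complete digestion, and tracks only the top (5’-3’) strand; because both recognition sites are palindromic, scanning one strand is sufficient. The function name `digest`, the `ENZYMES` table and the example sequence are hypothetical and chosen only for illustration.

```python
# Recognition site and cut offset on the top (5'->3') strand.
# EcoRI cuts between G and A (G^AATTC), leaving AATT overhangs.
# EcoRV cuts in the middle (GAT^ATC), leaving blunt ends.
ENZYMES = {"EcoRI": ("GAATTC", 1), "EcoRV": ("GATATC", 3)}


def digest(seq, enzyme):
    """Return the top-strand fragments of a linear sequence after digestion."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset            # position of the cleaved bond
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])     # remainder after the last cut
    return fragments


example = "CCGAATTCGGGATATCAA"        # made-up sequence with one site of each kind
print(digest(example, "EcoRI"))       # ['CCG', 'AATTCGGGATATCAA']
print(digest(example, "EcoRV"))       # ['CCGAATTCGGGAT', 'ATCAA']
```

Changing the central AT of the EcoRI site to TA in the example sequence would make it a substrate for EcoRV instead, as noted in the text.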
DNA fragments obtained from a restriction enzyme digest can be separated using
gel electrophoresis. DNA is a negatively charged molecule and fragments of different
sizes migrate at different velocities through a gel when an external electrical field is applied.
Figure 2.1: (A) Cleavage of double stranded DNA by the restriction enzymes EcoRI and EcoRV.
EcoRI recognizes the sequence GAATTC and cleaves the phosphodiester bond between G and
A to produce two DNA fragments with unpaired nucleotides (overhangs). EcoRV recognizes
the sequence GATATC and cleaves the DNA in the middle of this sequence to produce blunt
ends with no overhangs. (B) DNA ligase can be used to reestablish the phosphodiester bond
and join fragments that have been cleaved by restriction enzymes.
To separate DNA fragments, the sample is loaded onto a gel together with a
dye and an electrical field is applied, typically for 30 to 90 minutes depending on the
current, the density of the gel, and on the size of the fragments that are being separated.
The gel is then stained with a dye, ethidium bromide, that fluoresces brightly under
ultraviolet light when bound to double-stranded DNA. The appropriate DNA fragments
can then be excised from the gel with a razor blade and purified. Once the desired DNA
fragments have been obtained, they can be “glued” back together in a ligation reaction
(Fig. 2.1B). In this reaction, a DNA ligase enzyme reestablishes the phosphodiester
bond in the DNA backbone that was initially cleaved by the restriction enzyme.
2.1.2 Plasmid Vectors
Genetic engineering generally uses cloning vectors to carry out manipulations on DNA
sequences and expression vectors to control the in vivo expression of genes of interest.
Most engineered gene regulatory networks are constructed on vectors. Cloning and
expression vectors are plasmids and are typically used to express a gene of interest at
high levels in vivo, for instance during the manufacturing of enzymes, or to amplify a
DNA sequence of interest. Since the plasmid is replicated each time the cell divides,
large quantities of vector DNA (or protein) can be isolated from a cell population that
has been allowed to grow to a high density. In other words, once a DNA sequence has
been successfully cloned into a plasmid vector, essentially unlimited quantities of the
DNA can be obtained by isolation of the plasmid DNA from cell extracts.
In addition to the sequence of interest, vectors carry at least one origin of replication
and a selective marker. As mentioned in section 1.2, the origin of replication typically
allows for amplification of the plasmid in a rapidly growing host cell, such as E. coli,
while the selective marker ensures that the host cell can only grow if it contains one
or more copies of the plasmid. Selective markers in E. coli are typically genes that
confer resistance to antibiotics, such as tetracycline, ampicillin and kanamycin. When
the bacterium is grown in the presence of the antibiotic, only the cells that carry the
appropriate resistance gene will be able to survive. The method of selection in yeast
is typically an auxotrophic marker. The marker is a gene that has been deleted from
the genome of the host cell, but is required for cell growth when certain nutrients, for
instance specific amino acids, are absent from the growth medium.
Vectors can be inserted into host cells through a process called transformation.
Transformation is typically carried out with cells that are made competent by
treatment with various chemicals. These cells can accept foreign DNA when a mixture
of DNA (typically 50 to 500 ng) and cells is subjected to a brief heat shock. The basic
theory behind the transformation procedure is poorly understood. It is a very inefficient
process and requires billions of cells to produce on the order of 10 to 1000 cells that can
grow on a plate containing an appropriate mixture of nutrients and selective markers.
When the sequence that allows for plasmid replication is absent, the only way for the
host cell to survive is to integrate the vector DNA into its chromosome. This can be
done by a process known as homologous recombination. One method that is used to
integrate DNA sequences into the chromosome of S. cerevisiae is to cut the vector with
a restriction enzyme in a region of the vector where the DNA sequence is identical
to a sequence within one of the yeast chromosomes. This linear DNA molecule will
replace the corresponding sequence within the chromosome when the cells are allowed
to grow after the transformation. A similar technique can also be used to integrate
DNA sequences into the E. coli chromosome, but the process is particularly efficient in
S. cerevisiae.
A wide variety of cloning and expression vectors are commercially available. An
example of a frequently used cloning vector is the pZ vector system developed by Lutz
and Bujard for expression in E. coli (Fig. 2.2A). This vector system was constructed in
such a way that each plasmid contains three modular regions that are flanked by unique
restriction sites for the endonucleases AatII, XhoI, SacI and XbaI. Region (I) can be
used to insert arbitrary sequences. Region (II) contains the origin of replication and
region (III) contains the selective marker. The pZ system comes with different origins
of replication (ColE1, p15A and pSC101) and different resistance markers (Fig. 2.2B).
While the origin ColE1 allows a high number of plasmid copies in each cell (around
60), the replacement of the ColE1 sequence with the sequence for the p15A origin or a
modified pSC101 origin lowers the plasmid copy number per cell to about 25 and 3-4,
respectively.
Figure 2.2: (A) The pZ expression vectors contain three modular regions that carry sequences
for, respectively, (I) the promoter/gene of interest, (II) an origin of replication and (III) a resistance marker. Region (I) contains a promoter (P/O) and a sequence that encodes a ribosome
binding site (RBS). A gene of interest can be inserted between the restriction sites KpnI and
XbaI. T1 and t0 refer to sequences that terminate transcription. (B) Examples of pZ expression
vectors with different origins of replication and resistance markers. The engineered regulatory
units are discussed in section 2.2.1. Reproduced from Lutz and Bujard without permission.
© Oxford University Press.
The modular structure of the pZ system allows for the rapid exchange of different
components. Any one of the three modules can be replaced with another in a matter
of three to four days. First, cells that contain the vector(s) are grown for 12-20 hours
and the DNA is isolated, treated with restriction enzymes and the desired vector and
insert fragments are purified. These fragments are then ligated together and transformed
into competent cells. After the transformation, the cell mixture is spread out on plates
containing selective antibiotics. After a day or two, cells that are viable will have
formed colonies that can be used to inoculate a batch culture. An additional 12-20
hours later, vector DNA can be isolated and analyzed, for instance, by treating the
vector DNA with restriction enzymes followed by gel electrophoresis to confirm that it
contains fragments of the correct size.
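The diagnostic digest mentioned above amounts to comparing the observed band sizes with the fragment lengths expected from the vector map. A minimal Python sketch of that bookkeeping for a circular plasmid is given below. The function name, the plasmid length and the cut positions are hypothetical; real positions would come from the sequence of the construct.

```python
def circular_fragment_sizes(plasmid_length, cut_positions):
    """Expected fragment lengths (bp) after cutting a circular plasmid.

    cut_positions are coordinates of the cleaved bonds on the plasmid map.
    """
    cuts = sorted(set(p % plasmid_length for p in cut_positions))
    if not cuts:
        return [plasmid_length]       # uncut plasmid: one circular molecule
    # Fragments between consecutive cuts ...
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    # ... plus the fragment that wraps around the map origin.
    sizes.append(plasmid_length - cuts[-1] + cuts[0])
    return sorted(sizes, reverse=True)


# Hypothetical 4000 bp vector cut at map positions 100 and 1300.
print(circular_fragment_sizes(4000, [100, 1300]))   # [2800, 1200]
# A single cut only linearizes the plasmid.
print(circular_fragment_sizes(4000, [100]))         # [4000]
```

If the bands on the gel do not match the predicted sizes, the clone most likely does not carry the intended insert.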
Due to the relatively fast growth of E. coli, it is often desirable to perform the basic
DNA sequence manipulations on a vector that can be propagated in E. coli and only
insert the vector into the desired cell type once the correct vector has been obtained.
Vectors that are used for such purposes typically contain two origins of replication and
two selective markers in order to provide means of propagation and selection in the
two different cell types. A shuttle vector system of this type that is frequently used in S.
cerevisiae is the pRS vector family developed by Sikorski and Hieter (Fig. 2.3). The
multipurpose pRS vectors contain a ColE1 high copy number bacterial origin of replication and a gene that confers ampicillin resistance to an E. coli host. The pRS vectors
also carry an auxotrophic marker for selection in yeast (the his3 gene in Fig. 2.3B) and
may contain a sequence (ARS/CEN) that allows the plasmid to replicate autonomously.
Other features of the pRS system include an origin of replication from the f1 filamentous phage, a multiple cloning sequence (MCS) containing various restriction sites and
the sequence that encodes a variant of the LacZ protein.
Figure 2.3: (A) The multipurpose pRS vector contains a bacterial origin of replication (ColE1),
a resistance marker (ampicillin) and an auxotrophic marker (HIS3). The ARS/CEN sequence
allows the replication of the plasmid in yeast. If this sequence is absent, the yeast cell can
only grow if the plasmid is integrated into the yeast chromosome. In addition, the pRS vector
contains an origin of replication from the f1 phage, promoters for T3 and T7 phage polymerases,
a multiple cloning sequence (MCS) and the lacZ gene. (B) The pESC vector is very similar to
the members of the pRS vector family. It contains the bidirectional Gal1 and Gal10 promoters
allowing for expression of two genes inserted into the multiple cloning sites MCS1 and MCS2.
Modified without permission. © Stratagene Cloning Systems.
When a sequence that encodes a protein is inserted into the MCS using the appropriate restriction enzymes, it becomes fused to the sequence of the lacZ gene. If there
is no transcriptional or translational stop signal between the two sequences, the result
is a hybrid protein that may have all the properties of the protein of interest and the
LacZ protein. Recall from section 1.4.1 that lacZ encodes a β-galactosidase that in E.
coli converts lactose into the inducer of LacR, allolactose. It can also cleave various artificial galactopyranosides to give a brightly colored reaction product. This can be used
to quantify the expression of the hybrid protein in an enzymatic assay. A cell extract is
mixed with an appropriate artificial galactopyranoside and the activity of LacZ can be
measured by quantifying the absorbance of the sample at an appropriate wavelength.
The pESC system shown in Fig. 2.3 is closely related to the pRS vectors, but with
some notable differences. It contains the 2µ origin of replication that allows a very
high number of plasmids per yeast cell. It also contains the divergent Gal1/Gal10
promoters. This allows the simultaneous galactose-induced expression of two genes.
2.1.3 Extracting DNA Sequences
There are a number of methods that can be used to isolate specific DNA sequences.
The most direct methods are to purify chromosomal or plasmid DNA directly from
a cell extract or to synthesize the DNA chemically. Custom-made single-stranded DNA,
or oligonucleotides, can be obtained from commercial sources for a reasonable price
when the sequence contains 100 nucleotides or fewer. Purification of DNA from plasmid
or genomic DNA is usually the preferred option for longer fragments. The quantities
obtained by isolation of genomic DNA are, however, quite low and the sequence of interest is embedded in the chromosomal DNA. It is usually desired to increase the yield
of a specific region of the DNA by performing a polymerase chain reaction (PCR). The
theory behind PCR is simple. In order to replicate the chromosomal DNA, the DNA
polymerase synthesizes double-stranded DNA by adding single nucleotides through
base-pairing to a single-stranded region of the parental DNA molecule. In a PCR reaction, this process is repeated multiple times in vitro and the quantity of DNA is doubled
in each step.
Figure 2.4: Illustration of one cycle in the polymerase chain reaction. Separation of the double-stranded DNA is followed by primer annealing, DNA polymerase binding and DNA synthesis.
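The doubling argument implies exponential growth of the number of copies with the number of cycles. The short Python sketch below makes this arithmetic explicit. The `efficiency` parameter is an assumption added here for illustration (1.0 corresponds to the ideal doubling described in the text; real reactions fall short of this, especially in late cycles), and the starting copy number is made up.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized number of copies of the target region after a number of PCR cycles.

    efficiency is the hypothetical fraction of templates copied per cycle;
    efficiency=1.0 gives perfect doubling in every cycle.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


# Ten cycles of perfect doubling give a 1024-fold amplification.
print(pcr_copies(1, 10))              # 1024.0
# Thirty cycles starting from 100 template molecules (about 1e11 copies).
print(f"{pcr_copies(100, 30):.2e}")
# The same reaction at an assumed 90% per-cycle efficiency.
print(f"{pcr_copies(100, 30, efficiency=0.9):.2e}")
```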
The PCR reaction relies on polymerases that can synthesize DNA at elevated temperature and on the requirement that DNA synthesis has to start from a region where
the DNA is double-stranded. The region of the DNA that needs to be amplified is specified by two oligonucleotides, or primers, that are complementary to the sequences that
flank the region of interest. One primer is designed to bind to the 3’-5’ strand upstream
of the region of interest while the other is designed to bind to the 5’-3’ strand downstream
of the region of interest (see Fig. 2.4). The amplification of the region of interest
is done by repeated cycles of DNA melting, primer annealing and primer extension.
In the first step (Fig. 2.4A), the double-stranded DNA is separated (melted) into single-stranded
chains at high temperature (typically 94 °C). The sample is then cooled (typically to
45-65 °C) to allow the single-stranded DNA to form a stable complex with the primers
(Fig. 2.4B). DNA synthesis is then initiated by increasing the temperature to a value
where the polymerase works most efficiently (typically 72 °C). Once the polymerase has
completed the synthesis of the region of interest, the sample is heated to melt the newly
synthesized double-stranded DNA and the process is repeated.
Figure 2.5: Examples of sequence modifications by PCR. (A) A sequence of interest can be
augmented with restriction sites appropriate for cloning by using primers that, in addition to
the DNA binding sequence, contain a sequence recognized by a restriction enzyme (RS1). (B)
Two-step replacement of DNA sequence by PCR. The sequence to be inserted is carried on
primer overhangs.
PCR is used in many different applications. For example, PCR is used to obtain
DNA sequences that are flanked by restriction sites compatible with those in a desired
cloning vector. This is done by using primers with “overhangs” that contain the recognition sequence for the restriction enzyme (Fig. 2.5A). Once the PCR amplification is
complete, the PCR product is purified, cut with the appropriate restriction enzyme and
ligated into the vector fragment that has been treated with the same restriction enzymes.
In addition to these and countless other applications, a similar method can also be used
to introduce an arbitrary sequence into an existing sequence at an arbitrary location.
One way this can be done is to do two PCR reactions with a total of four primers
(Fig. 2.5B). Two of the primers A and B are chosen to coincide with flanking regions in
the parental DNA that contain appropriate restriction sites. The two remaining primers
C and D are designed to bind to the parental DNA at positions flanking the region that
needs to be replaced. These primers have overhangs that contain the sequence to be
inserted and the sequence for a common restriction site. After the two PCR products
AC and DB have been amplified, they are cut with the common restriction enzyme and
ligated together. The ligated ACDB fragment can then be PCR amplified using the
primers A and B and subsequently inserted into the vector using the restriction sites
contained in the primers A and B.
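The use of primer “overhangs” to add restriction sites (Fig. 2.5A) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a primer-design tool: the function names, the short annealing length and the toy template are hypothetical, melting temperatures and secondary structure are ignored, and the two extra 5’ bases (`clamp`) reflect the common practice of padding a site at the very end of a PCR product so that the enzyme can cut it.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(template, start, end, site_fwd, site_rev,
                   anneal_len=18, clamp="GC"):
    """Primers (5'->3') that amplify template[start:end] and add restriction sites.

    Each primer is: clamp + restriction site + template-binding region.
    """
    region = template.upper()[start:end]
    forward = clamp + site_fwd + region[:anneal_len]
    reverse = clamp + site_rev + reverse_complement(region[-anneal_len:])
    return forward, reverse


# Toy template; EcoRI (GAATTC) and XbaI (TCTAGA) sites are added at the two ends.
template = "ATGAAACCCGGGTTTTAA"
fwd, rev = design_primers(template, 0, len(template), "GAATTC", "TCTAGA",
                          anneal_len=6)
print(fwd)   # GCGAATTCATGAAA
print(rev)   # GCTCTAGATTAAAA
```

After amplification, the product carries the two sites at its ends and can be cut and ligated into a vector treated with the same enzymes, as described above.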
2.2 Engineering Regulatory Modules
2.2.1 Genetic Switches in E. coli
Many genetic switches constructed to control gene expression are based on an architecture that mimics ones found in natural bacterial operons (see section 1.4.1). Two commonly used switches were constructed by Lutz and Bujard (using the pZ vector system)
by replacing the binding sites for the λ CI repressor with tetO and lacO operators in
a modified PL promoter. The TetR repressor/tetO-operator module is another example
of a natural system that has found broad use in genetic engineering (see sections 2.2.2
and 2.2.3). It is derived from a system that confers bacterial resistance to tetracycline-based antibiotics. Two genes, tetR and tetA, modulate tetracycline resistance in a manner similar to the functioning of the genes in the lactose operon. Specifically, in the
absence of tetracycline (Tc), or analogues such as the non-toxic anhydrotetracycline
(ATc) and doxycycline (Dox), the Tet repressor (TetR) binds to tet operator sites within
the promoter controlling the expression of tetA. The TetA protein is an antiporter that
is located in the cell membrane and exports tetracyclines from the cell. Binding of the
antibiotic to the TetR repressor decreases its affinity for the tetO binding site, causing up-regulation of tetA expression and subsequent removal of tetracycline from the
cell. The interactions between TetR, tetO and inducer have been extensively studied
and these components are, together with the components of the LacR system, some of
the best characterized systems in molecular biology.
The PL -based, TetR and LacR repressible hybrid promoters, PLtetO−1 and PLlacO−1
(Fig. 2.6A), were obtained by direct chemical synthesis followed by insertion into the
pZ vector using the XhoI and the AatII restriction sites. Expression from these promoters can be modulated over a broad range by tuning the amounts of the inducers in the
growth medium (Fig. 2.6B). A third hybrid promoter in the pZ vector system, designated Plac/ara−1, was constructed from a variant of the natural Plac promoter by insertion of two additional lacO operators (Os and O1 in Fig. 2.6A), and by replacement
of the CAP/cAMP binding site with sequences that are recognized by a transcriptional
activator encoded by the araC gene. The AraC protein binds to its recognition sequence
when the sugar arabinose is present in the growth medium and activates transcription
by facilitating the recruitment of the RNA polymerase holoenzyme to the promoter.
The hybrid promoters PLtetO−1 and PLlacO−1 (as well as many other engineered
expression systems not mentioned here) demonstrate that negative transcriptional regulation can be achieved by a relatively simple mechanism. When an appropriate DNA
sequence is inserted within or near the promoter sequence, the binding of a repressor
protein to this sequence can attenuate expression from an otherwise active promoter.
This can be as simple as a competition in which the repressor and the polymerases
cannot occupy the same space at the same time, i.e., steric hindrance. Specialized interactions, such as cooperative binding and DNA looping, are not required. They may
however increase the efficiency of the switch (see section 3.3). Similarly, the binding
site for the CAP/cAMP transcriptional activator near the Plac promoter can be replaced
with the binding site for the AraC transcriptional activator. This demonstrates that transcriptional activation does not require particularly sophisticated mechanisms to work.
The binding of a protein that can interact with the RNA polymerase near the promoter
appears to be sufficient to enable a more efficient binding of the polymerase holoenzyme
to the promoter.
Figure 2.6: (A) Promoters constructed by Lutz and Bujard by the replacement of λ CI binding sites in the PL promoter with tetO operators (PLtetO−1 ) and lacO operators (PLlacO−1 ).
The Plac/ara−1 promoter was constructed by replacing the CAP binding site with a binding site
for the AraC transcriptional activator. (B) Modulation of expression from the engineered promoters by addition of the inducers ATc (PLtetO−1 ), IPTG (PLlacO−1 ) and arabinose and IPTG
(Plac/ara−1 ). Modified without permission from Lutz and Bujard. © Oxford
University Press.
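The steric-hindrance picture of repression can be made concrete with a minimal equilibrium occupancy sketch in Python. It assumes that the polymerase and the repressor bind the promoter region in a mutually exclusive manner and that binding is at equilibrium, so that the probability of finding the polymerase on the promoter is its statistical weight divided by the sum of the weights of the three states (empty, polymerase-bound, repressor-bound). The function name and the dimensionless concentration-to-dissociation-constant ratios are hypothetical and are not taken from Lutz and Bujard.

```python
def polymerase_occupancy(pol_ratio, rep_ratio):
    """Probability that RNA polymerase occupies the promoter.

    pol_ratio: [polymerase] / K_P (dimensionless, hypothetical)
    rep_ratio: [active repressor] / K_R (dimensionless, hypothetical)
    Polymerase and repressor binding are assumed to be mutually exclusive.
    """
    return pol_ratio / (1.0 + pol_ratio + rep_ratio)


# Occupancy falls as more active repressor competes for the promoter.
# Adding inducer lowers the amount of active repressor and restores occupancy.
for rep_ratio in (0.0, 1.0, 10.0, 100.0):
    occupancy = polymerase_occupancy(1.0, rep_ratio)
    print(f"repressor ratio {rep_ratio:6.1f} -> occupancy {occupancy:.3f}")
```

No cooperativity or DNA looping enters this sketch, which is consistent with the observation that such interactions are not required for repression.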
2.2.2 Genetic Switches in S. cerevisiae
As mentioned in section 1.4.3, regulation of eukaryotic gene expression contrasts with that
of most prokaryotic genes, as the eukaryotic genes are generally in a silenced state. It
is common for many eukaryotic genes to require some mechanism of activation before
they can be expressed. This has been exploited in a relatively large number of engineered expression systems and led to the development of technologies such as yeast
two-hybrid for the detection of protein-protein interactions.
Figure 2.7: Basic yeast two-hybrid system. The sequences for the activation domain (AD) and
the DNA binding domain (BD) from the gal4 gene are fused to two different protein-encoding
sequences to give two different hybrid proteins, the bait and the prey. The bait binds to the
DNA at the Gal4 UAS. If the bait and the prey can bind to each other, the AD fused to the prey
can recruit the transcriptional apparatus to the promoter and the expression of a reporter gene
can be detected.
The theory behind the two-hybrid technology (Fig. 2.7) relies on the ability of Gal4
to activate expression from promoters that contain the Gal4 upstream activating sequence (UASG ). However, rather than having the DNA binding domain (BD) and the
activation domain (AD) located on one protein (Gal4), the DNA sequences that encode
these two domains are fused to sequences of two different proteins. As a result, the cell
will express two hybrid proteins, one, the bait, containing the Gal4 BD, and the other,
the prey, the AD. Since the bait contains the BD, it will associate with the UASG , but
will not activate transcription since this hybrid protein lacks the ability to recruit the
transcriptional apparatus to the promoter. However, if the bait is able to interact with
the prey, the association between the two hybrid proteins causes the AD fused to the
prey to be brought to the vicinity of the promoter and activate transcription by facilitating the assembly of the transcriptional apparatus. In other words, transcription will
only occur when the two proteins interact and the level of expression will be correlated
with the strength of the protein-protein interaction.
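The logic of the two-hybrid readout can be summarized in a small Python sketch. It assumes, purely for illustration, that the reporter level is proportional to the fraction of DNA-bound bait that is occupied by prey and that this fraction follows a simple one-site binding curve. The function name, the dissociation constant `kd` and the concentrations are hypothetical.

```python
def two_hybrid_reporter(prey_conc, kd, bait_on_dna=True, max_level=1.0):
    """Relative reporter expression in a two-hybrid assay (illustrative only).

    prey_conc and kd are in the same arbitrary units; a smaller kd means a
    stronger bait-prey interaction. Without DNA-bound bait there is no signal.
    """
    if not bait_on_dna:
        return 0.0
    fraction_bound = prey_conc / (prey_conc + kd)   # bait occupied by prey
    return max_level * fraction_bound


# A strong interaction (small kd) gives more reporter than a weak one.
print(two_hybrid_reporter(1.0, kd=0.1))
print(two_hybrid_reporter(1.0, kd=10.0))
# The prey alone, without bait on the DNA, does not activate the reporter.
print(two_hybrid_reporter(1.0, kd=0.1, bait_on_dna=False))   # 0.0
```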
The success of two-hybrid expression systems demonstrates an important principle in transcriptional regulation. The sequences of the activation and binding domains
from Gal4 correspond to only about 10% of the sequence of the native gal4 gene. In
other words, there is not anything particularly special about the full-length gal4 gene
that enables its protein product to act as a transcriptional activator. The activation of
transcription seems only to require that the activation domain is recruited to the vicinity
of the promoter. A remarkable example of this is a light-switchable expression system
developed by Shimizu-Sato et al. This engineered two-hybrid system relies on the ability of certain plant photoreceptors (phytochromes, Phy) to change reversibly from one
form, Pr, to another, Pfr, in response to light signals and on the ability of a second protein, PIF3, to associate only with the Phy(Pfr) form of the photoreceptor. Absorption of red
light by Phy(Pr) causes the protein to be converted into the Phy(Pfr) form and Phy(Pfr)
can be converted back to the Phy(Pr) form when it absorbs far-red light. Hence, the
strength of the interaction between Phy and PIF3 depends on the light signal.
Figure 2.8: (A) Light-switchable two-hybrid system. The activation domain carried on the PIF3-GAD hybrid protein is only recruited to the promoter region when the Phy-GBD hybrid protein
is activated by red light (the Pfr conformation). (B) Red light activates the expression of a
histidine auxotrophic marker and colonies can form on histidine selective plates when exposed
to red light.
Based on the light-dependent binding of PIF3 to Phy, Shimizu-Sato et al. constructed the light-switchable yeast two-hybrid expression system illustrated in Fig. 2.8A
by fusing the Gal4 BD to the Phy protein and the AD to the PIF3 protein. The Phy-BD hybrid protein binds to a UAS sequence and the interaction between the Pfr form
of Phy-BD and PIF3-AD is sufficient to activate the transcription of either a histidine
(HIS) auxotrophic marker or the lacZ reporter gene. Fig. 2.8B shows an example of
light-controlled expression of the auxotrophic marker. Cells were spread onto plates
lacking histidine. One plate was grown in darkness while the other was grown in red
light. Colonies were only observed on the latter, demonstrating that the exposure to red
light enables the expression of the auxotrophic marker.
The ability to regulate transcription in yeast is not associated with any specific
properties of the activation and DNA binding domains of Gal4. For example, a yeast
two-hybrid system that involves entirely prokaryotic components (the DNA binding domain from the LexA protein, its operator, and the activation domain from the B42 protein) is available commercially. Another example of a prokaryotic repressor/operator
system that has been exploited to control gene expression in yeast, and in higher eukaryotes (section 2.2.3), is the TetR/tetO discussed in section 2.2.1. Two TetR-based
“one-hybrid” yeast expression systems developed by Herrero and co-workers are illustrated in Fig. 2.9. These systems were originally constructed by Gossen et al. and are
today used widely to regulate gene expression in higher eukaryotes (see section 2.2.3).
In Fig. 2.9A, the TetR protein is fused to the activation domain of the protein VP16
from the Herpes simplex virus. In the absence of TetR inducers, the TetR-VP16 hybrid protein, or tetracycline controlled transactivator (tTA), can bind to tetO operator
sequences inserted at positions near a cytomegalovirus promoter (PCMV ) in
such a way that the VP16 activation domain can interact with, and recruit, the transcriptional apparatus. The strength of the interaction between the TetR binding domain and
the tetO binding site is reduced by the addition of tetracycline. Increased concentrations of the inducer decrease the rate of transcription as the probability that the VP16
AD is in the vicinity of the promoter is decreased.
Figure 2.9: (A) The tetracycline controlled TetR-VP16 transactivator (tTA) system developed
by Gossen and Bujard and adapted to yeast by Herrero et al. (B) Tetracycline controlled transcriptional silencing (tTS) system based on a TetR-Ssn6 hybrid protein.
The second TetR-based yeast expression system (Fig. 2.9B) makes use of a hybrid
protein composed of TetR and Ssn6 or Tup1. Contrasting the yeast one- and two-hybrid
systems, the TetR-Ssn6/Tup1 system relies on promoter silencing rather than promoter
activation. Recall from section 1.4.3 that Ssn6 and Tup1 are components of the machinery that is recruited to the Gal1 promoter by Mig1 to downregulate expression of the
gal1 gene in the presence of glucose. In the TetR-Ssn6 hybrid system, the interaction
between TetR and its tetO DNA binding site serves the same role in the regulation of
transcription as Mig1; it recruits transcriptional regulators that interact with the nucleosomes and/or the RNA polymerase holoenzyme. Similarly to the TetR-VP16 hybrid
protein, the TetR-Ssn6/Tup1 hybrid protein, or tetracycline controlled transcriptional
silencer (tTS), has a reduced affinity for the tetO binding sites in the presence of inducer. In other words, the rate of transcription is increased when inducer is added to
the growth medium.
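The opposite responses of the tTA and tTS systems to the same inducer can be illustrated with a minimal Python sketch. It assumes that the fraction of tetO sites occupied by the TetR hybrid protein decreases with inducer concentration according to a simple Hill-type curve, that expression in the tTA system is proportional to this occupancy, and that expression in the tTS system is proportional to the unoccupied fraction. The function names and the parameters `k_ind` and `hill` are hypothetical and not fitted to any data.

```python
def teto_occupancy(inducer, k_ind=1.0, hill=1.0):
    """Fraction of tetO sites bound by a TetR hybrid protein (illustrative).

    Occupancy is 1 without inducer and falls as the inducer concentration
    (same arbitrary units as k_ind) increases.
    """
    return 1.0 / (1.0 + (inducer / k_ind) ** hill)


def expression_tta(inducer, **kwargs):
    """TetR-VP16 (tTA): the bound hybrid activates, so expression follows occupancy."""
    return teto_occupancy(inducer, **kwargs)


def expression_tts(inducer, **kwargs):
    """TetR-Ssn6/Tup1 (tTS): the bound hybrid silences, so expression rises as it leaves."""
    return 1.0 - teto_occupancy(inducer, **kwargs)


for inducer in (0.0, 0.1, 1.0, 10.0):
    print(f"inducer {inducer:5.1f}: tTA {expression_tta(inducer):.2f}, "
          f"tTS {expression_tts(inducer):.2f}")
```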
It would appear that transcriptional regulation in eukaryotes is significantly more
complicated compared to the corresponding process in prokaryotes. Regulation in
prokaryotes is in many cases a one-step process; a transcriptional activator such as CAP
or AraC binds near the promoter and increases the rate of transcription by interacting
with the RNA polymerase holoenzyme. Transcriptional repressors may work simply by
reducing the accessibility of the promoter through steric hindrance. In all the examples
above, the regulation is indirect and mimics that of many genes in the galactose regulon
Figure 2.10: (A) Transcriptional repression of the yeast Gal1 promoter by steric hindrance by
the TetR repressor. In the absence of ATc, the TetR repressor binds to the DNA and prevents
polymerase binding to the promoter. (B) Correlation between expression level and induction
with galactose at 500 ng/ml ATc and with ATc at 2% galactose. The engineered TetR-switch
works nearly as well as the natural system.
(section 1.4.3). However, it is possible to engineer eukaryotic promoters where expression is attenuated in a manner that resembles the simple regulation found in prokaryotes. Such a system was constructed by Blake et al. by inserting tandem tetO operators
downstream of the TATA box in the Gal1 promoter (Fig. 2.10A). Transcription from
this promoter is activated by Gal4 when galactose is present in the growth medium and
the expression level can be attenuated by the addition of ATc. Figure 2.10B shows
the expression of a yeast-enhanced variant of the green fluorescent protein (yEGFP)
when the concentration of galactose is varied at full induction with ATc (500 ng/ml)
and when the concentration of ATc is varied at full induction with galactose (2% w/v).
Although the mechanism of repression is relatively simple compared to the switches
that are based on hybrid proteins, the TetR switch works remarkably well. There is a
low “basal” level of transcription in the absence of ATc and the dynamic range matches
that of the natural Gal1 promoter. In the next section it will be discussed how the LacR
repressor can, in a similar way, be used to regulate transcription in mammalian cells.
2.2.3 Mammalian Switches
An engineered expression system that is frequently used to regulate transcription in eukaryotic cells is the TetON/TetOFF system originally developed by Gossen and Bujard
and co-workers. The TetOFF system for mammalian transcription regulation works in
the same way as the system that was adapted to regulate transcription in yeast; a fusion
protein composed of the bacterial TetR repressor and the VP16 activation domain, the
tetracycline controlled transactivator tTA, is capable of activating transcription from a
Figure 2.11: (A) The reverse tetracycline controlled transactivator (rtTA) composed of a mutated
TetR protein and the VP16 transcriptional activator binds to tetO operators in the presence
of inducer. (B) Dual system in which a tetracycline controlled transcriptional silencer (tTS)
prevents transcription in the absence of inducer and rtTA activates transcription in the presence
of inducer.
promoter that contains multiple tetO binding domains in the absence of tetracycline.
The TetON system uses a mutant variant of the hybrid protein, the reverse tetracycline
controlled transactivator (rtTA), in which four altered amino acids in the TetR component cause the interaction with the tetO binding sites to require the presence of inducer.
Adaptations of the rtTA systems are also available for yeast. The yeast TetR-Ssn6
hybrid protein in Fig. 2.9 is an analogue of a tetracycline controlled transcriptional silencer (tTS) first developed for mammalian cells. This hybrid protein contains a fusion
of TetR and the transcriptional silencing domain (SD) from the Kid-1 protein and can
be used to attenuate transcription in the absence of tetracycline. Improved switching
properties can be achieved by having tTS and rtTA present at the same time (Fig. 2.11B).
In the same manner that the TetR repressor can be used directly to control the expression from yeast promoters, the LacR repressor can be used to control gene expression in higher eukaryotes by the insertion of lacO operator sequences at appropriate
locations near a promoter. There are many examples of this reported in the literature.
These switches operate in a manner that is similar to the regulation of the lactose operon
in E. coli and the PLlacO−1 system; the lacI gene is expressed at a constant rate giving
rise to a high level of tetrameric LacR repressor proteins that can interfere with transcription when bound to lacO operators within or near a promoter. A particularly remarkable example is the LacR/IPTG mediated control of pigmentation in the mouse reported by Cronin et al. In this system (Fig. 2.12A), three lacO DNA binding sites were
inserted into the promoter region of the gene containing the coding (cDNA) sequence
for the enzyme tyrosinase. Tyrosinase catalyzes the first step in melanin biosynthesis
and its deletion or downregulation gives rise to an albino phenotype. Figure 2.12B
shows an example of the phenotypic alterations that can be controlled by this simple
gene regulatory system. The mouse embryo and nursing pup are fed IPTG (through the
mother’s milk) and the expression of tyrosinase causes the infant to be pigmented. When
the administration of IPTG to the infant is discontinued, LacR represses the expression
of the enzyme, causing the adult mouse to display an albino phenotype.
Figure 2.12: Regulation of transcription with LacR/IPTG in the mouse. (A) Three lacO sequences were inserted within a promoter that controls the expression of the sequence that encodes
the enzyme tyrosinase involved in pigmentation. Transcription is low when LacR is bound to the
operators. (B) The effect of IPTG on the expression of tyrosinase and on pigmentation. Removal of
IPTG causes downregulation of expression and the pigmented infant becomes an adult albino.
Modified without permission from Cronin et al. © Cold Spring Harbor Laboratory Press.
2.3 Engineering Regulatory Circuits
The regulatory systems described in the previous section are composed of a single regulated promoter element, but are in fact small networks; the regulated promoter receives
inputs from a biochemical reaction network composed of transcription factors and their
inducers. The function of these networks is relatively simple. The inducers modulate
transcription by up- or down-regulating the activity of the transcription factor proteins.
Because of this simplicity, it seems appropriate to classify these systems as input/output
modules or regulatory motifs rather than networks. Input/output modules and regulatory motifs can be used as the foundation of more elaborate circuits allowing for more
sophisticated control of gene expression. Several such systems have been developed,
but only two, the bacterial toggle switch and the ring oscillator, will be discussed here.
They serve as a demonstration that complex behavior and functionality can be achieved
by combining simple elements.
Figure 2.13: (A) The toggle switch plasmid. R1 and P1 are either TetR and PLtetO−1 (pIKE
toggle) or λ CI and PLs1con (pTAK toggle). RBS1, rbs E and rbs B refer to different ribosome
binding sites and T1T2 to transcriptional terminators. The reporter is the GFPmut3 protein. (B)
Bistability in the pTAK117 toggle. pTAK102 is an IPTG-inducible switch obtained by deletion
of the cI gene. (C) Population distributions obtained by flow cytometry at different levels of
induction in (B). Modified from Gardner et al. without permission. © Nature Publishing Group
and Annual Reviews.
The toggle switch was constructed by Gardner et al. by combining two repressible
promoter motifs in such a way that expression from one promoter prevents the expression of the other (Fig. 2.13). It was implemented on high copy number plasmids (ColE1
origins of replication) in two versions (Fig. 2.13A) using either the LacR and the TetR
repressors (designated pIKE) or the LacR and the λ CI repressors (designated pTAK).
The tetR or the cI gene was expressed from a LacR repressible promoter (Ptrc−2 ) and
the lacI gene was expressed from the PLtetO−1 promoter (pIKE toggle) or from a modified
PL promoter (PLs1con ) that is repressed by λ CI (pTAK toggle).
The cI gene used in the pTAK system was a mutant version termed cI857, which
produces a λ repressor protein that is inactivated at elevated temperatures. Transient
pulses of IPTG or ATc (pIKE system), or of IPTG or high temperature (pTAK system),
cause robust switching between states. Figure 2.13B illustrates induction with IPTG
of one variant of the toggle, pTAK117. The experiment was started in the LacR (low
fluorescence) and λ CI (high fluorescence) states by growing the cells at elevated temperature and in the presence of IPTG, respectively. These states are stable when IPTG
is removed or the temperature is lowered. When IPTG is added to cells that express
LacR at high levels (Fig. 2.13B), the fluorescence increases sharply at a critical
point corresponding to a saddle-node bifurcation. This sharp transition contrasts with the
smooth induction curve obtained from an IPTG-inducible switch (pTAK102) in which
the λ CI component is eliminated. Figure 2.13C shows population distributions obtained by flow cytometric measurements of single cell fluorescence just below, at and
just above the critical IPTG concentration.
Figure 2.14: (A) The pZ vectors used by Elowitz and Leibler to construct the ring oscillator.
(B) Example of oscillations in fluorescence in a single cell measured by microscopy. Modified
without permission. © Nature Publishing Group.
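The coexistence of two stable expression states in the toggle switch can be illustrated with a minimal numerical sketch. It assumes the dimensionless two-repressor model that is commonly used to describe such a switch, du/dt = α1/(1 + v^β) − u and dv/dt = α2/(1 + u^γ) − v, where u and v denote the two repressor concentrations. The model form, the function names and all parameter values (α1 = α2 = 10, β = γ = 2) are illustrative assumptions and are not fitted to the measurements described above.

```python
def toggle_rhs(u, v, alpha1=10.0, alpha2=10.0, beta=2.0, gamma=2.0):
    """Right-hand side of the assumed dimensionless toggle model.

    u is repressed by v and v is repressed by u; both decay at unit rate.
    All parameter values are hypothetical.
    """
    du = alpha1 / (1.0 + v ** beta) - u
    dv = alpha2 / (1.0 + u ** gamma) - v
    return du, dv


def relax(u, v, dt=0.01, steps=20000):
    """Integrate with forward Euler steps until the state has settled."""
    for _ in range(steps):
        du, dv = toggle_rhs(u, v)
        u, v = u + dt * du, v + dt * dv
    return u, v


# The same network settles into different states depending on its history.
state_a = relax(5.0, 0.1)   # start with repressor u in excess
state_b = relax(0.1, 5.0)   # start with repressor v in excess
print("state A: u = %.3f, v = %.3f" % state_a)
print("state B: u = %.3f, v = %.3f" % state_b)
```

With these assumed parameters the two initial conditions relax to mirror-image states, one with u high and v low and one with v high and u low, which is the qualitative signature of bistability.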
The ring oscillator was constructed by Elowitz and Leibler by insertion of three
repressible motifs on a pZ expression vector. In this system, the TetR repressor is
expressed from the PLlacO−1 promoter, the Lac repressor from PR and the λ repressor from the PLtetO−1 promoter, thus constituting a closed ring with negative feedback to the previous module as illustrated in Fig. 2.14A. To obtain a shorter oscillation period, the oscillator was constructed using variants of repressor proteins (denoted
“lite”). These proteins are “tagged” with an amino acid sequence that is recognized by a protein-degradation pathway and are constructed by fusing the sequence for the repressor proteins with the DNA sequence that encodes the recognition tag. The state of the network
was monitored by co-transformation of the low copy number pZ vector carrying the
oscillator (pSC101 origin, ampicillin resistance) and a high copy number pZ vector
(ColE1 origin, kanamycin resistance) on which the PLtetO−1 promoter regulates the
expression of a gene, gfp-aav, that encodes a short-lived variant of the GFP protein.
In agreement with model predictions (see section 3.5), cells that carry this engineered network are capable of sinusoidal oscillation with a period of approximately
2.5 hours (Fig. 2.14B). The ring oscillator construct behaves somewhat erratically:
oscillations occur without phase coherence, with a period of 160±40 minutes, and in only 40%
of the cells. The reasons for this are not well understood, but the lack of phase coherence could in part be due to fluctuations in the low number of molecules per cell. A
more robust relaxation oscillator has been constructed by Atkinson et al. by augmenting a chromosomally integrated natural, positive feedback system with a negative LacR
feedback. This system shows damped, but coherent, oscillations.
2.4 How Transcriptional Regulation Works
It should be clear from the previous sections that the regulation of transcription in many
cases is based on relatively simple interactions between transcription factor proteins and
their corresponding DNA binding sites. In prokaryotes, transcriptional repressors may
affect transcription simply by being present near or within the promoter where they
compete with the RNA polymerase for promoter access. As demonstrated by the engineered prokaryotic and eukaryotic promoters where repressor binding sites are appropriately inserted, more elaborate mechanisms, such as long-range interactions mediated
through DNA looping, are not strictly required, but may enhance the performance of
the switch.
Prokaryotic transcriptional activators like CAP and AraC appear to increase the
probability that the RNA polymerase binds to the promoter by interacting directly with
components of the holoenzyme. Since the context of transcriptional activation can be
changed simply by substituting activator binding sites, it seems that there is nothing sophisticated about the mechanism by which these activators work. Similarly, the success
of yeast one- and two-hybrid systems is a testament to the relative simplicity
of eukaryotic transcription regulation. All that is required for activation domains or
silencing domains to work is the appropriate positioning of these elements in the vicinity of a promoter. This positioning is in turn determined by the interactions between
protein domains or between DNA and protein DNA binding domains. It would thus
appear that simple binding reactions are at the core of many transcriptional regulatory
mechanisms. In the next section, we will discuss how the laws of chemistry that govern
such interactions can be used to transform qualitative models of molecular interactions
into quantitative, systems level models.
Suggested Further Reading
Textbooks:
Burke D. et al. Methods in yeast genetics. Cold Spring Harbor Laboratory Press,
(2000).
Nicholl, D. S. T. An introduction to genetic engineering. Cambridge University
Press (1994).
Sambrook & Russell. Molecular cloning. A laboratory manual. Cold Spring
Harbor Laboratory Press, 3rd Edition (2001).
Articles:
Belli G, Gari E, Piedrafita L, Aldea M & Herrero E. An activator/repressor dual
system allows tight tetracycline-regulated gene expression in budding yeast. Nucleic Acids Res. 26, 942-7 (1998).
Blake W. J., Kærn M., Cantor C. R. & Collins J. J. Noise in eukaryotic gene
expression. Nature 422, 633-7 (2003).
Chien C. T., Bartel P. L., Sternglanz R. & Fields S. The two-hybrid system: a
method to identify and clone genes for proteins that interact with a protein of
interest. Proc Natl Acad Sci U. S. A. 88, 9578-82 (1991).
Cronin C. A., Gluba W & Scrable H. The lac operator-repressor system is functional in the mouse. Genes Dev. 15, 1506-17 (2001).
Elowitz M. B. & Leibler S. A synthetic oscillatory network of transcriptional
regulators. Nature 403, 335-8 (2000).
Gardner TS, Cantor CR, Collins JJ. Construction of a genetic toggle switch in
Escherichia coli. Nature 403, 339-42 (2000).
Gari E, Piedrafita L, Aldea M. & Herrero E. A set of vectors with a tetracycline-regulatable promoter system for modulated gene expression in Saccharomyces
cerevisiae. Yeast 13, 837-48 (1997).
Gossen M, Freundlieb S, Bender G, Muller G, Hillen W, & Bujard H. Transcriptional activation by tetracyclines in mammalian cells. Science 268, 1766-9
(1995).
Gossen M & Bujard H. Tight control of gene expression in mammalian cells by
tetracycline-responsive promoters. Proc Natl. Acad. Sci. U.S.A. 89, 5547-51
(1992).
Lutz R. & Bujard H. Independent and tight regulation of transcriptional units in
Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements.
Nucleic Acids Res. 25, 1203-10 (1997).
Shimizu-Sato S., Huq E., Tepperman J. M. & Quail P. H. A light-switchable gene
promoter system. Nat. Biotechnol. 20, 1041-4 (2002).
Sikorski R. S. & Hieter P. A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics
122, 19-27 (1989).
Tutorial Part 3
Modeling Small Gene Networks
From the examples discussed in the previous parts of the tutorial it should be clear that
the interaction between cis and trans-regulatory elements is of utmost importance in
determining when and how strongly a particular gene is expressed. Since this mode of
regulation is one of the primary control mechanisms available to the cell, it is essential
to understand the fundamental principles that correlate the frequency of transcription
with cis-regulatory dynamics. In this part of the tutorial, we discuss how quantitative models of gene expression can be obtained by combining principles of chemical
reaction kinetics with qualitative knowledge of the molecular mechanisms underlying
transcriptional regulation.
Biological systems are generally described on one of three levels. At the level
of single molecules, the attractive and repulsive forces between individual atoms are
modeled explicitly and the changes in their relative positions are simulated on very short
time scales, typically on the order of femto- to nanoseconds ($10^{-15}$ to $10^{-9}$ s). At the
level of individual cells, the time-averaged properties of $10^{0}$ to $10^{4}$ molecules are used
to model individual reaction events at the microscopic level. This is typically done
in terms of stochastic birth-death processes in which molecules of a specific type are
created or destroyed at random. In chemical systems, such descriptions are usually
appropriate to model processes on the order of nano- to milliseconds ($10^{-9}$ to $10^{-3}$ s). At
the highest level of description, macroscopic behavior is modeled using deterministic
equations. This is often the most appropriate level of description of chemical systems
on the order of milli- to kiloseconds ($10^{-3}$ to $10^{3}$ s) and beyond.
Both stochastic and deterministic modeling serve as useful tools for the analysis
of cellular system behavior. The choice of one over the other depends on factors such
as the number of molecules involved, the time scale of the process of interest, and on
the degree of spatial mixing on that time scale. A deterministic model will typically
not be appropriate if the system contains on the order of $10^{0}$ to $10^{4}$ molecules of a
particular type, as is the case for most living cells. There, the probabilistic nature of
individual reaction events and the deviations from the average may significantly alter or
even dominate the system’s behavior. As a result, the dynamics of a single cell are most
appropriately captured using microscopic, stochastic models that describe the temporal
evolution of biochemical networks of interacting molecules. This, however, does not
mean that macroscopic, deterministic models are irrelevant for the modeling of gene
regulatory networks and other cellular systems. If the number of molecules in a single
cell is described by a probability distribution with average $\langle n \rangle$ and variance $\sigma_n^2$, the
central limit theorem tells us that the average over a population of N cells has an expected value of
$\langle n \rangle$ molecules per cell and a variance that is given by $\sigma_n^2/N$. If the population is large
enough, the variance of the population average becomes negligibly small and the dynamics of the
population average will reflect the behavior of the majority of cells in the population.
As a result, a macroscopic model will in many cases be an adequate description of the
most probable behavior of an “average” cell and of the average behavior of a single
cell over a long time period. There are, however, a number of situations in which this correspondence can break down and that need to be kept
in mind. These include, but are not limited to, (1) noise-induced transitions in which
basin boundaries are crossed as a result of random fluctuations, (2) noise-induced shifts
of critical points and (3) noise-induced bifurcations in which new attractors emerge
solely as a consequence of the fluctuations. Unfortunately, time limitations prohibit a
detailed discussion of these fascinating topics.
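The scaling of the variance of a population average with the number of cells N can be checked with a short numerical experiment. The sketch below is only an illustration under stated assumptions: the single-cell copy number is drawn from an arbitrary uniform distribution on 0 to 20 molecules (chosen for convenience, not for biological realism), and the function name and sample sizes are hypothetical.

```python
import random
import statistics


def population_mean_variance(n_cells, n_populations=2000, seed=1):
    """Variance of the population-averaged copy number across many populations.

    Each cell holds a copy number drawn uniformly from 0..20 (an arbitrary,
    assumed single-cell distribution).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_populations):
        counts = [rng.randint(0, 20) for _ in range(n_cells)]
        means.append(statistics.fmean(counts))
    return statistics.pvariance(means)


# Variance of a discrete uniform distribution on 0..20: (21**2 - 1) / 12
single_cell_variance = (21 ** 2 - 1) / 12.0

for n_cells in (1, 10, 100):
    observed = population_mean_variance(n_cells)
    predicted = single_cell_variance / n_cells
    print("N = %3d  observed = %.4f  predicted = %.4f" % (n_cells, observed, predicted))
```

The observed variance of the population average should track the prediction $\sigma_n^2/N$ to within sampling error, whereas the variance among individual cells stays at $\sigma_n^2$ regardless of N.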
The modeling of chemical and biochemical reactions at the microscopic and the
macroscopic levels usually involves a description of a process of interest in terms
of elementary reactions. Elementary reactions describe individual reaction events at
the molecular level and it is not unusual that hundreds or even thousands of different
molecules are involved in a given process. To facilitate analysis and interpretation,
the dimensionality of large scale models can be reduced by estimating which reactions
are of marginal importance and by applying systematic simplification schemes, such as the
quasi-steady state approximation. It is often the case that large networks of interacting chemical species can be broken down to smaller sets of subsystems that can be
described by response functions, or transfer functions in engineering terms, that reflect
how a subsystem changes its “outputs” as its “inputs” are varied. This part of the tutorial will focus on the estimation of response functions by applying the laws of chemistry
to qualitative molecular-level descriptions.
3.1 Biochemical Reaction Kinetics
Figure 3.1: Binding of repressor (LacR) to the operator O1 occurs in three steps; (1) dimerization of LacR monomers, (2) dimerization of LacR dimers, and (3) binding of the LacR tetramer
to O1 to form the LacR-operator complex.
The first step in the formulation of a quantitative model of gene regulation is to construct
a qualitative diagram that shows the molecular interactions that are known to occur in
the system of interest. In the best case scenario, all of the individual steps are known
and understood in detail at the molecular level. For example, consider the binding of
the LacR tetramer to the main operator O1 located downstream of the Plac promoter
(see section 1.4.1). This reaction scheme can be described by the qualitative model
illustrated in Fig. 3.1. The model contains three reversible reaction steps (numbered
1, 2 and 3) for a total of six chemical reactions; (1) two LacR monomers combine to
form a LacR dimer and a LacR dimer falls apart to form two LacR monomers, (2) two
LacR dimers combine to form a LacR tetramer and a LacR tetramer falls apart to form
two LacR dimers, (3) a LacR tetramer binds to the lacO operator sequence to form a
LacR-operator complex and the LacR-operator complex falls apart to form a free LacR
tetramer and the unoccupied O1 operator.
Chemical kinetics provides a theoretical foundation that can be used to transform
a qualitative “cartoon” model of molecular interactions into a quantitative description.
In the context of gene regulatory systems, a particularly important application is to
estimate how changes in one or more “input” signals, e.g., the abundance of LacR
monomers, change an “output” signal, e.g., the average fraction of lacO operators that
are occupied. Deriving the response function associated with a given biochemical input/output
system usually involves a series of hierarchical steps; (1) the qualitative model is broken
down into independent elementary reaction steps (or their equivalent), (2) the rate
of each reaction is estimated by applying the law of mass action, (3) the individual
reactions are assumed to have reached a quasi-steady state and (4) constraints such as
mass conservation are then used to calculate the output of the system. These steps are
ubiquitous in the modeling of biochemical reaction systems and are employed both at
the macroscopic and the microscopic levels of description.
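The four steps above can be sketched for the LacR/O1 scheme of Fig. 3.1. The sketch assumes that each reversible step has reached quasi-steady state, so that it can be summarized by an association constant (K1, K2 and K3, each the ratio of the forward to the backward rate constant of steps 1 to 3), and that the total number of operators is conserved. The function name and the numerical values of K1, K2 and K3 are hypothetical placeholders, not measured parameters, and the free monomer concentration is treated as the input, which ignores depletion of repressor by binding.

```python
def operator_occupancy(monomer, K1=1.0e8, K2=1.0e8, K3=1.0e10):
    """Fraction of O1 operators bound by LacR tetramer at quasi-steady state.

    monomer : free LacR monomer concentration (M), the "input"
    K1, K2, K3 : assumed association constants (1/M) for dimerization,
                 tetramerization and operator binding (hypothetical values)
    """
    dimer = K1 * monomer ** 2        # step 1: 2 LacR <-> (LacR)2
    tetramer = K2 * dimer ** 2       # step 2: 2 (LacR)2 <-> (LacR)4
    # Step 3 plus operator conservation, O_total = O_free + O_bound:
    # O_bound / O_total = K3*[T] / (1 + K3*[T])
    return K3 * tetramer / (1.0 + K3 * tetramer)


for monomer in (1.0e-9, 3.0e-9, 1.0e-8):
    print("%.1e M monomer -> occupancy %.3f" % (monomer, operator_occupancy(monomer)))
```

Because the tetramer concentration scales with the fourth power of the monomer concentration, the resulting response function is steeply sigmoidal in its input, even though every individual step is a simple binding reaction.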
3.1.1 Elementary Reactions
A general description of the reaction between the reactants A, B, . . . and their conversion into products P, Q, . . . is given by:
$$aA + bB + \dots \xrightarrow{k_f} pP + qQ + \dots \tag{3.1}$$
where the coefficients a, b, p and q are called the stoichiometric coefficients and the parameter kf is called the rate constant. The stoichiometric coefficients relate the number
of reactant molecules consumed to the number of product molecules generated in a single reaction. The stoichiometric coefficients are usually chosen such that the total mass
is conserved and the number of atoms contained in the reactants is the same as the total
number of atoms contained in the products. For example, the six reactions involved in
the binding of LacR to O1 in Fig. 3.1 can be described by the reaction equations given
by:
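$$
\begin{aligned}
2\,\mathrm{LacR} &\xrightarrow{k_{1a}} (\mathrm{LacR})_2, & (\mathrm{LacR})_2 &\xrightarrow{k_{1b}} 2\,\mathrm{LacR}, \\
2\,(\mathrm{LacR})_2 &\xrightarrow{k_{2a}} (\mathrm{LacR})_4, & (\mathrm{LacR})_4 &\xrightarrow{k_{2b}} 2\,(\mathrm{LacR})_2, \\
(\mathrm{LacR})_4 + O_1 &\xrightarrow{k_{3a}} \{O_1(\mathrm{LacR})_4\}, & \{O_1(\mathrm{LacR})_4\} &\xrightarrow{k_{3b}} (\mathrm{LacR})_4 + O_1
\end{aligned}
\tag{3.2}
$$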
The general reaction in Eq. 3.1 is said to be of order a and b with respect to the
reactants A and B, respectively. The overall reaction order n is the sum of the reaction orders for all the reactants and equals n = a + b when A and B are the only
reactants. In the reaction scheme in Eq. 3.2, the dimerization reaction (reaction 1a),
the tetramerization reaction (reaction 2a) and binding of the LacR tetramer to O1 (reaction 3a) are
second order reactions. The dissociation reactions of the LacR dimer (reaction 1b), the
LacR tetramer (reaction 2b) and of the repressor-operator complex (reaction 3b) are
first order reactions because they involve only a single reactant.
The reactant molecules perform a thermally driven random walk in the intracellular
environment and must collide with each other to have any chance of being converted
into the reaction products. Equation 3.1 describes an elementary reaction if it corresponds
to a reaction that takes place as a result of a single molecular encounter in which a total
of n reactant molecules come together at the same instant in time. The probability
that n different molecules in an nth order elementary reaction will find each other at
the same instant in time becomes vanishingly small as n increases. A high overall reaction
order is therefore a good indicator that the process is not an elementary reaction, but an
overall reaction that involves a sequence of elementary reactions and intermediates.
3.1.2 Law of Mass Action
For an elementary reaction, the frequency of encounters between reactants generally
depends on the number of reactant molecules per unit volume and on their average
velocity. The probability that an encounter will occur is thus proportional to the concentration of the reactant molecules. The law of mass action reflects this basic principle
and states that the rate, $v_f$, of the general reaction in Eq. 3.1 is given by:
$$v_f = k_f [A]^a [B]^b \cdots \tag{3.3}$$
where square brackets indicate concentrations. The differential equations that describe
how the concentrations change in time from some arbitrary initial concentrations are
given by:
$$\frac{d[A]}{dt} = -a\,k_f [A]^a [B]^b \cdots, \qquad \frac{d[P]}{dt} = +p\,k_f [A]^a [B]^b \cdots, \qquad \text{etc.} \tag{3.4}$$
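As an illustration of how Eqs. 3.3 and 3.4 are used in practice, the sketch below integrates the mass action equations for the reversible dimerization of LacR (reactions 1a and 1b in Eq. 3.2). The rate constants, the initial concentrations and the simple forward Euler scheme are assumptions chosen for convenience; concentrations are expressed in nM to keep the numbers readable.

```python
def dimerization_rhs(m, d, k1a, k1b):
    """Mass action rates for 2 LacR <-> (LacR)2 with monomer m and dimer d."""
    v_forward = k1a * m ** 2     # second order: two monomers meet
    v_backward = k1b * d         # first order: one dimer falls apart
    dm = -2.0 * v_forward + 2.0 * v_backward   # stoichiometric coefficient 2
    dd = v_forward - v_backward
    return dm, dd


def integrate(m0, d0, k1a, k1b, dt=0.001, steps=100000):
    """Forward Euler integration; a small fixed step is assumed to suffice."""
    m, d = m0, d0
    for _ in range(steps):
        dm, dd = dimerization_rhs(m, d, k1a, k1b)
        m, d = m + dt * dm, d + dt * dd
    return m, d


# Hypothetical parameters: k1a in 1/(nM s), k1b in 1/s, concentrations in nM.
m_end, d_end = integrate(m0=100.0, d0=0.0, k1a=0.01, k1b=0.1)
print("monomer = %.3f nM, dimer = %.3f nM" % (m_end, d_end))
print("total monomer units = %.3f nM" % (m_end + 2.0 * d_end))
```

At long times the two rates balance, so the ratio [D]/[M]² approaches k1a/k1b, while the total number of monomer units, [M] + 2[D], stays at its initial value throughout.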
In chemistry, concentrations are usually reported in molar, symbolized by M, with one
molar corresponding to one mole (a quantity of $6.02 \times 10^{23}$ molecules) per liter. The
reaction rate is usually reported in terms of concentration change per time unit, e.g.,
M/s. The units of the rate constant $k_f$ are therefore $\mathrm{M}^{1-n}/\mathrm{s}$, where n is the reaction
order. The concentrations of chemical species within living cells are typically in the
range of 0.1 nM to 1 µM ($10^{-10}$ M to $10^{-6}$ M).
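To relate these concentrations to the number of molecules in a single cell, the conversion can be sketched as follows. The cell volume of 1 femtoliter is an assumed, order-of-magnitude figure for a bacterial cell and is not taken from the text; the function names are likewise hypothetical.

```python
AVOGADRO = 6.022e23  # molecules per mole


def molecules_per_cell(conc_molar, cell_volume_liters=1.0e-15):
    """Average number of molecules at a given molar concentration.

    The default volume of 1 fL is an assumed, rough bacterial cell volume.
    """
    return conc_molar * AVOGADRO * cell_volume_liters


def molar_concentration(n_molecules, cell_volume_liters=1.0e-15):
    """Inverse conversion: molecule count to molar concentration."""
    return n_molecules / (AVOGADRO * cell_volume_liters)


for conc in (1.0e-10, 1.0e-9, 1.0e-6):
    print("%.0e M -> %.2f molecules per cell" % (conc, molecules_per_cell(conc)))
```

Under this volume assumption, 1 nM corresponds to less than one molecule per cell on average and 1 µM to several hundred, which is why fluctuations in molecule numbers cannot in general be ignored.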
While an encounter between the reactant molecules is a prerequisite for their conversion into products, not all of the encounters will result in the completion of the
reaction. Only a fraction of the molecular encounters will occur with the required
relative orientation of the reactants and only a fraction of these will be able to complete
the reaction. These effects are incorporated into a proportionality factor, the rate constant. In the classical model of chemical reaction kinetics due to Arrhenius, the value
of the rate constant is given by:
$$k = A \exp\left(\frac{-E_a}{RT}\right), \tag{3.5}$$
where $E_a$ is the activation energy, T is the absolute temperature (in kelvin), R is the
gas constant, R = 8.3144 J mol$^{-1}$ K$^{-1}$ (or 1.9872 cal mol$^{-1}$ K$^{-1}$),
and A is a constant. The Arrhenius model can be conceptualized as the motion along
a reaction path ξ where each point along the path is associated with a different energy
(Fig. 3.2A). The correlation between energy and ξ defines an energy potential E(ξ),
which, for simple reactions, has two minima and a single maximum. The minima are
located at ξ = 0 and ξ = 1 and correspond to the energy of the reactants, $E_R$, and
the energy of the products, $E_P$, respectively. The change in energy as ξ changes from
zero to one is $\Delta E = E_P - E_R$. The maximum is located somewhere in between and
corresponds to an energy barrier, $E^{\ddagger} = E_R + E_a$. When the reactants encounter each
other, they must have sufficient (kinetic) energy to overcome the energy barrier in order
for the reaction to be completed. The fraction of encounters that have the appropriate
energy is given by a Boltzmann distribution and depends on the difference between the
“ground state” energy $E_R$ and the energy barrier $E^{\ddagger}$. In addition to having sufficient
Figure 3.2: (A) Changes in internal energy E along the extent of reaction ξ. The energy of
the reactants has to exceed E ‡ in order to cross the activation energy barrier Ea . The internal
energy E is replaced by Gibbs free energy G in transition state theory. (B) A simple bimolecular
substitution reaction between the reactants A and BC, illustrating the formation of an energy-rich intermediate ABC prior to the formation of the products AB and C.
excess energy, the molecules must encounter each other with the appropriate relative
angles. This steric factor is taken into consideration by the pre-exponential factor A.
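The temperature dependence implied by Eq. 3.5 can be made concrete with a small calculation. The sketch below evaluates the Arrhenius expression at two temperatures; the pre-exponential factor and the activation energy of 50 kJ/mol are hypothetical values chosen only for illustration, and the pre-exponential factor is named `prefactor` in the code to avoid confusion with the reactant A.

```python
import math

GAS_CONSTANT = 8.3144  # J mol^-1 K^-1, the value quoted in the text


def arrhenius_rate_constant(prefactor, activation_energy, temperature):
    """Rate constant k = A * exp(-Ea / (R * T)).

    prefactor         : pre-exponential factor A (units of the rate constant)
    activation_energy : Ea in J/mol
    temperature       : absolute temperature in kelvin
    """
    return prefactor * math.exp(-activation_energy / (GAS_CONSTANT * temperature))


# Hypothetical reaction: A = 1e10, Ea = 50 kJ/mol, compared at 25 and 35 degrees C.
k_25 = arrhenius_rate_constant(1.0e10, 50.0e3, 298.15)
k_35 = arrhenius_rate_constant(1.0e10, 50.0e3, 308.15)
print("k(35 C) / k(25 C) = %.2f" % (k_35 / k_25))
```

For this assumed activation energy, a 10 degree increase in temperature roughly doubles the rate constant. The ratio does not depend on the pre-exponential factor, which cancels.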
Internal energy is in many cases not the most adequate measure of energy in biological processes. An alternative measure is known as the Gibbs free energy, G, which
is defined by
$$G = E + PV - TS = H - TS, \tag{3.6}$$
where E is the internal energy, P is the pressure, V is the volume, T is the absolute
temperature, S is the entropy and H is the enthalpy, H = E + P V . The transition state
theory of chemical kinetics is an extension of the Arrhenius theory where the internal
energy is replaced by Gibbs free energy, i.e., the activation energy Ea is replaced by the
Gibbs free energy ∆Ga required to reach the transition state with maximal energy G‡
and the change in internal energy ∆E is replaced by the change in Gibbs free energy
∆G. The values of ER and EP can be replaced by total Gibbs free energies GR and
GP (or equivalent measures relative to a standard state). The physical concept behind
transition state theory is the same as that of the Arrhenius theory and the two coincide
when the pressure, the volume, the temperature and the entropy remain unchanged. In
most biological systems, the temperature and the volume are unaffected by the completion of a given reaction. A simple illustrative example, a so-called SN2 reaction, is
given in Fig. 3.2B.
Most chemical and biochemical reaction mechanisms are far more complicated than
the single-step SN2 reaction and are associated with multi-dimensional potential surfaces with many peaks and valleys. The example nevertheless illuminates three general
principles; (1) that reactions require excess energy to rearrange molecular bonds and
atoms, (2) that the rate constant is correlated with the amount of excess energy that is
required to reach the transition state and (3) that the reaction rate can be modulated by imposing or by removing factors that exert a steric hindrance on the reacting molecules.
Examples of (3) have already been discussed in Part I. The LacR repressor (and many
other bacterial transcriptional repressors) may act to prevent transcription by preventing the RNA polymerase holoenzyme from interacting with the promoter. Similarly,
histone modification and nucleosome remodeling may greatly alter the accessibility to
the regulatory region of eukaryotic promoters.
3.1.3 Generalized Mass Action
The derivation of the reaction rate vf in Eq. 3.3 is based on an idealized model of
how molecules interact in very dilute solution. It has been observed that reaction rates
measured experimentally deviate from those predicted by applying the law of mass
action. This could, of course, reflect a limited understanding of the molecular details
of the reaction, but could also be due to the fundamental assumption that the reaction
takes place at very low concentrations. It has been established experimentally that
the effective concentration under numerous circumstances can be significantly different
from the absolute concentration. This is particularly important for living cells because
the interior of the cell contains a highly concentrated and inhomogeneous soup with
few of the characteristics of very dilute aqueous solutions. This is one of the factors
that makes it extremely dangerous to apply qualitative and quantitative measurements
obtained from biochemical in vitro experiments to living systems. At best, a parameter
value measured in vitro can be within an order of magnitude of its value in vivo. At
worst, molecular interactions observed in vitro may not occur in vivo (or vice versa)
and a qualitative model that is based on a quantitative model of the former may lead to a
misinterpretation of how a regulatory mechanism operates within the living cell.
In many cases, the deviation from the idealized behavior can be accounted for by
introducing the concept of activity. In essence, the activity of a chemical species is a
measure of its effective concentration. In the simplest case, the activity $\tilde{a}_A$ of a chemical
species A is given by $\tilde{a}_A = \gamma_A [A]$, where $\gamma_A$ is called the activity coefficient. Since
the activity coefficients with this assumption can be absorbed into the rate constant, the
rate equation in Eq. 3.3 remains valid. In the simplest nonlinear case, the activity and
the absolute concentration are related by a power law, $\tilde{a}_A = \gamma_A [A]^{\zeta_A}$. Assuming that
the activity coefficients do not change over the range of concentrations in question, the
general rate of reaction can be written as:
\[
v_f = k_f \tilde{a}_A^{a} \tilde{a}_B^{b} \cdots
    = k_f \gamma_A^{a} \gamma_B^{b} \cdots [A]^{a\zeta_A} [B]^{b\zeta_B} \cdots
    = k_f' [A]^{\alpha} [B]^{\beta} \cdots
\tag{3.7}
\]
where the activity coefficients are incorporated into the effective rate constant $k_f'$ and
the exponents $\alpha$ and $\beta$ are given by $\alpha = a\zeta_A$ and $\beta = b\zeta_B$.
The rate equation in Eq. 3.7 has the form of a generalized mass action (GMA)
description of the reaction. As the name implies, this representation is an extension
of the mass action kinetics in which the stoichiometric coefficients (positive integers)
have been replaced by exponents that are positive reals. GMA can provide an adequate
description of reaction rates under circumstances where mass action kinetics cannot
be applied. This includes reactions that take place under conditions of restricted dimensionality, for instance, when molecules are bound to the DNA or embedded in a
membrane, as well as in the complex intracellular environment of living cells. There
is at present no general theory that can predict the relationship between the exponents in the rate equation and the stoichiometric coefficients in a given environment.
However, using the stoichiometric coefficients as exponents in the rate equations is usually a good place
to start. An alternative and more mathematically tractable method for modeling genetic
systems has been discussed by Hlavacek and Savageau.
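As a rough illustration of Eq. 3.7, the following Python sketch evaluates a GMA rate for given concentrations and real-valued exponents. The function name `gma_rate` and all numerical values are hypothetical and chosen only for illustration; with integer exponents equal to the stoichiometric coefficients, the same function reduces to ordinary mass action.

```python
def gma_rate(k_eff, concentrations, exponents):
    """Generalized mass action rate, v = k' * [A]**alpha * [B]**beta * ...

    k_eff          effective rate constant k' (activity coefficients absorbed)
    concentrations list of concentrations [A], [B], ...
    exponents      list of real-valued exponents alpha, beta, ...
    """
    if len(concentrations) != len(exponents):
        raise ValueError("need one exponent per concentration")
    rate = k_eff
    for conc, exponent in zip(concentrations, exponents):
        rate *= conc ** exponent
    return rate


# Hypothetical values: ordinary mass action for A + B -> ... (exponents 1 and 1)
v_mass_action = gma_rate(2.0, [0.5, 0.1], [1, 1])
# Same concentrations with non-integer exponents (illustrative only)
v_gma = gma_rate(2.0, [0.5, 0.1], [0.8, 1.3])
print(v_mass_action, v_gma)
```

Starting from the stoichiometric coefficients and then letting the exponents vary is one way to follow the advice given above.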
3.1.4 Chemical Equilibrium
All chemical reactions are in principle reversible. If there is a finite probability that the
transition state with energy G‡ can be reached from an initial state with energy GR (the
state with pure reactants) there will also be a finite probability that the transition state
can be reached from the initial state with energy GP (the state with pure products).
Therefore, to any “forward” elementary reaction there will exist a “backward” reaction
where the direction of the reaction arrow is reversed. For the general forward reaction
in Eq. 3.1, the corresponding backward reaction is given by:
\[
pP + qQ + \ldots \xrightarrow{k_b} aA + bB + \ldots
\tag{3.8}
\]
From the law of mass action, the rate vb of this reaction is given by:
\[
v_b = k_b [P]^{p} [Q]^{q} \cdots .
\tag{3.9}
\]
There are, however, a number of cases where the backward reaction can be ignored
without introducing a significant error and the forward reaction considered irreversible.
These include reactions where the decrease in Gibbs free energy is very large, enzyme
catalyzed reactions where only the forward rate constant is affected (see section 3.1.5),
and reactions where the product is converted into something else immediately after it
has been formed. An example of the latter is transcription where the product of one
reaction, i.e., the addition of one nucleotide to the RNA chain, is the reactant of a
subsequent fast reaction, i.e., the addition of a second nucleotide to the RNA chain.
When a reaction cannot be considered irreversible, it will eventually settle into a
state where there is no net change in the concentration of any of the components. In
this equilibrium state, the forward and the backward reactions still take place, but their
rates are equal to each other. In other words, in the equilibrium state (denoted with
subscript eq) it is obtained that vf ([A]eq , [B]eq , . . .) = vb ([P ]eq , [Q]eq , . . .) such that:
\[
v_f - v_b = k_f [A]_{eq}^{a} [B]_{eq}^{b} \cdots - k_b [P]_{eq}^{p} [Q]_{eq}^{q} \cdots = 0
\tag{3.10}
\]
It follows immediately from vf = vb that:
\[
k_f [A]_{eq}^{a} [B]_{eq}^{b} \cdots = k_b [P]_{eq}^{p} [Q]_{eq}^{q} \cdots
\quad\Rightarrow\quad
K \equiv \frac{k_f}{k_b} = \frac{[P]_{eq}^{p} [Q]_{eq}^{q} \cdots}{[A]_{eq}^{a} [B]_{eq}^{b} \cdots} .
\tag{3.11}
\]
Since kf and kb are constants, their ratio K defines a relationship between the reactant
and product concentrations that is independent of initial concentrations. Regardless of
the quantities in which the chemicals are initially mixed, the reaction will eventually
settle into an equilibrium state where the relationship between product and reactant
concentrations is uniquely defined by K.
Let us assume that reactants and products have been mixed together to achieve a certain set of initial concentrations. The direction in which a given reaction will progress
can be determined by calculating the reaction quotient Q defined by the relation
\[
Q = \frac{[P]^{p} [Q]^{q} \cdots}{[A]^{a} [B]^{b} \cdots} .
\tag{3.12}
\]
When Q < K, the reaction will proceed in the direction that increases Q, meaning
that more products will be formed. In the opposite case where Q > K, the reaction
will proceed in the direction that decreases Q and the products will be converted into
reactants. Similarly, if the equilibrium state Q = K is perturbed by addition of one
of the products, the system response will be to reestablish the equilibrium state by
decreasing the concentration of the products and increasing the concentration of the
reactants. In general, the response of a system in equilibrium to an external perturbation
will be to minimize the effect of that perturbation. This phenomenon is known as Le
Chatelier’s Principle or the Principle of Mass Action. When applied to the binding
of LacR to its main operator (Fig. 3.1, step 3), the Principle of Mass Action has an
intuitive and well-known consequence: the probability that the operator is occupied
increases if the concentration of the LacR tetramer (or the LacR monomer) is increased.
What is not widely appreciated, however, is that this response is a direct result of mass
action kinetics.
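The comparison of Q with K described above can be sketched in a few lines of Python. The helper names `reaction_quotient` and `direction`, and the example reaction A + B ⇌ P with K = 10, are hypothetical and serve only to illustrate Eq. 3.12.

```python
import math


def reaction_quotient(products, reactants):
    """Q from Eq. 3.12; each argument is a list of (concentration, coefficient)."""
    numerator = 1.0
    for conc, coeff in products:
        numerator *= conc ** coeff
    denominator = 1.0
    for conc, coeff in reactants:
        denominator *= conc ** coeff
    return numerator / denominator


def direction(q, k, rel_tol=1e-9):
    """Direction in which the reaction proceeds for a given Q and K."""
    if math.isclose(q, k, rel_tol=rel_tol):
        return "equilibrium"
    return "forward" if q < k else "backward"


K = 10.0  # hypothetical equilibrium constant for A + B <-> P
q_start = reaction_quotient(products=[(2.0, 1)], reactants=[(1.0, 1), (1.0, 1)])
q_excess = reaction_quotient(products=[(20.0, 1)], reactants=[(1.0, 1), (1.0, 1)])
print(q_start, direction(q_start, K))    # Q < K: more product is formed
print(q_excess, direction(q_excess, K))  # Q > K: product is converted back
```

Adding product to a mixture at equilibrium raises Q above K, so the predicted response is a net backward reaction, in line with the principle stated above.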
The magnitude of the equilibrium constant depends on the difference between the
products and the reactants in standard Gibbs free energy, ∆G◦ . The standard Gibbs
free energy is a measure of Gibbs free energy relative to a standard state and can be
calculated from tables obtained through extensive experimental measurements. The
relationship between the equilibrium constant K and the standard Gibbs free energy
∆G◦ is given by the Gibbs-Helmholtz equation:
\[
\Delta G^{\circ} = -RT \ln K .
\tag{3.13}
\]
The standard Gibbs free energy and the equilibrium constant are thus equivalent measures of the distribution of molecules at equilibrium and can be used interchangeably.
At 37 °C, an equilibrium constant of 10 corresponds to ∆G◦ = −5.9 kJ/mol (−1.4
kcal/mol) while an equilibrium constant of 0.1 corresponds to a standard free energy of
∆G◦ = +5.9 kJ/mol. When a system has been perturbed away from an equilibrium
state, or has yet to reach it, the “distance” between the current state of the system
(Q) and the equilibrium state (K) can be quantified by the change in free energy of the
reaction ∆Gr . The reaction free energy is defined by:
\[
\Delta G_r = \Delta G^{\circ} + RT \ln Q .
\tag{3.14}
\]
When the change in free energy is zero, ∆Gr = 0, the system is in equilibrium and
Q = K (since ∆Gr = 0 implies RT ln Q = −∆G◦ = RT ln K). If ∆Gr < 0, the value of the
reaction quotient is lower than the equilibrium constant and the reaction will proceed
in the forward direction in order to reach the equilibrium state. Conversely, the reaction
will proceed in the backward direction when ∆Gr > 0.
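The numbers quoted for 37 °C can be checked with a short Python sketch of Eqs. 3.13 and 3.14. The function names are hypothetical; the gas constant and the conversion 1 kcal = 4.184 kJ are standard values and not taken from the text.

```python
import math

R = 8.314462618   # gas constant in J/(mol K)
T_BODY = 310.15   # 37 degrees Celsius in kelvin


def standard_free_energy(k_eq, temperature=T_BODY):
    """Standard Gibbs free energy (J/mol) from Eq. 3.13."""
    return -R * temperature * math.log(k_eq)


def reaction_free_energy(k_eq, q, temperature=T_BODY):
    """Reaction free energy (J/mol) from Eq. 3.14."""
    return standard_free_energy(k_eq, temperature) + R * temperature * math.log(q)


dg_kj = standard_free_energy(10.0) / 1000.0
print(round(dg_kj, 1), "kJ/mol")            # about -5.9
print(round(dg_kj / 4.184, 1), "kcal/mol")  # about -1.4
# Q below K gives a negative reaction free energy (net forward reaction)
print(reaction_free_energy(10.0, 1.0) < 0)
```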
3.1.5 The Michaelis-Menten Reaction
Many biochemical reactions have very high activation energies and will not occur spontaneously at any measurable rate. In living cells, this problem is solved by the use of
highly specialized enzymes that may increase the rate constant by lowering the activation energy. The Michaelis-Menten reaction is a classic, molecular-level quantitative
model of how some enzymes work. It describes the conversion of a substrate S into a
product P in terms of three elementary reactions described by the reaction scheme:
\[
E + S \underset{k_b}{\overset{k_f}{\rightleftharpoons}} ES \xrightarrow{k_c} P + E .
\tag{3.15}
\]
First, the enzyme E combines with the substrate to form an enzyme-substrate complex ES. When the enzyme does its job successfully, the enzyme-substrate complex
decomposes into the product P and the free enzyme E is regenerated. In the case of
an unsuccessful encounter, the enzyme-substrate complex simply dissociates into the
original substrate and free enzyme.
Applying the law of mass action to the three elementary reactions gives four differential equations:
\[
\begin{aligned}
\frac{d[S]}{dt} &= -k_f [E][S] + k_b [ES], \\
\frac{d[E]}{dt} &= -k_f [E][S] + (k_b + k_c)[ES], \\
\frac{d[ES]}{dt} &= k_f [E][S] - (k_b + k_c)[ES], \\
\frac{d[P]}{dt} &= k_c [ES].
\end{aligned}
\tag{3.16}
\]
Since the conversion of the enzyme-substrate complex is irreversible, the time-invariant
nontrivial solution of Eq. 3.16 is referred to as a steady state. In contrast to a chemical
equilibrium state, a steady state can only be maintained if there is a constant influx of
fresh reaction substrates and a continuous removal of reaction products. It will for now
be assumed that there is no influx of substrate and that there is no product when the
reaction is started, [P ](t = 0) = 0. It will further be assumed that the initial substrate
concentration is given by [S](t = 0) = S0 and that all of the enzyme is initially in
the free form, i.e., [E](t = 0) = E0 , [ES](t = 0) = 0. In this case, the system will
eventually reach the trivial time-invariant solution where [S] = 0 and [P ] = S0 .
The first step in any modeling is to reduce the dimensionality of the problem considered. Inspection of Eq. 3.16 reveals two conservation relations:
\[
\frac{d[E]}{dt} + \frac{d[ES]}{dt} = 0, \qquad
\frac{d[S]}{dt} + \frac{d[ES]}{dt} + \frac{d[P]}{dt} = 0 .
\tag{3.17}
\]
The conservation relations reflect the fact that the total enzyme concentration is constant ([E] + [ES] = E0 ) and that a decrease in substrate concentration is coupled to
corresponding increases in the concentrations of the enzyme-substrate complex and of
the product ([S] + [ES] + [P ] = S0 ). The presence of the two conservation relations
means that only two of the four differential equations in Eq. 3.16 are needed to fully
describe how the concentrations of all of the species evolve as the reaction progresses.
The reduced system is given by:
\[
\begin{aligned}
\frac{d[S]}{dt} &= -k_f (E_0 - [ES])[S] + k_b [ES], \\
\frac{d[ES]}{dt} &= k_f (E_0 - [ES])[S] - (k_b + k_c)[ES],
\end{aligned}
\tag{3.18}
\]
with initial conditions
\[
[S](t = 0) = S_0 , \qquad [ES](t = 0) = 0 .
\tag{3.19}
\]
The concentration of free enzyme can be substituted everywhere by [E] = E0 − [ES]
and the product concentration can be calculated from [P ] = S0 − [S] − [ES].
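A minimal numerical sketch of the reduced system is given below, written in terms of [S] and [ES] with the free enzyme eliminated through [E] = E0 − [ES]. It uses a hand-written fourth-order Runge-Kutta step so that no external library is needed. The function names and all parameter values are hypothetical and are not those used to produce the figures in this chapter.

```python
def mm_rhs(s, es, kf, kb, kc, e0):
    """Time derivatives of [S] and [ES] for the reduced Michaelis-Menten system."""
    e = e0 - es                       # free enzyme from [E] + [ES] = E0
    ds = -kf * e * s + kb * es
    des = kf * e * s - (kb + kc) * es
    return ds, des


def simulate_mm(s0, e0, kf, kb, kc, t_end, dt):
    """Integrate with classical RK4; returns rows of (t, [S], [ES], [P])."""
    s, es = s0, 0.0                   # initial conditions: all enzyme is free
    rows = [(0.0, s, es, 0.0)]
    for i in range(int(round(t_end / dt))):
        k1s, k1c = mm_rhs(s, es, kf, kb, kc, e0)
        k2s, k2c = mm_rhs(s + 0.5 * dt * k1s, es + 0.5 * dt * k1c, kf, kb, kc, e0)
        k3s, k3c = mm_rhs(s + 0.5 * dt * k2s, es + 0.5 * dt * k2c, kf, kb, kc, e0)
        k4s, k4c = mm_rhs(s + dt * k3s, es + dt * k3c, kf, kb, kc, e0)
        s += dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6.0
        es += dt * (k1c + 2 * k2c + 2 * k3c + k4c) / 6.0
        # product follows from the conservation relation [S] + [ES] + [P] = S0
        rows.append(((i + 1) * dt, s, es, s0 - s - es))
    return rows


# Hypothetical parameter values (arbitrary units)
S0, E0, KF, KB, KC = 10.0, 0.1, 1.0, 1.0, 1.0
rows = simulate_mm(S0, E0, KF, KB, KC, t_end=500.0, dt=0.02)
for t, s, es, p in rows[::5000]:
    print(f"t={t:6.1f}  S/S0={s / S0:.4f}  ES/E0={es / E0:.4f}  P/S0={p / S0:.4f}")
```

Only two state variables are integrated; the free enzyme and the product are recovered from the conservation relations, which is the dimensional reduction described above.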
Figure 3.3: (A) Time course of a reaction catalyzed by a Michaelis-Menten type enzyme. (B)
The dependence of the Michaelis-Menten rate equation on the substrate concentration. The
inset shows the slow convergence to v/vmax = 1.
The time-course of the Michaelis-Menten reaction is illustrated in Fig. 3.3A, where
the concentrations [S]/S0 , [ES]/E0 and [P ]/S0 are plotted as a function
of time. Note that the substrate concentration changes very little initially (t < 10), that
the enzyme-substrate concentration quickly reaches a fairly flat plateau and that [ES]
remains more or less constant for an extended time period (between t ≈ 0.1 and ≈ 10).
In this region, it can be assumed that [S] ≈ S0 and d[ES]/dt ≈ 0. At later times, the
substrate concentration decreases and the product accumulates. The time-course of the
reaction can thus be separated into three distinct regions: a region where the concentration of the enzyme-substrate complex is rapidly increasing, a region where the enzyme-substrate complex concentration remains constant and a region where the concentration
of the enzyme-substrate complex decreases as the substrate is depleted. The second region corresponds to a situation where the enzyme-substrate complex can be assumed to
be in a quasi-steady state defined by d[ES]/dt = 0. Once the enzyme-substrate complex has reached this state, the rate of product formation obeys the Michaelis-Menten
rate equation:
\[
v = \frac{v_{max} [S]}{K_M + [S]} ,
\tag{3.20}
\]
where vmax is the maximal rate of product formation and KM is called the Michaelis-Menten constant. These constants are defined by vmax = kc E0 and KM = (kb +
kc )/kf , respectively.
The Michaelis-Menten rate equation can readily be derived using the following
steps: (1) assume a quasi-steady state for the enzyme-substrate complex (d[ES]/dt =
0), (2) solve the resulting algebraic equation with respect to free enzyme (kf [E][S] =
(kb + kc )[ES]), (3) insert the resulting equation into the conservation relation for
the total enzyme concentration (E0 = [E] + [ES]) and (4) solve for the concentration of the enzyme-substrate complex in the quasi-steady state to obtain [ES] =
E0 [S]/(KM + [S]). The rate of the reaction, i.e., the Michaelis-Menten rate equation,
is then obtained from v = kc [ES]. It can be shown rigorously that the quasi-steady
state introduces minimal error when E0 ≪ KM .
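Written out, the four steps listed above amount to the following short calculation. It uses only the symbols already defined in Eqs. 3.16 and 3.20 and is meant as a worked sketch of the derivation rather than a rigorous treatment of the approximation.

```latex
\begin{align*}
0 &= k_f [E][S] - (k_b + k_c)[ES]
  && \text{(1) quasi-steady state, } d[ES]/dt = 0 \\
[E] &= \frac{k_b + k_c}{k_f}\,\frac{[ES]}{[S]} = K_M \frac{[ES]}{[S]}
  && \text{(2) solve for the free enzyme} \\
E_0 &= [E] + [ES] = [ES]\left(\frac{K_M}{[S]} + 1\right)
  && \text{(3) insert into the conservation relation} \\
[ES] &= \frac{E_0 [S]}{K_M + [S]}
  && \text{(4) solve for the complex} \\
v &= k_c [ES] = \frac{k_c E_0 [S]}{K_M + [S]} = \frac{v_{max} [S]}{K_M + [S]}
\end{align*}
```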
The dependence of the rate of reaction on the substrate concentration is shown in
Fig. 3.3B. The dependence of v/vmax on [S] has the characteristic shape of a saturation curve. It describes the relative occupancy of the enzyme by the substrate, i.e.,
[ES]/([E] + [ES]), and is thus the response function associated with the input [S]
and the output [ES]. When data points can be fitted well to the response function for
the Michaelis-Menten reaction, the parameter KM can be read directly from the plot
of f ([S]) = v([S])/vmax as the input signal that gives 50% response, i.e., the value
[S]0.5 where f ([S]) = 0.5. Note that the Michaelis-Menten rate equation converges
quite slowly as the substrate concentration is increased (Fig. 3.3B, inset). In fact, a
Michaelis-Menten enzyme does not act as a very efficient switch. In order to change
the response from 10% to 90% it is necessary to increase the input, i.e., the substrate
concentration, 81-fold. The ratio of the input signals that produce 90% and 10% response is
called the response coefficient and is denoted by RS . The response coefficient for a
Michaelis-Menten enzyme is RS = 81 and the high value is primarily due to the low slope
of the response function at [S]0.5 .
It is noted that the equilibrium constant for the binding of the substrate to the enzyme is given by KS = kf /kb and that the Michaelis-Menten constant is equal to 1/KS
when kc ≪ kb . The assumption that kc ≪ kb together with d[ES]/dt ≈ 0 is often
referred to as the pre-equilibrium or the quasi-equilibrium approximation.
3.1.6 Hill-type Kinetics
It is often observed that response functions measured experimentally have slopes that
exceed those predicted from the Michaelis-Menten reaction scheme. A canonical function that frequently is employed when the Michaelis-Menten rate equation fails is provided by the so-called Hill rate equation or Hill-type function. The most common form
of this equation is given by:
\[
h(x) = \frac{x^{n_H}}{K_H + x^{n_H}} ,
\tag{3.21}
\]
where nH and KH are called the Hill coefficient and the Hill constant, respectively.
Another frequently used form that is used to describe an inhibitory input signal is given
by:
\[
h'(x) = 1 - h(x) = \frac{K_H}{K_H + x^{n_H}} .
\tag{3.22}
\]
The Hill constant is related to the input signal that gives 50% response (x0.5 )
through the relationship $x_{0.5} = \sqrt[n_H]{K_H}$ and the Hill coefficient is related to the steepness of the response function. For nH = 1, the Hill rate equation coincides with the
Michaelis-Menten rate equation and the response coefficient is RS = 81. For nH ≫ 1,
the Hill rate equation approaches a Heaviside step function with a threshold at x = x0.5
(see Fig. 3.4A). The response coefficient, $R_S = 81^{1/n_H}$, decreases dramatically as the Hill coefficient
increases. When nH = 2, less than a 10-fold increase is required to change the output
from 10% to 90% (RS = 9). When nH = 6, the input signal only needs to double
(RS ≈ 2.1).
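The dependence of the response coefficient on the Hill coefficient can be checked numerically by inverting Eq. 3.21 at the 10% and 90% levels. The sketch below does this in Python; the function names are hypothetical and the Hill constant is set to 1 purely for convenience, since it cancels in the ratio.

```python
def hill(x, n_h, k_h):
    """Hill-type response function of Eq. 3.21."""
    return x ** n_h / (k_h + x ** n_h)


def input_for_response(level, n_h, k_h):
    """Input x that gives h(x) = level, from x**n_h = k_h * level / (1 - level)."""
    return (k_h * level / (1.0 - level)) ** (1.0 / n_h)


def response_coefficient(n_h, k_h=1.0):
    """Ratio of the inputs that give 90% and 10% response."""
    return input_for_response(0.9, n_h, k_h) / input_for_response(0.1, n_h, k_h)


for n_h in (1, 2, 4, 6):
    print(n_h, round(response_coefficient(n_h), 2))
```

For nH = 1 this reproduces the Michaelis-Menten value of 81, and the ratio shrinks rapidly as nH grows.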
Figure 3.4: (A) Response functions generated by the Hill rate equation. (B) Hill plot of
log[h(x)/(1 − h(x))] versus log(x) used to determine the Hill coefficient from the slope and the
Hill constant from the intercept at log x = 0.
Experimental data points, r(y), obtained at different values of the input signal y,
can often be fitted well to a Hill-type function to capture the essential behavior of
the response function. The fitting procedure involves the construction of the so-called
Hill plot in which the logarithm of the ratio r(y)/(1 − r(y)) is plotted as a function
of log(y). It is here assumed that r(y) is appropriately normalized. If this is not the
case, the Hill plot is constructed from the logarithm of the ratio r(y)/(rmax − r(y))
where rmax is the maximal value of r(y) obtained for y ≫ y0.5 . A linear fit to data
points that are transformed in this way will give the Hill coefficient as the slope and the
logarithm of the Hill constant as the negative of the intercept. This is illustrated in Fig. 3.4B
where the “experimental” points r(y) are generated by the function h(x). Since
$1 - h(x) = K_H/(K_H + x^{n_H})$, the plot will for an activating signal consist of lines
with positive slopes defined by:
\[
\log \frac{h(x)}{1 - h(x)} = \log\left(K_H^{-1} x^{n_H}\right) = -\log K_H + n_H \log x .
\tag{3.23}
\]
For an inhibitory signal, the Hill plot will consist of lines with negative slopes defined
by:
\[
\log \frac{h'(x)}{1 - h'(x)} = \log \frac{K_H}{x^{n_H}} = \log K_H - n_H \log x .
\tag{3.24}
\]
The Hill plot is usually constructed from measurements in the range of input values that
give 10% to 90% response.
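The Hill-plot procedure can be sketched as an ordinary least-squares line fit to the transformed data. The Python example below generates noise-free "experimental" points from the Hill function itself, as in Fig. 3.4B, keeps only points in the 10% to 90% range, and recovers nH from the slope and KH from the intercept. The function names and the parameter values (nH = 2.5, KH = 4) are hypothetical.

```python
import math


def hill(x, n_h, k_h):
    """Activating Hill-type response function (Eq. 3.21)."""
    return x ** n_h / (k_h + x ** n_h)


def fit_hill_plot(inputs, responses):
    """Least-squares line through log10(r / (1 - r)) versus log10(x).

    Returns (n_h, k_h): the slope is n_h, the intercept is -log10(k_h).
    Only points with 10% to 90% response are used.
    """
    points = [(math.log10(x), math.log10(r / (1.0 - r)))
              for x, r in zip(inputs, responses) if 0.1 <= r <= 0.9]
    n = len(points)
    mean_u = sum(u for u, _ in points) / n
    mean_w = sum(w for _, w in points) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in points)
    s_uw = sum((u - mean_u) * (w - mean_w) for u, w in points)
    slope = s_uw / s_uu
    intercept = mean_w - slope * mean_u
    return slope, 10.0 ** (-intercept)


# Synthetic, noise-free data generated from the Hill function itself
xs = [0.6, 0.8, 1.0, 1.4, 1.8, 2.4, 3.2, 4.5]
rs = [hill(x, 2.5, 4.0) for x in xs]
n_h_fit, k_h_fit = fit_hill_plot(xs, rs)
print(n_h_fit, k_h_fit)
```

With real measurements the points scatter around the line, and the normalization by rmax described above has to be applied before the transformation.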
The Hill-type response functions are generally used to model the relationships between an input and a response with a minimal number of unknown parameters and
without the complexity of the underlying reaction kinetics. We shall see examples of
this in sections 3.4.1 and 3.4.2 where the correlation between the intracellular activity
of transcription factors (the output) and the extracellular concentrations of their inducers (the input) is modeled in terms of Hill-type functions. However, in contrast to
“black-box” models in which a response is fitted to a polynomial function with no clear
molecular basis, models that are based on Hill-type functions can be considered as a
type of “gray-box” phenomenological description that often, but not always, preserves
some connection to the underlying reaction mechanisms.
The simplest molecular mechanism that gives rise to a Hill-type response function is a generalized mass action reaction scheme that resembles the Michaelis-Menten
reaction:
\[
E + n_H S \underset{k_b}{\overset{k_f}{\rightleftharpoons}} ES \xrightarrow{k_c} E + P
\tag{3.25}
\]
where the stoichiometric coefficient of S is replaced by the real-valued exponent nH .
The response function in Eq. 3.21 can be obtained by applying the law of mass action to the reaction in Eq. 3.25 when it is assumed that [E] + [ES] = E0 and
that d[ES]/dt ≈ 0 (see section 3.1.5) and is usually interpreted as the simultaneous
binding of nH substrate molecules to the enzyme. However, when setting kc = 0, the
reaction scheme applies equally well to other types of reactions. For example, E, S
and ES could also represent a DNA binding region, a transcription factor and a DNA-transcription factor complex, respectively, or a transcription factor, an inducer and a
transcription factor-inducer complex, respectively.
3.2 Modeling Gene Expression
The expression of protein encoding genes involves processes that by all measures are
irreversible; the transcription of a gene into an mRNA and the translation of the mRNA
into a protein are enzyme catalyzed reactions that involve thousands of reactants and
consume vast amounts of energy. In the simplest model of gene expression, mRNA is
synthesized from nucleic acids and protein is synthesized from amino acids following
the irreversible reaction scheme:
\[
\text{nucleic acids} \xrightarrow{\kappa_M} \text{mRNA}, \qquad
\text{amino acids} \xrightarrow{\kappa_P} \text{protein},
\tag{3.26}
\]
where the rate constants κM and κP are pseudo-first order, i.e., have units of inverse
time. Despite the complexity of these reactions at the molecular level, the process of
gene expression can in many cases be described in terms of two ordinary differential
equations that determine the temporal evolution of the number of mRNA molecules nM
and of the number of protein molecules nP :
\[
\begin{aligned}
\frac{dn_M}{dt} &= \kappa_M n_D - k_{dM} n_M , \\
\frac{dn_P}{dt} &= \kappa_P n_M - k_{dP} n_P ,
\end{aligned}
\tag{3.27}
\]
where nD is the average number of active promoters, and kdM and kdP are first-order decay constants associated with the half-life of the mRNA and the protein, respectively.
For gene regulatory systems in general, regulatory signals could alter the rate of transcription, the rate of translation as well as the half-lives of mRNA and protein.
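A minimal Python sketch of Eq. 3.27 is given below. It integrates the two equations with a simple forward Euler step and compares the result with the steady state obtained by setting both derivatives to zero. The function names, the step size and the integration time are hypothetical choices; the rate constants are the illustrative values listed in the caption of Fig. 3.5, used here without cell division.

```python
def steady_state(kappa_m, kappa_p, kd_m, kd_p, n_d):
    """Steady state of Eq. 3.27 (both time derivatives set to zero)."""
    n_m = kappa_m * n_d / kd_m
    n_p = kappa_p * n_m / kd_p
    return n_m, n_p


def simulate_expression(kappa_m, kappa_p, kd_m, kd_p, n_d, t_end, dt):
    """Forward Euler integration of Eq. 3.27 starting from n_M = n_P = 0."""
    n_m, n_p = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dn_m = kappa_m * n_d - kd_m * n_m   # synthesis minus decay of mRNA
        dn_p = kappa_p * n_m - kd_p * n_p   # synthesis minus decay of protein
        n_m += dt * dn_m
        n_p += dt * dn_p
    return n_m, n_p


# Rate constants in 1/min, n_d = 1 active promoter
params = dict(kappa_m=0.04, kappa_p=6.0, kd_m=0.03, kd_p=0.001, n_d=1.0)
print(steady_state(**params))
print(simulate_expression(t_end=10000.0, dt=0.5, **params))
```

The protein level approaches its steady state on the slow time scale 1/kdP, which is why a long integration time is needed in this example.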
Equation 3.27 is a coarse description of the immensely complicated processes involved in gene expression. However, experience has shown that it often is a simple
and appropriate alternative to more complicated models that incorporate the molecular
details of mRNA and protein synthesis and decay. Situations where it is not a suitable
description include the modeling of processes that take place on a time scale that is
comparable to the time scales of transcription and translation. Transcription in yeast
occurs at a rate of ≈30 nucleotides/second so it takes less than 1 minute for RNA polymerase II to transcribe a gene with 1400 nucleotides (the average length of genes in its
target class). Translation occurs on approximately the same time scale. Depending on
the process of interest, the time delay between the initiation of transcription or translation and the formation of the corresponding mRNA or protein product could have
important implications. In this case, it may be sufficient to reformulate the ordinary
differential equations in Eq. 3.27 as delay differential equations.
While Eq. 3.27 is used frequently and with good results, there are additional approximations that need to be made when the model describes gene expression in
cells that grow and divide. There are in this case two fundamental problems with
Eq. 3.27: volume-dependent rates of reaction and partitioning of the cellular content
between mother and daughter cells at cell division. The latter arises since a certain fraction of the content of the mother cell will be transferred to the daughter cell when
it divides. The number of mRNA and protein molecules per cell will therefore oscillate
with a period that is determined by the period of the cell division cycle and an amplitude
that depends on the partition mechanism (see Fig. 3.5). The cellular concentrations of
mRNA and protein will generally also oscillate.
The problem with volume-dependent reaction rates arises since the rate constants
in Eq. 3.27 are associated with reactions that, at the very least, are second order. The
probability that two (or more) molecules will interact, and hence the rate of any reaction
that involves more than one molecule, depends not only on the number of molecules
but also on the volume in which the molecules are distributed. Examples of reactions that have
volume-dependent reaction rates include the binding of polymerases and transcription
factors to the DNA, the binding of ribosomes to mRNA and dimerization of proteins.
A second order reaction between two molecules A and B has a rate of reaction vf =
kf [A][B] that is given in concentration units per time unit. This rate can be converted
into a rate $\bar{v}_f$ that has units of number of molecules per time unit by multiplication with
the cell volume v(t):
\[
\bar{v}_f = v_f v(t) = k_f [A][B] v(t) = k_f n_A n_B / v(t),
\tag{3.28}
\]
where nA and nB are the numbers of A and B molecules per cell. In certain situations, it may be that the concentration of one of the molecules, say B, remains constant. In this case, it is possible to define a pseudo-first order rate constant $k_f' = k_f c_B$,
where cB = nB /v(t), such that the reaction rate is $\bar{v}_f = k_f' n_A$. This assumption
implies the presence of some intracellular feedback mechanism to ensure a constant
number/volume ratio throughout the cell division cycle. This seems reasonable for
housekeeping enzymes, such as RNA polymerases and ribosomes, but not for other
types of reactions, such as protein dimerization or binding of transcriptional regulators
to the DNA.
The problem of volume-dependent reaction rates can be circumvented by working
with a model of cellular concentrations rather than the number of molecules per cell.
The transformation of the number-based model in Eq. 3.27 to a concentration-based
model is done by differentiation of the concentration c(t) = n(t)/v(t). For example,
the rate equation that describes the variation in protein numbers in Eq. 3.27 is in terms
of protein concentration cP (t) = nP /v(t) given by:
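\[
\begin{aligned}
\frac{dc_P(t)}{dt} = \frac{d}{dt}\frac{n_P(t)}{v(t)}
  &= \frac{1}{v(t)}\frac{dn_P(t)}{dt} - \frac{c_P(t)}{v(t)}\frac{dv(t)}{dt} \\
  &= c_M(t)\kappa_P - k_{dP} c_P(t) - \frac{c_P(t)}{v(t)}\frac{dv(t)}{dt} ,
\end{aligned}
\tag{3.29}
\]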
where cM (t) is the concentration of mRNA. While solving one problem (the volume-dependent rates of reaction), the transformation has introduced another. In order to
solve Eq. 3.29, it is necessary to specify the explicit form of v(t). How this can be
done in general is uncertain. For example, E. coli cells are rod-shaped and double their
length during a cell division cycle that ends when the cell is cleaved in the middle. On
the other hand, the cell division of S. cerevisiae is highly asymmetric. A small bud is
formed on the surface of the spherical mother cell and the bud grows in size until it
eventually dissociates. The daughter cell continues to grow after cell division and it
takes some time before it reaches maturity and begins to produce offspring of its own.
In other words, the cell volume depends not only on the time since the last cell division
but also on the overall age of the cell.
The only way to avoid describing how the volume changes as the cells grow and
divide is to assume that the rate of cell volume increase dv(t)/dt is proportional to the
current volume v(t) with some proportionality constant kg . In mathematical terms it
must be assumed that:
\[
\frac{dv(t)}{dt} = k_g v(t) \quad\Rightarrow\quad v(t) = v_0 e^{k_g t} ,
\tag{3.30}
\]
where v0 is the initial volume of the cell. If the cell divides every time its volume
doubles, i.e., when v(t) = 2v0 , the proportionality factor kg is related to the period of
the cell division cycle T through kg = ln 2/T . In other words, to avoid modeling v(t)
explicitly, it is necessary to assume that the cell volume grows exponentially. With this
assumption, the concentration-based model becomes a system of ordinary differential
equations:
\[
\begin{aligned}
\frac{dc_M}{dt} &= c_D \kappa_M - \gamma_M c_M , \\
\frac{dc_P}{dt} &= c_M \kappa_P - \gamma_P c_P ,
\end{aligned}
\tag{3.31}
\]
where cD = nD /v(t) and γM = kdM + kg and γP = kdP + kg are the first-order
rate constants associated with the apparent, biological half-lives of mRNA and protein,
respectively.
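The effect of growth on the apparent decay constants can be illustrated with a small Python sketch that combines kg = ln 2/T with γ = kd + kg. The function names are hypothetical; the values T = 90 minutes, kdM = 0.03 and kdP = 0.001 per minute are the illustrative values from the caption of Fig. 3.5.

```python
import math


def growth_rate(division_period):
    """Dilution rate k_g = ln 2 / T for exponential volume growth."""
    return math.log(2.0) / division_period


def apparent_decay_rate(kd, division_period):
    """Apparent first-order constant gamma = kd + k_g."""
    return kd + growth_rate(division_period)


def apparent_half_life(kd, division_period):
    """Apparent (biological) half-life, ln 2 / gamma."""
    return math.log(2.0) / apparent_decay_rate(kd, division_period)


T = 90.0  # cell division period in minutes
print(round(growth_rate(T), 4))               # dilution rate in 1/min
print(round(apparent_half_life(0.03, T), 1))  # mRNA, minutes
print(round(apparent_half_life(0.001, T), 1)) # protein, minutes
```

For a stable protein the apparent half-life is set almost entirely by dilution, and in the limit of no degradation it equals the division period.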
Figure 3.5: Levels of (A) mRNA and of (B) protein predicted by different models of gene
expression. Black curves show the number of mRNA (nM ) and proteins (nP ) predicted by
Eq. 3.27 with exponential volume growth, v(t) = v0 exp(t ln 2/T ), and periodic equipartition
of cellular content at regular intervals T (cell division). Blue curves show the cellular concentrations cM = nM /v(t) and cP = nP /v(t) obtained from Eq. 3.27. Green curves show the
average number of molecules per cell predicted by Eq. 3.32. Red curves show the average cellular concentration predicted by Eq. 3.31. Parameter values: v0 = 1 (arbitrary units), nD = 1,
⟨cD⟩ = 0.5/ ln 2, T = 90 minutes. Other parameter values (in min−1 ): κM = 0.04, κP = 6,
kdM = 0.03, kdP = 0.001.
In summary, in order to model gene expression in cells that grow and divide, it
is necessary to provide an explicit description of how the cell volume changes during
the cell division cycle and how cellular content is partitioned at cell division. It
is desirable to avoid this. First of all, cell growth and division are complicated processes
that are difficult to describe with a simple mathematical formula. Secondly, systems
of differential equations with time-dependent parameters are significantly more difficult to analyze compared to systems of ordinary differential equations. The assumption required to obtain a system of ordinary differential equations is unambiguous in a
concentration-based model. It must be assumed that the cell volume grows exponentially. This assumption implies that the concentration is unaffected by cell division,
i.e., that the mother and daughter cells inherit a fraction of the cellular content that is
proportional to their respective size. It is not so clear what assumptions are required
to convert a number-based model into a system of ordinary differential equations. The
simplest way to obtain a model for the average number of molecules per cell is to assume that $dc(t)/dt \approx v(t)^{-1}\, dn(t)/dt$ and multiply both sides of Eq. 3.31 with v(t) to
obtain:
\[
\frac{dn_M}{dt} = n_D \kappa_M - \gamma_M n_M , \qquad
\frac{dn_P}{dt} = n_M \kappa_P - \gamma_P n_P .
\tag{3.32}
\]
The most direct interpretation of this model is that the loss of molecules at division is
averaged over the cell division cycle and that the effect of volume changes on the rate
constants is negligible.
             mRNAs/cell   mRNAs/cell   mRNA half-life   transcription    translation     synthesis
             (Stanford)   (MIT)        (min)            rate (min−1)     rate (min−1)    per mRNA
# entries    5707         5468         5385             5065             5650            4910
total        12543        14958        101470           629              115830          907273
average      2.2          2.7          19               0.12             21              185
min.         0            0            5                0                0.04            3.3
max.         130          89           116              4.7              1329            3740
median       1.2          0.8          16               0.03             6.1             130
Table 3.1: Summary of experimental estimates of transcription rates, mRNA half-lives,
protein synthesis rates and the number of proteins synthesized per mRNA transcript. Data is obtained from http://web.wi.mit.edu/young/expression/ and http://genomewww.stanford.edu/yeast translation/
The choice between a model of the average number of molecules per cell (Eq. 3.32)
or the average cellular concentrations (Eq. 3.31) can be made based on whether nD or
cD remains constant. For genes that are carried on the cell’s chromosome, it is reasonable to assume that nD is a constant (or changes in discrete steps). For genes that
are carried on self-replicating plasmids, it is reasonable to assume that cD is constant.
However, since both models describe time-averages, the concentration-based model can
still be used when nD is constant if cD is set equal to the concentration averaged over
one cell division cycle, $\langle c_D \rangle = \frac{1}{T}\int_0^T n_D / v(t)\, dt$. Figures 3.5A and 3.5B compare the
two models of average gene expression with the predictions from the “real” description
in Eq. 3.27 where cell growth and division are modeled explicitly. It is assumed that the
cell volume grows exponentially, v(t) = v0 exp(t ln 2/T ), and that the cell divides into
two identical halves with volume v0 when v(t) = 2v0 . It is also assumed that nD = 1
and that cD = ⟨cD⟩ = nD /(2 ln 2).
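The cycle-averaged promoter concentration can be checked numerically. The Python sketch below integrates nD/v(t) over one division cycle with the midpoint rule, assuming exponential volume growth, and compares the result with the closed form nD/(2 v0 ln 2), which reduces to nD/(2 ln 2) for v0 = 1. The function names and the number of integration steps are hypothetical choices.

```python
import math


def mean_promoter_concentration(n_d, v0, period, steps=10000):
    """Midpoint-rule estimate of (1/T) * integral_0^T n_D / v(t) dt
    for exponential growth v(t) = v0 * exp(ln(2) * t / T)."""
    h = period / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        volume = v0 * math.exp(math.log(2.0) * t / period)
        total += n_d / volume * h
    return total / period


def closed_form(n_d, v0):
    """Analytical value of the same average, n_D / (2 * v0 * ln 2)."""
    return n_d / (2.0 * v0 * math.log(2.0))


print(mean_promoter_concentration(1.0, 1.0, 90.0))
print(closed_form(1.0, 1.0))
```

The average does not depend on the division period T, only on nD and the volume at birth.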
Two large scale experiments have provided estimates of transcription rates, the apparent half-lives of mRNA and translation rates for most of the yeast genes. These extensive data sets, which are summarized in Table 3.1, can be used to provide rough estimates of the magnitude of the parameters κM, κP and γM. The median apparent mRNA half-life is 16 minutes (average of 19 minutes), corresponding to γM ≈ 0.04 min−1. The rate of transcription is probably in the range of κM = 0.03 to 0.1 min−1 for most genes. The median rate of translation per mRNA is κP ≈ 6 min−1, but can vary greatly. While no genome-wide estimates of apparent protein half-lives are currently available, the value of γP is expected to be significantly lower than γM. For well-translated genes, it has been estimated that there are about 4000 protein molecules per mRNA. This puts the value of γP at approximately 0.001 min−1. Combining the two datasets gives the prediction that the median number of proteins produced per mRNA, i.e., the ratio b = κP/γM, is 130 (average of 185).
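The order-of-magnitude estimates above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes first-order mRNA decay (so that γM = ln 2 / half-life) and uses the steady-state relation nP/nM = κP/γP from the model; the variable names are illustrative.

```python
import math

half_life_min = 16.0            # median apparent mRNA half-life (Table 3.1)
kappa_P = 6.1                   # median translation rate per mRNA, min^-1 (Table 3.1)
proteins_per_mRNA_ss = 4000.0   # steady-state proteins per mRNA quoted in the text

gamma_M = math.log(2.0) / half_life_min    # first-order mRNA decay rate, min^-1
burst = kappa_P / gamma_M                  # b = kappa_P / gamma_M, proteins made per mRNA
gamma_P = kappa_P / proteins_per_mRNA_ss   # from n_P / n_M = kappa_P / gamma_P

print(f"gamma_M = {gamma_M:.3f} min^-1")
print(f"b       = {burst:.0f} proteins per mRNA")
print(f"gamma_P = {gamma_P:.4f} min^-1")
```

Note that the ratio of the two medians computed here need not coincide with the tabulated median of the per-gene ratios (130), since the median of a ratio is generally not the ratio of the medians.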
3.3 Modeling cis-Regulatory Systems
One of the most important means of controlling gene expression is the modulation of
the rate of transcription by transcription factor proteins. Regardless of whether gene
expression is modeled in terms of numbers or concentrations, the steady state number
of mRNA ($n^s_M$) and protein ($n^s_P$) per cell is predicted to be given by:
$$n^s_M = c^s_M v(t) = f(x)\frac{\kappa^*_M}{\gamma_M}, \qquad n^s_P = c^s_P v(t) = \frac{n^s_M \kappa_P}{\gamma_P} = f(x)\frac{\kappa^*_M \kappa_P}{\gamma_M \gamma_P}, \tag{3.33}$$
where κ∗M is the maximal rate of transcription and f(x) describes how the rate of transcription is modulated in response to the input signals x = x1, . . . , xn. The response function f(x), which varies between zero and unity, gives the time- or population-averaged relative occupancy of the promoter by a polymerase. In this section, we consider a number of examples of how quantitative response functions describing the promoter occupancy can be obtained from the qualitative knowledge of molecular interactions between cis- and trans-regulatory elements. In what follows, the state of the cis-regulatory region will be denoted using a compact notation, Oijk, where i, j and k each indicate the occupancy of a specific binding site on the DNA. When a single binding site is considered, there are two states, O0 and O1, where O0 indicates that the binding site is unoccupied and O1 indicates that it is occupied. When two binding sites are considered, the state O00 indicates that both sites are unoccupied, O10 and O01 indicate that either site i or site j is occupied, respectively, and O11 indicates that both binding sites are occupied.
3.3.1 Repressor-Operator Binding
The binding of LacR to the lacO operator in Fig. 3.1 involves three equilibrium reactions. These reactions can be represented by the symbolic reaction equations given by:
$$2X \overset{K_1}{\rightleftharpoons} X_2, \qquad 2X_2 \overset{K_2}{\rightleftharpoons} X_4, \qquad X_4 + O_0 \overset{K_3}{\rightleftharpoons} O_1, \tag{3.34}$$
where X, X2 and X4 denote repressor monomers, dimers and tetramers, respectively, while O0 and O1 denote the unoccupied operator region and the tetramer-operator complex, respectively. The equilibrium constants are defined by:
$$K_1 = \frac{[X_2]}{[X]^2}, \qquad K_2 = \frac{[X_4]}{[X_2]^2}, \qquad K_3 = \frac{[O_1]}{[X_4][O_0]}. \tag{3.35}$$
Note that the subscript eq indicating the equilibrium concentration has been omitted for
clarity. The remaining part of this section deals exclusively with steady states and it is
implied that concentration brackets refer to equilibrium concentrations.
In some cases, it may not be necessary to explicitly include all of the possible
intermediate reaction steps. For example, the overall reaction for the binding of LacR
to the lacO operator can be represented by a single reversible reaction step where four
LacR monomers associate simultaneously with the LacR binding site:
$$4X + O_0 \underset{k_{4b}}{\overset{k_{4f}}{\rightleftharpoons}} O_1, \qquad K_4 = \frac{[O_1]}{[X]^4[O_0]}. \tag{3.36}$$
It is thus possible to define an equilibrium state based solely on the initial reactants
and the final products without knowledge of the intermediate steps. When the detailed
reaction mechanism is known, the equation for the overall reaction can be obtained by
“adding” together the individual reactions. The following illustrates how the overall
reaction equation is obtained for the reactions in Eq. 3.34:
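$$\begin{array}{rcl}
2X & \rightleftharpoons & X_2, \\
2X & \rightleftharpoons & X_2, \\
2X_2 & \rightleftharpoons & X_4, \\
O_0 + X_4 & \rightleftharpoons & O_1 \\
\hline
4X + \boxed{2X_2} + O_0 + \boxed{X_4} & \rightleftharpoons & \boxed{2X_2} + \boxed{X_4} + O_1 \\
\Rightarrow \quad 4X + O_0 & \rightleftharpoons & O_1.
\end{array} \tag{3.37}$$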
In this procedure, terms that appear on the same side of the reaction arrows are first
summed to give a reaction equation where some of the terms may appear on both sides
of the reaction arrows. These terms, which are indicated by boxes in Eq. 3.37, can
be eliminated if they appear with the same stoichiometric coefficient on both sides as
they represent intermediate states that are neither reactants nor products. Note that the
reaction 2X X2 appears twice in Eq. 3.37 to ensure that the overall stoichiometry
is correct (dimerization has to occur twice for each tetramer formed). The equilibrium
constant for the overall reaction is given by the product of the equilibrium constants for
the individual reactions:
$$K_4 = \frac{[O_1]}{[X]^4[O_0]} = K_1^2 K_2 K_3. \tag{3.38}$$
The equilibrium constant K1 is squared since this reaction occurs twice in the overall
reaction. The correctness of the expression for K4 can easily be validated by inspection.
The relative occupancy of the operator by the tetramer is given by the response function f([X]) = [O1]/[OT] where [OT] = [O0] + [O1]. It can easily be derived by setting [O0] = [OT] − [O1] in Eq. 3.38 as:
$$f([X]) = \frac{K_4[X]^4}{1 + K_4[X]^4} \quad \text{or} \quad f(x) = \frac{x^4}{1 + x^4}, \tag{3.39}$$
where the response function f(x) is obtained by introducing the dimensionless concentration x of repressor monomers as $x = \sqrt[4]{K_4}\,[X]$. This is a Hill-type response function (see section 3.1.6) with a Hill coefficient equal to four and a Hill constant that is given by $K_H = K_4^{-1}$ (or KH = 1 when the normalized input x is used).
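The claim that the stepwise equilibria collapse into a single Hill function can be checked numerically. The sketch below evaluates the occupancy once through the three reactions of Eq. 3.34 and once through the overall constant K4 of Eq. 3.38; the numerical values of K1, K2 and K3 are arbitrary illustrative choices, and [X] is taken to be the free monomer concentration.

```python
def occupancy_stepwise(X, K1, K2, K3):
    """Operator occupancy computed through dimer and tetramer intermediates."""
    X2 = K1 * X**2          # dimer concentration from K1 = [X2]/[X]^2
    X4 = K2 * X2**2         # tetramer concentration from K2 = [X4]/[X2]^2
    ratio = K3 * X4         # [O1]/[O0] from K3 = [O1]/([X4][O0])
    return ratio / (1.0 + ratio)

def occupancy_hill(X, K4):
    """Occupancy from the overall reaction 4X + O0 <-> O1 (Eq. 3.39)."""
    return K4 * X**4 / (1.0 + K4 * X**4)

K1, K2, K3 = 2.0, 0.5, 3.0      # arbitrary values, for illustration only
K4 = K1**2 * K2 * K3            # Eq. 3.38

for X in (0.1, 0.5, 1.0, 2.0):
    print(X, occupancy_stepwise(X, K1, K2, K3), occupancy_hill(X, K4))
```

Both columns agree for every input, which is the numerical counterpart of validating Eq. 3.38 by inspection.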
3.3.2 Alternative Reaction Paths
While the three reaction steps in Fig. 3.1, described by Eq. 3.34, are believed to capture the molecular level details of the binding of LacR to O1, there are alternative routes to the formation of the operator-tetramer complex. The lacO operator consists of two half-sites that each make contact with one dimeric subunit of the tetramer. It is therefore a possibility that the binding of the tetramer occurs sequentially, one dimer at a time, rather than in a single step. The resulting alternative paths are illustrated in Fig. 3.6 together with the reaction path (Fig. 3.1) believed to best capture the actual binding of LacR to O1. This reaction path is labeled (i) in Fig. 3.6 and is emphasized by a gray box. In the scenario labeled (ii), a dimer first binds to the left binding site (OA) and a second dimer then binds to the right binding site (OB) to form the tetramer-operator complex. The order of dimer binding is reversed in the reaction path labeled (iii).
Figure 3.6: Alternative reaction paths for the binding of a tetrameric complex to an operator region composed of two adjacent binding sites OA and OB. Reaction path (i) (gray box) corresponds to the binding of the lac repressor (Fig. 3.1). In paths (ii) and (iii), the tetramer-operator complex is formed by the sequential binding of dimers.
The observant reader will notice that the reaction path (ii) describes the binding of
λ CI repressor to the adjacent operators OR1 and OR2 to form a tetrameric complex
between four λ CI monomers and the OR region of the PR /PRM promoters (see section
1.4.2). λ CI will be discussed further in section 3.3.3. Moreover, in the modified
Gal1 promoter (section 2.2.2), two TetR repressor dimers can bind to two adjacent and
identical tetO operators. The dimeric TetR repressor proteins are not known to form
tetramers and their binding to the two operator sites probably follows reaction paths (ii)
and (iii) with identical equilibrium constants for each step. The modified Gal1 promoter
will be discussed further in section 3.4.2.
When O10 and O01 denote configurations where a repressor dimer X2 is bound to the operators OA and OB, respectively, the additional reaction steps can be represented by the symbolic reaction equations:
$$\begin{aligned} X_2 + O_{00} &\overset{K_{2A}}{\rightleftharpoons} O_{10}, & X_2 + O_{10} &\overset{K_{3A}}{\rightleftharpoons} O_{11}, \\ X_2 + O_{00} &\overset{K_{2B}}{\rightleftharpoons} O_{01}, & X_2 + O_{01} &\overset{K_{3B}}{\rightleftharpoons} O_{11}, \end{aligned} \tag{3.40}$$
where O11 is the state where both OA and OB are occupied. The equilibrium constants for the reactions in Eq. 3.40 are defined by:
$$\begin{aligned} K_{2A} &= \frac{[O_{10}]}{[X_2][O_{00}]}, & K_{3A} &= \frac{[O_{11}]}{[X_2][O_{10}]}, \\ K_{2B} &= \frac{[O_{01}]}{[X_2][O_{00}]}, & K_{3B} &= \frac{[O_{11}]}{[X_2][O_{01}]}. \end{aligned} \tag{3.41}$$
Using the same method that was used to derive the overall equilibrium constant K4 for reaction path (i) in Eq. 3.38, the overall equilibrium constants K4A and K4B for path (ii) and (iii), respectively, can be obtained as:
$$K_{4A} = K_1^2 K_{2A} K_{3A}, \qquad K_{4B} = K_1^2 K_{2B} K_{3B}. \tag{3.42}$$
The three different reaction scenarios in Fig. 3.6 have the same overall reaction and the overall equilibrium constant must therefore be identical, i.e., K4 = K4A = K4B. This is a consequence of a principle known as Independence of the Path and of the Gibbs-Helmholtz equation. Independence of the path means that the overall change in free energy is the same regardless of the reaction mechanisms involved in the conversion of the reactants into the products. It is based on the fact that the total energy of a molecule depends only on its present state, not on the details of its past history. The Gibbs-Helmholtz equation then tells us that if two processes have identical values of ∆G◦, they will also have identical equilibrium constants (Eq. 3.13). This has important implications as the parameters associated with two alternative reaction paths will not be independent. In the case of paths (ii) and (iii) in Fig. 3.6A, the equilibrium constants are constrained by:
$$K_{2A}K_{3A} = K_{2B}K_{3B} = K_2 K_3. \tag{3.43}$$
This constraint simply reflects that the change in Gibbs free energy is the same regardless of the reaction path taken. Modelers who do not pay attention to such constraints are likely to make spurious predictions.
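One practical way to respect the path-independence constraint of Eq. 3.43 in a model is to treat only some of the constants as free parameters and compute the rest. The helper below is a hypothetical sketch of that idea, with K2, K3, K2A and K2B chosen as the free parameters; other parametrizations are equally valid.

```python
def sequential_constants(K2, K3, K2A, K2B):
    """Return (K3A, K3B) implied by K2A*K3A = K2B*K3B = K2*K3 (Eq. 3.43)."""
    K3A = K2 * K3 / K2A
    K3B = K2 * K3 / K2B
    return K3A, K3B

# Illustrative numbers only: weakening first-dimer binding to OB (small K2B)
# forces a correspondingly stronger second binding step on path (iii).
K2, K3 = 0.5, 3.0
K2A, K2B = 1.0, 0.1
K3A, K3B = sequential_constants(K2, K3, K2A, K2B)
print(K3A, K3B)
```

Choosing all four sequential constants independently would, in general, violate Eq. 3.43 and give different overall equilibrium constants for paths (ii) and (iii).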
Despite the fact that the three alternative reaction paths have the same overall equilibrium constant, there can be significant differences in their response functions. The response function associated with the binding of a tetramer was already derived in Eq. 3.39. The response function for the formation of a tetramer-operator complex becomes slightly more complicated when the alternative paths in Fig. 3.6 and more operator states are included. In this case, the response function is given by g([X]) = [O11]/[OT] where [OT] = [O00] + [O10] + [O01] + [O11] and O00 is the unoccupied state (denoted O0 in Eq. 3.38). The response function can be derived by using the equilibrium constants K3A, K3B and K4 to express the equilibrium concentrations of O00, O10 and O01 as functions of [O11] and [X]:
$$[O_{00}] = \frac{[O_{11}]}{K_4[X]^4}, \qquad [O_{10}] = \frac{[O_{11}]}{K_1K_{3A}[X]^2}, \qquad [O_{01}] = \frac{[O_{11}]}{K_1K_{3B}[X]^2}. \tag{3.44}$$
The derivation of these relationships uses the definition of K1 to express the concentration of dimers as a function of the concentration of monomers, i.e., $[X_2] = K_1[X]^2$. The equilibrium concentrations for [O00], [O10] and [O01] are then inserted into the expression for the total operator concentration to give:
$$[O_T] = \frac{[O_{11}]}{K_4[X]^4} + \frac{[O_{11}]}{K_1K_{3A}[X]^2} + \frac{[O_{11}]}{K_1K_{3B}[X]^2} + [O_{11}]. \tag{3.45}$$
The response function g([X]) is then obtained by rearrangement, using $K_4/(K_1K_{3A}) = K_1K_{2A}$ and $K_4/(K_1K_{3B}) = K_1K_{2B}$ (Eq. 3.42), as:
$$g([X]) = \frac{K_4[X]^4}{1 + K_1(K_{2A} + K_{2B})[X]^2 + K_4[X]^4}. \tag{3.46}$$
Assume that the equilibrium constants for the binding of a dimer to the OA and OB sites are identical and that the binding of a second dimer to one of the two sites is independent of the occupancy of the other. In this special case, the equilibrium constants K2A, K2B, K3A and K3B have identical values. When the equilibrium constant for the binding of a dimer to any one operator is denoted KAB, it is obtained from Eq. 3.43 that $K_{AB} = \sqrt{K_2K_3}$ (since $K_{2A}K_{3A} = K_{AB}^2 = K_2K_3$). The definition of K4 in Eq. 3.38 then implies that the term K1(K2A + K2B) in Eq. 3.46 can be replaced by $2\sqrt{K_4}$ (since $2K_1K_{AB} = 2K_1\sqrt{K_2K_3} = 2\sqrt{K_1^2K_2K_3}$). Accordingly, the response function is given by:
$$g_S(x) = \frac{x^4}{1 + 2x^2 + x^4}, \tag{3.47}$$
where $x = \sqrt[4]{K_4}\,[X]$. This response function is probably the most suitable description of the modified Gal1 promoter (section 2.2.2) where the TetR repressor dimers are believed to bind to the two tetO operators independently.
The comparison of the response functions f (x) and gS (x) in Fig. 3.7A shows that
the response function f (x) is steeper, reaches the 50% level at a lower value of x and
saturates faster than gS (x). Hence, a switch based on the binding of a tetramer (reaction path i) is more sensitive and robust compared to a switch based on the sequential
binding of dimers (reaction paths ii and iii).
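The comparison between the two response functions can be reproduced with a short numerical sketch. The helpers `half_point` and `hill_slope` are illustrative names introduced here; the slope is evaluated locally at the half-saturation point by a finite difference, which is an assumption of this sketch and not necessarily how the Hill coefficient in the figure was obtained.

```python
import math

def f(x):
    """Tetramer binding, Eq. 3.39."""
    return x**4 / (1.0 + x**4)

def g_S(x):
    """Sequential, independent dimer binding, Eq. 3.47."""
    return x**4 / (1.0 + 2.0 * x**2 + x**4)

def half_point(resp, lo=1e-6, hi=1e3):
    """Bisection (on a log scale) for the x that gives 50% occupancy."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if resp(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def hill_slope(resp, x, h=1e-4):
    """Local slope of ln(r/(1-r)) versus ln(x), by central difference."""
    def logit(u):
        r = resp(u)
        return math.log(r / (1.0 - r))
    return (logit(x * math.exp(h)) - logit(x * math.exp(-h))) / (2.0 * h)

for name, resp in (("f", f), ("g_S", g_S)):
    x_half = half_point(resp)
    print(name, "x0.5 =", round(x_half, 3), "local Hill slope =", round(hill_slope(resp, x_half), 2))
```

In this sketch f reaches 50% occupancy at x = 1 with a slope of 4, while g_S needs a higher x and has a local slope of about 2.34 at its half point, in line with the ≈ 2.3 quoted for the linear fit.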
3.3.3 Cooperative Binding of Dimers
Figure 3.7: (A) Response functions r(y) for the occupancy of the O1 operator when the repressor can bind to the two operator half-sites as a tetramer (r(y) = f (x)) or sequentially as dimers (r(y) = gS (x)). The value of x0.5 is the concentration of repressor monomers that gives 50% occupancy. The curve h(x) is the Hill curve approximation to gS (x). (B) Hill plots constructed from the curves in (A). The Hill coefficients for r(y) = f (x) and r(y) = gS (x) are 4 and ≈ 2.3, respectively.
The Hill plots in Fig. 3.7B demonstrate that a switch that involves sequential and independent binding of dimers functions less efficiently, i.e., has a lower Hill coefficient, compared to a switch where a tetramer is formed in solution rather than on the DNA.
The Hill coefficient associated with the sequential binding of dimers can however be increased by manipulation of the equilibrium constants. The function g([X]) in Eq. 3.46 coincides with the function f ([X]) in Eq. 3.39 when K2A and K2B become vanishingly small. The Hill coefficient can thus be increased from its value of ≈ 2.3 when all the binding constants are equal to KAB (the function gS (x)) to a maximal value of 4 (the function f (x)) by decreasing the affinity of OA and OB for the binding of the first dimer. This in turn implies that the binding of a dimer to one site can no longer be independent of the occupancy of the other site. Since the products K2A K3A and K2B K3B must be constant (and equal to the product K2 K3, Eq. 3.43), a decrease of K2A (or of K2B) must be accompanied by a corresponding increase of K3A (or of K3B). In other words, a more efficient switch can be obtained when a dimer that is bound to one site participates in the stabilization of the interaction between a second dimer and its binding site. Such synergism is observed frequently between cis and trans regulatory elements and typically leads to improved switching properties. In terms of energetics, synergism implies that the decrease in free energy associated with the binding of two dimers to the regulatory region is greater than the sum of the decreases in free energy associated with the dimer-dimer, the dimer-OA, and the dimer-OB interactions. This reflects that most of the free energy is released when the tetramer-operator complex is formed.
The binding of λ CI dimers to the OR1 and OR2 sites in the operator region of the PR /PRM promoters is one example of how synergy between cis and trans regulatory elements can improve the performance of a genetic switch. As mentioned above, the formation of a tetramer-operator complex in the OR region is believed to occur through the reaction path labeled (ii) in Fig. 3.6. Recall from section 1.4.2 that the first CI dimer binds preferentially to OR1 (OA in Fig. 3.6) and that the binding of a CI dimer to OR2 (OB in Fig. 3.6) is dependent on the presence of a CI dimer bound to OR1. These observations imply that K2B ≪ K2A and that K3A ≫ K2A. In other words, a CI dimer bound to OR1 increases the equilibrium constant for the binding of a CI dimer to OR2 from a value that is significantly lower than K2A to a value that is significantly higher than K2A (roughly 10-fold higher). This makes for a more efficient switch that has a higher Hill coefficient than the switch where the dimers bind independently of each other.
Figure 3.8: (A) Response functions g(x) (Eq. 3.48) obtained for different values of σ and γ. The function gS (x) is recovered when σ = γ = 1 (broken curve). The steepness increases and x0.5 decreases as σ and γ increase. Parameter values: σ = 1, γ = 10 (red curve), σ = 10, γ = 1 (blue curve), σ = 10, γ = 10 (green curve). (B) Hill plots obtained for the different values of σ and γ. The Hill coefficients obtained by linear fitting are: nH = 2.3 (broken curve), nH = 2.8 (red curve), nH = 2.9 (blue curve) and nH = 3.3 (green curve).
To quantify the increase in the Hill coefficient, assume that the binding of the first CI dimer to OR2 is associated with an equilibrium constant that is reduced by a factor of γ compared with that for OR1, i.e., $K_{2B} = \gamma^{-1}K_{2A}$. With this assumption, the term K1(K2A + K2B) in Eq. 3.46 can be replaced by $(1 + \gamma^{-1})K_1K_{2A}$. In addition, assume that the binding constant for the association of a second CI dimer to OR2 is increased by a factor σ compared to the binding constant for the association of the first dimer to OR1, i.e., $K_{3A} = \sigma K_{2A}$. From $K_{2A}K_{3A} = \sigma K_{2A}^2$ and the constraint $K_{2A}K_{3A} = K_2K_3$, it follows that $K_{2A} = \sqrt{\sigma^{-1}K_2K_3}$, so the term $(1 + \gamma^{-1})K_1K_{2A}$ can be replaced by $(1 + \gamma^{-1})\sqrt{\sigma^{-1}}\,K_1\sqrt{K_2K_3}$. In the final step, the definition of K4 is used to obtain the response function given by:
$$g(x) = \frac{x^4}{1 + (1 + \gamma^{-1})\sqrt{\sigma^{-1}}\,x^2 + x^4}, \tag{3.48}$$
where $x = \sqrt[4]{K_4}\,[X]$. Constructing the Hill plot using data points in the range 0.1 < g(x) < 0.9 for different values of γ and σ produces curves that can be fitted reasonably well to straight lines (Fig. 3.8B). The slope of the line, i.e., the Hill coefficient, obtained
for σ = 10 and γ = 10 has a value of 3.3, which is a significant improvement over the value of 2.3 obtained when the dimers bind independently to OR1 and OR2.
Figure 3.9: The formation of a CAP-DNA-polymerase complex can occur through three reaction paths: (i) binding of the RNA polymerase holoenzyme to the promoter when a CAP dimer is bound to the DNA, (ii) binding of a CAP-polymerase complex and (iii) binding of polymerase followed by binding of CAP.
A detailed treatment of the interaction between λ CI and operator elements in the PR promoter is given by Isaacs et al. and references therein.
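The Hill-plot fitting described for Eq. 3.48 can be sketched in a few lines. The sampling scheme used here (81 occupancy levels spaced evenly between 0.1 and 0.9, followed by an ordinary least-squares line) is an assumption of this sketch, so the fitted slopes need not match the quoted Hill coefficients exactly; the function names are illustrative.

```python
import math

def coupling(sigma, gamma):
    """Coefficient of x^2 in the denominator of Eq. 3.48."""
    return (1.0 + 1.0 / gamma) / math.sqrt(sigma)

def g(x, sigma=1.0, gamma=1.0):
    """Response function of Eq. 3.48."""
    c = coupling(sigma, gamma)
    return x**4 / (1.0 + c * x**2 + x**4)

def hill_fit(sigma, gamma, n=81):
    """Least-squares slope of ln(g/(1-g)) versus ln(x) for 0.1 <= g <= 0.9."""
    c = coupling(sigma, gamma)
    xs, ys = [], []
    for i in range(n):
        level = 0.1 + 0.8 * i / (n - 1)
        R = level / (1.0 - level)
        # x^2 solving x^4 = R (1 + c x^2), i.e. g(x) = level
        x2 = 0.5 * (R * c + math.sqrt((R * c) ** 2 + 4.0 * R))
        xs.append(0.5 * math.log(x2))
        ys.append(math.log(R))
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

for sigma, gamma in ((1, 1), (1, 10), (10, 1), (10, 10)):
    print(f"sigma={sigma:2d} gamma={gamma:2d} fitted Hill slope = {hill_fit(sigma, gamma):.2f}")
```

The fitted slope lies between 2 and 4 in every case and increases as σ and γ increase, which is the qualitative trend shown in the Hill plots.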
3.3.4 Synergism in RNA Polymerase Binding
The recruitment of the RNA polymerase to the promoter of the lactose operon by the
CAP transcription factor (section 1.4.1) can be described by the reaction mechanism
illustrated in Fig. 3.9. In the reaction scheme, there are three alternative paths to the
formation of a closed polymerase-promoter complex. In the reaction path labeled (i), a
CAP dimer binds to its operator site and the RNA polymerase is subsequently recruited
to form the CAP-polymerase-operator complex. The order of CAP and polymerase is
reversed in the reaction path labeled (iii). In reaction path (ii), the CAP protein and
the polymerase form a complex prior to the formation of the CAP-polymerase-operator
complex. Note that the overall reaction scheme is very similar to the one illustrated in
Fig. 3.6. The most significant difference is the substitution of one protein dimer with
the polymerase holoenzyme.
The three different reaction paths in Fig. 3.9 have the same overall reaction. When
A and P are used to denote CAP monomers and polymerase holoenzymes, the overall
reaction can be represented by a fourth-order elementary reaction:
$$2A + P + O_{00} \overset{K'_4}{\rightleftharpoons} O_{11}, \qquad K'_4 = \frac{[O_{11}]}{[A]^2[P][O_{00}]}, \tag{3.49}$$
where O00 denotes the operator region with an unoccupied CAP site and an unoccupied promoter and O11 denotes the state where both of these sites are occupied.
To derive the response function, it is only necessary to consider a subset of the reactions since the system is overdetermined. In addition to Eq. 3.49, the intermediate reaction steps from Fig. 3.9 that need consideration are given by:
$$\begin{aligned} 2A &\overset{K'_1}{\rightleftharpoons} A_2, & A_2 + O_{00} &\overset{K'_2}{\rightleftharpoons} O_{10}, \\ P + A_2 &\overset{K'_{2A}}{\rightleftharpoons} A_2P, & O_{00} + P &\overset{K'_{2B}}{\rightleftharpoons} O_{01}, \end{aligned} \tag{3.50}$$
where O10 and O01 indicate an occupied CAP site and an occupied promoter, respectively, and A2P is the CAP-polymerase complex. The equilibrium constants for these reactions are given by:
$$\begin{aligned} K'_1 &= \frac{[A_2]}{[A]^2}, & K'_2 &= \frac{[O_{10}]}{[A_2][O_{00}]}, \\ K'_{2A} &= \frac{[A_2P]}{[A_2][P]}, & K'_{2B} &= \frac{[O_{01}]}{[P][O_{00}]}. \end{aligned} \tag{3.51}$$
Before proceeding with the derivation of the relative promoter occupancy, it is worth commenting on the reaction path (ii) in which the activator forms a complex with the RNA polymerase holoenzyme prior to the binding to the promoter. At equilibrium, the fraction of polymerases that are bound with activator, [A2P]/([P] + [A2P]), is given by:
$$\frac{[A_2P]}{[P] + [A_2P]} = \frac{K'_{2A}[A_2]}{1 + K'_{2A}[A_2]}. \tag{3.52}$$
If $K'_{2A}$ has any appreciable value, a significant fraction of the polymerases will have a CAP dimer attached and the CAP-polymerase could be viewed as a type of holoenzyme in which the CAP component provides specificity to promoters with an adjacent CAP binding site. Holoenzymes that contain an additional activator have, to the best of my knowledge, not been reported in the literature. As a result, the value of $K'_{2A}$ will be assumed to be negligibly small, corresponding to a small negative (or perhaps a positive) value of ∆G◦ for the interaction between CAP and the holoenzyme. The difficulties in pinpointing the exact constituents of the eukaryotic RNA polymerase II holoenzyme could be an indication that the staged assembly of the pre-initiation complex (sections 1.3.2 and 1.4.3) might involve a number of alternative reaction paths.
When $K'_{2A}$ is set equal to zero and the concentration of free RNA polymerase holoenzyme is assumed constant, [P] = cP, the relative occupancy of the promoter can be obtained from ([O01] + [O11])/[OT] with [OT] = [O00] + [O10] + [O01] + [O11], as:
$$h(a) = \frac{\sigma_B + a^2}{1 + \sigma_B + (1 + \gamma_B)a^2}, \tag{3.53}$$
where $a = \sqrt{K'_4 c_P}\,[A]$ is the normalized concentration of CAP monomers and the new parameters σB and γB are defined by:
$$\sigma_B = K'_{2B}c_P, \qquad \gamma_B = \frac{K'_1K'_2}{K'_4c_P} = \frac{1}{K'_3c_P}. \tag{3.54}$$
Here $K'_3 = K'_4/(K'_1K'_2)$ is the equilibrium constant for the binding of the polymerase to the promoter when CAP is bound.
Figure 3.10: (A) Relative promoter occupancies predicted by Eq. 3.53 for different values of σB and γB. Parameter values: σB = 0, γB = 0 (green curve), σB = 0, γB = 0.5 (blue curve), σB = 0.5, γB = 0 (red curve), σB = 0.5, γB = 0.5 (broken curve). (B) Hill plots of the different parameter values with h∗ = σB /(1 + σB ). Only curves obtained for low values of σB can be approximated well by a Hill-type function.
Different relative occupancies for different values of σB and of γB are shown in Fig. 3.10A. The most efficient switch is obtained in the limit where σB and γB are negligibly small. The Hill plot generated by h(a) shows straight lines with a slope of two when σB = 0 (Fig. 3.10B). For σB > 0, a linear fit to the curves in the Hill plot gives lines with slopes that are less than two. Moreover, high values of h(a) require that γB is low (Fig. 3.10A). These observations have clear biological interpretations (see Fig. 3.9). If the value of $K'_{2B}$ (and hence σB) is high, the polymerase will bind to the promoter in the absence of the activator, h(a = 0) = σB /(1 + σB ). This gives rise to “leaky” expression and a decreased Hill coefficient. If $K'_3$ is low (corresponding to a high value of γB), the activator is ineffective and high occupancies cannot be achieved. Taken together with the previous argument that $K'_{2A}$ is low, the most efficient switch is obtained under the following conditions: (1) the interaction between the activator and the polymerase must be weak, (2) the interaction between the polymerase and the promoter must be weak, and (3) the interaction between the polymerase and the promoter with CAP bound must be strong. In other words, a more efficient switch is obtained when the activator and the promoter operate synergistically.
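The two limiting behaviours discussed above, leaky expression at a = 0 and incomplete saturation at high a, follow directly from Eq. 3.53. The sketch below evaluates them for the parameter values used in Fig. 3.10; the function name `promoter_occupancy` is an illustrative choice.

```python
def promoter_occupancy(a, sigma_B, gamma_B):
    """Relative promoter occupancy h(a) of Eq. 3.53."""
    return (sigma_B + a**2) / (1.0 + sigma_B + (1.0 + gamma_B) * a**2)

for sigma_B, gamma_B in ((0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5)):
    leak = promoter_occupancy(0.0, sigma_B, gamma_B)        # equals sigma_B / (1 + sigma_B)
    plateau = promoter_occupancy(1e6, sigma_B, gamma_B)     # approaches 1 / (1 + gamma_B)
    print(f"sigma_B={sigma_B} gamma_B={gamma_B} leak={leak:.3f} plateau={plateau:.3f}")
```

A non-zero σB raises the baseline, while a non-zero γB lowers the maximal occupancy that can be reached, so the widest dynamic range is obtained when both are small.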
3.3.5 DNA looping
Synergism implies cooperative interactions between multiple components and typically manifests as an increase of the Hill coefficient. The two previous sections demonstrated this phenomenon for the binding of λ CI dimers to the OR region and in the CAP-dependent binding of the RNA polymerase, respectively. In section 1.4.1, it was discussed how the auxiliary operators O2 and O3 affect repression of the lactose operon by forming a DNA loop in which a repressor tetramer is bound to O1 and to any one of the auxiliary operators. One explanation for the increased efficiency of the LacR tetramer is that the looped DNA acts as a physical barrier that prevents the RNA polymerase from finding the promoter region. Another possible explanation is that the DNA-looping, particularly the loop formed between O1 and O2, prevents the RNA polymerase from moving down the gene after it successfully has initiated transcription from the Plac promoter. However, the observed increase in repression may also be a consequence, at least in part, of a change in the equilibrium distribution of promoter occupancies due to the additional repressed states that are possible when one of the auxiliary operators can cooperate with the main operator, without necessarily acting in synergy.
The derivation of the response function becomes quite messy when the system contains a large number of binding sites. There are $2^3 = 8$ possible states of the Plac
promoter that do not involve the formation of a loop and each of the three different
loops (O1-O2, O1-O3 and O2-O3) can have the third operator either occupied or unoccupied by the repressor. This gives a total of 14 possible states. For simplicity, what
follows will assume the presence of only one auxiliary operator site, O3. However, the
method used to derive the response function in this simpler case can readily be extended
to a system that contains an arbitrary number of binding sites. It is advisable to employ
software capable of symbolic algebra when deriving the response functions of larger
systems.
The occupancy of the O1 and O3 sites can be described using the notation Oij where i is the number of repressor molecules bound to the main operator and j is the number of repressor molecules bound to the auxiliary operator. In the absence of DNA-looping, the distance between operator sites makes it reasonable to assume that the binding of the repressor to one site is independent of the occupancy of the other. The binding reactions can then be described by:
$$O_{00} + X_4 \overset{K_3}{\rightleftharpoons} O_{10}, \qquad O_{01} + X_4 \overset{K_3}{\rightleftharpoons} O_{11}, \tag{3.55}$$
$$O_{00} + X_4 \overset{\sigma K_3}{\rightleftharpoons} O_{01}, \qquad O_{10} + X_4 \overset{\sigma K_3}{\rightleftharpoons} O_{11}, \tag{3.56}$$
where the equilibrium constant for the binding of the repressor to the auxiliary operator is expressed relative to that for the main operator site. The auxiliary operator is typically weaker than the main operator, i.e., σ < 1. The equilibrium concentrations of the states O01, O10 and O00 are accordingly given by:
$$[O_{01}] = \frac{[O_{11}]}{K_3[X_4]}, \qquad [O_{10}] = \frac{[O_{11}]}{\sigma K_3[X_4]}, \qquad [O_{00}] = \frac{[O_{10}]}{K_3[X_4]} = \frac{[O_{11}]}{\sigma K_3^2[X_4]^2}. \tag{3.57}$$
The formation of a DNA loop C can occur through two reaction paths; a repressor
molecule bound to the main operator can make contact with the auxiliary operator, or
vice-versa. The corresponding reactions are:
$$O_{10} \overset{K_L}{\rightleftharpoons} C, \qquad O_{01} \overset{K'_L}{\rightleftharpoons} C. \tag{3.58}$$
The equilibrium concentrations of O10 and of O01 in Eq. 3.57 can be used to derive two different expressions for the equilibrium concentration of C in terms of [O11]. One involves the equilibrium constant $K_L$, the other the equilibrium constant $K'_L$. These expressions are given by:
$$K_L = \frac{[C]}{[O_{10}]} \;\Rightarrow\; [C] = K_L[O_{10}] = \frac{K_L[O_{11}]}{\sigma K_3[X_4]}, \tag{3.59}$$
$$K'_L = \frac{[C]}{[O_{01}]} \;\Rightarrow\; [C] = K'_L[O_{01}] = \frac{K'_L[O_{11}]}{K_3[X_4]}. \tag{3.60}$$
Since the equilibrium concentration is independent of the chosen path, this implies that $K_L = \sigma K'_L$. It is not necessary to impose this constraint explicitly (since the system is overdetermined).
The relative occupancy of the main operator can be obtained from the definitions
of the equilibrium constants. The total concentration of DNA molecules that carry the
regulatory region is given by:
$$[O_T] = [O_{00}] + [O_{10}] + [O_{01}] + [O_{11}] + [C], \tag{3.61}$$
which, based on the equilibrium concentrations, can be used to express [O11] as a function of [X4] and [OT]:
$$[O_{11}] = \frac{\sigma K_3^2[X_4]^2[O_T]}{1 + (1 + \sigma + K_L)K_3[X_4] + \sigma K_3^2[X_4]^2}. \tag{3.62}$$
The equilibrium concentrations of O10 and C can now be used to obtain the total concentration [RT] of states in which the main operator is occupied and the promoter is repressed:
$$\begin{aligned} {[R_T]} &= [O_{11}] + [O_{10}] + [C] = [O_{11}]\left(1 + \frac{1}{\sigma K_3[X_4]} + \frac{K_L}{\sigma K_3[X_4]}\right) \\ &= [O_T]\,\frac{(1 + K_L)K_3[X_4] + \sigma K_3^2[X_4]^2}{1 + (1 + \sigma + K_L)K_3[X_4] + \sigma K_3^2[X_4]^2}. \end{aligned} \tag{3.63}$$
The response function f ([X]) = [RT ]/[OT ] is then obtained from the equilibrium concentration of X4 ($[X_4] = K_1^2K_2[X]^4$) and the definition of K4 ($K_4 = K_1^2K_2K_3$) as:
$$f(x) = \frac{(1 + K_L + \sigma x^4)x^4}{(1 + \sigma x^4) + (1 + K_L + \sigma x^4)x^4}, \tag{3.64}$$
where x is the normalized concentration of repressor monomers defined by $x = \sqrt[4]{K_4}\,[X]$.
Figure 3.11: (A) Response functions in the absence (green curve) and presence (blue and red curves) of DNA looping. Parameter values KL = 0 (green curve), KL = 10, σ = 0 (blue curve), KL = 10, σ = 1 (red curve). (B) Hill plots demonstrating that the Hill coefficients are not affected by DNA looping. The slopes of the straight lines for KL = 10 (blue) and KL = 0 (green) are identical and equal to 4. Increased affinity for the auxiliary operator, i.e., increased σ, decreases the Hill coefficient. The value of nH is 3.3 for σ = 1, corresponding to equal binding affinities of O1 and O2.
There are some interesting observations that can be made based on the response function in Eq. 3.64. First of all, the response function:
$$f(x) = \frac{x^4}{1 + x^4}, \tag{3.65}$$
associated with the binding to a single operator (Eq. 3.39), is recovered when the DNA loop is unable to form, i.e., when KL = 0 (the terms $1 + \sigma x^4$ and $1 + K_L + \sigma x^4$ in Eq. 3.64 cancel each other when KL = 0). This is due to the independent binding of the repressor to the two operator sites, i.e., it is due to the lack of cooperativity in the system. Figure 3.11A shows plots of f (x) obtained for different values of σ and KL. From these plots it appears as if the response becomes more nonlinear as KL increases. However, this is a deception introduced by a shift of the monomer concentration x0.5 that gives 50% saturation to a lower value. This becomes apparent when Hill plots are constructed with the data points also used to draw the curves in Fig. 3.11A. As seen in Fig. 3.11B, the curves are linear in the Hill plot and have identical slopes but different intercepts. This can also be obtained directly from Eq. 3.64. When σ is small and the term $\sigma x^4$ is negligible, the response function can be rearranged into a form that coincides with the standard form of the Hill curve (Eq. 3.21):
$$f(x) \approx \frac{(1 + K_L)x^4}{1 + (1 + K_L)x^4} = \frac{x^4}{K_H + x^4}, \tag{3.66}$$
where $K_H = (1 + K_L)^{-1}$. In other words, the Hill constant (and the value of x0.5) decreases as KL increases. The Hill coefficient is independent of KL. The only significant deviation from the standard form of the Hill curve occurs when σ is relatively large and the binding affinity for the auxiliary operator is close to that of the main operator. In this case, however, the Hill coefficient is decreased (Fig. 3.11B).
The discussion above has demonstrated that DNA looping does not increase the Hill coefficient, which is the usual measure of cooperativity. Rather, the formation of the DNA loop state C causes a shift in the equilibrium distribution of the different repressor-operator states and results in a decreased value of the Hill constant. Because an increase in the concentration of C necessarily is associated with a decrease in the concentration of all of the other states, the ability to form a DNA loop shifts the equilibrium distribution toward the states in which the main operator is occupied by a repressor. As a result, the higher the stability of the DNA loop complex, i.e., the higher the value of KL, the lower is the repressor concentration that is required to achieve the same level of occupancy of the main operator. This has in the literature been explained in terms of a local increase in the concentration of the repressor. Indeed, Eq. 3.65 is recovered from Eq. 3.66 when introducing the rescaling $y = \sqrt[4]{1 + K_L}\,x$ where y is the “local” or effective repressor concentration. The effective concentration is always greater than the actual concentration if the loop structure is able to form, i.e., when KL > 0. An alternative treatment of DNA looping is given by Vilar & Leibler.
3.4 Models of Gene Regulatory Systems
3.4.1 The Lactose Operon in E. coli
The Plac promoter of the lactose operon in E. coli depends on two distinct input signals:
the activity of the CAP transcriptional activator and the activity of the LacR transcriptional repressor. Both CAP and LacR are naturally present in E. coli and their activity
is regulated by inducers. CAP is activated in the presence of cAMP and LacR is inhibited by allolactose (or by the artificial inducer IPTG). High cAMP usually signals the
absence of glucose and the purpose of CAP is to activate genes that are required for
the bacterium to utilize alternative sources of energy. On the other hand, the purpose
of LacR is to prevent the expression of the lactose operon genes when lactose is not
available. The Plac promoter thus has the computational logic of an AND NOT gate.
Expression of the lactose operon genes is high in the presence of lactose AND NOT
glucose. Since intracellular cAMP concentrations are inversely correlated with the concentration of glucose, the Plac promoter has computational logic corresponding to an AND gate
in terms of the signals cAMP and IPTG. Expression is high when the concentrations
of IPTG AND of cAMP are high. A recent study by Setty et al. investigated the AND
gate operation of the Plac promoter at different extracellular concentrations of cAMP
and IPTG. They measured transcription of the gfp gene encoding green fluorescent protein (GFP) from a Plac promoter and correlated the population-averaged fluorescence
with a model of Plac cis-regulatory dynamics.
Figure 3.12: (A) Experimental measurements of Plac activity under 96 different combinations
of inducer concentrations. (B) Experimental data points presented as a smoothed surface.
Modified from Setty et al. without permission. © The National Academy of Sciences.
The Plac promoter activity was measured using an engineered plasmid system. The system is based on a
232 bp region that includes most of the cis-regulatory elements of the wild type lactose operon and extends 130 bp into the lacZ
gene. Recall (section 1.4.1) that the main lacO operator O1 is centered at position +9.
It is therefore important that some part of the lacZ gene (and its 5’ UTR) is included.
The 232 bp fragment, which lacks the auxiliary O2 operator, was fused to the gfpmut2
gene that encodes a variant green fluorescent protein. This artificial reporter system
was transformed into E. coli using a low-copy plasmid carrying the SC101 origin of
replication and the gene that endows kanamycin resistance to the host cell. The experiments were carried out using a 96-well microplate containing all possible combinations
of 8 and 12 different concentrations of IPTG and cAMP. Microplate measurements are
a convenient and rapid way to obtain large amounts of data as they allow simultaneous
detection of population-averaged expression in a variety of conditions and over an extended period of time. The measurement time is limited by the time it takes for cells
to reach a critical density where they will stop dividing at regular intervals and enter
a stationary growth phase. Cell density is usually measured as the absorbance of light
at 600 nm as it passes through a cell suspension 10 mm in depth and is reported in
units of optical density (OD). In the Setty et al. experiment, GFP fluorescence at 535
nm was measured over two cell division cycles, i.e., the time it takes the cell density
to increase by a factor of four, during mid-exponential growth. The promoter activity was determined by the change in GFP fluorescence normalized by the cell density
(d[GFP]/dt/OD600 ).
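The quantity (d[GFP]/dt)/OD600 can be estimated from a microplate time course. The sketch below uses central finite differences, which is an assumption made here for illustration and not necessarily the procedure of Setty et al. The time course is synthetic (a hypothetical exponentially growing culture with a constant production rate per OD unit), so the estimate can be compared with a known value.

```python
import math

def promoter_activity_from_timecourse(times, gfp, od):
    """Central-difference estimate of (d[GFP]/dt)/OD600 at interior time points."""
    activity = []
    for k in range(1, len(times) - 1):
        dgfp_dt = (gfp[k + 1] - gfp[k - 1]) / (times[k + 1] - times[k - 1])
        activity.append(dgfp_dt / od[k])
    return activity

def synthetic_timecourse(rate, doubling_time, od_start, dt, n_points):
    """Hypothetical exponential culture whose GFP is produced at `rate` per OD unit."""
    mu = math.log(2.0) / doubling_time            # specific growth rate
    times = [k * dt for k in range(n_points)]
    od = [od_start * math.exp(mu * t) for t in times]
    gfp = [rate * value / mu for value in od]     # solves d[GFP]/dt = rate * OD
    return times, gfp, od

if __name__ == "__main__":
    # Two doublings (a fourfold increase in density) sampled every 5 time units.
    times, gfp, od = synthetic_timecourse(100.0, 30.0, 0.05, 5.0, 13)
    print(promoter_activity_from_timecourse(times, gfp, od))
```

With real data the same function would be applied to the measured fluorescence and optical density of each well; noise would normally call for smoothing before differentiation.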
Expression activity from the Plac promoter measured for 96 different combinations
of inducer concentrations is shown in Fig. 3.12A. In Fig. 3.12B the experimental data
are represented as a smoothed surface. This representation allows for the identification
of four distinct regions where the promoter activity is roughly the same. The promoter
activity is low at low concentrations of IPTG and cAMP (plateau I) and high when the
concentrations of both the inducers are high (plateau II). This is what is expected from
the AND operation of the Plac promoter. The presence of two additional plateaus (III
and IV) demonstrates that the Plac promoter does not operate as a perfect AND gate.
At high concentrations of IPTG and low concentrations of cAMP, the promoter activity
reaches nearly 50% of its maximal value (plateau III). This considerable level of expression in the absence of cAMP indicates that the CAP transcriptional activator is not
required for transcription of the lac operon genes, i.e., that the expression is “leaky”
(see section 3.3.4). A significant increase in promoter activity (to about 20% of maximal) is also observed when the concentration of cAMP is high and the concentration
of IPTG is low (plateau IV). There are several possible explanations for the presence of
this plateau. Perhaps the presence of CAP increases the rate of transcription when the
polymerase and the repressor are bound at the same time. It is also possible that CAP
and LacR are mutually exclusive such that the occupancy of the O1 operator decreases
at increasing concentrations of cAMP. A third possibility is that the polymerase and
LacR are mutually exclusive. Since CAP-cAMP may increase the affinity of the polymerase for the promoter, an increased cAMP concentration could shift the equilibrium
distribution toward the state where the CAP site and the promoter are occupied and O1
is unoccupied.
The cis-regulatory dynamics of the Plac-gfp fusion system involve four different
control elements: the Plac promoter where the RNA polymerase binds, the O1 and O3
operators where LacR binds and the CAP site where the activated CAP-cAMP binds.
The third LacR operator O2 lies within the part of the lacZ gene that is replaced by the
gfp gene. The cis-regulatory state can thus be described by a binary string of length
four where each entry is one or zero depending on the occupancy of the corresponding
site. This gives a total of 2^4 = 16 configurations. In addition to these states, the LacR
tetramer can facilitate the formation of a DNA loop by simultaneously binding to O1
and O3. This gives rise to an additional four states as CAP and the polymerase in
principle are able to bind to their respective sites even when a DNA loop is formed.
While the probability that CAP-cAMP and the polymerase actually bind to the DNA
in its looped conformation may be very low, these additional states might be considered
in a comprehensive analysis.
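The state counting above can be reproduced by direct enumeration. This is a small sketch in which the site names and the dictionary representation are choices made for illustration. It lists the 16 unlooped occupancy patterns, the 4 looped states, and the patterns of any reduced set of sites.

```python
from itertools import product

SITES = ("CAP", "O1", "O3", "promoter")

def unlooped_configurations(sites=SITES):
    """All binary occupancy patterns of the given independent sites."""
    return [dict(zip(sites, bits)) for bits in product((0, 1), repeat=len(sites))]

def looped_configurations():
    """States in which one LacR tetramer bridges O1 and O3 (both occupied)."""
    return [
        {"CAP": cap, "O1": 1, "O3": 1, "promoter": pol, "loop": True}
        for cap, pol in product((0, 1), repeat=2)
    ]

def label(config):
    """Render a configuration in the O_ijk notation (CAP, O1, promoter)."""
    return "O{CAP}{O1}{promoter}".format(**config)

if __name__ == "__main__":
    print(len(unlooped_configurations()), len(looped_configurations()))
    print([label(c) for c in unlooped_configurations(("CAP", "O1", "promoter"))])
```

Passing a three-site tuple, as in the last line, gives the configurations that remain when the O3 site is dropped from the description.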
The number of configurations of the Plac regulatory region can be reduced further
when it is assumed that the binding of LacR to O3 has marginal effect on the occupancy
of the promoter. The justification for such an approximation is that DNA looping can be
accounted for by rescaling the effective LacR concentration (see section 3.3.5). Moreover,
the O3 site is located at position -83 and might not interfere directly with the binding of
the polymerase holoenzyme to the promoter. Interference with CAP is a possibility that
cannot be ruled out. However, the O3 site has a binding affinity that is significantly
lower than that of the O1 site and O3 will probably only be occupied to a significant
extent at very high concentrations of LacR. With these assumptions, the configuration
of the cis-regulatory region can be described by a binary vector of length three. Figure 3.13 illustrates the 8 possible configurations of the cis-regulatory region and the
transitions between them. The different states are symbolized by the variables Oijk,
where i, j and k are either zero or one and denote the occupancy of the CAP site,
the O1 operator and the promoter, respectively.
Figure 3.13: Model of cis-regulatory dynamics of the Plac promoter. The index of Oijk
gives the occupancy of the CAP site (i), the O1 operator (j) and the promoter (k). Black
arrows indicate reversible reactions. Grey arrows indicate the return to configuration Oij0
after transcription initiation from configuration Oij1.
Transcription of the gfp gene can be initiated from any one of the configurations
where the promoter is occupied, i.e., from the states Oij1 with i = 0, 1 and j = 0, 1.
This gives rise to “basal” or “leaky” transcription in the absence of an activator or
at saturating concentrations of a repressor. In an ideal switch, there would only be
transcription from the configuration O101 because the polymerase would be unable to
occupy the promoter in the absence of the activator or when the repressor is bound. In
other words, the configurations O001, O011 and O111 would not exist. The observation
of basal transcription indicates that (1) binding of the polymerase to the promoter can
occur without the assistance of the activator and (2) the RNA polymerase can bind to
the promoter and initiate transcription even when the repressor is bound to its operator.[1]
The rate constant associated with transcription initiation from the Oij1 configuration is denoted by αij κM where κM is the maximal rate of transcription. This is
done because the rate of transcription initiation might be affected by the presence of
CAP-cAMP or LacR. CAP-cAMP could in principle affect the rate of open complex
formation or the transition to an elongating complex and LacR could in principle repress transcription by preventing transcription elongation and/or the formation of the
open complex. It is expected that the rate of transcription is maximal from the O101
configuration and that this state is associated with the maximal rate κM, i.e., α10 = 1.
[1] The model presented here is slightly different from that presented by Setty et al., which assumes that
leaky transcription occurs from three configurations S, SC and SR corresponding to O000, O100 and
O010 in the present representation, respectively. Moreover, the states O011, O101 and O111 are not
present in the Setty et al. model.
In order to calculate the relative occupancy of the different configurations in the
quasi-steady state it is only necessary to consider a total of seven reactions (since the
system is overdetermined). Three of these reactions are equilibrium reactions while the
remaining four are Michaelis-Menten reactions. These reactions are shown in Fig. 3.13
where A, R and P are used to denote CAP-cAMP, LacR and the RNA polymerase,
respectively. The equilibrium constants KA, KR and KAR for the reversible binding of
A and R are defined by:
$$ K_A = \frac{[O_{100}]}{[O_{000}][A]}, \qquad K_R = \frac{[O_{010}]}{[O_{000}][R]}, \qquad K_{AR} = \frac{[O_{110}]}{[O_{010}][A]} \equiv \gamma K_A, \tag{3.67} $$
where the subscript eq has been omitted for clarity and γ is defined by KAR/KA.
The model thus assumes (Occam’s razor) that the binding of CAP dimers and LacR
tetramers to their respective binding sites is appropriately described by second-order
elementary reactions. The constants Kij associated with the Michaelis-Menten type
reactions are defined by:
$$ K_{ij} = \frac{k_{f,ij}}{k_{b,ij} + \alpha_{ij}\kappa_M} = \frac{[O_{ij1}]}{[O_{ij0}][P]} \equiv \sigma_{ij} K_{10}, \tag{3.68} $$
where σij is defined by Kij/K10.
The concentrations of the CAP-cAMP transcriptional activator A and the LacR transcriptional repressor R depend on the concentrations of the inducers cAMP and IPTG.
This dependency is approximated by Hill-type functions:
$$ \mathcal{A} = \frac{[A]}{[A_T]} = \frac{[\mathrm{cAMP}]^n}{K_{cAMP}^n + [\mathrm{cAMP}]^n}, \qquad \mathcal{R} = \frac{[R]}{[R_T]} = \frac{K_{IPTG}^m}{K_{IPTG}^m + [\mathrm{IPTG}]^m}, \tag{3.69} $$
where [AT ] and [RT ] are the total concentrations of CAP dimers and LacR tetramers,
respectively, while KcAM P and KIP T G are the extracellular concentrations of cAMP
and IPTG that give 50% activity of CAP (A = 0.5) and LacR (R = 0.5), respectively.
This approximation ignores all the molecular details of how the intracellular concentrations of cAMP and IPTG are regulated. In this context, it is worth mentioning recent
studies by Yildirim and Mackey and by Vilar et al. who investigated models of the
feedback mechanism that governs the LacY-mediated import of lactose, its conversion
by LacZ into allolactose and subsequent up-regulation of lacZ and lacY expression by
the suppression of LacR activity by allolactose.
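The two input functions of Eq. 3.69 are simple to evaluate. The sketch below is one possible coding of them; the function names are hypothetical, and the constants used in the example call are placeholders that were not fitted to any data.

```python
def cap_activity(camp, k_camp, n):
    """Fraction of active CAP: [cAMP]^n / (K_cAMP^n + [cAMP]^n), as in Eq. 3.69."""
    return camp ** n / (k_camp ** n + camp ** n)

def lacr_activity(iptg, k_iptg, m):
    """Fraction of active LacR: K_IPTG^m / (K_IPTG^m + [IPTG]^m), as in Eq. 3.69."""
    return k_iptg ** m / (k_iptg ** m + iptg ** m)

def input_grid(camp_levels, iptg_levels, k_camp, n, k_iptg, m):
    """Map every (cAMP, IPTG) pair to the dimensionless inputs (A, R)."""
    return {
        (camp, iptg): (cap_activity(camp, k_camp, n), lacr_activity(iptg, k_iptg, m))
        for camp in camp_levels
        for iptg in iptg_levels
    }

if __name__ == "__main__":
    # Placeholder constants, chosen only to exercise the functions.
    grid = input_grid([0.0, 1.0, 10.0], [0.0, 0.1, 1.0], 1.0, 2.0, 0.1, 2.0)
    for key in sorted(grid):
        print(key, grid[key])
```

By construction the activator input rises from 0 to 1 with cAMP while the repressor input falls from 1 to 0 with IPTG, and each equals 0.5 when the inducer concentration equals the corresponding constant.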
In the quasi-stationary state, the relative occupancies of the different cis-regulatory
configurations can be expressed in terms of the dimensionless input signals A = [A]/[AT ]
and R = [R]/[RT ] by using the definitions of the equilibrium constants in Eqs. 3.67
and 3.68:
$$ \begin{gathered} {}[O_{100}] = a[O]\mathcal{A}, \qquad [O_{010}] = b[O]\mathcal{R}, \qquad [O_{110}] = ab\gamma[O]\mathcal{A}\mathcal{R}, \\ {}[O_{001}] = \sigma_{00}c[O], \qquad [O_{101}] = ca[O]\mathcal{A}, \\ {}[O_{011}] = \sigma_{01}cb[O]\mathcal{R}, \qquad [O_{111}] = \sigma_{11}cab\gamma[O]\mathcal{A}\mathcal{R}, \end{gathered} \tag{3.70} $$
where a = KA[AT], b = KR[RT], c = K10[P] and [O] = [O000].
Figure 3.14: Promoter activity predicted by the model (Eq. 3.72 with γ = 0) after a fit to the
experimental data. Best fit parameters are from Setty et al.
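Introducing the new parameters βij = αij σij, the rate of transcription can be obtained as:
$$ f(\mathcal{A}, \mathcal{R}) = \frac{\sum_{i,j} \alpha_{ij}[O_{ij1}]}{\sum_{i,j,k} [O_{ijk}]} = \frac{\beta_{00}c + ca\mathcal{A} + \beta_{01}cb\mathcal{R} + \beta_{11}cab\gamma\mathcal{A}\mathcal{R}}{1 + \sigma_{00}c + (1 + c)a\mathcal{A} + (1 + \sigma_{01}c)b\mathcal{R} + (1 + \sigma_{11}c)ab\gamma\mathcal{A}\mathcal{R}}. \tag{3.71} $$
By consolidating the various biological parameters into seven parameters V1, ..., V7
the promoter activity can be expressed as:
$$ f(\mathcal{A}, \mathcal{R}) = V_1\, \frac{1 + V_2\mathcal{A} + V_3\mathcal{R} + \gamma V_6\mathcal{A}\mathcal{R}}{1 + V_4\mathcal{A} + V_5\mathcal{R} + \gamma V_7\mathcal{A}\mathcal{R}}, \tag{3.72} $$
where V1 = β00 c/(1 + σ00 c), V2 = a/β00, V3 = β01 b/β00, V4 = (1 + c)a/(1 + σ00 c),
V5 = (1 + σ01 c)b/(1 + σ00 c), V6 = β11 ab/β00 and V7 = (1 + σ11 c)ab/(1 + σ00 c).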
Despite the differences in the derivation, Eq. 3.72 coincides with that presented by Setty
et al. when γ = 0. Having non-zero values of γ does not change the qualitative shape
of the surface f (A, R ).
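The link between the state occupancies and the consolidated form can be verified numerically. The sketch below sums the relative occupancies of Eq. 3.70 directly and compares the result with the five-parameter form obtained from Eq. 3.72 when γ = 0. All numerical values in the example are hypothetical and were chosen only to exercise the formulas.

```python
def state_weights(A, R, a, b, c, gamma, sigma):
    """Occupancies [O_ijk]/[O_000] from Eq. 3.70; sigma maps (i, j) to sigma_ij."""
    weights = {}
    for i in (0, 1):
        for j in (0, 1):
            base = (a * A) ** i * (b * R) ** j * (gamma if (i and j) else 1.0)
            weights[(i, j, 0)] = base                      # promoter free
            weights[(i, j, 1)] = sigma[(i, j)] * c * base  # polymerase bound
    return weights

def transcription_rate(A, R, a, b, c, gamma, sigma, alpha):
    """Sum of alpha_ij [O_ij1] divided by the sum over all eight states."""
    w = state_weights(A, R, a, b, c, gamma, sigma)
    initiating = sum(alpha[(i, j)] * w[(i, j, 1)] for i in (0, 1) for j in (0, 1))
    return initiating / sum(w.values())

def consolidated_rate(A, R, a, b, c, sigma, alpha):
    """Eq. 3.72 for gamma = 0, built from V1..V5 as defined in the text."""
    beta00 = alpha[(0, 0)] * sigma[(0, 0)]
    beta01 = alpha[(0, 1)] * sigma[(0, 1)]
    norm = 1.0 + sigma[(0, 0)] * c
    v1 = beta00 * c / norm
    v2 = a / beta00
    v3 = beta01 * b / beta00
    v4 = (1.0 + c) * a / norm
    v5 = (1.0 + sigma[(0, 1)] * c) * b / norm
    return v1 * (1.0 + v2 * A + v3 * R) / (1.0 + v4 * A + v5 * R)

if __name__ == "__main__":
    sigma = {(0, 0): 0.1, (1, 0): 1.0, (0, 1): 0.05, (1, 1): 0.5}  # hypothetical
    alpha = {(0, 0): 0.8, (1, 0): 1.0, (0, 1): 0.3, (1, 1): 0.6}   # hypothetical
    for A, R in [(0.0, 1.0), (1.0, 1.0), (0.0, 0.0), (1.0, 0.0)]:
        print(A, R, transcription_rate(A, R, 5.0, 20.0, 2.0, 0.0, sigma, alpha),
              consolidated_rate(A, R, 5.0, 20.0, 2.0, sigma, alpha))
```

Working from the state weights has the practical advantage that additional configurations, such as looped states, can be added to the dictionary without rederiving the closed-form expression.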
The model in Eq. 3.72 is fully capable of capturing the details of the corresponding
experimental plot, including the plateaus observed in the absence of cAMP and at high
concentration of IPTG. This is illustrated in Fig. 3.14, which shows the result of a fit
to the experimental data using ten model parameters (with γ = 0). The performance of the AND gate, i.e., suppression of the expression plateaus observed in the
absence of IPTG or cAMP, could be enhanced, for instance, by increasing the equilibrium constant KR for the repressor binding. This would increase the parameters V3 and
V5 and shift the off-diagonal expression plateaus III and IV to higher concentrations of
IPTG and cAMP, respectively. The complete elimination of the plateaus would require
that the binding of the polymerase can only occur when cAMP-CAP is bound to its binding site, i.e., σ00 = β00 = 0, and that the repressor and the polymerase are mutually
exclusive, i.e., σi1 = βi1 = 0.
Figure 3.15: Model of the cis-regulatory configurations and transcription initiation from the
Gal1 promoter. ATc is assumed to alter the effective rate of TetR binding kRf = k'Rf [R] while
galactose is assumed to affect the effective rate of activation kAf and/or deactivation kAb.
3.4.2 The Galactose Regulon in S. cerevisiae
The expression of GFP from the modified, TetR-repressible Gal1 promoter discussed in
section 2.2.2 can be modeled using the same set of equations that was used to model the
Plac promoter in section 3.4.1. However, due to the low basal expression in the absence
of galactose or ATc, it is not necessary to include the states O001 , O011 and O111 . A
minimal model of cis-regulation of the TetR-repressible Gal1 promoter thus incorporates five distinct configurations (Fig. 3.15). It is noted that this model is more abstract
than the model of the Plac promoter where each cis-regulatory state corresponds to the
occupancy of a single DNA binding site. The Gal1 promoter contains multiple binding
sites and the five different configurations correspond to the states where inactive Gal4
is bound to the UASG (O000), an intermediate state where activated Gal4 has recruited
the transcriptional activators SAGA, Mediator and TBP/TFIID (O100), the pre-initiation
complex where the polymerase holoenzyme is bound (O101) and the repressed states
where TetR is bound in the presence (O110) and absence (O010) of SAGA, Mediator
and TBP. The actual assembly of the pre-initiation complex is far more complicated
than depicted in Fig. 3.15 (see section 1.3.2) and additional steps, for instance the independent recruitment of SAGA, Mediator and TBP/TFIID, could be incorporated in a
more comprehensive analysis.
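Since the transitions between the different cis-regulatory configurations for the
modified Gal1 promoter follow the general reaction scheme in Fig. 3.13, the rate of
transcription can be obtained directly from Eq. 3.71 by setting βij = σij = 0 for all
configurations other than O101 (for which β10 = σ10 = 1):
$$ f(\mathcal{A}, \mathcal{R}) = \frac{ca\mathcal{A}}{1 + a(1 + c)\mathcal{A} + ab\gamma\mathcal{A}\mathcal{R} + b\mathcal{R}}, \tag{3.73} $$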
where A and R give the relative level of activation by galactose and ATc, respectively.
The input A changes from zero (inactive) to one (full induction) as the extracellular
concentration of galactose changes from 0 to 2% w/v while R changes from one (full
repression) to zero (full induction) as the extracellular concentration of ATc changes
from 0 to 500 ng/ml. Recall that the steady state number of proteins per cell can be
approximated by Eq. 3.33 and is proportional to f (A, R ). Therefore, the relative GFP
fluorescence signal measured from single cells by flow cytometry can be correlated
directly with the normalized response r(A, R) obtained by dividing Eq. 3.73 by its
maximal value f* = f(A = 1, R = 0) = ca/(1 + a + ac):
$$ r(\mathcal{A}, \mathcal{R}) = \frac{(1 + a + ac)\mathcal{A}}{1 + a(1 + c)\mathcal{A} + \gamma ab\mathcal{A}\mathcal{R} + b\mathcal{R}}. \tag{3.74} $$
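The normalization step can be checked with a few lines of code. The sketch below evaluates Eqs. 3.73 and 3.74 as written; the function names are illustrative, and the parameter values in the example are the ones quoted later in this section for the fitted Gal1 model.

```python
def five_state_rate(A, R, a, b, c, gamma):
    """Eq. 3.73: transcription rate of the five-state Gal1 promoter model."""
    return c * a * A / (1.0 + a * (1.0 + c) * A + a * b * gamma * A * R + b * R)

def normalized_response(A, R, a, b, c, gamma):
    """Eq. 3.74: the same rate relative to full induction (A = 1, R = 0)."""
    return (1.0 + a + a * c) * A / (
        1.0 + a * (1.0 + c) * A + gamma * a * b * A * R + b * R
    )

if __name__ == "__main__":
    a, b, c, gamma = 2.0, 2.0e6, 4.5, 0.0125  # values quoted in the text
    f_max = five_state_rate(1.0, 0.0, a, b, c, gamma)
    for A, R in [(1.0, 0.0), (0.5, 0.0), (1.0, 1.0e-6), (1.0, 1.0)]:
        print(A, R, five_state_rate(A, R, a, b, c, gamma) / f_max,
              normalized_response(A, R, a, b, c, gamma))
```

The two printed columns agree, which confirms that Eq. 3.74 is Eq. 3.73 divided by ca/(1 + a + ac).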
In order to fit the response function r(A, R) to experimental data it is necessary
to specify how the relative activity levels A and R depend on the extracellular concentrations of galactose, cgal, and ATc, cATc, respectively. The dependence between the
activity of Gal4, denoted by agal4, and the extracellular concentration of galactose is
expected to be quite complicated as galactose import is highly regulated and its presence is mediated through a series of protein-protein interactions (between Gal3, Gal80
and Gal4). As a first approximation, it is assumed that the dependence can be captured
by a Hill-type function:
$$ \mathcal{A}(c_{gal}) = \frac{a_{gal4}}{a^*_{gal4}} = \frac{c_{gal}^n}{K_{gal} + c_{gal}^n}, \tag{3.75} $$
where ∗ denotes maximal activity, Kgal is the Hill constant and n is the Hill coefficient.
It is further assumed that the activation step from O0j0 to O1j0 behaves as if it were
a second-order binding reaction. This assumption is not critical and is invoked for
simplicity.
When the concentration of galactose is varied at saturating amounts of ATc, i.e.,
R = 0, Eq. 3.74 becomes:
$$ r(\mathcal{A}, \mathcal{R} = 0) = \frac{(1 + a + ac)\mathcal{A}}{1 + a(1 + c)\mathcal{A}}, \tag{3.76} $$
which can be rewritten in the form of a standard Hill curve:
$$ r(c_{gal}) = \frac{c_{gal}^n}{K_{H1} + c_{gal}^n}, \qquad K_{H1} = \frac{K_{gal}}{1 + a + ac}. \tag{3.77} $$
Using the Hill plot method, the experimental data obtained in section 2.2.2 for galactose
induction at 500 ng/ml ATc reveals that the Hill coefficient in Eq. 3.77 is 2.0 and the
Hill constant KH1 is equal to 0.06 (% w/v)^2. This corresponds to 50% transcriptional
efficiency at 0.24% w/v galactose. Figure 3.16A shows the good agreement between
the experimental data points and Eq. 3.76 when KH1 = 0.06 is obtained by setting Kgal =
0.7 (% w/v)^2, a = 2 and c = 4.5. There are, of course, many other combinations of the
parameters that can give rise to this particular value of KH1 . Additional experiments
are required to extract this information.
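The reduction of Eq. 3.76 to the Hill form of Eq. 3.77 can be confirmed numerically. The sketch below inserts the Hill-type input of Eq. 3.75 into Eq. 3.76 and compares the result with Eq. 3.77, using the values Kgal = 0.7, a = 2, c = 4.5 and n = 2 quoted above. The function names are illustrative, and the polymerase parameter c is called `c_pol` to avoid confusion with the galactose concentration.

```python
def gal4_activity(c_gal, k_gal, n):
    """Eq. 3.75: relative Gal4 activity at galactose concentration c_gal."""
    return c_gal ** n / (k_gal + c_gal ** n)

def response_at_saturating_atc(c_gal, a, c_pol, k_gal, n):
    """Eq. 3.76 evaluated with the Hill-type input of Eq. 3.75."""
    A = gal4_activity(c_gal, k_gal, n)
    return (1.0 + a + a * c_pol) * A / (1.0 + a * (1.0 + c_pol) * A)

def effective_hill_constant(k_gal, a, c_pol):
    """K_H1 of Eq. 3.77."""
    return k_gal / (1.0 + a + a * c_pol)

def hill_curve(c_gal, k_h1, n):
    """Standard Hill curve of Eq. 3.77."""
    return c_gal ** n / (k_h1 + c_gal ** n)

if __name__ == "__main__":
    k_h1 = effective_hill_constant(0.7, 2.0, 4.5)
    print("K_H1 =", k_h1)
    for c_gal in (0.05, 0.1, 0.24, 0.5, 2.0):  # % w/v
        print(c_gal, response_at_saturating_atc(c_gal, 2.0, 4.5, 0.7, 2.0),
              hill_curve(c_gal, k_h1, 2.0))
```

Because only the ratio Kgal/(1 + a + ac) enters the induction curve, any parameter set with the same ratio gives an identical fit, which is why additional experiments are needed to separate the parameters.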
The binding of TetR dimers (R) to the tetO operators can be described as a single
reaction step:
$$ O_{i00} \xrightarrow{\;k_{Rf}\;} O_{i10}, \tag{3.78} $$
where the pseudo-first order rate constant kRf is a function of the number nR of active
repressor dimers per cell. In the framework of generalized mass action (GMA) the
value of kRf is given by $k_{Rf} = k'_{Rf}\, n_R^{m_1}$, where m1 is the GMA exponent for the
TetR dimer binding reaction. Note that this model differs from that for the Plac system
where it was assumed that the binding of activators and repressors to the DNA obeys
second-order reaction kinetics (m1 = 1).
The presence of ATc causes a titration of active TetR dimers and the formation of
an inert form (T) that has a significantly reduced affinity for the tetO operators. The
equilibrium constant KATc for this reaction is in the framework of GMA given by:
$$ K_{ATc} = \frac{n_T}{n_R\, c_{ATc}^{m_2}}, \tag{3.79} $$
where nT is the number of inactive TetR repressors and m2 is the Hill coefficient associated with the binding of ATc to the TetR dimers. The number of dimers in the active
conformation is thus given by:
$$ n_R = \frac{n_{tot}}{1 + K_{ATc}\, c_{ATc}^{m_2}}, \tag{3.80} $$
where ntot is the total number of TetR dimers per cell. With these approximations, the
pseudo-first order rate constant kRf is given by:
$$ k_{Rf} = \frac{k'_{Rf}\, n_{tot}^{m_1}}{(1 + K_{ATc}\, c_{ATc}^{m_2})^{m_1}}, \tag{3.81} $$
and R can be expressed as:
$$ \mathcal{R}(c_{ATc}) = (1 + K_{ATc}\, c_{ATc}^{m_2})^{-m_1}, \tag{3.82} $$
when the dimensionless equilibrium constant b is redefined as $b = k'_{Rf}\, n_{tot}^{m_1}/k_{Rb}$.
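The titration of TetR by ATc can be written out as a short sketch. The function names are illustrative. The example uses KATc = 10^−4, m1 = 4 and m2 = 2 as quoted in this section, while the total dimer number is a hypothetical value that cancels out of the repression signal.

```python
def active_dimers(n_tot, c_atc, k_atc, m2):
    """Eq. 3.80: number of TetR dimers left in the active conformation."""
    return n_tot / (1.0 + k_atc * c_atc ** m2)

def repression_signal(c_atc, k_atc, m1, m2):
    """Eq. 3.82: dimensionless repressor input as a function of ATc."""
    return (1.0 + k_atc * c_atc ** m2) ** (-m1)

if __name__ == "__main__":
    n_tot = 50.0  # hypothetical total number of dimers per cell
    for c_atc in (0.0, 10.0, 50.0, 100.0, 500.0):  # ng/ml
        fraction = active_dimers(n_tot, c_atc, 1.0e-4, 2) / n_tot
        print(c_atc, fraction, repression_signal(c_atc, 1.0e-4, 4, 2))
```

The repression signal is the active fraction raised to the power m1, so a GMA exponent above one makes the response to ATc steeper than the titration curve itself.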
Figure 3.16: (A) Comparison of fitted induction curves (broken lines) with experimental data
for induction of transcription from the modified Gal1 promoter with galactose (blue points) or
ATc (red points). (B) Response function r(A, R) predicted based on fit to experimental data for
A = 1 (ATc induction) and R = 0 (galactose induction).
For full galactose induction, A = 1, the response function Eq. 3.74 is given by:
$$ r(\mathcal{A} = 1, \mathcal{R}) = \frac{1 + a + ac}{1 + a + ac + b(1 + \gamma a)\mathcal{R}}, \tag{3.83} $$
which, following insertion of Eq. 3.82 and rearrangement, becomes:
$$ r(c_{ATc}) = \frac{(1 + K_{ATc}\, c_{ATc}^{m_2})^{m_1}}{K'_{H2} + (1 + K_{ATc}\, c_{ATc}^{m_2})^{m_1}} \approx \frac{c_{ATc}^{m_1 m_2}}{K_{H2} + c_{ATc}^{m_1 m_2}}, \tag{3.84} $$
where $K'_{H2} = b(1 + \gamma a)/(1 + a + ac)$ and $K_{H2} = K'_{H2}/K_{ATc}^{m_1}$. The approximation
that allows the response function r(cATc) to be written as a Hill-type function requires a
high value of K'H2 to ensure that there is no significant response unless $K_{ATc}\, c_{ATc}^{m_2} \gg 1$.
This in turn implies that KATc must not be too small. Using the Hill plot method
with the experimental data for ATc induction at 2% galactose (section 2.2.2) gives a
value of m1 m2 equal to 8.0 and a value of KH2 equal to 2 × 10^12 (ng/ml)^8. This
corresponds to 50% transcriptional efficiency when the system is induced with 34 ng/ml
ATc. Figure 3.16A shows the good agreement between the experimental data points and
the induction curve predicted by Eq. 3.83 when a value of KH2 = 2 × 10^12 (ng/ml)^8
is obtained by setting m1 = 4, m2 = 2, a = 2.0, b = 2 × 10^6, c = 4.5, γ = 0.0125
and KATc = 10^−4. Similar to the case of galactose induction (Eq. 3.76) there are many
combinations of the parameters that can give the correct Hill coefficient and Hill constant
for the induction with ATc. Additional experiments are required to obtain these values.
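The half-induction concentrations quoted for the two inducers follow directly from the Hill constants and coefficients. As a brief sketch (the function names are illustrative), a Hill curve c^h/(KH + c^h) reaches 1/2 when c^h = KH, i.e., at c = KH^(1/h).

```python
def half_induction_concentration(hill_constant, hill_coefficient):
    """Concentration at which c^h / (K_H + c^h) equals 1/2."""
    return hill_constant ** (1.0 / hill_coefficient)

def hill_response(conc, hill_constant, hill_coefficient):
    """Standard Hill curve with constant K_H and coefficient h."""
    return conc ** hill_coefficient / (hill_constant + conc ** hill_coefficient)

if __name__ == "__main__":
    # Hill constants and coefficients as quoted in the text.
    galactose_half = half_induction_concentration(0.06, 2.0)   # % w/v
    atc_half = half_induction_concentration(2.0e12, 8.0)       # ng/ml
    print("galactose:", galactose_half, hill_response(galactose_half, 0.06, 2.0))
    print("ATc:", atc_half, hill_response(atc_half, 2.0e12, 8.0))
```

The large Hill coefficient for ATc means that the response switches from nearly off to nearly on over a narrow concentration range around the half-induction point.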
A critical test of the five state model of the cis-regulatory region of Gal1 would be
to compare different combinations of the input signals galactose (A) and ATc (R ) with
the surface r(A, R) (Eq. 3.74) shown in Fig. 3.16B. This surface is predicted from data
obtained at full induction with ATc (R = 0, Eq. 3.76) and full induction with galactose
(A = 1, Eq. 3.83) and it would be interesting to investigate to what extent it is possible
to predict the outcome of all input combinations based on a limited set of conditions.
3.5 Models of Engineered Gene Networks
The promoters used to construct the bacterial toggle switch and ring oscillator are active
in the absence of a repressor and can be described using a four-state configuration space
Oij where i gives the occupancy of the promoter and j indicates whether or not the
repressor is bound to the promoter. The expression level can be derived by applying
the method used in section 3.4.2 for the more complicated case where transcription is
regulated by two effector molecules and their inducers. When the rate of expression
from O1j is denoted by αj the response function is given by:
$$ f(n_R, \mathcal{R}) = \frac{\alpha_0 c + \alpha_1 \sigma c K_R n_R^n \mathcal{R}^n}{1 + c + (1 + \sigma c) K_R n_R^n \mathcal{R}^n}, \tag{3.85} $$
where nR is the number of repressor protein molecules, σ = K01 /K00 , c = K00 [P ]
and R describes the modulation of the activity of the repressor protein by inducers and
other environmental factors. Note that the case n = 1 corresponds to second-order
binding of the repressor to its operator. It is typically assumed that nR is the number
(or the concentration) of repressor monomers and n will in this case correspond to the
GMA exponent associated with the overall binding reaction of repressor monomers to
the operator. In this case, however, the exponents associated with the R term may not
be equal to n.
Another way to obtain essentially the same result is to assume that the equilibrium
constant associated with repressor multimerization is so large that the number of free repressor monomers present within each cell is negligible. In this case, it is valid to set
$K_R n_R^n \approx K'_R n_{tot}^m$, where ntot is the total number of repressor monomers. Introducing the
dimensionless variable x, the expression level is given by:
$$ f(x, \mathcal{R}) = \frac{\alpha + \beta x^m \mathcal{R}^n}{1 + x^m \mathcal{R}^n}, \tag{3.86} $$
where α and β are the relative rates of expression from O10 and O11, respectively, and
x is defined by:
$$ x = \sqrt[m]{\frac{1 + \sigma c}{1 + c}\, K_R}\; n_R. \tag{3.87} $$
Functions of the type in Eq. 3.86 have been used to model the engineered bacterial
toggle switch and the ring oscillator. The model of the former (see Gardner et al.) is
given by:
$$ \frac{du}{dt} = \frac{\alpha_u}{1 + v^n} - \gamma u, \qquad \frac{dv}{dt} = \frac{\alpha_v}{1 + u^n} - \gamma v, \tag{3.88} $$
where it is assumed that there is no leaky expression from the promoters. The ring
oscillator (see Elowitz and Leibler) is modeled by the equations:
$$ \frac{dm_i}{dt} = \frac{\alpha}{1 + p_{i-1}^n} + \alpha_0 - m_i, \qquad \frac{dp_i}{dt} = \beta m_i - \beta p_i, \tag{3.89} $$
where mi denotes the mRNA levels of lacI, tetR and cI and pi denotes the protein levels of LacR,
TetR and λ CI. Note that leaky transcription in this case is modeled as an additive term.
The models of the systems level behavior of the switch and the oscillator can be analyzed by standard techniques and provide coarse, but reasonably accurate descriptions
of the system dynamics in vivo.
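Both models are easy to explore by numerical integration. The following is a minimal sketch that integrates Eqs. 3.88 and 3.89 with a fixed-step fourth-order Runge-Kutta scheme written out by hand. The integrator, the step sizes, the initial conditions and all parameter values are assumptions chosen for illustration; they are not fitted values from the studies cited above.

```python
def rk4(rhs, state, dt, n_steps):
    """Integrate d(state)/dt = rhs(state) with fixed-step fourth-order Runge-Kutta."""
    trajectory = [list(state)]
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
        k4 = rhs([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt * (p + 2.0 * q + 2.0 * r + w) / 6.0
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4)]
        trajectory.append(state)
    return trajectory

def toggle_rhs(alpha_u, alpha_v, gamma, n):
    """Right-hand side of Eq. 3.88 for the state [u, v]."""
    def rhs(state):
        u, v = state
        return [alpha_u / (1.0 + v ** n) - gamma * u,
                alpha_v / (1.0 + u ** n) - gamma * v]
    return rhs

def ring_rhs(alpha, alpha0, beta, n):
    """Right-hand side of Eq. 3.89 for the state [m1, m2, m3, p1, p2, p3]."""
    def rhs(state):
        m, p = state[:3], state[3:]
        # p[i - 1] wraps around for i = 0, closing the ring of three repressors.
        dm = [alpha / (1.0 + p[i - 1] ** n) + alpha0 - m[i] for i in range(3)]
        dp = [beta * (m[i] - p[i]) for i in range(3)]
        return dm + dp
    return rhs

if __name__ == "__main__":
    toggle = toggle_rhs(10.0, 10.0, 1.0, 2)            # illustrative parameters
    print(rk4(toggle, [2.0, 1.0], 0.01, 5000)[-1])     # start with u ahead of v
    print(rk4(toggle, [1.0, 2.0], 0.01, 5000)[-1])     # start with v ahead of u
    ring = ring_rhs(216.0, 0.216, 1.0, 2)              # illustrative parameters
    p1 = [s[3] for s in rk4(ring, [1.0, 0.0, 0.0, 2.0, 1.0, 3.0], 0.02, 10000)]
    print(min(p1[5000:]), max(p1[5000:]))              # range of p1 after the transient
```

For the toggle switch the two initial conditions end in different steady states, which illustrates the bistability of the symmetric model. For the ring oscillator the late-time range of p1 stays wide, which indicates sustained oscillations for this parameter choice. An adaptive solver from a numerical library would be a reasonable substitute for the hand-written integrator.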
3.6 Concluding Remarks
The purpose of this tutorial has been to provide a broad introduction to the regulatory dynamics in engineered gene networks and the physico-chemical principles and
methodologies that can be used to link molecular-level qualitative models to quantitative systems level descriptions. The recurrent theme is the use of polynomial functions
to describe different levels of organization and how these functions arise from the basic
laws of physics and chemistry. At the most fundamental level, networks of interacting
biochemical species, e.g., proteins and the small molecules that affect their activity,
are described in terms of elementary reactions or reaction schemes that describe the
molecular interactions with reasonable accuracy. The laws of chemistry and simplifying assumptions are then applied to arrive at a functional relationship for the
fraction of molecules in a particular state.
Once a suitable representation of the biochemical reaction network is obtained,
the predicted protein activities are used as input signals to a network of cis-regulatory
states. A series of simplifying assumptions, notably quasi-steady state and pre-equilibrium approximations, can then be used to represent the cis-regulatory system
in terms of a polynomial function that describes how the average rate of transcription
depends on the inputs to the regulatory biochemical reaction networks. At the final
step, such simplified representations of individual modules can be combined to provide
a quantitative systems level description of networks of interconnected models. Interestingly, it may not always be required to describe all of the components of a system to
get a suitable systems level representation. Gene regulatory networks containing many
components may in some cases be represented by relatively simple polynomial functions
that correlate the inputs to the network to its output. The modeling of the Plac and
the modified Gal1 promoters provides two examples. In both cases, the regulatory network is comprised of many genes and proteins that interact with each other in complex
ways. If the whole-cell regulatory circuitry could be decomposed into such smaller,
well-behaved regulatory motifs, it is not unlikely that we will soon be able to construct
accurate yet tractable models of living cells and organisms.
Suggested Further Reading
Textbooks:
Murray J. D. Mathematical biology. Springer-Verlag (1993).
Bower J. M. & Bolouri H. (Eds.) Computational modeling of genetic and biochemical networks. MIT Press (2001).
Hammes G. F. Thermodynamics and kinetics for the biological sciences. Wiley-Interscience (2000).
Articles:
Many of the articles listed in Part (2) are also relevant here.
Setty Y., Mayo A. E., Surette M. G. & Alon U. Detailed map of a cis-regulatory input function. Proc. Natl. Acad. Sci. U. S. A. 100, 7702-7 (2003).
Atkinson M. R., Savageau M. A., Myers J. T. & Ninfa A. J. Development of genetic circuitry exhibiting toggle switch or oscillatory behavior in Escherichia coli. Cell 113, 597-607 (2003).
Hlavacek W. S. & Savageau M. A. Rules for coupled expression of regulator and effector genes in inducible circuits. J. Mol. Biol. 255 (1996).
Isaacs F. J., Hasty J., Cantor C. R. & Collins J. J. Prediction and measurement of an autoregulatory genetic module. Proc. Natl. Acad. Sci. U. S. A. 100, 7714-9 (2003).
Vilar J. M., Guet C. C. & Leibler S. Modeling network dynamics: the lac operon, a case study. J. Cell Biol. 161, 471-6 (2003).
Vilar J. M. & Leibler S. DNA looping and physical constraints on transcription regulation. J. Mol. Biol. 331, 981-9 (2003).
Yildirim N. & Mackey M. C. Feedback regulation in the lactose operon: a mathematical modeling study and comparison with experimental data. Biophys. J. 84, 2841-51 (2003).