Gene regulatory proteins

Control of Gene
Expression
MOLECULAR BIOLOGY OF
THE CELL
5TH Edition
Chapter 7
1
The rules and mechanisms by which a
subset of the genes is selectively
expressed in each cell
2
Cell Differentiation
What makes the differences??
The two cells are extremely different but
contain the same genome!!
Differentiation = synthesizing and
accumulating different sets of RNA and
protein molecules.
The DNA sequence is generally not altered
All the differences are achieved by changes
in gene expression
3
Clues for genome preservation during cell
differentiation:
1. From Animal
Experiments
4
5
2. From comparing the
detailed banding
patterns detectable in
condensed
chromosomes at
mitosis
6
3. Comparisons of the genomes of different cells
based on recombinant DNA technology have
shown:
 Changes in gene expression that underlie the
development of multicellular organisms are not
accompanied by changes in the DNA sequences of
the corresponding genes.
 However, in a few cases DNA rearrangements of
the genome take place during development
 Example: in generating the diversity of the immune system
of mammals
7
Different Cell Types Synthesize
Different Sets of Proteins
1. Housekeeping proteins are made in all cells
Like what??
The structural proteins of chromosomes
RNA polymerases
DNA repair enzymes
Ribosomal proteins
Enzymes involved in the central reactions of
metabolism
 Many of the proteins that form the cytoskeleton
8
2. Specialized proteins are responsible for the
cell’s distinctive properties
Like what??
 Example: Hemoglobin can be detected only in red
blood cells.
3. All other proteins are expressed to various
degrees from one cell type to another
 A typical human cell expresses 30-60% of its
approximately 25,000 genes
9
A cell can change the expression of its
genes in response to external signals
Different cells respond in different ways to
the same signal
For example: in response to glucocorticoid
hormone:
liver cell turns up tyrosine aminotransferase (helps to
convert tyrosine to glucose to combat starvation)
fat cells turn the same enzyme down
other cell types don’t respond at all
10
 The patterns of mRNA
abundance,
characteristic of a cell
type can be determined
using DNA microarrays
 It reflects the pattern of
gene expression
11
 The radical differences in gene expression between cell
types can be appreciated by two-dimensional gel
electrophoresis
 protein levels are directly measured
 some of the most common posttranslational
modifications can be displayed
12
Gene expression can be regulated at many of
the steps in the pathway from DNA to RNA
to protein
13
How does a cell determine which of
its thousands of genes to transcribe?
14
The transcription of each gene is
controlled by two types of fundamental
components:
1. Regulatory regions: short stretches of DNA
of defined sequence
 Some are simple and act as switches
 Respond to a single signal especially in bacteria
 Many others are complex and act as tiny
microprocessors
 Respond, interpret and integrate a variety of signals to
switch the neighboring gene on or off
2. Gene regulatory proteins (Transcription
factors) that recognize and bind to regulatory
regions
15
Gene regulatory proteins must recognize
specific nucleotide sequences embedded
within the DNA double helix
The edge of each base pair is
exposed at the surface of the
double helix
The surface of protein must fit
tightly against the special
surface features of the double
helix
Features on the DNA surface
vary with nucleotide sequence
16
A gene regulatory protein interacts with DNA by:
hydrogen bonds
ionic bonds
hydrophobic interactions
Typically ~20 contacts combine to ensure that
the interaction is both highly specific and very
strong
Protein-DNA interactions are among the tightest
and most specific molecular interactions known
in biology!
17
Most DNA/protein interactions occur in
the major groove
 Proteins usually insert into the
major groove of DNA helix
and make molecular contacts
with its base pairs
 DNA binding proteins don’t
have to open the double helix.
18
A distinctive pattern of hydrogen bond donors, and
acceptors, and hydrophobic patches are available in
both grooves
Only in the major groove are the patterns markedly
different for each of the four base-pair
arrangements
19
For this reason, gene regulatory proteins
generally bind to the major groove
The Geometry of the DNA Double Helix
Depends on the Nucleotide Sequence
 The normal DNA
conformation must be
distorted to maximize the
fit between DNA and
protein
The extent to which the
double helix is deformable is
variable
20
 Some sequences (for ex.
AAAANNN) form a double
helix with a slight bend.
 If this sequence is repeated
at 10-bp intervals in a long
DNA molecule, the small
bends add together so that
the DNA molecule appears
unusually curved when
viewed in the electron
microscope
 A few gene regulatory
proteins induce a striking
bend in the DNA when they
bind to it
21
Short DNA Sequences Are Fundamental
Components of Genetic Switches
 Nucleotide sequences
typically < 20 bp function
as fundamental
components of genetic
switches
 They serve as recognition
sites for the binding of
specific gene regulatory
proteins
 Each is recognized by a
different gene regulatory
protein (or by a set of
related gene regulatory
proteins).
22
Gene Regulatory Proteins Contain Structural
Motifs That Can Read DNA Sequences
Certain aa structures
make precise
contacts with one or
more bases in the
major groove.
This is not sufficient
and the right structure
has to be in the right
position.
23
Many of the proteins contain one or another
of a small set of DNA-binding structural
motifs
The DNA binding motifs generally use
either α-helices or β-sheets to bind the
major groove of DNA
The major groove contains sufficient
distinctive information to distinguish one
DNA sequence from any other.
24
A few examples of DNA
binding motifs
25
The Helix-Turn-Helix Motif (HTH)
 The first DNA-binding motif to be recognized; originally
identified in bacterial proteins
 The two helices are held at a fixed angle, primarily
through interactions between them
 The more C-terminal helix is called the recognition
helix because it fits into the major groove of DNA
26
 Three important features about HTH
proteins:
1. The actual amino acids within the recognition
helix can vary from one transcription factor to
another
 This variation allows proteins with similar global
structures to recognize very different DNA
sequences.
2. Amino acid composition and structure outside
the HTH region of the protein can vary
tremendously.
 Amino acids outside the HTH region can also
make important contacts with the DNA.
27
3. Many HTH proteins function as dimers
 Dimers are complexes of two protein molecules that
come together and function as a unit
 Dimerization allows/requires the complex to have twice
the number of contacts with the DNA
28
Homeodomain Proteins: a Special Class
of HTH Proteins
 Homeodomain: an almost identical stretch of 60 aa that
defines the class of proteins encoded by the homeotic selector genes
 The homeotic selector genes play a critical part in
orchestrating the development of the fruit fly Drosophila.
 Homeodomain contains a HTH motif related to that of
the bacterial gene regulatory proteins.
 Thus the principles of gene regulation established in
bacteria are relevant to higher organisms as well.
 Homeodomain proteins have been identified in virtually
all eucaryotic organisms that have been studied, from
yeasts to plants to humans.
29
The structure of a homeodomain bound to its
specific DNA sequence
The HTH motif of
homeodomains is always
surrounded by the same
structure (which forms the
rest of the homeodomain)
Structural studies have
shown:
A yeast homeodomain
protein and a Drosophila
homeodomain protein have
very similar conformations
30
 DNA-binding Zinc Finger Motifs
 A zinc-coordinated DNA-binding motif.
 Two major structurally distinct types
of zinc finger that:
 both use zinc as a structural element,
 both use an α-helix to recognize the
major groove of the DNA.
1. The first type: discovered in the protein
that activates the transcription of a
eukaryotic ribosomal RNA gene.
 Consists of an α-helix and a β-sheet held
together by the zinc
31
 A strong and specific DNA-protein interaction is built
up through a repeating basic structural unit
Zinc fingers are arranged one after the other
The α-helix of each can contact the major groove of
the DNA, forming a nearly continuous stretch of α-helices along the groove
32
2. The second type: is
found in the large family of
intracellular receptor
proteins:
 It forms a different type of
structure (similar in some
respects to the HTH motif)
in which two α-helices are
packed together with zinc
atoms
 Like the HTH proteins,
these proteins usually
form dimers that allow one
of the two α-helices of
each subunit to interact
with the major groove of
the DNA
33
β-sheet DNA-binding motif
In this case the information on the surface of the
major groove is read by a two-stranded β-sheet
The exact DNA sequence recognized depends
on the sequence of amino acids that make up
the β-sheet.
34
The Leucine Zipper Motif
Unlike most other motifs, the leucine
zipper motif mediates both dimerization
and DNA binding.
Two α-helices, one from each
monomer, are joined together to
form a short coiled-coil
The helices are held together by
interactions between hydrophobic
amino acid side chains (often on
leucines)
35
The Helix-Loop-Helix Motif HLH
HLH motif should not be
confused with the HTH
Consists of a short α-helix
connected by a loop to a second,
longer α-helix.
Also mediates dimerization
and DNA binding
HLH proteins can create a
homodimer or a heterodimer.
36
Example of a DNA-binding protein:
DNA recognition by p53
 The most important DNA
contacts are made by arginine
248 and lysine 120
 They extend from the protruding
loops entering the minor and
major grooves.
 The folding of the p53 protein
requires a zinc atom (shown as a
sphere)
 but the way in which the zinc is
grasped by the protein is
completely different from that of
the zinc finger proteins,
described previously.
The gene regulatory proteins can bind
DNA as dimers
1. Homo-dimers: dimers made up of two
identical subunits.
2. Heterodimers: composed of two
different subunits
 heterodimers typically form from two proteins
with distinct DNA-binding specificities
38
 There are tremendous advantages to
dimerization
1. Doubles the number of contacts with DNA, i.e.
stronger binding affinity
2. Can turn two weak binders into one moderate
or strong binder
3. Adds specificity to the system
39
How can dimerization contribute to
specificity??
Let’s consider a fictional protein binding site of
one bp
There are four possible bp at this site.
The likelihood of this site occurring is one in every 4^1 = 4 bp
Let’s consider a fictional protein binding site of 4
bp
The binding site has 4 positions, and at each
position there are 4 possible bp
The likelihood of this site occurring is one in every
4^4 = 256 bp
40
How can dimerization contribute to
specificity??
Consider that the human genome is 3.2x10^9 bp,
and most of it is not genes
then by random chance we would find
3.2x10^9/256 = 1.25x10^7 (12.5 million) sites for this
protein.
However, there are only ~30,000 genes in the whole
genome and very few are regulated by the same
specific transcription factor!!
In this case the cell would have to make a lot of this
protein to ensure that it would actually get to the few
binding sites where it is really needed or have another
solution.
41
How can dimerization contribute to
specificity??
If the binding site is 8 bp
then this sequence will randomly be found once every
4^8 = 65,536 bp (roughly 48,828 (3.2x10^9/4^8) sites in the
human genome)
If the protein requires two of these sites (as a
dimer)
then this sequence will randomly be found once every
(4^8)x(4^8) = 4,294,967,296 (about 4.29x10^9) bp,
roughly 0.75 times in the human genome
(3.2x10^9/4.29x10^9)
42
How can dimerization contribute to
specificity??
If a single 8 bp site occurs by chance about 48,828 times in the
human genome
and two adjacent 8 bp sites occur by chance about 0.75
times in the human genome (less than once)
then it is far less likely that a protein requiring
two half-sites will find a random place in the
genome that it can bind to, when compared to a
protein that requires only one half-site.
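The arithmetic on the dimerization slides can be checked with a short sketch. It assumes, as the slides implicitly do, that the four bases are equally likely and independent at every position and that only one strand is scanned; the function name expected_random_sites is illustrative, not from the slides.

```python
# Sketch of the random-occurrence arithmetic from the dimerization slides.
# Assumptions: equal, independent base frequencies; one strand only.

GENOME_BP = 3.2e9  # human genome size used in the slides


def expected_random_sites(site_length_bp: int, genome_bp: float = GENOME_BP) -> float:
    """Expected number of chance matches to one specific sequence of this length."""
    return genome_bp / 4 ** site_length_bp


for label, length in [("4 bp site", 4),
                      ("8 bp half-site", 8),
                      ("two 8 bp half-sites", 16)]:
    print(f"{label:>20}: once every {4 ** length:,} bp, "
          f"about {expected_random_sites(length):,.2f} chance sites per genome")
```

Under these assumptions the three cases come out at 12.5 million, about 48,828, and about 0.75 chance sites, matching the slide figures. Real genomes have uneven base composition, so the numbers are order-of-magnitude estimates only.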
43
Heterodimerization Expands the Repertoire
of DNA Sequences Recognized by Gene
Regulatory Proteins
 Heterodimerization is an example of combinatorial
control:
combinations of different proteins, rather than individual
proteins, control a cellular process.
 Heterodimerization occurs in a wide variety of different
types of gene regulatory proteins
44
Heterodimerization greatly expands the
DNA-binding specificities
Example: Three distinct DNA-binding
specificities could, in principle, be generated
from two types of leucine zipper monomer
Heterodimerization depends on the exact amino
acid sequences of the two zipper regions.
Thus each leucine zipper protein in the cell can form
dimers with only a small set of other leucine zipper
proteins
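The count of "three specificities from two monomers" is the number of unordered pairs, homodimers included. A minimal sketch, using hypothetical monomer names A, B and C and assuming every pair is able to dimerize (an upper bound, since each leucine zipper protein pairs with only a small set of partners):

```python
from itertools import combinations_with_replacement


def possible_dimers(monomers):
    """All unordered pairs of monomers, homodimers included."""
    return list(combinations_with_replacement(monomers, 2))


# Two hypothetical monomer types give two homodimers and one heterodimer.
for n_types in (["A", "B"], ["A", "B", "C"]):
    dimers = possible_dimers(n_types)
    print(len(n_types), "monomer types ->", len(dimers), "possible dimers:", dimers)
```

With n monomer types the upper bound is n(n+1)/2, which is why a modest number of monomers can, in principle, cover many DNA-binding specificities.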
45
Is There a DNA Sequence Recognized
by All Gene Regulatory Proteins?
For example, is a G-C base
pair always contacted by a
particular amino acid side
chain?
The answer appears to be NO
However, certain types of aa-base
interactions appear much more
frequently than others
46
Homework
Briefly discuss at least 2 methods
that are used to experimentally
determine the DNA sequence
recognized by a gene regulatory
protein (DNA binding protein)
How a Genetic Switch Works
Bacterial Genes
48
Regulatory proteins and specific
DNA sequences control the switch.
How?
49
Negative control by transcriptional
repressors
 Tryptophan operon:
 5 E. coli genes code for enzymes that manufacture the aa
tryptophan, arranged in a single transcriptional unit.
 Promoter: the site from which the 5 genes are transcribed into a single long
mRNA molecule.
 Operator: a short sequence of regulatory DNA, within the
promoter that directs transcription of the tryptophan
biosynthetic genes, that is recognized by the repressor
50
On/off switch
Cells need tryptophan to live
51
Get it from the environment: the operon is off
Synthesized inside the cells: the operon is on
Three essential features of this on/off
switch
1. The cell needs to know whether it has
tryptophan or not.
2. Accordingly, the cell then turns the
operon ON or OFF.
3. ON or OFF, the cell needs to
continuously monitor tryptophan levels.
52
Tryp & tryptophan repressor will do
the job
The tryptophan repressor is a member of the
HTH family that recognizes the operator
53
Three essential features of this on/off
switch
1. The cell needs to know whether it has tryptophan
or not.
The repressor binding to tryp is the sensor.
2. Accordingly, the cell then turns the operon ON or
OFF.
The activity of the repressor depends on the
presence/absence of tryp.
3. ON or OFF, the cell needs to continuously
monitor tryptophan levels.
The repressor can be turned on/off by the simple
presence or absence of tryp.
54
This is an example of a feedback loop
 Tryp that is being
synthesized by the
gene products can
directly feedback
information to the
switch and dictate
whether more or less
gene products should
be made.
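The feedback loop can be illustrated with a toy simulation. Every number below (threshold, synthesis rate, consumption rate) is hypothetical and chosen only to make the behaviour visible; the model is a deliberately crude on/off caricature, not a description of real trp operon kinetics.

```python
# Toy negative-feedback loop for the trp operon.
# All constants are hypothetical illustration values.

THRESHOLD = 10.0           # tryptophan level at which the repressor becomes active
SYNTHESIS_PER_STEP = 3.0   # tryptophan made per step while the operon is ON
USE_PER_STEP = 1.0         # tryptophan consumed by the cell every step


def simulate(initial_tryptophan: float, steps: int):
    """Return a list of (tryptophan level, operon_on) pairs, one per step."""
    level = initial_tryptophan
    history = []
    for _ in range(steps):
        # Low tryptophan -> repressor inactive -> operon ON.
        operon_on = level < THRESHOLD
        if operon_on:
            level += SYNTHESIS_PER_STEP
        level = max(0.0, level - USE_PER_STEP)
        history.append((level, operon_on))
    return history


for level, operon_on in simulate(initial_tryptophan=0.0, steps=12):
    print(f"tryptophan={level:5.1f}  operon={'ON' if operon_on else 'OFF'}")
```

Starting from zero, the level climbs while the operon is on and then hovers near the threshold as the switch flips back and forth, which is the behaviour the slide describes: the product of the pathway reports back to the switch.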
55
How does tryptophan activate the
repressor?
 Two molecules of tryp can bind to the repressor
 Tryp binding causes a conformational change
 the DNA binding domains of the repressor swing into a
different position
 This is a very good DNA binding state
 thus can exclude RNA pol from binding.
56
Negative regulation can be controlled by
ligands
1. The repressor is
originally active
 A ligand inhibits its
activity
2. The repressor
needs the ligand to
be activated
57
Positive Control by Transcriptional
Activators
Poorly functioning bacterial promoters can be
rescued by gene regulatory proteins
(transcriptional activators or gene activator
proteins):
They bind to a nearby site on the DNA
They may strengthen the RNA pol binding to the
promoter by providing an additional contact surface
for it.
They may facilitate the polymerase transition from the
initial DNA-bound conformation to the actively
transcribing form
58
Positive regulation can also be controlled by
ligands
1. Ligands can serve
to remove positive
regulators from
DNA.
2. Ligands can also
serve to allow
positive regulators
to bind DNA.
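The four ligand-control modes (repressor or activator, with a ligand that either removes the protein from DNA or enables it to bind) can be summarized as a small truth table. This is a simplified sketch: it assumes a gene is on whenever no repressor is bound, and on only when an activator is bound. The string labels are made up for the example.

```python
def regulator_bound(ligand_effect: str, ligand_present: bool) -> bool:
    """Is the regulatory protein on the DNA, given how the ligand affects it?"""
    if ligand_effect == "enables_binding":
        return ligand_present
    if ligand_effect == "removes_from_dna":
        return not ligand_present
    raise ValueError(f"unknown ligand effect: {ligand_effect}")


def gene_on(regulator: str, ligand_effect: str, ligand_present: bool) -> bool:
    """Simplified outcome: a bound repressor turns the gene off, a bound activator turns it on."""
    bound = regulator_bound(ligand_effect, ligand_present)
    if regulator == "repressor":
        return not bound
    if regulator == "activator":
        return bound
    raise ValueError(f"unknown regulator type: {regulator}")


for regulator in ("repressor", "activator"):
    for effect in ("removes_from_dna", "enables_binding"):
        for ligand in (False, True):
            state = "ON" if gene_on(regulator, effect, ligand) else "OFF"
            print(f"{regulator:9} {effect:17} ligand={ligand!s:5} -> gene {state}")
```

The tryptophan repressor corresponds to the case ("repressor", "enables_binding"): with tryptophan present the repressor binds and the gene is off.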
59
Repressors and Activators are similar
to one another.
1. They may bind DNA in very similar ways using
similar helical structures.
2. They may be controlled by similar or identical
ligands or be independent of ligands.
3. Some transcriptional regulator proteins can act
as both repressors and activators
 depending on the exact placement of their DNA
recognition sequence in relation to the promoter
60
The lambda repressor can both:
activate and repress.
 when bound in the right
position, relative to the RNA
pol binding site on the
promoter, the lambda
repressor can activate
transcription.
 A shift of even one basepair,
in another promoter, of the
repressor binding site
relative to the RNA pol
binding site inhibits RNA pol
binding to the promoter, thus
repressing transcription.
61
The lac operon uses both negative and
positive regulation to control its expression
 The lac operon codes for proteins required to transport
the disaccharide lactose into the cell and to break it
down.
 The operon is highly expressed only when two
conditions are met:
1. lactose must be present
2. glucose must be absent
 CAP: enables bacteria to use alternative carbon sources
such as lactose in the absence of glucose.
 CAP alone is not sufficient: lactose must also be present to induce
expression of the lac operon
 The lac repressor ensures that the lac operon is shut
off in the absence of lactose.
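The two conditions for high lac operon expression amount to a logical AND of "activator bound" and "repressor not bound". A minimal sketch follows; the function name is illustrative, and the signalling that links glucose levels to CAP activity is deliberately left out.

```python
def lac_operon_highly_expressed(lactose_present: bool, glucose_present: bool) -> bool:
    """Simplified lac operon logic from the slide: needs lactose AND no glucose."""
    repressor_bound = not lactose_present   # lac repressor shuts the operon off without lactose
    cap_bound = not glucose_present         # CAP activates only when glucose is absent
    return cap_bound and not repressor_bound


for lactose in (False, True):
    for glucose in (False, True):
        high = lac_operon_highly_expressed(lactose, glucose)
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> "
              f"{'highly expressed' if high else 'not highly expressed'}")
```

Only one of the four combinations (lactose present, glucose absent) gives high expression, which is how the operon integrates two different signals at one promoter.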
62
63
64
How a Genetic Switch Works
Eucaryotic Genes
65
Transcription regulation in eucaryotes
differs in three important ways from that in
bacteria.
1. A single promoter can be controlled by an almost
unlimited number of regulatory sequences
 Regulatory proteins can act even when they are bound to DNA
thousands of bps away from the promoter
2. Eucaryotic RNA pol II, which transcribes all protein-coding genes, cannot initiate transcription on its own.
 It requires general transcription factors
 The rate of their assembly and thus the rate of transcription
initiation can be controlled in response to regulatory signals
3. The packaging of eucaryotic DNA into chromatin provides
opportunities for regulation not available to bacteria.
66
A Eucaryotic Gene Control Region
Refers to the whole region of DNA involved in
regulating transcription of a gene, including:
The promoter: site of the general TFs and the RNA
Pol II assembly
The regulatory sequences: to which gene
regulatory proteins bind and control the rate of
assembly at the promoter
Regulatory sequences of a gene can be found over
distances as great as 50,000 bp of "spacer" sequence
Spacer DNA may facilitate transcription by providing the
flexibility
Much of the DNA in gene control regions is packaged into
nucleosomes and higher-order forms of chromatin, thereby
compacting its length.
67
68
Enhancers
The DNA sites to which the eucaryotic
gene activators bind
They could be thousands of bp away from the
promoter.
They could be located either upstream or
downstream from it.
How do enhancer sequences and the
proteins bound to them communicate with
the promoter over these long distances?
69
A model for action at a distance
 The DNA between the
enhancer and the
promoter loops out to
allow the activator
proteins bound to the
enhancer to come into
contact with proteins
bound to the promoter
(RNA polymerase,
one of the general
transcription factors,
or other proteins)
70
DNA Looping Occurs During
Bacterial Gene Regulation
71
 If this were a random or
passive interaction then one
would predict that the further
away the enhancer from the
promoter the less likely it is to
be able to control
transcription.
 Note that being too close also
presents a problem.
 Eukaryotic enhancers can be
spaced out over a 50,000
basepair region, relative to
the promoter.
 Thus, the interaction between
enhancers and promoter
complexes cannot be random
or passive since this great
distance would predict a very
low likelihood of chance
interaction.
72
We have reason to believe that looping occurs in
both prokaryotes and in eukaryotes
We can see something that looks like looping in EM
pictures from bacteria.
73
There are thousands of different gene
regulatory proteins.
 About 5-10% of the roughly 30,000 human genes
encode gene regulatory proteins.
 Each regulatory protein is usually present in very small
amounts in a cell, often less than 0.01% of the total
protein.
 Most of them directly recognize their specific DNA
sequences using one of the DNA-binding motifs
 Some do not recognize DNA directly but instead
assemble on other DNA-bound proteins.
74
Eucaryotic Gene Activator Proteins
Consist of at least two distinct domains:
A DNA binding domain usually contains one
of the structural motifs that recognizes a
specific regulatory DNA sequence.
An activation domain accelerates the rate of
transcription initiation.
75
The yeast GAL4 TF
needs its DNA
binding domain to
recognize its target.
It is the activation or
repressor domain of
a TF that influences
the activity of RNA
pol and transcription initiation.
76
In general activators work on the level of
initiation of transcription
 Once bound to DNA, eucaryotic gene activator
proteins increase the rate of transcription
initiation
They attract, position, and modify the
general TFs and RNA pol II at the promoter
so that transcription can begin.
1. They can act directly on the transcription
machinery itself
2. They can change the chromatin structure around
the promoter.
77
1. Activators act directly on the
transcription machinery itself
 General TFs and RNA pol II assemble
in a stepwise, prescribed order in vitro
 In living cells some TFs and RNA pol II
are brought to the promoter as a large
pre-assembled complex (RNA pol II
holoenzyme).
 The holoenzyme typically also contains
a 20-subunit protein complex called
mediator
 required for activators to stimulate
transcription initiation.
78
Eucaryotic activators help to attract and position
RNA pol on specific sites on DNA
 Activator proteins interact
with the holoenzyme
complex and thereby
make it more
energetically favorable for
it to assemble on a
promoter
 Most forms of the
holoenzyme complex
lack some of the general
transcription factors
(notably TFIID and TFIIA)
 these must be assembled
on the promoter
separately
79
Experimental support: “Activator bypass”
 A sequence-specific DNA-binding domain is
experimentally fused directly to a component of the
mediator
 The hybrid protein lacks an activation domain
 It strongly stimulates transcription initiation when the
DNA sequence to which it binds is placed in proximity to
a promoter
80
Many activators have been shown to interact
with one or more of the general transcription
factors
Several have been shown to directly accelerate
their assembly at the promoter
81
2. Activators change the chromatin structure
around the promoter.
Two most important ways of locally
altering chromatin structure are:
Covalent histone modifications
Chromatin remodeling
82
Many gene activator proteins bind to and recruit:
Histone acetyl transferases (HATs).
ATP-dependent chromatin remodeling complexes
83
Covalent histone modifications
A. Activator proteins bind to
enhancers and recruit
HATs.
 Acetylation then allows other
activator proteins to bind to
DNA and/or acetylated
histones and enhances RNA
pol activity.
B. The bromodomain of TFIID
specifically binds to
acetylated lysines 8 and 16 on
the N-terminal tail of histone
H4.
84
An example of how events are ordered on a
particular yeast gene.
Note that the order of
events can be slightly or
even dramatically
different at another gene.
The order of events
during transcription
activation can vary from
one gene to another.
85
Eucaryotic Gene Repressor Proteins Can
Inhibit Transcription in Various Ways
A. A repressor physically
blocks activator
binding site on DNA
B. The repressor has a
distinct DNA binding
site, but it interacts
with and inhibits the
activator activity
C. The repressor binds
directly to TFIID and
inhibits activation by
the activator.
86
Eucaryotic Gene Repressor Proteins Can
Inhibit Transcription in Various Ways
D. Repressors recruit
remodeling enzymes that
make DNA inaccessible
E. Repressors can also
recruit histone modifying
enzymes (like histone
deacetylases)
 they covalently modify
histones in a pattern that
is not favorable for
activation and thus
inhibits transcription
87
Gene Activator Proteins Work
Synergistically
Gene activator proteins often exhibit what is
called transcriptional synergy
the transcription rate produced by several activator
proteins working together is much higher than that
produced by any of the activators working alone
88
Gene Activator Proteins Work
Synergistically
 Transcriptional synergy is observed both:
Between different gene activator proteins bound
upstream of a gene
between multiple DNA-bound molecules of the same
activator.
 Synergistic effects turn a simple genetic on/off
switch into a “dimmer” switch
The quantity of transcript being made can also be
regulated
 Synergy allows cells to respond to conditions that
require production of small amounts as opposed to
large amounts of gene products.
89
Eucaryotic Gene Regulatory Proteins Often
Assemble into Complexes on DNA
Two gene regulatory proteins with a weak
affinity for each other cooperate to bind to a
DNA sequence
neither protein having a sufficient affinity for DNA
to efficiently bind to the DNA site on its own.
Once bound to DNA, the protein dimer creates a
distinct surface that is recognized by a third protein
that carries an activator domain that stimulates
transcription
90
 An important general point:
 protein-protein interactions that are too weak to cause
proteins to assemble in solution can cause the
proteins to assemble on DNA
 the DNA sequence acts as a "crystallization" site or seed
for the assembly of a protein complex.
91
 An individual gene regulatory protein can often
participate in more than one type of regulatory complex.
 A protein might function, in one case as part of a
complex that activates transcription and in another case
as part of a complex that represses transcription
 Thus individual eucaryotic gene regulatory proteins
function as regulatory units that are used to generate
complexes
 Their function depends on the final assembly of all of the
individual components.
 This final assembly, in turn, depends both on:
 the arrangement of control region DNA sequences
 which gene regulatory proteins are present in the cell.
92
Coactivators or corepressors:
Gene regulatory proteins that:
do not themselves bind DNA
assemble on DNA-bound gene regulatory proteins
Coactivators and corepressors typically can
interact with:
chromatin remodeling complexes
histone modifying enzymes
the RNA polymerase holoenzyme
several of the general transcription factors
93
The DNA sequence directly bound by a
regulatory protein can influence its
subsequent transcriptional activity
For example: a steroid hormone receptor
interacts with a corepressor at one type of
sequence and turns off transcription.
At a slightly different DNA sequence, it assumes
a different conformation and interacts with
a coactivator, thereby stimulating
transcription.
94
In some cases, a protein-DNA structure,
termed an enhanceosome, is formed
 A hallmark of enhanceosomes is the participation of
architectural proteins that bend the DNA by a
defined angle and thereby promote the assembly of
the other enhanceosome proteins.
 The formation of the enhanceosome requires the presence of
many gene regulatory proteins
 This ensures that a gene is expressed only when the correct
combination of these proteins is present in the cell.
95