Lecture 1: Overview of the Biology of RNA

Lecture 1:
Overview of the Biology of RNA
- The Central Dogma and Beyond
Milestones in RNA research. (Caution, some of these dates are approximate.)
1869
Nucleic acids (Friedrich Miescher)
1939
RNA in cytoplasm of growing tissues (Caspersson, Schultz and Brachet)
(suggesting involvement in protein synthesis)
1941
1944
1950
1952
1953
One gene, one enzyme
(Beadle & Tatum)
Neurospora
DNA is the “transforming” principle (Avery, MacLeod, McCarty)
Chargaff’s rules
Confirm DNA is genetic material
(Hershey, Chase) T2 injection of DNA
Duplex structure of DNA (Watson & Crick)
1954
1955
1955
1955
1956
RNA duplex (hybridization of polyA poly U & fiber diffraction) (Alex Rich)
DNA polymerase I (Kornberg) (and dNTPs)
Adapter hypothesis (Crick)
TMV reconstituted using purified protein & RNA (no DNA) (Fraenkel-Conrat)
TMV RNA infectious
1956
1957
Central Dogma proposed (Francis Crick) - letter
“On Protein Synthesis” (Crick)
1959
Nobel Prize for:
DNA polymerase I (Arthur Kornberg)
“RNA polymerase” (PNPP) Severo Ochoa
1960
DNA-dependent RNA polymerase
(Jerry Hurwitz)
1960
1961
1962
mRNA (Jacob & Monod) (Brenner & Crick)
Genetic code figured out
(Marshal Nirenberg)
Nobel Prize for DNA structure
Watson, Crick, Wilson
1967-68
RNA world proposals (Woese, Crick, Orgel)
1968 Central Dogma (re)published (Crick)
1970 reverse transcriptase discovery published
1975 Nobel Prize for:
Reverse Transcriptase (Baltimore & Temin)
DNA virus transformation of cells (Dulbecco)
See http://www.genomenewsnetwork.org/resources/timeline/
Crick - Central Dogma, 1956
Crick - Central Dogma, 1958
Crick - Central Dogma, 1970
All possible
1958
known/possible
Crick - On Protein Synthesis, 1958
Symposia for the Society for
Experimental Biology 12: 138-163 (1958)
1970
general/special
Cech - Tetrahymena large rRNA gene has intron
Cech - Intron splicing in vitro - 1980
Cech - an extra G at the splice junction - 1981
Indentifying the
components
required for
splicing in vitro...
the “negative” control
(RNA alone)
showed splicing.
Cech - “Catalytic RNA”
PROTEIN
138
ON
PROTEIN
SYNTHESIS
BY F. H. c. [CRICK
Medical Research Council Unit for the Study of Molecular Biology,
Cavendish Laboratory, Cambridge
I.
INTRODUCTION
Protein synthesis is a large subject in a state of rapid development. To cover
it completely in this article would be impossible. I have therefore deliberately limited myself here to presenting a broad general view of the problem,
emphasizing in particular well-established facts which require explanation,
and only selecting from recent work those experiments whose implications
seem likely to be of lasting significance. Much very recent work, often of
great interest, has been omitted because its implications are not clear.
I have also tried to relate the problem to the other central problems of
molecular biology-those
of gene action and nucleic acid synthesis. In
short, I have written for the biologist rather than the biochemist, the
general reader rather than the specialist. More technical reviews have
appeared recently by Borsook (1956), Spiegelman (1957), and Sin&in &
Work (19576 and this Symposium),
The importance of proteins
It is an essential feature of my argument that in biology proteins are
uniquely important. They are not to be classed with polysaccharides, for
example, which by comparison play a very minor role. Their nearest rivals
are the nucleic acids. Watson said to me, a few years ago, ‘The most
significant thing about’ the nucleic acids is that we don’t know what they
do.’ By contrast the most significant thing about proteins is that they can
do almost anything. In animals proteins are used for structural purposes,
but this is not their main role, and indeed in plants this job is usually done
by polysaccharides. The main function of proteins is to act as enzymes.
Almost all chemical reactions in living systems are catalysed by enzymes,
and all known enzymes are proteins. It is at first sight paradoxical that it is
probably easier for an organism to produce a new protein than to produce
a new small molecule, since to produce a new small molecule one or more
new proteins will be required in any case to catalyse the reactions.
I shall also argue that the main function of the genetic material is to
control (not necessarily directly) the synthesis of proteins. There is a little
direct evidence to support this, but to my mind the psychological drive
behind this hypothesis is at the moment independent of such evidence.
SYNTHESIS
139
Once the central and unique role of proteins is admitted there seems little
point in genes doing anything else. Although proteins can act in so many
different ways, the way in which they are synthesized is probably uniform
and rather simple, and this fits in with the modern view that gene action,
being based upon the nucleic acids, is also likely to be uniform and rather
simple.
Biologists should not deceive themselves with the thought that some new
class of biological molecules, of comparable importance to the proteins,
remains to be discovered. This seems highly unlikely. In the protein
molecule Nature has devised a unique instrument in which an underlying
simplicity is used to express great subtlety and versatility; it is impossible
to see molecular biology in proper perspective until this peculiar combination of virtues has been clearly grasped.
II.
THE
PROBLEM
Elementary facts about proteins
(I) Composition. Simple (unconjugated) proteins break down on hydrolysis to amino acids. There is good evidence that in a native protein the
amino acids are condensed into long polypeptide chains. A typical protein,
of molecular weight about 25,000, will contain some 230 residues joined
end-to-end to form a single polypeptide chain.
Two points are important. First, the actual chemical step required to
form the covalent bonds of the protein is always the same, irrespective of
the amino acid concerned, namely the formation of the peptide link with
the elimination of water. Apart from minor exceptions (such as S-S links
and, sometimes, the attachment of a prosthetic group) all the covalent links
within a protein are formed in this way. Covalently, therefore, a protein is to
a large extent a linear molecule (in the topological sense) and there is little
evidence that the backbone is ever branched. From this point of view the
cross-linking by S-S bridges is looked upon as a secondary process.
The second important point-and
I am surprised that it is not remarked
more often-is that only about twenty different kinds of amino acids occur
in proteins, and that these same twenty occur, broadly speaking, in all
proteins, of whatever origin-animal,
plant or micro-organism. Of course
not every protein contains every amino acid-the amino acid tryptophan,
which is one of the rarer ones, does not occur in insulin, for examplebut the majority of proteins contain at least one of each of the twenty
amino acids. In addition all these twenty amino acids (apart from glycine)
have the L configuration when they occur in genuine proteins.
There are a few proteins which contain amino acids not found else-
PROTEIN
SYNTHESIS
140
where-the hydroxyproline of collagen is a good example--but in all such
cases it is possible to argue that their presence is due to a modification of
the protein after it has been synthesized or to some other abnormality. In
Table I, I have listed the standard twenty amino acids believed to be of
universal occurrence and also, in the last column, some of the exceptional
ones. The assignment given in Table I might not be agreed by everyone,
Table
I
The magic twenty amino acids found
universally in proteins
,
,
Other amino acids found in proteins
-I
Glycine
Hydroxyproline’
g,;y*z
Alanine
Hydroxylysine
Valine
Aspartic acid
Phosphoserine
Leucine
Diaminopimelic ‘kid
Glutamic acid
Isoleucine
Arginine
Thyroxine and related molecules
Proline*
Lysine
Phenylalanine
Hi&dine
Cystinet
Tryptophan
z$gehe
Cysteinet
Threonine
Methionine
l
These are,.of course, imino acids. This distinction is not made in the text.
t This classification implies that all the cystine found in proteins is formed by the
joining together of two cyst&e molecules.
as the evidence is incomplete, but more agreement could be found for this
version than for any other. Curiously enough this point is slurred over by
almost all biochemical textbooks, the authors of which give the impression
that they are trying to include as many amino acids in their lists as they
can, without bothering to distinguish between the magic twenty and the
others. (But see a recent detailed review by Synge, 1957.)
(2) Homogeneity. Not only is the composition of a given protein fixed,
. but we have every reason to believe that the exact order of the amino acid
residues along the polypeptide chains is also rigidly determined: that each
molecule of haemoglobin in your blood, for example, has exactly the same
sequence of amino acids as every other one. This is clearly an overstatement ; the mechanism must make mistakes sometimes, and, as we shall see,
there are also interesting exceptions which are under genetic control.
Moreover, it is quite easy, in extracting a protein, to modify some of the
molecules slightly without affecting the others, so that the ‘pure’ protein
may appear heterogeneous. The exact amount of ‘microheterogeneity’ of
proteins is controversial (see the review by Steinberg & Mihaiyi, 1957), but
this should not blind one to the astonishing degree of homogeneity of most
proteins.*
(3) Structure. In a native globular protein the polypeptide chain is not
* The y-globulins and other antibody molecules are exceptions
They
are probably
heterogeneous
in folding
and possibly
to these generalizations.
to some extent in composition.
PROTEIN
SYNTHESIS
‘4’
fully extended but is thrown into folds and superfolds, maintained by weak
physical bonds, and in some cases by covalent -S-Slinks and possibly
some others. This folding is also thought to be at least broadly the same for
each copy of a particular protein, since many proteins can be crystallized,
though the evidence for perfect homogeneity of folding is perhaps rather
weak.* As is well known, if this folding is destroyed by heat or other
methods the protein is said to be ‘denatured’. The biological properties
of most proteins, especially the catalytic action of enzymes, must depend
on the exact spatial arrangement of certain side-groups on the surface of
the protein, and altering this arrangement by unfolding the polypeptide
‘,
chains will destroy the biological specificity of the proteins.
(4) Amino acid requirements. If one of the twenty amino acids is supplied
to a cell it can be incorporated into proteins; amino acids are certainly protein
precursors. The only exceptions are amino acids like hydroxyproline, which
are not among the magic twenty. The utilization of peptides is controversial
but the balance of evidence is against the occurrence of peptide intermediates. (See the discussion by Simkin & Work, this Symposium.)
I:
If, for some reason, one of the twenty amino acids is not available to the
organism, protein synthesis stops. Moreover, the continued synthesis of
those parts of the protein molecules which do not contain that amino acid
appears not to take place. This can be demonstrated particularly clearly in
bacteria, but it is also true of higher animals. If a meal is provided that
lacks an essential amino acid it is no use trying to make up for this deficiency
by providing it a few hours later.
Very little is known about the accuracy with which the amino acids are
selected. One would certainly expect, for example, that the mechanism
would occasionally put a valine into an isoleucine site, but exactly how often
this occurs is not known. The impression one gets from the rather meagre
facts at present available is that mistakes occur rather infrequently.
In recent years it has been possible to introduce amino acid analogues
into proteins by supplying the analogue under circumstances in which the
amino acid itself is not easily available (see the review by Kamin & Handler,
1957).
For example in Escherichia co& fluorophenylalanine has been incorporated in place of phenylalanine and tyrosine (Munier & Cohen, 1956)
and it has even proved possible to replace completely the sulphur-containing
amino acid methionine by its selenium analogue (Cohen & Cowie, 1957).
Of the enzymes produced by the cell in these various ways some were
active and some were inactive, as might have been expected.
(5) Contrast with polysacchavides. It is useful at this point to contrast
proteins with polysaccharides to underline the differences between them.
* See previous
footnote.
PROTEIN
SYNTHESIS
‘42
(I do not include nucleic acids among the polysaccharides.) - Polysaccharides, too, are polymers, but each one is constructed from one, or at the
most only about half-a-dozen kinds of monomer. Nevertheless many
different monomers are found throughout Nature, some occurring here,
some there. There is no standard set of monomers which is always used, as
there is for proteins. Then polysaccharides are polydisperse-at
least so
far no monodisperse one has been found-and the order of their monomers
is unlikely to be rigidly controlled, except in some very simple manner.
Finally in those cases which have been carefully studied, such as starch,
glycogen and hyaluronic acid, it has been found that the polymerization is
carried out in a straightforward way by enzymes.
(6) The genetics and taxonomy of proteins. It is instructive to compare
your own haemoglobin with that of a horse. Both molecules are indistinguishable in size. Both have similar amino acid compo8itions; similar
but not identical. They differ a little electrophoretically, form different
crystals, and have slightly different ends to their polypeptide chains. All
these fact8 are compatible with their polypeptide chains having similar
amino acid sequences, but with just a few changes here and there.
This ‘family likeness’ between the ‘same’ protein moleculesfromdifferent
species is the rule rather than the exception. It has been found in almost
every case in which it ha8 been looked for. One of the best-studied examples
is that of insulin, by Sanger and his co-workers (Brown, Sanger & Kitai,
1955;
Harris, Sanger 8z Naughton, 1956), who have worked out the
complete amino acid sequence8 for five different species, only two of which
(pig and whale) are the same. Interestingly enough the differences are all
located in one small segment of one of the two chains.
Biologists should realize that before long we shall have a subject which
might be called ‘protein taxonomy ‘-the study of the amino acid sequences
of the proteins of an organism and the comparison of them between species.
It can be argued that these sequences are the most delicate expression
possible of the phenotype of an organism and that vast amounts of
evolutionary information may be hidden away within them.
There is, however, nothing in the evidence presented so far to prove that
these differences between species are under the control of Mendelian genes.
It could be argued that they were transmitted cytoplasmically through the
egg. On the other hand, there is much evidence that genes do affect enzymes,
especially from work on micro-organisms such as Neurospora (see Wagner
& Mitchell, 1955). The famous ‘one gene-one enzyme’ hypothesis
(Beadle, 1945) expresses this fact, although its truth is controversial
(personally I believe it to be largely correct). However, in none of these
cases has the protein (the enzyme, that is) ever been obtained pure.
PROTEIN
SYNTHESIS
‘43
There are a few cases where a Mendelian gene has been shown unambiguously to alter a protein, the most famous being that of human sicklecell-anaemia haemoglobin, which differs electrophoretically from normal
adult haemoglobin, as was discovered by Pauling and his co-workers (1949).
Until recently it could have been argued that this was perhaps not due to a
change in amino acid sequence, but only to a change in the folding. That
the gene does in fact alter the amino acid sequence has now been conclusively shown by my colleague, Dr Vernon Ingram. The difference is due
to a valine residue occurring in the place of a glutamic acid one, and
Ingram has suggestive evidence that this is the onZy change (Ingram,
1956, 1957). It may surprise the reader that the alteration of one amino
acid out of a total of about 300 can produce a molecule which (when
homozygous) is usually lethal before adult life but, for my part, Ingram’s
.,
,,
result is just what I expected.
I
t
i’
The nature of protein synthesk
1 :
The basic dilemma of protein synthesis ha8 been realized by many people;
but it has been particularly aptly expressed by Dr A. L. Dounce (1956);
My interest in templatea, and the conviction of their necessity, originated from
a question asked me on my Ph.D. oral examination by Professor J. B. Sumner.
He enquired how I thought protein8 might be synthesized. I gave what seemed
the obvious answer, namely, that enzymes must be responsible. Professor
Sumner then asked me the chemical nature of enzymea, and when 1 answered
.that enzymes were protein8 or contained proteins 88 essential components, he
asked whether these enzyme protein8 were synthesized by other enaymes and 80
on ad injim’tum.
The dilemma remained in my mind, causing me to look for possible solutions
that would be acceptable, at least from the standpoint of logic. The dilemma, of
course, involve8 the specificity of the protein molecule, which doubtless depends to
a considerable degree on the sequenceof amino acid8 in the peptide chains of the
protein. The problem is to find a reasonably simple mechanism that could account
for specific sequence8 without demanding the presence of an ever-increasing
number of new specific enzyme8 for the synthesis of each new protein molecule.
It is thus clear that the synthesis of proteins must be radically different
from the synthesis of polysaccharides, lipids, co-enzymes and other small
molecules; that it must be relatively simple, and to a considerable extent
uniform throughout Nature; that it must be highly specific, making few
mistakes; and that in all probability it must be controlled at not too many
removes by the genetic material of the organism.
The essenceof the problem
A systematic discussion of our present knowledge of protein synthesis
could usefully be set out under three headings, each dealing with a flux:
PROTEIN
SYNTHESIS
‘44
the flow of energy, the flow of matter, and the flow of information. I shall
not discuss the first of these here. I shall have something to say about the
second, but I shall particularly emphasize the third-the flow of information.
By information I mean the specification of the amino acid sequence of
the protein. It is conventional at the moment to consider separately the
synthesis of the polypeptide chain and its folding. It is of course possible
that there is a special mechanism for folding up the chain, but the more
likely hypothesis is that the fokfing is simply a function of the order of the
amino acids, provided it takes place as the newly formed chain come8 off
the template, I think myself that this latter idea may well be correct,
though I would not be surprised if exceptions existed, especially the
y-globulins and the adaptive enzymes.
Our basic handicap at the moment is that we have no easy and precise
technique with which to study how proteins are folded, whereas we can at
least make some experimental approach to amino acid sequences. For this
reason, if for no other, I shall ignore folding in what follows and concentrate
on the determination of sequences. It is as well to realize, however, that
the idea that the two processes can be considered separately is in itself an
assumption.
The actual chemical step by which any two amino acids (or activated
amino acids) are joined together is probably always the same, and may well
not differ significantly from any other biological condensation. The unique
feature of protein synthesis is that only a single standard set of twenty
amino acids can be incorporated, and that for any particular protein the
amino acids must be joined up in the right order. It is this problem, the
problem of ‘sequentialization’, which is the crux of the matter, though it is
obviously important to discover the exact chemical steps which lead up to
and permit the crucial act of sequentialization.
As in even a small bacterial cell there are probably a thousand different
kinds of protein, each containing some hundreds of amino acids in its own
rigidly determined sequence, the amount of hereditary information required
for sequentialization is quite considerable.
III.
RECENT
EXPERIMENTAL
WORK
The role of the nucleic acids
It is widely believed (though not by every one) that the nucleic acids are
in some way responsible for the control of protein synthesis, either directly
or indirectly. The actual evidence for this is rather meagre. In the case of
deoxyribonucleic acid (DNA) it rests partly on the T-even bacteriophages,
since it has been shown, mainly by Hershey and his colleagues, that
PROTEIN
SYNTHESIS
‘45
whereas the DNA of the infecting phage penetrates into the bacterial cell
almost all the protein remains outside (see the review by Hershey, 1956);
and also on Transforming Factor, which appears to be pure DNA, and
which in at least one case, that of the enzyme mannitol phosphate dehydrogenase, controls the synthesis of a protein (Marmur & Hotchkiss, 1955).
There is also the indirect evidence that DNA is the most constant part of
the genetic material, and that genes control proteins. Finally there is the
very recent evidence, mainly due to the work of Benzer on the rI1 locus of
bacteriophage, that the functional gene-the ’cistron’ of Benzer’s termindlogy-consists of many sites arranged strictly in a linear order (Benzer, 1g57)
as one might expect if a gene controls the order of the amino acids in some
:.,,;
particular protein.
As is well known, the correlation between ribonucleic acid (RNA) tihd
protein synthesis was originally pointed out by Brachet and by Caspersson.
Is there any more direct evidence for this connexion? In particular is there
anything to support the idea that the sequentialization of the amino acids
,!‘.
: ‘,
is controlled by the RNA?
The most telling evidence is the recent work on tobacco mosaic virus.
A number of strains of the virus are known, and it is not difficult to show
(since the protein sub-unit of the virus is small) that they differ in amino
acid composition. Some strains, for example, have histidine in their protein,
whereas others have none. Two very significant experiments have been
carried out. In one, as first shown by Gierer & Schramm (rg56), the RNA
of the virus alone, although completely free of protein, appears tp be
infective, though the infectivity is low. In the other, first done by FraenkelConrat, it has proved possible to separate the RNA from the protein of the
virus and then recombine them to produce virus again. In this case the
infectivity is comparatively high, though some of it is usually lost. If a
recombined virus is made using the RNA of one strain and the protein of
another, and then used to infect the plant, the new virus produced in the
plant resembles very closely the strain from which the RNA was taken. If
this strain had a protein which contained no histidine then the offspring
will have no histidine either, although the plant had never been in contact
with this particular protein before but only with the RNA from that strain.
In other words the viral RNA appears to carry at least part of the information which determines the composition of the viralprotein. Moreover the viral
protein which was used to infect the cell was not copied to any appreciable
extent (Fraenkel-Conrat, 1956).
It has so far not proved possible to carry out this experiment-a model
of its kind-in any other system, although very recently it has been claimed
that for two animal viruses the RNA alone appears to be infective.
E II s XII
IO
PROTEIN
SYNTHESIS
146
Turnover experiments have shown that while the labelling of DNA is
homogeneous that of RNA is not. The RNA of the cell is partly in the
nucleus, partly in particles in the cytoplasm and partly as the ‘soluble’
RNA of the cell sap; many workers have shown that all these three fractions
turn over differently. It is very important to realize in any discussion of the
role of RNA in the cell that it is very inhomogeneous metabolically, and
probably of more than one type.
The site of protein synthesis
There is no known case in Nature in which protein synthesis proper (as
opposed to protein modification) occurs outside cells, though, as we shall
see later, a certain amount of protein can probably be synthesized using
broken cells and cell fragments. The first question to ask, therefore, is
whether protein synthesis can take place in the nucleus, in the cytoplasm,
or in both.
It is almost certain that protein synthesis can take place in the cytoplasm
without the presence of the nucleus, and it is probable that it can take place
to some extent in the nucleus by itself (see the review by Brachet &
Chantrenne, 1956). Mirsky and his colleagues (see the review by Mirsky,
Osawa & Allfrey, 1956) have produced evidence that some protein synthesis
can occur in isolated nuclei, but the subject is technically difficult and in
this review I shall quite arbitrarily restrict myself to protein synthesis in
the cytoplasm.
In recent years our knowledge of the structure of the cytoplasm has
enormously increased, due mainly to the technique of cutting thin sections
for the electron microscope. The cytoplasm of many cells contains an
‘endoplasmic reticulum’ of double membranes, consisting mainly of
protein and lipid (see the review of Palade, x956). On one side of each
membrane appear small electron-dense particles (Palade, 1955). Biochemical studies (Palade & Siekevitz, 1956; among others) have shown
that these particles, which are about IOO-zoo A. in diameter, consist almost
entirely of protein and RNA, in about equal quantities. Moreover the
major part of the RNA of the cell is found in these particles.
When such a cell is broken open and the contents fractionated by
centrifugation, the particles, together with fragments of the endoplasmic
reticulum, are found in the ’microsome’ fraction, and for this reason I shall
refer to them as microsomal particles.
These microsomal particles are found in almost all cells. They are
particularly common in cells which are actively synthesizing protein,
whereas the endoplasmic reticulum is most conspicuously present in
(mammalian) cells which are secreting very actively. Thus both the cells
.
PROTEIN
SYNTHESIS
I47
of the pancreas and those of an ascites tumour contain large quantities of
microsomal particles, but the tumour has little endoplasmic reticulum,
whereas the pancreas has a lot. Moreover, there is no endoplasmic reticulum in bacteria.
On the other hand particles of this general description have been found
in plant cells (Ts’o, Bonner & Vinograd, rg56), in yeast, and in various
bacteria (Schachman, Pardee 8z Stanier, 1953); in fact in all cells which.
have been examined for them.
These particles have been isolated from various cells and examined in the
ultra-centrifuge (Petermann, Mizen & Hamilton, 1952; Schachmari et al.
1953 ; among others). The remarkable fact has emerged that they do not
have a continuous distribution of sedimentation constants, but usually fall
into several well-defined groups. Moreover some of the particles are probably simple aggregates of the others (Petermann & Hamilton, t957). This
uniformity suggests immediately that the particles, which have ‘molecular
weights’ of a few million, have a definite structure. They are; in fact,.
reminiscent of the small spherical RNA-containing viruses, ‘and Watson
and I have suggested that they may have a similar type of substructure
(Crick & Watson, 1956).
Biologists should contrast the older concept of microsomes with the more
recent and significant one of microsomal particles. Microsomes came in all
sizes, and were irregular in composition; microsomal particles occur in a
few sizes only, have a more fixed composition and a much higher proportion of RNA. It was hard to identify microsomes in all cells, whereas
RNA-rich particles appear to occur in almost every kind of cell. In short,
microsomes were rather a mess, whereas microsomal particles appeal
immediately to one’s imagination. It will be surprising if they do not prove
to be of fundamental importance.
It should be noted, however, that Simpson and his colleagues (Simpson
& McLean, 1955 ; Simpson, McLean, Cohn & Brandt, 1957) have reported
that protein synthesis can take place in mitochondria. It is known that .
mitochondria contain RNA, and it would be of great interest to know
whether this RNA is in some kind of particle. Mitochondria are, of course,
very widely distributed but they do not occur in lower forms such as
bacteria. Similar remarks about RNA apply to the reported incorporation
in chloroplasts (Stephenson, Thimann & Zamecnik, 1956).
Microsomal particles and protein synthesis
It has been shown by the use of radioactive amino acids that during
protein synthesis the amino acids appear to flow through the microsomal
particles. The most striking experiments are those of Zamecnik and his
10-2
PROTEIN
SYNTHESIS
148
co-workers on the livers of growing rats (see the review by Zamecnik et aZ.
IgsO
Two variations of the experiment were made. In the first the rat was
given a rather large intravenous dose of a radioactive amino acid. After a
predetermined time the animal was sacrificed, the liver extracted, its cells
homogenized and the contents fractionated. It was found that the microsomal particle fraction was very rapidly labelled to a constant level.
In the second a very small shot of the radioactive amino acid was given,
so that the liver received only a pulse of labelled amino acid, since this small
amount was quickly used up. In this case the radioactivity of the microsomal particles rose very quickly and then fed away. Making plausible
assumptions Zamecnik and his colleagues have shown that this behaviour
is what one would expect if most of the protein of the microsomal particles
was metabolically inert, but I or 2% was turning over very rapidly, say
within a minute or so.
Very similar results have been obtained by Rabinovitz & Olson (1956,
1957) using intact mammalian cells, in this case rabbit reticulocytes. They
have also been able to show that the label passed into a well-defined globular
protein, namely haemoglobin. Experiments along the same general lines
have also been reported for liver by Simkin & Work (1957 a).
We thus have direct experimental evidence that the microsomal particles
are associated with protein synthesis, though the precise role they play is
not clear.
Activating enzymes
.
It now seems very likely that the first step in protein synthesis is the
activation of each amino acid by means of its special ‘activating enzyme’,
The activation requires ATP, and the evidence suggests that the reaction is
amino acid + ATP = AMP -amino
acid + pyrophosphate.
The activated amino acid, which is probably a mixed anhydride of the form
0
NH,
0-P-0-RiboseAdenine
d-d
$
\\o
d
in which the carboxyl group of the amino acid is phosphorylated, appears
to be tightly bound to its enzyme and is not found free in solution.
These enzymes were first discovered in the cell-sap fraction of rat liver
cells by Hoagland (Hoagland, 1955; Hoagland, Keller & Zamecnik, 1956)
and in yeast by Berg (1956). They have been shown by DeMoss & Novelli
(rg56)to bewidel y d is t ri b u t e d in bacteria, and it is surmised that they occur
PROTEIN
SYNTHESIS
‘49
in all cells engaged in protein synthesis. Recently Cole, Coote & Work( 1957)
have reported their presence in a variety of tissues from a number of animals.
So far good evidence has been found for this reaction for about half the
standard twenty amino acids, but it is believed that further research will
reveal the full set. Meanwhile Davie, Koningsberger & Lipmann (1956)
have purified the tryptophan-activating enzyme. It is specific for tryptophan (and certain tryptophan analogues) and will only handle the L-isomer.
Isolation of the tyrosine enzyme has also been briefly reported (Koningsberger, van de Ven & Overbeck, 1957; Schweet, 1957).
The properties of these enzymes are obviously of the greatest interest,
and much work along these lines may be expected in the near future. For
example, it has been shown that the tryptophan-activating enzyme contaIni
what is probably a derivative of guanine (perhaps GMP) very tightly
bound. It is possible to remove it, however, and to show that its presence
is not necessary for the primary activation step. Since the enzyme is
probably involved in the next step in protein synthesis it is naturally
suspected that the guanine derivative is also required for this reaction,
“: ?‘...* s,i;
whatever it may be.
. ‘!
.,
In vitro incorporation
‘,
In order to study $he relationship between the activating enzymes and
the microsomal particles it has proved necessary to break open the cells and
work with certain partly purified fractions. Unfortunately it is rare to
obtain substantial net protein synthesis from such systems, and there is a
very real danger that the incorporation of the radioactivity does not
represent true synthesis but is some kind of partial synthesis or exchange
reaction. This distinction haa been clearly brought out by Gale (1953).
The work to be described, therefore, has to be accepted with reservations.
(See the remarks of Sin&in & Work, this Symposium.) It has been shown,
however, in the work described below, that the amino acid is incorporated
into true peptide linkage.
Again the significant results were first obtained by Zamecnik and his coworkers (reviewed in Zamecnik et al. 1956). The requirements so far known
appear to fall into two parts:
(I)
The activation of the amino acids for which, in addition to the
labelled amino acid, one requires the ‘pH 5 ’ fraction, containing the
activating enzymes, ATP and (usually) an ATP-generating system. There
appears to be no requirement for any of the pyrimidine or guanine
nucleotides.
(2) The transfer to the microsomal particles. For this one requires the
previous system plus GTP or GDP (Keller & Zamecnik, 1956) and of
,,
.(
,.
,
PROTEIN
SYNTHESIS
150
course the microsoma! particles; the endoplasmic reticulum does not
appear to be necessary (Littlefield & Keller, 1957).
Hultin & Beskow (1956) have reported an experiment which shows clearly
that the amino acids become bound in some way. They first incubate the
mixture described in (I) above. They then add a great excess of unlabelled
amino acid before adding the microsomal particles. Nevertheless some of
the labelled amino acid is incorporated into protein, showing that it was in
some place where it could not readily be diluted.
Very recently an intermediate reaction has been suggested by the work
of Hoagland, Zamecnik & Stephenson (1957), who have discovered that in
the first step the ‘soluble’ RNA contained in the ‘pH 5’ fraction became
labelled with the radioactive amino acid. The bond between the amino
acid and the RNA appears to be a covalent one. This labelled RNA can be
extracted, purified, and then added to the microsomal fraction. In the
presence of GTP the labelled amino acid is transferred from the soluble
RNA to microsomal protein. This very exciting lead is being actively
pursued.
Many other experiments have been carried out on cell-free systems, in
particular by Gale & Folkes (1955) and by Spiegelman (see his review,
1957), but I shall not describe them here as their interpretation is difficult.
It should be mentioned that Gale (reviewed in Gale, 1956) hai isolated
from hydrolysates of commercial-yeast RNA a series of fractions which
greatly increase amino acid incorporation. One of them, the so-called
‘glycine incorporation factor’ has been purified considerably, and an
attempt is being made to discover its structure.
. RNA turnover and protein synthesis
From many points of view it seems highly likely that the presence of
RNA is essential for cytoplasmic protein synthesis, or at least for specific
protein synthesis. It is by no means clear, however, that the #urnover of
RNA is required.
In discussing this a strong distinction must be made between cells which
are growing, and therefore producing new microsomal particles, and cells
which are synthesizing without growth, and in which few new microsomal
particles are being produced.
This is a difficult aspect of the subject as the evidence is to some extent
conflicting. It appears reasonably certain that not all the RNA in the
cytoplasm is turning over very rapidly-this
has been shown, for example,
by the Hokins (1954) working on amy!ase synthesis in slices of pigeon
pancreas, though in the light of the recent work of Straub (this Symposium)
the choice of amylase was unfortunate. On the other hand Pardee (1954)
PROTEIN
SYNTHESIS
151
has demonstrated that mutants of Escherichia coli which require uracil or
adenine cannot synthesize /3-galactosidase unless the missing base is
provided.
Can RNA be synthesized without protein being synthesized? This can
be brought about by the use of chloramphenicol.
In bacterial systems
chloramphenicol stops protein synthesis dead, but allows ‘RNA’ synthesis
to continue. A very interesting phenomenon has been uncovered in E. coli
by Pardee & Prestidge (1956), and by Gros & Gros (1956). If a mutant is
used which requires, say, leucine, then when the external supply of leucine
is exhausted both protein and RNA synthesis cease. If now chloram;
phenicol is added there is no effect, but if in addition the cells are given a.
small amount of leucine then rapid RNA synthesis takes place. If the
chloramphenicol is removed, so that protein synthesis restarts, then this
leucine is built into proteins and then, once again, the synthesis of both
protein and RNA is prevented. In other words it appears as if ff’ree’;
leucine (i.e. not bound into proteins) is required for RNA synthesis. %‘hiq
effect is not peculiar to leucine and has already been found for s&era1
amino acids and in several different organisms (Y&s & Brawerman, 1955):
As a number of people have pointed out, the most likely ipterpretation
of these results is that protein and RNA require common intermediates for
their synthesis, consisting in part of amino acids and in part of RNA
components such as nucleotides. This is a most valuable idea; it explains
a number of otherwise puzzling facts and there is some hope of getting
close to it experimentally.
For completeness it should be stated that Anfinsen and his co-workers
have some evidence that proteins are not produced from (activated) amino
acids in a single step (see the review by Steinberg, Vaughan & Anfinsen,
1956), since they find unequal labelling between the same amino acid at
different points on the polypeptide chain, but this interpretation of their
results is not accepted by all workers in the field. This is discussed more
fully by Sin&in & Work (this Symposium).
Summary of experimental work
Both DNA and RNA have been shown to carry some of the specificity
for protein synthesis. The RNA of almost all types of cell is found mainly
in rather uniform, spherical, virus-like particles in the cytoplasm, known
as microsomal particles. Most of their protein and RNA is metabolically
rather inert. Amino acids, on their way into protein, have been shown to
pass rapidly through these particles.
An enzyme has been isolated which, when supplied with tryptophan and
ATP, appears to form an activated tryptophan. There is evidence that
PROTEIN
SYNTHESIS
152
there exist similar enzymes for most of the other amino acids. These
enzymes are widely distributed in Nature.
Work on cell fractions is difficult to interpret but suggests that the first
step in protein synthesis involves these enzymes, and that the subsequent
transfer of the activated amino acids to the microsomal particles requires
GTP. The soluble RNA also appears to be involved in this process.
Whereas the presence of RNA is probably required for true protein
synthesis its rapid turnover does not appear to be necessary, at least not for
all the RNA. There is suggestive evidence that common intermediates,
containing both amino acids and nucleotides, occur in protein synthesis.
IV.
IDEAS
ABOUT
PROTEIN
SYNTHESIS
It is an extremely difficult matter to present current ideas about protein
synthesis in a stimulating form. Many of the general ideas on the subject
have become rather stale, and an extended discussion of the more detailed
theories is not suitable in a paper for non-specialists. I shall therefore
restrict myself to an outline sketch of my own ideas on cytoplasmic protein
synthesis, some of which have not been published before. Finally I shall
deal briefly with the problem of ‘coding’.
General p&ciples
My own thinking (and that of many of my colleagues) is based on two
general principles, which I shall call the Sequence Hypothesis and the
Central Dogma. The direct evidence for both of them is negligible, but
I have found them to be of great help in getting to grips with these very
complex problems. I present them here in the hope that others can make
similar use of them. Their speculative nature is emphasized by their
names. It is an instructive exercise to attempt to build a useful theory
without using them. One generally ends in the wilderness.
The Sequence Hypothesis
This has already been referred to a number of times. In its simplest
form it assumes that the specificity of a piece of nucleic acid is expressed
solely by the sequence of its bases, and that this sequence is a (simple)
code for the amino acid sequence of a particular protein.
This hypothesis appears to be rather widely held. Its virtue is that it
unites several remarkable pairs of generalizations : the central biochemical
importance of proteins and the dominating biological role of genes, and in
particular of their nucleic acid ; thelinearity of protein molecules (considered
covalently) and the genetic linearity within the functional gene, as shown by
the work of Benzer (1957) and Pontecorvo (this Symposium); the simplicity
PROTEIh
SYNTHESIS
I53
of the composition of protein molecules and the simplicity of the nucleic
acids. Work is actively proceeding in several laboratories, including our
own, in an attempt to provide more direct evidence for this hypothesis.
The Central Dogma
i
This states that once ‘information’ has passed into protein it cannot get
out again. In more detail, the transfer of information from nucleic acid
to nucleic acid, or from nucleic acid to protein may be possible, but
transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence,
either of bases in the nucleic acid or of amino acid residues in the protein.
This is by no means universally held-Sir
Macfarlane Burnet, for
example, does not subscribe to it-but many workers now think along these
:
lines. As far as I know it has not been expkitly stated before.
Some ideas on cytopla.vnic protein synthesis
From our assumptions it follows that there must be an RNA template in
the cytoplasm. The obvious place to locate this is in the microsomal
particles, because their uniformity of size suggests that they have a regular
structure. It also follows that the synthesis of at least some of the microsomal RNA must be under the control of the DNA of the nucleus. This is
because the amino acid sequence of the human haemoglobins, for example,
is controlled at least in part by a Mendelian gene, and because spermatozoa
contain no RNA. Therefore, granted our hypotheses, the information
must be carried by DNA.
What can we guess about the structure of the microsomal particle? On
our assumptions the protein component of the particles can have no
significant role in determining the amino acid sequence of the proteins
which the particles are producing. We therefore assume that their main
function is a structural one, though the possibility of some enzyme activity
is not excluded. The simplest model then becomes one in which each
particle is made of the same protein, or proteins, as every other one in the
cell, and has the same basic arrangement of the RNA, but that different
particles have, in general, different base-sequences in their RNA, and
therefore produce different proteins. This is exactly the type of structure
found in tobacco mosaic virus, where the interaction between RNA and
protein does not depend upon the sequence of bases of the RNA (Hart &
Smith, 1956). In addition Watson and I have suggested (Crick & Watson,
1956), by analogy with the spherical viruses, that the protein of microsomal
particles is probably made of many identical sub-units arranged with cubic
symmetry.
PROTEIN
SYNTHESIS
I54
On this oversimplified picture, therefore, the microsomal particles in a
cell are all the same (except for the base-sequence of their RNA) and are
metabolically rather inert. The RNA forms the template and the protein
supports and protects the RNA.
This idea is in sharp contrast to what one would naturally assume at first
glance, namely that the protein of the microsomal particles consists entirely
of protein being synthesized. The surmise that most of the protein is
structural was derived from considerations about the structure of virus
particles and about coding; it was independent of the direct experimental
evidence of Zamecnik and his colleagues that only a small fraction of the
protein turns over rapidly, so that this agreement between theory and
experiment is significant, as far as it goes.
It is obviously of the first importance to know how the RNA of the
particles is arranged. It is a natural deduction from the Sequence Hypothesis that the RNA backbone will follow as far as possible a spatially
regular path, in this case a helix, essentially because the fundamental
operation of making the peptide link is always the same, and we therefore
expect any template to be spatially regular.
Although we do not yet know the structure of isolated RNA (which may
be an artifact) we do know that a pair of RNA-like molecules can under
some circumstances form a double-helical structure, somewhat similar to
DNA, because Rich & Davies (1956) have shown that when the two
polyribotides, polyadenylic acid and polyuridylic acid (which have the
same backbone as RNA) are mixed together they wind round one another
to form a double helix, presumably with their bases paired. It would not
be surprising, therefore, if the RNA backbone took up a helical configura_ tion similar to that found for DNA.
This suggestion is in contrast to the idea that the RNA and protein
interact in a complicated, irregular way to form a ‘nucleoprotein’. As far
as I know there is at the moment no direct experimental evidence to decide
between these two points of view.
However, even if it turns out that the RNA is (mainly) helical and that the
structural protein is made of sub-units arranged with cubic symmetry it is
not at all obvious how the two could fit together. In abstract terms the
problem is how to arrange a long fibrous object inside a regular polyhedron. It is for this reason that the structure of the spherical viruses is of
great interest in this context, since we suspect that the same situation occurs
there; moreover they are at the moment more amenable to experimental
attack. A possible arrangement, for example, is one in which the axes of
the RNA helices run radially and clustered in groups of five, though it is
always possible that the arrangement of the RNA is irregular.
PROTEIN
SYNTHESIS
I55
It would at least be of some help if the approximate location of the RNA
in the microsomal particles could be discovered. Is it on the outside or the
inside of the particles, for example, or even both? Is the microsomal
particle a rather open structure, like a sponge, and if it is what size of
molecule can diffuse in and out of it? Some of these points are now ripe
for a direct experimental attack.
The adaptor hypothesis
Granted that the RNA of the microsomal particles, regularly arranged,
is the template, how does it direct the amino acids into the correct order?
One’s first naive idea is that the RNA will take up.a configuration capable
of forming twenty different ‘cavities’, one for the side-chain of each of the
twenty amino acids. If this were so one might expect to be able to play the
problem backwards-that is, to find the configuration of RNA by trying
to form such cavities. All attempts to do this have failed, and on physical-,
chemical grounds the idea does not seem in the least plausible (Crick,
1957a). Apart from the phosphate-sugar backbone, which we have assumed
to be regular and perhaps linked to the structural protein of the particles,
RNA presents mainly a ‘sequence of sites where hydrogen bonding could
occur. One would expect, therefore, that whatever went on to the template in a specific way did so by forming hydrogen bonds. It is therefore
a natural hypothesis that the amino acid is carried to the template by an
‘adaptor’ molecule, and that the adaptor is the part which actually fits on
to the RNA. In its simplest form one would require twenty adaptors, one
for each amino acid.
What sort of molecules such adaptors might be is anybody’s guess. They
might, for example, be proteins, as suggested by Dounce (1952) and by
the Hokins (1954) though personally I think that proteins, being rather
large molecules, would take up too much space. They might be quite
unsuspected molecules, such as amino sugars. But there is one possibility
which seems inherently more likely than any other-that they might
contain nucleotides. This would enable them to join on to the RNA
template by the same ‘pairing’ of bases as is found in DNA, or in
polynucleotides.
If the adaptors were small molecules one would imagine that a separate
enzyme would be required to join each adaptor to its own amino acid and
that the specificity required to distinguish between, say, leucine, isoleucine and valine would be provided by these enzyme molecules instead
of by cavities in the RNA. Enzymes, being made of protein, can probably
make such distinctions more easily than can nucleic acid.
An outline picture of the early stages of protein synthesis might be as
156
PROTEIN
SYNTHESIS
follows: the template would consist of perhaps a single chain of RNA.
(As far as we know a single isolated RNA backbone has no regular configuration (Crick, x9573) and one has to assume that the backbone is
supported in a helix of the usual type by the structural protein of the
microsomal particles.) Alternatively the template might consist of a pair
of chains. Each adaptor molecule containing, say, a di- or trinucleotide
would each be joined to its own amino acid by a special enzyme. These
molecules would then diffuse to the microsomal particles and attach to
the proper place on the bases of the RNA by base-pairing, so that they
would then be in a position for polymerization to take place.
It will be seen that we have arrived at the idea of common intermediates
without using the direct experimental evidence in their favour; but there
is one important qualification, namely that the nucleotide part of the intermediates must be specific for each amino acid, at least to some extent. It is
not suflicient, from this point of view, merely to join adenylic acid to each
of the twenty amino acids. Thus one is led to suppose that after the activating step, discovered by Hoagland and described earlier, some other more
specific step is needed before the amino acid can reach the template.
The soluble RNA
-
If trinucleotides, say, do in fact play the role suggested here their
synthesis presents a puzzle, since one would not wish to invoke too many
enzymes to do the job. It seems to me plausible, therefore, that the
twenty different adaptors may be synthesized by the breakof RNA,
probably the ‘soluble’ RNA. Whether this is in fact the same action which
the ‘activating enzymes’carry out (presumably using GTP in the process)
remains to be seen.
From this point of view the RNA with amino acids attached reported
recently by Hoagland, Zamecnik & Stephenson (rg57), would be a halfway step in this process of breaking the RNA down to trinucleotides and
joining on the amino acids. Of course alternative interpretations are
possible. For example, one might surmise that numerous amino acids
become attached to this RNA and then proceed to polymerize, perhaps
inside the microsomal particles. I do not like these ideas, because the
supernatant RNA appears to be too short to code for a complete polypeptide chain, and yet too long to join on to template RNA (in the microsomal particles) by base-pairing, since it would take too great a time for a
piece of RNA twenty-five nucleotides long, say, to diffuse to the correct
place in the correct particles. If it were only a trinucleotide on the other
hand, there would be many different ‘correct’ places for it to go to (whereever a valine was required, say), and there would be no undue delay.
PROTEIN
SYNTHESIS
157
Leaving theories on one side, it is obviously of the greatest interest to
know what molecules actually pass from the ‘pH 5 enzymes’ to the
microsomal particles. Are they small molecules, free in solution; or are
they bound to protein? Can they be isolated? This seems at the moment
to be one of the most fruitful points at which to attack the problem.
i:
Subseque?atsteps
What happens after, the common, intermediates have entered the
microsomal particles is quite obscure, Two views are possible, ,which
might be called the Parallel Path and the Alternative Path theories. .In the
first an intermediate is used to produce both protein and RNA at about the .
same time. In the second it is used to produce either protein, or RNAi but
not both. If we knew the exact nature of the intermediates we could piobably decide which of the two was more likely.- At the moment there seems
: p. +?‘I.$’
little reason to prefer one theory to the other,
The details of the polymerization step are also quite unk.novvii~;\One
tentative theory, of the Parallel Path type, suggests that the intermediates
first polymerize to give an RNA molecule with amino acids attached. This
process removes it from the template and it diffuses outside the microsomal
particle. There the RNA folds to a new configuration, and the amino acids
become polymerized to form a polypeptide chain, which folds up as it is
made to produce the finished protein. The RNA, now free of amino acids,
is then broken down to produce fresh intermediates, A great variety of
theories along these lines can be constructed. I shall not discuss these
further here, nor shall I describe the various speculations about the actual
details of the chemical steps involved.
Two types of RNA
It is an essential feature of these ideas that there should be at least two
in the cytoplasm. The first, which we may call ‘template RNA’
is located inside the microsomal particles. It is probably synthesized in the
nucleus (Goldstein & Plaut, 1955) under the direction of DNA, and carries
the information for sequentialization. It is metabolically inert during
protein synthesis, though naturally it may show turnover whenever microsomal particles are being synthesized (as in growing cells), or breaking
down (as in certain starved cells).
The other postulated type of RNA, which we may call ‘metabolic RNA’, is
probably synthesized (from common intermediates) in the microsomal particles, where its sequence is determined by base-pairing with the template
RNA. Once outside the microsomal particles it becomes ‘soluble RNA’
and is constantly being broken down to form the common intermediates
tyges ojRNA
158
PROTEIN
SYNTHESIS
with the amino acids. It is also possible that some of the soluble RNA
may be synthesized in a random manner in the cytoplasm; perhaps, in
bacteria, by the enzyme system of Grunberg-Manago & Ochoa (1955).
One might expect that there would also be metabolic RNA in the
nucleus. The existence of these different kinds of RNA may well explain
the rather conflicting data on RNA turnover.
The coding problem
So much for biochemical ideas. Can anything about protein synthesis be
discovered by more abstract arguments? If, as we have assumed, the
sequence of bases along the nucleic acid determines the sequence of amino
acids of the protein being synthesized, it is not unreasonable to suppose
that this inter-relationship is a simple one, and to invent abstract descriptions of.it. This problem of how, in outline, the sequence of four bases
‘codes’ the sequence of the twenty amino acids is known as the coding
problem. It is regarded as being independent of the biochemical steps
involved, and deals only with the transfer of information.
This aspect of protein synthesis appealsmainly to those with a background
in the more sophisticated sciences. Most biochemists, in spite of being
rather fascinated by the problem, dislike arguments of this kind. It seems
to them unfair to construct theories without adequate experimental facts.
Cosmologists, on the other hand, appear to lack such inhibitions.
The first scheme of this kind was put forward by Gamow (1954). It was
supposedly based on some features of the structure of DNA, but these are
irrelevant. The essential features of Gamow’s scheme were as follows:
(a) Three bases coded one amino acid.
(b) Adjacent triplets of bases overlapped, See Fig. I.
(c) More than one triplet of bases stood for a particular amino acid
(degeneracy).
In other words it was an overlapping degenerate triplet code. Such a
code imposes severerestrictions on the amino acid sequencesit can produce.
It is quite easyto disprove Gamow’s code from a study of known sequenceseven the sequences of the insulin molecule are sufficient. However, there
are a very large number of codes of this general type. It might be thought
almost impossible to disprove them all without enumerating them, but
this has recently been done by Brenner (Ig57), using a neat argument.
He has shown that the reliable amino acid sequences already known are
enough to make all codes of this type impossible.
Attempts have been made to discover whether there are any obvious
restrictions on the allowed amino acid sequences, although the sequence
data available are very meagre (see the review by Gamow, Rich & YEas,
PROTEIN
I59
SYNTHESIS
So far none has been found, and the present feeling is that it may
well be that none exists, and that any sequencewhatsoever can be produced.
This is very far from being established, however, and for all we know there
may be quite severerestrictions on the neighbours of the rarer amino acids,
such as tryptophan.
Jf there is indeed a relatively simple code, then one of the most important
biological constants is what Watson and I have called ‘the coding ratio’
(Crick & Watson, 1956). If B consecutive bases.are required to code A
consecutive amino acids, the coding ratio is the number B/A, when B and
A are large. Thus in Gamow’s code its value is unity; since’a stiing of
moo bases, for example, could code 998 amino acids. (Noti& that when
the coding ratio is greater than unity stereochemiciil problems arise, .
since a polypeptide chaiii has 2 distance bf only about 34 A; between
its residues, which is about thi: minimum distance between ‘sitctessive
bases in nucleic acid. kowever, it has been pointed but bi Bienner
(personal communication), that this difficulty may not be serious if the
polypeptide chain leaves the template as it is being synthesized.)
1955).
Overlapping
code
Partial overlapping
code
I
1
B,CACDDABABDC
BCA
CAC
ACD
CDD
BCA
ACD
DDA
ABA
BCA
CDD
f
A B A
BDC
1
Fig. I. The letters A, B, C, and D stand for the four bases of the four common nucleotides.
Non-overlapping
code
The top row of letters represents an imaginary sequence of them. In the codes illustrated
here each set of three letters represents an amino acid. The diagram shows how the first
foti amino acids of a sequence are coded in the three classes of codes.
If the code were of the non-overlapping type (see Fig. I) one would still
reauire a triplet of bases to code for each amino acid, since pairs of bases
wiuld only allow 4 x 4= 16 permutations, though a possible but not very
likely way round this has been suggested by Dounce, Morrison & Monty
(1955). The use of triplets raises two difficulties. First, why are there not
4 x 4 x 4= 64 different amino acids? Second, how does one know which
of the triplets to read (assuming that one doesn’t start at an end)? For
example, if the sequence of bases is . . . . ABA, CDB, BCA, ACC, . . . .
where A, B, C and D represent the four bases,and where ABA is supposed
to code one amino acid, CDB another one, and so on, how could one read
it correctly if the commas were removed?
PROTEIN
160
PROTEIN
SYNTHESIS
Very recently Griffith, Orgel and I have suggested an answer to both
these di%culties which is of some interest because it predicts that there
should be only twenty kinds of amino acid in protein (Crick, GrifEth &
Orgel, 1957). Gamow & Y&s (1955) had previously put forward a code
with this property, known as the ‘combination code’ but the physical
assumptions underlying their code lack plausibility, We assumed that some
of the triplets (like ABA in the example above) correspond to an amino
acid-make ‘sense’ as we would say-and some (such aa BAC and ACD,
etc., above) do not so correspond, or as we would say, make ‘nonsense’.
We asked ourselves how many amino acids we could code if we allowed
all possible sequencesof amino acids, and yet never accidentally got ‘sense’
when reading the wrong triplets, that is those which included the imaginary
commas. We proved that the upper limit is twenty, and moreover we could
write down several codes which did in fact code twenty things. One such
code of twenty triplets, written compactly is
AB;
where A B AB means that two of the allowed triplets are ABA and ABB,
etc. The example given a little further back haa been constructed using this
code. You will seethat ABA, CDB, BCA and ACC are among the allowed
triplets, whereas the false overlapping ones in that example, such as BAC,
ACD and DBB, etc., are not. The reader can easily satisfy himself that no
sequence of these allowed triplets will ever give one of the allowed triplets
in, a false position. There are many possible mechanisms of protein synthesis
for which this would be an advantage. One of them is described in our
* paper (Crick et al. 1957).
Thus we have deduced the magic number, twenty, in an entirely natural
way from the magic number four. Nevertheless, I must confess that I find
it impossible to form any considered judgment of this idea. It may be
complete nonsense, or it may be the heart of the matter. Only time will
show.
V. CONCLUSIONS
I hope I have been able to persuade you that protein synthesis is a central
problem for the whole of biology, and that it is in all probability closely
related to gene action. What are one’s overall impressions of the present
state of the subject? Two things strike me particularly. First, the existence
of general ideas covering wide aspects of the problem. It is remarkable
that one can formulate principles such as the SequenceHypothesis and the
SYNTHESIS
x61
Central Dogma, which explain many striking facts and yet for which firrdof
is completely lacking. This gap between theory and experiment is ai great
stimulm to the imagination. Second, the extremely active state of the
subject experimentally both on the genetical side and the biochemical side.
At the moment new and sign&ant results are being reported evei$‘few
months, and there seems to be no sign of work coming to a standstill
because experimental techniquea are inadequate. For both these reasons
I shall be surprised if the main features of protein synthesis a&, not
discovered within the next tqn years.
It is a pleasure to thank Dr Sydney Brenner, not enly for ‘many
interesting discussions, but also for much help in redrafting this ‘$aper. i
*...
’ REFEBENCES,
B-LB, G. M. (x945). Cheat. Revi 37, t5.
’
BBNPUI, S. (1957). In The Chemical Baris of Here&y. ’Ed. McElroy; W$&‘&
‘111 ” I’ I I. ,1 ,l#L:‘,
Glass, B. Baltimore: Johns Hopkins Press. !
Bmw, P. (x956). J. Viol. &em. 2&x025.
“’
’ I” ;tl.!‘!,‘. ’
BORSOOK, H. (x956). Proceedings of the Third International Cohgresrof Bioc~tty,
BrusseIs, 1955 (C. Lidbecq, editor). New York: Academic Press. ’
Bn+cmrr, J. Sz CHANTRBNNB,
H. (1956). Coki Spr. Har+‘Syntp. Quont. Biol.’ 21,
329.
. II
Bv,
S. (x957). Proc. Nat. Acad. Sci., II%&. 43, 687;
BROWN,N. H., SANGER,F. & KITAI, R. (1955). B&&m. J. 60,556.
Corn, G. N. tk Cowm, D. B. (x957). C.R. Acud. Sci., Paris, 244,680.
COLB, R. D., COOTB, J. 8z WORK, T. S. (x957). Nature, Land. 179, rgg.
CRICK, F. H. C. (rg57a). In The Structure of NucZeic Acids and Their Role in
Protein Synthesis, p. 25. Cambridge University Press.
CRICK, F. H. C. (19576). In Cellular Biology, Nucleic Acidb and Viruses, 1957.
New York Academy of Sciences.
CRICK, F. H. C., GRIFFITH, J. S. & ORGEL, L. E. (1957). Proc. Nut. Acud. Sci.,
Wash. 43,416.
CRICK, F. H. C. & WATSON,J. D. (x956). Ciba Foundation Symposium on The
Nature of Viruses. London: Churchill.
DAVIB, E. w., KONINGSBERGER, V. V. & LIPMANN, F. (1956). Arch. Biochem.
Biophys. 65, 21.
DBMOSS,J. A. SzNOVBLLI, G. D. (x956). Biochim. Biophys. Acta, 22, 49.
DOUNCB, A. L. (1952). Enaymologia, 15, 251.
DOUNCE,A. L. (1956). J. Cell. Camp. Physiol. 47, suppl. I, 103.
DOUNCB,A. L., MORRISON, M. & Mo~Y, K. J. (1955). Nature, Lond. 176, 597.
F-L-CONRAT,
H. (1956). J. Amer. Chem. Sot. 78, 882.
GALE, E. F. (1953). Adwunc. Protein Chem. 8, 283.
GALE, E. F. (1956). Proceedingsof the Third International Congressof Biochemistry,
Brussl, 195.5(C. Liebecq, editor). New York: Academic Press.
GALB, E. F. & FOLKES, J. (1955). Bi0chem.J. 59, 661, 675 and 730.
GAMOW, G. (x954). Nature, Lo&. 173, 318.
GAMOW, G., RICH, A. & YEAS, M. (1955). Advanc. Biol. Med. Phys. 4. New
York : Academic Press.
162
PROTEIN
SYNTHESIS
PROTEIN
Y&s, M. (1955). Proc. Nut. Acud. Sk, Wash. 41, 1101.
GIERER,A. & SCHRAMM,G. (1956). 2. Naturf. rrb, 138; also Nature, Lmd. x77,
702.
GOLDSTEIN, L. & PLAUT, W. (1955). hoc. Nat. Acad. Sci., Wash. 41, 874.
GROS,F. & GROS,F. (1956). Biochim. Biophys. Acta, az, 200.
GRUNBERG-MANAGO, M. & OCHOA,S. (1955). J. Amer. Chem. Sot. 77, 3165.
HARRIS, J. I., SANDER,F. & NAUGHTON, M. A. (1956). Arch. Biochem. Biophys.
65,427.
HART, R. G. & SMITH, J. D. (1956). Nature, Land. 178, 739.
HERSHEY,A. D. (1956). Advances in Virus Research. IV. New York: Academic
Press.
HOACLAND, M. B. (1955). Biochim. Biophys. Acta, 16, 288.
HOAGLAND, M. B., KELLER, E. B. & ZAMECNIK,P. C. (1956). J. Biol. Chem. 218,
GAMOW, G. &
M. B., ZAMECNIK, P. C. & STBPHXNSON,M. L. (1957).
Biochim.
Biophys. Acta, 24, 215.
HOKIN, L. E. & HOKIN, M. R. (1954). Biochim. Biophys. Actu, 13,401.
HULTIN, T. & B~KOW, G. (1956). Exp. CeZZRes. II, 664.
INGRAM, V. M. (x956). Nature, Lond. 178, 792.
INGRAM,V. M. (1957). Nature, Land. x80, 326.
GAMIN, H. 82 HANDLER, P. (1957). h#U.
h.
BiOCh.
26, 419.
KELLER, E. B. & i?hMECNIK,P. C. (1956). J. BioZ. Chem. 221, 45.
KONING~BBRGER, V. V., VAN DE VEN, A. M. & OVERBECK,
J. TH. G. (1957). +oc. K.
Ahad. Wet. Amst. B, 60, 141.
L~~FIELD, J. W. &KELLER, E. B. (1957). J. BioZ. Chem. 224, 13.
MARMUR, J. & Hcmxmss, R. D. (1955). J. Biol. Chem. 214,383.
MIRSKY, A. E., OSAWA, 6. & -,
V. G. (1956). Co&f S$r. Huh Sym$.
Quunt. Biol. 21, 49.
MUNIER, R. & Corn, G. N. (1956). BiocAim. Biophyr. A&z, 21,592.
Paam, G. E. (1955). j!. B&hem. Bio$hys. Cytol. I, I.
I PALADE,G. E. (1956). J. Biochem. Biophys. Cytol. 2,85.
PJUADE,G. E. & SIEKEVI~, P. (1956). J. Biochem. Biophyr. Cytol. 2, 171.
PARDEE,A. B. (1954). hoc. Nut. Acud. Sci., Wash. 40,263.
‘PARDEE,A. B. & PRESTID~ L. S. (1956). 3. Butt. 71, 677.
HoA:%D,
MAULING, L., ITANO, H. A., SINGER, S. J. 82 WBLLS, I. c. (1949). &h?ICC, 110,543.
M. G. (1957). 3. BioZ. Chem. 224,725; dao Fed.
PBTBRMANN,M. L. & HAMILToN,
PYOC.16, 232.
~,M.L.,MIZBN,N.A.&~LTON,M.G.(1952).
RABIN•
VITZ,M.
RNUNOVITZ, M.
h~~ReS.I2,373.
& OLSON, M. E. (1956). Exp. CeZZRes. IO, 747.
& OLSOF~,M. E. (1957). Fed. Proc. 16, 235.
RICH, A. 8~. DAVIES,D. R. (x956). J. Amer. Chem. Sot. 78,3548.
SCHACHMAN,H. K., PARDEE,A. B. & STANIER, R. Y. (1953). Arch. B&hem.
Biophys. 43, 381.
S~HWEET,R. (1957). Fed. PYOC.16,244.
SIMKIN, J. L, & WORK, T. S. (zg57a). Bi0chem.y. 65, 307.
SIMKIN, J. L. i3zWORK, T. S. (Ig57b). Nature, hzd. 179, 1214.
SIMPSON, M. V. & MCLEAN, J. R. (1955). Biochim. Biophys. Acta, 18, 573.
SIMPSON,M. V., MCLEAN, J. R., COHN, G. I. & BRANDT,I. K. (1957). Fed. PYOC.
16, 249.
SPIEGELMAN, S. (1957). In The ChenaicaZBasis of Here&y. Ed. McElroy, W. D. &
Glass, B. Baltimore: Johns Hopkins Press.
STEINBERG, D. & MIHA~YI,
E. (x957). In Annu. Rev. Biochem. 26, 373.
STEI~TEJERG,D., VAUGHAN, M. & ANFINSBN, C. B. (1956). S&me, 124, 389.
SYNTHESIS
163
M. L., THIMANN, K. V. & ZAMIENIK, P. C. (1956). Arch. Biochem.
Biophys. 65, 194.
SyfyGE,
R. (1957). 1; The origin of Lyz e on the Earth. U.S.S.R. Acad. of Sciences.
Ts o, P. 0. B., BONNER,J. & VINOGRAD, J. (1956). g. Biochem. Biophys. Cytol. 2,
451,
WAGNER,R. P. & MITCHELL, H. K. (1955). Genetics and Metabolism. New York:
John Wiley and Sons.
Y&i, M. 8z BRAWERMAN,G. (1957). Arch. B&hem. Biophrs 68 I 18
ZAMECNIK, P. C., KELLER, E. B., LITIZBFIELD, J. W., HOAG&~
M.‘B. & LOFTFIELD, R. B. (1956). J. Cell. [email protected]. 47, suppl. I, 81.’
STEPHENSON,
SELF-SPLICING AND ENZYMATIC ACTIVITY OF
AN INTERVENING SEQUENCE RNA FROM
TETRAHYMENA
Nobel Lecture, December 8, 1989
by
THOMAS
R. C E C H
Howard Hughes Medical Institute, Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309-0215, USA
A living cell requires thousands of different chemical reactions to utilize
energy, move, grow, respond to external stimuli and reproduce itself. While
these reactions take place spontaneously, they rarely proceed at a rate fast
enough for life. Enzymes, biological catalysts found in all cells, greatly
accelerate the rates of these chemical reactions and impart on them extraordinary specificity.
In 1926, James B. Sumner crystallized the enzyme urease and found that
it was a protein. Skeptics argued that the enzymatic activity might reside in a
trace component of the preparation rather than in the protein (Haldane,
1930), and it took another decade for the generality of Sumner’s finding to
be established. As more and more examples of protein enzymes were found,
it began to appear that biological catalysis would be exclusively the realm of
proteins. In 1981 and 1982, my research group and I found a case in which
RNA, a form of genetic material, was able to cleave and rejoin its own
nucleotide linkages. This self-splicing RNA provided the first example of a
catalytic active site formed of ribonucleic acid.
This lecture gives a personal view of the events that led to our realization
of RNA self-splicing and the catalytic potential of RNA. It provides yet
another illustration of the circuitous path by which scientific inquiry often
proceeds. The decision to expend so many words describing the early
experiments means that much of our current knowledge about the system
will not be mentioned. For a more comprehensive view of the mechanism
and structure of the Tetrahymena self-splicing RNA and RNA catalysis in
general, the reader is directed to a number of recent reviews (Cech & Bass,
1986; Cech, 1987, 1988a, 1990; Burke, 1988; Altman, 1989). Possible
medical and pharmaceutical implications of RNA catalysis have also been
described recently (Cech, 1988b).
Why Tetrahymena?
In the pre-recombinant DNA era of the early 1970’s, much of the research
on the structure and function of eukaryotic chromosomes utilized entire
652
Chemistry 1989
Figure I. Tetrahymena thermophila, showing the
transcriptionallyactive macronucleus and the
germ-line micronucleus.
genomes as experimental systems. My own research with John Hearst in
Berkeley and with Mary Lou Pardue at M.I.T. concerned the organization
of DNA sequences and chromosomal proteins in the mouse genome. During my stay at M.I.T., I began to be dissatisfied with this global approach
and became interested in the prospect of being able to dissect the structure
and expression of some particular gene. Thus, when I set up my own
laboratory in Boulder in 1978, I turned my attention entirely to the rDNA
(gene for the large ribosomal RNAs) of the ciliated protozoan, Tetrahymena
(Figure 1).
Unlike most nuclear genes, which are embedded in giant chromosomes,
the genes for rRNA in Tetrahymena are located on small DNA molecules in
the nucleoli; they are extrachromosomal (Engberg et al., 1974; Gall, 1974).
Furthermore, in the transcriptionally active macronucleus the gene is amplified to a level of ≈10,000 copies (Yao et al., 1974). These properties
made it possible to purify a significant amount of the rDNA. The ability to
purify the gene was not in itself a major attraction, because by this time the
availability of recombinant DNA techniques ensured that no gene would
long escape isolation and sequence analysis. Rather the attractive feature
was the prospect of being able to isolate the gene complete with its associated structural proteins and proteins that regulated transcription (the synthesis of an RNA copy by RNA polymerase).
One feature of the Tetrahymena rDNA that was of only peripheral interest
to me at that time was the presence of an intervening sequence (IVS) or
intron, which interrupted the rRNA-coding sequences of the rDNA of some
strains of Tetrahymena pigmentosa (Wild & Gall, 1979). In the course of
mapping the RNA-coding regions of the rDNA of Tetrahymenu thermophila,
T. R. Cech
653
Figure 2. (A) Visualization of the RNA-coding portions of the T. thermophila rDNA by electron
microscopy. The mature, processed rRNA was hybridized to the DNA under R-loop conditions.
In the interpretation, each solid line indicates a single strand of DNA and each dashed line a
strand of RNA. (Reproduced from Cech & Rio, 1979) (B) The pre-rRNA, thin lines representing
portions that are removed during processing and open boxes representing the mature rRNA
sequences. Numbered arrows indicate the usual order of RNA processing events: 1, splicing;
2 - 4, endonucleolytic cleavages.
Don Rio and I found that this species also harbored an IVS in its rDNA
(Figure 2; Cech & Rio, 1979; independently described by Din et al., 1979).
Although intervening sequences had been discovered only two years before,
by Phil Sharp’s lab at M.I.T. and a group from Cold Spring Harbor
Laboratory, there were already a large number of examples. Thus, the
finding of another IVS was hardly cause for us to be distracted from our
plan to investigate the proteins that regulated rDNA transcription.
RNA Splicing in Vitro
The first step towards biochemical dissection of the transcriptional process
was to see if rRNA synthesis would proceed in a crude cell-free system. We
isolated nuclei from T. thermophila (the nuclei provided both the RNA
polymerase and the ribosomal chromatin templates) and incubated them
with the nucleoside triphosphates and salts necessary for transcription. We
also included a-amanitin, a mushroom toxin known to inhibit the polymerases that transcribe mRNA, tRNA and other small RNAs, which enabled us
to focus on synthesis of the large rRNA.
654
Chemistry 1989
Figure 3. Transcription and splicing of pre-rRNA in vitro in isolated T. thermophila nuclei. (Lanes
1 - 4) RNA produced by incubation of nuclei at 30°C for times ranging from 5 - 60 min. The
0.4 kilobase species is the excised IVS RNA. (Lanes 5 and 6) Purified Tetrahymena 26S and 17S
rRNAs, serving as molecular weight standards. RNA was analyzed by polyacrylamide gel electrophoresis. (Reproduced from Zaug & Cech, 1980; copyright by Cell Press)
When the products of these in vitro transcription reactions were separated by gel electrophoresis, they were found to consist of a somewhat heterogeneous distribution of high molecular weight RNA 226 S, the size expected for full-length pre-rRNA (Figure 3). In addition, there was a discrete low
molecular weight product (≈9 S). The small RNA accumulated post-transcriptionally (Zaug & Cech, 1980). Thus, it seemed likely to be one of several
short regions of the pre-rRNA that was cut out and ultimately discarded
during the maturation process. These candidate regions included an external transcribed spacer at the 5’ end, the internal transcribed spacers flanking the 5.8 S rRNA, and the IVS (Figure 2B).
Driven more by curiosity than by any conviction that the results would be
of central importance to our research goals, I encouraged Art Zaug to
identify the sequences encoding the small RNA. He confirmed that the
T. R. Cech
655
small RNA was encoded by the rRNA gene, and then mapped it to the
intervening sequence (Zaug & Cech, 1980; see also Carin et al., 1980).
This was a finding of considerable excitement: the intervening sequence,
synthesized as part of the pre-rRNA in our in vitro transcription reactions,
was also being cleanly excised from the pre-rRNA in vitro. Despite a great
deal of interest in the mechanism of RNA splicing (Damell, 1978; Abelson,
1979; Crick, 1979; Lerner et al., 1980), in only one other case - pre-tRNA
in yeast - had RNA splicing been confirmed to occur in vitro (Knapp et al.,
1979; Peebles et al., 1979). It seemed reasonable that rRNA splicing in
Tetrahymena might follow a quite different path than tRNA splicing in yeast,
so that detailed study of both systems would be justified.
Furthermore, in each Tetrahymena cell there were 10,000 identical genes
each pumping out unspliced pre-rRNA at the rate of one copy per gene per
sec. I reasoned that the nuclei might contain an unusually high concentration of the splicing enzyme to accomplish so much reaction, which could
facilitate isolation of the first splicing enzyme. Little did we guess that the
splicing “enzyme” would not exist in the traditional sense, but that something much more interesting lay in wait for us.
Self-splicing Unrecognized
Our strategy for purifying the splicing enzyme was conventional. We would
find conditions in which pre-rRNA transcription occurred but splicing was
inhibited, and purify the accumulated pre-rRNA to use as a substrate. We
would then treat the pre-rRNA with extracts of Tetrahymena nuclei, which
we already knew contained splicing activity, and would use gel electrophoresis to monitor the splicing reaction. Finally, we would obtain ever purer
subfractions of the nuclear extract that retained activity, eventually isolating
the splicing enzyme.
Isolation of unspliced pre-rRNA substrate proved to be straightforward
(Cech et al., 1981). Art Zaug purified this RNA by standard SDS-phenol
extraction and used it in a series of RNA splicing reactions. One set of test
tubes contained substrate RNA and nuclear extract dissolved in the same
solution of simple salts and nucleotides that had been conducive to RNA
transcription and splicing in our earlier experiments with intact nuclei.
Another tube, containing the same components except with the nuclear
extract omitted, was to serve as the “splicing minus” control.
The very first attempt was successful. The RNA in the tubes containing
the nuclear extract gave rise to the small band of RNA characteristic of the
IVS. Surprisingly, however, the control RNA incubated only in salts and
nucleotides produced the same amount of IVS. “Well, Art, this looks very
encouraging, except you must have made some mistake making up the
control sample.” Yet several careful repetitions of the experiment gave the
same result: release of the IVS occurred independent of the addition of
nuclear extract, and therefore apparently independent of any enzyme. I
became concerned that we weren’t observing RNA splicing at all; perhaps
the RNA we had been calling “precursor” had already been spliced in vivo,
656
Chemistry 1989
and what we were observing in vitro was some disaggregation or release of
the IVS from already spliced RNA. Clearly the reaction would be worthy of
further pursuit only if we could show that a chemical transformation was
occurring in vitro, and that it was the same cutting and rejoining that
occurred in the living cell. We would have to teach ourselves some RNA
chemistry.
The Mystery of the Extra G
To verify the accuracy of RNA splicing in nuclei, we decided to determine
the nucleotide sequence of the IVS RNA product. (It was not obvious that
this would be particularly illuminating, since we had already shown that the
IVS product came from the IVS region of the rDNA and that within the
error limits of our measurements it was the correct size to account for the
entire IVS.) When Art Zaug labeled the RNA on its 5’ end with 32P and
subjected it to sequencing reactions, he determined a sequence 5’GAAAUAGNAA... (where N represents an unidentified nucleotide). The
DNA sequence across the exon-IVS junction had been reported earlier for
T. pigmentosa by Wild and Sommer (1980), and was being determined for T.
thermophila by Nancy Kan in Joe Gall’s laboratory. The T. thermophila DNA
sequence predicted that the IVS RNA would begin with 5’-AAAUAGCAA. . . .
Thus, our RNA sequence was a perfect match to the DNA sequence
except for the extra G residue on its 5’ end. Art Zaug meticulously checked
and rechecked the identity of this terminal nucleotide using a variety of
enzyme treatments and chromatography systems until there was no doubt:
the IVS RNA began with an ordinary guanosine residue, linked to the next
nucleotide by a standard 3’ - 5’ phosphodiester bond such as that produced
by RNA polymerase. Clearly the Gall lab, known for the high quality of their
science, must have made an error. We telephoned them, advising them that
they had determined most of the sequence correctly but had apparently
missed one G right at the 5’ end of the IVS. Much to our surprise, they
defended every nucleotide of their sequence: no ambiguity in the DNA
sequence, at least in that region, and no chance of a G at the 5’ splice site.
At about the same time, I was working to define the minimum components necessary for the release of the IVS from pre-rRNA. The original
experiments had been done in a “transcription cocktail” that included the
four nucleoside triphosphates, building blocks for RNA synthesis. I found
that removal of three of the NTPs had no effect on the reaction, but the
fourth, GTP, was required in micromolar concentration. In addition, IVS
release required MgCl 2 and was stimulated by certain salts such as
(NH4)2SO 4.
Was it a coincidence that GTP, the nucleotide required for IVS release in
our simple in vitro system, was also the nucleotide that was found unexpectedly at the 5’ end of the excised IVS? Or might there be a causal relationship
between the two observations? The obvious hypothesis was that GTP was
required so that it could be added to the 5’ end of the IVS during splicing.
T. R Cech
657
The test was simple: mix “P-labeled GTP with unlabeled pre-rRNA, and
look for labeling of the IVS RNA concomitant with its excision. The
experiment was the strangest I had ever performed. On the one hand, its
success was a straightforward prediction from our existing knowledge of the
system. On the other hand, it seemed incredibly naive and unrealistic to
expect that simple mixing of a nucleotide with phenol-extracted, proteinase-treated RNA could possibly result in formation of a covalent bond. I
certainly didn’t want to be embarrassed in front of my graduate students
and colleagues by the failure of such an experiment, so I did it very quietly.
The next day I ran the gel, exposed it for autoradiography, and developed the X-ray film. In the sample containing 32P-GTP plus MgCl2 there was
a bright signal of radioactivity at the position of the IVS RNA. In a sample
32
32
containing P-GTP but no MgCl2, and in a sample in which P-ATP had
been substituted for the 32P-GTP, there was no labeled IVS.
Over the next weeks, we confirmed several major features of the reaction.
GTP addition was stoichiometric, one GTP per IVS RNA. The GTP was
added precisely to the 5’ end of the IVS by a normal 3’ - 5’ phosphodiester
bond. Finally, the triphosphate was unnecessary; GMP and even guanosine
were active. The last of these observations eliminated one otherwise very
reasonable hypothesis, that the GTP was providing an energy source for
ligation much as ATP is used by phage T4 RNA ligase. Had that been the
case, forms such as guanosine and GMP which are missing the phosphoanhydride moiety would have been inactive.
It took only a few moments of thought to devise a simple splicing pathway
that integrated our new information about the reaction. Addition of guanosine to the phosphorus atom at the 5’ splice site must be occurring by a
transfer of phosphate esters, or transesterification reaction (Figure 4A).
Such a reaction would free the 5’ exon, leaving a new 3’ hydroxyl group at
its 3’ end. The simplest way to proceed from such proposed intermediates
to the final observed products was to invoke a second transesterification
reaction: attack of the 3’ hydroxyl of the 5’ exon at the 3’ splice site. Thus, a
single active site capable of promoting transesterification could be responsible for the entire splicing reaction.
The model shown in Figure 4A has undergone little change since we first
described it in 1981 and drew a more explicit version in 1982 (Cech et al.,
1981; Zaug & Cech, 1982). It has been a good predictor of a great many
other IVS-catalyzed reactions since then (e.g., Zaug et al., 1983; Tabak et
al., 1987). Furthermore, the model has been strongly bolstered by the
isolation and characterization of the proposed intermediates (Inoue et al.,
1986), by the demonstration of the reversibility of the reactions (Sullivan &
Cech, 1985; Woodson & Cech, 1989), and by determination of the stereochemical course (McSwiggen & Cech, 1989; Rajagopol et al., 1989). An
atomic-level model of transesterification occurring by an SN2(P) mechanism
is presented in Figure 4B.
658
Chemistry 1989
A
Figure 4. (A) Self-splicing of the Tetrahymena pre-t-RNA by consecutive transesterification
reactions. Straight lines, exons (mature rRNA sequences); wavy line, IVS; circle, 5’ splice-site
phosphate; square, 3’ splice-site phosphate; diamond, cyclization site phosphate. (B) Model for
the initial step involving nucleophilic attack by the 3’-hydroxyl group of guanosine on the
phosphorus atom at the 5’ splice site. The hypothesis of an in-line, SN 2 (P) reaction with
inversion of configuration around phosphorus was subsequently confirmed (McSwiggen &
Cech, 1989). The hypotheses of acid-base catalysis and coordination of Mg2+ to the phosphate,
enhancing the electrophilicity of the phosphorus atom and stabilizing the trigonal bipyramid
transition state, are untested. (Reproduced from Cech, 1987; copyright by the AAAS)
Self-splicing Recognized
The guanosine-addition reaction provided us with the proof we needed:
specific RNA bond breakage and formation were occurring in our simple in
vitro splicing reaction. Such a chemically diffkult reaction between very
unreactive molecules certainly had to be catalyzed. But what was the catalyst?
T. R. Cech
659
Figure 5. IVS RNA excision and cyclization activities are resistant to protease treatment. (Lane
S) Pre-rRNA’ purified by SDS-phenol extraction. This RNA was then treated exhaustively with
proteases or (NT) not treated. Incubation with GTP was performed, (lanes -) in the absence of
MgCl 2 or (lanes +) with 10 mM MgCl 2. Excision of the IVS RNA as a linear molecule (L) and
subsequent conversion to a circular form (C) were undiminished by protease treatment. (Lane
M) isolated linear and circular IVS RNAs. (Reproduced from Grabowski et al., 1983, by
permission of Alan R. Liss, Inc.)
Our first hypothesis was that the splicing activity was a protein tightly
bound (perhaps even covalently bonded) to the pre-rRNA isolated from
Tetrahymena nuclei. This would have to be a very unusual protein-RNA
complex to survive the multiple forms of abuse to which we had subjected it:
boiling in the presence of the detergent SDS, SDS-phenol extraction at
temperatures as high as 65°C and extensive treatment with several nonspecific proteases (Figure 5). That we took this hypothesis seriously provides an
indication of how deeply we were steeped in the prevailing wisdom that only
proteins were capable of highly efficient and specific biological catalysis.
In the same paper, we described an alternative hypothesis:
The resistance of the splicing activity to phenol extraction, SDS and
proteases can also be interpreted in a more straightforward manner. The
rRNA precursor might be able to undergo splicing without the participation of a protein enzyme. A portion of the RNA chain could be folded in
such a way that it formed an active site or sites that bound the guanosine
cofactor and catalyzed the various bond-cleavage and ligation events. If
one of the RNA molecules produced in the reaction (for example, the
free IVS) retained is activity and catalyzed additional splicing events, then
it would be an example of an RNA enzyme. (Cech et al., 1981)
As we accumulated negative result upon negative result trying to identify a
protein stuck to the pre-rRNA, the alternative “RNA only” hypothesis
began to appear more and more attractive. But how could we obtain a
positive result to prove it?
The best strategy we could devise was to synthesize the RNA in as artificial
a manner as possible, so that it was never in contact with the Tetrahymena
cells that up to now had been our sole source of the RNA. Complete
chemical synthesis of an RNA the size of the pre-rRNA (or even the size of
the IVS) was and still is beyond the scope of available technology. The next
best approach was to synthesize RNA from a recombinant DNA template
using purified RNA polymerase.
A bacterial plasmid (Figure 6) encoding the IVS and a portion of the
flanking rRNA sequences, situated so as to allow transcription by Escherichia
coli RNA polymerase, was already being constructed in the lab by Kelly
Kruger. The original purpose was to facilitate synthesis of large quantities
of pre-rRNA substrate for isolation of the splicing enzyme. The cloning
Figure 6. Plasmid constructed to enable synthesis of an artificial, shortened version of the prerRNA. (Top) Diagram of the natural pre-rRNA, with transcribed spacers shown as open boxes,
mature rRNA sequences as solid boxes, and the IVS as a hatched box. (Middle) The T.
thermophila rDNA. One half of a palindromic rDNA molecule is shown. (Bottom) Plasmid pIVSll
containing the 1 .6 kilobase Hind III fragment of the rDNA inserted adjacent to P lac, a promoter
for transcription by E. coli RNA polymerase. Restriction endonuclease sites are Hind III (p) and
Eco RI (¯). (Adapted from Kruger et al.. 1982; copyright by Cell Press)
took longer than any of us had anticipated, such that by the time it was
accomplished early in 1982 we were desperate to have the plasmid for a
different purpose: to produce synthetic pre-rRNA for the self-splicing test.
The plasmid was grown up in E. coli, carefully deproteinized, and incubated with purified E. coli RNA polymerase under conditions that we already
knew were inhibitory for splicing. The polymerase was then destroyed and
the RNA purified by gel electrophoresis under denaturing conditions.
Upon addition of GTP, MgCl2 and salt, the TVS RNA was released from this
artificial, shortened pre-rRNA. The site at which GTP broke the RNA chain
was exactly the position that served as the 5’ splice site in vivo, providing
some confidence that self-splicing was relevant to splicing as it occurred in
the living cell.
We held a relatively subdued celebration in the lab. Between sips of
champagne we compiled a list of possible general names for RNA molecules
able to lower the activation energy for specific biochemical reactions. It was
then that we coined the term “ribozyme,” for a ribonucleic acid with
enzyme-like properties.
RNA in Circles
Concurrent with our studies of pre-rRNA splicing in isolated nuclei, we
were pursuing a post-splicing phenomenon: the conversion of the excised
intervening sequence RNA into a circular RNA molecule in isolated Tetrahymena nuclei (Grabowski et al., 1981). The circular form survived treatment
with protease and various denaturants, which suggested that it was a covalently closed circle of RNA. As Paula Grabowski, then a graduate student in
the laboratory, characterized the cyclization reaction, she found that it
occurred with extensively deproteinized RNA in a simple buffered MgCl 2
solution. I dismissed her original observation of protein-free cyclization as
being an artifact of incomplete denaturation of the linear RNA. Yet, as the
experiments proved reproducible, I began to derive some solace in the
knowledge that two researchers, Zaug and Grabowski, studying two different reactions, splicing and cyclization, were both finding activity in the
absence of added protein. Having two strange results somehow made me
more comfortable than just a single strange result. The two sets of observations came together when we found that the plasmid transcripts that underwent self-splicing produced IVS RNA that underwent self-cyclization (Kruger et al., 1982).
The circular IVS RNA was not formed by end-to-end joining of the linear
form. Instead, the 3’ end of the linear IVS attacked an internal phosphorus
atom near the 5’ end of the molecule, clipping off a short oligonucleotide in
the process (Figure 4A). Thus cyclization, like RNA splicing, occurred by
transesterification (Zaug et al., 1983).
How Does the RNA Do It?
At first glance, RNA seemed ill suited to be a catalyst. With its four rather
similar bases, RNA would appear greatly limited in its ability to form a
662
Chemistry 1989
specific substrate-binding pocket. In contrast, the 20 amino acids found in proteins
explore a wide range of sizes and shapes, hydrophilicity and hydrophobicity. In terms
of promoting chemistry, RNA has a dearth of functional groups that are ionizable near
neutral pH, whereas proteins have histidine and cysteine (pKa's in the range of 6-8).
How then, was the Tetrahymena IVS able to catalyze transesterification?
The first glimpse of the catalytic mechanism came from detailed studies of the
guanosine requirement. Brenda Bass tested every available guanosine analog and
found great variation in their activity (Bass & Cech, 1984, 1986). Derivatives carrying
bulky substituents on the 7 or 8 position of the guanine base or on the 5' position of the
ribose sugar were as active as guanosine, indicating that these positions did not
interact with the IVS RNA. Other derivatives were fully active but only at high
concentration, or were inactive as substrates but acted as competitive inhibitors of the
reaction of guanosine. Based on the Km or Ki of these guanosine analogs, we could
assign free energy contributions to individual functional groups of guanosine. All the
data pointed to the IVS containing a well-behaved binding site for gu anosine. The site
has recently been located within the IVS in elegant work by Michel et al. (1989).
The existence of a G-binding site explained the high specificity for guanosine. In
addition, by orienting the nucleophile with respect to the 5’-splice site phosphate, the
G-site would contribute substantially to rate acceleration. RNA catalysis was suddenly
in a familiar context; the loss of entropy and orientation of reacting groups achieved by
formation of a specific enzyme-substrate complex is central to catalysis by protein
enzymes (Jencks, 1969; Fersht, 1985).
The other reactant in the first transesterification step, the phosphorus atom at the 5'
splice-site, is also held in place by a binding interaction. As first proposed by Davies et
al. (1982) and also apparent in models of Michel et al. (1982), a 5' exon-binding site
within the IVS base-pairs to the last few nucleotides of the 5' exon. This pairing
interaction specifies the site of guanosine addition and also holds the 5' exon into place
for the second step of splicing, exon ligation (Waring et al. 1986; Been & Cech, 1986;
Price et al., 1987; Barfod & Cech, 1989).
The IVS does much more than simply hold the reacting groups in place. In the
absence of guanosine, hydrolysis occurs specifically at the splice-site phosphodiester
bonds, producing 5' phosphate/3' hydroxyl termini (Zaug et al., 1984; Inoue et al.,
1986). A 3' hydroxyl is the same as the product of self-splicing but opposite to the
product of random alkaline hydrolysis of RNA. This site-specific hydrolysis reaction
reflects the ability of the catalytic center of the IVS to activate the splice-site phosphates
(or perhaps activate the nucleophile, in which case it must be able to activate OH- as
well as the hydroxyl of guanosine). While the structural basis of this activation is
unknown, reasonable hypotheses include stabilization of the pentacoordinate
transition-state structure of the phosphate and specific coordination of
a Mg 2 + ion (Zaug et al., 1984; Guerrier-Takada et al., 1986; Cech, 1987;
T.R. Cech
663
Grosshans & Cech, 1989; Sugimoto et al., 1989). Once again, in a general sense the RNA
catalyst is recapitulating a major catalytic strategy of protein enzymes, or vice versa.
Tetrahymena is Not Alone
While we were intently characterizing splicing of the Tetrahymena IVS, we were also
wondering when (and perhaps, if) related intervening sequences would be found in other
organisms. The differences in nucleotide sequences near the splice sites made it seem
unlikely that Tetrahymena rRNA splicing would be related to nuclear tRNA or mRNA
splicing. The related intervening sequences came from an unexpected direction: yeast
mitochondria. This was unanticipated because mitochondria, thought to have arisen from
symbiotic prokaryotes, do not usually have genes or modes of gene expression similar to
those of eukaryotes. Furthermore, yeast is evolutionarily extremely distant from Tetrahymena.
In 1982, several groups identified short sequence elements that were conserved among a
group of fungal mitochondrial intervening sequences (Burke & RajBhandary, 1982; Davies et
al., 1982; Michel et al., 1982). Furthermore both Michel and the Davies group had proposed that the
664
Chemistry 1989
conserved sequence blocks interacted to form a common set of short basepaired regions, serving to fold the intervening sequences into the same
fundamental secondary structure (Figure 7A). Michel called these the group
I introns, to distinguish them from a second group of mitochondrial introns
that shared a different structure (Michel & Dujon, 1983).
The Tetrahymena rRNA IVS contained the conserved sequence elements
and secondary structures characteristic of the mitochondrial group I (Michel & Dujon, 1983; Waring et al., 1983). Furthermore, a very similar
structure model of the Tetrahymena IVS was independently derived by
another method, free energy minimization as constrained by experimentally
determined sites of cleavage of the folded RNA by various nucleases (Cech
et al., 1983). A current version of the secondary structure is shown in
Figure 7B.
This convergence of two previously noninteracting sets of ideas was
important in several respects. In terms of splicing mechanisms, it was now
unlikely that the Tetrahymena intron would be unique; we could extend our
knowledge of its mechanism by comparing and contrasting splicing of
different members of the group. Second, a believable model of the secon-
T. R. Cech
665
dary structure of the Tetrahymena IVS was now in hand, and one could begin
to formulate structure-function relationships. Finally, the similarity between the Tetrahymena rRNA and fungal mitochondrial introns might be
revealing their origin; perhaps they were transposable elements able to
enter both nuclear and mitochondrial compartments (Cech et al., 1983).
Waiting for Number Two
If RNA catalysis were of any general significance to biology, there would be
additional examples. Throughout 1982 and most of 1983, none came forth.
Yet there seemed to be some reasonable candidates. In the fall of 1983, we
wrote an article in which we speculated:
Several enzymes, such as RNase P, 1,4-a-glucan branching enzyme and
potato o -diphenol oxidase, have RNA components essential for their
catalytic activities. The peptidyl transferase activity of ribosomes also
requires an RNA-protein complex. It remains to be seen whether the
RNA is directly involved in the active site of any of these ribonucleoprotein enzymes. (Bass and Cech, 1984)
The speculation about RNase P was already outdated when our paper was
published in 1984, because by that time Guerrier-Takada et al. (1983) had
announced that the RNA component was the catalytic subunit of that
ribonucleoprotein enzyme. This was followed by the report early in 1984
that a synthetic E. coli RNase P RNA, transcribed in vitro from a recombinant DNA template, also had intrinsic catalytic activity; the possibility that
catalysis was due to protein contamination was thereby eliminated (Guerrier-Takada and Altman, 1984). Similarly, the RNA subunit of Bacillus
subtilis RNase P acted as an enzyme in vitro (Guerrier-Takada et al., 1983;
Marsh & Pace, 1985). The RNase P discovery was very exciting to us. Not
only did it provide a second example of an RNA molecule that lowered the
activation energy for a specific biochemical reaction, but it was the first
proven case of an RNA molecule that catalyzed a reaction without itself
undergoing any net change. RNA catalysis was not restricted to the realm of
intramolecular catalysis.
Within the next year, self-splicing of additional group I intervening
sequences was reported. Garriga and Lambowitz (1984) found that the first
IVS of the cytochrome b pre-mRNA from Neurospora mitochondria underwent self-splicing in vitro. Several self-splicing group I RNAs from yeast
mitochondria were characterized by van der Horst and Tabak (1985). Most
unexpectedly, self-splicing group I IVSs were found in three bacteriophage
T4 mRNAs (Belfort et al., 1985; Ehrenman et al., 1986; Gott et al., 1986);
RNA splicing took place in a prokaryote. In all cases, splicing occurred by
the same G-addition pathway as splicing of the Tetrahymena nuclear rRNA
IVS.
The mitochondrial group II intervening sequences have conserved sequences and secondary structures distinct from those of group I (Michel
and Dujon, 1983). Peebles et al. (1986) and Van der Veen et al. (1986)
666
Chemistry 1989
Figure 8. Mechanism of splicing of the four major groups of precursor RNAs. Wavy lines
indicate IVSs, smooth lines indicate flanking exons. For nuclear mRNA splicing, many components assemble with the pre-mRNA to form the spliceosome; only the UI and U2 small nuclear
ribonucleoproteins are shown here. (Adapted from Cech, 1986a; copyright by Cell Press)
discovered that pre-mRNA containing a group II intervening sequence was
self-splicing in vitro. The reaction did not require guanosine, and occurred
by formation of a branched “lariat” RNA. The proposed mechanism is
shown in Figure 8.
The fundamental chemistry of group II RNA splicing appears to be the
same as that of nuclear pre-mRNAs, which do not self-splice. Instead,
nuclear mRNA splicing requires assembly of the substrate with a large
complex of proteins and small nuclear ribonucleoproteins to form the
spliceosome (Brody & Abelson, 1985; Frendewey & Keller, 1985; Grabowski et al., 1985). The mechanistic similarities have led to the speculation
that nuclear mRNA splicing may also be RNA catalyzed, with much of the
catalysis being provided in the form of the small nuclear RNAs (Kruger et
al., 1982; Maniatis and Reed, 1987).
Enzymologists Outraged
Although our description of RNA self-splicing was shocking to many, it was
quickly accepted by the scientific community. In contrast, our use of the
words “catalysis” and “enzyme-like” to describe the phenomenon provoked
some much more heated reactions.
Our reasons for emphasizing the relationship between RNA self-splicing
and biological catalysis might be better appreciated in the context of the
T. R. Cech
667
Table 1. Defining characteristics of biological catalysis.
* No, although the active site is preserved through the reactions. (From Cech, 1988c, by
permission of Alan R. Liss Inc.)
definition given in Table 1. First, biological catalysts achieve rate accelerations of the order of 106- to 1013-fold, bringing the reactions into a time
scale that is useful for living systems. Self-splicing clearly meets this criterion; although the quantitation of rate acceleration can be done only for the
site-specific hydrolysis reaction promoted by the RNA active site, even this
relatively slow side-reaction occurs 1010 -fold faster than the estimated uncatalyzed rate (Figure 9). Second, the extraordinary specificity of biological
catalysts is evident in the self-splicing reaction; the molecule selects GTP as
the attacking group for the first step of the reaction and is able to choose 2
of the 6000 nucleotides in the pre-rRNA as splice sites. On the other hand,
the IVS RNA is clearly not regenerated in exactly the same form as it
entered the reaction; after all, the purpose of self-splicing is to convert prerRNA to ligated exons plus excised IVS. Nevertheless, enzymologists do
speak of intramolecular catalysis (Jencks, 1969; Bender et al., 1984; Fersht,
1985), and we thought that such a descriptor was particularly appropriate
for self-splicing.
Figure 9. The IVS RNA has an extremely efficient catalytic center. Second-order rate constants
(42°C) for the alkaline hydrolysis of phosphate diesters are displayed on a logarithmic scale.
Arrows designate the P-O bond that undergoes cleavage. Above line, data for dimethyl phosphate and ethylene phosphate from Kumamoto et al. (1956) and Haake and Westheimer (1961).
Below line, hydrolysis of RNA (left) uncatalyzed and (right) catalyzed by the catalytic center of
the Tetrahymena IVS. (Reproduced from Cech, 1987; copyright by the AAAS)
668
Chemistry 1989
The enzymologists who reviewed Bass & Cech (1984) were far from
convinced. All three referees wrote thoughtful reviews, expressing considerable interest in the data but chastising us for our naivete about enzymes.
For example:
Enzymes are true catalysts: they speed the rate of a reaction, but are
themselves unchanged by the reaction. In the present instance, the ribozyme acts in a “one-shot” reaction and is permanently changed so that it
can no longer cycle as does a true catalyst. The authors are well aware of
this, but appear to ignore this key feature in making their comparisons.
How fundamental was this distinction between self-splicing and catalysis?
Opinions would probably vary as much today as they did six years ago.
Instead of engaging in a protracted argument, we decided to test experimentally whether the self-splicing RNA could be converted into a multipleturnover catalyst. Arthur Zaug made a slight alteration of the self-splicing
IVS, removing its first 19 nucleotides. The resulting L - 19 IVS RNA met
all three criteria of a biological catalyst (Table 1 and Zaug & Cech, 1986a).
Shortened forms of the Tetrahymena IVS have multiple enzymatic activities. Depending on the substrates with which they are presented, these RNA
enzymes can catalyze nucleotidyltransfer, phosphotransfer and hydrolysis
reactions; nucleotidyltransfer can result in either endonucleolytic cleavage
or polymerization of RNA (Zaug & Cech, 1986a,b; Zaug et al., 1986; Kay &
Inoue, 1987; Been & Cech, 1986, 1988; Doudna & Szostak, 1989).
As an example, consider the reaction diagrammed in Figure 10. A form
of the IVS RNA missing both its splice sites catalyzes the cleavage of other
RNA molecules after sequences resembling that of the normal 5’ exon,
CUCU. The reaction is an intermolecular version of the first step of RNA
self-splicing, in which RNA cleavage is accompanied by covalent joining of
guanosine to the 5’ end of the 3’ cleavage product. Because the IVS RNA is
Figure 10. A shortened version of the Tetrahymena IVS RNA has enzymatic activity as an
endonuclease. The mechanism is an intermolecular version of the first step of pre-rRNA selfsplicing. Thin letters and lines, IVS sequences; bold letters and thick lines, exon sequences (top)
or substrate RNA sequences (bottom); G in italics, free guanosine or GTP. (Reproduced by
permission from Zaug et al., 1986; copyright Macmillan Journals Limited)
T. R. Cech
669
unaltered by the reaction, it can sequentially bind and process a large
number of substrate RNA molecules. The kcat/Km for cleavage of an oligoribonucleotide substrate containing a CCCUCU recognition sequence is 10 8
M - 1 m i n- 1 ( H erschlag & Cech, 1990), well within the range of protein
enzymes.
Site-directed mutagenesis of the 5’ exon-binding site of the IVS redirects
substrate specificity as predicted by the rules of Watson-Crick base-pairing
(Zaug et al., 1986; Murphy & Cech, 1989). Thus, it has been possible to
create a whole set of “RNA restriction endonucleases” that may be of use
for the sequence-specific cleavage of RNA.
Converting a self-processing RNA into an RNA enzyme by physically
separating its internal substrate from its catalytic center has been generally
successful. The “hammerhead” and “hairpin” ribozymes both found in
plant infectious agents (viroids or viral satellite RNAs), undergo self-cleavage leaving 2’,3’-cyclic phosphate termini (Prody et al., 1986; Forster &
Symons, 1987). Both have been converted into RNA enzymes (Uhlenbeck,
1987; Haseloff & Gerlach, 1988; Hampel & Tritz, 1989). A group II IVS can
cleave RNA containing a 5’ splice-site or ligated exon RNA in an intermolecular reaction (Jacquier & Rosbash, 1986; Jarrell et al., 1988), although in
these cases multiple turnover was not demonstrated. Even the intramolecular lead-cleavage reaction of tRNA can be converted into an enzymatic
system (Sampson et al., 1987). The ability of pieces of a structured RNA
molecule to self-assemble, recreating an active unit, therefore appears to be
a general property of the RNA biopolymer. In addition, if the fragment
which undergoes reaction is secured to the rest of the molecule by a
relatively weak interaction such as a few base-pairs, it will dissociate quickly
enough to permit multiple turnovers.
Origin of Life Fantasies
The discoveries of RNA self-splicing and the enzymatic activity of RNase P
RNA rekindled earlier speculation concerning the possible role of RNA in
the origin of life (Woese, 1967; Crick, 1968; Orgel, 1968). Contemporary
cells depend on a complex interplay of nucleic acids and proteins, the
former serving as informational molecules and the latter as the catalysts that
replicate and express the information. Certainly the first self-reproducing
biochemical system also had an absolute need for both informational and
catalytic molecules. The dilemma was therefore Which came first, the nucleic
acid or the protein, the information or the function? One solution would be the
co-evolution of nucleic acids and proteins (Eigen, 1971). The finding that
RNA can be a catalyst as well as an information-carrier lent plausibility to an
alternative scenario: the first self-reproducing system could have consisted
of RNA alone (Sharp, 1985; Pace & Marsh, 1985; Orgel, 1986).
Perhaps coincidentally or perhaps because of its ancestry, one of the
reactions catalyzed by the Tetrahymena IVS RNA enzyme is a nucleotidyl
transfer reaction with fundamental similarity to the reaction catalyzed by
RNA replicases (Zaug & Cech, 1986a; Been & Cech, 1988). A specific model
670
Chemistry 1989
Figure 11. Hypothetical model for RNA self-replication involving an RNA catalyst (ribozyme*).
Double-stranded RNA (I) undergoes strand separation to give ribozyme* ((+) strand) and the
complementary (-) strand (II). The ribozyme* catalyzes synthesis of a new (+) strand, using the
(-) strand as a template (III). The detailed mechanism is described by Cech (1986b). Completion of synthesis reforms the double-stranded RNA (I). A second cycle is needed to achieve
replication of the starting material.
for RNA self-replication based on the properties of the catalytic center of
the Tetrahymena IVS RNA is given in Figure 11. The RNA enzyme, ribozyme*, differs from the L - 19 IVS RNA in that it utilizes an external
rather than an internal template (Cech, 1986b). Separation of the template
region from the rest of the catalytic center of the RNA with retention of
activity has recently been achieved by Doudna and Szostak (1989).
Now that we have examples of catalytic RNAs, it has been entertaining to
look back at earlier speculations about the catalytic potential of RNA. As a
representative example, consider the following:
. . . in the evolutionary scheme, folded nucleic acid structures were abandoned by nature in favour of proteins. Although nucleic acids may have
performed many enzymatic tasks in primitive cells, this is much more
efficiently done by proteins. . . . With nucleic acids the four bases are all of
one structural type, though the existence of many modified bases points
to an evolutionary proliferation giving much greater possibilities in form-
T. R. Cech
671
ing structures. Nevertheless, the use of bases cannot match the enormous
flexibility provided by having twenty amino acids which fall into three or
four different structural types. (Klug et al., 1974)
One implication is that nucleic acids, if they could have any such activity,
would by nature be inferior catalysts. At one time I had a similar bias, but it
was gradually dispelled as we quantitated the rate acceleration and specificity inherent to the catalytic center of the Tetrahymena ribozyme. The second
conclusion, regarding the limited versatility of RNA catalysts, still strikes me
as being correct. In all well established examples, the substrate for an RNA
catalyst or ribonucleoprotein enzyme is RNA or DNA or, in the case of
protein synthesis on ribosomes, the closely related aminoacyl-tRNA. To its
credit, RNA can form a specific binding site for at least one amino acid
(Yarus, 1988), and there is evidence that covalent linkage. of the terminal
protein to poliovirus RNA is at least in part RNA-catalyzed (Tobin et al.,
1989). Nevertheless, it seems unlikely that RNA can match the enormous
variety of binding sites that can be formed from amino acid side chains. The
list of RNA catalysts is still growing quite rapidly. Yet it seems likely that if
an entire list of biological catalysts is ever complete, it will include more
proteins than RNA molecules.
ACKNOWLEDGEMENTS
I dedicate this lecture to my wife Carol and to my parents, Robert and
Annette. I am indebted to my coworkers over the past decade, who performed the experiments and contributed many of the ideas about RNA catalysis described here. In addition to those whose work I have described, there
are many who made important contributions that have gone unmentioned
here but not at all unappreciated. Funding for our research has been
provided principally by the National Institutes of Health and the American
Cancer Society, and more recently by the Howard Hughes Medical Institute; I thank them for their uninterrupted and enthusiastic support. Finally,
I have enjoyed the comradeship and helpful criticism of the RNA research
community, both in Boulder and worldwide.
REFERENCES
Abelson J. (1979) Ann. Rev. Biochem. 48,1035-1069.
Altman S. (1989) Adu. Enzym. 62, l-36.
Barfod, E. T. & Cech, T. R. (1989) Molec. Cell. Biol. 9, 3657-3666.
Bass, B. L. & Cech, T. R. (1984) Nature 308, 820-826.
Bass, B. L. & Cech, T. R. (1986) Biochemistry 25, 4473-4477.
Been, M. D. & Cech, T. R. (1986) Cell 47, 207-216.
Been, M. D. & Cech, T. R. (1988) Science 239, 1412-1416.
Belfort, M., Pedersen-Lane, J., West, D., Ehrenman, K., Maley, G., Chu, F. & Maley,
F. (1985) Cell 41, 375-382.
Bender, M. L., Bergeron, R. J. & Komiyama, M. (1984) The Bioorganic Chemistry of
Enzymatic Catalysis (John Wiley, New York).
672
Chemistry 1989
Brody, E. & Abelson, J. (1985) Science 228, 963-966.
Burke, J.M. (1988) Gene 73, 273-294.
Burke, J. & RajBhandary, U. L. (1982) Cell 31, 509-520.
Carin, M., Jensen, B. F., Jentsch, K. D., Leer, J. C., Nielson, O. F. & Westergaard,
0. (1980) Nucleic Acids Res. 8, 5551-5566.
Cech, T. R. (1986a) Cell 44, 207-210.
Cech, T. R. (1986b) Proc. Natl. Acad. Sci. USA 83, 4360-4363.
Cech, T. R. (1987) Science 236, 1532 - 1539.
Cech, T. R. (1988a) Gene 73, 259-271.
Cech, T. R. (1988b) J. Am. Med. Assn. 260, 3030-3034.
Cech, T. R. (1988c) Harvey Lec. 82, 123- 144.
Cech, T. R. (1990) Ann. Reu. Biochem. 59, in press.
Cech, T. R. & Bass, B. L. (1986) Ann. Rev. Biochem. 55, 599 -629.
Cech, T. R. & Rio, D. C. (1979) Proc. Natl. Acad. Sci. USA 76, 5051 -5055.
Cech, T. R., Zaug, A. J. & Grabowski, P. J. (1981) Cell 27, 487-496.
Cech, T. R., Tanner, N. K., Tinoco, I., Jr., Weir, B. R., Zuker, M. & Perlman, P. S.
(1983) Proc. Natl. Acad. Sci. USA 80, 3903 -3907.
Crick, F. H. C. (1968) J. Mol. Biol. 38, 372.
Crick, F. (1979) Science 204, 264 - 271.
Darnell, J. E., Jr. (1978) Science 202, 1257-1260.
Davies, R. W., Waring, R. B., Ray, J. A., Brown, T. A. & Scazzocchio, C. (1982)
Nature 300, 7 19 - 724.
Din, N., Engberg, J., Kaffenberger, W. & Eckert, W. (1979) Cell 18, 525-532.
Doudna, J. A. & Szostak, J. W. (1989) Nature 339, 519-522.
Downs, W. D. & Cech, T. R. (1990) Biochemistry, in press.
Ehrenman, K., Pedersen-Lane, J., West, D., Herman, R., Maley, F. & Belfort, M.
(1986) Proc. Natl. Acad. Sci. USA 83, 5875 -5879.
Eigen, M. (1971) Naturwissenschaften 58, 465-523.
Engberg, J., Nilsson, J. R., Pearlman, R. E. & Leick, V. (1974) Proc. Natl. Acad. Sci.
USA 71, 894-898.
Fersht, A. (1985) Enzyme Structure and Mechanism, ed. 2, (Freeman, New York).
Forster, A. C. & Symons, R. H. (1987) Cell 49, 211-220.
Fox, G. & Woese, C. R. (1975) Nature 256, 505-507.
Frendewey, D. & Keller, W. (1985) Cell 42, 355-367.
Gall, J. G. (1974) Proc. Natl. Acad. Sci. USA 71, 3078-3081.
Garriga, G. & Lambowitz, A. M. (1984) Cell 39, 631-641.
Gott, J. M., Shub, D. A. & Belfort, M. (1986) Cell 47, 81-87.
Grabowski, P. J., Zaug, A. J. & Cech, T. R. (1981) Cell 23, 467-476.
Grabowski, P. J., Brehm, S. L., Zaug, A. J., Kruger, K. & Cech, T. R. (1983) In GENE
EXPRESSION, D. H. Hamer and M. J. Rosenberg, eds. (Alan R. Liss, New York)
Vol. 8, 327-342.
Grabowski, P. J., Seiler, S. R. & Sharp, P. A. (1985) Cell 42, 345-353.
Grosshans, C. A. & Cech, T. R. (1989) Biochemistry 28, 6888-6894.
Guerrier-Takada C., & Altman, S. (1984) Science 223, 285-286.
Guerrier-Takada, C., Gardiner, K., Marsh, T., Pace, N. & Altman, S. (1983) Cell 35,
849-857.
Guerrier-Takada, C., Haydock, K., Allen, L. & Altman, S. (1986) Biochemistry 25,
1509-1515.
Haake, P. C. & Westheimer, F. H. (196l) J. Am. Chem. Soc. 83, 1102 - 1109.
Haldane, J. B. S. (1930) Enzymes (Longmans, Green & Co., Great Britain). (1965, M.
I. T. Press, Cambridge, Mass.).
Hampel, A. & Tritz, R. (1989) Biochemistry 28, 4929-4933.
Haseloff, J. & Gerlach, W. L. (1988) Nature 334, 585 -591.
Herschlag, D. & Cech, T. R. (1990) Nature 344, 405-409.
Inoue, T., Sullivan, F. X. & Cech, T. R. (1986) J. Mol. Biol. 189, 143 - 165.
T. R Cech
673
Jacquier, A. & Rosbash, M. (1986) Science 234, 1099 - 1104.
Jarrell, K. A., Peebles, C. I.., Dietrich, R. C., Romiti, S. L. & Perlman, P. S. (1988) J.
Biol. Chem. 263, 3432- 3439.
Jencks, W. P. (1969) Catalysis in Chemistry and Enzymology,
(McGraw-Hill, New York).
Kay, P. S. & Inoue, T. (1987) Nature 327, 343-346.
Klug, A., Ladner, J. & Robertus, J. D. (1974) J. Mol. Biol. 89, 51 1-516.
Knapp, G., Ogden, R. C., Peebles, C. L. & Abelson, J. (1979) Cell 18, 37-45.
Kruger, K., Grabowski, P. J., Zaug, A. J., Sands, J., Gottschling, D. E. & Cech, T. R
(1982) Cell 31, 147-157.
Kumamoto, J., Cox, J. R., Jr. & Westheimer, F. H. (1956) J. Am. Chem. Soc. 78,
4858 - 4860.
Lerner, M. R., Boyle, J. A., Mount, S. M., Wolin, S. L. & Steitz, J. A. (1980) Nature
283, 220 -224.
McSwiggen, J. A., & Cech, T. R. (1989) Science 244, 679-683.
Maniatis, T. & Reed, R. (1987) Nature 325, 673 -678.
Marsh, T. L. & Pace, N. R. (1985) Science 229, 79-81.
Michel, F. & Dujon, B. (1983) EMBO J. 2, 33 -38.
Michel, F., Jacquier, A. & Dujon, B. (1982) Biochimie 64, 867-881.
Michel, F., Hanna, M., Green, R., Bartel, D. P. & Szostak, J. W. (1989) Nature 342,
391-395.
Murphy, F. & Cech, T. R. (1989) Proc. Natl. Acad. Sci. USA 86, 9218-9222.
Noller, H. F. & Woese, C. R. (1981) Science 212,403-410.
Orgel, L. E. (1986) J. Theor. Biol. 123, 127-149.
Pace, N. R. & Marsh, T. L. (1985) Orig. Life 16 ,97 - 116.
Peebles, C. L., Ogden, R. C., Knapp, G. & Abelson, J. (1979) Cell 18, 27 -35.
Peebles, C. L., Perlman, P. S., Mecklenburg, K. L., Petrillo, M. L., Tabor, J. H.,
Jarrell, K. A. & Cheng, H.-L. (1986) Cell 44, 213-223.
Price, J. V., Engberg, J. & Cech, T. R. (1987) J. Mol. Biol. 196, 49-60.
Prody, G. A., Bakos, J. T., Buzayan, J. M., Schneider, I. R. & Bruening, G. (1986)
Science 231, 1577- 1580.
Rajagopal, J., Doudna, J. A. & Szostak, J. W. (1989) Science 244, 692-694.
Sampson, J. R., Sullivan, F. X., Behlen, L. S., DiRenzo, A. B. & Uhlenbeck, 0. C.
(1987) Cold Spring Harbor Symp. Quant. Biol.,
Vol. LII, (Cold Spring Harbor, New
York), pp. 267-277.
Sharp, P.A. (1985) Cell 42, 397-400.
Sugimoto, N., Tomka, M., Kierzek, R., Bevilacqua, P. C. & Turner, D. H. (1989)
Nucleic Acids Res. 17, 355 - 371.
Sullivan, F. X. & Cech, T. R. (1985) Cell 42, 639-648 .
Tabak, H. F., Van der Horst, G., Kamps, A. M. J. E. & Arnberg, A. C. (1987) Cell 48,
101-110.
Tobin, G. J., Young, D. C. & Flanegan, J. B. (1989) Cell 59 , 511-519.
Uhlenbeck, 0. C. (1987) Nature 328, 596- 600.
Van der Horst, G. & Tabak, H. F. (1985) Cell 40, 759 - 766.
Van der Veen, R., Arnberg, A. C., Van der Horst, G., Bonen, L., Tabak, H. F. &
Grivell, L. A. (1986) Cell 44, 225-234.
Waring, R. B., Scazzocchio, C., Brown, T. A. & Davies, R. W. (1983) J. Mol. Biol.
167, 595-605.
Waring, R. B., Towner, P., Minter, S. J. & Davies, R. W. (1986) Nature 321, 133 139.
Wild, M. A. & Gall, J. G. (1979) Cell 16, 565-573.
Wild, M. A. & Sommer, R. (1980) Nature 283, 693-694.
Woese, C. (1967) In The Genetic Code: The Molecular Basis for Genetic Expression
(Harper and Row, New York), p. 186.
Woodson, S. A. & Cech, T. R. (1989) Cell 57, 335-345.
674
Chemistry 1989
Yao, M.-C., Kimmel, A. R. & Gorovsky, M. A. (1974) Proc. Natl. Acad. Sci. USA 71,
3082-3086.
Yarus, M. (1988) Science 240, 1751-1758.
Zaug, A. J. & Cech, T. R. (1980) Cell 19, 331-338.
Zaug, A. J. & Cech, T. R. (1982) Nucleic Acids Res. 10, 2823-2838.
Zaug, A. J. & Cech, T. R. (1986a) Science 231, 470-475.
Zaug, A. J. & Cech, T. R. (1986b) Biochemistry 25, 4478-4482.
Zaug, A. J., Grabowski, P. J. & Cech, T. R. (1983) Nature 301, 578-583.
Zaug, A. J., Kent, J. R. & Cech, T. R. (1984) Science 224, 574 -578.
Zaug, A. J., Been, M. D. & Cech, T. R. (1986) Nature 324, 429-433.