MOLECULAR BIOLOGY
Historical development and basic concepts
S. K. Jain
Professor
Hamdard University
New Delhi 110062
E-mail: [email protected]
24-Jul-2006 (Revised 05-Sep-2007)
CONTENTS
Introduction
Discovery of DNA
DNA is the genetic material
Basic structure of nucleic acids
Three-dimensional structure of DNA
Double helical structure of DNA
Alternate forms of DNA
Structure of RNA
Genome and the C-value
Central dogma of molecular biology
Genome organization
The satellite DNA
Highly and moderately repetitive sequences
Denaturation of DNA
G:C content of DNA and the Tm
Key words
Deoxyribonucleic acid; Genetic material; Chromosome; Replication; Central dogma; Transcription;
Ribonucleic acid; Translation; Proteins; Repetitive sequences; Denaturation
Introduction
Molecular biology is an old science; its subject matter is as old as life itself. Soon after the
cell was recognized as the basic unit of life and its structure was deciphered, it became clear
that the cellular constituents are chemical compounds. While some of these compounds are
simple, such as water, NaCl and many other small molecules, others are very large and have
complex chemical structures. Nucleic acids, proteins, complex carbohydrates and lipids are
such molecules. They often have very high molecular weights and are relatively difficult to
synthesize by chemical means. They are therefore referred to as bio-macromolecules. They
are synthesized in the cell by a number of complex reactions catalyzed by enzymes, the
biological catalysts. The biochemical pathways are highly regulated sequences of chemical
reactions taking place in the cell. Almost all of these reactions can also be carried out in the
test tube. However, without a catalyst they have a high activation energy, and their efficiency
in the test tube is often low. Within the cell, on the other hand, these reactions take place
at body temperature with a relatively low activation energy. Enzymes are responsible for
lowering the activation energy. Further, at any given time thousands of different
reactions take place simultaneously in a cell. These reactions, many of which are of diverse
nature, are coordinated in a highly regulated, efficient and precisely controlled manner. A
number of regulatory molecules (usually proteins) regulate the various reactions. These
biochemical reactions can be influenced both by endogenous factors, such as hormones and
the metabolic status of the cell, and by exogenous factors, such as environmental changes.
Life is nothing but the sum of organized interaction of these cellular components (the
biomolecules) and any disorganization of these molecular interactions results either in a
pathological condition or in cell death. A couplet written by an Urdu poet, Chakbast
Lucknowi describes this scientific fact in a very beautiful manner. The couplet is:
`Zindgi kya hai, anasir mein zuhure tertib
Maut kya hai, inhi ajza ka parishan hona’
It can be translated as:
`What is life? It is the organization of components (the biomolecules).
What is death? The disorganization of the same components.’
Modern molecular biology started with the discovery of DNA by Friedrich Miescher in
1869. A series of experiments finally proved that DNA is the genetic material that transfers
genetic information from one generation to the next. These included the classical
experiments of Frederick Griffith (1928) and of Oswald Avery, Colin MacLeod and Maclyn
McCarty (1944), and the famous double-labeling experiment of Alfred Hershey and Martha
Chase (1952), which established the role of DNA as the carrier of genetic information. The
elucidation of the structure of DNA by Watson and Crick (1953) was the key event
that set the ball rolling. The invention of new techniques for cell fractionation and for the
isolation, purification, characterization and analysis of subcellular components made it
possible to study cellular processes at the molecular level. The development and availability
of a new generation of analytical equipment further facilitated the process. A number of other
discoveries, especially those of DNA polymerase and the genetic code, helped in deciphering
the secrets of life at the molecular level. Finally, the discovery of restriction endonucleases
and ligases, which act as `molecular scissors’ and `molecular needles’ respectively, led to the
beginning of yet another important field, `genetic engineering’. In 1997, it became
possible to clone an animal from the genome of a somatic cell (cloning of the sheep Dolly),
and by the turn of the century the completion of the Human Genome Project had taken us into
the era of functional genomics, proteomics and micro-array technology. Developments in
the field of `stem cell research’ have opened yet another vista in molecular
biology. The discovery of siRNA and of antisense RNA technology has provided new tools for
regulating the molecular events of the cell. All these discoveries and developments have
added a new dimension to our understanding of life. The entire health management
system has been revolutionized, and it may soon be possible to have an individualized
molecular profile of each person and to manage his/her health accordingly.
Plant molecular biologists and biotechnologists have also made important advances. It is
possible to micropropagate plants by tissue culture techniques and so save endangered
plant species from extinction. Tissue culture techniques can also be used for the
large-scale cultivation of medicinal and economically important plants. The rate of synthesis
and the cellular concentration of useful secondary metabolites can be enhanced to obtain
higher yields. It has become possible to give plants new characters by developing
transgenics, and to use plants as biofactories for the production of useful therapeutics and
other biomolecules. Some of the important landmarks in the fields of molecular biology
and biotechnology are summarized in Table 1.
Table 1: Landmarks in molecular biology and biotechnology
1869 - Friedrich Miescher discovered DNA.
1941 - Beadle and Tatum demonstrated that a gene codes for a single protein, and the
       one gene one protein theory was put forward.
1944 - Avery, MacLeod and McCarty showed that DNA is the genetic material.
1951 - The helical conformation of a chain of amino acids was proposed, and the α-helix
       and β-sheet structures in proteins were deciphered.
1952 - Hershey and Chase proved that DNA is the carrier of genetic information.
1953 - Watson and Crick proposed the double helical structure of DNA.
1955 - A method for determining the amino acid sequence of a protein was developed
       by Frederick Sanger, and the sequence of insulin was determined.
1957 - Arthur Kornberg discovered DNA polymerase I.
1958 - Meselson and Stahl showed that DNA replicates in a semi-conservative manner.
1960 - The detailed 3-D structure of proteins was described at very high resolution.
1960 - Polycistronic genes in bacteria were discovered. The one gene one protein
       theory became obsolete.
1961 - The triplet nature of codons was discovered, and the genetic code was
       deciphered by Marshall Nirenberg and H.G. Khorana.
1961 - Messenger RNA was discovered.
1961 - Jacob and Monod proposed the `operon model’ for the regulation of gene
       expression.
1963 - The circular nature of bacterial DNA was discovered by John Cairns.
1967 - The enzyme DNA ligase was discovered by Gellert.
1970 - Temin and Baltimore reported the discovery of reverse transcriptase in
       retroviruses.
1973 - Type II restriction endonucleases were discovered.
1974 - Eukaryotic genes were cloned in bacterial plasmids.
1975 - The signal hypothesis was proposed by Günter Blobel.
1976 - Retroviral oncogenes were identified as the causative agents of cellular
       transformation by J.M. Bishop and H.E. Varmus.
1976 - DNA sequencing protocols were developed (the chemical method by Maxam and
       Gilbert and the enzymatic method by Sanger), and it became possible to determine
       the nucleotide sequence of a gene.
1977 - It was shown that eukaryotic genes are interrupted. The introns were
       discovered, and the splicing mechanism for the removal of introns from
       primary transcripts was deciphered.
1978 - The NIH guidelines for r-DNA technology were formulated.
1979 - Cellular oncogenes were discovered.
1981 - The catalytic activity of RNA was discovered, and the concept of the ribozyme
       was accepted.
1981 - Transgenic mice and flies were created by introducing novel genes into the
       germ line.
1984 - The polymerase chain reaction (PCR) was developed by Kary Mullis.
1997 - Dolly the sheep was cloned from a somatic cell genome. This first animal
       cloning experiment established totipotency in animal cells.
1998 - RNAi was discovered.
2000 - The Human Genome Project was completed. The first draft of the sequence of
       the human genome was published. Functional genomics and proteomics became
       the new fields.
Discovery of DNA
The quest to understand the mystery of life led a number of scientists to analyze the
chemical nature and composition of cells. While analyzing the cell nucleus in a systematic
manner, Friedrich Miescher in 1868-69 studied pus cells (leukocytes) from hospital
bandages and found in them some phosphorus-containing compounds that were acidic in
nature. Since they were identified within the nucleus of the cell, these compounds were named
nuclein. Later, nuclein was fractionated into two main groups, an acidic component and a
basic component. The acidic component was given the name nucleic acid. The basic component
was later identified as protein. A similar chemical entity was later isolated from salmon
sperm cells; it was partially purified and partially characterized by chemical studies.
Miescher is regarded as the scientist who discovered DNA.
Detailed analysis of the acidic component was followed by studies of its chemical and
physical characteristics. A number of scientists analyzed and established its
chemical composition, and it was shown to contain a nitrogenous base, a sugar and
phosphate(s). Later, two separate nucleic acids were identified, one having ribose and the
other having 2-deoxyribose as the sugar moiety, and these were given the names ribonucleic
acid (RNA) and deoxyribonucleic acid (DNA), respectively.
DNA is the genetic material
Nucleic acids and proteins soon became the key candidates for carrying the genetic
information, transferring it from one generation to the next and carrying out
various biochemical processes. Scientists were looking for the material responsible
for the storage of genetic information and the transfer of parental characters to offspring.
One group of scientists believed that DNA is the genetic material, while many geneticists
doubted this and thought that proteins are the key molecules for the transfer of genetic
information. In 1928 Frederick Griffith carried out his classical experiment, in which he
infected mice with two separate strains of Pneumococcus. One strain, which had a
polysaccharide capsule and a smooth surface (the `S’ strain), was virulent, while the other
strain had a rough cell surface (the `R’ strain) and was non-virulent. He showed that infection
with live bacteria of the `S’ strain resulted in disease and led to the death of the mice, while
the mice were not killed if infected with the `R’ strain. Further, the mice survived when
infected with heat-killed bacteria of the `S’ strain. However, co-infection with heat-killed
`S’ strain bacteria and live `R’ strain bacteria together resulted in the death of the animals.
This suggested that some factor(s) present in the killed `S’ bacteria was capable of
transforming the non-virulent `R’ bacteria into a virulent strain.
Furthermore, when DNA from the heat-killed `S’ strain was injected into mice along
with live `R’ strain bacteria, the combination was found to be virulent and lethal. This
indicated that DNA from the killed `S’ strain bacteria was able to transform the `R’ strain
bacteria and was responsible for the pathogenicity of the transformed bacteria.
However, doubts were expressed about the homogeneity and purity of the DNA preparation
(many of the refined techniques for the purification of subcellular components were not
available at that time), and a number of scientists still thought that contaminating
proteins might have been responsible for the transformation of the `R’ bacteria and hence the
death of the mice. The dilemma continued till 1943, and many workers still believed that
proteins were the genetic material. In 1944 Avery, MacLeod and McCarty extracted the
transforming principle from heat-killed bacteria of the `S’ strain, chemically characterized it
and showed it to be DNA. Their analysis included elemental analysis as well as physical
characterization such as optical properties, ultracentrifugal behaviour, electrophoretic
migration and diffusion properties. Further, they showed that removal of even the last
traces of lipids and proteins from the transforming principle had no effect on its
effectiveness as a transforming factor. Treatment of this factor with either proteases or
RNases did not result in loss of its activity, but treatment with DNase resulted in total loss
of its transforming property.
In 1951 Roger Herriott studied the structure of bacteriophages and showed that the
phages have a needle-like structure with a head and a sharp pointed tail. The nucleic acid was
shown to be localized in the head region. Experiments demonstrated that during infection
the tail pierces the host bacterium and facilitates the transfer of the phage nucleic acid into
the host cell. The outer envelope (the proteins) remains outside the infected host. The phage
nucleic acid was implicated as being responsible for the transformation of the host cell. This
observation gave further support to the theory that DNA is the genetic material.
Though all these studies indicated that DNA is the genetic material and was responsible
for the transfer of virulence in the original experiment of Griffith, some doubts still
persisted. Finally, in 1952 Alfred Hershey and Martha Chase performed their famous
double-labeling experiment, which unequivocally proved that DNA is the genetic material.
They radiolabeled the DNA of phage T2 with 32P and its protein with 35S. Bacteria were
infected with the double-labeled phage and the fate of the radioactivity was carefully
followed. It was found that while 32P (i.e. the phage DNA) was taken up by the host cells and
was detected inside the infected cells, 35S (i.e. the phage proteins) did not enter the host
cells and remained outside in the culture medium. This confirmed beyond doubt that DNA was
the genetic material responsible for the transformation of the host bacteria. It is now
well established that DNA is the genetic material of all living cells; the only exceptions are
certain viruses (see later).
Basic structure of nucleic acids
A number of efforts were made to decipher the structure of DNA and RNA. Their chemical
composition was determined, and it was established that three components, namely a
nitrogenous base, a pentose sugar and phosphate(s), are covalently linked together to form a
nucleotide. The polymerization of nucleotides leads to the synthesis of nucleic acid.
Further, the backbone in the primary structure of both DNA and RNA was found to be
similar. The analyses showed that only four different types of bases are present in each of
DNA and RNA. Two of the bases are pyrimidines and two are purines. Pyrimidine has a
6-membered ring consisting of four carbons and two nitrogens. The two pyrimidine
derivatives present in DNA are cytosine (2-oxy, 4-amino pyrimidine) and thymine
(2,4-dioxy, 5-methyl pyrimidine). However, thymine is not present in RNA; RNA has uracil
(2,4-dioxy pyrimidine) in its place. Purine has five carbons and four nitrogens
arranged as a pyrimidine ring fused to an imidazole ring. The same two purines are present
in both DNA and RNA. These are adenine (6-amino purine) and guanine (2-amino, 6-oxy
purine). The structures and the numbering of the ring members are shown in Fig.
1. In RNA, the pentose sugar is D-ribose, present in the β-furanose configuration (Fig. 2).
In DNA the 2’ carbon of the sugar lacks the –OH group, which is replaced by an –H atom.
Thus the sugar in DNA is 2-deoxy-D-ribose. To differentiate the positions of the carbon
atoms in the sugar from the positions of the members of the ring of the base, the carbons of
the sugar are numbered 1’, 2’, 3’, 4’ and 5’, while the members of the rings of the base are
numbered 1-6 in pyrimidines and 1-9 in purines. The nitrogen at position 1 of a pyrimidine
or the nitrogen at position 9 of a purine is attached to the 1’ carbon of the sugar through a
glycosidic bond. The sugar-base complex is known as a nucleoside. The 5’ carbon of the sugar
is linked to the phosphate group(s). A nucleoside together with its phosphate(s) is known as
a nucleotide. Naturally occurring nucleotides have one, two or three phosphates (at the α, β
and γ positions) attached in a linear manner: the α-phosphate is joined to the sugar through
an ester bond and the phosphates are joined to each other through anhydride bonds. These
nucleotides are known as nucleoside mono-, di- or triphosphates. The ribose-containing
nucleosides and nucleotides present in RNA are known as ribonucleosides and
ribonucleotides (the triphosphates are abbreviated rNTPs), while those present in DNA
contain deoxyribose and are called deoxyribonucleosides and deoxyribonucleotides
(dNTPs). The relationship between base, sugar and phosphates and the names of the
resulting nucleosides and nucleotides is as follows.
Base + Sugar (N-glycosidic linkage) → Nucleoside
Nucleoside + Phosphate (ester linkage) → Nucleotide
Fig. 1: Components of nucleic acids
Fig. 2: Nucleosides and nucleotides
Base                     Nucleoside   Nucleotide
Pyrimidines
  Cytosine               Cytidine     Cytidine (mono-, di- or tri-) phosphate
  Thymine (only in DNA)  Thymidine    Thymidine (mono-, di- or tri-) phosphate
  Uracil (only in RNA)   Uridine      Uridine (mono-, di- or tri-) phosphate
Purines
  Adenine                Adenosine    Adenosine (mono-, di- or tri-) phosphate
  Guanine                Guanosine    Guanosine (mono-, di- or tri-) phosphate
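The naming scheme in the table above is regular enough to be written as a small lookup. The following Python sketch is illustrative only: the dictionary and function names are assumptions of this example, and it does not add the deoxy- prefix used for the DNA forms.

```python
# Base -> nucleoside names, copied from the table above.
NUCLEOSIDE = {
    "cytosine": "cytidine",
    "thymine": "thymidine",
    "uracil": "uridine",
    "adenine": "adenosine",
    "guanine": "guanosine",
}

# Number of phosphates -> prefix used in the nucleotide name.
PHOSPHATE_PREFIX = {1: "mono", 2: "di", 3: "tri"}


def nucleotide_name(base: str, phosphates: int) -> str:
    """Name the nucleotide formed by a base, its sugar and 1-3 phosphates."""
    if phosphates not in PHOSPHATE_PREFIX:
        raise ValueError("a nucleotide carries one, two or three phosphates")
    nucleoside = NUCLEOSIDE[base.lower()]
    return f"{nucleoside} {PHOSPHATE_PREFIX[phosphates]}phosphate"


print(nucleotide_name("adenine", 3))  # adenosine triphosphate
```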
During polymerization, the 3’-OH group of the first nucleotide condenses with the 5’-PO4
group of the second nucleotide and forms a phosphodiester bond. The process is repeated
many times, and a long polynucleotide chain is formed that makes up the basic structure
of both DNA and RNA. It should be noted that only one PO4 group (the α-PO4) is required
for the formation of the phosphodiester bond. However, under natural conditions during the
enzymatic biosynthesis of nucleic acids, the enzymes (RNA polymerase and DNA polymerase)
use nucleoside triphosphates as substrates. Nucleoside mono- or diphosphates cannot serve
as substrates for either of these enzymes. The α-phosphate is retained in the phosphodiester
bond, while the β- and γ-phosphates are released as inorganic pyrophosphate. The general
structure of a polynucleotide is shown in Fig. 3.
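The bookkeeping of chain growth described above can be sketched in a few lines of Python. This is a simplified illustration, not a model of any enzyme: the function name is hypothetical, and it assumes that the first nucleotide simply starts the chain and that every later nucleoside triphosphate adds to the free 3’-OH end, forming one phosphodiester bond and releasing one pyrophosphate.

```python
def polymerize(bases):
    """Build a chain written 5'->3' and count bonds and pyrophosphates.

    `bases` is a string or list of one-letter base codes, given in the
    order in which the nucleoside triphosphates are added.
    """
    chain = ""
    pyrophosphates = 0
    for base in bases:
        if chain:
            # Every nucleotide after the first joins the free 3'-OH end
            # and releases its beta- and gamma-phosphates as pyrophosphate.
            pyrophosphates += 1
        chain += base
    bonds = max(len(chain) - 1, 0)  # one phosphodiester bond per addition
    return chain, bonds, pyrophosphates


chain, bonds, ppi = polymerize("ATGC")
print(chain, bonds, ppi)  # ATGC 3 3
```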
As can be seen in Fig. 3, in a long polynucleotide chain only the first nucleotide has
free phosphates at the 5’-end. This is referred to as the `Head’ or 5’-end of the nucleic acid.
Similarly, only the last nucleotide has a free –OH group at the 3’-position. This end is
known as the `Tail’ or 3’-end of the nucleic acid. The polymerization is thus directional and
gives polarity to the nucleic acid. It is accepted practice to write the structure of a nucleic
acid in the 5’→ 3’ direction. If for any reason one wants to write the sequence of
nucleotides in the opposite direction, it is essential that the 5’ and 3’ ends be clearly indicated.
Further, no free –OH group (except in the 3’-tail nucleotide) is present in DNA. This
absence of a reactive –OH group makes DNA a very stable molecule. This stability is
very useful in its role as the carrier of genetic information and is essential for
maintaining genetic conservation. It is a strong and convincing example of a
structure-function relationship. RNA, on the other hand, has one free –OH group (the
2’-OH) in each nucleotide. This makes RNA highly reactive and prone to chemical
modification and degradation. In fact, it is possible to isolate DNA from dried blood drops
and even from mummies that are thousands of years old. No intact RNA can be isolated from
such samples.
Fig. 3: General structure of a polynucleotide (nucleic acid)
Three-dimensional structure of DNA
Soon after the chemical nature and basic structure of DNA became clear, efforts were made
to establish its three-dimensional structure. The analysis of DNA from a number of
organisms at different stages of evolution showed that while the same four bases are present
in all DNAs, the actual base composition of DNA varies between species and is
characteristic of a given organism. However, the base composition of DNA within any
particular species remains constant and does not vary between individuals of the same
species. This further confirmed that DNA is responsible for the species-specific characters.
Detailed chemical analysis of various DNA molecules also revealed an interesting
regularity. It was found that the number of purines (A+G) in any DNA sample always
equals the number of pyrimidines (T+C). Moreover, the number of A residues
always equals the number of T residues and the number of G residues always equals the
number of C residues (Table 2). It therefore became evident that there has to be a
relationship between A and T and between C and G. Based on these observations, on a number
of biophysical studies and on X-ray diffraction studies, a number of different models for
the three-dimensional structure of DNA were proposed. Finally, James Watson and Francis
Crick proposed the three-dimensional structure of DNA in 1953. This model, popularly known
as the double helical model of DNA structure, explained the three-dimensional and
space-filling structure of DNA in an unequivocal manner and is universally accepted.
Table 2: Base composition of DNA from different species
                        Relative ratio of bases (%)   Ratio
Organism                A      G      C      T        A/T    G/C    Pu/Py
Human                   30.9   19.9   19.8   29.4     1.05   1.00   1.04
Sheep                   29.3   21.4   21.0   28.3     1.03   1.02   1.03
Chicken                 28.8   20.5   21.5   29.3     1.02   0.95   0.97
Turtle                  29.7   22.0   21.3   27.9     1.05   1.03   1.00
Salmon                  29.7   20.8   20.4   29.1     1.02   1.02   1.02
Sea urchin              32.8   17.7   17.3   32.1     1.02   1.02   1.02
Locust                  29.3   20.5   20.7   29.3     1.00   1.00   1.00
Wheat                   27.3   22.7   22.0   27.1     1.01   1.00   1.00
Yeast                   31.3   18.7   17.1   32.9     0.95   1.09   1.00
E. coli                 24.7   26.0   25.7   23.6     1.04   1.01   1.03
Staphylococcus aureus   30.8   21.0   19.0   29.2     1.05   1.11   1.07
Phage T7                26.0   24.0   24.0   26.0     1.00   1.00   1.00
Bacteriophage λ         21.3   28.6   27.2   22.0     0.92   1.05   0.79
Phage ϕX174 (RF)        26.3   22.2   22.3   26.4     1.00   1.00   1.00
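The three ratios in Table 2 follow directly from the four base percentages. The Python sketch below shows one way to recompute them; the function name is an assumption of this example, and recomputed values can differ from the printed ones in the second decimal place because the tabulated percentages are themselves rounded.

```python
def chargaff_ratios(a: float, g: float, c: float, t: float) -> dict:
    """Return A/T, G/C and purine/pyrimidine ratios from base percentages."""
    return {
        "A/T": round(a / t, 2),
        "G/C": round(g / c, 2),
        "Pu/Py": round((a + g) / (c + t), 2),  # purines A+G, pyrimidines C+T
    }


# Base composition of human DNA as listed in Table 2 (A, G, C, T in %).
human = chargaff_ratios(a=30.9, g=19.9, c=19.8, t=29.4)
print(human)
```

All three ratios come out close to 1 for double-stranded DNA, which is the regularity the text describes.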
Double helical structure of DNA
In their model, Watson and Crick proposed that DNA has a double-stranded structure
consisting of two polynucleotide strands. These strands run in an anti-parallel
manner (i.e. the 5’-end of one strand faces the 3’-end of the other strand). Further, the two
strands are coiled around a common axis in a spring-like manner. The coiling is right-handed.
The entire structure is highly organized. It has a diameter of 23.7 Å. Each complete turn of
the helix contains 10 base pairs. Two adjacent base pairs are therefore rotated by 36º relative
to each other. Further, adjacent base pairs are 3.4 Å apart along the axis. The pitch, or the
length of a complete turn of the helix, is 34 Å. The adjacent nucleotides in each strand are
joined together by a phosphodiester bond between the 3’ and 5’ carbons of the sugars. The
sugar-phosphates form the backbone of DNA, while the bases lie at a right angle to the
axis. The two strands are held together by hydrogen bonds between the bases. Analysis of the
structure of the bases revealed that, for stability, hydrogen bonds should form
between a pyrimidine in one strand and a purine in the other strand. A
purine-purine pair would be too big, while a pyrimidine-pyrimidine pair would be too
small, to fit within the helix. Detailed analysis revealed that this is in fact the case. It was
found that `A’ and `T’, and `G’ and `C’, are complementary to each other: an `A’
in one strand always pairs with a `T’ in the other strand, and a `G’ always pairs with a `C’,
and vice versa. This explained the observed ratios between purines and
pyrimidines and between A/T and G/C in DNAs from different species. There are two
hydrogen bonds in an A:T pair and three in a G:C pair. The
hydrogen bonding takes place either between the –NH2 group of one base and the =O of the
other base, or between the =NH of one base and the –N of the other base. For stable bond
formation, the N-N distance is 0.30 nm and the O-N distance is 0.28-0.29 nm. The
positions of these bonds and the distances between the bonded atoms are shown in Fig. 4.
Fig. 4: Base pairing between complementary bases: Formation of hydrogen bonds
Besides the hydrogen bonds, base stacking or π-π interaction is the other force that helps in
stabilizing the helix. It involves hydrophobic interactions between adjacent base pairs. These
hydrophobic interactions arise because the hydrogen-bonded structure of water forces the
hydrophobic groups into the internal parts of the molecule. While both base pairing and base
stacking are important in holding the two strands of DNA together, base pairing has
important biological implications. The complementarity of the bases, and the fact that an `A’
can pair only with a `T’ and a `G’ only with a `C’, provides a mechanism by which a new
copy of DNA can be produced in a template-dependent manner. This pairing ensures the
replication of DNA with a high degree of fidelity.
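The pairing rules above fully determine one strand from the other, which can be illustrated with a short Python sketch. The names used here are assumptions of this example. It takes a strand written 5’→3’, returns the complementary strand (also written 5’→3’, hence the reversal, because the strands are anti-parallel), and counts the hydrogen bonds holding the two strands together using two bonds per A:T pair and three per G:C pair.

```python
# Watson-Crick partners in DNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by the pair that each base belongs to.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand_5to3: str) -> str:
    """Return the complementary strand, also written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(strand_5to3.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds between a strand and its complement."""
    return sum(H_BONDS[base] for base in strand.upper())


print(complementary_strand("ATGC"))  # GCAT
print(hydrogen_bonds("ATGC"))        # 10
```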
The coiling of the strands around the common axis creates two grooves of different sizes in
each turn. The larger groove is known as the major groove, while the smaller groove
is referred to as the minor groove. The grooves provide convenient binding sites for many
DNA-binding proteins that play an important role in the regulation of gene expression, as
well as for factors such as polymerases and transcription factors. Further, the base pairing
gives a certain degree of flexibility to the DNA strands. It is therefore possible for the
conformation of DNA to change in response to certain signals, which forms the basis of a
number of regulatory processes. The three-dimensional double-helical structure of DNA based
on Watson and Crick’s model is shown in Fig. 5.
Fig. 5: Structure of DNA
Alternate forms of DNA
The basic structure of DNA given by Watson and Crick represents the structure of the
majority of cellular DNA. DNA with this basic structure is referred to as `B’ DNA.
However, not all DNA molecules are uniform in their structure. A relatively small
amount of DNA may have certain alternate structures. The double helical structure provides
some degree of flexibility and allows the molecule to take slightly different shapes. While a
number of possibilities exist, the most important conformational change involves
rotation around the glycosidic bond, which changes the orientation of the base in relation to
the sugar. Rotation around the bond between the 3’ and 4’ carbons can also take place. Both
of these rotations change the positioning of the two strands, so that certain alternate
structures of DNA can be formed. The common alternate forms of DNA are as follows.
`A’ DNA: It is a very minor species of DNA that may or may not be present under normal
physiological conditions. Its presence has been demonstrated in vitro in less hydrated
environments having high Na+ and K+ concentrations. The `A’ DNA is more compact than
the `B’ DNA. It has a diameter of 25.5 Å, the distance between two adjacent base pairs is
2.9 Å and the pitch is 32 Å. Thus there are 11 base pairs/turn. This form of DNA closely
resembles double-stranded RNA. It has a much deeper major groove, and the minor
groove is very shallow. The `A’ DNA is right-handed in its helical turns. Variants
such as B’, C, C’, D, E and T DNAs, which differ only slightly from A or B DNA and
have right-handed double helical structures, have also been reported. However, their precise
function is not clear.
`Z’ DNA: It is the left-handed form of DNA. In this form the turns of the DNA
helix are in the opposite direction to those in other forms of DNA. The `Z’ DNA is slimmer and
has a diameter of only 18.4 Å. The two strands of the helix are coiled in a left-handed manner
around the common axis, with about 12 base pairs/turn. There are no distinct major and minor
grooves; there is only one groove, which is narrow and deep. The backbone follows
a zig-zag path (this gives the name `Z’ DNA). Under experimental
conditions the presence of `Z’ DNA has been shown at high salt concentrations or in the
presence of certain specific cations, such as spermine and spermidine. This form of DNA has
a high degree of negative supercoiling and has certain specific proteins attached to it. Besides,
relatively high methylation at the 5-position of C residues has been found in `Z’ DNA.
Though the precise role of the alternate forms of DNA is not very well understood, they may
have some regulatory function. The possibility that some of these DNAs are artifacts of
experimental conditions cannot be completely ruled out. As discussed, the conformation of
DNA has an important biological function. The majority of regulatory controls require the
binding of certain factors to DNA. Any change in the structure will affect the binding of
these factors and will therefore regulate the biological activity.
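The helical parameters quoted for the `B’ and `A’ forms are tied together by simple arithmetic: the pitch is the rise per base pair multiplied by the number of base pairs per turn, and the rotation between adjacent base pairs is 360º divided by the number of base pairs per turn. The Python sketch below checks this with the figures given in the text. The dictionary layout and function names are assumptions of this example, and `Z’ DNA is left out because no rise per base pair is quoted for it here.

```python
# Rise per base pair (in angstroms) and base pairs per turn, as quoted in the text.
FORMS = {
    "B": {"rise": 3.4, "bp_per_turn": 10},
    "A": {"rise": 2.9, "bp_per_turn": 11},
}


def pitch(form: str) -> float:
    """Length of one complete helical turn, in angstroms."""
    p = FORMS[form]
    return p["rise"] * p["bp_per_turn"]


def twist(form: str) -> float:
    """Rotation between adjacent base pairs, in degrees."""
    return 360 / FORMS[form]["bp_per_turn"]


def helix_length(form: str, base_pairs: int) -> float:
    """Approximate length along the helix axis, in angstroms."""
    return FORMS[form]["rise"] * base_pairs


for name in FORMS:
    print(name, round(pitch(name), 1), round(twist(name), 1))
```

For the `B’ form this reproduces the 34 Å pitch and 36º rotation given earlier, and for the `A’ form it gives a pitch of about 32 Å, in agreement with the quoted value.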
Structure of RNA
It has been mentioned earlier that the primary structure of RNA is very similar to that
of DNA. The ribonucleotides are polymerized through phosphodiester bonds in a
directional manner to form RNA. However, the three-dimensional structure of RNA differs
from that of DNA. The majority of cellular RNA is single-stranded. In many RNA molecules
regions of internal complementarity may be present. These regions may create a high degree
of secondary structure consisting of intra-molecular double-stranded stems. Though
relatively small in quantity, some double-stranded RNA molecules containing
inter-molecular base pairing (similar to DNA) may also be present in the cell. Such ds RNA
may have regulatory functions. The base pairing (both intra- and inter-molecular) in such
double-stranded regions primarily follows the Watson-Crick model, i.e. an `A’ pairs with a `U’
(as discussed, RNA has uracil in place of the thymine in DNA; the other three bases are
common to RNA and DNA) by two hydrogen bonds and a `C’ pairs with a `G’ by three
hydrogen bonds. However, certain alternate base pairs, especially between U and G, can
also be present. Certain RNA molecules (tRNA, for example) may have some unusual and/or
modified bases. Some such bases are ribothymidine (rT), dihydrouridine (D),
pseudouridine (ψ), 4-thiouridine (s4U), 3- or 5-methylcytosine, inosine (I), N6-methyl (or
isopentenyl) adenosine, methylguanosine, queuosine (Q) and wyosine (W). These bases
can form alternate base pairs (Fig. 6).
Fig. 6: Modified bases present in tRNA
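The pairing rules described above for double-stranded regions of RNA can be summarized in a small Python sketch. It is an illustration only: the names are assumptions of this example, and it covers just the four standard bases, not the modified bases of Fig. 6.

```python
# Standard Watson-Crick pairs in RNA (U takes the place of T).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# The alternate U:G pair mentioned in the text.
ALTERNATE = {("G", "U"), ("U", "G")}


def pair_type(x: str, y: str) -> str:
    """Classify a pair of RNA bases as Watson-Crick, alternate or none."""
    pair = (x.upper(), y.upper())
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in ALTERNATE:
        return "alternate"
    return "none"


print(pair_type("A", "U"))  # Watson-Crick
print(pair_type("G", "U"))  # alternate
print(pair_type("A", "G"))  # none
```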
Sometimes triple base pairing can also take place, in which a base pairs with two different
bases (for example C:G:m7G or U:A:A). In such a triple, one pairing is the Watson-Crick
pairing while the other is an alternate pairing (Fig. 7). Such alternate bases and unusual base
pairings play an important role in maintaining the three-dimensional structure of specialized
molecules and have implications for the specific functions of those molecules. The
structure-function relationship of tRNA is an important example of such modifications.
Three main types of RNA are present in the cell. These are ribosomal RNA (rRNA),
messenger RNA (mRNA) and transfer RNA (tRNA). Each of these RNAs has specific
properties and specific functions. Their detailed structure and functions will be described
elsewhere. The rRNA is a constituent of ribosomes and is the most predominant class of
cellular RNA, making up almost 90% of the total RNA. A number of rRNA molecules are
present; these include 28S RNA (23S in prokaryotes), 18S RNA (16S in prokaryotes), 5.8S
RNA (not present in prokaryotes) and 5S RNA (present in both eukaryotes and
prokaryotes). The tRNA is relatively small in size (4S), having 78-108 bases. The tRNA acts
as the adapter molecule that carries specific amino acids to the ribosomes during protein
synthesis. The tRNA molecules have an unusually high number of modified bases and a
complex secondary structure consisting of four stems and loops, each having a specialized
role during aminoacylation and/or amino acid transfer. The mRNA contains the genetic
information copied from DNA, which is translated by the translational machinery into
the amino acid sequence of proteins. While in terms of content mRNA is only 1-2%
of total cellular RNA, a typical mammalian cell can have up to 10,000 different mRNA
species. Further, these are very heterogeneous in size, ranging from 4S
to 32S, with 8S-15S being the most predominant class. The copy number of different mRNAs
varies; some mRNAs have only 1-5 copies/cell (rare mRNAs), while certain other mRNAs
are present in up to 12,000 copies/cell (abundant mRNAs). Furthermore, the distribution
of mRNAs varies from one tissue type to another and forms the basis of tissue-specific
functions. The mRNA has certain specialized structures (the 5’-cap and 3’-poly (A) tail in
eukaryotes and SD sequences in prokaryotes) that help in its specialized functions.
Fig. 7: Unusual base pairing in nucleic acid (especially in tRNA molecules)
In addition to the three major classes of RNA, some small RNA molecules are also present
in the cell. These include small nuclear RNAs (snRNAs), small cytoplasmic RNAs
(scRNAs) and small interfering (silencing) RNAs (siRNAs), which have specialized regulatory or other functions.
Some of the snRNAs play a key role in mRNA splicing. Some RNA molecules with catalytic
function have also been recognized. These are known as ribozymes. The discovery of
ribozymes changed the earlier concept that all enzymes are proteins. It is now accepted that
most (but not all) enzymes are proteins.
The majority of cellular RNAs are present in association with proteins. Usually the RNAs
and proteins are associated with each other non-covalently. The RNA-protein complexes are
known as ribonucleoprotein particles (RNPs). Very little free RNA is present in the cell.
Unlike DNA, which is present in the nucleus (except for mitochondrial and chloroplast DNA),
RNA is present both in the nucleus and in the cytoplasm. The RNA-associated proteins may help in
stabilizing the RNA. They may also play a role in the regulation of mRNA metabolism. It has
been found that certain proteins that are present when an mRNA is not being translated (the
CmRNPs) dissociate when the same mRNA is being actively translated (the
PmRNPs). Certain other specific proteins are associated with the poly(A) tail region and are
important for mRNA stability.
Certain viruses have RNA as their genetic material. Retroviruses and reoviruses are examples of
such RNA viruses. They have specialized mechanisms for the replication of their genomes.
Genome and the C-value
The total content of DNA in a cell is known as the genome. The entire amount of DNA in
a haploid cell is also known as the C-value. There is a relationship between evolution
and the C-value. Generally (but not always), an organism at a higher position on the
evolutionary ladder has a higher C-value. This has a simple logic, as the genes are packed
closely in less complex organisms to save space. A higher C-value has a clear advantage,
as it provides a higher potential for coding more proteins. Due to the complex
metabolic status of higher organisms, more proteins are required for their cellular functions.
However, it also has a disadvantage: the cell needs to replicate large amounts of
DNA, which is very taxing in terms of energy and other requirements. The C-value (DNA
content) and chromosome number of certain organisms are shown in Table 3.
An exception to the general relationship between the C-value and evolution is
seen in certain amphibians. Even though these are much lower than mammals,
especially humans, on the evolutionary ladder, they have a much larger genome,
sometimes up to 100X larger than the human genome. Further, their DNA content is
much higher than that of most other amphibians. This is referred to as the C-value paradox. Certain
plants also have a very high C-value. It also deserves mention that sometimes the DNA
content may vary from cell to cell within the same species. For example, amphibian
oocytes have a higher number of rRNA genes and larger amounts of ribosomes than
somatic cells. In eukaryotes, the DNA is present in the nucleus of the cell. However,
two organelles, namely mitochondria and chloroplasts, have their own genomes. The basic
structure of an organelle genome is the same as that of the nuclear genome. In prokaryotes,
where there is no nucleus, the DNA is present in the centre of the cell in a semi-defined
structure, the nucleoid. This observation is in contrast to the earlier view that
prokaryotic DNA is present in a dispersed state throughout the cytoplasm.
Table 3: DNA content of some of the organisms

Organism                                   DNA content (C-value)             Chromosome
                                           Nanogram/cell     Kbp             number (2x)
SV40                                       6.0 x 10^-9       5.3             1
HSV                                        1.7 x 10^-7       151             1
E. coli                                    5.3 x 10^-6       4.5 x 10^3      1
S. cerevisiae (baker's yeast)              25 x 10^-6        22.5 x 10^3     34
Arabidopsis thaliana                       7 x 10^-5         64 x 10^3       10
Drosophila melanogaster                    17 x 10^-5        0.15 x 10^6     8
Sea urchin                                 45 x 10^-5        0.41 x 10^6     40
Gallus domesticus (chicken)                7 x 10^-4         0.63 x 10^6     78
Homo sapiens (human)                       3.6 x 10^-3       3.2 x 10^6      46
Mus musculus (mouse)                       3.0 x 10^-3       2.7 x 10^6      40
Xenopus laevis (frog)                      4.2 x 10^-3       3.8 x 10^6      36
Zea mays (corn)                            7.8 x 10^-3       7 x 10^6        20
Triturus cristatus (newt)                  3.5 x 10^-2       31.5 x 10^6     24
HeLa cells (human cervical cell line       8.5 x 10^-3       77 x 10^6       Polyploidy
in culture)
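The two DNA content columns of Table 3 (mass per cell and length in kbp) are related by the mass of a base pair. A minimal sketch of the conversion is given below. It assumes an average mass of about 660 g/mol per base pair, a commonly used approximation that is not stated in the text, and the function names are hypothetical. The results agree only approximately with the tabulated values.

```python
AVOGADRO = 6.022e23        # molecules per mole
AVG_BP_MASS = 660.0        # g/mol per base pair (assumed average value)


def kbp_to_nanograms(kbp):
    """Approximate mass (in ng) of one DNA molecule of the given length in kbp."""
    base_pairs = kbp * 1e3
    grams = base_pairs * AVG_BP_MASS / AVOGADRO
    return grams * 1e9


def nanograms_to_kbp(nanograms):
    """Approximate length (in kbp) corresponding to a given DNA mass in ng."""
    grams = nanograms * 1e-9
    return grams * AVOGADRO / AVG_BP_MASS / 1e3
```

With these assumptions, `kbp_to_nanograms(5.3)` gives about 5.8 x 10^-9 ng for SV40 and `kbp_to_nanograms(3.2e6)` gives about 3.5 x 10^-3 ng for the human genome, close to the entries in Table 3.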
Central dogma of molecular biology
As discussed, DNA is the basic genetic material in almost all the cells (a few viruses are the
exception – see later). It was found that DNA is the only macromolecule that has the
capacity to self-replicate. With the help of the enzyme DNA polymerase, DNA can replicate
and produce another copy of the genome. The DNA replication is a highly controlled
process. The cell ensures the fidelity of the primary structure of DNA, so that the daughter strands
are true copies of the parent strands. The proofreading function and the repair mechanisms
of the cell correct any mistakes to ensure this fidelity. On average the error rate is 1
in 10^9 to 1 in 10^10 bases. The replication of DNA is coupled with cell division and once
DNA is replicated, the cell is committed to divide. DNA replication takes place
during the S phase of the cell cycle, which is followed by the M phase when the cell divides. Thus
the amount of DNA remains constant in all living cells. The entire genome is replicated in
toto during a cycle of replication, and only once during each cell cycle. No region
of the genome replicates preferentially or replicates more than once during one cycle.
The information stored in DNA can be transferred to RNA by the process of transcription.
The main enzyme for transcription is RNA polymerase. Both DNA and RNA use very
similar language for encoding the genetic information (four nucleotides – dNTPs in DNA
and rNTPs in RNA). However, unlike in replication, the entire genome is not
transcribed together during transcription. A small segment of DNA, referred to as a gene, is the
unit of transcription. Each gene codes for one molecule of RNA in eukaryotic cells (the
monocistronic genes). In prokaryotic cells the genes encoding molecules that perform
similar functions are often grouped together and are transcribed as a single RNA molecule
(polycistronic genes). Further, transcription is a continuous process and has no correlation
with the cell cycle. Furthermore, a gene can be transcribed more than once to
produce multiple copies of RNA. It may be noted that only a small portion of DNA is
represented as genes within the genome. Almost 90% (or even more) of the genome does
not code for any useful information and represents the 'junk DNA'. Besides the junk DNA,
introns are also non-coding regions of genes. However, these introns are removed from
the primary transcript by a process known as splicing. The mature mRNA does not have the
introns.
The information in the form of RNA (mRNA) is further transferred to yet another molecule,
the proteins. Proteins are the ultimate products of genetic information and carry out majority
of the biological functions of the cell. The proteins and nucleic acids are entirely different
and unrelated molecules. They have different chemical and physical properties. Unlike
nucleic acids that are made up of four types of nucleotides, proteins are made of twenty
different amino acids. The transfer of information to proteins therefore requires the decoding
of the information from one language (the nucleotides) into another language (amino acids).
This coding is done in the form of a language referred to as the genetic code, where three
nucleotides (a triplet) code for one amino acid. The process of decoding genetic
information to synthesize proteins is referred to as translation. Translation is a complex
process involving multi-factorial machinery. The site of protein synthesis is the ribosome. A
number of enzymes and other `trans’ acting factors also participate in the process. The
entire process is highly regulated to ensure the formation of correct proteins.
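The flow of information described above (DNA to RNA by transcription, RNA to protein by translation of triplets) can be illustrated with a small Python sketch. The standard genetic code is assumed, built here from the usual UCAG ordering of codons. The function names `transcribe` and `translate` are hypothetical, and the sketch ignores all regulatory detail (promoters, splicing, initiation factors): it simply starts at the first AUG and stops at the first stop codon.

```python
from itertools import product

# Standard genetic code in UCAG order; '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand_dna):
    """mRNA has the sequence of the coding (non-template) DNA strand, with U for T."""
    return coding_strand_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG until a stop codon (one-letter amino acid codes)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":   # stop codon: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For instance, `translate(transcribe("ATGGCCTAA"))` reads the codons AUG, GCC and UAA and yields the dipeptide `"MA"`.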
This pathway for the transfer of genetic information from its storage form (DNA) to
biologically active form (proteins) is referred as the central dogma of molecular biology
(Fig. 8).
Exceptions to central dogma
The description above refers to the pathway of transfer of genetic information as seen in
almost all the living cells. However, certain viruses have RNA as their genetic material.
How do these replicate and express their genetic information? In retroviruses, which have
single stranded RNA as the genetic material, synthesis of complementary DNA is an
obligatory requirement for the replication of the genome. It requires a specific DNA
polymerase that uses RNA as template (RNA dependent DNA polymerase) and synthesizes a
DNA molecule that is complementary to RNA genome (cDNA). This process is thus
opposite of classical transcription and is referred as reverse transcription. The enzyme is
commonly known as reverse transcriptase (RTase). It is a virus-coded enzyme, not present
in host cells. The enzyme has template specificity and will use only the viral RNA as
template. Specific tRNA molecules serve as primer for the RTase. Further, the process has
certain specific requirements/steps (being discussed elsewhere). The properties and
requirements of RTase are very similar to DNA polymerase. However, there are two
marked differences: (i) the capability to use RNA as template, though RTase can use a DNA
also as template and (ii) the majority of the RTases do not have proof reading functions.
The fidelity of cDNA synthesis is therefore, much less than that of DNA synthesis.
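At the level of sequence, reverse transcription produces a DNA strand complementary (and antiparallel) to the RNA template. The following is a minimal sketch of that relationship only; the function name `reverse_transcribe` is hypothetical and no property of the enzyme (priming by tRNA, lack of proofreading) is modelled.

```python
# Complement of each RNA base in the DNA alphabet.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_5to3):
    """Return the first-strand cDNA, written 5'->3', for an RNA template
    that is also written 5'->3'."""
    # The cDNA is antiparallel to the template, so the template is read
    # from its 3' end while the cDNA grows 5'->3'.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_5to3.upper()))
```

For example, the template 5'-AUGGCC-3' gives the cDNA 5'-GGCCAT-3'.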
Obviously for replication of retroviruses, first the enzyme RTase has to be synthesized. For
this purpose, the sub-genomic RNA (a portion of RNA genome that has an open reading
frame coding for a protein) serves as mRNA and can synthesize RTase using host
translational machinery. The resulting DNA form of the virus is referred as pro-virus that
can integrate into the host genome. The transcription of this DNA can produce multiple
copies of viral genome.
Fig. 8: Central Dogma of Molecular Biology (The central dogma, in all the living cells: DNA replication by DNA polymerase; transcription of DNA to RNA by RNA polymerase; translation of RNA to proteins on ribosomes. Exceptions to the central dogma, only in RNA viruses: reverse transcription by reverse transcriptase in retroviruses; RNA replication by RNA replicase in reoviruses.)
Another group of RNA viruses, the reoviruses, do not replicate through a DNA intermediate.
The RNA replicates directly. This is again an exception to the general rule that
DNA is the only self-replicating biomolecule. The enzyme required for RNA replication is
RNA dependent RNA polymerase, also known as RNA synthetase or replicase. This template
specific enzyme synthesizes multiple copies of the viral genome. The reoviruses are either
double stranded or single stranded RNA viruses. Further, the single stranded viruses can be
either '+' strand or '-' strand. For the replication of the genome, the enzyme first has to be
synthesized. The viral genome codes for RNA synthetase. The sub-genomic mRNA is first
translated to produce the enzyme, and this enzyme then carries out the replication of the viral
genome. The synthesis of the enzyme is straightforward in the case of double stranded or '+' strand
ss:RNA viruses, as the genome itself can serve as mRNA. However, there is a paradox in
the case of the '-' strand ss:RNA viruses. Their genome is complementary to mRNA and
cannot code for the protein. For protein synthesis it is obligatory that the complementary
RNA strand (the '+' strand) is first synthesized. How can this '+' strand (the mRNA) that
codes for the enzyme RNA synthetase be synthesized? Without the mRNA
the necessary enzyme cannot be synthesized, and without the enzyme (RNA synthetase) the
mRNA cannot be synthesized. It was a big puzzle. However, it was solved when it was
discovered that small amounts of preformed enzyme are always encapsulated along with the
RNA genome during the formation of viral particles. This enzyme initiates the process of
mRNA synthesis, which starts the ball rolling. In the absence of this preformed enzyme, it
would not be possible for the '-' strand RNA viruses to replicate.
The entire gamut of gene expression in RNA viruses has been shown in Figure 8 at the left
hand side as the exception to central dogma. By integrating the central dogma with the
exceptions, it becomes clear that the transfer of genetic information from DNA to RNA can
be considered as a reversible transfer (under experimental conditions, not under normal
physiological conditions within the cell in vivo). However, once the information has been
expressed in the form of proteins, the transfer is irreversible and the information cannot be retrieved in the
form of RNA or DNA. In fact, the process of reverse transcription (using viral RTase) is
routinely used in genetic engineering for the synthesis of cDNA, which serves as an important
intermediate in gene cloning experiments. It may, however, be possible to deduce the
possible sequence of the coding region of a gene if the amino acid sequence of the protein
coded by this gene is known, although the degeneracy of the genetic code makes it difficult to
deduce the correct gene sequence. Further, a protein molecule can never serve as the
template for the synthesis of either an RNA or a DNA.
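The difficulty caused by degeneracy can be made concrete by counting how many different coding sequences could encode a given amino acid sequence. The sketch below assumes the standard genetic code (written as the usual 64-letter string in UCAG codon order); the function name `count_coding_sequences` is hypothetical.

```python
from collections import Counter
from math import prod

# Standard genetic code in UCAG codon order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of synonymous codons for each amino acid (e.g. 6 for L, 1 for M).
CODONS_PER_AMINO_ACID = Counter(AMINO_ACIDS)


def count_coding_sequences(peptide):
    """Number of distinct nucleotide sequences that encode the peptide
    (one-letter amino acid codes, stop codon not included)."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in peptide.upper())
```

A tripeptide such as Met-Leu-Ser (`"MLS"`) already has 1 x 6 x 6 = 36 possible coding sequences, and the number grows multiplicatively with length, which is why a protein sequence alone does not determine a unique gene sequence.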
Genome organization
A cell has large amounts of DNA. Bacteria have some of the smallest genomes of any living
cells. The eukaryotes have much larger genomes. A careful analysis revealed that while the
number of expressed genes in eukaryotes is 2-10 times higher than in prokaryotes, the
complexity of the genome can be higher by many orders of magnitude than in a prokaryotic cell.
This simply means that eukaryotic cells have a substantial amount of DNA that does not code
for any known gene. This DNA is referred to as 'junk DNA'. Its role, if any, is not fully
understood. In higher eukaryotes up to 90% of the total genome consists of this DNA. However,
the junk DNA may not necessarily be `junk’ always. Though its precise function is not
known today, it may have important role in evolution. Besides, it carries out at least one
important function that is to protect the useful information from random spontaneous
mutations by providing a type of cushion to the useful information.
The recently concluded Human Genome Project (HGP) revealed that the human genome is
approximately 3.2 x 10^6 kb in size. This means that the length of the DNA will be approximately
one meter (as adjacent bases are 0.34 nm apart, 1 kb of DNA will have a length of
0.34 x 10^-6 m; the length of the entire DNA will therefore be 0.34 x 10^-6 x 3.2 x 10^6 = 1.088 m). How
does such a long strand of DNA fit within a cell that is microscopic in size? It is possible
only because the DNA is present in a highly condensed form. It is packed in thread-like
structures, the chromosomes. A number of proteins and RNAs also participate in chromosome
formation. Prokaryotes have their entire DNA in the form of a single chromosome.
Eukaryotic cells, on the other hand, have many chromosomes. The number of chromosomes
is a characteristic property of a particular organism. In the majority of eukaryotes there are
two copies of each chromosome (diploid, often written as 2x). Humans have 23 pairs (i.e.
46) of chromosomes in their genome. Occasionally more than two copies of a chromosome
may be present (polyploidy). Polyploidy is more common in plants than in animals. In a
similar manner, sometimes only one copy of a chromosome may be present (haploid, 1x).
The chromosome represents the ultimate organized form of the genome. The E. coli DNA in its
fully extended form is approximately 1100 µm in length and is circular in shape. In contrast
to the earlier belief that the bacterial genome is naked DNA spread all over the
cytoplasm, it has now been established that bacterial DNA is present as a highly packed
structure known as the nucleoid, which represents the folded form of the bacterial genome.
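The length estimate for the human genome given above is simple arithmetic and can be reproduced as follows. This is a minimal sketch: the only physical input is the 0.34 nm spacing between adjacent base pairs quoted in the text, and the function name `contour_length_m` is hypothetical.

```python
RISE_PER_BASE_PAIR_NM = 0.34   # spacing between adjacent base pairs (from the text)


def contour_length_m(genome_size_kb):
    """Length in metres of fully extended duplex DNA of the given size in kb."""
    base_pairs = genome_size_kb * 1e3
    return base_pairs * RISE_PER_BASE_PAIR_NM * 1e-9


# Human genome, 3.2 x 10^6 kb: about 1.088 m of DNA.
human_length_m = contour_length_m(3.2e6)
```

The same function can be applied to any genome size listed in Table 3 to get a feel for the degree of compaction required.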
The physical form of DNA, complexed with RNA and proteins is referred as the chromatin.
The chromatin may be present in two forms. The region that is denser and stains darkly is
referred as heterochromatin. It is metabolically inactive and is relatively resistant to DNase
action. The less dense region that stains lightly and is highly sensitive to DNase action is
known as euchromatin. The euchromatin is the metabolically active region of genome that
gets actively transcribed. It may be noted that genomic portions that are present as
euchromatin and heterochromatin are not irreversibly committed regions and these can
interchange amongst themselves based on the metabolic state of the cell. Certain regions
that may represent the junk DNA and are devoid of any gene may be permanently present as
heterochromatin (the constitutive heterochromatin) while certain other regions may be
present as heterochromatin in one cell type but as euchromatin in other cell type. These
regions may represent the portion having genes that are expressed in a tissue specific
manner. Such regions are known as facultative heterochromatin. The chromatin
organization can play a role in regulation of gene expression also.
Chromatin has a highly organized structure consisting of three levels of organization. The
first level of organization is referred to as the nucleosome. Under the electron microscope,
nucleosomes are seen as ellipsoidal beads of 110 x 60 Å joined together by a thin thread-like
structure at regular intervals. Mild digestion with DNase reveals that a region of 146 bp of
DNA is enzyme resistant. Further, integral multiples of DNase resistant fragments are seen,
suggesting that a repeating structure of nuclease resistant fragments is present within
chromatin. These 110 Å nucleosomes constitute the basic unit of chromatin. Detailed
analysis of nucleosomes showed that each contains about 200 bp of DNA wrapped around an
octamer of histone proteins. Histones are basic DNA binding proteins that participate in
chromatin formation and are an integral part of chromosomes. The histone octamer
includes two molecules each of the H2A, H2B, H3 and H4 histones. The DNA is wound
as one and three-fourths (1¾) turns of superhelical DNA around the histone core. The
repeated nucleosomes are joined together by linker DNA, one molecule of H1 histone (the
H1 histone is also known as the linker histone; a nucleosome to which the H1 histone is
attached is also called a chromatosome) and certain non-histone proteins. The linker
histones may be present on the surface of the nucleosome-DNA assembly and act as clamps
that prevent the coiled DNA from detaching from nucleosomes. However, in some
cases the H1 may not be present on the surface but may instead be embedded
between the core octamer and the wrapped DNA. The length of linker DNA varies from
organism to organism and ranges between 8 and 114 bp. Certain variations between individuals
of the same species and from cell to cell in the same individual have also been seen.
Further, the H1 histone molecules may not be distributed evenly, and the H1/DNA ratio of
the linker region may vary between different loci of the chromatin. The histones help in
stabilizing the nucleosome structure. They also help in the further organization of chromatin,
namely into the 300Å fibres.
The nucleosomes are further condensed to form highly coiled lumpy fibre with an average
diameter of 300Å. In formation of 300Å particles, the nucleosomes are in direct contact
with each other without much of the linker DNA. These are super coiled into the solenoid
structure. The linker histones and/or the tails of histone octamer can help in packaging of
nucleosomes into 300Å particles. The nucleosomes are wound into a helix with 6
nucleosomes/ turn. These may be attached to nuclear matrix via A:T rich regions of DNA
that are known as matrix associated regions (MARs) or the scaffold attachment regions
(SARs).
Maximum degree of condensation of chromatin is observed during metaphase. The function
of these structures is to package the giant DNA molecule into a form that can easily be
segregated into daughter nuclei. It is important that the DNA molecule does not get
entangled during the segregation and the physical forces do not result in shearing of the
genetic material. This condensation of 300Å particles takes place with the help of non-histone
proteins. Histones do not directly participate in this condensation. This results in the
formation of a scaffold structure that has a central core surrounded by a huge pool of DNA. As
the entire chromosome is made up of a single DNA molecule, no apparent ends of DNA
molecules are visible in this scaffold structure. The various degrees of organization of
chromatin are shown in Fig. 9.
Fig. 9: Organization of DNA into chromatin
The satellite DNA
As DNA has a random sequence and is uniform in nature, it is expected to have a uniform
distribution of nucleotides within the molecule. However, when the genomic DNA of a
eukaryotic cell is centrifuged through a CsCl gradient, some unexpected results are obtained.
(In a CsCl gradient any molecule bands at the position where the buoyant density of CsCl is
the same as the density of the molecule being analyzed; any homogeneous molecule should
therefore give a single band.) In place of the expected single band, multiple
bands are obtained. The predominant band, at about 1.7 g/cc, represents the majority of
the genomic DNA. However, at least three minor bands with buoyant densities of 1.692,
1.688 and 1.671 g/cc are also seen. These bands represent minor but distinct species of DNA,
known as satellite DNA. The name satellite DNA is given to these DNA species because
they are different from the main class of DNA. The satellite DNAs represent sequences
containing multiple repeats of short lengths of DNA. Often these have a distinct sequence
that is different from the average DNA and can therefore show independent behaviour. The
repetitive DNA represents a minor but substantial part of the genome, comprising up to
20-50% of the genome. Some of the short sequences may be repeated up to a million times.
Certain degree of conservation in sequences can be seen in various regions of satellite DNA
in an organism. However, the length of the repeats may vary with the evolutionary position
of an organism. The satellite DNAs have changed in an unusually rapid manner with the
evolution. Further, certain characteristic differences can be seen between the satellite DNA
of two closely related species. The satellite DNA sequences can also serve as transposable
elements and can get interspersed within the genome in highly characteristic manner. The
DNA of Alu family and L1 elements represent such DNAs. These can therefore, provide
characteristically distinct characters to different individuals of a species. These have been
used as the target for DNA fingerprinting.
The satellite DNA is present within the heterochromatin region of the DNA and is not
transcribed. So far no role has been assigned to these repeats. These do not perform any
useful function in maintenance, metabolism, survival or replication of the cell. These
DNAs are often referred as `selfish DNA’ sequences that ensure their own retention during
replication but do not make any contribution for the benefit of the cell. While many of the
repeat sequences have distinct base composition (highly enriched in A:T, for example),
others have similar structure as the average DNA and may not always band as distinct
separate entity.
Highly and moderately repetitive sequences
The repetitive DNA can be of many types. Some of the repeat sequences are clustered at
certain regions of the genome in tandem arrays. These are referred to as tandemly repeated DNA.
This type of repeat is very common in eukaryotic DNA but is less frequent in prokaryotic
genomes. The satellite DNA is arranged in tandemly repeated form. Based on the length and
number of repeats, these can be either microsatellites or minisatellites. The minisatellites
consist of repeat units of up to 25 bp, present in clusters of up to 20 kb. The microsatellites, on
the other hand, are smaller, having a repeat unit of only up to 13 bp present in short clusters
of 150 bp or less. The minisatellites are an important feature of chromosome structure. For
example, the telomeric DNA of a eukaryotic genome contains hundreds of copies of small
repeats. In humans, this repeat motif is 5'-TTAGGG-3'. This repeat motif plays an important
role in DNA replication. Certain other minisatellites are present near the ends of eukaryotic
chromosomes. The chromosomal ends are not the only regions of the genome where
minisatellites may be present; some minisatellites are present in other regions too.
However, the functions of these minisatellites are not well understood.
The microsatellites, on the other hand, are much shorter. A typical microsatellite is made of
10-20 repeats of 1, 2, 3 or 4 bp. A large number of microsatellites are present in the
genome. In humans, microsatellites with CA repeats such as
5'-CACACACACACACAC-3'
3'-GTGTGTGTGTGTGTG-5'
make up about 0.25% of the entire genome (approximately 8 Mbp in total). Similarly, single base
repeats such as
5'-AAAAAAAAAAAAAAA-3'
3'-TTTTTTTTTTTTTTT-5'
can make up to 0.15% of the genome. The microsatellites have proved very useful to
geneticists. Due to slippage during the replication of microsatellites, the
number or location of these may sometimes vary from one individual to another.
They are thus one of the characteristic features of an individual's genome. The
microsatellites, therefore, are used as the main tool for DNA fingerprinting. A large
number of DNA fingerprinting probes based on microsatellites have been developed.
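As an illustration of what a microsatellite looks like at the sequence level, the following sketch scans a DNA string for tandem repeats of a 1-4 bp unit using a regular expression with a back-reference. It is only a toy example: the function name `find_microsatellites` and the default thresholds (unit length up to 4 bp, at least 5 copies) are assumptions chosen for illustration and are not taken from the text, and real repeat-finding tools handle imperfect repeats that this sketch ignores.

```python
import re


def find_microsatellites(sequence, max_unit=4, min_repeats=5):
    """Return a list of (start, repeat_unit, copy_number) for perfect tandem
    repeats of a short unit in a DNA sequence (0-based start positions)."""
    # ([ACGT]{1,n}?) captures the shortest possible repeat unit;
    # \1{m,} then requires at least m further copies of that same unit.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits
```

Applied to `"GGTT" + "CA" * 8 + "TTG"`, the function reports a single CA repeat of 8 copies starting at position 4. Comparing the copy number at the same locus between individuals is the idea behind microsatellite-based fingerprinting described above.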
Other types of repeats may be present interspersed throughout the genome. These may have
evolved due to transposition events during the course of evolution. A number of such
repeats are present in eukaryotic genomes. Depending on the length of the repeats, these may
be LINEs (long interspersed nuclear elements) or SINEs (short interspersed nuclear
elements). Other repeats are LTRs (long terminal repeats) and, of course, the transposons.
Often such repeats form a characteristic feature of a species. For example, in a short
fragment of 50 kb in human chromosome 7, which forms part of a larger (685 kb)
'human β T-cell receptor locus', 52 such repeats are present.
Denaturation of DNA
As discussed earlier, the DNA is a double stranded molecule and the two strands are joined
together by hydrogen bonds between the complementary bases. There are no covalent
linkages between the two strands. It is therefore, possible to separate two strands by
relatively mild treatment without breaking its basic structure (i.e. the phosphodiester bond).
The separation of two strands is referred as the denaturation of DNA or the melting of DNA.
A number of different treatments can denature DNA. These include the denaturing
chemicals such as alkali, urea and formaldehyde. However, one of the simplest and most
commonly used methods to melt DNA is by heat denaturation that can be achieved by
raising the temperature of a DNA solution. The temperature at which half of a DNA
molecule (in aqueous solution at neutral pH) opens is referred to as the melting temperature or
Tm. This can be determined by UV-spectroscopic analysis of DNA at 260 nm (the
absorption maximum of DNA). The melting of DNA is associated with an increase in
absorbance. This effect is referred to as hyperchromicity. It is due to the fact that
melting of DNA results in decreased stacking of the bases (or an increase in the effective
concentration of UV absorbing groups), which results in an increase in the absorbance at
260 nm. A fully melted DNA has 37% higher absorbance than the ds:DNA. The
absorbance (at 260 nm) of a solution of double stranded DNA with a concentration of 50
µg/ml is 1.0. However, the same solution will give an A260 value of 1.37 when fully melted
(the single stranded DNA).
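The two numbers quoted above (A260 of 1.0 for 50 µg/ml of double stranded DNA, and a 37% increase on complete melting) are enough for two routine calculations, sketched below. The function names are hypothetical, and the estimate of the melted fraction assumes that absorbance rises linearly with the fraction of DNA melted, which is a simplifying assumption rather than a statement from the text.

```python
DS_DNA_UG_PER_ML_PER_A260 = 50.0   # 50 ug/ml of ds DNA gives A260 = 1.0 (from the text)
HYPERCHROMIC_FACTOR = 1.37         # fully melted DNA absorbs 37% more (from the text)


def dsdna_concentration(a260, dilution_factor=1.0):
    """Concentration of double stranded DNA in ug/ml from its A260 reading."""
    return a260 * DS_DNA_UG_PER_ML_PER_A260 * dilution_factor


def fraction_melted(a260_observed, a260_native):
    """Estimate the melted fraction, assuming A260 increases linearly from
    the native value to 1.37 times the native value (an assumption)."""
    a260_fully_melted = a260_native * HYPERCHROMIC_FACTOR
    fraction = (a260_observed - a260_native) / (a260_fully_melted - a260_native)
    return min(1.0, max(0.0, fraction))   # clamp to the range 0..1
```

Under these assumptions the Tm corresponds to the temperature at which `fraction_melted` reaches 0.5, i.e. an A260 of about 1.185 for a sample whose native A260 is 1.0.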
G:C content of DNA and the Tm
As there are two hydrogen bonds between A:T and three hydrogen bonds between G:C, it
will require higher energy to break a G:C bond than to melt the A:T base pairing. Thus, the
Tm of a DNA will be higher if G:C content is higher. Hence, the melting temperature can
give an idea about the base composition of a DNA molecule. Of course, the length of DNA
will also play a role in the melting of DNA. The larger the molecule, the higher the temperature
required for its melting. The following formula can be used to determine the G:C content of a
DNA sample by determining the Tm:
Tm = 69.3 + 0.41 (%G+C) - 650/L
where Tm is in °C, (%G+C) is the G:C content in percent and L is the length of the duplex DNA in base pairs.
It should be noted that the presence of other denaturing agents in the solution will lower the
Tm. For example, if formamide is added to a DNA solution, the Tm of duplex DNA will get
lowered. There is a decrease of 0.7ºC in Tm for every 1% formamide in the solution.
Similarly, if there is any mismatch in the duplex DNA, the Tm gets lowered by 1ºC for each
1% of mismatches.
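The formula and the two corrections described above (0.7°C per 1% formamide and 1°C per 1% mismatch) can be combined in a short sketch. The function names are hypothetical; the constants are those given in the text, and L is taken to be the duplex length in base pairs.

```python
def melting_temperature(gc_percent, length_bp,
                        formamide_percent=0.0, mismatch_percent=0.0):
    """Tm in degrees C from Tm = 69.3 + 0.41 (%G+C) - 650/L, lowered by
    0.7 per 1% formamide and by 1.0 per 1% mismatched bases."""
    tm = 69.3 + 0.41 * gc_percent - 650.0 / length_bp
    tm -= 0.7 * formamide_percent
    tm -= 1.0 * mismatch_percent
    return tm


def gc_content_from_tm(tm, length_bp):
    """Invert the formula to estimate %G+C from a measured Tm
    (no formamide, no mismatches)."""
    return (tm - 69.3 + 650.0 / length_bp) / 0.41
```

For a 1000 bp duplex with 50% G:C the sketch gives 69.3 + 20.5 - 0.65 = 89.15°C, and adding 10% formamide lowers this by 7°C to 82.15°C.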
As the base composition of DNA is characteristic of a species, the Tm of the genome
also becomes a characteristic property of a species. The Tm of an exclusively A:T DNA
(synthetic, having no G:C) is approximately 60°C. The Tm of the majority of eukaryotic
DNAs is about 80°C, while the DNA of M. phlei (G:C content about 70%) has a Tm of
~95°C. The interrelationship between Tm and G:C content is shown in Fig. 10.
Fig. 10: Interrelationship between G:C content and Tm of DNA