Andra Postu
Mammalian Codon Bias and Its Implications in Human Diseases
January 8, 2014
Introduction
All but two of the twenty amino acids (methionine and tryptophan) can be encoded by more than
one codon. A codon is a series of three nucleotides that specifies either an amino acid in a
growing polypeptide chain or a termination signal. Of the 64 possible codons, 61 code for the
20 amino acids and 3 are stop codons, so the genetic code is redundant. It is commonly assumed
that changes in a coding DNA sequence between two synonymous codons have no effect on the
organism, and they are thus referred to as silent changes. However, synonymous codons are used
with varying frequencies; this phenomenon is known as codon bias, and this unequal usage of
synonymous codons is present in all organisms.
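The codon counts quoted here can be checked with a short script. The following is a minimal Python sketch, assuming the standard genetic code; the compact 64-letter encoding and all variable names are illustrative choices and are not taken from any of the cited studies. It tallies the stop codons, the sense codons, and the amino acids that have only a single codon.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code with bases ordered T, C, A, G at each position.
# '*' marks a stop codon. (Illustrative encoding, assumed for this sketch.)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group sense codons by the amino acid they encode.
synonyms = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        synonyms[aa].append(codon)

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_count = sum(len(codons) for codons in synonyms.values())
single_codon = sorted(aa for aa, codons in synonyms.items() if len(codons) == 1)
with_synonym = sum(len(codons) for codons in synonyms.values() if len(codons) > 1)

print("total codons:", len(CODON_TABLE))
print("stop codons:", stop_codons)
print("sense codons:", sense_count)
print("amino acids with one codon:", single_codon)
print("codons with at least one synonym:", with_synonym)
```

The last figure is the pool of codons in which synonymous choice, and therefore codon bias, is possible at all.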
Population genetics studies have demonstrated that some synonymous sites are under
weak selection and that codon bias arises from a combination of selection, mutation, and genetic
drift1. The most significant cause of selection on codon bias is that preferred codons are
translated more accurately and efficiently1. This translational selection is a form of natural
selection and is supported by the positive correlation between gene expression and codon bias, by
high codon bias at functionally constrained sites in proteins, and by the recognition of preferred
codons by abundant tRNAs18. Translational selection may act to make translation globally
efficient, rather than efficient at the level of individual genes1,2. Under this view, codon bias
helps overcome the rate-limiting step in protein synthesis: faster elongation returns ribosomes
to the free pool sooner, so that more ribosomes are available to begin translation1,17,20.
Various factors have been proposed to contribute to codon usage bias, especially in mammalian
cells. These include, but are not limited to, gene expression level, guanine/cytosine (GC)
content, strand-specific mutational bias, amino acid conservation, RNA stability, and growth
temperature1,17,20. According to Parmley and Hurst, codon usage in mammalian cells may also be
attributed to exonic splicing regulatory elements that distort synonymous codon usage near
intron-exon boundaries1,20.
Although previous research has shown codon usage bias to be most prevalent in prokaryotes,
mammalian codon bias has recently become more widely observed. The high concentration of
exonic splicing enhancers (ESEs) may explain codon bias at sites where splicing occurs13. An
examination of human exons found that 47 of the 59 codons with at least one synonym showed
differential usage in the proximity of exon ends. Of those, 42 remained significant after more
extensive testing13. Splice-site regulation thus impacts the choice of synonymous codons in
mammals1,20.
Many other factors contribute to this phenomenon. Codon bias has been linked to numerous
ailments that commonly affect humans, and molecular genetics research has helped connect
prevalent diseases to codon usage bias. Further investigation could reveal mechanisms of
pathogenesis in cancer development that may be attributed to codon usage bias. Understanding
the degree to which synonymous mutations contribute to human disease, and the underlying
molecular mechanisms, could provide valuable tools in the biomedical field.
Synonymous mutations, mutations in introns, 3’ and 5’ UTRs, and various other non-coding
regions were previously thought to be silent: because the amino acid sequence was not altered,
the overall fitness of the organism was assumed to be unaffected1,17,20. These mutations were
therefore considered “neutral” from an evolutionary standpoint. However, synonymous codons are
under evolutionary pressure1, and studies of protein synthesis and polypeptide folding have
implicated codon usage bias. Human disease can occur due to abnormal mRNA splicing. Evidence
also suggests that synonymous single nucleotide polymorphisms (sSNPs) may affect the stability
of the resulting messenger RNA and, consequently, protein expression and enzymatic
activity1,17,20. These not-so-silent mutations can cause domino effects that ultimately lead to
improper protein function, human disease, and decreased fitness of the organism. sSNPs were
recently shown to affect the tertiary structure of proteins, with enzymatic and clinical
consequences. This article explores a variety of human diseases and clinical conditions that
are attributed to codon usage bias, and the mechanisms by which silent mutations affect
phenotype through splicing accuracy, translation fidelity, mRNA structure, and protein folding.
Further research could impact clinical applications and the field of pharmacogenetics.
Global Importance, Protein Abundance, and RNA Processing
It is useful to explore not only individual mutations but also the genome-wide implications
of codon usage bias. A study by Chen and co-workers addressed the issue directly, concluding
that non-synonymous SNPs and sSNPs share a similar likelihood and effect size for disease
association. Their work supported the finding of Chamary and Hurst that 5-10% of human genes
contain at least one region in which silent mutations are potentially harmful. Recent global
analysis of the control of gene expression suggests that regulation at the translational level
is dominant1,17,20. The rate of translation is a function of initiation, elongation, and
termination; initiation is thought to be the rate-limiting step. In the primary transcription
products of human genes (pre-mRNA), exons are separated by non-coding introns. The spliceosome
is the cellular machinery that regulates and executes the removal of introns with great
precision.
Efficient splicing has limited tolerance for mutations in exonic splicing enhancers.
Disruption of splicing may thus explain the association of some synonymous mutations with
human disease. Post-transcriptional regulation can be affected as well. MicroRNAs (miRNAs) are
post-transcriptional regulators that target about 60% of mammalian genes1,20. In humans, the
target sites usually lie in the 3’ UTRs of mRNAs, but recent work suggests that a synonymous
mutation in the coding region of IRGM (immune-related GTPase family M) alters an miRNA binding
site20. The 313C to T substitution leads to reduced binding of miR-196; the 313T allele is
associated with Crohn’s disease through altered expression of the IRGM protein, which is an
underlying mechanism for risk1,17,20. Synonymous mutations may also influence protein levels
by altering mRNA degradation. For example, a synonymous mutation in the D2 dopamine receptor
results in a less stable mRNA secondary structure and increased mRNA degradation.
Translation Initiation and Elongation affected by Codon Usage Bias
mRNA secondary structure modifies protein expression and has physiological consequences.
Studies of synonymous mutations and translation initiation have shown that the location of the
mutation in the gene is crucial to its effect. A more stable local mRNA structure at the
beginning of a gene impedes translation initiation. Although the genetic code is degenerate,
synonymous codons are not used at equal frequencies. Some organisms preferentially use codons
that correspond to more abundant tRNA molecules, so preferentially used codons and abundant
tRNAs may have co-evolved. In more complex mammalian cells, both rare and frequent codons may
be necessary to balance the amount of polypeptide produced against proper protein
folding1,17,20.
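One common way to quantify unequal synonymous codon usage is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonyms for that amino acid were used equally. The sketch below is only an illustration under stated assumptions: it uses the standard genetic code, a made-up toy sequence, and a hypothetical helper named `rscu`. It is not the method of any study cited in this article.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid that occurs in `cds`.

    `cds` is an upper-case, in-frame DNA coding sequence (hypothetical input).
    RSCU = observed count * number of synonyms / total count for the amino acid.
    """
    usable = len(cds) - len(cds) % 3          # ignore an incomplete trailing codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Count sense codons only; stops and unrecognised triplets are skipped.
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    values = {}
    for aa in set(AMINO_ACIDS) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue                           # amino acid absent from the sequence
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Toy sequence (invented for illustration): five leucine codons, three of them CTG.
toy = "CTGCTGCTGCTCTTA"
for codon, value in sorted(rscu(toy).items()):
    print(codon, round(value, 2))
```

An RSCU of 1 means a codon is used exactly as often as expected under equal usage; values above 1 indicate a preferred codon and values below 1 a rare one.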
One approach examined whether there are conserved patterns in the distribution of rare and
frequent codons across individual mRNAs and across the transcriptome. mRNAs consistently show
a cluster of rare codons at the beginning of the coding sequence, which led to the expectation
that the first 50 or so codons are translated slowly. In highly expressed genes, such “ramps”
slow elongation immediately after initiation. The ramp is hypothesized to space ribosomes along
the mRNA at an adequate distance to prevent congestion, which could lead to stalling and
misfolded proteins10. After this “slow” ramp stage, rapid translation elongation follows1,20.
However, the impact of codon usage bias is not limited to a single protein. Misfolding that
results from a synonymous mutation has consequences that far surpass what was previously
expected. Morimoto and colleagues have shown that misfolding of a single protein can trigger a
cascade of misfolding in other proteins and proteotoxicity1,17,20.
Isochore involvement
Patterns of codon usage bias in mammals differ considerably from those in other taxa.
Because of the small effective population size of mammals, which limits the efficiency of
selection, selective mechanisms were initially ruled out. According to Bernardi, the clearest
pattern of gene-to-gene codon usage in mammals arises from large regional variation in GC
content (isochores) rather than from selection. Isochores are caused by processes primarily
related to recombination and repair, such as biased gene conversion. Hurst and others, however,
found several sources of potentially strong selection on synonymous mutations in mammals1,20.
Their findings are consistent with the traditional view of translational selection (a weak but
positive relationship between gene expression and codon bias)10. Results were more
contradictory when expression levels were compared with tRNA abundance.
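Because regional GC content is central to the isochore explanation, it can help to see how GC content is measured for a coding sequence. The following is a minimal Python sketch with hypothetical function names and an invented toy sequence. It computes overall GC content and, as an added assumption not discussed in the text above, GC content at third codon positions (often called GC3), where most synonymous variation occurs.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3_content(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    third_positions = cds.upper()[2::3]   # bases at index 2, 5, 8, ...
    return gc_content(third_positions)


# Toy in-frame sequence (invented for illustration): ATG GCC CTG AAG TAA
toy = "ATGGCCCTGAAGTAA"
print("GC :", round(gc_content(toy), 3))
print("GC3:", round(gc3_content(toy), 3))
```

A gene located in a GC-rich region would be expected to show a high third-position GC fraction regardless of its expression level, which is why regional composition has to be considered before codon usage is interpreted as evidence of selection.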
Implication in human disease and importance of synonymous mutations
Despite previous perceptions that synonymous mutations are “silent”, there is an ever-growing
list of diseases associated with codon usage bias. The review article by Sauna and
Kimchi-Sarfaty provides information on this, detailed in Figure 1. The diseases were identified
using arrays with thousands of genes that are known or predicted to be associated with a
disease, condition, or multi-gene trait. This approach serves as a compromise between
candidate-gene and genome-wide approaches. About 50 diseases affecting organ systems have been
identified to date.
Figure 1: An excerpt from the article by Sauna and Kimchi-Sarfaty that lists a few of the
numerous human diseases that can be attributed to codon bias.
Numerous recent findings have implicated codon bias in common human diseases. One concerns
the ΔF508 mutation in the cystic fibrosis transmembrane conductance regulator (CFTR), the main
mutation associated with cystic fibrosis15. Until recently, research focused on the protein and
on how deletion of the phenylalanine at position 508 affects it15. In the wild-type sequence,
an isoleucine codon precedes the phenylalanine codon (ATC and TTT). The mutation deletes the
last C of the isoleucine codon and the first two Ts of the phenylalanine codon. The resulting
codon is ATT, which still encodes isoleucine but represents a synonymous change from ATC to
ATT. Bartoszewski showed that this change alters the mRNA structure and ultimately leads to a
misfolded protein. If the codon at position 507 were ATC rather than the mutant ATT, the mRNA
would fold properly and higher protein levels would be observed15.
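The three-nucleotide deletion described above can be traced with a few lines of code. This is a minimal Python sketch that uses only the two codons given in the text (ATC and TTT); the variable names and the three-entry codon lookup are illustrative assumptions, and the flanking CFTR sequence is deliberately left out.

```python
# Codons 507 (isoleucine, ATC) and 508 (phenylalanine, TTT), as given in the text.
WILD_TYPE = "ATCTTT"

# Only the codons needed for this example (taken from the standard genetic code).
CODONS = {"ATC": "I", "ATT": "I", "TTT": "F"}


def translate(seq):
    """Translate an in-frame DNA string using the small lookup table above."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Delete the last C of ATC and the first two Ts of TTT (bases 3 to 5, counting from 1).
mutant = WILD_TYPE[:2] + WILD_TYPE[5:]

print("wild type:", WILD_TYPE, "->", translate(WILD_TYPE))
print("mutant   :", mutant, "->", translate(mutant))
```

The deletion removes the phenylalanine and, at the same time, leaves isoleucine encoded by ATT instead of ATC, which is the synonymous change discussed in the paragraph.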
Cancer
A recent article by Lampson and colleagues shed light on how rare codons regulate KRas
oncogenesis. Mutations that leave KRas, a small Ras GTPase, permanently bound to GTP and
active have been shown to promote cancer. KRas is a major player in cellular signal
transduction and acts as a molecular on/off switch2. There are several Ras GTPases (KRas, HRas,
NRas) with similar amino acid sequences; however, expression and activation of each yields a
different cellular response with regard to tumorigenesis2. KRas was found to be poorly
translated compared to HRas because it contains rare codons. Accordingly, when rare codons are
converted to preferred codons, KRas expression and tumorigenic potential increase. Differences
in synonymous nucleotides thus play a large role in codon usage and have a clear impact on KRas
expression and function2,5. This relationship is illustrated in Figure 2.
Figure 2: An excerpt from the article by Bodemann B. and White M., “Ras GTPases: codon bias
holds KRas down but not out”, in Current Biology. It outlines how KRas codon bias limits
protein expression.
Murugan et al. studied several genes (MMP27, FGD1, TRRAP, and GRM3) in thyroid cancer
samples and cell lines7. Somatic mutations are known to occur frequently in the above-mentioned
genes; however, such mutations were uncommon or absent in thyroid cancer7. It is also possible
that rare codons selectively limit the accumulation of Ras proteins, with a cascade effect on
Ras pathway activation and tumorigenesis.
Several recent studies have shown the implications of codon bias in one of the most common
human ailments, cancer, including breast cancer, melanoma, and thyroid cancer. Previous
research shows that the activation-induced deaminase (AID)/APOBEC family is composed of enzymes
that can deaminate cytosines in single-stranded DNA (ssDNA), making them potent mutagens4. A
study by Lindley investigated the extent to which codon bias might influence the location of
TP53 mutations in breast cancer4, by observing codon-bias patterns and analyzing the ssDNA
target specificities of cytidine deaminases of the AID/APOBEC family. The data indicate that
codon context strongly influences the likely location of mutations at motifs for AID, APOBEC1
and WA sites4. Unexpectedly, there was a highly significant preference for transitions of
cytosine to occur at the first nucleotide position, and for transitions of guanosine to occur
at the second nucleotide position, of the mutated codon. The mechanisms involved thus seem to
be sensitive to the codon reading frame and to have an inherent ability to differentiate
between cytosines on the nontranscribed strand and those on the transcribed strand in the
context of an open transcription bubble4.
Likewise, a study by Gartner et al. explored recurrent functional synonymous mutations in
melanoma9. The study used whole-genome sequencing to characterize somatic mutations in 29
melanoma samples. Screening for one synonymous somatic mutation in BCL2L12 in 285 samples
identified 12 cases that carried a recurrent F17F mutation9. This mutation increased BCL2L12
mRNA and protein levels because wild-type and mutant BCL2L12 transcripts are targeted
differentially9. Protein made from the mutant BCL2L12 transcript bound p53, inhibited
UV-induced apoptosis more efficiently than wild-type BCL2L12, and decreased p53 target gene
transcription9. This report demonstrated selection of a recurrent somatic synonymous mutation
in cancer. Overall, the data indicate that silent alterations play a role in human cancer.
HIV and HPV
Human immunodeficiency virus (HIV) is a slowly replicating retrovirus (a lentivirus) that
leads to acquired immunodeficiency syndrome (AIDS), one of the most devastating human diseases
because it causes progressive failure of the immune system. The nucleotide composition of viral
genomes differs from that of the host14. The HIV genome contains an above-average percentage
of adenine nucleotides and a below-average percentage of cytosine. This deviation in base
composition has implications for the amino acids encoded by the open reading frames, which
plays a role in the highly conserved genome of this particular retrovirus12. Recent studies
have demonstrated that codon bias may be implicated in this destructive virus and its effect on
humans11. Research by Martrus et al. revealed that changes in codon-pair bias in HIV-1 have a
large impact on the replication of the virus in cell culture3. Using synonymous codon pairs,
Martrus et al. recoded the gag and pol genes of HIV-1 into preferred and unpreferred versions,
an approach known as synthetic attenuated virus engineering. The unpreferred versions of the
virus had a considerably lower capacity to replicate3.
Likewise, Li et al. demonstrated that human schlafen 11 is involved in codon-usage-based
inhibition of HIV protein synthesis. In mammals specifically, a major consequence of viral
infection is the induction of type I interferons, cytokines with potent antiviral activity16.
Schlafen genes are a subset of interferon-stimulated early response genes induced by pathogens
such as HIV. Schlafen 11 (SLFN11) specifically targets the production of retroviruses. It is
destructive to HIV-1 because it selectively inhibits the expression of viral proteins in a
codon-usage-dependent manner in the later stages of viral reproduction16. SLFN11 binds transfer
RNA and counteracts the changes in the free tRNA pool caused by the presence of HIV16. The SLFN
genes, which encode a family of proteins found only in mammals, prevent the synthesis of viral
proteins in HIV-infected cells by codon-bias discrimination and thus act as restriction
factors. However, a recent article by Jakobsen highlights that restriction of HIV-1 and other
retroviruses in primary cells should be examined further, as should the antiviral defenses of
SLFN proteins6.
Cladel and coworkers made synonymous codon changes in the oncogenes of the cottontail
rabbit papillomavirus, which caused the virus to demonstrate increased oncogenicity and
immunogenicity8. Their premise was that rare codons allow the virus to escape immune
surveillance, given the known correlation between rare codons and low protein production. Rare
codons in the oncogenes were changed to make them more mammalian-like, and the mutant genomes
were then tested in an in vivo animal model8. The oncogenic potential of the altered genomes
increased while the amino acid sequences of the proteins remained the same8. This demonstrates
that codon usage modifies protein production and plays a role in disease outcome.
Treatment Outcomes:
Silent variations can occur in genes directly associated with disease pathogenesis.
However, related studies show that silent variations also matter in genes with no known link
to the disease mechanism, which are important to understand because they may influence disease
outcome and treatment. A synonymous single nucleotide polymorphism in Wilms’ tumor 1 (WT1) in
patients with childhood acute myeloid leukemia correlated with improved disease outcomes.
Likewise, transporter proteins and metabolizing enzymes involved in drug distribution carry
mutations that affect the effectiveness of drugs. Products of the ABCB1 gene are correlated
with multi-drug resistance during chemotherapy because they interfere with the absorption,
distribution, metabolism, and excretion of drugs. ABCB1 encodes a molecular pump that acts on
various drugs, and a haplotype containing silent mutations in this gene has been associated
with the survival of patients with metastatic renal cell cancer who were treated with
sunitinib. In this case, the prolonged survival could be attributed to an altered affinity of
some drugs for ABCB1 as a consequence of the haplotype, which increases the availability of the
tyrosine kinase inhibitor sunitinib.
Clinical Implications of Codon Bias
The recent headway made in understanding the effect of codon usage bias on human disease
and treatment has prompted a call for personalized medicine. Codon usage bias not only affects
disease risk but also plays a major role in how patients respond to medication, whether
medications cause adverse effects, and how the disease may progress. For this reason, treatment
should be tailored to the individual rather than based on the population norm. The ABCB1
transporter is implicated in resistance to chemotherapeutic agents, and there is evidence that
sSNPs in genes encoding drug targets also affect the safety and efficacy of drugs. In summary,
codon bias has been linked to numerous ailments that commonly affect humans, and molecular
genetics research has helped connect prevalent diseases to codon usage bias. Further
investigation could reveal mechanisms of pathogenesis in cancer development that may be
attributed to codon usage bias. Understanding the degree to which synonymous mutations
contribute to human disease, and the underlying molecular mechanisms, could provide valuable
tools in the biomedical field.
References:
1. CHU D., KAZANA E., BELLANGER N., SINGH T., TUITE M., HAAR T. VON DER, 2013 Translation elongation can control translation initiation on eukaryotic mRNAs. The EMBO journal 33: 21–34.
2. LAMPSON B., PERSHING N., PRINZ J., LACSINA J., MARZLUFF W., NICCHITTA C., MACALPINE D., COUNTER C., 2013 Rare codons regulate KRas oncogenesis. Current biology : CB 23: 70–75.
3. MARTRUS G., NEVOT M., ANDRES C., CLOTET B., MARTINEZ M., 2012 Changes in codon-pair bias of human immunodeficiency virus type 1 have profound effects on virus replication in cell culture. Retrovirology 10: 78.
4. LINDLEY R., 2013 The importance of codon context for understanding the Ig-like somatic hypermutation strand-biased patterns in TP53 mutations in breast cancer. Cancer genetics 206: 222–6.
5. BODEMANN B., WHITE M., 2013 Ras GTPases: codon bias holds KRas down but not out. Current biology : CB 23: R17–20.
6. JAKOBSEN M., MOGENSEN T., PALUDAN S., 2013 Caught in translation: innate restriction of HIV mRNA translation by a schlafen family protein. Cell research 23: 320–2.
7. MURUGAN A., YANG C., XING M., 2013 Mutational analysis of the GNA11, MMP27, FGD1, TRRAP and GRM3 genes in thyroid cancer. Oncology letters 6: 437–441.
8. CLADEL N., BUDGEON L., HU J., BALOGH K., CHRISTENSEN N., 2013 Synonymous codon changes in the oncogenes of the cottontail rabbit papillomavirus lead to increased oncogenicity and immunogenicity of the virus. Virology 438: 70–83.
9. GARTNER J., PARKER S., PRICKETT T., DUTTON-REGESTER K., STITZEL M., et al., 2013 Whole-genome sequencing identifies a recurrent functional synonymous mutation in melanoma. Proceedings of the National Academy of Sciences of the United States of America 110: 13481–6.
10. DOHERTY A., MCINERNEY J., 2013 Translational selection frequently overcomes genetic drift in shaping synonymous codon usage patterns in vertebrates. Molecular biology and evolution 30: 2263–7.
11. ROYCHOUDHURY S., MUKHERJEE D., 2012 Complex codon usage pattern and compositional features of retroviruses. Computational and mathematical methods in medicine 2013: 848123.
12. CHEN Y., 2012 A comparison of synonymous codon usage bias patterns in DNA and RNA virus genomes: quantifying the relative importance of mutational pressure and natural selection. BioMed research international 2013: 406342.
13. STERGACHIS A., HAUGEN E., SHAFER A., FU W., VERNOT B., REYNOLDS A., RAUBITSCHEK A., ZIEGLER S., LEPROUST E., AKEY J., STAMATOYANNOPOULOS J., 2013 Exonic transcription factor binding directs codon choice and affects protein evolution. Science (New York, N.Y.) 342: 1367–72.
14. KUYL A. VAN DER, BERKHOUT B., 2011 The biased nucleotide composition of the HIV genome: a constant factor in a highly variable virus. Retrovirology 9: 92.
15. SCOTT A., PETRYKOWSKA H., HEFFERON T., GOTEA V., ELNITSKI L., 2012 Functional analysis of synonymous substitutions predicted to affect splicing of the CFTR gene. Journal of cystic fibrosis : official journal of the European Cystic Fibrosis Society 11: 511–7.
16. LI M., KAO E., GAO X., SANDIG H., LIMMER K., PAVON-ETERNOD M., JONES T., LANDRY S., PAN T., WEITZMAN M., DAVID M., 2012 Codon-usage-based inhibition of HIV protein synthesis by human schlafen 11. Nature 491: 125–8.
17. SAUNA Z., KIMCHI-SARFATY C., 2011 Understanding the contribution of synonymous mutations to human disease. Nature reviews. Genetics 12: 683–91.
18. FATH S., BAUER A., LISS M., SPRIESTERSBACH A., MAERTENS B., HAHN P., LUDWIG C., SCHÄFER F., GRAF M., WAGNER R., 2010 Multiparameter RNA and codon optimization: a standardized tool to assess and enhance autologous mammalian gene expression. PloS one 6: e17596.
19. PANDIT A., SINHA S., 2010 Differential trends in the codon usage patterns in HIV-1 genes. PloS one 6: e28889.
20. PLOTKIN J., KUDLA G., 2010 Synonymous but not the same: the causes and consequences of codon bias. Nature reviews. Genetics 12: 32–42.