Alu sequences in the coding regions of mRNA: a source of protein variability

PERSPECTIVES
A typical Alu element (Fig. 1) is 282 nucleotides long
and is composed of two homologous but distinct subunits, right and left,
derived from the 7SL RNA gene by internal deletions and point
mutations1,2. The subunits have a high G+C content (about 65% in
retropositionally active Alu elements) and are connected by an
adenine-rich linker. The sequence ends in a polyadenyl tail.

The estimated copy number of 700 000 Alu elements per human haploid
genome predicts a density of one Alu every 4 kb of genomic DNA3-5. In
many cases, Alu elements have been found in clusters, separated by up to
a few hundred basepairs of non-Alu DNA4,6. At the cytogenetic level, Alu
repeats are concentrated in R bands, the most transcriptionally active
areas of the genome4,7. In practice, Alu elements are found in the
introns of almost all known protein-coding genes.
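
The density estimate can be checked with simple arithmetic. The following is a minimal sketch, assuming a haploid genome of about 3 x 10^9 bp; that round figure is an illustrative assumption and is not stated in the text, whereas the copy number and element length are. The function names are hypothetical.

```python
# Back-of-envelope check of the Alu density quoted in the text.
# ASSUMPTION: a haploid genome size of ~3e9 bp (round figure, not from the text).
GENOME_BP = 3_000_000_000
ALU_COPIES = 700_000   # estimated copies per haploid genome (from the text)
ALU_LENGTH_BP = 282    # length of a typical Alu element (from the text)


def mean_spacing_kb(genome_bp, copies):
    """Average distance between consecutive elements, in kilobases."""
    return genome_bp / copies / 1000


def genome_fraction(genome_bp, copies, length_bp):
    """Upper bound on the genome fraction occupied, if every copy were full-length."""
    return copies * length_bp / genome_bp


if __name__ == "__main__":
    spacing = mean_spacing_kb(GENOME_BP, ALU_COPIES)
    fraction = genome_fraction(GENOME_BP, ALU_COPIES, ALU_LENGTH_BP)
    print(f"mean spacing: {spacing:.1f} kb")
    print(f"genome fraction (upper bound): {fraction:.1%}")
```

Under this assumption the spacing comes out close to the quoted one Alu every 4 kb. The fraction is an upper bound because many genomic copies are truncated.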

Alu repeats occur only in humans and other primates. They can be grouped
into subfamilies8-11, which differ in their respective consensus
sequences at a number of positions. These distinct Alu subfamilies are
thought to be relics of periods of intense amplification that have
occurred in primates from the time of eutherian radiation until the
present (reviewed in Refs 12, 13). For example, the average age of
members of the human subfamily Alu-Sx, which accounts for nearly half of
all Alu copies present in DNA, is estimated to be about 40 million years,
pre-dating the divergence of New World monkeys. Members of the Alu-J
subfamily are thought to be about 55 million years old14, dating from the
time of prosimian divergence. As might be expected, older repeats are
common to all primates, while those amplified relatively recently are
restricted to closely related primate species. Examples of more recently
amplified repeats are the Alu-Sb subfamilies, found only in humans and
great apes11,15-18.

Alu sequences in the coding regions of mRNA: a source of protein variability

WOJCIECH MAKAŁOWSKI, GRANT A. MITCHELL AND DAMIAN LABUDA

Dispersion of repetitive sequence elements is a source of genetic
variability that contributes to genome evolution. Alu elements, the most
common dispersed repeats in the human genome, can cause genetic diseases
by several mechanisms, including de novo Alu insertions and splicing of
intragenic Alu elements into mRNA. Such mutations might contribute
positively to protein evolution if they are advantageous or neutral. To
test this hypothesis, we searched the literature and sequence databases
for examples of protein-coding regions that contain Alu sequences: 17 Alu
'cassettes' inserted within 15 different coding sequences were found. In
three instances, these events caused genetic diseases; the possible
functional significance of the other Alu-containing mRNAs is discussed.
Our analysis suggests that splice-mediated insertion of intronic elements
is the major mechanism by which Alu segments are introduced into mRNAs.

Rationale and methods
Because Alu sequences are so numerous and have been present in the
primate genome since mammalian radiation, even rare events involving
these repeats might have an impact on primate evolution. There are at
least two distinct mechanisms by which an Alu segment can be introduced
into a protein-coding region; both involve RNA intermediates. The first
is retroposition, the mechanism by which Alu repeats proliferate within
primate genomes12,13,19. This process requires reverse transcription of
the RNA of a retropositionally active Alu subfamily and its subsequent
insertion into the coding region of a gene. Typically, the Alu element is
full-length or truncated in the 5' region, and is flanked by direct
repeats. The second mechanism involves splicing of an intronic Alu
sequence into the coding region of mRNA (see below). Any Alu sequence
could potentially be used in splicing, including repeats that are
modified by mutations and those belonging to subfamilies that are
retropositionally silent.

In most instances, the presence of an Alu element within a transcript is
predicted to result in premature termination, since the element contains
numerous stop codons, particularly in the 'sense' orientation (Fig. 1).
(Here, we define the orientation of an Alu element as 'sense' if the
polyadenyl tail is downstream with respect to the direction of
transcription of the host gene, and 'antisense' if it is in the opposite
orientation.) Alu-related sequences found within protein-coding regions
are referred to as Alu cassettes.

FIGURE 1. Diagram showing the distribution of stop codons in all six
potential reading frames of the Alu consensus sequence11. The left and
right Alu subunits are indicated, as are the adenine-rich linker (L) and
tail (T). The positions of stop codons in the sense Alu sequence and the
antisense Alu strand are indicated above and below the Alu sequence,
respectively. [Figure legend: symbols distinguish stop codons in the
1st, 2nd and 3rd reading frames; the flanking direct repeats are also
marked.]
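
The six-frame stop-codon survey summarized in Fig. 1 can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and the input below is a short made-up sequence, since the Alu consensus itself is not reproduced in this article.

```python
from typing import Dict, List

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_positions(seq: str, frame: int) -> List[int]:
    """0-based start positions of stop codons in one reading frame (0, 1 or 2)."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def six_frame_stops(seq: str) -> Dict[str, List[int]]:
    """Stop-codon positions in the three sense and three antisense frames.

    Antisense positions are counted from the 5' end of the antisense strand.
    """
    antisense = reverse_complement(seq)
    result = {}
    for frame in range(3):
        result[f"sense+{frame + 1}"] = stop_positions(seq, frame)
        result[f"antisense+{frame + 1}"] = stop_positions(antisense, frame)
    return result


if __name__ == "__main__":
    toy = "ATGTAAGGCTGACCTAG"  # hypothetical sequence, NOT the Alu consensus
    for name, positions in sorted(six_frame_stops(toy).items()):
        print(name, positions)
```

A frame whose list is empty over the span of a cassette is one in which that cassette could be read through without premature termination.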
TIG JUNE 1994 VOL. 10 No. 6
Because of the large number of Alu elements and their sequence diversity,
certain potential cassettes are predicted to contain no stop codons.

We searched sequence databases and the literature for proteins and
cDNA-coding regions that contain Alu-related sequences (see also
Ref. 20). The BLAST program21 was used to look for Alu cassettes
translated from the consensus sequence into all six reading frames among
more than 100 000 sequences, including those of ~2000 human proteins
stored in the Protein Data Base, Swiss-Prot, Protein Identification
Resource and GenBank. We found 17 Alu fragments within the open reading
frames of 15 different cDNAs with scores between 69 and 151 (Table 1;
Fig. 2). Scores defined in the BLAST program are based on comparison of
amino acid sequences using the PAM-120 matrix21. Nine of these Alu
elements were in the antisense orientation. In 12 of the 17 cases (71%),
there is evidence for splice-mediated events. In two other cases, Alu
sequences were directly inserted into exons by retroposition; in the
three remaining cases, the mechanism of insertion is uncertain. Three of
the 17 insertion events were deleterious, resulting in premature
termination of translation and causing a functional deficiency of the
gene product.

These three deleterious cases are included because they illustrate the
potential of Alu elements to enrich the existing pool of protein-coding
RNA. Furthermore, deleterious recessive alleles are well tolerated in
populations and may, in some cases, provide a selective advantage for
heterozygotes. Unfortunately, the functional aspects of most of the
remaining 12 Alu-containing peptides are relatively poorly studied, but
the current state of knowledge is summarized below.

FIGURE 2. Diagram showing Alu elements detected in the coding regions of
cDNAs. Boxes represent Alu cassettes that were introduced by alternative
splicing (pink), following an inactivation of the legitimate donor
splicing site upstream by intron sliding (red), by direct insertion
through retroposition (purple), or by an uncertain mechanism (green). The
names of the host proteins are indicated to the left, whereas the Alu
subfamilies of origin of Alu cassettes involved are indicated to the
right (nomenclature according to Refs 11, 15). Horizontal arrows show the
orientation of the Alu element; sense Alu cassettes are shown in (a) and
antisense in (b). Vertical arrows indicate in-frame stop codons.
Asterisks mark host proteins that are inactivated by insertions. The
numbers at the top of the figure indicate the limits of the various Alu
cassettes relative to the nucleotide sequence of the consensus.
[Figure 2 panels. (a) Sense cassettes: anti-lectin antibody (Alu-Sx),
Factor IX* (Alu-Sb1), platelet glycoprotein IIb (Alu-Sx), KRAB zinc
finger protein (Alu-Sb), cholinesterase* (Alu-Sb2), HLA-DR-β1 M15073
(Alu-Sx), HLA-DR-β1 X12544 (Alu-Sx), A4 amyloid peptide (Alu-J).
(b) Antisense cassettes: ornithine aminotransferase*, serine/threonine
kinase, complement C5, decay accelerating factor, integrin β1, c-rel
phosphoprotein, biliary glycoprotein (Alu 60), biliary glycoprotein
(Alu 44), lectin-like membrane protein.]

Splicing of intronic Alu elements into coding regions of mRNA
This discussion is divided into two parts, each of which concerns either
antisense Alu elements or sense Alu elements. The general Alu consensus
sequence was compared with the consensus sequence for splice sites22.
Antisense Alu elements contain several potential splice sites. Nine sites
resemble the donor splicing consensus AG/GTRA in at least four positions.
Three sites differ by no more than two nucleotides from the acceptor
recognition sequence YYYYYYYYYNYAG/R. The requirement for polypyrimidine
tracts at the acceptor splice recognition site is met by the complements
of the adenine-rich linker (residues 133-121) and the polyadenyl tail
(residues 290-283), which are adjacent to acceptor splice sites at two
locations (Fig. 2b). Another potential acceptor junction occurs in the
Alu complement at position 205 but has not yet been found to flank Alu
inserts in cDNAs. However, no potential acceptor site occurs in the sense
Alu sequence.

Antisense Alu cassettes
All nine antisense Alu cassettes (Fig. 2) appear to have been spliced
into their host mRNAs. Most of the Alu cassettes observed are delimited
by the predicted potential splice junctions. In six cases, both the cDNAs
and the corresponding genomic sequences are known, allowing precise
identification of the participating splice sites (Table 2).
Splice-mediated insertion of an intronic Alu element was first inferred
by Brownell et al.23, who observed two forms of the human REL
proto-oncogene cDNA, one of which included an Alu cassette. A similar
observation has been made for complement C5, which has an Alu-containing
mRNA variant in HepG2 cells as a minor species, as determined by northern
blot analysis24. Decay accelerating factor (DAF) is a cell membrane
glycoprotein that binds activated complement. About 10% of DAF mRNA
contains an Alu cassette25. In the ornithine δ-aminotransferase (OAT)
gene of an individual who has gyrate atrophy of the choroid and retina26,
we reported a mutation that activates a cryptic donor site within the
right subunit of an intronic antisense Alu element. In this case, a
136 bp fragment is recruited as an additional exon in all detectable
mature OAT mRNA from the patient. A stop codon present in
Table 1. Proteins whose cDNAs contain Alu elements in their coding regions

Protein                          GenBank accession no.   Ref.
A4 amyloid peptide               M34875                  28
Anti-lectin antibody epitope     X58236                  41
Biliary glycoprotein             M76741                  27
Cholinesteraseᵃ                  S75201                  36
Complement C5                    M57729                  51
REL phosphoprotein                                       23
Decay accelerating factor        M30142                  25
Factor IXᵃ                                               35
HLA-DR-β1                        X12544                  34
                                 M15073                  33
Integrin β1                      M84237                  29
KRAB zinc finger protein         L11672                  38
Lectin-like membrane protein     L14542                  31
Ornithine aminotransferaseᵃ                              26
Platelet glycoprotein IIb        J02963                  39
Serine/threonine kinase 2        L20321                  30
ᵃSequences that are inactivated by the Alu insertion.

the reading frame of the new exon causes premature termination of protein
synthesis, resulting in virtual absence of OAT activity.

The gene encoding biliary glycoprotein provides an interesting example of
Alu splicing27. Three mRNA variants are produced owing to the alternative
splicing of an exon (IIa), or of one of two almost identical Alu
cassettes (IIz and IIy) derived from two intronic repeats, Alu 44 and
Alu 60, respectively (Fig. 3). The junction sequences of these Alu
sequences are near-perfect matches of the consensus splice sequence, in
contrast to the corresponding sites in three other intronic Alus that are
not used in splicing (Table 3). The splicing variants of the mRNA may be
very old, since Alu 44 and Alu 60 belong to the J subfamily. There is no
amino acid similarity in the predicted translation products of these
novel exons, since each participating Alu fragment is read in a different
open reading frame. All three forms of biliary glycoprotein can be
detected by western immunoblot analysis in a transient transfection
assay27. Other alternatively spliced mRNAs, with or without an Alu
cassette, such as complement C5, DAF, and A4 amyloid peptide24,25,28, may
also coexist as processing variants.

FIGURE 3. A scheme for alternative splicing in the human biliary
glycoprotein mRNA. Boxes represent exons and arrows the five intronic
antisense Alu elements. TM, exon encoding the transmembrane domain.
Dotted lines indicate the splicing patterns found in the three cDNAs27.
[Figure 3 layout, 5' to 3': Exon Ib, Exon IIa, Alu 29, Alu 44, Alu 56,
Alu 60, Alu 77, Exon TM; the three variants include exon IIa, exon IIz
or exon IIy.]

In three other cases, Alu cassettes have been tentatively identified as
having resulted from splicing. Although the structures of the primary
transcripts are still unknown, these Alu cassettes correspond to the
predicted positions of splice junctions in genomic Alus (Fig. 2b). The
three examples are integrin β1 (Ref. 29), serine/threonine kinase 2
(Ref. 30; GenBank accession number L20321) and a lectin-like membrane
protein31.

Sense Alu cassettes
Because it lacks an intrinsic polypyrimidine tract, the sense Alu
consensus sequence is less likely to provide functional acceptor splice
junctions. However, one splicing event that involves a sense Alu element
was identified, in a cDNA clone of the A4 amyloid precursor protein. In
this case, a polypyrimidine tract in the direct repeat flanking the Alu
element is juxtaposed to a sequence at the 5' end of this element; this
sequence resembles an acceptor splice junction. The resulting exon is
predicted to contribute the 20 carboxy-terminal residues of the protein,
the stop codon and part of the 3' nontranslated region of this variant
peptide28.

The sense Alu consensus sequence does, however, contain several potential
donor splice sites. One, at position 69, is predicted to have been used
in a transcript of the HLA-DR-β1 antigen. Figure 4 depicts three variants
of the HLA-DR-β1 cDNA, detected by library screening. The first, denoted
X02902, is considered to be the usual form32, while the two others are
alternatively spliced, apparently owing to a lack of splicing at the
intron 5 donor site33,34. As a result, exon 5 is extended into a nearby
downstream Alu sequence in intron 5 to include, in clone X12544, a stop
codon within the Alu. In clone M15073, the donor site at position 69
within the Alu is used, and splicing occurs with exon 6. X12544 and
M15073 are allelic: a dinucleotide deletion within the Alu sequence of
M15073 changes the open reading frame in the extended exon 5 to match
that of exon 6. These three cDNA clones may illustrate two phases of
'intron sliding': inactivation of an existing splice site followed by
activation of a cryptic splice site. The presence of several sequences
resembling splice junctions within Alu elements could increase the
probability of intron sliding.

TABLE 2. Alu-related splice junctions within protein-coding sequences

Protein                        Acceptor                  Donor
  Alu subfamily (positions)    3' splice junction        5' splice junction

A4 amyloid peptide             TATTCTtP~2CaCSGt
  Alu-J (1-8)                  OC-~CoG,I,C~
Biliary glycoprotein           TtTTcTTTcTAG/AG           CAG/GtGTGA
  Alu-J (12%114; 2%17)         TaTTtTTTgTAG/AG           C A G ~ ~
Biliary glycoprotein           T~tGTTTTCAG~AG            CAG/GtGTGA
  Alu-J (123-116; 2%17)        TTTaGT. . . . . . SAG     CAG/GcGTGA
Complement C5                  ??????AGACaG/AG
  Alu-Sx (286-274)             TTIT~AGACgGiAG
c-REL phosphoprotein           TTGTATTTTTAG/TA           CAG/GtGgGA
  Alu-Sx (130-117; 25-17)      TTGTATTTTTAG/TA           CAG/GcGtGA
Decay accelerating factor      ??? ?'I~AGACAG~GC         CAG/GtGtGt
  Alu-J (286-274; 160-152)     TTTTTGAGACAG/Cg           CAG/GcGcGc
HLA-DR-β1                                                GAG/GTCAGG
  Alu-Sx (67-75)                                         CO,O,[G'r~GG
Integrin β1                    ????????????/Tc           CAG/??????
  Alu-Sx (283-272; 160-152)    TTTGAGACGGAG/Ta           C~3~~
Lectin-like membrane protein   ????????????/tA           CaG/???
  Alu-J (12%114; 6-1)          GTATTI'ITAGTA~gA          CcG/GCC
Serine/threonine kinase 2      ????????????/AG           CaG/??????
  Alu-Sx (286-274; 140-132)    TTITI~AGACC43~AG          Cc~3,[,CTAAT1
Ornithine aminotransferase     'I"I'I'FI"rI'P~GAG ~AC    CaG/gTAATT
  Alu-Sc (289-278; 140-132)    'rl-~'ri-l-i'i-~GAG,LAC   CgG~,eTAATT
Splice site consensus          YYYYYYYYNYAG/RN           MAG/GTRAGT

Sequences observed at splice sites of Alu exons (upper) are compared with
the consensus sequence of the corresponding Alu subfamily (lower);
mismatches are shown in lower case. The numbering of the consensus
positions involved, given in parentheses in the left column, indicates
their orientation. Splice-site consensus sequences22 are shown at the
bottom of the table. R, purine; Y, pyrimidine; N, nucleotide.

Direct Alu insertions and other events
There are two documented cases of de novo Alu retroposition into the
coding regions of genes, in the genes encoding Factor IX (Ref. 35) and
cholinesterase36. Both insertions introduce premature stop codons. In the
case of Factor IX, this causes haemophilia B in hemizygous males. In the
second case, premature termination causes cholinesterase deficiency,
which is usually benign and is inherited as an autosomal recessive trait;
however, when affected individuals undergo surgery, injection of the
myorelaxant succinylcholine36 can produce prolonged paralysis.

Both these insertions involve Alu elements from young Alu subfamilies,
Alu-Sb1 and Alu-Sb2, demonstrating that members of these subfamilies are
currently retropositionally active in the human genome, in contrast to
older subfamilies which were spread in the past. These events illustrate
the potential of Alu elements to become incorporated into
protein-encoding open reading frames. If similar events occurred earlier
in evolution, leading to protein variants that were maintained by
selection, we would expect to find within proteins a number of Alu
cassettes, full-length and flanked by direct repeats, but originating
from older Alu subfamilies. These are frequently detected in introns and
nontranslated regions of transcripts, but not in exons. Therefore, it is
likely that direct retroposition of Alu cassettes into protein-coding
regions is usually deleterious and that the resulting sequences were
eliminated during evolution37.

Three additional Alu cassettes in the sense orientation were identified,
but their origin cannot yet be deduced conclusively. The organization of
one, which lies within the open reading frame for KRAB
(Krüppel-associated box) zinc finger protein ZNF91 (Ref. 38), resembles
that in the gene for A4 amyloid, which was recruited by splicing, and
also that of HLA-DR-β1 clone X12544, which was recruited by intron
sliding. However, the possibility that this Alu element integrated by de
novo retroposition within the last exon, thus causing little or no
disruption in the coding region, cannot be excluded. KRAB-ZNF91 was found
to be expressed in all human tissues examined by in situ hybridization.
This polypeptide (from a family of polymorphic proteins) may have arisen
relatively recently, since its Alu cassette belongs to the Alu-Sb
subfamily. As expected, KRAB-ZNF91 is specific to humans and primates and
was not detected in rodents38.

In the second example, platelet glycoprotein IIb, the Alu cassette is
found in the middle of the coding region39. An apparently allelic form of
this mRNA has also been reported, which contains no Alu cassette and
differs in the 3' nontranslated region40. The third example, found in
anti-lectin antibody epitope, is represented by a single cDNA clone
isolated from an expression library and identified using a monoclonal
antibody (presumably on the basis of the peptide sequence WGAE, which the
clone contains) that also reacted with myelin brain protein and with a
bovine β-galactosidase lectin41. It is possible that these last two
examples represent cloning artefacts.

We eliminated from our compilation a 12 kDa B cell growth factor42
(GenBank accession number M15530) that contains two different Alu
cassettes spanning nearly half its reported coding sequence. When this
region was examined in detail, we found an in-frame stop codon and two
frameshift mutations in all species studied, including human, chimpanzee,
gorilla, gibbon, baboon and macaque. These findings strongly suggest that
this region could not encode the proposed peptide43
(see also GenBank accession numbers U05307, U05312, and EMBL ALIGN
accession number DS16865). Two other sequences that were excluded because
of insufficient information were those of Mahlavu hepatocellular
carcinoma DNA44 (GenBank accession number X555777) and a candidate cDNA
for X-linked retinopathy45 (GenBank accession number S58722).

Alu-related peptides
Of the 17 Alu sequences found in mRNA-coding regions, seven contain
in-frame stop codons (Fig. 2) and three others are predicted to cause
frame shifts. The Alu cassettes that we identified are predicted to
contribute between 22 and 55 codons to their host proteins. We found that
Alu insertions did not consistently use a common reading frame and that
many belonged to old subfamilies such as Alu-J and Alu-Sx, which have
been present in the human lineage for more than 40 million years. The
average mutation frequency of these subfamilies, measured with respect to
the corresponding consensus sequences, is 0.14 (0.3 in CpG dinucleotide
positions and 0.12 at non-CpG positions). Moreover, small insertions and
deletions contribute to frame-switching within the cassettes, further
enriching coding possibilities. Therefore, despite the fact that many Alu
cassettes overlap (Fig. 2), little sequence identity was found among the
corresponding peptide fragments, suggesting that the occurrence of Alu
cassettes in coding regions of mRNA is due to the abundance of intronic
Alu elements rather than the addition of a specific sequence motif.

FIGURE 4. Diagram illustrating intron sliding in HLA-DR-β1. Numbers to
the left of the mRNAs refer to their GenBank accession numbers. X02902 is
believed to represent the usual form of mature mRNA32. Its two
alternative forms X12544 (Ref. 34) and M15073 (Ref. 33) are compared with
the hypothetical primary transcript33. Vertical arrows indicate stop
codons. Genomic and cDNA sequences are not drawn to scale.
[Figure 4 labels: non-Alu-containing cDNA (X02902); Alu-containing cDNA;
exons 4, 5 and 6; STOP.]

Few Alu-containing peptides that are not deleterious have been
investigated carefully with respect to their expression and function. One
of these is DAF, which has been studied in transfection experiments with
wild-type and Alu-containing cDNAs. The Alu cassette had been predicted
to create a hydrophilic carboxy-terminal region in the peptide, which
would inhibit the migration of DAF into the cell membrane. Caras et al.25
observed that DAF translated from the wild-type message was
membrane-bound, while DAF peptide expressed from Alu-containing cDNA was
not. They hypothesized that a fraction of Alu-containing DAF mRNA in
normal cells probably accounts for the soluble form of DAF. As discussed
earlier, translation of Alu-related peptides has been demonstrated for
biliary glycoprotein using a transient expression assay27, and
evolutionary arguments have been shown to be consistent with the cellular
function of both these peptides. However, there is more often a lack of
even circumstantial evidence about the possible functional implications
of Alu insertions, and further studies are needed.

TABLE 3. Comparison of splicing junctions of Alu repeats within the gene
encoding human biliary glycoprotein

Region               Acceptor                Donor
                     3' splice junction      5' splice junction
Active sites
Exon Ib              /AT                     CTG/GTAAGT
Exon IIa             GCTTCTCCACAG/AG         ACG/GTGTGA
Exon IIz (Alu 44)    TTTTCTTTCTAG/AG         CAG/GTGTGA
Exon IIy (Alu 60)    TTTTGTTTTCAG/AG         CAG/GTGTGA
Exon TM              TC~X2CATGACAGiAT        CAG/GTATGA
Inactive sites
Alu 29               TFFFFITI~TAG~AG         CAG/GAATGT
Alu 56               °I'I'ITVFITGTAGSAG      CAG/TCATAC
Alu 77               ~q'~'ITI'~AGTAGSAG      CAG~GGGTC~
Consensus            YYYYYYYYNYAG/RN         MAG/GTRAGT

Since most of the Alu splicing events described here appear to affect
only a fraction of transcripts, they have the potential to create new or
transient peptide functions while the existing function of the locus is
maintained by an alternatively spliced or non-Alu-containing mRNA. On the
other hand, in the case of serine/threonine kinase, lectin-like protein
and KRAB zinc finger protein, Alu-containing mRNA is the only known
message. It is interesting that a large proportion of seemingly
functional proteins with Alu cassettes are related to cellular or immune
recognition, two processes characterized by the involvement of a great
diversity of sequences.
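
The comparison of candidate junctions with the splice-site consensus (Tables 2 and 3) amounts to counting mismatches against a degenerate pattern. The sketch below is an illustration under stated assumptions rather than the procedure actually used: the function names are hypothetical, the consensus strings are those printed at the bottom of Table 3 with the junction marker removed, and the example sites are the Exon Ib, Exon IIz and Exon IIa entries of Table 3.

```python
from typing import List, Tuple

# Nucleotide codes used in the consensus (R, purine; Y, pyrimidine; M, A or C; N, any).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "N": "ACGT",
}

# Consensus sequences from Table 3, written without the junction marker.
ACCEPTOR = "YYYYYYYYNYAGRN"  # YYYYYYYYNYAG/RN
DONOR = "MAGGTRAGT"          # MAG/GTRAGT


def mismatches(site: str, consensus: str) -> int:
    """Count positions at which `site` is not allowed by `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base not in IUPAC[code]
               for base, code in zip(site.upper(), consensus.upper()))


def scan(seq: str, consensus: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """Return (0-based start, mismatch count) for every window within the threshold."""
    k = len(consensus)
    hits = []
    for start in range(len(seq) - k + 1):
        n = mismatches(seq[start:start + k], consensus)
        if n <= max_mismatches:
            hits.append((start, n))
    return hits


if __name__ == "__main__":
    print(mismatches("CTGGTAAGT", DONOR))          # Exon Ib donor, CTG/GTAAGT
    print(mismatches("CAGGTGTGA", DONOR))          # Exon IIz donor, CAG/GTGTGA
    print(mismatches("GCTTCTCCACAGAG", ACCEPTOR))  # Exon IIa acceptor
```

A plain mismatch count weights all positions equally. The inactive Alu 29 donor (CAG/GAATGT) also differs from the consensus at only two positions, but one of them falls in the GT immediately after the junction, so a practical scorer would probably weight positions rather than count them.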

Conclusions
The introduction of Alu cassettes into protein-coding regions of the
genome defines a novel mechanism by which Alu elements contribute to
genomic evolution. Other mechanisms include the bulk effect of Alu
repeats (which form 5% of genomic DNA, increasing its G+C content),
homologous recombination between nearby Alu elements to cause deletion or
duplication of genes (see, for example, Refs 46, 47), and modulation of
transcription by Alu repeats near the 5' end of genes48,49.

It appears from our analysis that splice-mediated insertion of an
intronic Alu fragment is the major mechanism by which Alu repeats enter
protein-coding regions. Most insertions involve antisense Alu cassettes
introduced as new exons; however, insertion can also occur by intron
sliding, which could involve Alu elements in either orientation. Because
of the large number and the sequence diversity of Alu elements, it is
expected that many intronic antisense repeats could provide sequence
cassettes for splicing into the message of their host gene. Given the
number of such events identified among the ~2000 human protein-coding
sequences in the GenBank database, we anticipate that a few hundred Alu
insertions will be identified in the future.

In practice, it is important to evaluate newly discovered cDNAs for the
presence of Alu sequences. While in some cases their presence may
represent a cloning artefact, in others it may have functional relevance.

Alu elements in coding sequences have so far only been clearly shown to
have deleterious effects. Unfortunately, the evidence available at
present is too fragmentary to allow us to draw conclusions regarding the
functional consequences of most Alu insertions presented here. It is
expected that most of the mutations will be deleterious, since they
involve substantial changes in coding sequences. However, because
insertion of Alu-containing sequences into coding transcripts is an
ongoing process specific to primates, and has the potential to change and
diversify the function of the resulting gene product, it is important
that the process is recognized as a mechanism of evolution. Insertion of
Alu sequences represents yet another way in which retroposons may act as
'seeds' of evolution50.

Acknowledgements
We thank E. Ziętkiewicz for critical comments on the text and D. Valle
for discussions. We also thank T. Barnett, T. Adamkiewicz, O. Aprelikowa
and E.T. Liu for data, and M. Patenaude for excellent secretarial
assistance. This study was supported by a grant from The Cancer Research
Society, Inc.

References
1 Jelinek, W.R. et al. (1980) Proc. Natl Acad. Sci. USA 77, 1398-1402
2 Ullu, E. and Tschudi, C. (1984) Nature 312, 171-172
3 Hwu, H.R., Roberts, J.W., Davidson, E.H. and Britten, R.J. (1986) Proc. Natl Acad. Sci. USA 83, 3875-3879
4 Moyzis, R.K. et al. (1989) Genomics 4, 273-289
5 Rinehart, F.P., Ritch, T.G., Deininger, P.L. and Schmid, C.W. (1981) Biochemistry 20, 3003-3010
6 Iris, F.J.M. et al. (1993) Nature Genet. 3, 137-138
7 Korenberg, J.R. and Rykowski, M.C. (1988) Cell 53, 391-400
8 Willard, C., Nguyen, H.T. and Schmid, C.W. (1987) J. Mol. Evol. 26, 180-186
9 Jurka, J. and Smith, T. (1988) Proc. Natl Acad. Sci. USA 85, 4775-4778
10 Quentin, Y. (1988) J. Mol. Evol. 27, 194-202
11 Jurka, J. and Milosavljevic, A. (1991) J. Mol. Evol. 32, 105-121
12 Schmid, C. and Maraia, R. (1992) Curr. Opin. Genet. Dev. 2, 874-882
13 Deininger, P.L., Batzer, M.A., Hutchison, C.A., III and Edgell, M.H. (1992) Trends Genet. 8, 307-311
14 Labuda, D. and Striker, G. (1989) Nucleic Acids Res. 17, 2477-2491
15 Jurka, J. (1993) Nucleic Acids Res. 21, 2252
16 Hutchinson, G.B. et al. (1993) Nucleic Acids Res. 21, 3379-3383
17 Matera, A.G., Hellmann, U., Hintz, M.F. and Schmid, C.W. (1990) Nucleic Acids Res. 18, 6019-6023
18 Batzer, M.A. et al. (1990) Nucleic Acids Res. 18, 6793-6798
19 Rogers, J.H. (1985) Int. Rev. Cytol. 93, 187-279
20 Claverie, J.M. (1992) Genomics 12, 838-841
21 Altschul, S.F. et al. (1990) J. Mol. Biol. 215, 403-410
22 Ohshima, Y. and Gotoh, Y. (1987) J. Mol. Biol. 195, 247-250
23 Brownell, E., Mittereder, N. and Rice, N.R. (1989) Oncogene 4, 935-942
24 Lundwall, A.B. et al. (1985) J. Biol. Chem. 260, 2108-2112
25 Caras, I.W. et al. (1987) Nature 325, 545-548
26 Mitchell, G.A. et al. (1991) Proc. Natl Acad. Sci. USA 88, 815-819
27 Barnett, T.R., Drake, L. and Pickle, W., II (1993) Mol. Cell. Biol. 13, 1273-1282
28 de Sauvage, F. and Octave, J.N. (1989) Science 245, 651-653
29 Languino, L.R. and Ruoslahti, E. (1992) J. Biol. Chem. 267, 7116-7120
30 Cance, W.G., Craven, R.J., Weiner, T.M. and Liu, E.T. (1993) Int. J. Cancer 54, 571-577
31 Adamkiewicz, T.V., McSherry, C., Bach, F.H. and Houchins, J.P. (1994) Immunogenetics 39, 218-221
32 Cairns, J.S. et al. (1985) Nature 317, 166-168
33 Gregersen, P.K. et al. (1986) Proc. Natl Acad. Sci. USA 83, 2642-2646
34 Cairns, J.S., Dahl, C.A., Curtsinger, J.M. and Bach, F.H. (1988) Nucleic Acids Res. 16, 9353
35 Vidaud, D. et al. (1993) Eur. J. Hum. Genet. 1, 30-36
36 Muratani, K. et al. (1991) Proc. Natl Acad. Sci. USA 88, 11315-11319
37 Kimura, M. (1993) Neutral Theory of Molecular Evolution, Cambridge University Press
38 Bellefroid, E.J. et al. (1993) EMBO J. 12, 1363-1374
39 Loftus, J.C. et al. (1987) Proc. Natl Acad. Sci. USA 87, 7114-7118
40 Poncz, M. et al. (1987) J. Biol. Chem. 262, 8470-8482
41 Abbot, W.M., Mellor, A., Edwards, Y. and Feizi, T. (1989) Biochem. J. 259, 283-290
42 Sharma, S., Mehta, S., Morgan, J. and Maizel, A. (1987) Science 235, 1489-1492
43 Ziętkiewicz, E., Makałowski, W., Mitchell, G.A. and Labuda, D. Science (in press)
44 Yang, S.S. et al. (1990) Cancer Res. 50, 5658-5667
45 Wong, P. et al. (1993) Genomics 15, 467-471
46 Kudo, S. and Fukuda, M. (1989) Proc. Natl Acad. Sci. USA 86, 4619-4623
47 Lehrman, M.A. et al. (1985) Science 227, 140-146
48 Kim, J.H. et al. (1989) Nucleic Acids Res. 17, 5687-5700
49 Brini, A.T., Lee, G.M. and Kinet, J.P. (1993) J. Biol. Chem. 268, 1355-1361
50 Brosius, J. (1991) Science 251, 753
51 Haviland, D.L. et al. (1991) J. Immunol. 146, 362-368

W. MAKAŁOWSKI, G.A. MITCHELL AND D. LABUDA ARE IN THE DIVISION OF MEDICAL
GENETICS, HÔPITAL SAINTE-JUSTINE RESEARCH INSTITUTE, DEPARTMENT OF
PEDIATRICS, UNIVERSITÉ DE MONTRÉAL, MONTREAL, QUEBEC, CANADA H3T 1C5.