Identification of protein targeting motifs

4207
Journal of Cell Science 112, 4207-4211 (1999)
Printed in Great Britain © The Company of Biologists Limited 1999
JCS0771
Motif Trap: a rapid method to clone motifs that can target proteins to defined
subcellular localisations
Luis A. Bejarano and Cayetano González*
Cell Biology and Biophysics Programme, European Molecular Biology Laboratory, Heidelberg, Germany
*Author for correspondence (e-mail: [email protected])
Accepted 27 September; published on WWW 17 November 1999
SUMMARY
We have developed a rapid procedure termed Motif Trap
(MT) to identify protein motifs that are able to target
proteins to a distinct subcellular localisation in eukaryotic
cells. By expressing random DNA fragments fused to green
fluorescent protein (GFP), individual cells with the GFP
localisation of interest are readily isolated allowing for the
expressed DNA fragments to be cloned by RT-PCR. These
can then be used to identify the corresponding full-length
cDNAs. Using MT, we have identified patterns of GFP
localisation which correspond to every major organelle and
compartment. We have shown that MT is useful to identify
new sequences that determine subcellular localisation as
well as known targeting motifs.
INTRODUCTION
essential part, are needed. An attempt to this end was provided
by the method developed to identify nuclear markers in fission
yeast (Sawin and Nurse, 1996). However, with the sequencing
of higher eukaryotes, a similar technique is needed for high
efficiency determination of subcelular localisation in these
organisms. To identify protein motifs that are able to target
proteins to a distinct localisation in cultured cell lines derived
from higher eukaryotes we have developed a rapid procedure
termed Motif Trap (MT). The method is based on libraries
made of DNA fragments fused to GFP. Following transfection
with a MT library, the cells that display a GFP localisation of
interest are isolated and the corresponding cDNA fragment that
encodes the targeting motif cloned by RT-PCR.
One of the most conspicuous features of the eukaryotic cell is
its compartmentalization. Numerous organelles and domains
subdivide the anatomy of the cell. Moreover, most of these
can be further subdivided into well differentiated regions
or structures. The physiological relevance of such
compartmentalisation is paramount. Every major cellular
activity can be assigned to one or more subcellular
compartments. Furthermore, the intricate regulatory networks
which operate within eukaryotic cells greatly rely on the
differential compartmentalisation of their components. The
close relationship between subcellular localisation and
function is such that determining the preferential localisation
of a protein is often an essential step towards determining its
function. Typically, the identification of proteins that are
localised in a given compartment requires a very lengthy
procedure that relies on the preparative isolation of such a
compartment through cell fractionation techniques. This
approach presents several limitations. Firstly, it is totally
dependent upon the yield and enrichment that can be achieved
by these cell fractionation techniques. Secondly, cell
fractionation can lead to the loss of peripheral, yet functionally
relevant components. Finally, and more importantly, it can only
be applied to those compartments that are amenable to cell
fractionation. The relative amount and the biochemical
characteristics of every single protein can impose an additional
burden.
The genomic revolution is providing an ever increasing
number of sequences with unknown functions. To match this
wealth of sequence data, high throughput methods to generate
functional information, of which subcellular localisation is an
Key words: GFP, Protein targeting motif, Subcellular localisation,
Visual screen
MATERIALS AND METHODS
Construction of the MT vector (MT#1)
Using primers A (CATGTTGGCGGCCGCGGTACCGTCGA) and B
(GCCCGGGCGTGAGCAAGGGCGAG) we modified pEGFP-N1
(Clontech) by PCR to introduce an SrfI site between nucleotides three
and four of the GFP coding sequence. This insertion shifts the initial
ATG codon of the GFP out of frame with the rest of the coding
sequence. This ensures that only insert-containing plasmids will
express GFP. PCR was carried out with the Expand High Fidelity PCR
System (Boehringer). Oligo A also introduced a NotI site 10
nucleotides upstream of the GFP CDS. The PCR product was purified
using the PCR Quiaquick Purification System (Qiagen), ligated with
Rapid Ligation Kit (Boehringer) and used to transform Epicurian Coli
XL1-Blue (Stratagene), by heat-shock. Transformed cells were plated
out in LB-Agar suplemented with 30 µg/ml kanamycin and incubated
for 16 hours at 37°C. The modified vector was then isolated by
minipreps and the NotI fragment subcloned into a pQE31 vector
4208 L. A. Bejarano and C. González
ATG-Srf
Fig. 1. The Motif Trap approach. Random DNA fragments cloned
into the MT vector will produce fusion proteins between the
polypeptides encoded by these inserts and GFP. The MT vector
contains a high efficiency cloning site immediately after the initial
ATG of the GFP which shifts this codon out of frame with the rest of
the coding sequence. Thus, GFP can only be expressed from vectors
carrying an insert that restores the reading frame. Upon transfection
with the MT library, a fraction of the cells can be observed to express
GFP which in some cases will be localised in a particular
compartment or organelle. These cells can be cloned and the insert
that encodes for the relevant subcelluar localisation signal can be
isolated by RT-PCR.
I
GF
P
MT
vector
DNA fragments
Clone fragments
into MT vector
previously modified to introduce a NotI site between the BamHI and
KpnI sites with an adaptor made with oligos Not1-1b
(GATCGCGGCCGCGTAC) and Not1-8 (GCGGCCGC). The
resulting colonies were checked under a transiluminator to test the
expression of GFP and the NotI fragment was then isolated from one
of the colonies and subcloned into of pEGFP-N1-Not, a modified
version of pEGFP-N1 that carries an aditional NotI site inserted in
position 635-642.
MT Library
Transfect and identify
cells with localised GFP
Subclone
RT-PCR
RT-PCR
RT-PCR
Isolated DNA fragments encoding subcellular localisation signals
Construction of the MT libraries
To construct the MmcDNA-MT library, cDNA was obtained from
NIH/3T3 cells by random priming, purified with QIAEX II Gel
Extraction Kit, cloned into the SrfI site of the MT#1 vector using the
Rapid Ligation Kit (Boehringer) and transformed into E. coli XL1Blue MR (Stratagene). Plating out a small aliquot of these cells, we
estimated that the library contained about 420,000 clones of which
1.6% had no insert. The complete library was then plated out onto
a sterile Nylon filter laid on a 24×24cm plate containing LB
supplemented
with
kanamycin,
incubated at 30°C for 24 hours,
replicated into another filter, and
reincubated for 4 hours at 37°C. The
DNA from the library was then purified
with Plasmid Maxi kit (QIAGEN). The
DmgDNA-MT library was prepared in
the same way except that the inserts in
this case were Drosophila genomic
DNA fragments between 100 and 150
nt in length.
Transfection of HEK293 cells
with the MT libraries and cloning
of cells displaying localised GFP
HEK293 cells were transfected with the
MT libraries following the method
described by Chen and Okayama
(1987). The cells were observed twelve
Fig. 2. Patterns of GFP localisation
generated by transfection with a MT
library. Low (A) and high (B to I)
magnification views of HEK293 cells
transfected with the MmcDNA-MT
library. GFP is shown in green. These
cells were counterstained for DNA
using propidium iodide (red).
(B) Mitochondria; (C) centrosomal
region (arrow); (D) cytokinesis furrow
(arrow); (E) the chromosomes
(arrows); (F) the mitotic spindle. We
have not determined yet the subcellular
localisation of GFP in the cells shown
in G,H,I. Bars, 15 µm.
Identification of protein targeting motifs 4209
Fig. 3. Sequencing the inserts that target GFP localisation. (A) The
GFP fusion in clone 02/11#22 shows a strong nucleolar localisation
with a faint homogeneous nuclear background. (B) The insert from
this clone contains a well defined bipartite NLS (red) and meets the
consensus of a nucleolar localisation signal. (C) In clone 09/07#18
GFP colocalised with the ER as shown by counterstaining with an
antibody against α-calnexin (not shown). (D) The insert from this
cell line encodes a peptide of 35 amino acids that contains a
predicted trans-membrane motif (PMSIFIQLIYFLLFLFLGVIC).
This sequence does not have a match in the sequence databases.
to sixteen hours after transfection to check for localised GFP using an
inverted LEICA DMI-RBE microscope with a long distance 63×
Fluotar objective. The position of the cells of interest was labelled
with a diamond pen, and then cloned by a combination of manual
cloning and serial dilutions, as described by Harlow and Lane (1988).
In some cases, the cells were first cloned using a fluorescenceactivated cell sorter (FACS) and the resulting clones were later
analysed to determine the presence of localised GFP.
Cloning of the DNA fragments encoding sucellular
localisation sequences
The DNA fragments of interest were isolated from cloned cells,
obtained as described before, by RT-nested PCR. To this aim
we used a set of nested oligos named Fir (AGCTTCGAATTCGCGGCCGCCAACATG) Sec (TATGATCTAGAGTCGCGGCCGCTTTAC) Thi (TAGCGCTACCGGACTCAGATCTCGAGC) and Fou
(AAAACCTCTACAAATGTGGTATGGCTG) which flank the SrfI site
of the MT#1 vector. mRNA was isolated from the cloned cells using
the mRNA Capture Kit (Boehringer). The reverse transcriptase reaction
and the first round of PCR were carried out using the Titan One Tube
RT-PCR Kit with the Expand High Fidelity PCR System (Boehringer).
Oligo Fou was used to prime the RT reaction. The first round of PCR
was carried out with oligos Thi and Fou as primers. The second round
of PCR was primed with oligos Fir and Sec. The PCR product was run
on an agarose gel and isolated with QIAEX II Gel Extraction Kit
(QIAGEN). The isolated fragment was then digested with NotI and
subcloned into the MT#1 vector. This vector was the used to transfect
HEK293 cells to check if the localisation of the GFP fused to the
isolated fragment reproduced the localisation of GFP in the original
clone, in which case the insert was sequenced.
RESULTS AND DISCUSSION
The construction of a Motif Trap library and its use to identify
protein motifs that can target proteins to different cellular
compartments is summarised in Fig. 1. The source DNA, either
genomic or cDNA, is fragmented and cloned into the MT
vector. Upon transfection with a MT library, the cells that have
taken up a plasmid carrying an insert that encodes a targeting
signal can readily be identified by the localisation of GFP.
These cells are then isolated by a combination of manual and
serial dilution cloning and the corresponding cDNA fragments
cloned by RT-PCR.
Between eight and ten hours after transfection
of HEK293 cells with the MmcDNA-MT library,
some cells start to express GFP and the first
localisation patterns are recognisable (Fig. 2A).
The percentage of cells transfected with a MT
library that are expected to express GFP depends
exclusively on the proportion of clones that carry
a stop codon in frame with the GFP. This, in turn,
depends on the size and base composition of the
insert. Thus, the proportion of positive clones is
expected to be high in libraries made with short
inserts and should be minimum in MT libraries
made of full-length cDNAs. In a typical
transfection experiment with the MmcDNA-MT
library, about 50% of the cells express GFP of
which 20% display a distinct localisation of this
reporter. Fig. 2B to I shows some of the GFP
localisation patterns that we observed. Fig. 2B
shows GFP specifically localised in the
Fig. 4. Cryptic mitochondrial localisation sequences.
The mitochondrial localisation of these GFP fusions
was determined by colocalisation experiments
between mitotracker (A) and the GFP fusion (B). (C)
A merged view of the two figures. (D) Sequence of 7
inserts that drive GFP to mitocondria. Most of these
correspond to small ORFs present in non-coding
genomic DNA.
4210 L. A. Bejarano and C. González
mitochondria as confirmed by counterstaining with the
mitochondria-specific marker mitotracker (not shown). In the
cell shown in Fig. 2C, GFP displays a fairly uniform
distribution in the cytoplasm, but is significantly concentrated
in a small area near the nucleus that corresponds to the
centrosome (arrow) as revealed by counterstaining with a
human autoimmune anti-centrosome antibody (not shown).
Fig. 2D,E and F show mitotic cells from different GFP
expressing lines. GFP can be seen to localise in the cytokinesis
furrow (arrow; Fig. 2D), the chromosomes (Fig. 2E) and the
mitotic spindle (Fig. 2F). GFP does not appear to be localised
during interphase in these two cell lines. We have not yet
determined the precise subcellular localisation of GFP in the
cells shown in Fig. 2G,H and I. These are a few examples of
the patterns of GFP localisation that we have observed. Using
MT we have been able to identify cells with GFP localised in
every major organelle and compartment.
To demonstrate the use of MT to isolate protein targeting
motifs we have cloned and sequenced the DNA inserts from
some of these cells. As expected, we have found sequences that
correspond to known proteins and contain targeting signals
which are consistent with the observed localisation of the GFP
fusion. One of these is clone 02/11#22, (Fig. 3A,B). The GFP
fusion in this cell line shows a distinct nucleolar localisation
with a weak nuclear background. The insert from this line is
identical to the fragment that spans between amino acids 62
and 131 of the mouse homologue of the HTLV-I tax responsive
element binding protein TAXREB107 (Nacken et al., 1995).
This fragment, contains a well defined bipartite nuclear
localisation signal (KRKYSAAKTKVEKKKKKE) and meets
the consensus of a nucleolus localisation signal. We have also
found inserts that are new sequences which do not have a match
in the databases. This is the case of clone 09/07#18 (Fig.
3C,D). These cells contain GFP that is tightly localised to the
endoplasmic reticulum (ER), as shown by counterstaining with
an antibody against the ER marker α-calnexin (not shown;
Cannon and Helenius, 1999). The insert from this cell line
encodes a peptide, 35 amino acids long and contains
a
predicted
trans-membrane
motif
(PMSIFIQLIYFLLFLFLGVIC) that may account for the ER localisation
shown by the fusion protein (von Heijne, 1992). Indeed, this
does not mean to say that the identified motif contains an active
retention signal since the ER accumulation of the fusion
protein could result from its failure to progress further into the
secretory pathway. These observations demonstrate the use of
MT to identify and clone protein fragments bearing specific
targeting signals.
The main feature of MT is that it is not biased by the
particular features of a given protein. Polypeptides containing
a localisation signal are identified solely on the basis of their
capability to drive GFP to a particular subcellular
compartment. The abundance, physiological role, and possible
regulation of the full-length protein is totally irrelevant.
Moreover, MT is not limited to compartments that are
amenable to cell fractionation. In principle, proteins belonging
to any compartment that can be visualised by fluorescence
microscopy could be identified with this method.
Nevertheless, several limitations apply to the Motif Trap
approach. Since the MT vector generates N-terminal fusions
with the reporter, it cannot be used to identify localisation
signals that reside in the C-terminal end of the molecule such
as CAAX motifs (Hancock et al., 1990) or proteins whose
compartmentalisation relays on the subcellular localisation of
their transcripts, which usually depends on the 3′UTR region
(reviewed by Kislauskis and Singer, 1992). These limitations
could be circumvented by a different MT vector designed to
produce C-terminal fusions, but in this case the reporter gene
would be expressed regardless of the presence of an insert and
whether or not the insert carries an open reading frame.
Consequently, a great many more cells would have to be scored
to find clones that display a particular localisation of the
reporter.
An additional limitation of the MT approach is due to the
very nature of the subcellular localisation motifs. Given the
short size and low complexity of many of these, stretches of
amino acids that match a consensus, but do not act as targeting
sequences are not uncommon. For instance, it has been
estimated that a significant fraction of the proteins that carry a
nuclear localisation signal are never found in the nucleus
(Dingwall and Laskey 1991). Mitochondrial signal sequences
(or pre-sequences) also fall into this category. Mitochondrial
presequences have been defined as N-terminal peptides that are
generally basic, 15-30 amino acids long, rich in hydroxilated
amino acids, deficient in acidic amino acids, and can form an
amphiphilic α-helix or β-sheet (Roise and Schatz, 1988). To
get an estimate of the incidence of cryptic mitochondrial
signals and how could they affect the identification of
mitochondrial proteins using MT, we made a library with small
inserts of Drosophila genomic DNA that was then tested in
human HEK293 cells. We used genomic DNA from
Drosophila as a source or high complexity DNA which, when
cut in small inserts, would render a nearly random collection
of sequences. The proportion of clones from this library that
displayed a mitochondrial localisation was very high, around
15% of the GFP expressing cells. Cloning and sequencing of
seven of these inserts showed that none of them corresponded
to known mitochondrial proteins. In fact, most of them
corresponded to very short ORFs present in non-coding
sequences like satellite DNA and rDNA (Fig. 4). Interestingly,
despite the nearly random origin of these inserts, all of them
fit very precisely the defined consensus for a mitochondrial
pre-sequence: short, with a net positive charge, containing
some hydroxilated amino acids and predicted to fold as an
amphiphilic α-helix, thus illustrating the power of the MT
approach to define consensus localisation signals using nearlyrandom DNA.
Short coding sequences, just like non-coding sequences, are
therefore expected to contain numerous cryptic subcellular
localisation motifs that are normally silent due to different
reasons. Probably, the most common one is their position
within the folded protein which renders them inaccessible to
the other components of the subcellular localisation machinery
(Roise and Schatz, 1988). In addition, some potential
localisation signals may fail to target because of the presence
of other targeting signals that act in a dominant manner and
recruit the protein to a different compartment (Dingwall and
Laskey 1991). These cryptic targeting signals can be revealed
by taking them out of their natural context in a fusion with a
reporter protein. Moreover, short DNA fragments can also
contain non-coding ORFs that may encode artefactual targeting
sequences. Although non-naturally occurring sequences
capable of targeting may have very interesting applications,
Identification of protein targeting motifs 4211
they can produce a bothersome background when naturally
occurring targeted proteins are to be identified. The incidence
of all these artefactual results can be considerably reduced by
the use of MT libraries made with large cDNA inserts so that
nearly full-length proteins are fused to the reporter. These
protein fusions have a higher chance of folding, thus
inactivating cryptic targeting signals. They also have more
chances of carrying the more dominant targeting motif, the one
that determines the normal localisation of the protein.
Moreover, longer cDNAs will increase the frequency of stop
codons in any of the incorrect ORFs thus decreasing their
artefactual expression.
Finally, the MT approach is constrained by the limitations
that apply to artificial protein expression in eukaryotic cells.
The main source of artefactual results is overexpression.
Overexpression can lead to the ectopic accumulation of a
protein in one of the compartments it normally goes through
to reach its final destination, to the production of aggregates,
and, when overexpression results in protein levels which
exceeds the capability of the protein folding machinery, to the
recruitment of the unfolded protein in the aggresome (Johnston
et al., 1988).
Despite these potential limitations, that can be reasonably
controlled by choosing the right vector, insert size and
expression conditions, MT provides a rapid approach to
identify localised proteins. There are several potential
applications of MT that we can envisage will be developed in
the future. MT may offer a powerful tool in the field of
functional genomics as a high throughput method to determine
the subcellular localisation of the fast growing number of
sequences which are being generated by ongoing genome
projects. Another application of MT is the identification of
proteins that are differentially sorted in differentiating cells like
epithelial cells that are induced to polarise or primary cultures
of differentiating neurons (Dotti and Simons, 1990). Similarly,
MT could be used to detect proteins which are specifically
relocated under different stress conditions, following heatshock for instance, or after infection with pathogens which
result in a dramatic rearrangement of the architecture of the
cell (Cudmore et al., 1997). Another interesting application of
MT will be the cloning of interacting partners of a given
protein by transfecting cells which contain the protein labelled
with a fluorochrome that produces FRET (Cubitt et al., 1995)
with the library’s reporter. Finally, systematic screenings with
a MT library could be used to identify new domains within
known organelles and compartments. An additional spin off of
MT is the establishment of a large collection of cell lines that
carry GFP at different locations.
While this paper was being reviewed, a related manuscript
was published describing a visual screen of a GFP-fusion
library (Rolls et al., 1999). Unlike the method described here,
this screening was carried out by examining the patterns of GFP
localisation in cells transfected with pools of clones generated
by subdividing the library. Further subdivision into smaller
pools and rescreening was used to isolate the clones responsible
for a particular pattern of interest. Another difference with the
MT method is that in the library used by Roll et al. (1999) the
cDNAs were inserted after the 3′ end of the GFP coding
sequence. Technical diferences aside, the visual screen
described by these authors and the MT approach are
conceptually identical and represent the first two of what may
soon become a large number of variations on the same theme.
We are indebted to M. Way, C. Dotti, S. Cohen, E. Karsenti, T.
Nilsson, T. Hyman, R. Pepperkok and G. Griffiths for their
encouragement and support as well for their critical reading of this
manuscript. Some of the initial experiments that lead to the
development of MT were carried out by L.A.B. in the laboratory of
J. Jimenez. G. Alonso, suggested the use of the HEK293 cells. L.A.B.
is recipient of a postdoctoral fellowship from the Spanish ‘Ministerio
de Educacion’. Work in our laboratory is partially financed by EU
grant CT960005.
REFERENCES
Cannon, K. S. and Helenius, A. (1999). Trimming and readdition of glucose
to N-linked oligosaccharides determines calnexin association of a substrate
glycoprotein in living cells. J. Biol. Chem. 274, 7537-7544.
Chen, C. and Okayama, H. (1987). High-efficiency transformation of
mammalian cells by plasmid DNA. Mol. Cell Biol. 7, 2745-2752.
Cubitt, A. B., Heim, R., Adams, S. R., Boyd, A. E., Gross, L. A. and Tsien,
R. Y. (1995). Understanding, improving and using green fluorescent
proteins. Trends Biochem. Sci. 20, 448-455.
Cudmore, S. Reckmann, I. and Way, M. (1997). Viral manipulations of the
actin cytoskeleton. Trend Microbiol. 5, 142-147.
Dingwall, C. and Laskey, R. A. (1991). Nuclear targeting sequences – a
consensus? Trends Biochem. Sci. 16, 478-481.
Dotti, C. G. and Simmons, K. (1990). Polarized sorting of viral glycoproteins
to the axon and dendrites of hippocampal neurons in culture. Cell 62, 6372.
Hancock, J. F., Paterson, H. and Marshall, C. J. (1990). A polybasic domain
or palmitoylation is required in addition to the CAAX motif to localize
p21ras to the plasma membrane. Cell 63, 133-139.
Harlow, E. and Lane, D. (1988). Antibodies: a Laboratory Manual. Cold
Spring Harbour Laboratory Press, NY.
Johnston, J. A., Ward, C. L. and Kopito, R. R. (1998). Aggresomes: a
cellular response to misfolded proteins. J. Cell Biol. 143, 1883-1898.
Kislauskis, E. H. and Singer, R. H. (1992). Determinants of mRNA
localization. Curr. Opin. Cell Biol. 4, 975-978.
Nacken, W., Klempt, M. and Sorg, C. (1995). The mouse homologue of the
HTLV-I tax responsive element binding protein TAXREB107 is a highly
conserved gene which may regulate some basal cellular functions. Biochim.
Biophys. Acta 1261, 432-434.
Roise, D. and Schatz, G. (1988). Mitochondrial presequences. J. Biol. Chem.
63, 4509-4511.
Rolls, M. M., Stein, P. A., Taylor, S. S., Ha, E., McKeon, F. and Rapoport,
T. A. (1999). A visual screen of a GFP-fusion library identifies a new type
of nuclear envelope membrane protein. J. Cell Biol. 146, 29-43.
Sawin, K. E. and Nurse, P. (1996). Identification of fission yeast nuclear
markers using random polypeptide fusions with green fluorescent protein.
Proc. Nat. Acad. Sci. USA 94, 15146-15151.
von Heijne, G. (1992). Membrane protein structure prediction, hydrophobicity
analysis and the positive-inside rule. J. Mol. Biol. 225, 487-494.