Selection of soluble protein expression constructs

908
Biochemical Society Transactions (2010) Volume 38, part 4
Selection of soluble protein expression constructs:
the experimental determination of protein
domain boundaries
Michael R. Dyson1
Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QW, U.K.
Abstract
Proteins can contain multiple domains each of which is capable of possessing a separate independent
function and three-dimensional structure. It is often useful to clone and express individual protein domains
to study their biochemical properties and for structure determination. However, the annotated domain
boundaries in databases such as Pfam or SMART are not always accurate. The present review summarizes
various strategies for the experimental determination of protein domain boundaries.
Introduction
The expression and purification of recombinant proteins
is a crucial first step in many biochemical studies to
enhance our knowledge of gene function. Once a protein
has been successfully expressed and purified, this opens
the door to several research avenues, including interaction
studies with other proteins or nucleic acids, testing for
enzyme activity, the generation of antibodies and ultimately
structure determination. However, successful soluble protein
expression is sometimes difficult, depending on the target.
For structure determination, the requirement for milligram
quantities of protein often means that the crucial bottleneck
is in identifying an expression construct that can express high
levels of soluble protein. It is not unusual for a researcher to
spend 2 years in this search with no guarantee of success at
the end. For other applications such as interaction studies or
antibody generation, although the amount of purified protein
needed is less, there is still a requirement for soluble and
correctly folded protein. Fortunately, our knowledge in the
field of protein expression in heterologous hosts has rapidly
accumulated in the last decade, so that protein expression is
no longer quite the process of trial and error that it used
to be. This has been helped by several studies and new
methods that have come from laboratories engaged in highthroughput protein expression for structure determination
[1] or antibody generation [2]. For example, there have been
several studies linking various protein properties and features
that correlate with the ability of a protein to be successfully
expressed in Escherichia coli [3–5]. These features are not
all listed in the present review, but the main conclusions
were that the simpler the protein was in terms of molecular
mass, having a low number of contiguous hydrophobic
Key words: fusion protein, genetic screening, library, protein domain boundary, protein
expression, sequence alignment.
Abbreviations used: CAT, chloramphenicol acetyltransferase; (m)DHFR, (murine) dihydrofolate
reductase; EGFR, epidermal growth factor receptor; ESPRIT, expression of soluble proteins by
random incremental truncation; GFP, green fluorescent protein; HEK, human embryonic kidney;
PECAM1, platelet endothelial cell adhesion molecule 1; Tat, twin-arginine translocation.
1
email [email protected]
C The
C 2010 Biochemical Society
Authors Journal compilation amino acids and low complexity regions [6], the more
likely soluble expression in E. coli would be. This finding
indicated that, for a large complex protein containing many
protein domains, it is better to truncate and express either
a single domain or a small set of tandem domains. Protein
domains can form a compact three-dimensional structure, are
20–500 amino acids in length and may be defined as part of a
protein sequence able to evolve, function and exist independently of the rest of the protein chain. The cloning and expression of single protein domains requires the precise mapping of
the domains and domain boundaries on the linear polypeptide
sequence. The aim of the present review is to present various
strategies to identify a region of a protein comprising one or
more protein domains capable of soluble expression.
Alternative strategies for full-length
protein expression
Although truncating proteins to achieve the desired
expression construct is often an excellent route to success,
one also should be aware of alternative strategies for the
soluble expression of a full-length protein. A popular choice
is fusion with a solubility enhancing tag such as MBP
(maltose-binding protein) [7], but one should always cleave
the tag following expression to check that the target protein
remains soluble. Expression in E. coli can result in inclusion
body formation, and it is possible to find conditions to refold
misfolded proteins, but the conditions are usually specific to
that particular protein [8]. Almost 90% of proteins that have
been expressed, crystallized and their structures deposited in
the Protein Data Bank (PDB) [9] were expressed in E. coli.
However, various eukaryotic expression systems [10] are now
being used more frequently for the expression of eukaryotic
proteins, including the yeasts Saccharomyces cerevisiae [11]
and Pichia pastoris [12], the wheat-germ cell-free system
[13] and insect [14] or mammalian cells [15,16]. These have
the advantage that they have compatible chaperones and
binding partners that may aid protein folding. Many proteins
Biochem. Soc. Trans. (2010) 38, 908–913; doi:10.1042/BST0380908
Experimental Approaches to Protein–Protein Interactions
are naturally part of multi-component protein assemblies
in vivo, and various strategies have been developed to
co-express binding partners to stabilize proteins that are
inherently unstable when expressed alone [17–19].
Figure 1 Domain map of human EGFR (1210 amino acids)
Marked protein features and Pfam domains include: A, signal peptide
(amino acids 1–24); B, receptor L domain (amino acids 57–168 and
361–481); C, furin-like cysteine-rich region (amino acids 184–338);
D, transmembrane region (amino acids 646–668); E, low-complexity
regions (amino acids 651–666, 1002–1015 and 1025–1046); and F,
protein tyrosine kinase domain (amino acids 712–968).
Experimental determination of protein
domains by semi-empirical methods
Protein domains are usually identified in databases such
as Pfam [20] and SMART [21] by homology through
sequence alignment. This works very well to identify core
consensus motifs, but is less accurate at predicting the domain
edges where there is greater sequence variability [4,22].
This is illustrated by comparing domain edges predicted
by Pfam for proteins and protein domains where their
structures have been determined. For example, human EGFR
(epidermal growth factor receptor) is a type I integral
membrane protein which is involved in cell growth and
differentiation, and its abnormal activation is associated
with human cancer. Two regions of this protein have been
expressed, crystallized and their structures determined: a
truncated ectodomain comprising amino acids 1–501, which
was expressed in CHO (Chinese-hamster ovary) cells [23],
and the intracellular kinase domain (amino acids 672–998),
purified from baculovirus-infected insect cells [24]. The sizes
of the successful protein expression constructs are larger
than those predicted by Pfam (Figure 1). For example the
ectodomain consists of the mapped Pfam domains receptor
L domain (amino acids 57–168 and 361–481) and a furinlike cysteine-rich region (amino acids 177–338), whereas the
expression construct that gave crystals of sufficient quality
for structure determination was extended at the C-terminus
by 20 amino acids compared with the last mapped Pfam
domain. Similarly, the expressed kinase domain was 40 and
30 amino acids longer at the N- and C-terminus respectively.
This and other comparisons suggest that there may be small
‘windows’ in polypeptide space surrounding mapped Pfam
domains that define a successful expression construct. So it
might be possible to screen a small number of variants where
the co-ordinates are spaced at five amino acid intervals. This
is illustrated in Figure 2 for expression of the first Ig domain
of CD31 or PECAM1 (platelet endothelial cell adhesion
molecule 1) using data from our own laboratory (S. Chapple,
A. Crofts and M.R. Dyson, unpublished work). Here various
constructs were designed by ‘primer walking’ at five amino
acid intervals from the end of the first annotated Ig domain
(amino acids 40–101) and expression screening in HEK
(human embryonic kidney)-293E cells [25,26]. Expression
was achieved with a construct spanning amino acids 1–116
and 1–121, but not constructs smaller than this. Interestingly,
attempted expression of the first Ig domain of the c-kit oncogene failed using this strategy (Figure 2), but expression of
the first two domains in tandem was successful, showing that
protein domains can be stabilized by a neighbouring domain.
The construct that expressed was 12 amino acids longer than
the second Pfam Ig domain. Recently, a webtool has been
described to aid in the process of annotating features on to a
polypeptide sequence and designing primers to PCR-amplify
several truncation variants to a region of interest [27]. It is
also worthwhile to BLAST search the PDB to check whether
a homologous protein has been crystallized, and this may
point to where the likely structural domain boundaries are
located.
Library-based approaches
Although the semi-empirical methods described above have
proved to be successful, only a limited number of variants can
be tested because of the costs and time involved in designing
individual constructs. This can be a problem when it has been
reported that single amino acid additions can have large effects
on the levels of soluble protein expression [28]. Furthermore,
designing constructs around an annotated domain is not
possible for proteins that lack any annotated domains, as
was the case for influenza virus polymerase [29]. Here, one
can envisage making a large library of constructs by randomly
fragmenting a gene, cloning into an expression vector and then
screening for soluble protein expression. There are several
ways to randomly fragment DNA before subcloning, including the physical method of shearing and the enzymatic methods of endo- or exo-nuclease treatment, and these have been
reviewed in [30]. The methods can be split into two: either
the creation of a random set of fragments by internal cleavage
or exonuclease cleavage to generate a 5 or 3 nested set of
fragments. Figure 3 illustrates a method involving PCR in the
presence of dUTP and cleavage with endonuclease V [28].
Reporters of soluble protein expression
Once a set of random fragments have been created and
cloned into a suitable expression vector, one needs to screen
this library for individual clones with the desired properties
of minimum size and high-level soluble expression. This
is usually performed by fusion to a peptide or protein
tag. For example, Reich et al. [31] generated a random
fragment library of p85a by performing a PCR where the
standard dNTP mixture was doped with dUTP and then
the product was treated with uracil-DNA glycosylase to
excise the uracil bases, generating abasic sites. These are
cleaved by endonuclease IV, giving a single-strand nick that
is converted into a double-strand break and blunt-ended
C The
C 2010 Biochemical Society
Authors Journal compilation 909
910
Biochemical Society Transactions (2010) Volume 38, part 4
Figure 2 Experimental domain mapping of receptor ectodomains
Truncated receptor ectodomains were expressed as C-terminal rat Cd4 (domains 3 and 4) fusions in HEK-293E cells as
described previously [25]. At 5 days after transfection, cells were harvested and supernatants were analysed for secreted
protein expression by separation by SDS/PAGE, Western blotting and detection with anti-His mouse monoclonal antibody
(Novagen) followed by goat anti-mouse Cy5 (indodicarbocyanine)-labelled antibody (GE Healthcare). aa, amino acids; TM,
transmembrane domain.
by S1 nuclease. The advantage of this method is that the
size distribution is dependent solely on the proportion of
dUTP included in the original PCR mixture. Transformants
are screened for soluble expression by the principle of ‘tag
availability’, where it is reasoned that soluble, correctly
folded, monomeric protein will have a freely accessible Cterminal tag, whereas misfolded aggregated protein will not.
Colonies on membranes that react with an anti-tag antibody
are used to inoculate liquid cultures in multiwell plates, then
expression is induced, followed by cell lysis. Soluble protein
is separated from inclusion bodies by passing the lysate
through a filter membrane [32] before affinity purification
with final detection by a dot-blot Western procedure. All
four known globular domains of p85a were identified using
C The
C 2010 Biochemical Society
Authors Journal compilation this procedure. A variation of the ‘tag availability’ method has
also been described by Tarendeau et al. [29], termed ESPRIT
(expression of soluble proteins by random incremental truncation), where nested sets of deletion fragments are generated
from either the 5 or 3 ends of a gene that is already cloned into
an expression vector. Screening is enabled by fusion at the Cterminus to a peptide mimic of BCCP (biotin carboxyl carrier
protein) that can be biotinylated in vivo as long as the peptide
tag remains accessible. High-density arrays are printed and
probed with both streptavidin and anti-histidine antibody to
identify expression hits that are then validated by small-scale
expressions. ESPRIT has produced some notable successes,
including identifying expression constructs used to solve the
structures of influenza polymerase PB2 subunit [29,33,34],
Experimental Approaches to Protein–Protein Interactions
Figure 3 Generation of a library of Fli1 fragments
(A) Full-length Fli1 gene was PCR-amplified in the presence of different dTTP/dUTP ratios, digested with endonuclease V and
analysed by agarose gel (1.6%) electrophoresis. Lane M, DNA standard (HyperLadder 1; BIOLINE marker); lane 1, 0.2 mM
dTTP/0 mM dUTP; lane 2, 0.15 mM dTTP/0.05 mM dUTP; lane 3, 0.1 mM dTTP/0.1 mM dUTP; lane 4, 0.05 mM dTTP/0.15 mM
dUTP; lane 5, 0 mM dTTP/0.2 mM dUTP. ORF, open reading frame. (B) Histogram showing the size distribution frequency
(N) of 240 randomly picked Fli1 library entry plasmids.
the Helicobacter pylori CagA [35], and the neurofibromatosis
type 1 protein NF1 (neurofibromin 1) [36].
C-terminal protein fusions have also been used as reporters
of soluble expression where the misfolding of the upstream
target perturbs the folding and function of the C-terminal
fusion. GFP (green fluorescent protein) acts as a very efficient
reporter [37], enabling the picking of green colonies to
enrich populations of soluble expression clones [38,39]. This
was used to identify soluble domains of telomerase reverse
transcriptase [40]. Protein or peptide reporter fusions can
influence the folding of its upstream partner and so a split
GFP system was developed by Cabantous et al. [41,42], where
a 15-amino-acid GFP fragment (β-strand 11) is fused to a
test protein and the GFP β-strand 1–10 detector fragment
is expressed separately. Green colonies only arise when the
peptide tag is accessible and can conjugate to the GFP. In
a further development of the method, a split GFP system
(GFP insertion) has been employed to avoid the selection
of false-positive artefacts from internal cryptic ribosomebinding sites in the target RNA [38]. In addition four GFP
reporters have been selected that sense different degrees of
protein misfolding. This enables the selection of variants
with improved folding using a reporter with the appropriate
stringency based on the brightness of the GFP insertion
fusions [43]. Wigley et al. [44] has used complementation
between the α- and ω-fragments of β-galactosidase as the
basis of solubility assay validated with Alzheimer’s disease Aβ
(amyloid β-peptide) and a non-amyloidogenic mutant [44].
An alternative reporter system was developed by Lesley
et al. [45] which involved inserting reporters downstream of
the promoters of proteins normally up-regulated during the
expression of misfolded proteins such as chaperones. Christ
et al. [46] devised a method to identify protein domains
termed ‘shotgun proteolysis’. Here a random cDNA library is
fragmented, cloned into a filamentous bacteriophage display
vector between an N-terminal tag and the phage pIII gene,
phages are subjected to limited proteolysis and only proteaseresistant phages are able to be captured via an affinity tag,
eluted and amplified. The method requires that the target
domain is able to be secreted to the E. coli periplasm for phage
display and may miss constructs that contain a folded domain,
but are flanked by flexible linker polypeptide sequences.
Genetic screening for soluble protein
expression
Potentially greater library sizes can be screened by
using ‘life/death’ genetic reporters. CAT (chloramphenicol
acetyltransferase) [29] has been used as a C-terminal fusion
to monitor protein solubility by chloramphenicol resistance
[47]. However, CAT is trimeric, which may result in the
formation of higher-order aggregates for multimeric proteins.
C The
C 2010 Biochemical Society
Authors Journal compilation 911
912
Biochemical Society Transactions (2010) Volume 38, part 4
Alternatively, mDHFR (murine dihydrofolate reductase) has
the advantage that it is monomeric and has been shown not to
greatly perturb the folding of upstream protein fusion partners [4]. Selection is enabled by plating expression libraries on
minimal agar plates in the presence of TMP (trimethoprim),
which effectively inhibits E. coli DHFR (dihydrofolate
reductase) but not mDHFR. DHFR selection enabled the
identification of the ETS domain of the transcription factor
Fli1 and soluble fragments of the cell-surface receptor
PECAM1, including the natively unfolded cytoplasmic
domain, which was used to select for antibodies using phage
display [28]. This work also highlighted that all hits from random screens need to be characterized further by biophysical
methods such as NMR [48,49]. Lim et al. [50] has described
a promising selection system based on the Tat (twin-arginine
translocation) pathway, which requires the protein to be
folded before periplasmic secretion. When a set of mammalian
polypeptides was fused between an N-terminal Tat export
signal and a C-terminal selectable marker (β-lactamase), there
was a very good correlation between their solubility profile
and the ability to confer growth in the presence on ampicillin.
Concluding remarks
There is a large difference between the experiments possible
in a laboratory geared up for high-throughput expression
with specialized robotics, as found in structural-genomics
laboratories or industry, compared with more standard
research laboratories. Unless a researcher can collaborate
with a group for access to robotics, he or she needs to
decide on the easiest route to the desired outcome in an
expression project. In many cases, if the target protein
domain is annotated, simply designing a set of six to twelve
constructs will usually yield an expression hit. However, for
high-value intractable targets, the powerful combinatorial
biology techniques reviewed in the present paper can rescue
a failing project. Some of these methods do not require access
to specialized equipment. Not only can these techniques
yield the desired expression construct, but also they provide
valuable, experimentally derived, information regarding the
true domain boundaries of proteins.
Acknowledgements
I thank Rajika L. Perera and Maikel Fransen (Department of
Biochemistry, University of Cambridge) for critical reading of this
manuscript.
Funding
This work was supported by the Wellcome Trust.
References
1 Terwilliger, T.C., Stuart, D. and Yokoyama, S. (2009) Lessons from
structural genomics. Annu. Rev. Biophys. 38, 371–383
C The
C 2010 Biochemical Society
Authors Journal compilation 2 Schofield, D., Pope, A., Clementel, V., Buckell, J., Chapple, S., Clarke, K.,
Conquer, J., Crofts, A., Crowther, S., Dyson, M. et al. (2007) Application of
phage display to high throughput antibody generation and
characterization. Genome Biol. 8, R254
3 Braun, P., Hu, Y., Shen, B., Halleck, A., Koundinya, M., Harlow, E. and
LaBaer, J. (2002) Proteome-scale purification of human proteins from
bacteria. Proc. Natl. Acad. Sci. U.S.A. 99, 2654–2659
4 Dyson, M.R., Shadbolt, S.P., Vincent, K., Perera, R. and McCafferty, J.
(2004) Production of soluble mammalian proteins in Escherichia coli:
identification of protein features that correlate with successful
expression. BMC Biotechnol. 4, 32
5 Goh, C.-S., Lan, N., Douglas, S.M., Wu, B., Echols, N., Smith, A., Milburn,
D., Montelione, G.T., Zhao, H. and Gerstein, M. (2004) Mining the
structural genomics pipeline: identification of protein properties that
affect high-throughput experimental analysis. J. Mol. Biol. 336, 115–130
6 Esnouf, R.M., Hamer, R., Sussman, J.L., Silman, I., Trudgian, D., Yang, Z.R.
and Prilusky, J. (2006) Honing the in silico toolkit for detecting protein
disorder. Acta Crystallogr. Sect. D Biol. Crystallogr. 62, 1260–1266
7 Waugh, D.S. (2005) Making the most of affinity tags. Trends Biotechnol.
23, 316–320
8 Chow, M.K., Amin, A.A., Fulton, K.F., Fernando, T., Kamau, L., Batty, C.,
Louca, M., Ho, S., Whisstock, J.C., Bottomley, S.P. and Buckle, A.M. (2006)
The REFOLD database: a tool for the optimization of protein expression
and refolding. Nucleic Acids Res. 34, D207–D212
9 Berman, H. (2008) The Protein Data Bank: a historical perspective. Acta
Crystallogr. Sect. A Found. Crystallogr. 64, 88–95
10 Aricescu, A.R., Assenberg, R., Bill, R.M., Busso, D., Chang, V.T., Davis, S.J.,
Dubrovsky, A., Gustafsson, L., Hedfalk, K., Heinemann, U. et al. (2006)
Eukaryotic expression: developments for structural proteomics. Acta
Crystallogr. Sect. D Biol. Crystallogr. 62, 1114–1124
11 Lang, C. (2007) Saccharomyces cerevisiae: a microbial eukaryotic
expression system. in Methods Express: Expression Systems (Dyson, M.R.
and Durocher, Y., eds), pp. 109–121, Scion Publishing Ltd, Bloxham
12 Lin-Cereghino, G.P., Leung, W. and Lin-Cereghino, J. (2007) Expression of
proteins in Pichia pastoris. in Methods Express: Expression Systems
(Dyson, M.R. and Durocher, Y., eds), pp. 123–144, Scion Publishing Ltd,
Bloxham
13 Goshima, N., Kawamura, Y., Fukumoto, A., Miura, A., Honma, R., Satoh,
R., Wakamatsu, A., Yamamoto, J., Kimura, K., Nishikawa, T. et al. (2008)
Human protein factory for converting the transcriptome into an in
vitro-expressed proteome. Nat. Methods 5, 1011–1017
14 Hunt, I. (2005) From gene to protein: a review of new and enabling
technologies for multi-parallel protein expression. Protein Expression
Purif. 40, 1–22
15 Aricescu, A.R., Lu, W. and Jones, E.Y. (2006) A time- and cost-efficient
system for high-level protein production in mammalian cells. Acta
Crystallogr. Sect. D Biol. Crystallogr. 62, 1243–1250
16 Nettleship, J.E., Assenberg, R., Diprose, J.M., Rahman-Huq, N. and Owens,
R.J. (2010) Recent advances in the production of proteins in insect and
mammalian cells for structural biology. J. Struct. Biol., doi:10.1016/
j.jsb.2010.02.006
17 Romier, C., Ben Jelloul, M., Albeck, S., Buchwald, G., Busso, D., Celie,
P.H.N., Christodoulou, E., De Marco, V., van Gerwen, S., Knipscheer, P.
et al. (2006) Co-expression of protein complexes in prokaryotic and
eukaryotic hosts: experimental procedures, database tracking and case
studies. Acta Crystallogr. Sect. D Biol. Crystallogr. 62, 1232–1242
18 Szymczak, A.L., Workman, C.J., Wang, Y., Vignali, K.M., Dilioglou, S.,
Vanin, E.F. and Vignali, D.A.A. (2004) Correction of multi-gene deficiency
in vivo using a single ‘self-cleaving’ 2A peptide-based retroviral vector.
Nat. Biotechnol. 22, 589–594
19 Trowitzsch, S., Bieniossek, C., Nie, Y., Garzoni, F. and Berger, I. (2010)
New baculovirus expression tools for recombinant protein complex
production. J. Struct. Biol., doi:10.1016/j.jsb.2010.02.010
20 Finn, R.D., Mistry, J., Tate, J., Coggill, P., Heger, A., Pollington, J.E., Gavin,
O.L., Gunasekaran, P., Ceric, G., Forslund, K. et al. (2009) The Pfam
protein families database. Nucleic Acids Res. 38, D211–D222
21 Letunic, I., Doerks, T. and Bork, P. (2009) SMART 6: recent updates and
new developments. Nucleic Acids Res. 37, D229–D232
22 Dyson, M.R. (2007) Expression strategy. in Methods Express: Expression
Systems (Dyson, M.R. and Durocher, Y., eds), pp. 1–10, Scion Publishing
Ltd, Bloxham
23 Garrett, T.P.J., McKern, N.M., Lou, M., Elleman, T.C., Adams, T.E., Lovrecz,
G.O., Zhu, H.-J., Walker, F., Frenkel, M.J., Hoyne, P.A. et al. (2002) Crystal
structure of a truncated epidermal growth factor receptor extracellular
domain bound to transforming growth factor α. Cell 110, 763–773
Experimental Approaches to Protein–Protein Interactions
24 Stamos, J., Sliwkowski, M.X. and Eigenbrot, C. (2002) Structure of the
epidermal growth factor receptor kinase domain alone and in complex
with a 4-anilinoquinazoline inhibitor. J. Biol. Chem. 277, 46265–46272
25 Chapple, S., Crofts, A., Shadbolt, S.P., McCafferty, J. and Dyson, M.R.
(2006) Multiplexed expression and screening for recombinant protein
production in mammalian cells. BMC Biotechnol. 6, 49
26 Durocher, Y., Perret, S. and Kamen, A. (2002) High-level and
high-throughput recombinant protein production by transient
transfection of suspension-growing human 293-EBNA1 cells. Nucleic
Acids Res. 30, E9
27 Mooij, W.T., Mitsiki, E. and Perrakis, A. (2009) ProteinCCD: enabling the
design of protein truncation constructs for expression and crystallization
experiments. Nucleic Acids Res. 37, W402–W405
28 Dyson, M.R., Perera, R.L., Shadbolt, S.P., Biderman, L., Bromek, K.,
Murzina, N.V. and McCafferty, J. (2008) Identification of soluble protein
fragments by gene fragmentation and genetic selection. Nucleic Acids
Res. 36, E51
29 Tarendeau, F., Boudet, J., Guilligay, D., Mas, P.J., Bougault, C.M., Boulo, S.,
Baudin, F., Ruigrok, R.W.H., Daigle, N., Ellenberg, J. et al. (2007) Structure
and nuclear import function of the C-terminal domain of influenza virus
polymerase PB2 subunit. Nat. Struct. Mol. Biol. 14, 229–233
30 Prodromou, C., Savva, R. and Driscoll, P.C. (2007) DNA fragmentationbased combinatorial approaches to soluble protein expression. Part I.
Generating DNA fragment libraries. Drug Discov. Today 12,
931–938
31 Reich, S., Puckey, L.H., Cheetham, C.L., Harris, R., Ali, A.A.E.,
Bhattacharyya, U., Maclagan, K., Powell, K.A., Prodromou, C., Pearl, L.H.
et al. (2006) Combinatorial domain hunting: an effective approach for
the identification of soluble protein domains adaptable to
high-throughput applications. Protein Sci. 15, 2356–2365
32 Knaust, R.K.C. and Nordlund, P. (2001) Screening for soluble expression
of recombinant proteins in a 96-well format. Anal. Biochem. 297, 79–85
33 Guilligay, D., Tarendeau, F., Resa-Infante, P., Coloma, R., Crepin, T., Sehr,
P., Lewis, J., Ruigrok, R.W.H., Ortin, J., Hart, D.J. and Cusack, S. (2008) The
structural basis for cap binding by influenza virus polymerase subunit
PB2. Nat. Struct. Mol. Biol. 15, 500–506
34 Tarendeau, F., Crepin, T., Guilligay, D., Ruigrok, R.W. H., Cusack, S. and
Hart, D.J. (2008) Host determinant residue lysine 627 lies on the surface
of a discrete, folded domain of influenza virus polymerase PB2 subunit.
PLoS Pathog. 4, e1000136
35 Angelini, A., Tosi, T., Mas, P., Acajjaoui, S., Zanotti, G., Terradot, L. and
Hart, D.J. (2009) Expression of Helicobacter pylori CagA domains by
library-based construct screening. FEBS J. 276, 816–824
36 Bonneau, F., Lenherr, E.D., Pena, V., Hart, D.J. and Scheffzek, K. (2009)
Solubility survey of fragments of the neurofibromatosis type 1 protein
neurofibromin. Protein Expression Purif. 65, 30–37
37 Waldo, G.S., Standish, B.M., Berendzen, J. and Terwilliger, T.C. (1999)
Rapid protein-folding assay using green fluorescent protein. Nat.
Biotechnol. 17, 691–695
38 Kawasaki, M. and Inagaki, F. (2001) Random PCR-based screening for
soluble domains using green fluorescent protein. Biochem. Biophys. Res.
Commun. 280, 842–844
39 Nakayama, M. and Ohara, O. (2003) A system using convertible vectors
for screening soluble recombinant proteins produced in Escherichia coli
from randomly fragmented cDNAs. Biochem. Biophys. Res. Commun.
312, 825–830
40 Jacobs, S.A., Podell, E.R., Wuttke, D.S. and Cech, T.R. (2005) Soluble
domains of telomerase reverse transcriptase identified by
high-throughput screening. Protein Sci. 14, 2051–2058
41 Cabantous, S., Terwilliger, T.C. and Waldo, G.S. (2005) Protein tagging
and detection with engineered self-assembling fragments of green
fluorescent protein. Nat. Biotechnol. 23, 102–107
42 Cabantous, S. and Waldo, G.S. (2006) In vivo and in vitro protein
solubility assays using split GFP. Nat Methods 3, 845–854
43 Cabantous, S.p., Rogers, Y., Terwilliger, T.C. and Waldo, G.S. (2008) New
molecular reporters for rapid protein folding assays. PLoS ONE 3, e2387
44 Wigley, W.C., Stidham, R.D., Smith, N.M., Hunt, J.F. and Thomas, P.J.
(2001) Protein solubility and folding monitored in vivo by structural
complementation of a genetic marker protein. Nat. Biotechnol. 19,
131–136
45 Lesley, S.A., Graziano, J., Cho, C.Y., Knuth, M.W. and Klock, H.E. (2002)
Gene expression response to misfolded protein as a screen for soluble
recombinant protein. Protein Eng. 15, 153–160
46 Christ, D. and Winter, G. (2006) Identification of protein domains by
shotgun proteolysis. J. Mol. Biol. 358, 364–371
47 Maxwell, K.L., Mittermaier, A.K., Forman-Kay, J.D. and Davidson, A.R.
(1999) A simple in vivo assay for increased protein solubility. Protein
Sci. 8, 1908–1911
48 Scheich, C., Leitner, D., Sievert, V., Leidert, M., Schlegel, B., Simon, B.,
Letunic, I., Bussow, K. and Diehl, A. (2004) Fast identification of folded
human protein domains expressed in E. coli suitable for structural
analysis. BMC Struct. Biol. 4, 4
49 Woestenenk, E.A., Hammarstrom, M., Hard, T. and Berglund, H. (2003)
Screening methods to determine biophysical properties of proteins in
structural genomics. Anal. Biochem. 318, 71–79
50 Lim, H.K., Mansell, T.J., Linderman, S.W., Fisher, A.C., Dyson, M.R. and
DeLisa, M.P. (2009) Mining mammalian genomes for folding competent
proteins using Tat-dependent genetic selection in Escherichia coli.
Protein Sci. 18, 2537–2549
Received 5 May 2010
doi:10.1042/BST0380908
C The
C 2010 Biochemical Society
Authors Journal compilation 913