908 Biochemical Society Transactions (2010) Volume 38, part 4 Selection of soluble protein expression constructs: the experimental determination of protein domain boundaries Michael R. Dyson1 Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QW, U.K. Abstract Proteins can contain multiple domains each of which is capable of possessing a separate independent function and three-dimensional structure. It is often useful to clone and express individual protein domains to study their biochemical properties and for structure determination. However, the annotated domain boundaries in databases such as Pfam or SMART are not always accurate. The present review summarizes various strategies for the experimental determination of protein domain boundaries. Introduction The expression and purification of recombinant proteins is a crucial first step in many biochemical studies to enhance our knowledge of gene function. Once a protein has been successfully expressed and purified, this opens the door to several research avenues, including interaction studies with other proteins or nucleic acids, testing for enzyme activity, the generation of antibodies and ultimately structure determination. However, successful soluble protein expression is sometimes difficult, depending on the target. For structure determination, the requirement for milligram quantities of protein often means that the crucial bottleneck is in identifying an expression construct that can express high levels of soluble protein. It is not unusual for a researcher to spend 2 years in this search with no guarantee of success at the end. For other applications such as interaction studies or antibody generation, although the amount of purified protein needed is less, there is still a requirement for soluble and correctly folded protein. Fortunately, our knowledge in the field of protein expression in heterologous hosts has rapidly accumulated in the last decade, so that protein expression is no longer quite the process of trial and error that it used to be. This has been helped by several studies and new methods that have come from laboratories engaged in highthroughput protein expression for structure determination [1] or antibody generation [2]. For example, there have been several studies linking various protein properties and features that correlate with the ability of a protein to be successfully expressed in Escherichia coli [3–5]. These features are not all listed in the present review, but the main conclusions were that the simpler the protein was in terms of molecular mass, having a low number of contiguous hydrophobic Key words: fusion protein, genetic screening, library, protein domain boundary, protein expression, sequence alignment. Abbreviations used: CAT, chloramphenicol acetyltransferase; (m)DHFR, (murine) dihydrofolate reductase; EGFR, epidermal growth factor receptor; ESPRIT, expression of soluble proteins by random incremental truncation; GFP, green fluorescent protein; HEK, human embryonic kidney; PECAM1, platelet endothelial cell adhesion molecule 1; Tat, twin-arginine translocation. 1 email [email protected] C The C 2010 Biochemical Society Authors Journal compilation amino acids and low complexity regions [6], the more likely soluble expression in E. coli would be. This finding indicated that, for a large complex protein containing many protein domains, it is better to truncate and express either a single domain or a small set of tandem domains. Protein domains can form a compact three-dimensional structure, are 20–500 amino acids in length and may be defined as part of a protein sequence able to evolve, function and exist independently of the rest of the protein chain. The cloning and expression of single protein domains requires the precise mapping of the domains and domain boundaries on the linear polypeptide sequence. The aim of the present review is to present various strategies to identify a region of a protein comprising one or more protein domains capable of soluble expression. Alternative strategies for full-length protein expression Although truncating proteins to achieve the desired expression construct is often an excellent route to success, one also should be aware of alternative strategies for the soluble expression of a full-length protein. A popular choice is fusion with a solubility enhancing tag such as MBP (maltose-binding protein) [7], but one should always cleave the tag following expression to check that the target protein remains soluble. Expression in E. coli can result in inclusion body formation, and it is possible to find conditions to refold misfolded proteins, but the conditions are usually specific to that particular protein [8]. Almost 90% of proteins that have been expressed, crystallized and their structures deposited in the Protein Data Bank (PDB) [9] were expressed in E. coli. However, various eukaryotic expression systems [10] are now being used more frequently for the expression of eukaryotic proteins, including the yeasts Saccharomyces cerevisiae [11] and Pichia pastoris [12], the wheat-germ cell-free system [13] and insect [14] or mammalian cells [15,16]. These have the advantage that they have compatible chaperones and binding partners that may aid protein folding. Many proteins Biochem. Soc. Trans. (2010) 38, 908–913; doi:10.1042/BST0380908 Experimental Approaches to Protein–Protein Interactions are naturally part of multi-component protein assemblies in vivo, and various strategies have been developed to co-express binding partners to stabilize proteins that are inherently unstable when expressed alone [17–19]. Figure 1 Domain map of human EGFR (1210 amino acids) Marked protein features and Pfam domains include: A, signal peptide (amino acids 1–24); B, receptor L domain (amino acids 57–168 and 361–481); C, furin-like cysteine-rich region (amino acids 184–338); D, transmembrane region (amino acids 646–668); E, low-complexity regions (amino acids 651–666, 1002–1015 and 1025–1046); and F, protein tyrosine kinase domain (amino acids 712–968). Experimental determination of protein domains by semi-empirical methods Protein domains are usually identified in databases such as Pfam [20] and SMART [21] by homology through sequence alignment. This works very well to identify core consensus motifs, but is less accurate at predicting the domain edges where there is greater sequence variability [4,22]. This is illustrated by comparing domain edges predicted by Pfam for proteins and protein domains where their structures have been determined. For example, human EGFR (epidermal growth factor receptor) is a type I integral membrane protein which is involved in cell growth and differentiation, and its abnormal activation is associated with human cancer. Two regions of this protein have been expressed, crystallized and their structures determined: a truncated ectodomain comprising amino acids 1–501, which was expressed in CHO (Chinese-hamster ovary) cells [23], and the intracellular kinase domain (amino acids 672–998), purified from baculovirus-infected insect cells [24]. The sizes of the successful protein expression constructs are larger than those predicted by Pfam (Figure 1). For example the ectodomain consists of the mapped Pfam domains receptor L domain (amino acids 57–168 and 361–481) and a furinlike cysteine-rich region (amino acids 177–338), whereas the expression construct that gave crystals of sufficient quality for structure determination was extended at the C-terminus by 20 amino acids compared with the last mapped Pfam domain. Similarly, the expressed kinase domain was 40 and 30 amino acids longer at the N- and C-terminus respectively. This and other comparisons suggest that there may be small ‘windows’ in polypeptide space surrounding mapped Pfam domains that define a successful expression construct. So it might be possible to screen a small number of variants where the co-ordinates are spaced at five amino acid intervals. This is illustrated in Figure 2 for expression of the first Ig domain of CD31 or PECAM1 (platelet endothelial cell adhesion molecule 1) using data from our own laboratory (S. Chapple, A. Crofts and M.R. Dyson, unpublished work). Here various constructs were designed by ‘primer walking’ at five amino acid intervals from the end of the first annotated Ig domain (amino acids 40–101) and expression screening in HEK (human embryonic kidney)-293E cells [25,26]. Expression was achieved with a construct spanning amino acids 1–116 and 1–121, but not constructs smaller than this. Interestingly, attempted expression of the first Ig domain of the c-kit oncogene failed using this strategy (Figure 2), but expression of the first two domains in tandem was successful, showing that protein domains can be stabilized by a neighbouring domain. The construct that expressed was 12 amino acids longer than the second Pfam Ig domain. Recently, a webtool has been described to aid in the process of annotating features on to a polypeptide sequence and designing primers to PCR-amplify several truncation variants to a region of interest [27]. It is also worthwhile to BLAST search the PDB to check whether a homologous protein has been crystallized, and this may point to where the likely structural domain boundaries are located. Library-based approaches Although the semi-empirical methods described above have proved to be successful, only a limited number of variants can be tested because of the costs and time involved in designing individual constructs. This can be a problem when it has been reported that single amino acid additions can have large effects on the levels of soluble protein expression [28]. Furthermore, designing constructs around an annotated domain is not possible for proteins that lack any annotated domains, as was the case for influenza virus polymerase [29]. Here, one can envisage making a large library of constructs by randomly fragmenting a gene, cloning into an expression vector and then screening for soluble protein expression. There are several ways to randomly fragment DNA before subcloning, including the physical method of shearing and the enzymatic methods of endo- or exo-nuclease treatment, and these have been reviewed in [30]. The methods can be split into two: either the creation of a random set of fragments by internal cleavage or exonuclease cleavage to generate a 5 or 3 nested set of fragments. Figure 3 illustrates a method involving PCR in the presence of dUTP and cleavage with endonuclease V [28]. Reporters of soluble protein expression Once a set of random fragments have been created and cloned into a suitable expression vector, one needs to screen this library for individual clones with the desired properties of minimum size and high-level soluble expression. This is usually performed by fusion to a peptide or protein tag. For example, Reich et al. [31] generated a random fragment library of p85a by performing a PCR where the standard dNTP mixture was doped with dUTP and then the product was treated with uracil-DNA glycosylase to excise the uracil bases, generating abasic sites. These are cleaved by endonuclease IV, giving a single-strand nick that is converted into a double-strand break and blunt-ended C The C 2010 Biochemical Society Authors Journal compilation 909 910 Biochemical Society Transactions (2010) Volume 38, part 4 Figure 2 Experimental domain mapping of receptor ectodomains Truncated receptor ectodomains were expressed as C-terminal rat Cd4 (domains 3 and 4) fusions in HEK-293E cells as described previously [25]. At 5 days after transfection, cells were harvested and supernatants were analysed for secreted protein expression by separation by SDS/PAGE, Western blotting and detection with anti-His mouse monoclonal antibody (Novagen) followed by goat anti-mouse Cy5 (indodicarbocyanine)-labelled antibody (GE Healthcare). aa, amino acids; TM, transmembrane domain. by S1 nuclease. The advantage of this method is that the size distribution is dependent solely on the proportion of dUTP included in the original PCR mixture. Transformants are screened for soluble expression by the principle of ‘tag availability’, where it is reasoned that soluble, correctly folded, monomeric protein will have a freely accessible Cterminal tag, whereas misfolded aggregated protein will not. Colonies on membranes that react with an anti-tag antibody are used to inoculate liquid cultures in multiwell plates, then expression is induced, followed by cell lysis. Soluble protein is separated from inclusion bodies by passing the lysate through a filter membrane [32] before affinity purification with final detection by a dot-blot Western procedure. All four known globular domains of p85a were identified using C The C 2010 Biochemical Society Authors Journal compilation this procedure. A variation of the ‘tag availability’ method has also been described by Tarendeau et al. [29], termed ESPRIT (expression of soluble proteins by random incremental truncation), where nested sets of deletion fragments are generated from either the 5 or 3 ends of a gene that is already cloned into an expression vector. Screening is enabled by fusion at the Cterminus to a peptide mimic of BCCP (biotin carboxyl carrier protein) that can be biotinylated in vivo as long as the peptide tag remains accessible. High-density arrays are printed and probed with both streptavidin and anti-histidine antibody to identify expression hits that are then validated by small-scale expressions. ESPRIT has produced some notable successes, including identifying expression constructs used to solve the structures of influenza polymerase PB2 subunit [29,33,34], Experimental Approaches to Protein–Protein Interactions Figure 3 Generation of a library of Fli1 fragments (A) Full-length Fli1 gene was PCR-amplified in the presence of different dTTP/dUTP ratios, digested with endonuclease V and analysed by agarose gel (1.6%) electrophoresis. Lane M, DNA standard (HyperLadder 1; BIOLINE marker); lane 1, 0.2 mM dTTP/0 mM dUTP; lane 2, 0.15 mM dTTP/0.05 mM dUTP; lane 3, 0.1 mM dTTP/0.1 mM dUTP; lane 4, 0.05 mM dTTP/0.15 mM dUTP; lane 5, 0 mM dTTP/0.2 mM dUTP. ORF, open reading frame. (B) Histogram showing the size distribution frequency (N) of 240 randomly picked Fli1 library entry plasmids. the Helicobacter pylori CagA [35], and the neurofibromatosis type 1 protein NF1 (neurofibromin 1) [36]. C-terminal protein fusions have also been used as reporters of soluble expression where the misfolding of the upstream target perturbs the folding and function of the C-terminal fusion. GFP (green fluorescent protein) acts as a very efficient reporter [37], enabling the picking of green colonies to enrich populations of soluble expression clones [38,39]. This was used to identify soluble domains of telomerase reverse transcriptase [40]. Protein or peptide reporter fusions can influence the folding of its upstream partner and so a split GFP system was developed by Cabantous et al. [41,42], where a 15-amino-acid GFP fragment (β-strand 11) is fused to a test protein and the GFP β-strand 1–10 detector fragment is expressed separately. Green colonies only arise when the peptide tag is accessible and can conjugate to the GFP. In a further development of the method, a split GFP system (GFP insertion) has been employed to avoid the selection of false-positive artefacts from internal cryptic ribosomebinding sites in the target RNA [38]. In addition four GFP reporters have been selected that sense different degrees of protein misfolding. This enables the selection of variants with improved folding using a reporter with the appropriate stringency based on the brightness of the GFP insertion fusions [43]. Wigley et al. [44] has used complementation between the α- and ω-fragments of β-galactosidase as the basis of solubility assay validated with Alzheimer’s disease Aβ (amyloid β-peptide) and a non-amyloidogenic mutant [44]. An alternative reporter system was developed by Lesley et al. [45] which involved inserting reporters downstream of the promoters of proteins normally up-regulated during the expression of misfolded proteins such as chaperones. Christ et al. [46] devised a method to identify protein domains termed ‘shotgun proteolysis’. Here a random cDNA library is fragmented, cloned into a filamentous bacteriophage display vector between an N-terminal tag and the phage pIII gene, phages are subjected to limited proteolysis and only proteaseresistant phages are able to be captured via an affinity tag, eluted and amplified. The method requires that the target domain is able to be secreted to the E. coli periplasm for phage display and may miss constructs that contain a folded domain, but are flanked by flexible linker polypeptide sequences. Genetic screening for soluble protein expression Potentially greater library sizes can be screened by using ‘life/death’ genetic reporters. CAT (chloramphenicol acetyltransferase) [29] has been used as a C-terminal fusion to monitor protein solubility by chloramphenicol resistance [47]. However, CAT is trimeric, which may result in the formation of higher-order aggregates for multimeric proteins. C The C 2010 Biochemical Society Authors Journal compilation 911 912 Biochemical Society Transactions (2010) Volume 38, part 4 Alternatively, mDHFR (murine dihydrofolate reductase) has the advantage that it is monomeric and has been shown not to greatly perturb the folding of upstream protein fusion partners [4]. Selection is enabled by plating expression libraries on minimal agar plates in the presence of TMP (trimethoprim), which effectively inhibits E. coli DHFR (dihydrofolate reductase) but not mDHFR. DHFR selection enabled the identification of the ETS domain of the transcription factor Fli1 and soluble fragments of the cell-surface receptor PECAM1, including the natively unfolded cytoplasmic domain, which was used to select for antibodies using phage display [28]. This work also highlighted that all hits from random screens need to be characterized further by biophysical methods such as NMR [48,49]. Lim et al. [50] has described a promising selection system based on the Tat (twin-arginine translocation) pathway, which requires the protein to be folded before periplasmic secretion. When a set of mammalian polypeptides was fused between an N-terminal Tat export signal and a C-terminal selectable marker (β-lactamase), there was a very good correlation between their solubility profile and the ability to confer growth in the presence on ampicillin. Concluding remarks There is a large difference between the experiments possible in a laboratory geared up for high-throughput expression with specialized robotics, as found in structural-genomics laboratories or industry, compared with more standard research laboratories. Unless a researcher can collaborate with a group for access to robotics, he or she needs to decide on the easiest route to the desired outcome in an expression project. In many cases, if the target protein domain is annotated, simply designing a set of six to twelve constructs will usually yield an expression hit. However, for high-value intractable targets, the powerful combinatorial biology techniques reviewed in the present paper can rescue a failing project. Some of these methods do not require access to specialized equipment. Not only can these techniques yield the desired expression construct, but also they provide valuable, experimentally derived, information regarding the true domain boundaries of proteins. Acknowledgements I thank Rajika L. Perera and Maikel Fransen (Department of Biochemistry, University of Cambridge) for critical reading of this manuscript. Funding This work was supported by the Wellcome Trust. References 1 Terwilliger, T.C., Stuart, D. and Yokoyama, S. (2009) Lessons from structural genomics. Annu. Rev. Biophys. 38, 371–383 C The C 2010 Biochemical Society Authors Journal compilation 2 Schofield, D., Pope, A., Clementel, V., Buckell, J., Chapple, S., Clarke, K., Conquer, J., Crofts, A., Crowther, S., Dyson, M. et al. (2007) Application of phage display to high throughput antibody generation and characterization. Genome Biol. 8, R254 3 Braun, P., Hu, Y., Shen, B., Halleck, A., Koundinya, M., Harlow, E. and LaBaer, J. (2002) Proteome-scale purification of human proteins from bacteria. Proc. Natl. Acad. Sci. U.S.A. 99, 2654–2659 4 Dyson, M.R., Shadbolt, S.P., Vincent, K., Perera, R. and McCafferty, J. (2004) Production of soluble mammalian proteins in Escherichia coli: identification of protein features that correlate with successful expression. BMC Biotechnol. 4, 32 5 Goh, C.-S., Lan, N., Douglas, S.M., Wu, B., Echols, N., Smith, A., Milburn, D., Montelione, G.T., Zhao, H. and Gerstein, M. (2004) Mining the structural genomics pipeline: identification of protein properties that affect high-throughput experimental analysis. J. Mol. Biol. 336, 115–130 6 Esnouf, R.M., Hamer, R., Sussman, J.L., Silman, I., Trudgian, D., Yang, Z.R. and Prilusky, J. (2006) Honing the in silico toolkit for detecting protein disorder. Acta Crystallogr. Sect. D Biol. Crystallogr. 62, 1260–1266 7 Waugh, D.S. (2005) Making the most of affinity tags. Trends Biotechnol. 23, 316–320 8 Chow, M.K., Amin, A.A., Fulton, K.F., Fernando, T., Kamau, L., Batty, C., Louca, M., Ho, S., Whisstock, J.C., Bottomley, S.P. and Buckle, A.M. (2006) The REFOLD database: a tool for the optimization of protein expression and refolding. Nucleic Acids Res. 34, D207–D212 9 Berman, H. (2008) The Protein Data Bank: a historical perspective. Acta Crystallogr. Sect. A Found. Crystallogr. 64, 88–95 10 Aricescu, A.R., Assenberg, R., Bill, R.M., Busso, D., Chang, V.T., Davis, S.J., Dubrovsky, A., Gustafsson, L., Hedfalk, K., Heinemann, U. et al. (2006) Eukaryotic expression: developments for structural proteomics. Acta Crystallogr. Sect. D Biol. Crystallogr. 62, 1114–1124 11 Lang, C. (2007) Saccharomyces cerevisiae: a microbial eukaryotic expression system. in Methods Express: Expression Systems (Dyson, M.R. and Durocher, Y., eds), pp. 109–121, Scion Publishing Ltd, Bloxham 12 Lin-Cereghino, G.P., Leung, W. and Lin-Cereghino, J. (2007) Expression of proteins in Pichia pastoris. in Methods Express: Expression Systems (Dyson, M.R. and Durocher, Y., eds), pp. 123–144, Scion Publishing Ltd, Bloxham 13 Goshima, N., Kawamura, Y., Fukumoto, A., Miura, A., Honma, R., Satoh, R., Wakamatsu, A., Yamamoto, J., Kimura, K., Nishikawa, T. et al. (2008) Human protein factory for converting the transcriptome into an in vitro-expressed proteome. Nat. Methods 5, 1011–1017 14 Hunt, I. (2005) From gene to protein: a review of new and enabling technologies for multi-parallel protein expression. Protein Expression Purif. 40, 1–22 15 Aricescu, A.R., Lu, W. and Jones, E.Y. (2006) A time- and cost-efficient system for high-level protein production in mammalian cells. Acta Crystallogr. Sect. D Biol. Crystallogr. 62, 1243–1250 16 Nettleship, J.E., Assenberg, R., Diprose, J.M., Rahman-Huq, N. and Owens, R.J. (2010) Recent advances in the production of proteins in insect and mammalian cells for structural biology. J. Struct. Biol., doi:10.1016/ j.jsb.2010.02.006 17 Romier, C., Ben Jelloul, M., Albeck, S., Buchwald, G., Busso, D., Celie, P.H.N., Christodoulou, E., De Marco, V., van Gerwen, S., Knipscheer, P. et al. (2006) Co-expression of protein complexes in prokaryotic and eukaryotic hosts: experimental procedures, database tracking and case studies. Acta Crystallogr. Sect. D Biol. Crystallogr. 62, 1232–1242 18 Szymczak, A.L., Workman, C.J., Wang, Y., Vignali, K.M., Dilioglou, S., Vanin, E.F. and Vignali, D.A.A. (2004) Correction of multi-gene deficiency in vivo using a single ‘self-cleaving’ 2A peptide-based retroviral vector. Nat. Biotechnol. 22, 589–594 19 Trowitzsch, S., Bieniossek, C., Nie, Y., Garzoni, F. and Berger, I. (2010) New baculovirus expression tools for recombinant protein complex production. J. Struct. Biol., doi:10.1016/j.jsb.2010.02.010 20 Finn, R.D., Mistry, J., Tate, J., Coggill, P., Heger, A., Pollington, J.E., Gavin, O.L., Gunasekaran, P., Ceric, G., Forslund, K. et al. (2009) The Pfam protein families database. Nucleic Acids Res. 38, D211–D222 21 Letunic, I., Doerks, T. and Bork, P. (2009) SMART 6: recent updates and new developments. Nucleic Acids Res. 37, D229–D232 22 Dyson, M.R. (2007) Expression strategy. in Methods Express: Expression Systems (Dyson, M.R. and Durocher, Y., eds), pp. 1–10, Scion Publishing Ltd, Bloxham 23 Garrett, T.P.J., McKern, N.M., Lou, M., Elleman, T.C., Adams, T.E., Lovrecz, G.O., Zhu, H.-J., Walker, F., Frenkel, M.J., Hoyne, P.A. et al. (2002) Crystal structure of a truncated epidermal growth factor receptor extracellular domain bound to transforming growth factor α. Cell 110, 763–773 Experimental Approaches to Protein–Protein Interactions 24 Stamos, J., Sliwkowski, M.X. and Eigenbrot, C. (2002) Structure of the epidermal growth factor receptor kinase domain alone and in complex with a 4-anilinoquinazoline inhibitor. J. Biol. Chem. 277, 46265–46272 25 Chapple, S., Crofts, A., Shadbolt, S.P., McCafferty, J. and Dyson, M.R. (2006) Multiplexed expression and screening for recombinant protein production in mammalian cells. BMC Biotechnol. 6, 49 26 Durocher, Y., Perret, S. and Kamen, A. (2002) High-level and high-throughput recombinant protein production by transient transfection of suspension-growing human 293-EBNA1 cells. Nucleic Acids Res. 30, E9 27 Mooij, W.T., Mitsiki, E. and Perrakis, A. (2009) ProteinCCD: enabling the design of protein truncation constructs for expression and crystallization experiments. Nucleic Acids Res. 37, W402–W405 28 Dyson, M.R., Perera, R.L., Shadbolt, S.P., Biderman, L., Bromek, K., Murzina, N.V. and McCafferty, J. (2008) Identification of soluble protein fragments by gene fragmentation and genetic selection. Nucleic Acids Res. 36, E51 29 Tarendeau, F., Boudet, J., Guilligay, D., Mas, P.J., Bougault, C.M., Boulo, S., Baudin, F., Ruigrok, R.W.H., Daigle, N., Ellenberg, J. et al. (2007) Structure and nuclear import function of the C-terminal domain of influenza virus polymerase PB2 subunit. Nat. Struct. Mol. Biol. 14, 229–233 30 Prodromou, C., Savva, R. and Driscoll, P.C. (2007) DNA fragmentationbased combinatorial approaches to soluble protein expression. Part I. Generating DNA fragment libraries. Drug Discov. Today 12, 931–938 31 Reich, S., Puckey, L.H., Cheetham, C.L., Harris, R., Ali, A.A.E., Bhattacharyya, U., Maclagan, K., Powell, K.A., Prodromou, C., Pearl, L.H. et al. (2006) Combinatorial domain hunting: an effective approach for the identification of soluble protein domains adaptable to high-throughput applications. Protein Sci. 15, 2356–2365 32 Knaust, R.K.C. and Nordlund, P. (2001) Screening for soluble expression of recombinant proteins in a 96-well format. Anal. Biochem. 297, 79–85 33 Guilligay, D., Tarendeau, F., Resa-Infante, P., Coloma, R., Crepin, T., Sehr, P., Lewis, J., Ruigrok, R.W.H., Ortin, J., Hart, D.J. and Cusack, S. (2008) The structural basis for cap binding by influenza virus polymerase subunit PB2. Nat. Struct. Mol. Biol. 15, 500–506 34 Tarendeau, F., Crepin, T., Guilligay, D., Ruigrok, R.W. H., Cusack, S. and Hart, D.J. (2008) Host determinant residue lysine 627 lies on the surface of a discrete, folded domain of influenza virus polymerase PB2 subunit. PLoS Pathog. 4, e1000136 35 Angelini, A., Tosi, T., Mas, P., Acajjaoui, S., Zanotti, G., Terradot, L. and Hart, D.J. (2009) Expression of Helicobacter pylori CagA domains by library-based construct screening. FEBS J. 276, 816–824 36 Bonneau, F., Lenherr, E.D., Pena, V., Hart, D.J. and Scheffzek, K. (2009) Solubility survey of fragments of the neurofibromatosis type 1 protein neurofibromin. Protein Expression Purif. 65, 30–37 37 Waldo, G.S., Standish, B.M., Berendzen, J. and Terwilliger, T.C. (1999) Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17, 691–695 38 Kawasaki, M. and Inagaki, F. (2001) Random PCR-based screening for soluble domains using green fluorescent protein. Biochem. Biophys. Res. Commun. 280, 842–844 39 Nakayama, M. and Ohara, O. (2003) A system using convertible vectors for screening soluble recombinant proteins produced in Escherichia coli from randomly fragmented cDNAs. Biochem. Biophys. Res. Commun. 312, 825–830 40 Jacobs, S.A., Podell, E.R., Wuttke, D.S. and Cech, T.R. (2005) Soluble domains of telomerase reverse transcriptase identified by high-throughput screening. Protein Sci. 14, 2051–2058 41 Cabantous, S., Terwilliger, T.C. and Waldo, G.S. (2005) Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nat. Biotechnol. 23, 102–107 42 Cabantous, S. and Waldo, G.S. (2006) In vivo and in vitro protein solubility assays using split GFP. Nat Methods 3, 845–854 43 Cabantous, S.p., Rogers, Y., Terwilliger, T.C. and Waldo, G.S. (2008) New molecular reporters for rapid protein folding assays. PLoS ONE 3, e2387 44 Wigley, W.C., Stidham, R.D., Smith, N.M., Hunt, J.F. and Thomas, P.J. (2001) Protein solubility and folding monitored in vivo by structural complementation of a genetic marker protein. Nat. Biotechnol. 19, 131–136 45 Lesley, S.A., Graziano, J., Cho, C.Y., Knuth, M.W. and Klock, H.E. (2002) Gene expression response to misfolded protein as a screen for soluble recombinant protein. Protein Eng. 15, 153–160 46 Christ, D. and Winter, G. (2006) Identification of protein domains by shotgun proteolysis. J. Mol. Biol. 358, 364–371 47 Maxwell, K.L., Mittermaier, A.K., Forman-Kay, J.D. and Davidson, A.R. (1999) A simple in vivo assay for increased protein solubility. Protein Sci. 8, 1908–1911 48 Scheich, C., Leitner, D., Sievert, V., Leidert, M., Schlegel, B., Simon, B., Letunic, I., Bussow, K. and Diehl, A. (2004) Fast identification of folded human protein domains expressed in E. coli suitable for structural analysis. BMC Struct. Biol. 4, 4 49 Woestenenk, E.A., Hammarstrom, M., Hard, T. and Berglund, H. (2003) Screening methods to determine biophysical properties of proteins in structural genomics. Anal. Biochem. 318, 71–79 50 Lim, H.K., Mansell, T.J., Linderman, S.W., Fisher, A.C., Dyson, M.R. and DeLisa, M.P. (2009) Mining mammalian genomes for folding competent proteins using Tat-dependent genetic selection in Escherichia coli. Protein Sci. 18, 2537–2549 Received 5 May 2010 doi:10.1042/BST0380908 C The C 2010 Biochemical Society Authors Journal compilation 913
© Copyright 2026 Paperzz