PROTEINS: Structure, Function, and Bioinformatics

Crystal structure and substrate-binding mode of cellulase 12A from Thermotoga maritima

Ya-Shan Cheng,1 Tzu-Ping Ko,2 Tzu-Hui Wu,3 Yanhe Ma,4 Chun-Hsiang Huang,5 Hui-Lin Lai,3 Andrew H.-J. Wang,2,5 Je-Ruei Liu,1,6* and Rey-Ting Guo4*

1 Institute of Biotechnology, National Taiwan University, Taipei 106, Taiwan
2 Institute of Biological Chemistry, Academia Sinica, Taipei 115, Taiwan
3 Genozyme Biotechnology Inc., Taipei 106, Taiwan
4 Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
5 Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
6 Department of Animal Science and Technology, National Taiwan University, Taipei 106, Taiwan

ABSTRACT

Cellulases have been used in many applications to treat various carbohydrate-containing materials. Thermotoga maritima cellulase 12A (TmCel12A) belongs to the GH12 family of glycoside hydrolases. It is a β-1,4-endoglucanase that degrades cellulose molecules into smaller fragments, facilitating further utilization of the carbohydrate. Because of its hyperthermophilic nature, the enzyme is especially suitable for industrial applications. Here the crystal structure of TmCel12A was determined by using an active-site mutant E134C and its mercury-containing derivatives. It adopts a β-jellyroll protein fold typical of the GH12-family enzymes, with two curved β-sheets A and B and a central active-site cleft. Structural comparison with other GH12 enzymes shows significant differences, notably in two longer and highly twisted β-strands B8 and B9 and in several loops. A unique Loop A3-B3 that contains Arg60 and Tyr61 stabilizes the substrate by hydrogen bonding and stacking, as observed in the complex crystals with cellotetraose and cellobiose. The high-resolution structures allow clear elucidation of the network of interactions between the enzyme and its substrate.
The sugar residues bound to the enzyme appear to be more ordered in the −2 and −1 subsites than in the +1, +2 and −3 subsites. In the E134C crystals the bound −1 sugar at the cleavage site consistently shows the α-anomeric configuration, implicating an intermediate-like structure.

Proteins 2011; 79:1193–1204. © 2010 Wiley-Liss, Inc.

Key words: hyperthermophile; endoglucanase; catalytic intermediate; active site mutant; mercury derivatives; synchrotron radiation; biofuel industry.

INTRODUCTION

Glucose produced by photosynthesis is a major energy source for life. The α-1,4-linked glucose stored in starch-rich food such as rice, wheat, corn, or potato can be readily released by digestive enzymes such as the amylases. The β-1,4-linked glucose found in cellulose, which plays a structural role in the plant cell wall, requires microbial enzymes to make it available. Termites feasting on wood and cattle browsing in a prairie are hosts of these microbes, which produce enzymes including xylanases and cellulases that first degrade the polysaccharide molecules into fragments, and eventually into disaccharides and monosaccharides. Demand for renewable energy resources has recently risen to a level at which exploitation of plant waste has become a potential means of obtaining biofuel and other products. Heterologous cellulase genes have been overexpressed in well-characterized microorganisms such as Escherichia coli to promote their efficiency in biofuel conversion.1 In addition, cellulases are also used in fruit juice processing and textile production. Thermotoga maritima cellulase 12A (TmCel12A; 257 amino-acid residues) belongs to the GH12 family of glycoside hydrolases,2–4 which is closely related to the GH11 sister family of xylanases. These two families constitute Clan C of glycoside hydrolases according to the classification by CAZy (www.cazy.org). Both are retaining enzymes for hydrolysis of the glucans.
The crystal structure of a GH12-family cellulase, the endoglucanase CelB2 from the bacterium Streptomyces lividans (SlCelB2), was first determined by Sulzenbacher et al.5 It revealed a protein fold similar to that observed in the GH11-family xylanases, comprising two large antiparallel β-sheets with jellyroll topology that are packed against each other in a sandwich-like manner. The β-sheets are curved and interconnected by associated loops. An extended substrate-binding cleft is formed across the molecular surface that suggests binding to at least five glucose units in regions denoted the −3, −2, −1, +1, and +2 subsites. In this cleft the two catalytic residues Glu120 and Glu203, which correspond to Glu134 and Glu231 in TmCel12A, face each other and are separated by about 7 Å. Sulzenbacher et al. also directly observed the covalent attachment of the nucleophile Glu120 to a substrate analogue,6 attesting to the double displacement mechanism of catalysis.

Additional Supporting Information may be found in the online version of this article.
Ya-Shan Cheng and Tzu-Ping Ko contributed equally to this work.
Grant sponsor: National Science Council; Grant number: NSC 98-2313-B002-033-MY3
*Correspondence to: Je-Ruei Liu, Institute of Biotechnology, National Taiwan University, Taipei 106, Taiwan. E-mail: [email protected] and Rey-Ting Guo, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China. E-mail: [email protected]
Received 23 September 2010; Revised 10 November 2010; Accepted 17 November 2010
Published online 30 November 2010 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/prot.22953
In addition to SlCelB2, GH12-family enzymes from seven other organisms, Bacillus licheniformis, Rhodothermus marinus, Streptomyces sp. 11AG8, Aspergillus niger, Humicola grisea, Hypocrea jecorina (or Trichoderma reesei), and Hypocrea schweinitzii, have known three-dimensional structures.7–14 All share the β-sandwich protein fold, but with variations especially in the connecting loops, which may affect the affinity of the enzymes for the substrate and result in different cleavage efficiencies despite the common catalytic mechanism. Since industrial application is usually carried out at elevated temperatures, thermal stability has also been studied by extensive structural comparison and mutagenesis.11,12,15 Here we report the crystal structures of TmCel12A and its complexes with cellobiose and cellotetraose, and present a detailed analysis of the mode of substrate binding.

MATERIALS AND METHODS

Protein expression and purification

The TmCel12A gene was obtained from a Thermotoga maritima MSB8 genomic DNA library (ATCC 43589) and cloned into the vector pET16b by using XbaI and NdeI. A His6-tag was added before the N-terminus for purification purposes. The primers used here were 5′-GCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACCATGGGCCACCACCACCACCACCACATGGTACTGATGACAAAA-3′ (forward) and 5′-GGAATTCCATATGTTATCATTCTCTCACCTCCAGATCAAT-3′ (reverse). The E134C mutant was prepared by using the QuikChange site-directed mutagenesis kit (Agilent) with TmCel12A/pET16b as the template and a forward primer of 5′-AAGCATCGATCGGCGATGTTTGCATCATGGTCTGGTTCTATTT-3′. The constructs were transformed into E. coli BL21 (DE3), where protein expression was induced by adding IPTG. The proteins were then purified by FPLC using a Ni-NTA column and a DEAE column. The buffer and gradient were 25 mM Tris, pH 7.5, 150 mM NaCl, and 20–250 mM imidazole for the Ni-NTA column, and 25 mM Tris, pH 7.5, and 0–250 mM NaCl for the DEAE column.
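As a quick sanity check on the primer design described above, the following Python sketch looks for the XbaI and NdeI recognition sequences in the two cloning primers and translates the coding region they carry. The recognition sequences (TCTAGA and CATATG) and the standard genetic code are textbook facts; the helper names `translate` and `reverse_complement`, and the choice to read the reverse primer's complement from its first base, are assumptions made for illustration and are not part of the original work.

```python
# Illustrative check of the cloning primers quoted in the text.
# Helper names are hypothetical; sequences are copied from the Methods.

FORWARD = (
    "GCTCTAGAAATAATTTTG"
    "TTTAACTTTAAGAAGGAGATATACCATGGGCCACCACCA"
    "CCACCACCACATGGTACTGATGACAAAA"
)
REVERSE = (
    "GGAATTCCATATGTTATCATTCTCTCACCTCCA"
    "GATCAAT"
)

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate codon by codon from the first base until a stop codon or the end."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


has_xbai = "TCTAGA" in FORWARD   # XbaI site expected in the forward primer
has_ndei = "CATATG" in REVERSE   # NdeI site expected in the reverse primer

# Peptide encoded from the first ATG of the forward primer onward.
tag_peptide = translate(FORWARD[FORWARD.find("ATG"):])

# Coding strand at the 3' end of the gene (assumed frame: first base).
c_terminal_peptide = translate(reverse_complement(REVERSE))

print(has_xbai, has_ndei, tag_peptide, c_terminal_peptide)
```

Read this way, the forward primer encodes Met-Gly, six histidines, and then what is presumably the start of the native sequence, consistent with a His6-tag placed before the N-terminus.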
The proteins were eluted at about 75 mM imidazole and 125 mM NaCl, respectively. The purified proteins were finally concentrated to 5 mg mL⁻¹ in 25 mM Tris, 150 mM NaCl, pH 7.5.

Crystallization and data collection

The wild-type TmCel12A protein was first crystallized by using the Index screen kit (Hampton Research) and the sitting-drop vapor diffusion method. The reservoir solution (No. 66) contained 0.2 M ammonium sulfate, 0.1 M Bis-Tris, pH 5.5, and 25% w/v PEG3350. Better (orthorhombic) crystals were obtained by optimizing the reservoir composition to 0.1 M Bis-Tris, pH 5.5, 10% glycerol, and 15% PEG3350. The reservoir for crystallizing the E134C mutant was slightly different; it contained 0.1 M ammonium sulfate, 0.1 M Bis-Tris, pH 5.5, 5% glycerol, and 18% PEG3350. Adding 10 mM cellobiose to the protein solution resulted in a different (monoclinic) crystal form. All crystals were obtained at room temperature. They reached suitable sizes for X-ray diffraction in 2 days. Before flash freezing to cryogenic temperatures, the wild-type crystals were soaked for about 1 h in a cryoprotectant solution that contained 0.12 M Bis-Tris, pH 5.5, 15% glycerol, and 20% PEG3350. The wild-type TmCel12A-cellotetraose complex crystals were obtained by including 10 mM cellotetraose in the soaking solution. The cryoprotectant for the E134C crystals (both forms) contained 0.15 M ammonium sulfate, 0.15 M Bis-Tris, pH 5.5, 15% glycerol, and 25% PEG3350. To prepare heavy-atom derivatives, the 15 mercury-containing reagents in Heavy Atom Screen Hg (Hampton Research) were each diluted 10-fold in the cryoprotectant solution (final concentration 2 mM) and used to soak the orthorhombic E134C crystals for 1 h. Soaking in the cryoprotectant containing 10 mM cellobiose or cellotetraose yielded the orthorhombic E134C-substrate complex crystals. All X-ray diffraction experiments were carried out at the National Synchrotron Radiation Research Center (NSRRC) in Hsinchu, Taiwan.
The wavelength used in collecting the native E134C and the heavy-atom derivative data was either 1.0011 Å (BL 13B1) or 0.9762 Å (BL 13C1). The other datasets were collected at a wavelength of 1.0008 Å (BL 13B1). The diffraction images were processed by using HKL2000.16 The mercury datasets were scaled with anomalous signals conserved for phasing purposes. One dataset of the "native" E134C crystal and 12 datasets of the derivatives were collected (see Table SI in the Supporting Information), in addition to five datasets of the wild-type enzyme and substrate complexes used in refinement and analysis.

Table I. Data Collection and Refinement Statistics for the TmCel12A Crystals

Parameter | Wild-type, native | Wild-type, G4-soak | E134C, G2-soak | E134C, G4-soak | E134C, G2-cocrystal
Data collection | | | | |
Space group | P212121 | P212121 | P212121 | P212121 | C2
Unit cell a (Å) | 41.7 | 42.3 | 42.0 | 42.1 | 230.9
Unit cell b (Å) | 74.0 | 73.9 | 74.1 | 73.9 | 46.9
Unit cell c (Å) | 181.0 | 181.2 | 180.3 | 179.7 | 116.4 (β = 114.2°)
Resolution (Å) | 25–2.09 (2.16–2.09) | 25–1.98 (2.05–1.98) | 25–1.47 (1.52–1.47) | 25–1.78 (1.84–1.78) | 25–1.80 (1.86–1.80)
Unique reflections | 33,848 (3312) | 37,964 (3308) | 94,215 (9191) | 54,430 (5323) | 104,758 (9837)
Redundancy | 5.4 (5.0) | 5.9 (5.7) | 8.2 (8.1) | 5.7 (5.3) | 6.8 (5.9)
Completeness (%) | 98.7 (99.6) | 93.3 (82.9) | 97.7 (96.6) | 99.5 (98.7) | 98.9 (94.1)
Average I/σ(I) | 20.2 (6.8) | 25.0 (8.1) | 40.8 (7.8) | 29.2 (4.2) | 43.2 (3.8)
Rmerge (%) | 10.3 (29.3) | 7.4 (28.8) | 5.0 (28.3) | 5.6 (37.0) | 4.3 (35.0)
Refinement | | | | |
No. of protein chains | 2 | 2 | 2 | 2 | 4
No. of reflections | 32,726 (3076) | 36,868 (3062) | 92,639 (8504) | 52,750 (4756) | 100,096 (8268)
Rwork (95% of data) | 0.177 (0.200) | 0.146 (0.151) | 0.172 (0.198) | 0.169 (0.205) | 0.190 (0.314)
Rfree (5% of data) | 0.220 (0.246) | 0.189 (0.197) | 0.197 (0.232) | 0.202 (0.225) | 0.230 (0.347)
R.m.s.d. bonds (Å) | 0.020 | 0.020 | 0.020 | 0.020 | 0.020
R.m.s.d. angles (°) | 1.9 | 2.0 | 1.9 | 2.0 | 2.0
Dihedral angles, most favored (%) | 90.6 | 90.4 | 90.7 | 89.1 | 89.9
Dihedral angles, allowed (%) | 9.0 | 9.2 | 8.8 | 10.4 | 9.7
Dihedral angles, disallowed (%) | 0.4 | 0.4 | 0.5 | 0.5 | 0.4
Non-H atoms, protein | 4246 | 4235 | 4179 | 4171 | 8390
Non-H atoms, water | 296 | 543 | 764 | 536 | 1023
Non-H atoms, carbohydrate | | 90 | 92 | 114 | 128
Average B (Å²), protein | 30.1 | 14.7 | 15.8 | 21.4 | 33.4
Average B (Å²), water | 48.2 | 34.0 | 37.8 | 39.9 | 49.3
Average B (Å²), carbohydrate | | 28.1 | 20.6 | 30.2 | 34.5
PDB ID code | 3AMH | 3AMM | 3AMN | 3AMP | 3AMQ

G4 = cellotetraose; G2 = cellobiose. All positive reflections (without sigma-cutoff) were used in the refinement. Values in parentheses are for the outermost resolution shells.

Structure determination and refinement

Although the MIR (multiple isomorphous replacement) datasets were not collected at a wavelength chosen to maximize absorption by mercury, both fixed wavelengths were shorter, and hence higher in energy, than the theoretical value of 1.0101 Å. This allowed sufficient anomalous signals to be observed, which turned out to be effective in phase angle calculations. Using SOLVE and RESOLVE,17–19 combinations of any 4 of the 12 datasets with the native dataset of the E134C crystal resulted in FOM (figure of merit) values from 0.60 to 0.78, Z-scores from 8.5 to 12.0, and up to 460 auto-built amino-acid residues. The best results were obtained using the derivatives of mersalyl acid, thimerosal, phenyl-mercury acetate, and tetrakis(acetoxymercuri)methane. Because there are two TmCel12A molecules in an asymmetric unit of the orthorhombic crystal, a model of continuous polypeptide chain was readily constructed by superimposing one molecule on the other, using the program O.20 Subsequent refinement was carried out by using CNS,21 which was also employed in solving the structure of the monoclinic E134C-cellobiose crystal by molecular replacement.
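The wavelength argument above can be made concrete with a small sketch that converts each wavelength to photon energy using E = hc/λ and counts how many four-derivative combinations can be drawn from 12 datasets. The constant hc ≈ 12.398 keV·Å is standard; the variable and function names are illustrative, and the text does not state that every combination was actually tested.

```python
from math import comb

# h*c expressed in keV*Angstrom (standard physical constant).
HC_KEV_ANGSTROM = 12.398419843


def photon_energy_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength given in Angstrom."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Values quoted in the text.
REFERENCE_WAVELENGTH = 1.0101          # theoretical value cited for mercury
BEAMLINE_WAVELENGTHS = {"BL 13B1": 1.0011, "BL 13C1": 0.9762}

reference_energy = photon_energy_kev(REFERENCE_WAVELENGTH)
beamline_energies = {
    name: photon_energy_kev(wavelength)
    for name, wavelength in BEAMLINE_WAVELENGTHS.items()
}

for name, energy in beamline_energies.items():
    offset = energy - reference_energy
    print(f"{name}: {energy:.3f} keV ({offset:+.3f} keV relative to reference)")

# Number of distinct ways to choose 4 derivative datasets out of 12.
n_combinations = comb(12, 4)
print(f"possible 4-of-12 combinations: {n_combinations}")
```

Both beamline energies come out above the reference energy, which is the condition the text relies on for observing an anomalous signal from mercury.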
When four copies of the A-chain model from the orthorhombic crystals were correctly placed in the monoclinic unit cell, the initial R-value was 0.316 for the 1.8 Å-resolution data. At the beginning of refinement all substrate molecules in the four complex crystals were visible in the difference Fourier maps. Water molecules were included according to strong electron densities and reasonable interactions with the protein model. All atoms were refined using isotropic temperature factors. O was used in model adjustment and analysis of the protein structure and its interaction with the substrate. The figures were produced by using PyMOL.22

RESULTS

Structural features of TmCel12A

Because molecular replacement approaches using other known cellulase structures from the Protein Data Bank (PDB; www.pdb.org) did not yield a correct solution, the active-site residue Glu134 was mutated into a cysteine for efficient binding to mercury compounds. The mutant protein E134C retained less than 0.2% activity (data not shown), but crystallized in the same unit cell as did the wild-type TmCel12A. Twelve mercury-based heavy-atom derivatives of the mutant crystal were obtained and used in solving the structure by MIR methods. Five structures are presented here: (1) wild-type without bound substrate, (2) wild-type soaked with cellotetraose, (3) E134C soaked with cellobiose, (4) E134C soaked with cellotetraose, and (5) E134C cocrystallized with cellobiose in a different unit cell. Data collection and refinement statistics are listed in Table I. Each crystal structure contains two protein monomers in its asymmetric unit, except for the E134C-cellobiose cocrystal, which contains four monomers. All 12 polypeptide chains seen in these crystals are continuous from the N- to the C-terminus. Two of them include a four-residue extension from the N-terminus due to the engineered His-tag. (The models are summarized in Supporting Information Table SII).
The root-mean-square deviations (RMSD) of Cα-positions between different monomers range from 0.2 Å to 0.4 Å (Supporting Information Table SIII), which are comparable to but larger than those between the crystallographically equivalent monomers (all less than 0.2 Å). Whether native or mutant, substrate-bound or free, the enzyme tends not to undergo significant conformational changes except for side-chain rotations (Supporting Information Fig. S1). The largest change was a flipping-over of the Lys216-Asp217 peptide bond by 180°. On the other hand, in all structures the dihedral angles of Tyr61 (φ = 68.8° ± 3.4° and ψ = −63.9° ± 3.7°, expressed as mean ± standard deviation) fell into the disallowed region for nonglycine residues. This special conformation of γ-turn23 is unambiguously defined, as can be judged from the clear electron densities and backbone interactions with its neighboring residues (Supporting Information Fig. S2). As shown in Figure 1, the overall jelly-roll fold of TmCel12A is similar to those of other GH11- and GH12-family enzymes. The two curved β-sheets A and B are packed against each other and linked by interconnecting loops. The outer Sheet A of six strands is bent by about 70°. The inner Sheet B of nine strands is bent by nearly 130° and it is also highly twisted. The two catalytic residues Glu134 and Glu231 are located in the juxtaposed β-strands B5 and B4, respectively. The active-site cleft is formed by this inner Sheet B and its associated loops, whereas the outer Sheet A mainly serves a structural role. Two cross-over loops that connect the β-strand B3 to A5 (residues 70–93) and A6 to B4 (197–225) are significantly longer than the loop (or "cord") between B6 and B9 (142–149). The Loop A6-B4 also encompasses a three-turn α-helix (198–209; Fig. 1).
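For readers who want to reproduce backbone dihedral angles such as those reported for Tyr61, the sketch below computes a torsion angle from four atomic positions. By the standard definitions, φ uses C(i−1), N(i), Cα(i), C(i) and ψ uses N(i), Cα(i), C(i), N(i+1). The function name `dihedral_deg` and the toy coordinates are illustrative assumptions; no coordinates from the deposited structures are used here.

```python
import math
from statistics import mean, stdev


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees) defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(x / length for x in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    return math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))


def summarize(angles):
    """Mean and sample standard deviation, as used for 'mean ± SD' reporting."""
    return mean(angles), stdev(angles)


# Toy geometry (not real atoms): a right-angle torsion about the z axis.
toy_angle = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(f"toy torsion: {toy_angle:.1f} degrees")
```

Applying `dihedral_deg` to the Tyr61 backbone atoms of each of the 12 chains and passing the results to `summarize` would give values in the form quoted in the text.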
Interestingly, the β-ribbon formed by the outermost strands B8 and B9 is severely twisted by 180°, and the C-terminal region of B9 forms anti-parallel β-sheet interactions with A6, thus extending the six-stranded outer Sheet A by two strands. Such an extension may contribute to the protein's stability at high temperatures.

Comparison with other GH12 enzymes

Based on the source organisms, three major categories of GH12-family cellulases are found in nature: archaeal, bacterial, and fungal (CAZy; www.cazy.org). When the eight structures of 2NLR (Streptomyces lividans),6 1H8V (Hypocrea jecorina),10 1KS5 (Aspergillus niger),14 1OA3 and 1OA4 (Hypocrea schweinitzii and Streptomyces sp. 11AG8),12 1UU6 (Humicola grisea),13 2BWA (Rhodothermus marinus),9 and 2JEN (Bacillus licheniformis)7 from the PDB (www.pdb.org) are superimposed, it is clear that the bacterial enzymes are more diverse (Supporting Information Table SIV and Fig. S3). The structure of 1OA4 is virtually identical to that of 2NLR (RMSD = 0.49 Å) because both enzymes are from the same bacterial genus Streptomyces. These are significantly different from the other two bacterial enzymes of 2BWA and 2JEN (RMSD = 1.36–1.51 Å) and the four fungal enzymes (RMSD = 1.40–1.74 Å). Likewise, the structures of 1OA3 and 1H8V, both from Hypocrea, are nearly identical (RMSD = 0.34 Å), and they also differ less from those of 1KS5 and 1UU6 (RMSD = 0.83–1.12 Å). Superposition of the structure of TmCel12A with the above eight structures showed RMSDs of 1.53–1.75 Å, placing it among the bacterial GH12-family enzymes. The major differences are between the connecting loops, whereas the β-strands constituting the jelly-roll scaffold are largely conserved, with RMSDs of about 1.0 Å for 130–160 Cα-positions (Supporting Information Table SIV).
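The RMSD values quoted in this comparison depend on first superimposing the matched atoms. A minimal sketch of one common way to do this, the Kabsch algorithm, is given below using NumPy. The paper does not state which superposition routine was used (the program O is cited for model work), so this is an illustration of the general technique rather than the authors' procedure; the function name `kabsch_rmsd` and the synthetic coordinates are assumptions.

```python
import numpy as np


def kabsch_rmsd(mobile, target):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    if mobile.shape != target.shape or mobile.shape[1] != 3:
        raise ValueError("coordinate sets must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    mobile_c = mobile - mobile.mean(axis=0)
    target_c = target - target.mean(axis=0)

    # Optimal rotation from the SVD of the covariance matrix.
    covariance = mobile_c.T @ target_c
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (reflection).
    sign = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T

    rotated = mobile_c @ rotation.T
    return float(np.sqrt(np.mean(np.sum((rotated - target_c) ** 2, axis=1))))


# Synthetic example: the same points rotated by 30 degrees about z and shifted.
rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))
theta = np.radians(30.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
moved = points @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(f"RMSD after superposition: {kabsch_rmsd(points, moved):.2e} A")
```

In practice the two inputs would be matched Cα coordinates from two structures; choosing which residues to pair (for example, the 130–160 conserved positions mentioned above) is a separate alignment step not shown here.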
The most prominent structural differences between these enzymes occur in the Loops B2-A2, A3-B3, B4-A4, B3-A5, B5-B6, B8-B7, B7-A6, and B8-B9 (Supporting Information Fig. S4). The first three loops overhang the central cleft of the protein molecule and form a major part of the substrate-binding site. They are more variable in the bacterial enzymes than in the fungal enzymes, as are the overall structures (Supporting Information Fig. S3). The Streptomyces and Rhodothermus enzymes (2NLR/1OA4 and 2BWA) appear comparatively more similar to each other than to the Bacillus enzyme (2JEN), but the Thermotoga enzyme TmCel12A is the most different. As shown in Figure 2(A), the Loop B2-A2 is longer and is shifted toward the neighboring Loop A1-B1; the Loop A3-B3 is much longer and displaces part of Loop B2-A2 of the other structures; and the Loop B4-A4 is also longer and displaces part of Loop A3-B3. The crossover Loop B3-A5 and the adjacent Loop B5-B6 also form part of the distal substrate-binding subsites −3 and −4, and supposedly account for specificities regarding different substrate lengths at the nonreducing end. Slight movements in the Loop B8-B7 are dependent on the interactions, if present, with bound substrates. This loop is longer in 1KS5 and 2JEN, the latter of which is a xyloglucanase for branched substrates.7 The shift in Loop B7-A6, as compared with other Cel12 structures, is a result of interactions with the 180°-twisted β-strands B8 and B9, which extend the outer β-sheet A beyond strand A6. Such an extension has never been observed before in this enzyme family. Like the known structures of Cel12 from the other species, the interior of the jelly-roll β-sandwich of TmCel12A is filled by hydrophobic residues that constitute the protein core, including a large fraction of aromatic amino acids, mostly phenylalanine.
The central cleft is also lined with several tryptophan and tyrosine side chains (Trp26, 75, 118, 138, 176, 178, and Tyr61, 65, 180), which interact with the substrate molecule. There are more aromatic amino-acid residues in the cleft of the bacterial enzymes when compared with the fungal enzymes, as reflected by the presence of four to eight tryptophan residues in the former and two or three in the latter. Tyr61 is unique to TmCel12A (see below) whereas Tyr65 is strictly conserved. The equivalent to Tyr180 is a valine in all others except for the Rhodothermus enzyme, which also has a tyrosine (Tyr163). In TmCel12A, Gly233 replaces either a tryptophan or a phenylalanine residue in the other enzymes [e.g., Trp205 in 2NLR and Phe202 in 1H8V; Fig. 2(B)]. This occurs at the junction between strand B4 and Loop B4-A4, and makes the peptide chain deviate by 150° from all others. The longer Loop B4-A4 of TmCel12A also extends its overhang from the central cleft by about 7 Å. To compensate for the shifted Loop B4-A4, the side chain of Leu109 is rotated slightly outward to fill the space but still makes hydrophobic interactions with the core residues. The next residue Pro110, instead of a hydrophilic residue in the other enzymes, gives some rigidity to the loop structure. On the other hand, Loop B6-B9, or the "cord," is structurally more conserved, varying by no more than a single residue in length.

Figure 1. The protein fold of TmCel12A. (A) A ribbon diagram of the model is shown in a stereoscopic view. The secondary structural elements and loops are spectrum-colored from blue (N-terminus) to red (C-terminus) according to their positions in the amino-acid sequence. The β-strands are organized into two large, mostly anti-parallel sheets A and B that pack against each other. Between strands A6 and B4 lies the only α-helix in the structure, which is shown in orange. (B) The protein topology is depicted as a schematic diagram. (C) The model is rotated about 90° to show the active-site cleft and how the twisted strands B8 and B9 (yellow) in one β-sheet associate with the outermost strand A6 (orange) of the other.

Figure 2. Unique loops in TmCel12A. (A) The A-chain model of the mutant E134C crystal soaked with cellobiose is superimposed onto other known structures of GH12-family enzymes. The colors used here are red for TmCel12A, blue for 1H8V and 1OA3, cyan for 1KS5, green for 1UU6, gray for 2NLR and 1OA4, orange for 2BWA, and pink for 2JEN. The bound cellobiose molecules are shown as stick models with yellow carbons. The view is approximately orthogonal from that in Figure 2. (B) A close view of the region shows where the Loop B4-A4 of TmCel12A deviates from its equivalent loops in the other structures. This occurs at Gly233, which replaces an aromatic amino-acid residue (Trp or Phe) in the others. The B4-A4 loop makes a sharp turn here, which results in a very different course. The neighboring Loop A3-B3 shows even larger deviation from its equivalents in the other enzymes.

The cleft-bound cellotetraose

When the wild-type TmCel12A crystals were soaked with cellotetraose, each of the two enzyme molecules in the asymmetric unit bound one molecule of the substrate in the active-site cleft. By comparing the cellotetraose-bound and the unbound enzyme structures, the sugar was found to displace eleven water molecules in the active site of one TmCel12A and nine in the other. Among these active-site water molecules, eight occupied equivalent positions. Furthermore, 6 of the 8 conserved waters correspond to 6 of the 14 hydroxyl-group positions in the bound cellotetraose molecule.
Although specific interactions between the enzyme and the substrate are probably the major determinants for cellulose binding to TmCel12A, the displaced water molecules can nonetheless increase the entropy that favors the enzyme-substrate association, especially at an elevated temperature. On the other hand, the side chain of Arg60 lacked strong electron densities in the native crystal but became more ordered in the presence of bound substrate (Supporting Information Fig. S2). The average temperature factors of the two Arg60 residues are 44.1 Å² in the native crystal and 17.3 Å² in the cellotetraose complex. The guanidine groups have an average of 55.6 Å² in the former and 19.9 Å² in the latter. When these are compared with the overall protein temperature factors of 30.1 Å² and 14.7 Å² (Table I), it is evident that Arg60 is significantly stabilized by binding to substrate, especially for its side chain. The poorly and well defined guanidine groups in the native and complex crystals differ by a nearly 180° rotation (Supporting Information Fig. S2). This Arg60 is located in the Loop A3-B3, which is longer than its equivalents in the other enzymes, protrudes conspicuously over the active-site cleft, and is unique to TmCel12A. The two bound cellotetraose molecules show an RMSD of 0.178 Å for the 45 non-hydrogen atoms, indicating an almost identical means of binding. The four β-glucose residues occupy the −2, −1, +1, and +2 subsites of the central cleft. However, the occupancy may vary significantly, as reflected by the average temperature factors of these residues, 12.4 Å² (−2), 18.4 Å² (−1), 35.3 Å² (+1), and 44.5 Å² (+2), and also by the corresponding strength of electron density (Supporting Information Fig. S5).
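The comparison of temperature factors above can be expressed as simple ratios. The sketch below divides the Arg60 and guanidine-group B-factors by the overall protein B-factor of the same crystal, and ranks the four subsites by the B-factor of the bound glucose. All numbers are taken from the text and Table I; normalizing by the overall protein average is one reasonable choice for comparing crystals of different overall order, not a procedure stated by the authors, and the names used are illustrative.

```python
# Average B-factors (in square Angstrom) quoted in the text and Table I.
OVERALL_PROTEIN_B = {"native": 30.1, "cellotetraose complex": 14.7}
ARG60_RESIDUE_B = {"native": 44.1, "cellotetraose complex": 17.3}
ARG60_GUANIDINE_B = {"native": 55.6, "cellotetraose complex": 19.9}
SUBSITE_SUGAR_B = {"-2": 12.4, "-1": 18.4, "+1": 35.3, "+2": 44.5}


def relative_b(local_b, overall_b):
    """Local B-factor expressed as a multiple of the overall protein average."""
    return local_b / overall_b


relative = {
    state: {
        "Arg60": relative_b(ARG60_RESIDUE_B[state], OVERALL_PROTEIN_B[state]),
        "guanidine": relative_b(ARG60_GUANIDINE_B[state], OVERALL_PROTEIN_B[state]),
    }
    for state in OVERALL_PROTEIN_B
}

for state, values in relative.items():
    print(f"{state}: Arg60 x{values['Arg60']:.2f}, guanidine x{values['guanidine']:.2f}")

# Subsites ordered from most to least ordered (lowest to highest B-factor).
subsite_order = sorted(SUBSITE_SUGAR_B, key=SUBSITE_SUGAR_B.get)
print("subsite order by increasing B:", " < ".join(subsite_order))
```

Even after normalization the Arg60 side chain is relatively better ordered in the complex than in the native crystal, in line with the stabilization described in the text.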
Previous studies showed that TmCel12A was most active at 95°C and pH 5.2,3 Because the wild-type enzyme is supposed to remain active in the crystal, given the high cellotetraose concentration and length of incubation, hydrolysis of the substrate molecules should have occurred during the soaking time. Nevertheless, 14 direct hydrogen bonds between the enzyme and the substrate can be unambiguously identified by analyzing the refined crystal structure. As shown in Figure 3, the −2 sugar residue at the non-reducing end is sandwiched by the large aromatic side chains of Trp26 and Trp75, and apparently stabilized by the strong stacking interactions with its six-atom sugar ring. In addition, it makes three direct hydrogen bonds and at least two indirect hydrogen bonds (not shown), mediated by conserved water molecules that were observed in both complex structures. Judging by the extensive interactions between the −2 residue and the enzyme, it is not surprising that this most tightly bound residue had the strongest density and the lowest temperature factor. The adjacent −1 residue, which is to be attacked by the nucleophile Glu134, makes four direct hydrogen bonds with the enzyme. One carboxyl oxygen atom of Glu134 makes a hydrogen bond of 2.8–2.9 Å to the O2 atom of the −1 residue, and the other carboxyl oxygen is 3.6 Å from the anomeric carbon C1 of the sugar, ready for nucleophilic attack (see Fig. 3). This latter oxygen also makes a short hydrogen bond to the side chain of Glu116 (not shown), which is to deprotonate the nucleophile in the catalytic reaction. On the other hand, the side chain of Glu231 is hydrogen bonded to the O6 atom of the −1 residue and the O3 and O4 atoms of the +1 residue, and one of the carboxyl oxygen atoms is also 2.6–2.7 Å from O5 of the −1 residue.
In the first half-reaction, Glu231 stabilizes the negatively charged transition state by providing a proton; in the second half-reaction, it serves as a catalytic base to activate a water molecule, which is supposed to take over the current position of the O4 atom (see Fig. 4). Of particular note is that the side-chain guanidine group of Arg60 makes hydrogen bonds to all three sugar residues −2, −1, and +1, and its backbone carbonyl group makes a hydrogen bond to the O6 of residue +1. Besides, the shortest distance between the side chains of Arg60 and Trp178 is 3.8–3.9 Å. In this way an enclosure is formed around the −1 cleavage site, presumably important for efficient catalysis by holding the −1 sugar residue in place. Because the +1 and +2 subsites are more open to the solvent, the bound sugar residues are less ordered (Supporting Information Fig. S5), despite the six direct hydrogen bonds to the +1 sugar and the stacking interaction of Tyr61 with the +2 sugar (see Fig. 3). Interestingly, the O3 atom of −1 makes a hydrogen bond to the O5 of −2 (not shown) and so does the O3 of +2 to the O5 of +1, resulting in a similar conformation of the two disaccharide units.

Figure 3. The wild-type TmCel12A-cellotetraose complex. The four units of β-glucose are shown as heavy stick models with gray carbons and labeled from −2 to +2 according to the subsites. Some surrounding amino-acid residues of TmCel12A are shown as thin stick models with green carbons. The dashed lines denote the direct hydrogen bonds between the enzyme and the substrate.

The E134C-sugar complexes

Two cellobiose molecules were bound to each TmCel12A molecule when the mutant crystals of E134C were soaked with the disaccharide (Supporting Information Fig. S6). One cellobiose molecule occupied the −2 and −1 subsites and the other occupied the +1 and +2 subsites of the enzyme. The RMSD between the bound sugars in the two crystallographically independent E134C-cellobiose complexes is 0.139 Å for 46 non-hydrogen atoms. When the E134C crystal was soaked with cellotetraose, the bound sugar residues occupied the −3 subsite in addition to the −2, −1, +1, and +2 sites (Supporting Information Fig. S6). At the reducing end of the binding site, some densities beyond the O1 atom of the +2 sugar were also seen. Because the mutant enzyme is inactive, the bound sugars should remain intact. Consequently, the observed densities might represent two bound cellotetraose molecules, but one was modeled as a cellotriose and the other as a cellobiose due to disorder at both ends of the substrate-binding cleft. The RMSD between the two independent five-sugar-residue models from −3 to +2 is 0.240 Å for 57 non-hydrogen atoms. In the E134C-cellobiose cocrystal, each of the four protein molecules in the asymmetric unit had its −2 and −1 subsites occupied by a cellobiose molecule. The RMSD varies from 0.071 to 0.136 Å between the four cellobiose models. Weak densities were also observed in the region of the +1 and +2 subsites in three of the four protein molecules, but they could only be modeled as a β-glucose molecule plus a few waters (Supporting Information Fig. S7). Their RMSD ranges from 0.094 Å to 0.205 Å. The lower occupancies might be due to the absence of substrate in the cryoprotectant solution (see Materials and Methods). Despite their different lengths, the models of bound sugars in the E134C mutant crystals superimpose well on one another, as shown in Figure 5(A). Similar to those observed in the cellotetraose complex structure of the wild-type enzyme, the sugar residues bound to the −2 and −1 subsites of the mutant appear to be the most stable, judging by their individual average temperature factors (Supporting Information Figs. S6 and S7). The stability of the bound substrate residues in the subsites may have the order of −2 > −1 > +1 > +2 > −3, which also matches well with the corresponding strength of electron density.
Unlike its succeeding sugar units, the −3 residue observed in the E134C-cellotetraose structure is much exposed to the solvent (Supporting Information Fig. S8), with its 6-hydroxyl group making a single direct hydrogen bond to the side chain of Glu76. Interactions in the other subsites are similar to those in the wild-type complex, including the sandwiched stacking of the −2 sugar by Trp26 and Trp75, the four direct hydrogen bonds of Arg60 to three sugar residues, and the single-sided stacking of Tyr61 with the +2 residue. However, there are a few exceptions, particularly in the −1 subsite, which will be detailed below.

Figure 4. Catalytic mechanism of TmCel12A. Here the two-step reaction typical of a retaining enzyme is depicted in a schematic diagram. The two glucose residues correspond to those bound to the −1 and +1 subsites. The acidic side chain of Glu116 adjacent to the nucleophile Glu134 is believed to maintain a negative charge at low pH values.

It is also worth noting that in every bound glucose residue, the 6-hydroxyl group consistently forms at least one direct hydrogen bond to the TmCel12A protein, no matter whether the enzyme is wild-type or a mutant. Specifically, the O6 atoms of the glucose residues from −3 to +2 are hydrogen bonded to Glu76 OE2, Arg60 NH1, Trp26 NE1/Glu231 OE1, Arg60 NH2, and Thr145 N. These O6 atoms no longer interact with the O2 atoms of their preceding glucose residues of the same chain and the O3 atoms of the neighboring chains as observed in crystalline cellulose.24 The O2 atoms of residues −2, −1, and +1 also form separate hydrogen bonds to Asn24 ND2, Glu134 OE2, and Thr145 O of the wild-type enzyme (see Fig. 3). In all TmCel12A-substrate complex structures, the two glucose residues bound to the −2 and −1 subsites of the enzyme are the most clearly visible.
The RMSD between the disaccharide models is 0.08 Å for the two wild-type enzyme complexes, and it ranges from 0.07 to 0.20 Å between the eight E134C mutant complex models. By contrast, a marked increase in RMSD is seen between the disaccharides in the wild-type and the mutant structures, ranging from 0.65 to 0.72 Å (Supporting Information Table SV). While all models of the −2 sugar residue are nearly identical to one another, those of the −1 residue show a distinct structural difference at the C1 atom. The −1 glucose residue has the β-anomeric configuration in the two wild-type complex structures but has the α-anomeric configuration in all eight complexes of the E134C mutant. For a D-glucopyranoside the β-anomer is more stable than the α-anomer. The two anomers are interconvertible in aqueous solution, with a β:α ratio of about 2:1. Probably the E134C mutant prefers binding the α-anomer at the −1 subsite, which corresponds to the substrate cleavage site. The high substrate concentration in the crystallization solution should have allowed binding to full occupancy. The O3 and O6 atoms remain hydrogen bonded to Arg60, Trp26 and Glu231 as in the wild-type structure, but the substitution of Glu134 by Cys134 abolished its hydrogen bond to the O2 atom. As shown in Figure 5(B), the original position of Glu134 OE2 is filled by a water molecule, which forms three hydrogen bonds to Cys134 SG, Trp173 NE1 and the sugar’s O2 atom. Interestingly, the sugar ring is slightly rotated and the C1 atom is thus shifted by about 1 Å toward the original nucleophile residue, now Cys134. The O1 atom of the sugar, which is an α-anomer, is about 3.5 Å from Cys134 SG. It is located half-way between Glu134 OE1 (not OE2) of the wild-type enzyme and the sugar’s C1 atom, and replaces the Glu134 OE1 atom in forming a hydrogen bond to Glu116.
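To put the 2:1 anomer ratio mentioned above in energetic terms, the following sketch converts an equilibrium ratio into populations and a standard free-energy difference through ΔG° = −RT ln K. It reads the ratio as β:α, which follows from the β-anomer being the more stable form. The temperature of 298.15 K is an assumption made for illustration and is not a value given in the text, and the function name is hypothetical.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)

def anomer_equilibrium(ratio_beta_to_alpha, temperature_k):
    """Return (fraction_beta, fraction_alpha, delta_g_kj_per_mol).

    delta_g is the standard free-energy change for alpha -> beta,
    from delta_g = -R * T * ln(K) with K = [beta] / [alpha].
    """
    k = ratio_beta_to_alpha
    fraction_beta = k / (1.0 + k)
    fraction_alpha = 1.0 / (1.0 + k)
    delta_g = -R * temperature_k * math.log(k) / 1000.0
    return fraction_beta, fraction_alpha, delta_g

# 2:1 is the ratio quoted in the text; 298.15 K is an assumed temperature.
fb, fa, dg = anomer_equilibrium(2.0, 298.15)
print(f"beta {fb:.1%}, alpha {fa:.1%}, dG(alpha->beta) = {dg:.2f} kJ/mol")
```

Under these assumptions the free-energy gap between the anomers in solution is small, under 2 kJ/mol, so interactions within the −1 subsite could plausibly select the minor α-anomer.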
In view of the double displacement mechanism of a retaining enzyme,15 the bound substrate models in the E134C complex structures may mimic the catalytic intermediate configuration.
Figure 5 The E134C-substrate complexes. (A) The enzyme-bound substrate molecules in the E134C crystals are superimposed and shown as thin stick models. The models with carbon atoms colored blue and green are from the soaked cellobiose and cellotetraose complexes. Those with yellow carbons are from the cocrystallized cellobiose complex. Note that all of the −1 sugars show the same α-anomeric configuration. (B) The E134C cellobiose-soaking crystal structure is superposed onto that of the wild-type cellotetraose complex. Carbon atoms in the E134C model are colored in pink and green for the protein and substrate, and those of the wild-type are in gray and cyan. Conserved hydrogen bonds in both complexes are shown as black dashed lines. New bonds in the mutant structure and a mediating water molecule are colored in magenta.
DISCUSSION
Two approaches are taken in structure determination of a protein crystal: experimental phasing and molecular replacement.25 Because the number of structures in the PDB is increasing rapidly, it is becoming more likely that a homologous structure can be found for studying a new protein. By aligning the protein sequences and substituting the side chains, a model can usually be created to yield some information about, for example, the active-site environment and inter-molecular interactions. The accuracy of the resulting model, however, depends on the extent of sequence identity. Although about two-thirds of structures in the PDB were solved by molecular replacement,26 isomorphous replacement and anomalous dispersion remain in wide use. One reason is that homologous proteins contain significant variations, especially in the loop regions, despite their common folds.
Other reasons can be conformational changes, oligomer formation, or different crystal packing, which may affect the accuracy of the Patterson function search. The amino-acid sequence of TmCel12A has less than 20% identity to those of other bacterial cellulases and less than 14% to the fungal enzymes (Supporting Information Table SVI). It turns out that the proteins share no more than a common fold and a few conserved active-site residues such as Glu134 and Glu231. Significant variations occur in most connecting loops between the β-strands and also in some β-strands in the two jelly-roll sheets. Consequently it is not surprising that our molecular replacement search failed to yield a correct solution. Because the catalytic nucleophile of TmCel12A had been identified to be Glu134 by sequence comparison, it was mutated to a cysteine for preparation of heavy-atom derivatives. As expected, a major site was located adjacent to each Cys134 side chain. In general, the active site of an enzyme is more easily identified from the amino-acid sequence and is also more likely to bind heavy atoms because some "active" functional groups must be present there. In the study of the Aspergillus niger enzyme (1KS5),14 the enzyme’s activity was inhibited by a palladium ion bound to the side chain of the nucleophile Glu116 in the active site, which showed clear electron density in the Fourier map. Another example is seen in the structure determination of hexaprenyl pyrophosphate synthase from Sulfolobus solfataricus, which is homologous to other prenyltransferases but also shows significant variations.27 In that study, Asp81 of the first aspartate-rich motif in the active site was mutated to Cys81 for binding to mercury-containing compounds. Thus, in addition to molecular replacement, experimental phasing by mutating an active-site residue for heavy-atom binding is a good alternative to try.
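The sequence-identity figures quoted above depend on how identity is counted over an alignment. As a minimal sketch, the function below takes two already aligned sequences and reports the percentage of identical residues among columns in which neither sequence has a gap. This denominator is only one of several conventions in use, and the text does not say which one was applied. The function name and the sequence fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Hypothetical aligned fragments, for illustration only (not real GH12 sequences).
seq_a = "MKT-AYIAKQR"
seq_b = "MRTWAY-GKHR"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Here six of the nine ungapped columns match. At identities below about 20%, as reported for TmCel12A, most aligned positions differ, which is consistent with the failure of the molecular replacement search described above.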
The structure of TmCel12A differs significantly from those of other GH12 enzymes. The two outermost β-strands of the inner sheet B extend and twist to dock onto the rim of the outer sheet A. When linked by these two β-strands (B8 and B9), the two β-sheets A and B are integrated into something like a flattened β-barrel. Although this appears to be a feature of TmCel12A, whether the barrel-like formation makes the structure more stable at higher temperatures than two individual β-sheets remains to be investigated. The loops connecting the β-strands vary in length and disposition, some apparently giving rise to different substrate affinity and specificity. The unique Loop A3-B3, which contains Arg60 and Tyr61, protrudes on one side of the active-site cleft. Arg60 forms direct hydrogen bonds with three sugar residues in the −2, −1, and +1 subsites, and Tyr61 provides a stacking interaction with the +2 sugar. In all eight other enzyme structures of the same family, an aromatic side chain (phenylalanine or tryptophan) from Loop B4-A4 occupies the space equivalent to that of the Tyr61 side chain. The residue corresponds to Gly233 in TmCel12A [Fig. 2(B)]. Unlike Tyr61, which stacks with the +2 sugar, the aromatic group is perpendicular to the sugar ring, making only weak van der Waals contacts. Aromatic side-chain stacking is an important means of binding to sugar, as observed in lectins and other proteins.28 The −2 sugar, which is sandwiched between Trp26 and Trp75, appears to be the most tightly bound residue owing to strong stacking interactions on both sides of the sugar ring. In addition, other direct and water-mediated hydrogen bonds, presumably taking over the original intra- and inter-strand hydrogen bonds in a cellulose fiber, also contribute to the enzyme’s affinity for an isolated strand of cellulose.
The α-anomer observed in the E134C-cellobiose and E134C-cellotetraose complexes, obtained either by soaking or by cocrystallization, may mimic the glycosyl-enzyme intermediate.6,15 The bound sugars have shorter chain lengths than expected for cellotetraose in the complex, but this may not be a result of cleavage because the mutant is inactive. Instead, they reflect the number of subsites in TmCel12A and their binding strengths. Judging from the electron densities and temperature factors, the bound sugar residues appear to be more stable in the −2 and −1 subsites than in the +1, +2, and −3 subsites. The sugar residues beyond −3 and +2 were most likely disordered rather than cleaved. Consequently, the E134C mutant tends to bind two molecules of cellotetraose instead of a single cellotetraose as observed in the wild-type complex. This suggests that the active-site environment provided by the substrate-binding cleft of the enzyme may favor a distorted geometry at the cleavage site, especially around the anomeric C1 carbon of the −1 residue. The enzyme also tends to release the sugar residues on the reducing end from the active site once the glycosyl-enzyme intermediate is formed, which should have an α-anomeric configuration in the −1 sugar. The reaction then proceeds by the attack of water, assisted by Glu231 (see Fig. 4), at the C1 carbon and yields a product with its C1 reverted to the β-anomeric configuration.
Enzymes from extremophiles have been studied extensively for their special properties. These characteristics make the enzyme highly useful in various applications, since industrial processes such as plant waste treatment usually involve high temperature and low pH. Thermostability of TmCel12A could be attributed in part to the longer β-strands B8 and B9, which associate with A6 of the other β-sheet.
Hydrophobic interactions that hold the two β-sheets together are also important for stability,11,12,15 although their relationship with the protein sequence awaits further study. Although the catalytic residues in the active site are conserved, and so are some interactions with the substrate such as stacking of Trp26 with the −2 sugar,9 the environments of the substrate-binding cleft in the GH12-family enzymes differ from one another because of large variations in the loop structures. These presumably determine substrate specificity and catalytic efficiency. The detailed enzyme-substrate interactions presented here should provide a basis for subsequent mutagenesis studies and protein-engineering projects.
ACKNOWLEDGMENTS
The authors are grateful to NSRRC for synchrotron beam-time allocations and data-collection assistance. The atomic coordinates and structure factors (codes 3AMH, 3AMM, 3AMN, 3AMP, and 3AMQ) have been deposited in the Protein Data Bank.
REFERENCES
1. Allgaier M, Reddy A, Park JI, Ivanova N, D’haeseleer P, Lowry S, Sapra R, Hazen TC, Simmons BA, VanderGheynst JS, Hugenholtz P. Targeted discovery of glycoside hydrolases from a switchgrass-adapted compost community. PLoS One 2010;5:e8812.
2. Bronnenmeier K, Kern A, Liebl W, Staudenbauer WL. Purification of Thermotoga maritima enzymes for the degradation of cellulosic materials. Appl Environ Microbiol 1995;61:1399–1407.
3. Liebl W, Ruile P, Bronnenmeier K, Riedel K, Lottspeich F, Greif I. Analysis of a Thermotoga maritima DNA fragment encoding two similar thermostable cellulases, CelA and CelB, and characterization of the recombinant enzymes. Microbiology 1996;142:2533–2542.
4. Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Nelson WC, Ketchum KA, McDonald L, Utterback TR, Malek JA, Linher KD, Garrett MM, Stewart AM, Cotton MD, Pratt MS, Phillips CA, Richardson D, Heidelberg J, Sutton GG, Fleischmann RD, Eisen JA, White O, Salzberg SL, Smith HO, Venter JC, Fraser CM. Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 1999;399:323–329.
5. Sulzenbacher G, Shareck F, Morosoli R, Dupont C, Davies GJ. The Streptomyces lividans family 12 endoglucanase: construction of the catalytic core, expression, and X-ray structure at 1.75 Å resolution. Biochemistry 1997;36:16032–16039.
6. Sulzenbacher G, Mackenzie LF, Wilson KS, Withers SG, Dupont C, Davies GJ. The crystal structure of a 2-fluorocellotriosyl complex of the Streptomyces lividans endoglucanase CelB2 at 1.2 Å resolution. Biochemistry 1999;38:4826–4833.
7. Gloster TM, Ibatullin FM, Macauley K, Eklöf JM, Roberts S, Turkenburg JP, Bjørnvad ME, Jørgensen PL, Danielsen S, Johansen KS, Borchert TV, Wilson KS, Brumer H, Davies GJ. Characterization and three-dimensional structures of two distinct bacterial xyloglucanases from families GH5 and GH12. J Biol Chem 2007;282:19177–19189.
8. Crennell SJ, Hreggvidsson GO, Nordberg Karlsson E. The structure of Rhodothermus marinus endoglucanase Cel12A, a highly thermostable family 12 endoglucanase, at 1.8 Å resolution. J Mol Biol 2002;320:883–897.
9. Crennell SJ, Cook D, Minns A, Svergun D, Andersen RL, Nordberg Karlsson E. Dimerization and an increase in active site aromatic groups as adaptations to high temperatures: X-ray solution scattering and substrate-bound crystal structures of Rhodothermus marinus endoglucanase Cel12A. J Mol Biol 2006;356:57–71.
10. Sandgren M, Shaw A, Ropp TH, Wu S, Bott R, Cameron AD, Ståhlberg J, Mitchinson C, Jones TA. The X-ray crystal structure of the Trichoderma reesei family 12 endoglucanase 3, Cel12A, at 1.9 Å resolution. J Mol Biol 2001;308:295–310.
11. Sandgren M, Gualfetti PJ, Paech C, Paech S, Shaw A, Gross LS, Saldajeno M, Berglund GI, Jones TA, Mitchinson C. The Humicola grisea Cel12A enzyme structure at 1.2 Å resolution and the impact of its free cysteine residues on thermal stability. Protein Sci 2003;12:2782–2793.
12. Sandgren M, Gualfetti PJ, Shaw A, Gross LS, Saldajeno M, Day AG, Jones TA, Mitchinson C. Comparison of family 12 glycoside hydrolases and recruited substitutions important for thermal stability. Protein Sci 2003;12:848–860.
13. Sandgren M, Berglund GI, Shaw A, Ståhlberg J, Kenne L, Desmet T, Mitchinson C. Crystal complex structures reveal how substrate is bound in the −4 to the +2 binding sites of Humicola grisea Cel12A. J Mol Biol 2004;342:1505–1517.
14. Khademi S, Zhang D, Swanson SM, Wartenberg A, Witte K, Meyer EF. Determination of the structure of an endoglucanase from Aspergillus niger and its mode of inhibition by palladium chloride. Acta Crystallogr 2002;D58:660–667.
15. Sandgren M, Ståhlberg J, Mitchinson C. Structural and biochemical studies of GH family 12 cellulases: improved thermal stability, and ligand complexes. Prog Biophys Mol Biol 2005;89:246–291.
16. Otwinowski Z, Minor W. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol 1997;276:307–326.
17. Terwilliger TC, Berendzen J. Automated MAD and MIR structure solution. Acta Crystallogr 1999;D55:849–861.
18. Terwilliger TC. Maximum likelihood density modification. Acta Crystallogr 2000;D56:965–972.
19. Terwilliger TC. Automated main-chain model building by template matching and iterative fragment extension. Acta Crystallogr 2003;D59:38–44.
20. Jones TA, Zou JY, Cowan SW, Kjeldgaard M. Improved methods for the building of protein models in electron density maps and the location of errors in these models. Acta Crystallogr 1991;A47:110–119.
21. Brunger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL. Crystallography and NMR system (CNS): a new software system for macromolecular structure determination. Acta Crystallogr 1998;D54:905–921.
22. DeLano WL. The PyMOL molecular graphics system. USA: DeLano Scientific LLC; 2008.
23. Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem 1981;34:167–339.
24. Nishiyama Y, Langan P, Chanzy H. Crystal structure and hydrogen-bonding system in cellulose I-beta from synchrotron X-ray and neutron fiber diffraction. J Am Chem Soc 2002;124:9074–9082.
25. Adams PD, Afonine PV, Grosse-Kunstleve RW, Read RJ, Richardson JS, Richardson DC, Terwilliger TC. Recent developments in phasing and structure refinement for macromolecular crystallography. Curr Opin Struct Biol 2009;19:566–572.
26. Long F, Vagin AA, Young P, Murshudov GN. BALBES, a molecular-replacement pipeline. Acta Crystallogr 2008;D64:125–132.
27. Sun HY, Ko TP, Kuo CJ, Guo RT, Chou CC, Liang PH, Wang AHJ. Homodimeric hexaprenyl pyrophosphate synthase from the thermoacidophilic crenarchaeon Sulfolobus solfataricus displays asymmetric subunit structures. J Bacteriol 2005;187:8137–8148.
28. Rudiger H, Gabius HJ. Plant lectins: occurrence, biochemistry, functions and applications. Glycoconj J 2001;18:589–613.