Structural and Functional Studies of Glycoside Hydrolase

Comprehensive Summaries of Uppsala Dissertations
from the Faculty of Science and Technology 819
Structural and Functional Studies of
Glycoside Hydrolase Family 12
Enzymes from Trichoderma reesei and
other Cellulolytic Microorganisms
BY
MATS SANDGREN
ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2003
Dissertation for the Degree of Doctor of Philosophy in Molecular Biology presented at Uppsala
University in 2003.
ABSTRACT
Sandgren, M. 2003. Structural and functional studies of glycoside hydrolase family 12 enzymes
from Trichoderma reesei and other cellulolytic microorganisms. Acta Universitatis Upsaliensis.
Comprehensive Summaries of Uppsala Dissertations from the faculty of Science and Technology
819. 68 pp. Uppsala. ISBN 91-554-5562-X
Cellulose is the most abundant organic compound on earth. A wide range of highly specialized
microorganisms have evolved that utilize cellulose as a carbon and energy source. Enzymes called
cellulases, produced by these cellulolytic organisms, perform the major part of cellulose
degradation.
In this study the three-dimensional structures of four homologous glycoside hydrolase family
12 cellulases are presented: three fungal enzymes, Humicola grisea Cel12A, Hypocrea
schweinitzii Cel12A, and Trichoderma reesei Cel12A, and one bacterial enzyme, Streptomyces sp.
11AG8 Cel12A. The structural and biochemical information gathered from these and 15 other GH
family 12 homologues has been used to design variants of these enzymes. These variants have
been characterized biochemically, which has made it possible to identify the positions and types
of mutations responsible for the biochemical differences between the homologous enzymes, e.g.,
in thermal stability and activity. The three-dimensional structures of two T. reesei Cel12A
variants, in which the mutations have a significant impact on the stability or the activity of the
enzyme, have been determined. Four ligand complex structures of the wild-type (WT) H. grisea
Cel12A enzyme, which have made it possible to characterize the interactions between substrate
and enzyme, have also been determined.
The structural and biochemical studies of these closely related GH family 12 enzymes, and
their variants, have provided insight into how specific residues contribute to protein thermal
stability and enzyme activity. In the future this knowledge can serve as a structural toolbox
for designing Cel12A enzymes with specific properties and features by introducing subtle changes
in structural components of the enzymes. Such enzymes can then be utilized to develop new
industrial products or to fine-tune enzymes in already existing applications.
Key Words: Cellulase, endoglucanase, thermal stability, homologues, protein structure, ligand
complex, X-ray crystallography
Mats Sandgren, Department of Cell and Molecular Biology, Uppsala University, Biomedical
Centre, Box 596, SE-751 24 Uppsala, Sweden. E-mail: [email protected]
©Mats Sandgren 2003
ISSN: 1104-232X
ISBN: 91-554-5562-X
Printed in Sweden by Uppsala University, Universitetstryckeriet, Uppsala 2003
To my lovely family
TABLE OF CONTENTS
PAPERS INCLUDED IN THE THESIS
ABBREVIATIONS
1 INTRODUCTION
2 BACKGROUND
2.1 Wood
2.1.1 Cellulose
2.1.2 Other plant cell wall components
2.2 Cellulolytic organisms
2.2.1 Trichoderma reesei
2.2.2 Hypocrea schweinitzii
2.2.3 Humicola grisea
2.2.4 Streptomyces sp. 11AG8
2.2.5 Industrial applications of glycoside hydrolases
2.3 Cellulases
2.3.1 Classification of cellulases
2.3.2 Domain structure organization of cellulases
2.3.3 Hydrolytic mechanism of cellulases
2.4 The cellulolytic system of Trichoderma reesei
2.4.1 Induction of cellulases
2.4.2 Synergy between cellulases
2.4.3 Three-dimensional Structures of T. reesei cellulases
3 METHODS
3.1 X-ray crystallography
3.1.2 Phase determination
3.1.3 Model building and structure refinement
3.2 Protein crystallization
4 RESULTS AND DISCUSSION
4.1 Aim of thesis
4.2 The Trichoderma reesei Cel12A structure
4.2.1 Crystallization and structure determination
4.2.2 Protein structure
4.2.3 Substrate-binding cleft
4.3 Thermal stability and activity of GH family 12 enzymes
4.3.1 Thermal stability
4.3.2 Relative enzyme activity
4.3.3 Structural features affecting stability
4.3.4 Discussion
4.4 The Humicola grisea Cel12A structure, and stabilizing cysteines
4.4.1 Thermal stability
4.4.2 Relative enzyme activity
4.4.3 Protein structures
4.4.4 Discussion
4.5 H. grisea Cel12A complex structures
4.5.1 Overall protein structures
4.5.2 Oligosaccharide complexes
4.5.3 Protein carbohydrate interactions
4.5.4 Transglycosylation
5 CONCLUDING REMARKS
6 ACKNOWLEDGEMENTS
7 REFERENCES
8 APPENDICES
PAPERS INCLUDED IN THE THESIS
This thesis is based upon the following original publications and manuscripts, which will
be referred to in the summary by their Roman numerals:
I. Sandgren, M., Shaw, A., Ropp, T. H., Wu, S., Bott, R., Cameron, A. D., Ståhlberg,
J., Mitchinson, C. and Jones, T. A. (2001). The X-ray crystal structure of the
Trichoderma reesei family 12 endoglucanase 3, Cel12A, at 1.9 Å resolution. J.
Mol. Biol. 308: 295-310
II. Sandgren, M., Gualfetti, P. J., Shaw, A., Gross, L. S., Saldajeno, M., Day, A. G.,
Jones, T. A. and Mitchinson, C. (2003). Comparison of family 12 glycoside
hydrolases and recruited substitutions important for thermal stability. Protein Sci.
12: 848-866
III. Sandgren, M., Gualfetti, P. J., Paech, C., Paech, S., Shaw, A., Gross, L., Saldajeno,
M., Berglund, G. I., Jones, T. A. and Mitchinson, C. (2003). The Humicola grisea
Cel12A enzyme structure at 1.2 Å resolution, and the recruitment of residues
important for thermal stability of glycoside hydrolase family 12 enzymes.
Submitted.
IV. Sandgren, M., Shaw, A., Gualfetti, P. J., Gross, L. G., Berglund, G. I., Ståhlberg,
J., Kenne, L., Driguez, H., Jones, T. A. and Mitchinson, C. (2003). Crystal complex
structures reveal how a cellulose chain is bound in the 35 Å long substrate-binding
cleft of Humicola grisea Cel12A, spanning from the -4 to the +2 binding site of
the enzyme.
In manuscript.
Reprints of the articles were made with permission from the copyright holders.
ABBREVIATIONS
CBH              cellobiohydrolase
CBM              cellulose binding module
CD               circular-dichroism
cd               catalytic domain
DP               degree of polymerization
EG               endoglucanase
Fo               observed structure factor amplitude
Fc               calculated structure factor amplitude
GH               glycoside hydrolase
G2               cellobiose
G4               cellotetraose
G5               cellopentaose
G2SG2            thio-linked cellobioside
Glc              glucose
NAG              N-acetyl-glucose-amine
HCA              hydrophobic cluster analysis
kDa              kilo Daltons
MAD/SAD          multiple/single anomalous dispersion
MIR/SIR          multiple/single isomorphous replacement
mme              mono-methyl-ether
MR               molecular replacement
MW               molecular weight
NCS              non-crystallographic symmetry
NMR              nuclear magnetic resonance
PEG              polyethylene glycol
RMSD             root-mean-square deviation
Rmerge           $\sum_{hkl}\sum_{i} |I - \langle I \rangle| \,/\, \sum_{hkl}\sum_{i} |I|$
Rfactor          $\sum_{hkl} \big| |F_{obs}| - |F_{calc}| \big| \,/\, \sum_{hkl} |F_{obs}|$
Tm               mid-point of thermal denaturation
H. grisea        Humicola grisea
H. jecorina      Hypocrea jecorina
H. schweinitzii  Hypocrea schweinitzii
S. sp. 11AG8     Streptomyces sp. 11AG8
S. lividans      Streptomyces lividans
T. reesei        Trichoderma reesei
1 INTRODUCTION
______________________________________________________________________
Cellulose is the most abundant polymer on earth. It has been estimated that the annual
production of cellulose, through photosynthesis, is 40 billion tons (Coughlan 1985).
This accounts for roughly half of the annual CO2 fixation, which is estimated to be 15%
of the total atmospheric carbon on earth (Gottschalk 1988). Cellulose is predominantly
produced by terrestrial plants and marine algae, but can also be produced by other
organisms such as bacteria. Cellulose is mainly used as structural reinforcement of plant
cell walls. The cellulose content in the wood of higher plants like trees can be as much
as 50%.
Due to the large amounts of cellulose in the biosphere, a wide range of highly
specialized cellulose-degrading organisms have evolved that utilize cellulose as a carbon
and energy source. These organisms play a key role in the recycling of cellulose, and
thereby maintain the carbon cycle on earth. The spontaneous degradation of cellulose in
nature is extremely slow. Enzymes called cellulases perform the major part of cellulose
degradation. These are produced by the cellulolytic organisms, and have the capability
of hydrolyzing highly ordered crystalline cellulose into shorter cellooligomers and
glucose. Cellulases are produced by a great number of organisms such as plants, plant
pathogens and cellulolytic microorganisms, both bacterial and fungal. Among fungi and
bacteria, there exist species that secrete complete sets of cellulolytic enzymes that
synergistically have the capability to degrade highly crystalline cellulose completely,
e.g., Trichoderma reesei, Clostridium thermocellum, and Thermobifida fusca.
Since the early 1950's, scientists have tried to shed some light on how
microorganisms, mainly fungi, manage the difficult task of degrading cellulose. Among
the best-studied cellulolytic systems to date is that of the aerobic filamentous soft-rot
fungus Trichoderma reesei. In this thesis, which presents further investigations of the
properties and the function of one class of cellulases, the cellulolytic system of
T. reesei will be used as the point of comparison. The cellulolytic system of T. reesei
consists of a large set of different cellulases with very different characteristics. Some
degrade the cellulose chains from the ends, some make internal cuts, some are
expressed in high quantities, and some at barely detectable levels. The reason why T.
reesei has so many different cellulases is not fully understood.
Cellulases have become useful in a wide range of commercial applications over the last
few decades, e.g., in the textile, paper and pulp, and detergent industries. These are
applications where one wants to modify cellulose fibers (usually cotton), to get a better
product. The interest in finding new commercial applications for cellulases has initiated
world-wide extensive research programs on cellulases to identify new cellulases in
nature that have suitable characteristics for the considered applications and processes
(van Solingen et al. 2001). In many cases where a useful cellulase has been identified,
the enzyme has also been genetically modified to get shifted or new biochemical
properties that better fit the considered application.
Cellulose has for many years been considered a potential starting material for
ethanol production. Unlike the fossil fuels in use today, cellulose is a renewable
resource. To produce ethanol on a huge scale, the carbon source must be cheap, easily
accessible, and abundant. An additional advantage of using ethanol as fuel is that the
carbon dioxide produced when the bio-fuel is consumed is taken up by plants
and re-assimilated into cellulose. The increased carbon dioxide level in the atmosphere
is the main reason for the so-called greenhouse effect.
When producing ethanol from cellulose, the first step is to convert cellulose into
glucose, which is then used as the carbon source in ethanol fermentation. It is in this first
step that one could potentially utilize cellulases to hydrolyze cellulose instead of
treating it with acid, as is done today. To date, however, this process has been too slow,
and the yield too low, to be economically feasible. Thus there exists a strong demand for
research on cellulases, as well as on the other steps in the conversion of cellulose into
ethanol.
X-ray protein crystallography has over the last couple of decades become a powerful
tool for studying the structure and function of proteins at atomic resolution. The first
cellulase X-ray structure was published in 1990 (Rouvinen et al. 1990). Since then a
wide range of different cellulase structures have been published.
In the present study (Papers I-IV), detailed atomic structures of several members
of one class of cellulases (GH family 12) will be presented, as well as examples of
how this structural knowledge, in combination with sequence and biochemistry data,
has been used to genetically modify proteins so that they better fit their industrial
applications.
2 BACKGROUND
______________________________________________________________________
2.1 Wood
Wood is built up of elongated plant cells that consist of many different components.
The dominating ones are cellulose, hemicellulose, and lignin (Mohr and Schopfer
1995). A schematic picture of a wood cell is shown in Figure 1. The primary and
secondary cell walls are built up of cellulose, hemicellulose, and lignin, whereas the
middle lamella consists mainly of lignin (Sjöström 1993). The structure of the wood
cell, and the ratio of its different components, vary considerably depending on the plant
species the cell comes from, the cell type, and the developmental stage. The approximate
composition of the three most common components in wood is: 35-50% cellulose, 20-30% hemicellulose, and 20-30% lignin (Sjöström 1993).
Figure 1. A schematic picture of the
structural organization of a wood cell.
The cell consists of the middle lamella
(ML), the primary cell wall (P), the
secondary cell wall (S), and the warty
layer (W). The secondary cell wall S is
divided into the outer (S1), middle
(S2), and inner layers (S3). Reprinted
from (Sjöström 1993).
2.1.1 Cellulose
Cellulose, the major structural component in the plant cell wall, was discovered in
plants in 1837 (Hon 1994). The annual production of cellulose in the biosphere has
been reported to be 4 × 10^10 tons (Coughlan 1985), and the total amount of cellulose on
earth has been estimated to be 7 × 10^11 tons. The half time for spontaneous degradation
of cellulose has been estimated to be 4.7 × 10^6 years (Wolfenden et al. 1998). Cellulose
is not only synthesized by plants; it is also synthesized by a wide range of other
organisms such as bacteria, protists, algae, and fungi.
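The three quantities quoted above can be related with a little arithmetic. The following is a minimal sketch, not a calculation taken from the cited sources: it assumes a steady-state cellulose pool and first-order kinetics for spontaneous degradation, and all function names are illustrative.

```python
import math

# Values quoted in the text above
ANNUAL_PRODUCTION_TONS = 4e10   # annual production (Coughlan 1985)
TOTAL_CELLULOSE_TONS = 7e11     # estimated total amount on earth
HALF_TIME_YEARS = 4.7e6         # spontaneous degradation (Wolfenden et al. 1998)


def turnover_time_years(pool_tons: float, flux_tons_per_year: float) -> float:
    """Mean residence time of cellulose, assuming a steady-state pool."""
    return pool_tons / flux_tons_per_year


def first_order_rate_constant(half_time_years: float) -> float:
    """Rate constant k (per year) for first-order decay: k = ln(2) / t_half."""
    return math.log(2) / half_time_years


def fraction_remaining(t_years: float, half_time_years: float) -> float:
    """Fraction of material left after t years of first-order decay."""
    return 0.5 ** (t_years / half_time_years)


turnover = turnover_time_years(TOTAL_CELLULOSE_TONS, ANNUAL_PRODUCTION_TONS)
k_spontaneous = first_order_rate_constant(HALF_TIME_YEARS)
left_uncatalyzed = fraction_remaining(turnover, HALF_TIME_YEARS)

print(f"turnover time: {turnover:.1f} years")
print(f"spontaneous rate constant: {k_spontaneous:.2e} per year")
print(f"fraction surviving one turnover time uncatalyzed: {left_uncatalyzed:.6f}")
```

Under these assumptions the pool turns over in well under two decades, whereas spontaneous hydrolysis alone would leave practically all of it intact over the same period, which illustrates why enzymatic degradation dominates.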
Upon polymerization of a cellulose chain from glucose units, the C1 carbon (the
anomeric carbon) of a glucose residue is covalently linked to the O4 hydroxyl oxygen
of the next glucose residue. Each added glucose residue gives the net release of one water
molecule. The added glucose can be linked to the growing glucan chain with two
different configurations at the anomeric carbon, in either the α- or the β-position (i.e.,
in axial or equatorial positions), resulting in either an α-1,4 or a β-1,4 glycosidic bond
(Figure 2). The polymerization of a glucan chain where the glucose residues are linked
with α-1,4 bonds produces starch, whereas a glucan polymer linked with β-1,4
glycosidic bonds produces cellulose. The end of the glucan chain with an anomeric
carbon that is not linked to another glucose residue is referred to as the reducing end of
the polymer. The other end of the polymer is the non-reducing end.
Figure 2. Two β-D-glucose units linked with a β-1,4 bond. The end of the molecule
where the C1 carbon has a free hydroxyl, shown here in the equatorial position, is
called the reducing end of the cellulose chain. A glucose molecule with a hydroxyl
group in the axial position at C1 forms an α-1,4 linkage to the next glucose residue if the
chain polymerization continues from this.
Cellulose is a linear homo-polysaccharide consisting of anhydrous glucose units that are
linked by β-1,4-glycosidic bonds. Each glucose unit is rotated 180° relative to its two
neighbors, thus the smallest repetitive unit in the cellulose polymer is cellobiose (Figure
3). Individual cellulose chains align side by side with other cellulose chains, with a
network of hydrogen bonds between neighboring glucose units to form sheets of
cellulose (Blackwell et al. 1978). These sheets stack on top of one another, due to
hydrophobic interactions and van der Waals forces, thus forming highly crystalline
cellulose micro-fibrils, of different diameters depending on the source of the cellulose
(Nieduszynski and Preston 1970). The length of the cellulose chains in the micro-fibril,
i.e., the degree of polymerization (DP), varies from 2,000 to 15,000 glucose units,
depending on the source. The highly crystalline regions of cellulose in the plant cell wall
are separated by less ordered regions, which are called amorphous cellulose.
The crystallinity of cellulose varies from 50 to 90%, also depending on the source (Hon
1994).
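To give a feeling for what a DP of 2,000 to 15,000 means, the molar mass of a single chain can be estimated from the fact that each glycosidic bond formed releases one water molecule. The sketch below is illustrative only: the molar masses of glucose and water are standard textbook values rather than figures from this thesis, and the function names are hypothetical.

```python
# Standard molar masses in g/mol (assumed values, not taken from the thesis)
GLUCOSE_MASS = 180.156
WATER_MASS = 18.015


def chain_molar_mass(dp: int) -> float:
    """Molar mass (g/mol) of a linear glucan chain of `dp` glucose residues.

    Forming the dp - 1 glycosidic bonds releases dp - 1 water molecules.
    """
    if dp < 1:
        raise ValueError("dp must be at least 1")
    return dp * GLUCOSE_MASS - (dp - 1) * WATER_MASS


def cellobiose_repeats(dp: int) -> int:
    """Number of complete cellobiose repeating units in a chain of `dp` residues."""
    return dp // 2


# Cellobiose itself (DP 2) and the two ends of the DP range quoted in the text
for dp in (2, 2_000, 15_000):
    mass_kda = chain_molar_mass(dp) / 1000.0
    print(f"DP {dp:>6}: {mass_kda:9.1f} kDa, "
          f"{cellobiose_repeats(dp)} cellobiose repeats")
```

With these values a single chain spans roughly 0.3 to 2.4 MDa, i.e., one cellulose chain can be far heavier than the enzymes that degrade it.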
Figure 3. a) A short cellulose chain built up of eight glucose residues. Each glucose
residue in cellulose is rotated 180° compared to its neighbor, thus making cellobiose
the smallest repetitive unit in cellulose. b) The picture shows the hydrogen bonds
(dashed lines) involved in forming the cellulose structure, within a cellulose chain, and
between two adjacent chains. There are two intra-molecular hydrogen bonds between
two neighboring glucose residues in a cellulose chain; one is formed between the O3-H
atom of a residue to the O5 oxygen of the next residue, and the second one from the O2-H
atom to the O6 oxygen within the same residue. There is one intermolecular
hydrogen bond formed between the glucose molecules in adjacent cellulose chains, i.e.,
from O2-H atoms of glucose molecules in one chain to the O3-H atom of glucose
molecules in the adjacent chain. There are no hydrogen bonds to the cellulose chains in
the planes above or below.
2.1.2 Other plant cell wall components
There are several other polymeric compounds in the plant cell wall in addition to
cellulose. The composition of these compounds varies greatly for different plant
sources. The two most common ones are hemicellulose, and lignin.
Hemicellulose
Hemicellulose is defined as the fraction of the cell wall that can be extracted with alkali
(Mohr and Schopfer 1995). It is a heterogeneous polysaccharide composed of a wide
range of different polysaccharides, which usually are heavily branched with side
groups. The most common building blocks in hemicellulose are O-acetyl
galactoglucomannan and arabino 4-O-methylglucuronoxylan in softwood, whereas
acetyl 4-O-methylglucuronoxylan and glucomannan are the most common building
blocks in hardwood (Sjöström 1993). The individual chains in hemicellulose are shorter
than in cellulose, with a DP of 100-200.
Lignin
Lignin is a heterogeneous aromatic polymer, where the main aromatic components are
coniferyl, coumaryl, and sinapyl alcohol. The different building blocks of lignin are
linked to one another in an apparently random fashion, giving lignin a very
complicated structure (Sjöström 1993; Mohr and Schopfer 1995). Lignin is covalently
bound to side groups on different hemicelluloses, forming a complex matrix that
surrounds the cellulose micro-fibrils. The lignin matrix gives the plant cell wall
strength, and also protects the cell wall from attack by cellulolytic microorganisms.
2.2 Cellulolytic organisms
In nature there are many microorganisms, fungal and bacterial, that produce enzymes
that are capable of catalyzing the hydrolysis of cellulose. These microorganisms can be
found in plant debris and soil, i.e., where degradation of plant material takes place
(Béguin and Aubert 1994; Tomme et al. 1995; Bayer et al. 1998). The cellulolytic
organisms can be sorted into two different subcategories depending on how the
cellulolytic microorganism organizes its enzymes.
One class of cellulolytic microorganisms has cellulolytic enzymes that are
organized into multi-enzyme complexes called cellulosomes. In these complexes the
individual enzyme molecules are anchored onto a common scaffold. Several different
types of enzymes, with different catalytic specificities, e.g., endoglucanases
and cellobiohydrolases (CBHs), can be attached to this scaffold. A typical bacterial
cellulosome is composed of 50 protein molecules with a total molecular weight (MW) of
2-6 megadaltons (MDa). An example of a cellulolytic organism in this class is the
bacterium Clostridium thermocellum.
The second class of cellulolytic organisms produces enzymes that are not attached
to one another, and act individually on cellulose. However, the different types of enzymes
work cooperatively when hydrolyzing cellulose, and thereby gain strong synergy
effects. Examples of fungi from this class are T. reesei and Humicola grisea, and of
bacteria, Streptomyces lividans and Cellulomonas fimi. Bacteria and fungi are not the
only cellulase-producing organisms that exist. Cellulases have also been isolated from
blue mussel (Xu et al. 2000), termites (Watanabe et al. 1998), crayfish (Byrne et al.
1999), and plants, e.g., Arabidopsis (Williamson et al. 2002).
Most cellulolytic microorganisms lack efficient ligninase systems, and either cannot
degrade lignin or do so only with difficulty. Only some basidiomycetes (white-rot fungi)
have such efficient systems. The complex nature of lignin makes its direct degradation
by enzymes a difficult task. The degradation of lignin is less well characterized, and
there are conflicting opinions on how the different lignin-degrading enzymes act and
cooperate. Most likely aromatic radicals, produced by extracellular peroxidases from
the lignin-degrading organism (Bourbonnais and Paice 1990), are involved.
2.2.1 Trichoderma reesei
The fungus Trichoderma reesei was first isolated by the US Army during World War II,
on the Solomon Islands. The US Army had huge problems with deterioration of their
cotton materials, e.g., in tents and parachutes, and the isolation of T. reesei was a result
of their effort to find a solution (Reese et al. 1950; Reese 1976). T. reesei is a soft-rot
fungus that belongs to the deuteromycetes. It is a filamentous fungus that apparently
lacks a ligninase system. No sexual cycle has so far been observed for this fungus.
The strain of T. reesei that is considered to be the wild type (WT) is called QM6a.
T. reesei was originally named Trichoderma viride. This name was later changed to
Trichoderma reesei when it was discovered that the T. viride QM6a strain was
morphologically different from other T. viride strains (Simmons 1977). It was therefore
described as a new fungal species, and was named reesei after the scientist who first
isolated the strain. It was later reported that the T. reesei QM6a strain could not be
distinguished from the type strain of another Trichoderma species, Trichoderma
longibrachiatum (Bisset 1984). More recently it has been shown that T. reesei is a clonal
derivative of Hypocrea jecorina (Meyer et al. 1992; Kuhls et al. 1996), and that H. jecorina
is an ascomycete with a well-defined sexual cycle.
2.2.2 Hypocrea schweinitzii
It was shown at the same time that the sexual species of Hypocrea, H. schweinitzii and
H. jecorina, are two genetically clearly distinct ascomycete species (Kuhls et al.
1996). In this study we will compare the biochemical properties of one type of cellulase
from two of these ascomycete species, T. reesei and H. schweinitzii. To avoid
confusion, the names Trichoderma reesei and Hypocrea schweinitzii will be used
throughout this thesis when these two ascomycetes are discussed.
2.2.3 Humicola grisea
The fungus Humicola grisea var. thermoidea is a thermophilic mitosporic ascomycete.
H. grisea has an optimal growth temperature of 40°C, and a maximum at 58°C
(Maheshwari et al. 2000). The fungus produces a set of different cellulases that are all
highly thermoresistant, and that synergistically are capable of hydrolyzing highly
crystalline cellulose. The cellulolytic system of H. grisea resembles that of T. reesei to a
large extent, with cellulases belonging to the same cellulase families in both fungal species.
2.2.4 Streptomyces sp. 11AG8
The bacterium Streptomyces sp. 11AG8 is an alkaliphilic actinomycete. The species
was discovered in mud samples from East African soda lakes, in a search in these
extreme environments for novel alkaliphilic species of cellulase-producing bacteria (van
Solingen et al. 2001). The strain has an optimal pH range for growth between pH 7.4
and pH 10.5, and a temperature range for growth between 20° and 40°C. The closest
Streptomyces neighbor of the 11AG8 strain is Streptomyces thermoviolaceus, which has
a sequence identity of 97.2% in its 16S rDNA sequence (van Solingen et al. 2001).
Two other actinomycete species that are considered to be close neighbors to the 11AG8
strain are Streptomyces lividans and Streptomyces rochei.
2.2.5 Industrial applications of glycoside hydrolases
Glycoside hydrolases (GHs) from both fungal and bacterial sources are widely used in
industrial applications. Cellulases have been used in the textile industry to
enzymatically produce the effect of stone-washing on jeans (van Solingen et al. 2001),
and as agents in detergents to enzymatically remove short fibers on the textile surface
(depilling). In the juice and wine industries, pectinases and cellulases are used for
maceration and clarification of the beverages. In the animal feed industry the addition
of GHs, mainly β-glucanases and xylanases, has increased the digestibility of the feed
for the animals. In the paper and pulp industries xylanases and mannanases are added in
the pulp bleaching processes.
2.3 Cellulases
The crystalline nature of the micro-fibrils in cellulose makes the spontaneous
degradation of cellulose in nature extremely slow, and it also makes cellulose highly
resistant to enzymatic degradation. In spite of this, there is a yearly breakdown and
recycling of cellulose accounting for 15% of the total atmospheric carbon (Gottschalk
1988). A class of enzymes called cellulases performs the major part of this degradation.
These enzymes have the capability of catalyzing the hydrolysis of the β-1,4-glycosidic
bond between glucose units in cellulose. The cellulases are produced by a great number
of organisms such as plants, plant pathogens and cellulolytic microorganisms, both
bacterial and fungal (Béguin and Aubert 1994; Tomme et al. 1995; Bayer et al. 1998).
Many cellulose-degrading organisms overcome the resistance of the highly crystalline
cellulose substrate by secreting sets of different types of cellulolytic enzymes, e.g.,
endo- and exo-cellulases. These different enzymes act synergistically, enabling a more
rapid and efficient hydrolysis of cellulose (Henrissat et al. 1985; Nidetzky et al. 1994).
Cellulases are GHs, and can be found in at least 12 families of this very large group of
enzymes.
2.3.1 Classification of cellulases
The cellulases form two general classes: the endoglucanases (EGs, EC 3.2.1.4) and the
cellobiohydrolases (CBHs, EC 3.2.1.91). The EGs hydrolyze internal β-1,4-glycosidic
bonds in the cellulose polymer, and in less well-ordered “amorphous” regions within the
cellulose micro-fibril. The CBHs are believed to work processively on the free ends of
cellulose polymer chains, degrading the polymer from amorphous regions into the
crystalline regions of the cellulose micro-fibrils, successively liberating cellobiose and
small saccharides like cellotriose and cellotetraose from the cellulose chain (Sprey and
Bochem 1992). It is debated, though, whether CBHs can be called exo-cellulases or not,
since recent activity measurements indicate that there is no absolute requirement for
cellulose chain ends. The CBHs may be able to perform an internal cut in the middle of
a cellulose chain, creating two new chain ends from which they then can proceed
(Armand et al. 1997; Boisset et al. 2000).
According to the enzyme nomenclature of CBHs, they act from the non-reducing
end of the cellooligomer chain. Recent structural and functional studies do not support
this. Rather, it is suggested that the directionality may differ. For example, the two CBHs
from T. reesei, Cel6A and Cel7A, seem to work from opposite chain ends of the
cellooligomer. Cel6A works preferentially from the non-reducing end (Koivula et al.
1998a), while Cel7A works from the reducing end (Divne et al. 1998). Many cellulases
have cellulose-binding modules (CBMs) that facilitate their adsorption onto crystalline
cellulose, bringing the catalytic domains physically close to their site of action
(Reinikainen et al. 1992; Linder and Teeri 1996).
Glycoside hydrolase families
The classification of GHs and cellulases has changed several times during the years that
research has been carried out on these enzymes. As new GHs were discovered during
the 1980s and early 1990s, and as new activity experiments were performed, the older
classification systems of the different GHs became increasingly inadequate.
The new classification that was adopted for GHs is based on hydrophobic cluster
analysis (HCA) (Gaboriaud et al. 1987). In this system the different enzymes are
classified into structurally related GH families based on similarities in the distribution
of hydrophobic amino acids in their sequences. It was Bernard Henrissat who first
applied the new HCA classification system to the GHs. He classified the then known
cellulases into different GH families named with one-letter codes (A, B, C, …)
(Henrissat et al. 1989; Henrissat and Davies 1997; Henrissat 1998). When the number
of GH families grew, the one-letter codes of the families had to be replaced by numbers.
Today there exist more than 2,500 known protein sequences of GHs, which
have been classified into at least 90 different GH families. The classification of the
different GHs can be monitored at the CAZy web page (URL: http://afmb.cnrs-mrs.fr/~cazy/CAZY/index.html) (Coutinho and Henrissat 1999).
The cellulases are placed into a GH family according to the catalytic core of the
enzyme. For example, T. reesei CBH I has a catalytic core that has been classified into
GH family 7. The enzyme is therefore called T. reesei Cel7A: Cel because it is a
cellulase, 7 for the GH family, and A because it is the first family 7 enzyme reported
from this organism. To date cellulases have been assigned to at least 11 of the known
GH families: 5-9, 12, 26, 44, 45, 48, and 61.
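The naming rule described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and introduced here, and the sketch assumes a single-letter suffix (A, B, C, …) assigned in the order in which enzymes of a family were reported for an organism.

```python
import re


def cellulase_name(gh_family: int, order_reported: int) -> str:
    """Build a systematic name such as 'Cel7A'.

    gh_family      -- GH family number of the catalytic core (e.g. 7)
    order_reported -- 1 for the first enzyme of that family reported from
                      the organism, 2 for the second, and so on
    """
    if gh_family < 1 or not 1 <= order_reported <= 26:
        raise ValueError("family must be >= 1 and order within 1..26")
    letter = chr(ord("A") + order_reported - 1)
    return f"Cel{gh_family}{letter}"


def parse_cellulase_name(name: str) -> tuple[int, str]:
    """Split a name such as 'Cel12A' into (GH family number, letter)."""
    match = re.fullmatch(r"Cel(\d+)([A-Z])", name)
    if match is None:
        raise ValueError(f"not a systematic cellulase name: {name!r}")
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    print(cellulase_name(7, 1))            # T. reesei CBH I
    print(parse_cellulase_name("Cel12A"))  # family and letter of EG III
```

The organism name is kept outside the identifier, as in the text ("T. reesei Cel7A"), since the same systematic name can occur in several organisms.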
2.3.2 Domain structure organization of cellulases
Most fungal cellulases are organized in two structurally independent domains: a
catalytic core module, and a cellulose-binding module (CBM). These two domains are
usually interconnected via a short flexible linker. The domain structure organization of
T. reesei cellulases has recently been described in a review (Koivula et al. 1998b).
The catalytic core module
The catalytic core module is the part of the cellulase where the hydrolysis of the
cellulose chain takes place. This domain is the largest part of the enzyme. It varies
greatly in size between different cellulases, e.g., the catalytic core modules of T. reesei
cellulase vary from 166 to 430 amino acids in length.
The linker region
The two domains in the enzyme are interconnected via a flexible linker. The length of
the linker varies from fewer than 20 to over 40 amino acids in the different
enzymes. The linker is usually very rich in threonines, serines, and prolines, and it is
heavily glycosylated (Harrison et al. 1998). The role of the linker is probably to keep
the two domains apart, and to restrict their movements with respect to one another,
so that the catalytic domain is always within close distance to the CBM, which binds on
the surface of a cellulose fiber. The glycosylation of the linker probably makes it less
flexible, and probably decreases its sensitivity to proteolytic enzymes (Harrison et al.
1998) that also are secreted by the microorganism. Deletion experiments on the linker
of T. reesei Cel7A have shown that if most of the linker is removed, the rate of
crystalline cellulose degradation is drastically reduced (Srisodsuk et al. 1993).
The cellulose-binding module (CBM)
The CBM is a small wedge-shaped domain consisting of approximately 35 amino acids.
It can be connected via the flexible linker to either the carboxy-terminus or the amino-terminus of the catalytic module. The function of the CBM is to bind on the surface of
cellulose, and serve as an "anchor" for the enzyme, keeping it strongly adsorbed to the
cellulose surface (Ståhlberg et al. 1988; Reinikainen et al. 1992; Linder and Teeri
1996). This reduces the need for strong binding of the catalytic domain to the cellulose,
and thereby enables the enzyme to have a higher rate of turnover (Ståhlberg 1991).
There is no evidence that the fungal CBMs can penetrate into the cellulose fiber and
disrupt the structure, or have any catalytic activity. The CBMs are classified into at least
30 different CBM families according to the classification by Tomme et al. (1995). The
classification of CBMs can be monitored at the CAZy web page (URL:
http://afmb.cnrs-mrs.fr/~cazy/CAZY/index.html) (Coutinho and Henrissat 1999).
Figure 4. A hypothetical structure model of the intact T. reesei Cel7A enzyme. In the
model the enzyme is adsorbed onto the surface of a cellulose micro-fibril, via the
cellulose-binding module (CBM). The catalytic module (CD) is processively
hydrolyzing one of the accessible cellulose chains from the reducing end, and releasing
cellobiose units. The model is composed of the known structures of the catalytic module
and CBM (Kraulis et al. 1989; Divne et al. 1994), and a hypothetical linker connecting
the two modules. The figure was kindly provided by Dr. Christina Divne.
2.3.3 Hydrolytic mechanism of cellulases
The cellulolytic enzymes have two different enzymatic mechanisms by which they can
hydrolyze the glycosidic bonds in cellulose; the retaining and the inverting mechanisms
(Koshland 1953; Sinnott 1990; Davies and Henrissat 1995).
Retaining mechanism
The retaining glycoside hydrolase mechanism (Figure 5a) leads to a net retention of the
configuration at the anomeric carbon (C1) of the substrate after cleavage. Retention is
achieved via a double displacement mechanism: two successive inversions at the
anomeric carbon give a product with the same configuration as the substrate had
before hydrolysis.
Figure 5. The two enzyme mechanisms observed for cellulases: the retaining
mechanism (a), and the inverting mechanism (b). In the retaining mechanism the
configuration at the anomeric carbon will be in β-configuration after hydrolysis, i.e.,
the configuration is retained. The distance between the two catalytic carboxylates in the
retaining enzymes is ~5.5 Å. In the inverting mechanism the configuration at the
anomeric carbon will be changed from β to α configuration upon hydrolysis. The
distance between the two catalytic carboxylates in the inverting enzymes usually varies
between 6.5 and 9.5 Å.
The catalytic machinery of these enzymes involves two catalytic carboxylate
residues that usually sit at opposite sides of the sugar plane. In the first step of the
double displacement mechanism (glycosylation), one of the two carboxylic groups
provides a general acid-catalysed leaving group departure, simultaneously with a
nucleophilic attack on the anomeric carbon by the second carboxylate to form a
glycosyl-enzyme intermediate. In the second step (deglycosylation), the first
carboxylate residue now functions as a general base that activates an incoming
nucleophile (a water molecule in the case of hydrolysis, and a new glycosyl group in the
case of a transglycosylation event) by abstracting a proton from it. This activated
nucleophile then hydrolyses the glycosyl-enzyme intermediate. The GHs with retaining
mechanism often have transglycosylating abilities. The distance between the two
carboxylates in the enzymes with retaining mechanism is approximately 5.5 Å.
Inverting mechanism
The inverting glycoside hydrolase mechanism (Figure 5b) leads to a net inversion of the
configuration at the anomeric carbon (C1) of the substrate after cleavage. This is
performed via a single nucleophilic displacement mechanism, i.e., the hydrolysis of a
beta-glycosidic bond creates a product with the alpha-configuration, and vice-versa.
The catalytic machinery of these enzymes involves two catalytic carboxylates. These
two carboxylate residues provide a general acid-catalyzed leaving group departure, and
a general base-assistance to nucleophilic attack by a water molecule from the opposite
side of the sugar ring. The distance between the two carboxylates in the enzymes with
inverting mechanism is much less constrained than for the retaining enzymes. It is
usually in the range 6.5-9.5 Å, but there are cases where it is much shorter than this,
i.e., the longer distance is not required by the mechanism.
Glucosyl-binding sites
The glucosyl-binding sites in the catalytic core of the cellulases are numbered using the
position of the catalytic cleavage site in the enzyme as the reference point (Davies et al.
1997a). Thus the binding sites towards the reducing end of the cello-oligomer from this
reference point are numbered with increasing positive integers (+1, +2, …), and the
sites towards the non-reducing end with increasingly negative ones (-1, -2, …). The
highest numbers used depend on how many subsites the enzyme has.
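As a small illustration of this numbering scheme, the Python sketch below lists subsite labels from the non-reducing end to the reducing end. The function name and the subsite counts in the example are hypothetical; the only rule taken from the text is that cleavage occurs between subsites -1 and +1 and that there is no subsite 0.

```python
def subsite_labels(n_minus: int, n_plus: int) -> list[str]:
    """Return subsite labels ordered from non-reducing to reducing end.

    n_minus -- number of subsites on the non-reducing side of the cleavage point
    n_plus  -- number of subsites on the reducing side of the cleavage point
    """
    if n_minus < 0 or n_plus < 0:
        raise ValueError("subsite counts must be non-negative")
    minus = [str(-i) for i in range(n_minus, 0, -1)]   # e.g. -3, -2, -1
    plus = [f"+{i}" for i in range(1, n_plus + 1)]     # e.g. +1, +2
    return minus + plus


if __name__ == "__main__":
    # Hypothetical enzyme with three minus subsites and two plus subsites;
    # the scissile bond lies between the "-1" and "+1" entries.
    print(" | ".join(subsite_labels(3, 2)))
```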
2.4 The cellulolytic system of Trichoderma reesei
The filamentous soft rot fungus Trichoderma reesei (described in section 2.2.1) is one
of the most efficient cellulose-degrading organisms known, and its cellulolytic system
is also one of the most studied. The fungus is an ideal cellulolytic model organism for
studying cellulose degradation since it secretes large amounts of all cellulases needed
for degradation of crystalline cellulose. To date, two CBHs (Cel6A and Cel7A), and at
least five EGs (Cel5A, Cel7B, Cel12A, Cel45A, and Cel61A), have been found in the
cellulolytic system of T. reesei. These enzymes belong to six different GH families, 5,
6, 7, 12, 45, and 61. T. reesei has two cellulases in GH family 7, one CBH (Cel7A), and
one EG (Cel7B).
Table 1. The seven known cellulases in the cellulolytic system of T. reesei.
______________________________________________________________________
Name     Old name   Number of   Position   Stereo-       Amount of total
                    residues    of CBM     selectivity   cellulase (%)
______________________________________________________________________
Cel5A    EG II      397         N          Retaining     8
Cel6A    CBH II     447         N          Inverting     18
Cel7A    CBH I      497         C          Retaining     55
Cel7B    EG I       436         C          Retaining     9
Cel12A   EG III     218         *          Retaining     <1
Cel45A   EG V       270         C          Inverting     **
Cel61A   EG IV      344         C          Not known     **
______________________________________________________________________
The names of the cellulases are based on the GH nomenclature system suggested by
Henrissat (1998). The position of the CBM can either be at the carboxy-terminus (C) or
at the amino-terminus (N) of the catalytic module. (*) Cel12A does not have a CBM.
(**) Percentage of total secreted T. reesei cellulase is not known for this enzyme.
______________________________________________________________________
T. reesei cellulose binding domain
All the known T. reesei cellulases, except Cel12A (one of the GH family 12 cellulases
that will be described in more detail in the Results section of this thesis), have a
cellulose-binding domain consisting of 35 highly conserved amino acids, and all of
them belong to CBM family 1, according to the classification system by Tomme et al.
(1995).
The T. reesei CBMs are connected to their catalytic core modules via a 35-44
residue long, heavily glycosylated (Harrison et al. 1998) linker. The CBM is located at
the carboxy-terminus of the Cel7A, Cel7B, Cel45A, and Cel61A catalytic core
modules, and at the amino-terminus of the Cel5A, and Cel6A catalytic core modules.
The three-dimensional structures of the Cel7A and Cel7B CBMs have been determined
by Nuclear Magnetic Resonance (NMR) (Kraulis et al. 1989; Mattinen et al. 1998).
They form a wedge-shaped structure with one flat surface where three conserved
aromatic residues are located. These are believed to stack on the glucose units on the
cellulose micro-fibril surface (Mattinen et al. 1997).
Cel5A
Cel5A is an EG that belongs to GH family 5. The gene for the enzyme was cloned by
Saloheimo et al. (1988) (GenBank accession number M19373). The enzyme has an
estimated molecular weight of 42 kilodaltons (kDa), but has an apparent molecular
weight of 48 kDa on an SDS-PAGE gel due to glycosylation. It has a pI of 5.5-5.6
(Shoemaker and Brown 1978). Cel5A hydrolyzes the β-1,4-glycosidic bonds in
cellulose using the retaining mechanism (Henrissat et al. 1985). The amount of
expressed Cel5A has been estimated to be 5-10 % of total expressed cellulase
in T. reesei (Ståhlberg 1991; Ilmen et al. 1997).
Cel6A
Cel6A is a GH family 6 CBH. The gene for the enzyme was cloned by Teeri et al.
(1987) (GenBank accession number M16190). The enzyme has an estimated molecular
weight of 47 kDa (53 kDa on SDS-PAGE), and it has a pI of 5.9 (Fägerstam and
Pettersson 1980; Bhikhabhai et al. 1984). Cel6A is a processive enzyme that hydrolyzes
the glycosidic bonds in cellulose using the inverting mechanism, and it has been shown
that the enzyme preferentially hydrolyzes the cellulose chain from the non-reducing end
(Barr et al. 1996; Boisset et al. 2000). There have been reports that Cel6A possesses
some endoglucanase activity (Nutt et al. 1998). The amount of expressed Cel6A has
been estimated to be 17-20 % of total expressed cellulase in T. reesei
(Ståhlberg 1991; Ilmen et al. 1997).
Cel7A
Cel7A is a GH family 7 CBH, and it was the first T. reesei GH family 7 cellulase that
was discovered. The gene for the enzyme was cloned by Wey et al. (1994) (GenBank
accession number X69976). Cel7A has an estimated molecular weight of 52 kDa (66
kDa on SDS-PAGE), and it has a pI of 4.3 (Fägerstam et al. 1977; Shoemaker et al.
1983). Cel7A is the major cellulase produced by T. reesei, and it has been estimated
that 50-60 % of total expressed cellulase in the fungus is Cel7A (Ståhlberg 1991; Ilmen
et al. 1997). It is probably the key enzyme needed for hydrolysis of crystalline cellulose
by the fungus. Cel7A is a processive enzyme that hydrolyzes the glycosidic bonds in
cellulose using the retaining mechanism, and it has been shown that the enzyme
preferentially hydrolyzes the cellulose chain from the reducing end (Barr et al. 1996;
Divne et al. 1998; Imai et al. 1998).
Cel7B
Cel7B is a GH family 7 EG. The gene for the enzyme was cloned by Penttila et al.
(1986) (GenBank accession number M15665). Cel7B has an estimated molecular
weight of 48 kDa (50-55 kDa on SDS-PAGE), and it has a pI of 4.5 (Shoemaker et al.
1983; Bhikhabhai et al. 1984). Cel7B is homologous to Cel7A, with about 45 %
sequence identity. The main difference between the two GH family 7 structures is that
the substrate-binding cleft is less covered by extended loops in the endoglucanase
(Cel7B) than in the exoglucanase (Cel7A). Cel7B hydrolyzes the glycosidic bonds in
cellulose using the retaining mechanism. The amount of expressed Cel7B has been
reported to be 6-10 % of total expressed cellulase in T. reesei (Ståhlberg 1991;
Ilmen et al. 1997).
Cel12A
Cel12A is a GH family 12 EG. The gene for the enzyme was cloned by Ward et al.
(1993) and Okada et al. (1998) (GenBank accession number AB003694). The enzyme has a
molecular weight of 25 kDa, has a neutral pI of 7.5, does not have a CBM, and is only
sparsely glycosylated (Håkansson et al. 1978; Ülker and Sprey 1990; Sprey and Ülker
1992; Hayn et al. 1993). Cel12A hydrolyzes the glycosidic bonds in cellulose using the
retaining mechanism. The two catalytic residues in Cel12A are the two carboxylates
E116 and E200 (Okada et al. 2000). The amount of expressed Cel12A has been
reported to be less than 1% of total expressed cellulase in T. reesei (Ülker and Sprey
1990).
The specific function for T. reesei Cel12A is not known. Some biochemical data on
Cel12A can be found in the literature, including studies of activity on soluble substrates
(Hayn et al. 1993), and insoluble cellulose (Sprey and Bochem 1992). There have been
reports that Cel12A, besides cellulase activity, has activity against β-glucan and xylan
(Hayn et al. 1993; Karlsson et al. 2002). It has been shown that Cel12A has an ability to
induce extension of type I cell walls from cucumber and wheat (Yuan et al. 2001).
The three-dimensional structure of the T. reesei Cel12A enzyme has been solved
within the work of this thesis. The procedure by which the Cel12A enzyme structure was
determined is described in the Methods and Results section, and in Paper I. In addition
to the T. reesei Cel12A structure, the three-dimensional structures of three homologous
GH family 12 enzymes; Hypocrea schweinitzii Cel12A, Humicola grisea Cel12A, and
Streptomyces sp. 11AG8 Cel12A, have been determined (Papers II-III). The T. reesei
enzyme is structurally and biochemically compared to the other three GH family 12
homologues in the Results section (Papers II-IV).
Cel45A
Cel45A is a GH family 45 EG. The gene for the enzyme was cloned by Saloheimo et
al. (1994) (GenBank accession number Z33381). The enzyme has an estimated
molecular weight of 23 kDa (36 kDa on SDS-PAGE), and it has a pI of 2.9
(Shoemaker et al. 1983; Saloheimo et al. 1997). Cel45A hydrolyzes the glycosidic bonds
in cellulose using the inverting mechanism. The amount of expressed Cel45A of total
expressed cellulase in T. reesei is not known. There have been reports that Cel45A,
besides cellulase activity, has activity against β-glucan but not xylan (Saloheimo et al.
1994; Karlsson et al. 2002).
Cel61A
Cel61A is a GH family 61 EG. The gene for the enzyme was cloned by Saloheimo et
al. (1997) (GenBank accession number Y11113). The enzyme has an estimated
molecular weight of 34 kDa on SDS-PAGE, and it has an estimated pI of 6.0 (Karlsson
et al. 2001). The hydrolytic mechanism and the total amount of expressed enzyme are not
known for T. reesei Cel61A.
2.4.1 Induction of cellulases
T. reesei and many other cellulolytic organisms do not express their cellulases
constitutively. The production of the cellulases is only turned on when these are needed,
i.e., the cellulases are induced (Kubicek and Penttilä 1998). When T. reesei is grown
with cellulose as the only carbon source, the genes for the cellulases are induced, and
the enzymes are expressed. However, if the fungus is grown on glucose there is no
cellulase expression, i.e., the cellulases are glucose-repressed. It has been shown that
this glucose repression of the T. reesei cellulases occurs at the transcriptional level (Ilmen et
al. 1997). It has also been shown that the expression of most T. reesei cellulases is co-regulated, and that they are always expressed at the same relative amounts (Ilmen et al.
1997). There are many known molecules that induce cellulase expression in T. reesei,
e.g., cellobiose, cellotriose, cellotetraose, lactose, and sophorose (Sternberg and
Mandels 1980; Kubicek et al. 1993). The exact mechanism by which the cellulase
expression is induced in T. reesei is not fully understood. There must exist an enzyme
in T. reesei that is constitutively expressed, recognizes cellulose, and from this produces
a substrate that induces cellulase expression. Such an enzyme has not yet been
identified. It has been suggested that β-glucosidase could be one such enzyme (Vaheri
et al. 1979; Fowler and Brown 1992), since it produces sophorose as a
transglycosylation product from cellulose, and sophorose is the most efficient cellulase
inducer so far identified. Although β-glucosidase thus promotes cellulase induction
efficiently, it is not essential for cellulase expression (Fowler and Brown 1992).
2.4.2 Synergy between cellulases
When the combination of two enzymes is more efficient than the sum of the enzymes
acting alone, the two enzymes act in synergy. This is something often found in the case
of cellulases, and may be why many cellulolytic organisms have multiple sets of the
same type of cellulase, e.g., T. reesei has at least two CBHs, and five EGs. The synergy
between cellulases can easily be detected by mixing two single components from a
cellulolytic system, e.g., combining one of the EGs with one of the CBHs, and
comparing the hydrolytic activity of this combined system with the cumulative activity
of the two cellulases acting alone. There are many studies that show synergy between
EGs and CBHs, during hydrolysis of cellulose (Henrissat et al. 1985; Bailey et al. 1993;
Irwin et al. 1993; Medve et al. 1998). Presumably the EG makes internal cuts in the
cellulose chain, and thereby provides new accessible chain ends for the
cellobiohydrolase/exoglucanase to work on, which increases the overall hydrolytic activity (Figure
6). This model for the synergy between endoglucanases and exoglucanases is called the
endo-exo model (Béguin and Aubert 1994; Tomme et al. 1995).
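The comparison described above, activity of the mixture versus the summed activities of the components acting alone, can be expressed as a simple ratio. The Python sketch below is illustrative only: the function name, the term "degree of synergy" as used here, and the example activities are assumptions introduced for this example and are not data from the cited studies.

```python
from typing import Sequence


def degree_of_synergy(activity_mixture: float,
                      activities_alone: Sequence[float]) -> float:
    """Ratio of the activity of an enzyme mixture to the summed activities
    of the same enzymes assayed separately.

    A value above 1 indicates synergy, a value of 1 purely additive
    behaviour. All activities must be measured under the same conditions
    and expressed in the same units.
    """
    total_alone = sum(activities_alone)
    if total_alone <= 0:
        raise ValueError("summed individual activities must be positive")
    return activity_mixture / total_alone


if __name__ == "__main__":
    # Hypothetical numbers: one EG and one CBH assayed alone and together.
    eg_alone, cbh_alone, mixture = 1.0, 2.0, 4.5
    ratio = degree_of_synergy(mixture, [eg_alone, cbh_alone])
    print(f"degree of synergy = {ratio:.2f}")
```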
Figure 6. The endo-exo model for synergy between endoglucanases and
cellobiohydrolases/exoglucanases in a cellulolytic system during hydrolysis of cellulose.
In this model the endoglucanases make internal cuts in the amorphous parts of the
cellulose micro-fibril and thereby provide new accessible chain ends for the
exoglucanase to recognize, and from these start to processively hydrolyze the cellulose.
The combined system has increased hydrolytic activity compared with the enzymes
working alone.
There have been investigations on the optimal mixtures of one of T. reesei's two CBHs,
Cel6A or Cel7A, with one of its EGs, Cel5A or Cel7B (Henrissat et al. 1985). The
results show that T. reesei Cel6A needs much less EG than T. reesei Cel7A to reach
maximal synergy. There have also been several reports of detected synergy between T.
reesei's two CBHs, Cel6A and Cel7A (Fägerstam and Pettersson 1980; Henrissat et al.
1985; Irwin et al. 1993; Medve et al. 1998). The reports that T. reesei Cel6A possesses
some endoglucanase activity (Nutt et al. 1998), could possibly explain the observed
synergy between T. reesei's two CBHs. Another explanation could be that these two
exoglucanases processively hydrolyse the cellulose from different directions. T. reesei
Cel6A hydrolyzes from the non-reducing end (Divne et al. 1998) and T. reesei Cel7A
from the reducing end (Divne et al. 1998), thereby exposing new chain ends for the
other enzyme to work on. There are also several reports of processive endoglucanases
(Reverbel-Leroy et al. 1997). The findings that the two types of cellulases, CBH and
EG, can possess both types of activities, i.e., endo- and exoglucanase activity, make it
difficult to fit the activity data from a cellulase into a simple model like the endo-exo
synergy model, but this is still the best model we have to explain how a cellulolytic
system like T. reesei’s works.
2.4.3 Three-dimensional structures of T. reesei cellulases
The first three-dimensional structure of a cellulase determined by X-ray crystallography
was that of the T. reesei Cel6A catalytic domain (Rouvinen et al. 1990). Since then, many
new cellulase structures have been determined, and today there exist at least a hundred
known unique cellulase structures. Structures are available for representatives of most
of the eleven cellulase-containing GH families; the exceptions are GH families 26, 44 and 61.
For a recent report on new cellulase structures see the CAZy web page;
http://afmb.cnrs-mrs.fr/~cazy/CAZY/index.html.
To date the three-dimensional structures of four of T. reesei's seven cellulases have
been determined by X-ray crystallography: Cel6A (Rouvinen et al. 1990), Cel7A
(Divne et al. 1994), Cel7B (Kleywegt et al. 1997) and Cel12A (Paper I of this thesis).
Three of these structures, Cel6A, Cel7A and Cel7B, have been determined from the
enzyme's catalytic domain only, since it has not been possible to get useful protein
crystals of the intact cellulase. The problem with getting crystals from the intact
cellulase is most likely due to the highly flexible linker between the catalytic core and
the CBM in the cellulase. A large flexible region in an enzyme often causes poor crystal
packing or no crystal formation at all.
T. reesei Cel12A does not have a CBM or a linker, so this three-dimensional
structure is of an intact cellulase. The overall structure of the T. reesei Cel12A enzyme
is described in the Results section of this summary, and in some more detail in Paper I.
The T. reesei Cel12A structure is one of the four homologous GH family 12 enzymes
whose structures have been determined within the work of this thesis (Papers I-IV).
In addition to the four T. reesei catalytic core domain structures, the three-dimensional structures of the CBMs of T. reesei Cel7A and Cel7B have been
determined by NMR (Kraulis et al. 1989; Mattinen et al. 1998). The most striking
difference between T. reesei's two known CBH structures, Cel6A and Cel7A, and its
two known EG structures, Cel7B and Cel12A, is that the CBHs form a tunnel where the
cellulose chain binds and the EGs form an open cellulose binding cleft.
3 METHODS
______________________________________________________________________
This chapter will briefly describe the methods that are commonly used in X-ray
crystallography to determine the three-dimensional structure of proteins. These include
protein crystallization, X-ray diffraction, structure solution, model building and
refinement.
3.1 X-ray crystallography
The methodology of X-ray crystallography is based on the fact that the atoms in a
molecule, or rather the electrons in the atoms, scatter X-rays. The diffraction pattern
obtained when a molecule is placed in an X-ray beam is a representation of the space that
the electrons of the atoms in the molecule occupy, i.e., the molecule's electron density. In
this thesis the molecules in question are the different proteins whose structures have
been determined.
Each diffracted X-ray that a molecule in an X-ray beam gives rise to is
represented by a spot in the diffraction pattern (Figure 7). Each of these diffracted rays can be
described by an amplitude (|FP|), a wavelength (λ), and a phase angle (ΦP). If these
three values are known for all the spots in the diffraction pattern, the electron density
for the molecule can be calculated. The amplitudes of the diffracted X-rays are
determined by the intensities of the spots in the diffraction pattern and the wavelength is
a property of the X-ray source. The information about the phase angle, however, is lost
in the collected diffraction image. This is known as the phase problem in X-ray
crystallography, and is the fundamental problem when solving a new protein structure.
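For reference, the calculation of the electron density from the diffraction data is conventionally written as a Fourier synthesis. The notation below is the standard textbook form and is an assumption in the sense that it does not appear in the text: V is the unit cell volume, h, k, l index the individual diffraction spots, x, y, z are fractional coordinates in the unit cell, and \varphi(hkl) is the phase angle of each spot.

```latex
\rho(x,y,z) \;=\; \frac{1}{V}\sum_{h}\sum_{k}\sum_{l}
  \left|F(hkl)\right|\, e^{\,i\varphi(hkl)}\, e^{-2\pi i\,(hx + ky + lz)}
```

The amplitudes |F(hkl)| are obtained from the measured spot intensities, whereas the phases \varphi(hkl) are the quantities that are lost. The wavelength does not appear explicitly; it determines where each spot is recorded.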
With the X-ray sources available today, it is not possible to obtain a strong enough
diffraction pattern from a single protein molecule. The solution is therefore to use a
crystal of the protein molecule because a crystal contains millions of identical copies of
the protein molecule packed into the lattice of the crystal. By using a protein crystal the
diffraction will be enhanced, and thus make it possible to detect and collect a diffraction
pattern from the protein molecule (Figure 7).
3.1.2 Phase determination
There exist several alternative methods for recovering the phase angle information that
is lost in the collected diffraction pattern. These include the multiple/single
isomorphous replacement (MIR/SIR), the multiple/single anomalous dispersion
(MAD/SAD), and the molecular replacement (MR) methods. The structure
determinations of the four new homologous GH family 12 cellulase structures that are
presented in this thesis have been done by MIR and MR methods. These two phase-determining methods are briefly described below.
Figure 7. A typical X-ray protein diffraction pattern, collected on a T. reesei Cel12A
crystal (space group P21), diffracting to 1.9 Å.
Molecular replacement phasing
If there exists a protein with known structure that is similar to the protein of interest, the
known structure can be used in a procedure called molecular replacement (MR) to solve
the phases of the unknown structure. This method was first described by Rossmann
and Blow (1962). The idea of this method is to calculate an electron density map from
the known structure and then rotate and translate this calculated density in the unit cell
of the unknown protein, until there is a maximum overlap between them. The
approximate phases from the correctly oriented search model will then provide a
starting point for building the unknown protein. The rough initial phases will improve
as the building and refining of the new structure progresses. The molecular replacement
method is a powerful phasing method, but since the difference between the correct
solution and a wrong solution often is small, it can be tedious work to find the right
solution, especially if there is low homology between the search model and the
unknown protein structure. Several programs for molecular replacement are available.
The molecular replacement program that was used to solve three of the homologous GH
family 12 cellulase-structures in this thesis was AMoRe (Navaza 1994). The molecular
replacement method has today become the most common phase-determining method
since the number of known protein structures has increased drastically.
Isomorphous replacement phasing
When there is no known protein structure similar to the protein whose structure one
wants to solve, the phases of the unknown structure have to be determined ab initio,
using methods other than MR. Today there exists a wide range of different methods to
solve the phases of an unknown protein structure ab initio, but the most commonly used
today is the multiple isomorphous replacement (MIR) method. MIR was used to solve the
first of the four homologous GH family 12 structures that are presented in this thesis,
the T. reesei Cel12A structure.
The MIR method was the first phasing technique that became available to
macromolecular crystallographers. This method and most other ab initio phasing
methods are based on the fact that heavy atoms such as transition metals, lanthanides,
uranium and even noble gases under pressure, can quite successfully be soaked into
protein crystals, and they frequently bind to well defined sites in the native protein.
Such a protein crystal is then called a heavy atom derivative. Assuming that the heavy
metal soaked into the crystals does not disturb the structure of the derivative crystal
(i.e., it is isomorphous), we are able to derive information on the structure factor
amplitudes for the heavy metal from the differences between the derivative and the
native X-ray dataset (isomorphous replacement).
When the resulting derivative crystal is isomorphous to the native crystal, the
structure factors FP of the protein, FPH of the derivative and FH of the heavy atom are
related by FH = FPH - FP, which is an equation between complex numbers (vectors). The
amplitudes |FPH| and |FP| can be measured, giving the isomorphous difference |FPH|-|FP|.
The coordinates of the heavy metal atoms can be determined from a difference Patterson
map calculated between the heavy atom derivative and the native datasets. This gives FH,
making it possible to calculate FP. As a single heavy atom derivative (single isomorphous
replacement (SIR)) still leaves a twofold phase ambiguity, it is necessary to use at least
a second heavy atom derivative to remove it. In theory one extra derivative should be
enough, but because of errors in the data, usually several are required (multiple
isomorphous replacement (MIR)).
Anomalous scattering (AS) phase information can be obtained from the scattering of a
heavy atom whose absorption edge is close to the wavelength of the X-rays. This
small anomalous signal from the heavy atom is sometimes enough to remove the phase
ambiguity of one heavy atom derivative (SIRAS). When more than one heavy atom
derivative and anomalous differences are used, then the method is called MIRAS.
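The twofold phase ambiguity of a single derivative follows directly from the relation FPH = FP + FH: once FH is known, the measured amplitudes fix only the cosine of the angle between FP and FH. The Python sketch below illustrates this for one reflection. It is a minimal, error-free illustration with hypothetical function names and made-up numbers, not the procedure implemented in any phasing program.

```python
import cmath
import math


def sir_phase_candidates(amp_p: float, amp_ph: float,
                         f_h: complex) -> tuple[float, float]:
    """Return the two protein phases (radians) consistent with one derivative.

    amp_p  -- measured native amplitude |FP|
    amp_ph -- measured derivative amplitude |FPH|
    f_h    -- heavy atom structure factor FH, calculated from the heavy
              atom positions (amplitude and phase both known)

    From |FPH|^2 = |FP|^2 + |FH|^2 + 2|FP||FH| cos(phi_P - phi_H).
    """
    amp_h = abs(f_h)
    if amp_p <= 0 or amp_h <= 0:
        raise ValueError("|FP| and |FH| must be positive")
    cos_delta = (amp_ph**2 - amp_p**2 - amp_h**2) / (2 * amp_p * amp_h)
    cos_delta = max(-1.0, min(1.0, cos_delta))  # guard against rounding
    delta = math.acos(cos_delta)
    phi_h = cmath.phase(f_h)
    return phi_h + delta, phi_h - delta


if __name__ == "__main__":
    # Made-up reflection: "true" FP and FH, from which |FPH| is simulated.
    f_p = cmath.rect(10.0, 0.7)
    f_h = cmath.rect(3.0, 2.0)
    candidates = sir_phase_candidates(abs(f_p), abs(f_p + f_h), f_h)
    print("candidate phases (rad):", candidates)  # one of them is the true phase
```

A second derivative with a different FH gives a second pair of candidates; the phase common to both pairs removes the ambiguity, which is the idea behind MIR.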
3.1.3 Model building and structure refinement
Once initial phases have been obtained for the X-ray dataset, a preliminary electron
density map can be calculated and displayed. The protein structure model is built by
manually fitting the atoms in the protein into this electron density using one of the
available protein structure graphics programs, e.g., O (Jones et al. 1991). The structure
model is completed with alternating cycles of model building in the graphics program,
and maximum-likelihood model refinement, bulk-solvent corrections, and anisotropic
scaling using one of the protein structure refinement programs, e.g., CNS (Brünger et
al. 1998) or Refmac (Murshudov et al. 1997).
The quality of the model during the course of structure building and refinement is
monitored by the crystallographic R-factor (Rcryst) and free R-factor (Rfree) (Brünger
1992a). The Rcryst is a measure of how well the model’s calculated structure factor
amplitudes correspond to the experimental amplitudes. The Rfree is calculated in the
same way as the Rcryst, but with the difference that the experimental data used for the
calculation is only a fraction of the total data (typically 3-10 %). This set of the data is
never included in any of the many refinements of the protein structure, from the initial
model to the final structure. This makes the Rfree a much less biased indicator of the
quality of the model than the Rcryst.
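As an illustration of how the two indicators are related, the sketch below computes an R-factor as the sum of |Fobs - Fcalc| over the sum of Fobs, once for a working set and once for a randomly chosen free set. It is a simplified Python example under stated assumptions: the amplitudes are invented, no scale factor is applied between observed and calculated amplitudes, and the function names are hypothetical rather than taken from CNS or Refmac.

```python
import random

def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator

def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Pick the free set once, at random; it must never enter refinement."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    free = sorted(indices[:n_free])
    work = sorted(indices[n_free:])
    return work, free

# Invented amplitudes for ten reflections (illustration only).
f_obs = [120.0, 85.0, 40.0, 230.0, 15.0, 60.0, 98.0, 310.0, 52.0, 77.0]
f_calc = [110.0, 90.0, 46.0, 221.0, 19.0, 55.0, 104.0, 298.0, 49.0, 83.0]

work, free = split_free_set(len(f_obs), free_fraction=0.2)
r_cryst = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"Rcryst = {r_cryst:.3f}, Rfree = {r_free:.3f}")
```

Because the free reflections are excluded from refinement, a model that is over-fitted to the working set lowers Rcryst without lowering Rfree by a comparable amount.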
3.2 Protein crystallization
One of the bottlenecks today when determining a protein structure by X-ray
crystallography is obtaining good crystals of the protein. The first protein crystals were
grown over 150 years ago, and these were crystals of hemoglobin (McPherson 1999).
Since then, protein crystallization has evolved from trial-and-error experiments into the
more rational crystallization experiments performed today, which are based on the
empirical data that have been gathered by protein crystallographers over the years.
Protein crystallization has today become a science in itself, and has great impact in a
wide range of areas within life science.
The most frequently used protein crystallization method today is the vapor
diffusion technique (Figure 8). This is also the crystallization method that has been used
to crystallize all the different GH family 12 enzymes in this thesis, for which structures
are presented. In this method a small volume (1-10 µl) of the purified,
homogeneous protein is mixed with an equal volume of a crystallization solution, and
positioned as a small droplet on a siliconized cover slip. The crystallization solution
usually consists of buffer, salt and a precipitant, usually polyethylene glycol (PEG) or
salt. The cover slip with the droplet of protein and crystallization solution is inverted
and positioned over a reservoir containing 200-1000 µl of the crystallization solution,
and sealed with grease or oil. The higher concentration of the precipitant in the
reservoir, with respect to the droplet, drives the system to reach equilibrium by vapor
diffusion of water from the droplet to the reservoir. In a successful crystallization
experiment the protein should reach supersaturation, and crystals start to form.
Crystals (Figure 9) can grow to a size of 0.1-0.5 mm in all directions after some time of
incubation.
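The concentration changes in the drop can be estimated with simple arithmetic. The Python sketch below is an idealized illustration: it assumes that only water leaves the drop, that the reservoir is large enough for its concentration to stay constant, and that the protein stock contains no precipitant. The function name and the example values are hypothetical.

```python
def drop_after_equilibration(protein_mg_ml, reservoir_precipitant_pct,
                             protein_volume_ul, reservoir_volume_added_ul):
    """Estimate drop composition at set-up and after vapor-diffusion equilibrium."""
    start_volume = protein_volume_ul + reservoir_volume_added_ul
    start_protein = protein_mg_ml * protein_volume_ul / start_volume
    start_precipitant = (reservoir_precipitant_pct * reservoir_volume_added_ul
                         / start_volume)
    # At equilibrium the precipitant concentration in the drop equals that of
    # the reservoir; under the assumptions above this is reached by water loss.
    final_volume = start_volume * start_precipitant / reservoir_precipitant_pct
    final_protein = protein_mg_ml * protein_volume_ul / final_volume
    return {
        "start_volume_ul": start_volume,
        "start_protein_mg_ml": start_protein,
        "start_precipitant_pct": start_precipitant,
        "final_volume_ul": final_volume,
        "final_protein_mg_ml": final_protein,
    }

# Example: 2 µl of a 15 mg/ml protein stock mixed with 2 µl of 20 % precipitant.
result = drop_after_equilibration(15.0, 20.0, 2.0, 2.0)
for key, value in result.items():
    print(f"{key}: {value:g}")
```

With equal volumes of protein and crystallization solution, the drop starts at half the reservoir precipitant concentration and shrinks to roughly half its volume, which slowly carries the protein towards supersaturation.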
Figure 8. Three different types of set-ups for crystallization by vapor diffusion
a) hanging drop, b) sitting drop and c) sandwich drop.
Figure 9. Typical crystal of T. reesei Cel12A, space group P21.
Many other crystallization methods exist besides vapor diffusion, e.g., batch
crystallization, dialysis, seeding and free interface diffusion. Each of these methods has
its advantages for certain types of experiments, but they are all much less used than the
vapor diffusion method.
4 RESULTS AND DISCUSSION
______________________________________________________________________
In this chapter, the results will be summarized and briefly discussed. These results are
described in much more detail in the four papers (I-IV).
4.1 Aim of thesis
Cellulases are used in a wide range of industrial applications. Examples of industrial
processes where GH family 12 enzymes can be used are: textile, paper and pulp,
detergent, and ethanol production. Many of these industrial processes are performed at
elevated temperatures and at non-physiological pHs. This makes it of great importance
to have detailed knowledge on how to shift the pH and temperature profiles of the
enzymes that potentially will be used in these processes, so that these enzymes still are
stable and active.
The overall goal of this thesis work was to determine the three-dimensional
structures of a set of homologous GH family 12 enzymes by X-ray crystallography, and to
identify the structural differences in these homologous Cel12A enzymes that could
explain some of the differences in the biochemical performance profiles of these
enzymes, e.g., pH range of activity and thermal stability. This knowledge can then be
used to modify the Cel12A enzymes of interest towards targeted properties in industrial
processes.
The work was done in close collaboration with the company Genencor
International Inc. The major business area of Genencor is production of industrial
enzymes, and cellulases are one class of enzymes that the company is focused on.
Genencor's interest in cellulases, and especially in using GH family 12 enzymes in some
of their applications, has made it possible to use the structural and biochemical
knowledge that we have gathered on this class of enzymes to improve or modify the
enzymes for existing or new industrial applications. The close collaboration also means
that most of the results that will be presented in this thesis are the combined results
from experiments that have been performed both by us at Uppsala University and by the
several research groups at Genencor.
4.2 The Trichoderma reesei Cel12A structure (Paper I)
The structure of the fungal Cel12A enzyme from T. reesei will be described in detail
here. Since the structures of the other three Cel12A homologues, solved within this
work, to a large extent resemble the T. reesei Cel12A structure, only structural features
that significantly differ from the T. reesei Cel12A structure will be described. The T.
reesei Cel12A enzyme is also the enzyme to which the different GH 12 homologues
and variants of these will be structurally and biochemically compared.
4.2.1 Crystallization and structure determination
Precipitated or crystalline T. reesei Cel12A protein stocks were washed with water
prior to dissolution in 20 % (w/v) sucrose solution, then concentrated to 15 mg/ml.
Crystals were obtained from 200 mM cacodylate buffer (pH 6.0), 200 mM ammonium
acetate and 10-30 % (w/w) mono-methyl-ether (mme) polyethylene glycol (PEG) 2000,
at 20-24 °C using hanging and sitting drops (McPherson 1982). Large, single,
wedge-shaped crystals grew to a maximum size of 1 mm in all directions within 1-2 days. The
crystals belong to the monoclinic space group P21 with cell dimensions a = 69.8 Å, b =
71.4 Å, c = 124.8 Å and β = 91.4°, and have a calculated Vm of 2.0 (Matthews 1968)
with an estimated six non-crystallographic-symmetry (NCS) related molecules in the
asymmetric unit. The variant M154C crystallized isomorphously with the wild type.
Ethylmercury-thio-salicylate (EMTS) was added to the mother liquor at 1 mM, and the
M154C crystals were left for one day for the drop to re-equilibrate. Isomorphous
crystals were also obtained when ammonium acetate was replaced with 200 mM
trimethyl-lead acetate (Me3Pb).
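The Matthews coefficient quoted above is the unit cell volume divided by the total protein mass in the cell. The Python sketch below reproduces the calculation for the monoclinic cell given in the text. The molecular weight of 25 kDa is an assumed round value that is not stated in this section, and the solvent fraction uses the common approximation 1 - 1.23/Vm; the function names are hypothetical.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit cell volume (Å^3) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews_coefficient(cell_volume, molecular_weight_da,
                         molecules_per_asu, symmetry_operators):
    """Vm (Å^3/Da) = cell volume / (MW * number of molecules in the cell)."""
    return cell_volume / (molecular_weight_da * molecules_per_asu * symmetry_operators)

def solvent_fraction(vm):
    """Approximate solvent fraction, assuming a typical protein density."""
    return 1.0 - 1.23 / vm

# Cell dimensions from the text; P21 has two symmetry operators.
volume = monoclinic_cell_volume(69.8, 71.4, 124.8, 91.4)
ASSUMED_MW_DA = 25000.0  # assumption for illustration, not a value from the text
vm = matthews_coefficient(volume, ASSUMED_MW_DA, molecules_per_asu=6,
                          symmetry_operators=2)
print(f"V = {volume:.0f} Å^3, Vm = {vm:.2f} Å^3/Da, "
      f"solvent = {solvent_fraction(vm):.0%}")
```

Trying different numbers of molecules per asymmetric unit in this way is how the estimate of six NCS-related molecules is usually reached: values of Vm far outside the range commonly observed for protein crystals are rejected. A Vm of 2.0 corresponds to a solvent fraction of about 39 % with this approximation.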
All data sets were collected on an R-axis IIC image-plate system, mounted on a
Rigaku RU-200 rotating-anode generator. The apo wild type and M154C, M154C-EMTS
and M154C-Me3Pb heavy-atom data sets that were used for phasing were
collected at room temperature. Subsequently, a 1.9 Å resolution native data set was
collected at 100 K, from a single crystal, using 30 % mme PEG 2000 as cryo-protectant.
The data sets were processed and scaled with DENZO and SCALEPACK (Otwinowski
and Minor 1997). Data collection and processing statistics for the native data set are
given in Appendix I.
Three sites for mercury atoms could be readily identified in the isomorphous
difference Patterson map for the EMTS-soaked crystal data against the apo M154C
mutant data. Initial positions of the heavy-atoms were identified using the program
RSPS (Knight 2000), and refined with MLPHARE (Otwinowski 1991). Cross-phased
difference Fouriers revealed lead sites for crystals grown in the presence of
trimethyl-lead. Subsequent heavy-atom refinement using SHARP (La Fortelle et al. 1997)
revealed further binding-sites for each derivative. The MIR phases obtained from
SHARP (FOMcentric = 0.6, FOMacentric = 0.43) produced an interpretable electron density
map.
The map was improved by exploiting the six-fold NCS of the asymmetric unit.
Initial local symmetry operators were determined in O (Jones et al. 1991) by using the
positions of the heavy-atom derivatives. The operators were subsequently refined and
the electron densities were averaged with programs from the RAVE program package
(Kleywegt and Jones 1999). Phases were further refined, using automated solvent
flattening and histogram matching, with a solvent content of 40 %, and the resolution
was extended to the resolution limits of the apo mutant data-set, using the program DM
(Cowtan and Main 1998). All model building was performed using the program O. The
initial model was built using skeletonised density, main-chain and side-chain databases
and baton building methods (Jones et al. 1991; Jones and Kjeldgaard 1997). Maximum
likelihood model refinement was performed using the program CNS (Brünger et al.
1998). Utilizing the native high resolution data-set required the use of molecular
replacement to more accurately position the molecular model in the unit cell of the
frozen crystal. A set of 2567 reflections, representing 2.7 % of the total 86691
reflections between 15 and 1.9 Å, was used to monitor the R-free (Brünger 1992a). A
summary of refinement and final model statistics, based on the native high-resolution
data set, is given in Appendix II.
4.2.2 Protein structure
The fold of the T. reesei Cel12A protein was, as expected, the same as the previously
solved GH family 12 enzyme from S. lividans, CelB2, and similar to the GH family 11
xylanases, i.e., a β-sandwich. T. reesei Cel12A consists of 15 long β-strands that fold
into two twisted, largely anti-parallel β-sheets, A and B, which pack on top of one
another (Figure 10). The convex β-sheet A consists of 6 anti-parallel strands, labeled
A1-6. Sheet B consists of nine strands, B1-9, and is largely anti-parallel.
The β-strands in the two β-sheets are numbered consecutively from 1-6/9 after their
order in the sheets, with β-strands A/B1 closest to the proposed non-reducing end of the
binding cleft, to the left in Figure 10. There is a single α-helix in the structure that
packs against the outer convex surface of β-sheet B. The enzyme is compact and the
dimensions are approximately 40 Å x 40 Å x 30 Å. The only two cysteines in the T.
reesei Cel12A protein, Cys 4 and Cys 32, form a disulfide bond that bridges β-strands
A1 and A2.
Post-translational modifications
The N-terminal glutamine undergoes a cyclization and condensation reaction with the
amine group of the N-terminus, to produce a cyclic pyro-glutamate. This is common in
fungal extracellular enzymes and often makes the protein resistant to proteolytic
degradation. Also, a NAG residue is found covalently attached to Asn 164, which is
part of an N-glycosylation amino-acid sequence motif in the protein. The NAG residue
stacks with the side-chain of Tyr 124 from the same molecule, and with Asn 91 of an
NCS-related molecule in an apparent dimer contact.
Figure 10. Schematic ribbon diagram drawing, a) top and b) side view, of the T. reesei
Cel12A crystal structure, color-ramped according to residue number, starting with red
at the N-terminus and ending with blue at the C-terminus. The two β-sheets in the
structure are labeled A and B. Individual strands are labeled (A1-A6 or B1-B9)
according to their positions in the two β-sheets.
Figure 11. Close-up, a) top and b) side view of the substrate-binding cleft of T. reesei
Cel12A. Some of the most important residues have their side chains drawn, with carbon
atoms in gold, nitrogens blue, oxygens red and sulphur green.
4.2.3 Substrate-binding cleft
The concave surface of β-sheet B forms a large crevice in the molecular surface
perpendicular to the strand direction. This crevice is approximately 35 Å long, 8 Å
wide and 15 Å deep, and is the cellulose substrate binding site. In T. reesei Cel12A, we
estimate that the cleft has the potential to bind at least six glucose residues, spanning
sites -4 to +2, using the binding site nomenclature of Davies et al. (1997b).
Substrate binding
Glucosyl binding sites are frequently formed by the exposed surfaces of aromatic
side-chains of Trp, Phe and Tyr residues. This is also true for the binding crevice of T. reesei
Cel12A, which contains two fully exposed tryptophan rings as well as a pair of exposed
tyrosine residues. Three more aromatic side-chains have ring edges exposed in the
crevice. The upper edge of one side of the crevice (the top strip of residues in Figure
11) has a clearly defined hydrophobic strip made of side-chains from Trp 7, Trp 22, Val
57 and Phe 202. The edge of the crevice is completed by the side-chains of Met 154,
Asn 151 and Ile 130. The side-chains that fill the bottom of the crevice are strikingly
different from the hydrophobic strip (Figure 11). The side-chains from strands B1-4 in
this region are predominantly polar, and rich in asparagine, threonine
and some glutamine residues.
Catalytic site
The crevice contains two glutamate residues, Glu 116 and Glu 200, which are invariant
throughout GH family 12. We predicted Glu 116 to be the catalytic nucleophile in T.
reesei Cel12A, an identification supported by the site-directed mutagenesis studies of
Okada et al. (2000). The carboxylate group oxygens of the two glutamates are separated
by ~5.4 Å, a distance frequently observed for the nucleophile/acid-base involved in a
retaining mechanism (McCarter and Withers 1994). The nucleophile is in close
proximity to two other residues that are strictly conserved in family 12, Asp 99 and Met
118 (Figure 11 and 12). Together with the two invariant glutamates, the aspartate is the
third member of a catalytic trio, similar to those first observed in the GH clan-B
enzymes (Keitel et al. 1993; Divne et al. 1994). The acid-base Glu 200 forms only one
hydrogen bond, to the side-chain of Asn 95. The side-chain of Ile 130 is suitably
positioned at the bottom of the crevice to help form at least part of the +1 product site.
Both the -1 and +1 sites are devoid of exposed aromatic side-chains.
Substrate-binding cleft reducing end
In both T. reesei Cel12A and S. lividans CelB2, residues from the cord region, in
particular Pro 129/133 - Ile 130/134, are likely to form the bottom of the +2 (or +3)
site. The proline ring is sandwiched between two aromatic residues Trp 120/124 and
Tyr 147/Trp 151. The cord is structurally well conserved between T. reesei Cel12A and
S. lividans CelB2, and in the six NCS molecules. In Cel12A there are no signs of the
conformational flexibility that has been suggested to occur upon substrate binding in T.
reesei Xyn II (GH family 11 xylanase) (Muilu et al. 1998).
4.3 Thermal stability and activity of GH family 12 enzymes (Paper II)
In this study we have measured the stability and activity of several GH family 12
homologues, and point mutants of these, in an attempt to identify residues important for
thermal stability. Significant biochemical diversity was seen among the examined
homologues (Table 2). Notably, the enzyme from the fungus Hypocrea schweinitzii
(Goedegebuur et al. 2002) differs from the T. reesei Cel12A enzyme at only 14
residues, but is significantly less thermally stable. We have systematically introduced
these 14 differences into the T. reesei Cel12A enzyme by site-directed mutagenesis, and
examined their effects on the thermal stability of the enzyme.
[Figure 12: structure-based sequence alignment, not reproduced here. Rows: S. lividans, S. sp 11AG8, T. reesei, H. grisea and H. schweinitzii, with the secondary structure elements and the T. reesei sequence numbering shown above the alignment.]
Figure 12. Structure based sequence alignment of five GH family 12 enzymes with
known structure. The secondary structure elements of the proteins are drawn at the top
of the alignment. The position of the nucleophile and the acid-base in the sequences is
indicated with filled and open arrows, respectively. The aligned protein sequences, with
their GenBank or PDB access codes indicated in parentheses, are: Streptomyces
lividans CelB2 (U04629, 2NLR); Streptomyces sp. 11AG8 Cel12A (AF233376, 1OA4);
Humicola grisea Cel12A (AF435071, yyy); Trichoderma reesei Cel12A (AB003694,
1H8V); Hypocrea schweinitzii Cel12A (AF435068, 1OA3).
4.3.1 Thermal stability
Circular-dichroism (CD) experiments to determine the thermal stability of the Cel12A
homologues were performed on an Aviv 62ADS spectrophotometer (Protein Solutions,
Lakewood, NJ). Buffer conditions were 0.05 M Bis-Tris propane, 0.05 M ammonium
acetate, adjusted to pH 8.0 with acetic acid. The final protein concentration for each
experiment was in the range of 10-20 µM. Data were collected in a 0.1 cm path length
cell. The experiments were performed at 217 nm, the wavelength in the far-UV spectrum
with maximum signal difference for a predominantly β-sheet protein. The temperature
was increased from 30 to 90 °C with data collected every two degrees. The
equilibration time at each temperature was 0.1 minutes and data were collected for 4
seconds per sample. The mid-point of the transition (Tm) is an apparent value because
the thermal denaturation of the Cel12A proteins studied was not reversible. The
apparent Tm for each examined enzyme is listed in Table 2.
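One simple way to extract an apparent Tm from such a melt curve is to normalize the signal between its folded and unfolded plateaus and interpolate the temperature at which half of the change has occurred. The Python sketch below illustrates this on a synthetic curve. It is not the fitting procedure used in this work, which is not described here; the sigmoid width, the signal units and the function name are assumptions for illustration only.

```python
import math

def apparent_tm(temperatures, signal):
    """Midpoint of a thermal transition by linear interpolation.

    The signal is normalized between its first (folded) and last (unfolded)
    values; Tm is the temperature where the normalized signal crosses 0.5.
    """
    folded, unfolded = signal[0], signal[-1]
    fraction = [(s - folded) / (unfolded - folded) for s in signal]
    for i in range(1, len(fraction)):
        lo, hi = fraction[i - 1], fraction[i]
        if lo < 0.5 <= hi:
            t_lo, t_hi = temperatures[i - 1], temperatures[i]
            return t_lo + (0.5 - lo) * (t_hi - t_lo) / (hi - lo)
    raise ValueError("no transition midpoint found")

# Synthetic melt curve sampled every two degrees from 30 to 90 °C.
temps = [30 + 2 * i for i in range(31)]
true_tm, width = 54.4, 1.5  # width is an arbitrary assumption
unfolded_fraction = [1.0 / (1.0 + math.exp(-(t - true_tm) / width)) for t in temps]
cd_signal = [-20.0 + 12.0 * f for f in unfolded_fraction]  # arbitrary CD units

tm = apparent_tm(temps, cd_signal)
print(f"apparent Tm = {tm:.1f} °C")
```

With data every two degrees, linear interpolation recovers the midpoint to within a few tenths of a degree on this synthetic curve. Real data would also need sloping baselines to be handled, and for an irreversible transition the value remains an apparent Tm.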
Compared to the Tm of 54.4 °C for T. reesei Cel12A enzyme, the H. schweinitzii
enzyme has a Tm that is 5.2 °C lower, while the Streptomyces sp. 11AG8 Cel12A
homologue has a Tm that is 12.6 °C higher (Table 2). The 14 T. reesei Cel12A variants,
recruited from the H. schweinitzii Cel12A enzyme, have Tm changes ranging from –4.0
°C, to +2.5 °C for the most stable variant, compared to the WT enzyme. The most
significant changes in stability occur within the first 63 amino acids in the enzyme. The
T. reesei Cel12A A35S variant has the largest decrease in stability (–4.0 °C), while
mutating the same residue in T. reesei Cel12A to a valine (recruited from the more
stable homologue, S. sp 11AG8 Cel12A), produces the largest increase in stability (+7.7
°C). Interestingly, three substitutions found in the less stable H. schweinitzii Cel12A
enzyme, G41A and T11S/T16I, cause a Tm increase (of 2.5 and 1.1 °C,
respectively) when introduced into T. reesei Cel12A (Table 2).
The Tm variation from least to most stable homologue is 20.9 °C (Table 2). These
types of stability differences among homologous proteins are common, and there has
long been an interest in identifying the sources of the increased stability in the
extremophiles (Jaenicke 2000). However, no generically reliable rules from
comparisons of sequences alone or, even, modeled structures have emerged to predict
stabilizing changes. Although we find a high level of structural homology between the
Cel12A proteins from the mesophile (T. reesei) and the more stable enzyme from the
alkalophilic bacterium (S. sp. 11AG8), it is by no means obvious, given the low
sequence identity of 28 % (Figure 12), which changes are needed to stabilize the T.
reesei protein. However, we were able to use the characterization of a less stable
homologue, H. schweinitzii Cel12A, together with examination of the T. reesei Cel12A
structure and sequence analysis, to guide our choice of recruitment from the more stable
homologue. Our clear identification of a single residue responsible for large differences
in stability within a structural family may be unusual.
Table 2. Thermal denaturation data, and relative specific enzyme activity
______________________________________________________________________
GH12 homologue            Variant      ΔTm [a]   Tm (°C)   Activity [b]
______________________________________________________________________
T. reesei Cel12A                          0.0      54.4      1.00
H. schweinitzii Cel12A                   -5.2      49.2      1.08
S. sp. 11AG8 Cel12A cd                   11.3      65.7      3.84
S. sp. 11AG8 Cel12A fl                   12.4      66.8      3.78
T. koningii Cel12A                        4.1      58.5      1.00
G. roseum Cel12C                         -8.5      45.9      0.42
F. javanicum Cel12A                      -0.3      54.1      N.D.

T. reesei Cel12A          A35V            7.7      62.1
                          W7Y            -1.0      53.4
                          T11S/T16I       1.1      55.5
                          A35S           -4.0      50.4
                          S39N            0.5      54.9
                          G41A            2.5      56.9
                          S63V           -0.8      53.6
                          A66N            0.1      54.5
                          S77G            0.1      54.5
                          N91D            0.5      54.9
                          S143T           0.5      54.9
                          T163S           0.3      54.7
                          N167S           0.2      54.6
                          A188G           0.5      54.9
______________________________________________________________________
[a] ΔTm values are relative to T. reesei Cel12A WT. [b] The specific enzyme activity is
expressed as molar specific activity relative to T. reesei Cel12A WT. N.D. = not
determined
______________________________________________________________________
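The ΔTm column follows directly from the Tm column. As a small consistency check, the Python sketch below recomputes ΔTm for the homologues from the Tm values in Table 2 and derives the spread between the least and the most stable homologue; only the variable and function names are introduced here.

```python
WT_TM = 54.4  # apparent Tm of T. reesei Cel12A WT (°C), from Table 2

# Apparent Tm values (°C) of the homologues, as listed in Table 2.
homologue_tm = {
    "T. reesei Cel12A": 54.4,
    "H. schweinitzii Cel12A": 49.2,
    "S. sp. 11AG8 Cel12A cd": 65.7,
    "S. sp. 11AG8 Cel12A fl": 66.8,
    "T. koningii Cel12A": 58.5,
    "G. roseum Cel12C": 45.9,
    "F. javanicum Cel12A": 54.1,
}

def delta_tm(tm, reference=WT_TM):
    """Tm shift relative to the reference enzyme, rounded to one decimal."""
    return round(tm - reference, 1)

delta = {name: delta_tm(tm) for name, tm in homologue_tm.items()}
spread = round(max(homologue_tm.values()) - min(homologue_tm.values()), 1)

for name, value in delta.items():
    print(f"{name:26s} {value:+5.1f}")
print(f"spread between least and most stable homologue: {spread} °C")
```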
4.3.2 Relative enzyme activity
To evaluate specific enzyme activity of the Cel12A homologues, an o-nitrophenyl
β-cellobioside (oNPC, Sigma N 4764) hydrolysis assay was used. In a microtiter plate,
100 µl 50 mM sodium acetate, pH 5.5 and 20 µl 25 mg/ml oNPC in assay buffer were
added. Once equilibrated, 10 µl cellulase was added and the plate incubated at 40 °C for
10 minutes. To stop the reaction, 70 µl of 0.2 M glycine, pH 10.0 was added. The plate
was then read in a microtiter plate reader at 410 nm. As a reference, 10 µl of a
0.1 mg/ml solution of T. reesei Cel12A enzyme provided an OD of around 0.3.
Extinction coefficients for the other Cel12 homologues were calculated on the basis of
their amino-acid compositions.
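The text does not state which method was used to derive extinction coefficients from amino-acid composition. One widely used estimate for the coefficient at 280 nm sums contributions of 5500 per tryptophan, 1490 per tyrosine and 125 per disulfide bond (in M^-1 cm^-1). The Python sketch below applies that estimate and then expresses activity on a molar basis relative to a reference enzyme. All function names, the toy sequence and the example readings are hypothetical.

```python
def extinction_coefficient_280(sequence, n_disulfides=0):
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    n_trp = sequence.count("W")
    n_tyr = sequence.count("Y")
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_disulfides

def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * path length)."""
    return a280 / (epsilon * path_cm)

def relative_activity(od_sample, conc_sample, od_reference, conc_reference):
    """Molar specific activity of a sample relative to a reference enzyme.

    Assumes identical assay conditions and an OD that is proportional to the
    amount of product released.
    """
    return (od_sample / conc_sample) / (od_reference / conc_reference)

# Toy sequence, for illustration only (not a Cel12A sequence).
epsilon = extinction_coefficient_280("WWYC", n_disulfides=1)
print(epsilon)

# Invented readings: same molar enzyme concentration, twice the OD at 410 nm.
print(relative_activity(0.6, 2e-6, 0.3, 2e-6))
```

Expressing activity per mole rather than per milligram keeps the comparison fair between homologues of different molecular weight, such as a catalytic domain and a full-length enzyme.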
All examined GH family 12 proteins were active enzymes. There was a nine-fold
difference in the specific activities of the homologues on oNPC (Table 2). The T. reesei
Cel12A variants with the greatest and least stability (A35V and A35S, respectively),
showed ~30 % higher activity than the WT enzyme. The activity of selected Cel12
enzymes, as a function of temperature was also determined. The Cel12 enzymes with
greatest thermal stability, S. sp. 11AG8 Cel12A and T. reesei Cel12A variant A35V,
showed a continual increase in activity over the full temperature range from 25 to 60
°C. The less stable homologues, T. reesei Cel12A and H. schweinitzii Cel12A, and the
destabilized T. reesei Cel12A variant A35S showed a decrease in activity at the highest
temperature, presumably due to thermal inactivation.
4.3.3 Structural features affecting stability
To better understand the structural basis for the large differences in stability among the
GH family 12 homologues, we determined the crystal structures of two of these
enzymes, the fungal Cel12A from H. schweinitzii and the catalytic domain (cd) of the
bacterial Cel12A from S. sp. 11AG8. We also determined the structure of the most
stabilized T. reesei Cel12A variant, A35V, with an increase in Tm of +7.7 °C.
Protein crystallization
The T. reesei Cel12A A35V variant was crystallized under conditions similar to the WT
enzyme. Large, single, wedge-shaped crystals grew to a maximum size of 1 mm in all
directions within 1-2 days. The crystals belong to the space group P21 with cell
dimensions a = 68.3 Å, b = 71.3 Å, c = 119.3 Å and β = 91.5°, and have a calculated
Vm of 1.9 (Matthews 1968) with 6 molecules in the asymmetric unit. S. sp. 11AG8
Cel12A crystals were obtained from 10-20 % (w/w) mme PEG 5000, 200 mM sodium
cacodylate, pH 5.0-6.0, at 25 °C and using a protein concentration of 15 mg/ml. The
crystals belong to space group P212121 with cell dimensions a = 65.1 Å, b = 54.5 Å, c =
62.5 Å, and have a calculated Vm of 2.4 (Matthews 1968) with one molecule in the
asymmetric unit. H. schweinitzii Cel12A crystals were obtained from a crystallization
agent containing 200 mM cacodylate buffer (pH 6.0), 200 mM ammonium acetate,
10-30 % (w/w) mme PEG 2000 and 1 % isopropyl alcohol, at 20-24 °C, using a protein
concentration of 15 mg/ml. Single, larger crystals (0.1 mm in all directions) were
obtained after 6 months of incubation. These crystals belong to space group P21 with
cell dimensions a = 62.5 Å, b = 77.5 Å, c = 83.4 Å and β = 98.5°, and have a
calculated Vm of 2.0 (Matthews 1968), with 4 molecules in the asymmetric unit.
Structure solutions and refinements
The bacterial S. sp. 11AG8 Cel12A catalytic domain (cd) structure was solved by
molecular replacement (MR), using X-PLOR 3.1 (Brünger 1992b), with the bacterial S.
lividans CelB2 catalytic core structure (PDB code 1NLR) as the search model. The
structure of H. schweinitzii Cel12A was solved by MR with AMoRe (Navaza 1994),
using all of the homologous WT T. reesei Cel12A structure (residues 1-218) as a search
model. The MR search gave one clear solution, with 4 molecules in the asymmetric
unit. The initial map was improved by exploiting the four-fold NCS of the asymmetric
unit. The changes in unit cell parameters of the T. reesei Cel12A A35V mutant with
respect to the WT structure required the use of molecular replacement, using AMoRe, to
solve the structure. Summaries of model refinement statistics are given in Appendix I.
The H. schweinitzii Cel12A structure
The fungal Cel12A enzyme from H. schweinitzii crystallizes with four NCS-related
molecules in the asymmetric unit. The complete set of Cα atoms from the four NCS
molecules in the H. schweinitzii Cel12A structure and the six NCS molecules in the T.
reesei Cel12A structure can be superimposed with pair-wise root mean square
deviations (RMSD) in the range of 0.3-0.5 Å. Some of the biggest main-chain
differences between the two structures are found in the two loops corresponding to
residues 11-16 and 37-43, connecting β-strands B1 to B2 and A2 to A3 in the structure
(Figure 13).
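For reference, the RMSD between two sets of paired atoms is the square root of the mean squared distance between equivalent atoms. The Python sketch below computes it for coordinates that are assumed to be already superimposed and paired atom by atom; the optimal superposition itself is not shown. The function name and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (same units as the input) between paired (x, y, z) coordinates.

    Assumes the two sets are already superimposed and listed in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example: two atoms, each displaced by 0.5 Å along z.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.5), (1.0, 0.0, -0.5)]
print(rmsd(model_1, model_2))
```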
Most of the 14 differences in the H. schweinitzii Cel12A enzyme compared to the
T. reesei enzyme are located on the protein surface, and distributed over the whole
molecule (Figure 13). Many of the side-chains point out into the surrounding solution.
All of these substitutions (except N91D) are from a neutral to another neutral amino
acid, and most have little or no effect on the Tm of the enzyme (with ΔTm values in the range
0.1-0.5 °C, Table 2). There is a clustering of substitutions on the first β-strands in the
structure, A2-3 and B1-3, where some of the individual substitutions have a large effect
on the Tm of the T. reesei Cel12A enzyme. This region also has some of the largest
structural differences between the two fungal enzymes. The substitution in this region
that has the biggest effect on the Tm is the alanine to serine at residue 35, located on
β-strand A2 on the smaller β-sheet A close to the N-terminus (Figure 13). This
substitution causes a reduction of the Tm by 4.0 °C when introduced into T. reesei
Cel12A (Table 2). The likely explanation for this reduction in Tm is the introduction of
a hydrophilic residue in the hydrophobic environment at the edge of the two β-sheets.
This disrupts the hydrophobic interactions with the side chains of the three surrounding
hydrophobic residues.
The T. reesei Cel12A A35V structure
The A35V variant crystallizes in the same space group as the WT T. reesei Cel12A
enzyme, with similar cell dimensions. Like the WT enzyme it crystallizes with six
NCS-related protein molecules in the asymmetric unit, making up three pairs of interacting
molecules. The biggest differences among the six NCS molecules correspond to loop
regions, which take on different conformations in the different NCS molecules, and
most of these are affected by the crystal contacts.
The side-chain of residue Ala 35 of T. reesei Cel12A points into the core between
the interacting β-sheets, where it interacts with a number of spatially adjacent
side-chains. Mutating residue 35 does not cause a major conformational change or local
rearrangement in the structure. However, the introduction of a valine at this position
influences packing. The valine methyl groups make good van der Waals contacts with the
side-chains of the neighboring hydrophobic residues (Figure 14a). In the WT enzyme, the
packing of the alanine 35 Cβ methyl group is not as tight as in the A35V variant. The
approximately equivalent pairs of Cα atoms, from the NCS molecules in the T. reesei
Cel12A WT and A35V structures, can be superimposed with pair-wise RMSDs in the
range of 0.4-0.6 Å.
The Streptomyces sp. 11AG8 Cel12A structure
The cd of the bacterial S. sp. 11AG8 Cel12A enzyme crystallizes with only one
molecule in the asymmetric unit. The S. sp. 11AG8 Cel12A cd structure is most similar
to the Streptomyces lividans GH family 12 enzyme CelB2, with which it shares 72 %
sequence identity (Figure 12) and there are no insertions or deletions in the sequences.
In the S. sp. 11AG8 Cel12A structure there is no observable density for the residues in
the “linker” region beyond Ala 222. The complete set of Cα atoms from the two
structures can be superimposed with a RMSD of 0.49 Å.
There are two disulfide-bridges in the S. sp. 11AG8 enzyme. The one between Cys
5 and Cys 31, linking β-strands A1 and A2, is conserved throughout the whole GH
family 12 (Paper I). The second disulfide-bridge is formed between Cys 64 and Cys 69,
and is found in an extended loop between β-strands B3 and A5, which forms part of the
substrate-binding cleft of the enzyme. Structural alignment shows that the bacterial GH
family 12 enzymes have an insertion in this loop compared to the fungal GH family 12
enzymes (Paper I). This suggests that the purpose of this disulfide bridge may be to
stabilize the insertion.
One of the largest main-chain differences in the S. sp. 11AG8 Cel12A structure
compared with that of S. lividans CelB2 is at residue V/A34, the equivalent site of the T.
reesei Cel12A A35V mutation, shown to affect the thermal stability of the T. reesei
Cel12A enzyme. The loop corresponding to residues 33-38 has a small rigid body shift
so that in the CelB2 enzyme it moves closer to the upper sheet (Figure 14b). In CelB2,
the Cβ of A34 has hydrophobic contacts with the three equivalent inward-pointing
residues from the upper sheet (T11, I13 and V19, and inter-atomic separations in the
range 3.6-4.2 Å). In the S. sp. 11AG8 Cel12A structure, the methyl groups of V34 also
remain in good van der Waals contact with the equivalent side-chains (separations of
3.5-4.3 Å).
Figure 13. Schematic ribbon diagram drawing, (a) top and (b) side view, of the H.
schweinitzii Cel12A crystal structure. Color-ramped according to residue number,
starting with red at the N-terminus and ending with blue at the C-terminus of the
structure. The structures have side-chains drawn for the 14 residues that differ from the
T. reesei Cel12A protein sequence.
Figure 14. (a) Interactions and conformational changes close to residue 35 of the fungal
GH 12 enzymes from T. reesei (WT and A35V have carbon atoms colored yellow and
goldenrod respectively), and H. schweinitzii (carbons colored gold). Red bubbles
indicate contacts in T. reesei A35V Cel12A, blue bubbles in H. schweinitzii Cel12A. (b)
Interactions and conformational changes close to residue 34 of the bacterial GH 12
enzymes from S. sp. 11AG8 (carbons colored gold), and S. lividans (carbons colored
yellow). Red bubbles indicate contacts in S. lividans CelB2, blue bubbles in S. sp.
11AG8 Cel12A.
4.3.4 Discussion
Examination of the Cel12A structures provides structural rationalization for the
differences in stability in the enzymes. The clear identification of a single residue
responsible for large differences in stability within a structural family, as in this study,
may be unusual. It is obvious from other studies (Lehmann and Wyss 2001; Lehmann et
al. 2002), and from the limited sampling presented in this study, that not all
recruitments from more stable homologues are stabilizing. Conversely, not all
recruitments from a less stable homologue are de-stabilizing. In contrast to the T. reesei
Cel12A A35S variant, six of the 14 substitutions recruited from the H. schweinitzii
Cel12A stabilized the T. reesei Cel12A enzyme. Recruiting mutants with increased
stability from less stable homologues has been previously reported in other protein
families (Shaw et al. 1999; Lehmann et al. 2000; Perl et al. 2000).
4.4 The Humicola grisea Cel12A structure, and stabilizing
cysteines (Paper III)
In this study one additional GH family 12 homologue has been biochemically
characterized, Cel12A from the fungus Humicola grisea. The secreted H. grisea
Cel12A gene product is identical to the GH family 12 gene product reported for
Humicola insolens, and it shares only 44% sequence identity to the T. reesei Cel12A
enzyme (Figure 12).
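Sequence identity figures such as the 44 % quoted above depend on how the alignment is scored. The sketch below shows one common convention, counting identical residues over the aligned columns in which neither sequence has a gap. This convention is an assumption, since the text does not state which denominator was used, and the two short sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, not real Cel12A sequences.
print(f"{percent_identity('ACDE-FGH', 'ACNEQFGK'):.1f} % identity")
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, which gives somewhat different percentages for the same alignment.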
4.4.1 Thermal Stability
As for the other GH family 12 homologues, we have characterized the thermal stability
of the H. grisea enzyme (Table 3). The H. grisea Cel12A enzyme is much more
thermally stable than the T. reesei WT enzyme. It has a Tm (68.7 °C) that is 14.3 °C
higher than the Tm (54.4 °C) of the T. reesei enzyme (Table 3).
In cases like this with low homology between enzymes, systematic recruitment to
identify the underlying cause of variation is difficult or impossible. A statistical
approach then is needed to determine the preferential occurrence of certain amino acids
at particular positions. The preferential data can be gathered from a subset of
homologues with natural diversity and available sequences (Shaw et al. 1999).
To identify residues that contribute to the different Tm behaviors between the H.
grisea and T. reesei Cel12A enzymes, a sequence alignment of 21 GH family 12
enzymes (Goedegebuur et al. 2002), and a structure alignment of the four homologous
Cel12A enzymes with known protein structure (Figure 12) were used. Based on this
data, we identified three free-cysteine residues (C175, C206 and C216), in the H. grisea
Cel12A enzyme as being potentially important for enzyme stability (Figures 12 and 15).
Table 3. Thermal denaturation data and relative specific enzyme activity
___________________________________________________________________
GH12 homologue        Variant             ΔTmᵃ    Tm (°C)   Activityᵇ
___________________________________________________________________
T. reesei Cel12A      WT                   0.0     54.4      1.00
H. grisea Cel12Aᶜ     WT                  14.3     68.7      0.12
T. reesei Cel12Aᶜ     G170C                2.1     56.5      0.69
                      P201C                3.9     58.3      0.21
                      V210C                0.1     54.5      NA
                      G170C/P201C          0.7     55.1      0.38
                      P201C/V210C          0.7     55.1      0.50
                      G170C/P201C/V210C    0.0     54.4      0.14
                      G170C/V210C          ND      ND        0.28
H. grisea Cel12Aᵈ     C175G                1.3     70.0      0.65
                      C175S                0.2     68.9      0.73
                      C206P               -9.1     59.6      0.91
                      C206S               -5.4     63.3      1.55
                      C216V                0.8     69.5      0.69
                      C216S               -5.6     63.1      0.65
___________________________________________________________________
ᵃ ΔTm values are relative to: ᶜ T. reesei Cel12A WT, ᵈ H. grisea Cel12A WT.
ᵇ The specific activities given are relative to: ᶜ T. reesei Cel12A WT, ᵈ H. grisea Cel12A WT.
___________________________________________________________________
A possible simple explanation for the increased stability of the H. grisea enzyme is that
two of these Cys residues are involved in an additional disulfide bond, in addition to the
disulfide bond between Cys residues 6 and 35 (H. grisea Cel12A numbering). This
disulfide bond is totally conserved within GH family 12 and known to be very
important for the stability. However, examination of the T. reesei Cel12A structure
made it look unlikely that these three additional Cys residues in H. grisea Cel12A make
an extra disulfide bond in the enzyme. It is unusual for a protein to have so many free
cysteines, unless they have a function, since they can be oxidized or lead to
misformation of essential disulfides (e.g., Cys 6-Cys 35).
To determine if these free cysteines are important for stability, we introduced them
into the corresponding positions in the T. reesei Cel12A enzyme, as single, double, and
triple mutations. We also constructed six H. grisea Cel12A variants where we
exchanged the free cysteines with the corresponding residues in the T. reesei enzyme
(C175G, C206P and C216V) and with serine, and determined the thermal stability of
these variants (Table 3). The free cysteines in H. grisea Cel12A seem to play a key role
in modulating the stability of the enzyme (Table 3). The T. reesei Cel12A cysteine
variants recruited from H. grisea Cel12A have Tm changes ranging from 0.1 °C to an
increase of 3.9 °C for the most stable variant (P201C). The H. grisea Cel12A variants
have Tm changes ranging from a decrease of 9.1 °C for the least stable variant (C206P),
to an increase of 1.3 °C for the most stable one (C175G), Table 3.
4.4.2 Relative enzyme activity
The specific activities of WT and mutant enzymes were determined with o-NPC as
substrate (Table 3). The H. grisea enzyme has only 12 % of the activity of the T. reesei
enzyme at 40 °C. The T. reesei enzyme is more sensitive to the mutations. One H.
grisea Cel12A mutation (C206S) actually increases the enzyme activity slightly.
4.4.3 Protein structures
To shed some light on the structural basis for the large thermal stability differences
between the two fungal Cel12A enzymes, the crystal structures of the H. grisea Cel12A
WT and the T. reesei Cel12A cysteine variant (P201C), were determined. Both
structures have the expected overall-fold of a GH family 12 enzyme. Statistics on data
collection, refinement and the final structure models are given in Appendices I and II.
The H. grisea Cel12A structure
The final model of the H. grisea enzyme contains the complete sequence of 224 amino
acids. Equivalent Cα atoms from the H. grisea and the T. reesei Cel12A structures can
be superimposed with pair-wise RMSDs in the range of 0.3-0.5 Å. There are no larger
insertions or deletions in the H. grisea structure compared with that of T. reesei. Four of
the extra six residues in the H. grisea structure are distributed over the whole molecule
as single amino acid insertions, mainly in the loops connecting the β-strands. Two of
the extra six residues are located at the C-terminus. Some of the biggest main-chain
differences that exist between the two structures can be found in the loops that have an
extra residue in the H. grisea structure. The three free cysteines, C175, C206 and C216,
are located on β-strands A6, B4 and A4 respectively (Figure 15). Their side chains
point into the core between the two β-sheets and they form extensive interactions with
surrounding residues (Figure 16).
The T. reesei Cel12A P201C structure
The T. reesei Cel12A P201C variant crystallizes in the space group P31, with two NCS
molecules in the asymmetric unit, forming a structural dimer. The two NCS molecules
block their substrate-binding clefts as in our previous T. reesei Cel12A structures
(Papers I and II). Residue 201 in T. reesei Cel12A is located on β-strand B4, a strand on
the bigger β-sheet B, one residue from the catalytic acid/base (E200), Figure 15. Figure
16 shows the environment around residue 201 in the WT and mutant T. reesei Cel12A
structures. The introduction of a cysteine does not cause a major conformational change
in the variant compared to the WT structure.
Figure 15. Schematic ribbon diagram drawing, (a) top and (b) side view, of the H.
grisea Cel12A crystal structure. The structure has side chains drawn for the three free
cysteine residues (C175, C206 and C216), and the two catalytic residues (E120 and
E205).
Figure 16. Interactions and conformational changes close to residues 206 and 201 of
the H. grisea Cel12A structure, and the T. reesei WT and P201C Cel12A structures,
respectively. The H. grisea Cel12A structure has carbon atoms colored goldenrod and
the T. reesei WT and P201C Cel12A structures have carbon atoms colored yellow and
orange. The blue bubbles indicate contacts to the free cysteine residues in H. grisea
Cel12A.
Introducing the free cysteines from H. grisea Cel12A into equivalent positions in T.
reesei Cel12A results in an increase of the Tm by 0.1-3.9 °C (Table 3). The T. reesei P201C
structure shows that the mutation causes only small local changes in the protein
structure. One reason for the increased Tm of the P201C variant is the filling of a small
cavity by the cysteine side chain, and the resulting set of van der Waals interactions
(Figure 16). The change, however, is smaller than the C206P mutation in the H. grisea
enzyme. Not all of the interactions present in the H. grisea enzyme are made in the T.
reesei P201C enzyme. In particular the Q34 - C206 interaction that exists in the H.
grisea enzyme is missing in the variant. The effect on the solvent accessibility of the
interacting residues when introducing the P201C mutation is very small.
The free cysteine residue in H. grisea Cel12A that has the least effect on the Tm is
C175. If this cysteine is mutated to a glycine or a serine, the Tm of the enzyme is
increased by 1.3 °C and 0.2 °C, respectively, compared to the WT (Table 3), but the
enzyme activity is decreased for both variants (C175G/S), Table 3. Cysteine
residue 175 is located on β-strand A6, on the smaller β-sheet A that creates the convex
outer surface of the enzyme, close to the only α-helix in the structure (Figure 15). The
side chain of the residue points into the core between the two β-sheets, where it has a
set of van der Waals interactions with the side chains of six adjacent residues.
The free cysteine at residue 206 has the largest effect on the Tm of the H. grisea
enzyme. This residue sits next to the catalytic acid/base (Glu 205) on β-strand B4
(Figure 15b). The side chain points into the β-sheet core where it has an extensive
network of interactions with the side chains of eight adjacent residues (Figure 16).
Many of the interactions with SG are polar in nature, in particular the contact with the
indole Nε1 of W52, indicative of a short S···HN hydrogen bond (3.3 Å). Mutating this
residue in H. grisea Cel12A to a proline, or a serine, causes a reduction in the Tm of 9.1
°C and 5.4 °C, respectively, compared to the WT enzyme. Although this residue is
positioned next to the acid/base, the effect on the enzyme activity is smaller than the
effect on the Tm. The activity is only modestly decreased for the proline mutation, and
increased for the serine mutation (relative specific activities of 91 % and 155 %,
compared to WT), Table 3.
The third free cysteine is located at residue 216 on β-strand A4 in β-sheet A, the
sheet that creates the outer convex surface of the enzyme. This cysteine residue has its
side chain pointing into the core region below the active site in the structure, where it
interacts with the side chains of five adjacent residues. These are van der Waals
contacts, so mutating this residue to a valine causes little change in the Tm. The
introduction of the more polar serine residue with no possibility for hydrogen bonding
has a large effect on the Tm (Table 3). The enzyme activity is decreased for both
mutations (relative specific activities of 69 % and 65 % for C216V and C216S,
respectively), compared to WT (Table 3).
4.4.4 Discussion
The effects of the single mutations in the H. grisea and the T. reesei Cel12A enzymes
do not explain the total difference in thermal stability between them. In the next obvious
step to combine the mutations, little or no additivity is seen, and multiple variants of the
T. reesei enzyme still do not approach the H. grisea Cel12A stability. Interestingly, we
see that the free Cys residues in H. grisea Cel12A are not necessarily the most
stabilizing amino acids to have in these positions. Replacing Cys 175 with Gly or Cys
216 with Val in H. grisea Cel12A leads to an increase in the Tm by 1.3 °C and 0.8 °C,
respectively, compared to the WT enzyme (Table 3).
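The lack of additivity can be made concrete with a short sketch that compares the sum of the single-mutant ΔTm values with the ΔTm observed for the combined T. reesei variants. The numbers are transcribed from Table 3; the function names are hypothetical, and the simple sum is only the naive expectation for fully independent mutations, not a model used in the study. The G170C/V210C variant is left out because its Tm was not determined.

```python
# Delta-Tm values in deg C relative to T. reesei Cel12A WT, transcribed from Table 3.
SINGLE = {"G170C": 2.1, "P201C": 3.9, "V210C": 0.1}
OBSERVED = {
    ("G170C", "P201C"): 0.7,
    ("P201C", "V210C"): 0.7,
    ("G170C", "P201C", "V210C"): 0.0,
}


def additive_expectation(mutations):
    """Naive expectation: single-mutant effects simply add up."""
    return sum(SINGLE[m] for m in mutations)


def additivity_gap(mutations):
    """Observed delta-Tm minus the additive expectation (negative = less than additive)."""
    return OBSERVED[tuple(mutations)] - additive_expectation(mutations)


for combo in OBSERVED:
    print(
        "/".join(combo),
        f"expected {additive_expectation(combo):+.1f}",
        f"observed {OBSERVED[combo]:+.1f}",
        f"gap {additivity_gap(combo):+.1f}",
    )
```

For every combination listed, the observed ΔTm falls well below the additive expectation, which is the pattern described in the text.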
4.5 H. grisea Cel12A complex structures (Paper IV)
Only one ligand-bound GH 12 structure has previously been reported, the crystal
structure of the catalytic core domain of Streptomyces lividans CelB2 (Figure 16), in
complex with 2-deoxy-2-fluoro-cellotrioside (Sulzenbacher et al. 1999). In this study,
we present four new ligand-bound GH 12 structures, H. grisea Cel12A in complex with
cellobiose (G2), cellotetraose (G4), cellopentaose (G5) and thio-linked cellotetraose
(G2SG2).
4.5.1 Overall protein structures
Statistics on data collection, refinement and the final complex models are given in
Appendices I and II. As was previously predicted for the apo H. grisea Cel12A crystal
structure (Paper III), the substrate-binding cleft of the enzyme is not blocked by other
protein molecules in the crystal. This made it possible to soak a ligand into the binding
site of the enzyme molecules in the crystal. The four H. grisea Cel12A complex crystals
all have the same space group (P43212), and similar cell constants to the apo crystal.
The final models of the H. grisea Cel12A G2, G4, G5 and G2SG2 complexes contain
one protein molecule with all the 224 amino acids in the enzyme. There are no major
changes in the main-chain conformations in the complexes, compared to the apo
structure. The largest number of changes occurs in the side chains of amino acids lining
the binding cleft, especially in the complex structures where the ligand spans the active
site (G5 and G2SG2). The separation between the carboxylate groups of the catalytic
nucleophile and the acid/base glutamyl residues varies between 5.8-6.6 Å, in the four
different complex structures.
4.5.2 Oligosaccharide complexes
The ligand complexes were obtained by equilibrating apo H. grisea Cel12A crystals for
36 hours in drops consisting of 30 % (w/w) mme PEG 2000 and 20 mM of the ligands,
over a reservoir solution containing 30 % (w/w) mme PEG 2000. The different enzyme
ligand interactions are summarized in Table 4. The H. grisea Cel12A G2, G5 and
G2SG2 complexes show well-defined electron-density in the +1 and +2 subsites of the
enzyme. All complexes show the same directionality: they have their reducing end
pointing towards the right in Figures 17 and 18.
Figure 17. The H. grisea Cel12A crystal structure in complex with a modeled
cellohexaose ligand (consisting of the –4 to –2 glucans from the cellotetraose structure,
and the –1 to +2 glucans from the cellopentaose structure), spanning from binding site
–4 to +2, colored in gold. The structure has side chains drawn for the two catalytic
residues Glu 120 and Glu 205, colored in gold.
The cellobiose complex shows density for a ligand only in the product site (Figure
18d). The cellotetraose chain in the H. grisea Cel12A G4 complex structure is bound in
the binding cleft spanning from sites –4 to –1. The electron density for the cellotetraose
molecule is well defined for the glucosyl residues occupying binding sites –4 to –2, but
is less well defined for the glucosyl unit occupying binding site –1 (Figure 18b). The
cellopentaose chain and the thio-linked cellotetraose chain in the H. grisea Cel12A G5
and G2SG2 complex structures are bound spanning sites –2 to +2 (Figure 18c). In these
structures the electron densities for the two glucosyl residues occupying binding sites +1
and +2 are well defined, but are less well defined for the two glucosyl residues occupying
binding sites –1 and –2 (Figure 18c). There is no detectable electron density in the
binding cleft for the fifth glucosyl residue in the cellopentaose molecule.
Figure 18. Electron density maps for the H. grisea Cel12A-cellotetraose (b),
cellopentaose (c), and cellobiose (d) complexes. The electron density for the
cellohexaose complex (a) is a combination of that from the cellotetraose complex (–4
to –2 glucans), and the cellopentaose complex (–1 to +2 glucans). The maps shown are
maximum-likelihood σA-weighted 2|Fobs| - |Fcalc| maps, contoured at 1σ. For clarity
the electron density has been cut around the cellooligomers using a masking radius of
1.5 Å. The catalytic nucleophile (Glu 120) and the acid/base (Glu 205) are shown for
reference.
Conformation of glucosyl residues
The glucosyl residues in binding sites –4 and –2 of the H. grisea Cel12A complex
structures both have a 4E elongated chair conformation (Figure 19a). However, in
binding sites –3, +1 and +2, the glucosyl residues are in the full 4C1 chair conformation.
The glucosyl residues in sites –4 and –3 are only present in the cellotetraose complex
structure. The conformation of the glucosyl residue in binding site –1 that best fits the
electron density in the two complex structures (G5 and G2SG2) where the glucan chain
spans the binding site is a 1S3 skew-boat conformation (Figure 19b). In the G4
structure, where the glucan chain ends in the –1 site, the glucosyl residue in this position
has a full 4C1 chair conformation (Figure 19a). If one looks down the binding cleft from
the entrance at site –4 towards the exit at site +2, the cellulose chain makes a
right-handed twist of approximately 55°.
4.5.3 Protein carbohydrate interactions
Binding sites –4, –3 and –2
There is one direct hydrogen bond formed between the glucosyl in binding site –4 and
the protein, from the glucosyl O6 hydroxyl to the amide group of Asn 114 in H. grisea
Cel12A (Figure 19a). There is also a hydrogen bond between the amide group of Asn
114 and the O3 hydroxyl on the glucosyl in binding site –3. This glucosyl also has two
more hydrogen bonds to the protein, from the O2 hydroxyl to the hydroxyl group on
Tyr 66, and from the O6 hydroxyl to the hydroxyl group on Tyr 9. There is only one
protein-carbohydrate hydrogen bond formed in binding site –2, and that is between the
glucosyl O2 hydroxyl and the Oδ1 of Asn 22. The glucosyl ring in site –2 has
hydrophobic stacking interactions with the indole ring of Trp 24.
Binding site –1, the active site
The nucleophile and acid/base catalysts, Glu 120 and Glu 205, are positioned on
opposite sides of the linkage between sites –1 and +1 (Figure 17). The distance between
the anomeric C1 carbon of the glucosyl in site –1 and the Oε2 of the nucleophile Glu
120 is 3.2 Å in the G4 complex structure where the glucosyl is in a full chair
conformation. In the G5 and G2SG2 complex structures where the glucosyl has adopted
a 1S3 skew-boat conformation and the ligand is spanning the active site, it is 3.5 Å.
There is a hydrogen bond of 2.6 Å between the O2 hydroxyl on the –1 glucosyl and the
Oε1 of Glu 120. The O2 hydroxyl on the –1 glucosyl also interacts with the Oδ1 of
Asn 155, with a distance of 3.0 Å. The distance between the anomeric C1 carbon and
the Oε2 of the acid/base Glu 205 is 4.0 Å in the G4 complex structure and 3.2 Å in the
G5 and G2SG2 complex structures. The Oε2 of Glu 205 has close contacts to the O6 and
O5 hydroxyl groups on the glucosyl in the –1 site, 2.6 Å and 3.1 Å respectively in the
G5 and G2SG2 complex structures, and 2.9 Å and 3.0 Å respectively in the G4 complex
structure. There is only one detectable water-mediated hydrogen bond between the
protein and the glucosyl in site –1, and it is formed between the O3 hydroxyl and the O
atom of Trp 115.
Figure 19. Protein-carbohydrate interactions in the binding cleft of H. grisea Cel12A,
in sites –4 to –1 of the G4 complex (a), and –2 to +2 of the G5 complex (b). Side chains
are shown for residues important in ligand binding. Possible hydrogen bonds (distances
<3.4 Å) between the carbohydrate and the protein are drawn as a series of magenta
spheres. The protein Cα backbone is drawn as a goldenrod coil.
Binding sites +1 and +2
There is also one hydrogen bond of 2.5 Å from the Oε2 of Glu 205 to the O4 oxygen on
the glucosyl in site +1 in those complex structures (G2, G5, and G2SG2) with glucosyl
residues in the product sites. The +1 glucosyl has a hydrogen bond of 2.6 Å between the
O3 hydroxyl and the Oε1 of Glu 205, and one of 2.7 Å between the O2 hydroxyl and the
carbonyl of Tyr 132. The carbonyl oxygen of Tyr 132 also has a short contact distance
of 2.7 Å to the O4 hydroxyl of the +2 glucosyl. The O6 hydroxyl group of the +2
glucosyl has a short contact distance of 2.7 Å to the N group on Tyr 132.
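The contacts listed in this section are classified by inter-atomic distance, with possible hydrogen bonds taken as separations below 3.4 Å (the cutoff given in the caption of Figure 19). A minimal sketch of that distance test follows. The function names and coordinates are hypothetical, and a distance cutoff alone ignores donor-acceptor geometry, so it flags candidates only.

```python
import math

HBOND_CUTOFF = 3.4  # angstrom; cutoff quoted in the Figure 19 caption


def distance(p, q):
    """Euclidean distance between two (x, y, z) points in angstrom."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def is_possible_hbond(atom_a, atom_b, cutoff=HBOND_CUTOFF):
    """True if two polar atoms are closer than the cutoff (distance criterion only)."""
    return distance(atom_a, atom_b) < cutoff


# Hypothetical positions, for illustration only (not taken from the complexes).
sugar_o2 = (0.0, 0.0, 0.0)
close_oxygen = (2.6, 0.0, 0.0)
far_oxygen = (0.0, 4.0, 0.0)
print(is_possible_hbond(sugar_o2, close_oxygen))
print(is_possible_hbond(sugar_o2, far_oxygen))
```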
4.5.4 Transglycosylation
The ligand structure obtained after soaking the H. grisea Cel12A crystals with
cellotetraose was a complete surprise. The tetraose molecule that was bound in sites –4
to –1 appeared to have a β-1,3-glycosidic bond in the middle, linking the two cellobiose
moieties. The quality of the electron density indicates full occupancy for the ligand,
strongly suggesting that it is not the result of overlapping ligands at partial occupancy,
but that we actually have a mixed β-1,3-1,4-tetraose bound to the enzyme. 1H NMR
analysis of the starting cellotetraose sample showed the exclusive presence of
β-1,4-glycosidic bonds and no detectable β-1,3-glycosidic links in the sample (less than
1/1000 of the glycosidic bonds). The only reasonable explanation for the
β-1,3-glycosidic linkage that we find between the glucose units in sites –3 and –2 is that
it has formed by transglycosylation during the ligand soaking.
Figure 20. The electron density map for the H. grisea Cel12A-cellotetraose complex
with the β-1,3-linkage indicated in the structure. The map shown is a
maximum-likelihood σA-weighted 2|Fobs| - |Fcalc| map, contoured at 1σ. For clarity the
electron density has been cut around the cellotetraose ligand using a masking radius of
1.5 Å.
We have no structure of a covalently bound intermediate, but if we model the
intermediate structure of S. lividans CelB2 (Sulzenbacher et al. 1999) on our active site,
the distances to the C1 anomeric carbon are 3.1 and 4.0 Å for the O4 and O3 hydroxyl
groups on the glucosyl in site +1, respectively. The preferred binding of cellobiose in
the product site with O4 closest to the reaction center suggests that
β-1,4-transglycosylation would occur much more frequently than the formation of a
β-1,3-linkage. With cellobiose as acceptor, β-1,4-transglycosylation would only have the
effect of reforming the substrate, but it is possible that cellotetraose could also be an
acceptor, resulting in the formation of longer cello-oligosaccharides as
transglycosylation products. The fact that we observe the mixed-link tetraose bound to
the enzyme provides in itself very little information about the reaction rate for
β-1,3-transglycosylation, relative to hydrolysis and β-1,4-transglycosylation. The
formation of a ligand with a β-1,3-linkage may be a very rare event but still yield a
sufficient quantity of this species during the long incubation time (36 h) of the crystals
in the cellotetraose ligand.
The mixed-link tetraose fits remarkably well in the enzyme. In sites –1 and –2 the
binding is very similar to the cellopentaose complex. Whereas a regular β-1,4-glucan
chain would extend upwards from the active site, out of the cleft, the β-1,3-linkage
allows the glucan to extend along the bottom of the cleft. One can hypothesize that the
fit of mixed-link β-1,3-1,4-glucan chains in H. grisea Cel12A is no coincidence.
Rather, the substrate-binding cleft has evolved to allow specific binding and cleavage of
mixed-link β-1,3-1,4-glucan chains. In a mixed-link β-1,3-1,4-glucan chain, this would
allow the enzyme to perform cuts in β-1,4 linkages two linkages away from a β-1,3
linkage. We also think that the enzyme's preference for binding mixed-link
β-1,3-1,4-glucans accumulates this species of the ligand in the crystal, although its
formation by a transglycosylation reaction in the enzyme is a rare event.
5 CONCLUDING REMARKS
______________________________________________________________________
The goals of the thesis have all to some extent been fulfilled. We have solved the
three-dimensional structures of four homologous GH family 12 enzymes, three fungal
enzymes (Humicola grisea Cel12A, Hypocrea schweinitzii Cel12A, Trichoderma reesei
Cel12A), and one bacterial (Streptomyces sp. 11AG8 Cel12A). We have used the
structural and biochemical information that we have gathered from these, and other GH
family 12 homologues, to produce a wide range of variants of the different homologous
Cel12A enzymes. These variants have been biochemically characterized, and we have
thereby identified positions responsible for some of the biochemical differences among
the homologous enzymes. The three-dimensional structures of two T. reesei Cel12A
variants, where the mutations have had a significant impact on the stability or the
activity of the enzyme, have also been determined. Finally, we have solved four ligand
complex structures of the WT H. grisea Cel12A enzyme, which has allowed us to
characterize interactions between the substrate and the protein in the binding cleft of the
enzyme.
Attempts to determine general principles of protein stability e.g., from studies of
thermophilic proteins, have generally been difficult. Usually increased stability is a
result of the accumulation of many small, subtle changes in the protein structure. Our
results suggest that the cumulative role of these changes is an under-appreciated factor in
protein stability. In the future, various recombinations of individual changes will enable
a more thorough characterization and quantification of the role of co-operativity in
protein stability.
Our structural, stability and activity studies of closely related GH family 12 enzymes,
and their variants, have provided insight on how specific residues contribute to protein
thermal stability and enzyme activity in this family. This knowledge can in the future
serve as a structural toolbox, when one wants to modify GH family 12 enzymes with
specific targeted properties and features, by introducing subtle changes in structural
components in these enzymes. This can be utilized to develop new industrial
products, or fine tune enzymes in already existing applications.
6 ACKNOWLEDGEMENTS
______________________________________________________________________
I want to thank everyone who has contributed to this thesis, and all those that have
made these years at the joint structural group at Uppsala University and Swedish
University of Agricultural Sciences so enjoyable!
I am especially grateful to:
Alwyn Jones for accepting me as a PhD student and continuous support during all
these years, though it has taken several more years than the usual five to complete my
thesis. Jerry for introducing me to the world of carbohydrate-degrading enzymes, and
supporting me with your deep knowledge about this class of enzymes when I have
written my manuscripts and thesis. Tex for your deep knowledge of protein
crystallization, and that you always have been willing to share this with the rest of us,
for your help crystallizing some of my proteins, and proofreading all of my manuscripts
and thesis. Alex for helping me to solve my first structure, within the two weeks that
Alwyn bravely promised Genencor Inc. that we would solve the structure. Gunnar for
all the support that I have gotten from you these last couple of years, and especially for
your help with refining some of my structures. Sherry for all the scientific discussions
we have had during the years, and for introducing me to the world of chemotaxis and
membrane proteins during my first years at the department. Lars and Torsten for
always being willing to listen and give advice about any crystallographic problem that I
had. My present colleagues at the department Emma, Patrik, Eva-Lena, Ulrika,
Wimal, Anna, Isabella, Fredrik, Mark, Martin S., Seved, Margareta, Ulla, Karin
W, David, Inger, Janos and Hans for all scientific and non scientific discussions that
we have had during our synchrotron trips, at the lunch table, or whenever I have had a
problem that I needed advice on or help with. My former colleagues: Jill, Neel, Hans,
Nina, John, Deva, Rams, Karin K, Tove, Cicci, Martin, Jinyu, Elin, Ines and Lotta
for all the help I previously have got, and the discussions I have had with you. Erling,
Christer and Remco for keeping all computers and all other technical equipment
running at the joint structural group. Erling for all our discussions we have had about
house-building details over a mug of morning coffee. Ingrid and Elleonor for your
support with all administrative details such as salaries, and travel expenses reports. My
close collaborators at Genencor Inc. in San Francisco: Colin, Andy, Pete, Ed, Laurie,
Mae, Shan, Tracie, Rick and Carol for all the support I have got from you during the
last four years of fruitful collaboration. My collaborators in Marc Claeyssens' group in
Ghent: Wim, Tom and Kathleen for all the discussions we have had on cellulase
enzymology. Finally, I want to give thousands of roses to my family and relatives for all
the understanding and support that I have received from all of you during these years.
The financial support from Genencor Inc. has been greatly appreciated.
7 REFERENCES
______________________________________________________________________
Armand, S., Drouillard, S., Schulein, M., Henrissat, B., and Driguez, H. 1997. A
bifunctionalized fluorogenic tetrasaccharide as a substrate to study cellulases. J
Biol. Chem. 272: 2709-2713.
Bailey, M.J., Siika-aho, M., Valkeajarvi, A., and Penttila, M.E. 1993. Hydrolytic
properties of two cellulases of Trichoderma reesei expressed in yeast. Biotechn.
and Appl. Biochem. 17: 65-76.
Barr, B.K., Hsieh, Y.-L., Ganem, B., and Wilson, D.B. 1996. Identification of Two
Functionally Different Classes of Exocellulases. Biochemistry 35: 586-592.
Bayer, E.A., Chanzy, H., Lamed, R., and Shoham, Y. 1998. Cellulose, cellulases and
cellulosomes. Curr. Opin. Struct. Biol. 8: 548-557.
Béguin, P., and Aubert, J.P. 1994. The biological degradation of cellulose. FEMS
Microbiol Rev 13: 25-58.
Bhikhabhai, R., Johansson, G., and Pettersson, G. 1984. Isolation of cellulolytic
enzymes from Trichoderma reesei QM 9414. J Appl Biochem 6: 336-345.
Bisset, J. 1984. A revision of the genus Trichoderma. Canadian J. of Botany 62: 924-931.
Blackwell, J., Kolpak, F., and Gardner, K. 1978. The structure of cellulose I and II.
Tappi J. 61: 17-72.
Boisset, C., Fraschini, C., Schulein, M., Henrissat, B., and Chanzy, H. 2000. Imaging
the enzymatic digestion of bacterial cellulose ribbons reveals the endo character
of the cellobiohydrolase Cel6A from Humicola insolens and its mode of synergy
with cellobiohydrolase Cel7A. Appl. Environ. Microbiol. 66: 1444-1452.
Bourbonnais, R., and Paice, M.G. 1990. Oxidation of non-phenolic substrates. An
expanded role for laccase in lignin biodegradation. FEBS Lett 267: 99-102.
Brünger, A.T. 1992a. Free R value: a novel statistical quantity for assessing the
accuracy of crystal structures. Nature 355: 472-475.
Brünger, A.T. 1992b. X-PLOR Version 3.1: A System for X-ray Crystallography and
NMR, 3.1 ed. Yale University Press, New Haven, CT, USA.
Brünger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve,
R.W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N.S., et al. 1998.
Crystallography & NMR system (CNS): a new software suite for
macromolecular structure determination. Acta Crystallogr. D54: 905-921.
Byrne, K.A., Lehnert, S.A., Johnson, S.E., and Moore, S.S. 1999. Isolation of a cDNA
encoding a putative cellulase in the red claw crayfish Cherax quadricarinatus.
Gene 239: 317-324.
Coughlan, M. 1985. Cellulases: production, properties and applications. Biochem. Soc.
Trans. 13: 405-406.
Coutinho, P.M., and Henrissat, B. 1999. Carbohydrate-active enzymes: an integrated
database approach. In Recent Advances in Carbohydrate Bioengineering. (eds.
H. J. Gilbert, G. Davies, B. Henrissat, and B. Svensson), pp. 3-12. The Royal
Society of Chemistry, Cambridge.
Cowtan, K., and Main, P. 1998. Miscellaneous algorithms for density modification.
Acta Crystallogr. D54: 487-493.
Davies, G., and Henrissat, B. 1995. Structures and mechanisms of glycosyl hydrolases.
Structure 3: 853-859.
Davies, G.J., Wilson, K.S., and Henrissat, B. 1997a. Nomenclature for sugar binding
subsites in glycosyl hydrolases. Biochem. J. 321: 557-559.
Davies, G.J., Wilson, K.S., and Henrissat, B. 1997b. Nomenclature for sugar-binding
subsites in glycosyl hydrolases [letter]. Biochem. J. 321: 557-559.
Divne, C., Ståhlberg, J., Reinikainen, T., Ruohonen, L., Pettersson, G., Knowles, J.K.,
Teeri, T.T., and Jones, T.A. 1994. The three-dimensional crystal structure of the
catalytic core of cellobiohydrolase I from Trichoderma reesei. Science 265: 524-528.
Divne, C., Ståhlberg, J., Teeri, T.T., and Jones, T.A. 1998. High-resolution crystal
structures reveal how a cellulose chain is bound in the 50 Å long tunnel of
cellobiohydrolase I from Trichoderma reesei. J. Mol. Biol. 275: 309-325.
Engh, R.A., and Huber, R. 1991. Accurate bond and angle parameters for X-ray protein
structure refinement. Acta Crystallogr. A47: 392-400.
Fägerstam, L., Håkansson, U., Pettersson, G., and Andersson, L. 1977. Purification of
three different cellulolytic enzymes from Trichoderma viride QM 9414 on a
large scale. In Proceedings of Bioconversion Symposium, Feb 21-23. (ed. T.
Gohose), pp. 165-178. Indian Institute of Technology, New Delhi.
Fägerstam, L.G., and Pettersson, L.G. 1980. The 1,4-beta-glucan cellobiohydrolases of
Trichoderma reesei QM 9414. A new type of cellulolytic synergism. FEBS
Letters 119: 97-100.
Fowler, T., and Brown, R.D., Jr. 1992. The bgl1 gene encoding extracellular
beta-glucosidase from Trichoderma reesei is required for rapid induction of the
cellulase complex. Mol. Microbiol. 6: 3225-3235.
Gaboriaud, C., Bissery, V., Benchetrit, T., and Mornon, J.P. 1987. Hydrophobic cluster
analysis: an efficient new way to compare and analyse amino acid sequences.
FEBS Lett. 224: 149-155.
Goedegebuur, F., Fowler, T., Phillips, J., Van Der Kley, P., Van Solingen, P.,
Dankmeyer, L., and Power, S.D. 2002. Cloning and relational analysis of 15
novel fungal endoglucanases from family 12 glycosyl hydrolase. Curr. Genet.
41: 89-98.
Gottschalk, G. 1988. Cellulose degradation and the carbon cycle. In Biochemistry
and Genetics of Cellulose Degradation. (ed. J. Aubert), pp. 3-8. Academic Press,
London.
Håkansson, U., Fägerstam, L., Pettersson, G., and Andersson, L. 1978. Purification and
characterization of a low molecular weight 1,4-beta-glucan glucanohydrolase
from the cellulolytic fungus Trichoderma viride QM 9414. Biochim. Biophys.
Acta 524: 385-392.
Harrison, M.J., Nouwens, A.S., Jardine, D.R., Zachara, N.E., Gooley, A.A.,
Nevalainen, H., and Packer, N.H. 1998. Modified glycosylation of
cellobiohydrolase I from a high cellulase-producing mutant strain of
Trichoderma reesei. Eur. J. Biochem. 256: 119-127.
Hayn, M., Klinger, R., and Esterbauer, H. 1993. Isolation and partial characterization of
a low molecular weight endoglucanase from Trichoderma reesei. In
Trichoderma Reesei Cellulases and Other Hydrolases. (eds. P. Suominen, and T.
Reinikainen), pp. 153-158. Foundation for Biotechnical and Industrial
Fermentation Research, Helsinki, Finland.
Henrissat, B. 1998. Glycosidase families. Biochem. Soc. Trans. 26: 153-156.
Henrissat, B., Claeyssens, M., Tomme, P., Lemesle, L., and Mornon, J.P. 1989.
Cellulase families revealed by hydrophobic cluster analysis. Gene 81: 83-95.
Henrissat, B., and Davies, G. 1997. Structural and sequence-based classification of
glycoside hydrolases. Curr. Opin. Struct. Biol. 7: 637-644.
Henrissat, B., Driguez, H., Viet, C., and Schülein, M. 1985. Synergism of cellulases
from Trichoderma reesei in the degradation of cellulose. Bio/Technology 3: 722-726.
Hon, D. 1994. Cellulose: a random walk along its historical path. Cellulose 1: 1-25.
Ilmen, M., Saloheimo, A., Onnela, M.-L., and Penttila, M.E. 1997. Regulation of
Cellulase Gene Expression in the Filamentous Fungus Trichoderma reesei.
Appl. Environ. Microbiol. 63: 1298-1306.
Imai, T., Boisset, C., Samejima, M., Igarashi, K., and Sugiyama, J. 1998. Unidirectional
processive action of cellobiohydrolase Cel7A on Valonia cellulose microcrystals.
FEBS Lett. 432: 113-116.
Irwin, D.C., Spezio, M., Walker, L.P., and Wilson, D.B. 1993. Activity studies of eight
purified cellulases: specificity, synergism, and binding domain effects.
Biotechnol. and Bioeng. 42: 1002-1013.
Jaenicke, R. 2000. Stability and stabilization of globular proteins in solution. J.
Biotechnol. 79: 193-203.
Jones, T.A., and Kjeldgaard, M.O. 1997. Electron-density map interpretation. Methods
Enzymol. 277: 173-208.
Jones, T.A., Zou, J.-Y., Cowan, S.W., and Kjeldgaard, M. 1991. Improved methods for
building protein models in electron density maps and the location of errors in
these models. Acta Crystallogr. A47: 110-119.
Karlsson, J., Saloheimo, M., Siika-Aho, M., Tenkanen, M., Penttila, M., and Tjerneld,
F. 2001. Homologous expression and characterization of Cel61A (EG IV) of
Trichoderma reesei. Eur. J. Biochem. 268: 6498-6507.
Karlsson, J., Siika-aho, M., Tenkanen, M., and Tjerneld, F. 2002. Enzymatic properties
of the low molecular mass endoglucanases Cel12A (EG III) and Cel45A (EG V)
of Trichoderma reesei. J. Biotechnol. 99: 63-78.
Keitel, T., Simon, O., Borriss, R., and Heinemann, U. 1993. Molecular and active-site
structure of a Bacillus 1,3-1,4-beta-glucanase. Proc. Natl. Acad. Sci. 90: 5287-5291.
Kleywegt, G.J., and Jones, T.A. 1996. Phi/Psi-chology: Ramachandran revisited.
Structure 4: 1395-1400.
Kleywegt, G.J., and Jones, T.A. 1997. Detecting folding motifs and similarities in
protein structures. Methods Enzymol. 277: 525-545.
Kleywegt, G.J., Zou, J.Y., Divne, C., Davies, G.J., Sinning, I., Ståhlberg, J.,
Reinikainen, T., Srisodsuk, M., Teeri, T.T., and Jones, T.A. 1997. The crystal
structure of the catalytic core domain of endoglucanase I from Trichoderma
reesei at 3.6 Å resolution, and a comparison with related enzymes. J. Mol. Biol.
272: 383-397.
Kleywegt, G.J., and Jones, T.A. 1999. Software for handling macromolecular
envelopes. Acta Crystallogr. D55: 941-944.
Knight, S.D. 2000. RSPS version 4.0: a semi-interactive vector-search program for
solving heavy-atom derivatives. Acta Crystallogr. D56: 42-47.
Koivula, A., Kinnari, T., Harjunpaa, V., Ruohonen, L., Teleman, A., Drakenberg, T.,
Rouvinen, J., Jones, T.A., and Teeri, T.T. 1998a. Tryptophan 272: an essential
determinant of crystalline cellulose degradation by Trichoderma reesei
cellobiohydrolase Cel6A. FEBS Lett. 429: 341-346.
Koivula, A., Linder, M., and Teeri, T. 1998b. Structure-function relationship in
Trichoderma cellulolytic enzymes. In Trichoderma and Gliocladium. (eds. G.
Harman, and C. Kubicek), pp. 3-23. Taylor and Francis Ltd, London.
Koshland, D.E. 1953. Stereochemistry and the mechanism of enzymatic reactions. Biol.
Rev. 28: 416-436.
Kraulis, J., Clore, G.M., Nilges, M., Jones, T.A., Pettersson, G., Knowles, J., and
Gronenborn, A.M. 1989. Determination of the three-dimensional solution
structure of the C-terminal domain of cellobiohydrolase I from Trichoderma
reesei. A study using nuclear magnetic resonance and hybrid distance
geometry-dynamical simulated annealing. Biochemistry 28: 7241-7257.
Kubicek, C., and Penttilä, M. 1998. Regulation of production of plant polysaccharide
degrading enzymes by. In Trichoderma and Gliocladium. (eds. G. Harman, and
C. Kubicek), pp. 49-72. Taylor and Francis Ltd, London.
Kubicek, C.P., Messner, R., Gruber, F., Mandels, M., and Kubicek-Pranz, E.M. 1993.
Triggering of cellulase biosynthesis by cellulose in Trichoderma reesei.
Involvement of a constitutive, sophorose-inducible, glucose-inhibited
beta-diglucoside permease. J. Biol. Chem. 268: 19364-19368.
Kuhls, K., Lieckfeldt, E., Samuels, G.J., Kovacs, W., Meyer, W., Petrini, O., Gams, W.,
Borner, T., and Kubicek, C.P. 1996. Molecular evidence that the asexual
industrial fungus Trichoderma reesei is a clonal derivative of the ascomycete
Hypocrea jecorina. Proc. Natl. Acad. Sci. 93: 7755-7760.
La Fortelle, E.d., Irwin, J.J., and Bricogne, G. 1997. SHARP: A Maximum-Likelihood
Heavy-Atom Parameter Refinement and Phasing Program for the MIR and MAD
Methods. In Crystallographic Computing.
Lehmann, M., Loch, C., Middendorf, A., Studer, D., Lassen, S.F., Pasamontes, L., van
Loon, A.P., and Wyss, M. 2002. The consensus concept for thermostability
engineering of proteins: further proof of concept. Protein Eng. 15: 403-411.
Lehmann, M., Pasamontes, L., Lassen, S.F., and Wyss, M. 2000. The consensus
concept for thermostability engineering of proteins. Biochim. Biophys. Acta
1543: 408-415.
Lehmann, M., and Wyss, M. 2001. Engineering proteins for thermostability: the use of
sequence alignments versus rational design and directed evolution. Curr. Opin.
Biotechnol. 12: 371-375.
Linder, M., and Teeri, T.T. 1996. The cellulose-binding domain of the major
cellobiohydrolase of Trichoderma reesei exhibits true reversibility and a high
exchange rate on crystalline cellulose. Proc. Natl. Acad. Sci. 93: 12251-12255.
Maheshwari, R., Bharadwaj, G., and Bhat, M.K. 2000. Thermophilic fungi: their
physiology and enzymes. Microbiol. Mol. Biol. Rev 64: 461-488.
Matthews, B.W. 1968. Solvent content of protein crystals. J. Mol. Biol. 33: 491-497.
Mattinen, M.-L., Linder, M., Teleman, A., and Annila, A. 1997. Interaction between
cellohexaose and cellulose binding domains from Trichoderma reesei cellulases.
FEBS Letters 407: 291-296.
Mattinen, M.L., Linder, M., Drakenberg, T., and Annila, A. 1998. Solution structure of
the cellulose-binding domain of endoglucanase I from Trichoderma reesei and
its interaction with cello-oligosaccharides. Eur. J. Biochem. 256: 279-286.
McCarter, J.D., and Withers, S.G. 1994. Mechanisms of enzymatic glycoside
hydrolysis. Curr. Opin. Struct. Biol. 4: 885-892.
McPherson, A.J. 1982. Preparation and Analysis of Protein Crystals. John Wiley and
Sons, New York.
McPherson, A.J. 1999. Crystallization of biological macromolecules. Cold Spring
Harbor Laboratory Press, New York.
Medve, J., Karlsson, J., Lee, D., and Tjerneld, F. 1998. Hydrolysis of microcrystalline
cellulose by cellobiohydrolase I and endoglucanase II from Trichoderma reesei:
adsorption, sugar production pattern, and synergism of the enzymes. Biotechnol.
Bioeng. 59: 621-634.
Meyer, W., Morawetz, R., Börner, T., and Kubicek, C. 1992. The use of
DNA-fingerprint analysis in the classification of some species of the Trichoderma
aggregate. Current Genetics 21: 27-30.
Mohr, H., and Schopfer, P. 1995. Plant physiology. Springer Verlag, Berlin.
Muilu, J., Törrönen, A., Perakyla, M., and Rouvinen, J. 1998. Functional conformational
changes of endo-1,4-xylanase II from Trichoderma reesei: a molecular dynamics
study. Proteins 31: 434-444.
Murshudov, G.N., Vagin, A.A., and Dodson, E.J. 1997. Refinement of macromolecular
structures by the maximum-likelihood method. Acta Crystallogr. D53: 240-255.
Navaza, J. 1994. AMoRe: an automated package for molecular replacement. Acta
Crystallogr. A50: 157-163.
Nidetzky, B., Steiner, W., Hayn, M., and Claeyssens, M. 1994. Cellulose hydrolysis by
the cellulases from Trichoderma reesei: a new model for synergistic interaction.
Biochemical J. 298: 705-710.
Nieduszynski, I.A., and Preston, R.D. 1970. Crystallite size in natural cellulose. Nature
225: 273-274.
Nutt, A., Sild, V., Pettersson, G., and Johansson, G. 1998. Progress curves: A mean for
functional classification of cellulases. Eur. J. Biochem. 258: 200-206.
Okada, H., Mori, K., Tada, K., Nogawa, M., and Morikawa, Y. 2000. Identification of
active site carboxylic residues in Trichoderma reesei endoglucanase Cel12A by
site-directed mutagenesis. J. Mol. Catalysis B: 249-255.
Okada, H., Tada, K., Sekiya, T., Yokoyama, K., Takahashi, A., Tohda, H., Kumagai,
H., and Morikawa, Y. 1998. Molecular characterization and heterologous
expression of the gene encoding a low-molecular-mass endoglucanase from
Trichoderma reesei QM9414. Appl. Environ. Microbiol. 64: 555-563.
Otwinowski, Z. 1991. Maximum likelihood refinement of heavy atom parameters. In
Isomorphous Replacement and Anomalous Scattering. (eds. P. Evans, and A.
Leslie), pp. 80-85, SERC Daresbury Laboratory, UK.
Otwinowski, Z., and Minor, W. 1997. Processing of X-ray diffraction data collected in
oscillation mode. Methods Enzymol. 276: 307-326.
Penttila, M., Lehtovaara, P., Nevalainen, H., Bhikhabhai, R., and Knowles, J. 1986.
Homology between cellulase genes of Trichoderma reesei: complete nucleotide
sequence of the endoglucanase I gene. Gene 45: 253-263.
Perl, D., Mueller, U., Heinemann, U., and Schmid, F.X. 2000. Two exposed amino acid
residues confer thermostability on a cold shock protein. Nat. Struct. Biol. 7: 380-383.
Reese, E., Lavinson, H., Downing, M., and White, L. 1950. Quartermaster culture
collection. Farlowia 4: 45-86.
Reese, E.T. 1976. History of the cellulase program at the U.S. army Natick
Development Center. Biotechnol Bioeng Symp: 9-20.
Reinikainen, T., Ruohonen, L., Nevanen, T., Laaksonen, L., Kraulis, P., Jones, T.A.,
Knowles, J.K., and Teeri, T.T. 1992. Investigation of the function of mutated
cellulose-binding domains of Trichoderma reesei cellobiohydrolase I. Proteins
14: 475-482.
Reverbel-Leroy, C., Pages, S., Belaich, A., Belaich, J.P., and Tardif, C. 1997. The
Processive Endocellulase CelF, a Major Component of the Clostridium
cellulolyticum Cellulosome: Purification and Characterization of the
Recombinant Form. J. of Bacteriology 179: 46-52.
Rossmann, M., and Blow, D.M. 1962. The detection of sub-units within the
crystallographic asymmetric unit. Acta Crystallogr. 15: 24-31.
Rouvinen, J., Bergfors, T., Teeri, T., Knowles, J.K., and Jones, T.A. 1990.
Three-dimensional structure of cellobiohydrolase II from Trichoderma reesei. Science
249: 380-386.
Saloheimo, A., Henrissat, B., Hoffren, A.M., Teleman, O., and Penttila, M. 1994. A
novel, small endoglucanase gene, egl5, from Trichoderma reesei isolated by
expression in yeast. Mol. Microbiol. 13: 219-228.
Saloheimo, M., Lehtovaara, P., Penttilä, M., Teeri, T.T., Ståhlberg, J., Johansson, G.,
Pettersson, G., Claeyssens, M., Tomme, P., and Knowles, J.K.C. 1988. EGIII, a
new endoglucanase from Trichoderma reesei: the characterization of both gene
and enzyme. Gene 63: 11-21.
Saloheimo, M., Nakari-Setala, T., Tenkanen, M., and Penttila, M. 1997. cDNA cloning
of a Trichoderma reesei cellulase and demonstration of endoglucanase activity
by expression in yeast. Eur J Biochem 249: 584-591.
Shaw, A., Bott, R., and Day, A.G. 1999. Protein engineering of alpha-amylase for low
pH performance. Curr. Opin. Biotechnol. 10: 349-352.
Shoemaker, S.P., and Brown, R.D., Jr. 1978. Characterization of
endo-1,4-beta-D-glucanases purified from Trichoderma viride. Biochim. Biophys. Acta 523: 147-161.
Shoemaker, S.P., Watt, K., Tsitovsky, G., and Cox, R. 1983. Characterisation and
properties of cellulases purified from Trichoderma reesei strain L27.
Bio/Technology 1: 687-690.
Simmons, E. 1977. Classification of some cellulase producing Trichoderma species. In
2nd International Mycological Congress. (ed. H.E.a.S. Bigelow, E.G.), pp. 618,
Tampa University of South Florida.
Sinnott, M.L. 1990. Catalytic mechanisms of enzymic glycosyl transfer. Chem. Rev. 90:
1171-1202.
Sjöström, E. 1993. Wood Chemistry Fundamentals and applications, 2 ed. Academic
Press Inc., London.
Sprey, B., and Bochem, H.P. 1992. Effect of endoglucanase and cellobiohydrolase from
Trichoderma reesei on cellulose microfibril structure. FEMS Microbiol. Lett. 97:
113-118.
Sprey, B., and Ülker, A. 1992. Isolation and properties of a low molecular mass
endoglucanase from Trichoderma reesei. FEMS Microbiol. Lett. 92: 253-257.
Srisodsuk, M., Reinikainen, T., Penttilä, M., and Teeri, T.T. 1993. Role of the
interdomain linker peptide of Trichoderma reesei cellobiohydrolase I in its
interaction with crystalline cellulose. J. of Biological Chemistry 268: 20756-20761.
Ståhlberg, J. 1991. Functional organization of cellulases from Trichoderma reesei. In
Doctoral thesis. Acta Universitatis Upsaliensis. Comprehensive Summaries of
Uppsala Dissertations from the Faculty of Science 344. 45 pp. Uppsala. ISBN
91-554-2800-2. Uppsala University.
Ståhlberg, J., Johansson, G., and Pettersson, G. 1988. A binding-site-deficient,
catalytically active, core protein of endoglucanase III from the culture filtrate of
Trichoderma reesei. Eur. J. Biochem. 173: 179-183.
Sternberg, D., and Mandels, G.R. 1980. Regulation of the cellulolytic system in
Trichoderma reesei by sophorose: induction of cellulase and repression of
beta-glucosidase. J. Bacteriol. 144: 1197-1199.
Sulzenbacher, G., Mackenzie, L.F., Wilson, K.S., Withers, S.G., Dupont, C., and
Davies, G.J. 1999. The crystal structure of a 2-fluorocellotriosyl complex of the
Streptomyces lividans endoglucanase CelB2 at 1.2 Å resolution. Biochemistry
38: 4826-4833.
Teeri, T.T., Lehtovaara, P., Kauppinen, S., Salovuori, I., and Knowles, J. 1987.
Homologous domains in Trichoderma reesei cellulolytic enzymes: gene
sequence and expression of cellobiohydrolase II. Gene 51: 43-52.
Tomme, P., Warren, R.A., and Gilkes, N.R. 1995. Cellulose hydrolysis by bacteria and
fungi. Adv. Microb. Physiol. 37: 1-81.
Ülker, A., and Sprey, B. 1990. Characterization of an unglycosylated low molecular
weight 1,4-beta-glucan-glucanohydrolase of Trichoderma reesei. FEMS
Microbiol Lett 69: 215-219.
Vaheri, M., Leisola, M., and Kauppinen, V. 1979. Transglycosylation products of the
cellulase system of Trichoderma reesei. Biotechnology Letters 1: 41-46.
van Solingen, P., Meijer, D., van der Kleij, W.A., Barnett, C., Bolle, R., Power, S.D.,
and Jones, B.E. 2001. Cloning and expression of an endocellulase gene from a
novel streptomycete isolated from an East African soda lake. Extremophiles 5:
333-341.
Ward, M., Wu, S., Dauberman, J., Weiss, G., Larenas, E., Bower, B., Rey, M.,
Clarkson, K., and Bott, R. 1993. Cloning, Sequence and Preliminary Structural
Analysis of a Small, High pI Endoglucanase (EGIII) from Trichoderma reesei. In
The Tricell 93 symposium. (eds. P. Suominen, and T. Reinikainen), pp. 153-158.
Foundation for Biotechnical and Industrial Fermentation Research, Espoo,
Finland.
Watanabe, H., Noda, H., Tokuda, G., and Lo, N. 1998. A cellulase gene of termite
origin. Nature 394: 330-331.
Wey, T.T., Hseu, T.H., and Huang, L. 1994. Molecular cloning and sequence analysis
of the cellobiohydrolase I gene from Trichoderma koningii G-39. Curr.
Microbiol. 28: 31-39.
Williamson, R.E., Burn, J.E., and Hocart, C.H. 2002. Towards the mechanism of
cellulose synthesis. Trends Plant Sci. 7: 461-467.
Wolfenden, R., Lu, X., and Young, G. 1998. Spontaneous hydrolysis of glycosides. J.
of American Chem. Soc. 120: 6814-6815.
Xu, B., Hellman, U., Ersson, B., and Janson, J.C. 2000. Purification, characterization
and amino-acid sequence analysis of a thermostable, low molecular mass
endo-beta-1,4-glucanase from blue mussel, Mytilus edulis. Eur. J. Biochem. 267:
4970-4977.
Yuan, S., Wu, Y., and Cosgrove, D.J. 2001. A fungal endoglucanase with plant cell
wall extension activity. Plant Physiol. 127: 324-333.
Appendix I. Data collection and processing statistics
T. reesei
Cel12A
A35V
T. reesei
Cel12A
WT
S. sp. 11AG8
WT
Cel12A
WT
Cel12A
H. schweinitzii
H. grisea
WT
Cel12A
T. reesei
P201C
Cel12A
H. grisea
G2
Cel12A
H. grisea
G5
Cel12A
G2SG2
G4
H. grisea
Cel12A
H. grisea
Cel12A
69.7
71.6
121.4
91.5°
a=
b=
c=
β=
Cell parameters (Å)
b
R merge (%)
15.9 (4.0)
6.5 (23.0)
o
24.9 (2.3)
3.7 (38.1)
94.4 (94.3)
2.8
172895
490560
1.53-1.50
25-1.50
91.5
119.3
71.3
15.4 (8.6)
5.4 (16.1)
72.9 (63.5)
3.3
26559
88448
1.53-1.50
29-1.50
90°
62.6
54.6
65.2
P212121
P21
68.3
0.5°
0.5°
1.54
Raxis II
Rigaku
Home
14.0 (5.7)
9.7 (12.6)
98.9 (84.6)
4.0
85680
346287
1.73-1.70
30-1.70
98.5°
83.4
77.5
62.5
P21
0.5°
0.98
ADSCQ4R
ID14 EH4
ESRF
Lund
24.1 (4.0)
7.6 (35.2)
94.6 (84.4)
18.3 (2.8)
7.4 (39.0)
99.8 (99.6)
9.1
4.3
5.5
34631
17.6 (4.8)
12.0 (37.7)
100 (99.6)
315178
184371
42832
321815
58847
1.52-1.49
50-1.49
90°
166.2
49.3
49.3
P43212
1.0
0.934
ADSCQ4R
1.73-1.70
20-1.70
120°
69.1
70.6
70.6
P31
ESRF
ID14 EH1
1.24-1.22
30-1.22
90°
165.5
49.2
49.2
P43212
0.5°
0.93
0.5°
MAR CCD
1.09
ID14 EH1
ESRF
MAR CCD
711
ESRF
ESRF
50-1.40
20.4 (13.8)
8.8 (31.6)
99.7 (99.2)
9.7
41537
406063
1.42-1.40
25.8 (6.4)
11.2 (36.7)
50 (12.7)
4.4 (11.4)
99.3 (90.5)
8.9
98.9 (97.8)
32477
12.6
289753
1.56-1.52
42-1.52
90°
166.0
49.3
49.3
23699
298714
1.74-1.7
50-1.7
90°
167.6
166.1
49.5
P43212
0.5
P43212
0.933
ADSCQ4R
0.5
49.5
90°
ESRF
ID14 EH2
0.933
ADSCQ4R
ID14 EH2
49.3
49.3
P43212
1.0
0.934
ADSCQ4R
ID14 EH1
a
Numbers in parentheses are for the highest resolution bins.
b
$R_{\mathrm{merge}} = \sum_{hkl} \sum_{i} |I_i - \langle I \rangle| \, / \, \sum_{hkl} \sum_{i} |I_i|$.
____________________________________________________________________________________________________________________________________________________________________
I/σ(I)
92.0 (84.5)
Average multiplicity
a
2.6
No. of unique reflections
Completeness (%)
230270
87132
No. of observed reflections
1.93-1.90
P21
Space group
29-1.90
0.5°
Oscillation range
Resolution range outer shell
1.54
Resolution range (Å)
MAR CCD
Raxis II
Detector
Wavelength (Å)
0.93
ESRF
ID14 EH1
Home
Rigaku
Collected
____________________________________________________________________________________________________________________________________________________________________
Data set
____________________________________________________________________________________________________________________________________________________________________
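The R_merge statistic defined in the footnote to Appendix I can be illustrated with a short sketch. This is only a minimal example, assuming the unmerged measurements are available as (hkl, intensity) pairs. The function name r_merge and the toy intensities are hypothetical and are not taken from the data sets in this thesis. Data-processing programs may also treat reflections measured only once differently (for instance by excluding them), which this sketch does not attempt to reproduce.

```python
from collections import defaultdict


def r_merge(observations):
    """Return R_merge for an iterable of ((h, k, l), intensity) pairs.

    R_merge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i |I_i|,
    where <I> is the mean intensity of all observations of one reflection.
    """
    # Group the individual measurements by reflection index.
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(float(intensity))

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(abs(i) for i in intensities)

    if denominator == 0.0:
        raise ValueError("no non-zero intensities; R_merge is undefined")
    return numerator / denominator


# Hypothetical toy data: two reflections measured twice, one measured once.
toy_observations = [
    ((1, 0, 0), 100.0),
    ((1, 0, 0), 110.0),
    ((0, 1, 0), 50.0),
    ((0, 1, 0), 50.0),
    ((0, 0, 1), 20.0),
]

print(f"R_merge = {100.0 * r_merge(toy_observations):.1f} %")
```

In the appendix the value is reported as a percentage, with the value for the highest resolution bin in parentheses; the same function could be applied to the subset of observations falling in that bin.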
T. reesei
Cel12A
A35V
T. reesei
Cel12A
WT
S. sp. 11AG8
WT
Cel12A
H. schweinitzii
WT
Cel12A
H. grisea
WT
Cel12A
T. reesei
P201C
Cel12A
H. grisea
G2
Cel12A
H. grisea
G5
Cel12A
H. grisea
G4
Cel12A
H. grisea
G2SG2
Cel12A
6
218
9972
Protein molecules in AU
Residues in protein
Protein atoms
84
0
NAG atoms
Ligand atoms
Average overall <B> factor (Å²)
1OA4
117
0
0
821
18.1; 19.3
1
222
1686
29-1.5
25464
1OA3
598
0
4
56
0
2573
19.2; 21.7
4
218
6648
29-1.70
83042
xxx
xxx
289
0
2
28
0
300
17
0
1331
20.9; 25.1
2
218
3320
20-1.7
40981
1793
13.5; 15.1
1
224
1832
30-1.22
56977
xxx
199
5
23
1089
14.9; 16.1
1
224
1832
47-1.5
33445
246
0
45
169
7
45
224
1832
224
1832
191
6
45
31369
1027
14.5; 18.3
1
22674
985
16.6; 20.7
1
40152
1298
14.8; 16.8
1
224
1832
xxx
47-1.5
xxx
48-1.7
xxx
47-1.4
____________________________________________________________________________________________________________________________________________________________________
a
Values were calculated with O (Jones et al. 1991; Jones and Kjeldgaard 1997), CNS (Brünger et al. 1998), MOLEMAN (Kleywegt 1996),
LSQMAN (Kleywegt and Jones 1997), and Refmac 5.0 (Murshudov et al. 1997). Reference values are from Engh and Huber (1991).
b
According to the stringent boundary definition of Kleywegt and Jones (1996).
Average protein <B> factor (Å²)
Average water <B> factor (Å²)
11.8
14.6
20.1
16.2
11.2
13.1
15.4
12.5
10.9
13.2
9.9
13.8
29.9
15.5
10.5
11.4
14.8
11.1
9.3
12.1
21.4
24.3
25.9
19.0
23.3
23.0
22.7
20.7
23.3
23.2
a
0.011
0.011
0.013
0.015
0.011
0.010
0.013
0.005
0.010
0.011
RMSD bond lengths (Å)
1.6
1.5
1.7
1.7
RMSD bond angles (°) a
1.3
1.5
1.5
1.3
1.6
1.7
2.0
1.9
2.1
1.5
1.3
2.1
RMSD ΔB on bonded atoms (Å²) 1.3
1.3
1.2
1.2
Average RMSD NCS Cα (Å)
0.4
0.5
0.2
0.5
Average RMSD NCS all atoms (Å) 0.5
0.7
0.5
0.6
b
0.7
0.7
0.5
2.0
0.9
1.0
0.7
0.5
0.5
2.0
Ramachandran outliers (%)
____________________________________________________________________________________________________________________________________________________________________
84
0
643
0
6
Waters
1180
Residues with double conformations 0
N-glycosylation (NAG) residues 6
6
218
9978
5174
20.5; 22.3
2571
18.9; 23.3
test set
R & Rfree factor (%)
1OA2
20-1.5
167265
1H8V
Resolution used in refinement (Å) 29-1.9
Reflections in: working set
84281
PDB access codes
____________________________________________________________________________________________________________________________________________________________________
Protein
____________________________________________________________________________________________________________________________________________________________________
Appendix II. Structure refinement and final model statistics
Acta Universitatis Upsaliensis
Comprehensive Summaries of Uppsala Dissertations
from the Faculty of Science and Technology
Editor: The Dean of the Faculty of Science and Technology
A doctoral dissertation from the Faculty of Science and Technology,
Uppsala University, is usually a summary of a number of papers. A few
copies of the complete dissertation are kept at major Swedish research
libraries, while the summary alone is distributed internationally through
the series Comprehensive Summaries of Uppsala Dissertations from the
Faculty of Science and Technology. (Prior to October, 1993, the series was
published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science”.)
Distributor:
Uppsala University Library,
Box 510, SE-751 20 Uppsala, Sweden
ISSN 1104-232X
ISBN 91-554-5562-X