REVIEWS
HUMAN AND MOUSE PROTEASES: A COMPARATIVE GENOMIC APPROACH
Xose S. Puente*, Luis M. Sánchez*, Christopher M. Overall‡ and Carlos López-Otín*
The availability of the human and mouse genome sequences has allowed the identification and
comparison of their respective degradomes — the complete repertoire of proteases that are
produced by these organisms. Because of the essential roles of proteolytic enzymes in the control
of cell behaviour, survival and death, degradome analysis provides a useful framework for the
global exploration of these protease-mediated functions in normal and pathological conditions.
PROTEASOME
An intracellular protein complex
that is responsible for degrading
intracellular proteins that have
been tagged for destruction by
ubiquitin.
NUCLEOPHILE
A chemical group that can
donate a pair of electrons in a
chemical reaction.
*Departamento de
Bioquímica y Biología
Molecular, Facultad de
Medicina, Instituto
Universitario de Oncología,
Universidad de Oviedo,
33006–Oviedo, Spain.
‡Departments of
Biochemistry and Molecular
Biology, and Oral Biological
and Medical Sciences,
C.I.H.R. Group in Matrix
Dynamics, University of
British Columbia,
Vancouver, British Columbia
V6T 1Z3, Canada.
Correspondence to C.L.-O.
e-mail: [email protected]
doi:10.1038/nrg1111
Proteases perform essential functions in all living
organisms. They were initially recognized as gastric
juice proteolytic enzymes that were involved in the
nonspecific degradation of dietary proteins. However,
recent advances have provided a new view of the proteolytic world. As well as mediating nonspecific protein hydrolysis, proteases also act as processing
enzymes that perform highly selective, limited and
efficient cleavage of specific substrates, which initiates irreversible decisions at the post-translational
level that influence many biological processes.
Proteolytic processing events are fundamental in
ovulation, fertilization, embryonic development,
bone formation, control of homeostatic tissue
remodelling, neuronal outgrowth, antigen presentation, cell-cycle regulation, immune and inflammatory cell migration and activation, wound healing,
angiogenesis and apoptosis1,2. Accordingly, alterations
in the structure and expression patterns of proteases
underlie many human pathological processes including cancer, arthritis, osteoporosis, neurodegenerative
disorders and cardiovascular diseases1–6.
This impressive diversity in protease functions
derives from the evolutionary invention of enzymes
with structural designs that range from simple catalytic
devices with a minimal domain organization7, through
giant proteases such as tripeptidyl peptidase II (REF. 8), to
precisely engineered protein-degradation machines,
such as the PROTEASOME9. In terms of specificity, some
| JULY 2003 | VOLUME 4
proteases show high fidelity with exquisite specificity in
their ability to target a unique protein, whereas others
are clearly promiscuous, with an indiscriminate
degradative activity against many substrate partners.
Proteases also use distinct strategies to define their intra- or extracellular spatial localization, and in many cases
act in the context of complex networks that comprise
distinct proteases, substrates, inhibitors, receptors and
binding proteins. The availability of the human and
mouse euchromatic genomic sequences has raised the
possibility of addressing global questions about the
degradome — the complete set of proteases that are
expressed at a specific moment or circumstance by a
cell, tissue or organism10. In this review, we present a
comparative analysis of the human and mouse
degradomes and discuss the potential importance of
this global study for understanding and treating the
growing group of pathological conditions that involve
abnormal or deficient protease function.
The human degradome: 553 and counting
On the basis of the mechanism of catalysis, proteases
are classified into five distinct classes: aspartic, metallo,
cysteine, serine and threonine proteases. Proteases in
the first two classes use an activated water molecule as
a NUCLEOPHILE to attack the peptide bond of the substrate, whereas in the remaining classes the nucleophile
is a catalytic amino-acid residue (Cys, Ser or Thr,
respectively) that is located in the active site from
www.nature.com/reviews/genetics
© 2003 Nature Publishing Group
BOX 1 | Using TBLASTN, InterPro and hidden Markov models to identify proteases
HIDDEN MARKOV MODEL
(HMM). A probabilistic model
that is applied to protein and
DNA sequence pattern
recognition. HMMs represent a
system as a set of discrete states
and as transitions between
those states. Each transition has
an associated probability.
HMMs are valuable because
they allow a search or alignment
algorithm to be built on firm
probabilistic bases, and the
parameters (transition
probabilities) can be easily
trained on a known data set.
ORTHOLOGUES
Homologous genes that have
originated as a result of a
speciation event.
PARALOGUES
Homologous genes that have
originated as a result of a
duplication event.
SYNTENY
Gene loci on the same
chromosome. This term is often
used to refer to gene loci in
different organisms that are
located on a chromosomal
region of common evolutionary
ancestry.
MEROPS
A database that provides a
comprehensive catalogue and
structure-based classification of
proteases and inhibitors from a
range of organisms.
EXOSITE
A substrate-binding site that lies
outside the catalytic domain of a
protease and is located on
specialized substrate-binding
modules or domains.
RETROTRANSPOSITION
The incorporation of DNA
segments in a genome through a
reverse-transcription-mediated
mechanism.
AUTOPHAGY
A nutritionally and
developmentally regulated
process that is involved in the
intracellular destruction of
endogenous proteins and the
removal of damaged organelles.
Human and mouse genomic sequences were analysed for the presence of unidentified proteases using TBLASTN at the
Ensembl genome browser. Each available protease sequence was used to query both genomes, and all hits with P < 10^-2
were analysed using the FASTA program against a custom degradome database. For each single protease locus, a 500-kb
genomic sequence flanking the target gene was analysed for the presence of further members of that family, as ~23% of
human and 34% of mouse protease genes are organized in clusters. Also, InterPro annotations of the public human and
mouse genomes were used to identify putative new members of known families. Ensembl predictions containing
InterPro protease motifs were manually inspected to distinguish between true proteases, pseudogenes and false
positives. We also built a HIDDEN MARKOV MODEL (HMM) for each of the 63 different protease families that were present in
human and mouse. Further HMMs for protease families that were not described in mammals were obtained from the
Pfam database and used to screen protein predictions from Ensembl (Releases 12.31.1 and 12.3.1), FANTOM 2.1 DB and
RefSeq (version 10 February 2003). The combined application of these strategies led to the identification of 72 human
and 137 mouse putative proteases that were not present in MEROPS, although many of these were already present as
gene predictions in the National Center for Biotechnology Information (NCBI) databases. Fifteen of the 72 human
proteases, and 63 of the 137 mouse proteases, correspond to ORTHOLOGUES of previously known proteases in one of these
organisms. Twenty-eight known mouse genes were found to be orthologues of known human proteases, despite being
previously classified as PARALOGUES. Orthology assignment was based on four different criteria: SYNTENY, amino-acid
sequence identity (>70%), function conservation and relevant supporting literature.
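The first screening step described in this box (keep the hits that pass a P-value cut-off, then scan the genomic neighbourhood of each locus for further members of the same family) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the `Hit` record, the function names and the coordinates in the usage example are hypothetical. The box does not say how the 500-kb flanking region is placed around the gene, so the sketch assumes 250 kb on each side (the `FLANK_BP` parameter).

```python
from dataclasses import dataclass
from typing import List

# P cut-off from Box 1; splitting the 500-kb flanking region as 250 kb on
# each side of the gene is an assumption of this sketch.
P_CUTOFF = 1e-2
FLANK_BP = 250_000


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one TBLASTN hit of a protease query on a genome."""
    query: str      # protease sequence used as the query
    chrom: str      # chromosome carrying the hit
    start: int      # hit start (bp)
    end: int        # hit end (bp)
    p_value: float  # significance of the hit


def significant_hits(hits: List[Hit], cutoff: float = P_CUTOFF) -> List[Hit]:
    """Keep hits whose P value is below the cut-off."""
    return [h for h in hits if h.p_value < cutoff]


def flanking_hits(seed: Hit, hits: List[Hit], flank: int = FLANK_BP) -> List[Hit]:
    """Return the other hits on the seed's chromosome that overlap its flanking window."""
    window_start, window_end = seed.start - flank, seed.end + flank
    return [
        h for h in hits
        if h is not seed
        and h.chrom == seed.chrom
        and h.end >= window_start
        and h.start <= window_end
    ]


# Usage with made-up coordinates:
hits = [
    Hit("query_1", "chr7", 1_000_000, 1_003_000, 1e-30),
    Hit("query_1", "chr7", 1_200_000, 1_203_000, 1e-8),   # inside the window
    Hit("query_1", "chr7", 3_000_000, 3_003_000, 1e-5),   # same chromosome, too far away
    Hit("query_1", "chr9", 1_100_000, 1_103_000, 1e-12),  # different chromosome
    Hit("query_1", "chr7", 1_050_000, 1_052_000, 0.5),    # fails the P cut-off
]
kept = significant_hits(hits)
neighbours = flanking_hits(kept[0], kept)
```

In a real screen each neighbour would then be compared against the protease database, as the box describes for the FASTA step.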
which the class names derive. The different classes can
be further divided into families on the basis of amino-acid sequence comparison, and these families can be
assembled into clans on the basis of similarities in their
three-dimensional (3D) folding11.
By using primary information that was retrieved
from public and private sequencing projects12,13, combined with data from the MEROPS, InterPro and Ensembl
databases (BOX 1), and our own experimental data, we
have annotated a total of 553 genes that encode proteases
or protease homologues in the human genome (TABLE 1;
BOX 2). Ninety-three of these proteins seem to be catalytically inactive proteases owing to substitutions in specific
amino-acid residues that are located in critical active-site
regions (TABLE 1). These inactive homologues are abundant in some protease families and might have important
roles as regulatory or inhibitory molecules, acting as
dominant negatives by binding substrates through the
inactive catalytic or EXOSITE ancillary domains in nonproductive complexes, or by titrating inhibitors from the
milieu to increase the net proteolytic activity10.
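The text does not describe how the active-site substitutions were detected. One simple approach, assuming a precomputed pairwise alignment to an active reference enzyme, is to read off the query residue at each catalytic position and flag any mismatch. For family S01 serine proteases, for example, the catalytic triad is His57, Asp102 and Ser195 in chymotrypsinogen numbering; the toy sequences, positions and `SITES` table below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical catalytic-site table: label -> (1-based position in the ungapped
# reference, residue expected there). Real positions depend on the reference used.
SITES: Dict[str, Tuple[int, str]] = {"His": (3, "H"), "Asp": (5, "D"), "Ser": (7, "S")}
REFERENCE = "MAH-GDLS"  # toy aligned reference; '-' marks an alignment gap


def residue_at(aligned_ref: str, aligned_query: str, ref_pos: int) -> str:
    """Query residue aligned to reference position ref_pos ('-' if the query has a gap)."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    seen = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            seen += 1
            if seen == ref_pos:
                return query_char
    raise ValueError(f"reference has fewer than {ref_pos} residues")


def missing_catalytic_residues(aligned_ref: str, aligned_query: str,
                               sites: Dict[str, Tuple[int, str]] = SITES) -> List[str]:
    """Labels of catalytic sites where the query does not carry the expected residue."""
    return [label for label, (pos, expected) in sites.items()
            if residue_at(aligned_ref, aligned_query, pos) != expected]


print(missing_catalytic_residues(REFERENCE, "MAHAGDLS"))  # []
print(missing_catalytic_residues(REFERENCE, "MAHAGDLA"))  # ['Ser']
print(missing_catalytic_residues(REFERENCE, "MA-AGDLS"))  # ['His']
```

A non-empty result marks a candidate inactive homologue; whether such a protein is truly inactive still needs experimental confirmation.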
We have identified more than 200 protease pseudogenes in the human genome, including processed
pseudogenes that have arisen by RETROTRANSPOSITION, and
non-processed pseudogenes that have resulted from
duplication and the accumulation of frameshifts and stop
codons in the duplicated gene. Also, more than 150
sequences are related to aspartic proteases that are
embedded in endogenous retroviral elements, but they
have not been included in the catalogue of human proteases (see Supplementary Tables S1–S5 online). The
most recent release of the MEROPS database (6.2;
24 March 2003) annotates 548 entries for human proteases, although it includes a number of pseudogenes.
Seventy-two human proteases that are included in our
catalogue are absent from MEROPS. The phylogenetic
tree that is shown in FIG. 1 reflects the distribution of the
553 annotated human proteases and homologues in the
different catalytic classes. Metalloproteases and serine
proteases are the most densely populated groups, with
186 and 176 members, respectively, followed by 143
NATURE REVIEWS | GENETICS
cysteine proteases. Threonine and aspartic proteases are
highly specialized and are therefore less numerous with
27 and 21 members, respectively. Following the family
classification that is used in the MEROPS database, we
conclude that the human proteases belong to 63 different
families, the largest being the 01 family of serine proteases. Other families with many representatives are the
ubiquitin-specific proteases (USPs) and the disintegrin
and metalloproteases (ADAMs), whereas there are several
families with a single member in the human genome
(Supplementary Tables S1–S5 online). Interestingly, 125
(22%) of the catalogued proteases are membrane-bound
proteins; this emphasizes the relevance of the proteolytic
processes that take place at this cellular interface14.
The annotated human degradome is likely to continue to grow in the near future (BOX 3), as new
enzymes with unusual structures and catalytic mechanisms are identified and characterized. There are
recent examples of experimental work that has led to
the unmasking of some of these ‘hidden proteases’,
including: JAMM proteins, which are a new family of
deubiquitylating metalloproteases15; rhomboid proteins, which are atypical serine proteases that were
first described in Drosophila16; autophagins, which are
a family of cysteine proteases that are involved in cell
death by AUTOPHAGY17; and signal-peptide peptidases,
which are aspartic proteases that catalyse the
intramembrane proteolysis of signal peptides18.
Koonin et al. have also described new superfamilies
of predicted cysteine and aspartic proteases that
include several human paralogues19,20. Nevertheless,
some newly identified proteases remain as predictions without experimental evidence for proteolytic
activity, and further work will be necessary to show
their enzymatic properties.
The mouse degradome: increased complexity
The recent availability of the first version of the mouse
genome sequence21 will accelerate the structural and
functional characterization of the human degradome.
The identification of mouse orthologues of human
Table 1 | The human and mouse protease genes and pseudogenes
Human/mouse | Total number | Number of proteases/protease homologues | Number per catalytic class: Aspartic | Cysteine | Metallo | Serine | Threonine
Human proteases | 553 | 460/93 | 21 | 143 | 186 | 176 | 27
Mouse proteases | 628 | 525/103 | 27 | 153 | 197 | 227 | 24
Orthologous pairs | 514 | 429/85 | 21 | 127 | 176 | 166 | 24
Human specific | 35 | 28/7 | 0 | 15 | 8 | 9 | 3
Mouse specific | 85 | 73/12 | 5 | 25 | 12 | 43 | 0
Human gene/mouse pseudogene | 4 | 3/1 | 0 | 1 | 2 | 1 | 0
Mouse gene/human pseudogene | 29 | 22/7 | 1 | 1 | 9 | 18 | 0
PARALOGONS
Chromosomal regions that
contain groups of paralogous
genes in the same order, which
have presumably arisen by the
duplication of large genomic
fragments.
protease genes provides essential information to perform
evolutionary studies, identify regulatory elements and
create knockout and knock-in models that are useful
to examine protease functions in vivo. Although the
mouse genome is ~14% smaller than the euchromatic
human genome, its degradome is larger — it contains
628 proteases and protease homologues. The distribution of proteases in the different classes is shown in
FIG. 1 and TABLE 1. Note that sequences that are related
to proteases of endogenous retroviruses and the identified protease pseudogenes were not included in our
list of mouse proteases (Supplementary Tables S1–S5
online). We include 138 putative mouse proteases and
homologues that were absent from the last release of
the MEROPS database. Comparative analysis between
human and mouse degradomes indicates a high percentage (82%) of mouse genes with a strict orthologue in the human genome. The assignment of
orthology was mainly based on sequence identities
(mean 83%) and location in regions of conserved synteny. In some cases, especially in those protease clusters that contain several paralogous genes and
pseudogenes, it has been difficult to decide which
mouse sequences are bona fide orthologues of the corresponding human genes. There might also be cases of
orthologues that are difficult to recognize because of
their rapid evolution in one or both lineages22 (low
degree of identity; ~50%), such as MMP1 and
McolA23. Nevertheless, beyond these difficulties in
orthology assignment, there are clear examples of
protease genes that are unique in human or mouse
lineages. We could not find a human counterpart for
85 of the 628 genes analysed. Similarly, 35 genes seem
to be specific to the human lineage. In principle, these
differences might result from specific deletion events
or the creation of new protease genes in one of the
lineages, although the possibility that the orthologues
of any of the lineage-specific genes are missing owing
to the incompleteness of the available genome
sequences cannot be ruled out. Nevertheless, detailed
analysis of these differences indicates that most derive
from changes in the number of paralogous genes in
protease gene families that are present in the genomes
of both species.
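The counts quoted in this paragraph can be cross-checked against Table 1. The sketch below assumes, as the arithmetic of the table supports, that each species total is the sum of the orthologous pairs, that species' lineage-specific genes and the genes whose counterpart in the other species is a pseudogene; it checks this overall and per catalytic class, and recomputes the 82% orthologue figure. The dictionary keys are names chosen for this illustration, and the proteases/homologues split is not checked.

```python
# Table 1 counts: (total, (aspartic, cysteine, metallo, serine, threonine)).
TABLE1 = {
    "human": (553, (21, 143, 186, 176, 27)),
    "mouse": (628, (27, 153, 197, 227, 24)),
    "orthologous_pairs": (514, (21, 127, 176, 166, 24)),
    "human_specific": (35, (0, 15, 8, 9, 3)),
    "mouse_specific": (85, (5, 25, 12, 43, 0)),
    "human_gene_mouse_pseudogene": (4, (0, 1, 2, 1, 0)),
    "mouse_gene_human_pseudogene": (29, (1, 1, 9, 18, 0)),
}
HUMAN_PARTS = ("orthologous_pairs", "human_specific", "human_gene_mouse_pseudogene")
MOUSE_PARTS = ("orthologous_pairs", "mouse_specific", "mouse_gene_human_pseudogene")


def rows_consistent(table) -> bool:
    """Each row's total should equal the sum of its per-class counts."""
    return all(total == sum(classes) for total, classes in table.values())


def partition_holds(table, whole: str, parts) -> bool:
    """A species row should equal the sum of its component rows, overall and per class."""
    total, classes = table[whole]
    parts_total = sum(table[p][0] for p in parts)
    parts_classes = tuple(sum(column) for column in zip(*(table[p][1] for p in parts)))
    return total == parts_total and classes == parts_classes


def orthologue_percentage(table) -> int:
    """Percentage of mouse proteases with a human orthologue, rounded."""
    return round(100 * table["orthologous_pairs"][0] / table["mouse"][0])


print(rows_consistent(TABLE1))                          # True
print(partition_holds(TABLE1, "human", HUMAN_PARTS))    # True
print(partition_holds(TABLE1, "mouse", MOUSE_PARTS))    # True
print(orthologue_percentage(TABLE1))                    # 82
```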
Mouse protease genes that are absent in human.
A remarkable example of local gene expansions that have
occurred in the mouse degradome is that of placental
cathepsins, which are a group of eight cysteine protease
genes that are present on mouse chromosome 13B3 and
absent from the human genome24,25. There are also three
placental cathepsin pseudogenes in this region, which
reflects the dynamic nature of gene birth and death during the evolution of this family. Similarly, the SENP family of sentrin-specific proteases26 is expanded in the
mouse lineage — there are 14 members in the mouse
but seven in the human. There are also nine testases,
which are a subfamily of testis-specific ADAMs that are
located at mouse chromosome 8B (REF. 27), for which we
have not found human orthologues. The family of tissue
kallikreins has also been expanded in the mouse
genome. This large cluster of serine protease genes on
mouse chromosome 7B2 contains 28 genes and several
pseudogenes; the equivalent human cluster located at
19q13 contains just 15 functional genes28,29. The interest
in human kallikreins is growing as a result of their frequently altered expression patterns in tumour
processes30. Indeed, prostate-specific antigen (PSA), one
of the most relevant tumour markers, is encoded by
KLK3, which is a member of this gene family31. Human
kallikreins are frequently expressed in reproductive
organs, although their functions and specific substrates
are unknown. Similarly, little information is available
about the functional roles of most members of the large
family of mouse kallikreins28,29.
The prolactin inducible protein (PIP) gene family
provides another example of gene-family expansion in
mice. This protein has been recently characterized as an
aspartic protease32, and is encoded by a single locus on
human chromosome 7q34. A highly divergent mouse
counterpart is located on chromosome 6B2, and further
PIP-related genes are expressed in male reproductive
organs in mice33. There are also human–mouse differences in haematopoietic serine proteases. Proteolytic
enzymes such as tryptases and chymases are the main
SECRETORY-GRANULE constituents in many haematopoietic
cell lineages. At least seven mouse MAST-CELL chymase
genes (Mcpt1, Mcpt2, Mcpt4, Mcpt8, Mcpt9, Mcpt10 and
McptL) are absent from the human genome34. Because mast-cell
BOX 2 | The chromosomal landscape of human protease genes
SECRETORY GRANULE
A subcellular vesicle that
contains molecules that are
destined for secretion.
The distribution of protease genes
differs widely among human
chromosomes, as shown in the figure:
green dots represent aspartic proteases,
blue dots represent cysteine proteases,
red dots represent serine proteases,
violet dots represent threonine
proteases and black dots represent
metalloproteases. Several protease genes
occur in clusters, especially on
chromosomes 11, 16 and 19. The largest
is located in 19q13 (kallikrein locus)
and contains 15 functional genes and
several pseudogenes. Another densely
populated cluster maps at 16p13, in
which a primordial serine protease gene
duplicated repeatedly during evolution
to give rise to 10 trypsin-like genes and
three related pseudogenes. Similarly, the
matrix metalloprotease (MMP) cluster
at 11q22 contains nine genes (MMP1,
MMP3, MMP7, MMP8, MMP10,
MMP12, MMP13, MMP20 and
MMP27) and two pseudogenes. Despite
these examples of gene families that
have been formed
and expanded by local duplications,
most protease gene families have been
dynamic in their evolution and, after
duplication, the different paralogous
genes have translocated to different
chromosomes. Analysis of the
chromosomal landscape of protease
genes has also shown some signs of
large-scale duplication events in the
human genome139,140. So, a PARALOGON
that is composed of a group of related
genes at chromosomes 11q and 21q
contains several protease genes from
distinct families with a conserved
arrangement in both locations (USP2,
matriptase/ST14, ADAMTS8 and
ADAMTS15 at 11q, and USP25,
enteropeptidase/PRSS7, ADAMTS1 and
ADAMTS5 at 21q).
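One simple way to quantify a conserved arrangement such as the 11q/21q paralogon is to replace each gene by its family label and compute the longest common subsequence (LCS) of the two ordered lists. This is an illustrative scoring choice, not the method used in the cited studies, and it assumes that the genes are listed above in chromosomal order. The family labels (USP for ubiquitin-specific proteases, S01 for the trypsin-like serine proteases matriptase and enteropeptidase, ADAMTS for the metalloproteases) are shorthand chosen for this sketch.

```python
from typing import Sequence

# Family labels for the paralogon genes named in Box 2, assumed to be in chromosomal order.
REGION_11Q = ["USP", "S01", "ADAMTS", "ADAMTS"]  # USP2, ST14, ADAMTS8, ADAMTS15
REGION_21Q = ["USP", "S01", "ADAMTS", "ADAMTS"]  # USP25, PRSS7, ADAMTS1, ADAMTS5


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two family-label sequences."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


print(lcs_length(REGION_11Q, REGION_21Q))                         # 4: whole order shared
print(lcs_length(REGION_11Q, ["ADAMTS", "USP", "ADAMTS", "S01"]))  # 2: shuffled control
```

A score equal to the number of genes means the family order is fully conserved; a shuffled control gives a baseline for how much shared order could arise by chance in such short lists.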
[Figure: human chromosomes 1–22, X and Y, with protease gene positions marked by coloured dots]
MAST CELL
A specialized cell that initiates
the inflammatory response by
releasing histamine and other
cytokines.
GRANZYME
A serine protease that is
produced by immune-system
cells and stored in secretory
granules.
COMPLEMENT
A set of plasma proteins that
form part of a proteolytic
cascade, which leads to foreign-cell lysis and phagocytosis.
proteases might be involved in host defence, especially
during bacterial and parasitic infections, the expansion of
this gene family in the mouse genome might have important consequences in the development of distinctive
immune responses in rodents compared with humans.
Several GRANZYMES and trypsin-like enzymes have also
been specifically expanded in mice. Remarkably, the
COMPLEMENT C1r and C1s serine protease genes are duplicated in the mouse. One set (C1rA and C1sA) corresponds to the murine orthologues of the human genes,
whereas the other (C1rB and C1sB) is exclusively
expressed in the male genital tract, which indicates a
role for these proteases in mouse reproduction that is
independent of complement activation35. Local duplications on mouse chromosomes 5F and 9A1 have also
generated two copies of genes that encode the single-copy human metalloproteases ADAM-1 and MMP-1.
The resulting pairs are expressed in the mouse testis36
and placenta23, respectively.
As well as these large- and small-scale expansions
in clusters of mouse gene families, there are single-copy
genes or even entire subfamilies that seem to be
absent in the human lineage. Ren2, which encodes the
aspartic protease known as submandibular renin, and
Uchl4, which encodes a ubiquitin C-terminal hydrolase, are examples from the first group. Among the
mouse-specific protease subfamilies, we have identified the testins, which are composed of three cysteine
proteases of the 01 family that map to mouse chromosome 13B3 and are expressed in the testis37. There are
also several interesting examples of genes that have
been specifically inactivated in the human lineage
through diverse mechanisms. For example, human
caspase 12 has acquired mutations that abrogate its
protease function, but its murine orthologue mediates apoptosis in response to endoplasmic-reticulum
stress38. The genes encoding cyritestins and implantation serine proteases (ISPs) are also inactivated in humans39,40. The
gene for the aspartic protease chymosin is inactivated in
humans41 but there seems to be a functional mouse
orthologue on chromosome 3F3. The ELA1 gene that
encodes pancreatic elastase has been transcriptionally
silenced in the human genome owing to a mutation that
inactivates crucial enhancer and promoter elements42.
Figure 1 | The protease wheel. Unrooted phylogenetic tree of human and mouse proteases. Proteases are distributed in five
catalytic classes and 63 different families. The code number for each protease family is indicated in the outer ring. Protein sequences
that correspond to the protease domain from each family were aligned using the ClustalX program. Phylogenetic trees were
constructed for each family using the Protpars program. A global tree was generated using the protease domain from one member
of each family, and individual family trees were added at the corresponding positions. The figure shows the non-redundant set of
proteases. Orthologous proteases are shown in light grey, mouse-specific proteases are shown in red and human-specific
proteases in blue. Metalloproteases are the most abundant class of enzymes in human and among orthologous human–mouse pairs (TABLE 1), but most lineage-specific
differences are in the serine protease class, making this sector wider. The 01 family of serine proteases can be divided into 22
smaller subgroups on the basis of involvement in different physiological processes, to facilitate the interpretation of differences.
BOX 3 | Assessing the completeness of the human genome sequence
Genomic analysis of human proteases has proved valuable in assessing the completeness and accuracy of the available
human genome sequence. We have found six previously known protease genes (CAPN8, CTSE, DPP7, IFP38, MMP23A
and TPS1), the existence of which has been experimentally verified, but which are missing in the public version of the
human genome. The first four are present in the Celera version of the sequence, although the Celera Discovery System
database also lacks several genes that are found in the public genome sequence. Some of these conflicts might derive
from gaps in both genome assemblies, or from the artefactual collapse of duplicated regions, which is a frequent
problem in the informatic assembly of highly related genomic sequences. Also, there are examples of protease genes
that are incorrectly annotated or described, as well as cases of predicted genes that correspond to pseudogenes.
Nevertheless, beyond these differences, which are common to all genomic analyses, our study of human protease
genes confirms the relative completeness and high quality of the present versions of the human gene catalogue.
Human protease genes that are absent in mouse. Almost
all human protease genes that are absent in the mouse
lineage correspond to paralogous genes that belong to
differentially expanded gene families. So, there are four
γ-glutamyl transferase genes in a cluster on human
chromosome 22q and only one gene at the equivalent
region in the mouse genome. These threonine proteases
catalyse the degradation of glutathione to glutamic acid
and cysteinyl glycine43. Calpain 14, caspases-5 and -10,
PSA, matrilysin-2, mesotrypsin, ADAM-20, and several
aminopeptidases and USPs are among the human proteases that are absent in mouse. Of special interest is
USP6 (Tre2) — a member of the ubiquitin-specific protease family that is encoded by a recently characterized
hominoid-specific gene44. Beyond changes in protease
gene numbers, there are also examples of significant
human–mouse differences in the expression pattern of
orthologous protease genes23,45. Such regulatory differences might also be fundamental motors of evolutionary innovation in protease-mediated functions that take
place in each species.
PROSTATE INVOLUTION
A process by which the prostate
gland reduces its size following
androgen depletion.
EXON SHUFFLING
The process of non-homologous
recombination of exons from
different genes.
PRODOMAIN
A sequence of amino acids that
precedes the catalytic domain in
many inactive protease
precursors. On removal or
conformational change of the
prodomain, the protease
becomes active.
CHAPERONE
A protein that aids the folding of
another to prevent it from taking
an interactive conformation.
Functional differences in human and mouse degradomes.
The observed differences between human and mouse
protease sets provide insights into the molecular mechanisms that underlie the changes in physiology and life
strategies of both species after their divergence from a
common ancestor ~75 million years ago. Consistent with
data derived from the global comparative
analysis of the human and mouse genomes21,46, most
differences between the respective degradomes correspond to proteases that are involved in reproductive
and immunological functions.
In relation to reproductive processes, proteases are
known to have roles in menstruation, fertilization, ovulation, implantation, placentation, pregnancy, and uterine,
breast and PROSTATE INVOLUTION47–54. They might also contribute to the ability of reproductive organs to respond
rapidly to changes in the hormonal environment,
through the activation of local networks of cytokines
and growth factors. The lineage-specific expansion of
some reproductive proteases in mice might help to
explain many of the pronounced reproductive differences between human and mouse, including variations
in oestrous cycles, placental structures, gestation periods
and number of offspring per delivery.
The changes in immune-related proteases might
reflect evolutionary diversification processes that are
aimed at expanding the repertoire of host defence
mechanisms in response to new physiological conditions, dietary changes, new sources of pathogens or
environmental stress.
The ‘helping hands’ of proteases
As well as mechanisms of protease invention that are
based on gene duplication and divergence, the evolution
of both human and mouse degradomes has also been
driven by EXON SHUFFLING and the duplication of protein
modules in protease genes to form new architectures. In
this way, proteases might link their catalytic domains to
a range of specialized functional modules other than
archetypal sorting signals that direct proteases to intracellular organelles or extracellular environments. So,
substrate and binding specificities can be altered in an
evolutionarily rapid and selectable way that leads to
gene-family diversity and results in substrate specificity
or diversity and new kinetic, inhibitory, and cell or tissue
localization properties55.
First, many proteases contain conserved prodomains
that serve an auto-inhibitory role to prevent activation
at the wrong place or time and are often required as
intramolecular CHAPERONES during protein synthesis and
folding. Prodomains can also function as a contact face
for cell-surface receptors and to direct proteases to specific substrates or locations in tissue. Second, a large
proportion of proteases (40%) have ancillary domains
that probably facilitate their interaction with other proteins such as substrates, inhibitors and receptors, or
have some kind of regulatory role, as proposed for the
protease-associated (PA) domain56,57.
FIGURE 2 shows the most typical domains that are
associated with proteases. Some, such as the EGF
domains, have been successful in their adaptation to
proteolytic enzymes and are present in a range of proteases from different families, in which they undoubtedly perform different but possibly related functions.
Other domains have multiplied in the same enzyme
to form long tandem repeats. This is the case with
ADAMTSs, which contain as many as 15 thrombospondin (TS) repeats in their structure58,59. Other proteolytic enzymes, such as most membrane-type serine
proteases60,61, have a complex mosaic structure with up
to six distinct domains located in a single gene. This
inventive activity has also created several peculiar
structures, including proteases with different catalytic
units or with protease-inhibitory domains embedded in
[Figure 2 image: the catalytic domain surrounded by ancillary domains EF_HAND, KR, Igc2, UBA, LDLa, CCP, sushi, RHOD, PDZ, SEA, EGF-like, EXOIII, TBC, PAN_AP, DUSP, LON, zf_MYND, UBQ, FRI, CAP-Gly, ZnF_RBZ, VWA, FU, Death, APPLE, VWC, FN1, DED, KAZAL, SR, SO, IB, IG-like, SIS, FIMAC, ZnF_UBP, EGF_CA, EGF, MATH, FN2, PLAC, ShKT, MAM, Cystatin, IG, PA, CUB, UIM, THYN, DISIN, FZ, SNF7, HX, AAA, TSP1, NL, GON, ACR, MGS, FA58C and LamGL]
Figure 2 | Ancillary domains present in human and mouse proteases. Domains are colour-coded according to the
protease catalytic class to which they are linked: yellow, aspartic proteases; blue, cysteine proteases; green,
metalloproteases; and red, serine proteases. Domains with two colours have been found in two classes of proteolytic
enzymes. Codes for domains correspond to those used in the Pfam domain database, with the exception of recently
described domains such as PLAC143 and GON58.
SCISSILE BONDS
Peptide bonds that are cleaved
by proteolytic enzymes.
the same gene62,63. Therefore, it seems that protease
genes originally encoded simple single-domain catalytic
proteins that underwent gene fusions to generate this
extraordinary diversity of multidomain enzymes.
This strategy of domain accretion and shuffling has
facilitated the evolution from nonspecific primitive
degradative proteases to highly selective enzymes that
are able to perform subtle reactions of proteolytic processing. Substrate-binding exosites modulate and
broaden the substrate-specificity profile of proteases by
providing a further contact area that is not influenced
by the primary specificity subsites, and might even prevent substrate binding and cleavage by the catalytic
domain55. In this way, the function of the protease is
refined and can be made more specific or efficient. This
phenomenon is also shared by many other vertebrate
proteins and has been advantageous for the development
| JULY 2003 | VOLUME 4
of several physiological systems in which proteases are
involved, such as coagulation and complement cascades.
Nevertheless, evolution has also progressed in the
reverse direction, with some domains that were acquired
early in protease families being lost in more recently
duplicated members. This is the case for the hemopexin
C domain that is present in most MMPs from different
origins, but is specifically lost in members of the
matrilysin subfamily (MMP-7 and MMP-26)7.
Surprisingly little is known about the function of protease ancillary domains. Recent strategies to find binding
partners and functions for these domains using the yeast
two-hybrid system have uncovered new families of
protease substrates for MMPs64, whereas using inactive
catalytic domains as yeast two-hybrid baits has uncovered
further substrates that are cleaved by the active homologue10. These studies also indicate that inactive protease
homologues or shed exosite domains might bind and
therefore mask SCISSILE BONDS in vivo, thereby reducing substrate
cleavage, which adds a further layer of control to protease
function65. In this regard, we have also examined the possibility that proteases might selectively accrete new
domains in human or mouse lineages and acquire new
functions. However, we did not find evidence of differences in domain architecture between pairs of orthologous human and mouse proteases. Accordingly, it seems
that these domain assemblies have occurred in an early
stage of protease evolution, before the separation of the
human and mouse lineages, which underscores their
importance.
Other sources of variability in proteases
HAPLOINSUFFICIENCY
A gene dosage effect that occurs
when a diploid requires both
functional copies of a gene for a
wild-type phenotype. An
organism that is heterozygous
for a haploinsufficient locus does
not have a wild-type phenotype.
LOSS OF HETEROZYGOSITY
A loss of one of the alleles at a
given locus as a result of a
genomic change, such as mitotic
deletion, gene conversion or
chromosome missegregation.
The complexity in proteases might be further increased
by common processes such as alternative splicing and differential polyadenylation, or by the occurrence of polymorphic variants that might have important roles in
expanding or modifying protease functions. There are
some examples in which alternative splicing events or the
use of alternative promoters have been shown to be functionally important in proteases, as is the case in the generation of endothelial and testicular isoforms of
angiotensin-converting enzyme (ACE)66. However, the
functional relevance of most alternative splicing events
that occur in protease-encoding genes is still unknown.
There are also examples of the alternative use of different
polyadenylation sites in many proteases, including
cathepsins, MMPs and kallikreins67–69, but again their
functional relevance is unclear. Changes in the repeat
number of genes duplicated in tandem, as in USP17 and
pepsin A, introduce another level of variability in the
human degradome.
A final and important layer of protease variability
derives from naturally occurring genetic variants that
directly affect the expression or activity of proteolytic
enzymes. These polymorphic variants might alter the delicate control that operates in proteolytic systems, and
influence physiological functions or facilitate the development of pathological conditions that involve proteases.
The ACE gene, which contains many sequence polymorphisms, is a good example70; some of these variants are
associated with enhanced muscle performance71, but others confer an increased susceptibility to cardiovascular
diseases72. Recent genome-wide studies of single
nucleotide polymorphisms (SNPs) and other polymorphisms, which have searched for associations with
complex disease traits, have identified variations in the
promoter and coding sequences of several protease genes
that are linked to relevant pathologies. These associations
include ADAM33 and asthma, and calpain-10 (CAPN10)
and type-2 diabetes73,74. Similarly, a polymorphism in
glutamate carboxypeptidase II has been linked to
hyperhomocysteinemia75, and variants in MMP genes
contribute to an increased susceptibility to cardiovascular
diseases or cancer3,76. Further studies on SNPs mapped in
protease loci might determine individual susceptibility to
common diseases or drug response, and provide further
information on the molecular mechanisms that underlie
some complex genetic traits.
Diseases associated with protease alterations
The availability of a complete protease catalogue will
facilitate the identification of therapeutic targets.
Abnormal or deficient protease functions might lead to a
range of pathological conditions that can be classified in
two general groups: those that are caused by alterations
in protease genes, and those that are caused by alterations in other components of proteolytic systems. The
first group can be further subdivided into genetic disorders that are caused by mutations in protease genes, and
epigenetic or regulatory diseases of proteolysis that are
caused by alterations in the spatio-temporal patterns of
expression of proteases. These latter alterations are frequent in certain protease families such as MMPs, and
have been linked to the development and progression of
cancer, arthritis and inflammatory diseases3,77–80.
Protease deficiencies might also derive from alterations in other components of proteolytic systems:
inhibitors, substrates, regulatory factors and transport
systems. Examples of these abnormalities include: serpinopathies81, which are generated by mutations in
serine protease inhibitors such as α1-antitrypsin, deficiency of which causes a common hereditary disorder
in Caucasians82; Alzheimer disease, which is caused by
mutations in the amyloid precursor protein gene
(APP) that facilitate accumulation of the amyloid-β
(Aβ) peptide83; haemophilia A, which is caused by
mutations in the factor VIII gene that result in
diminished proteolysis catalysed by factor IX (REF. 84);
and finally, the haematological disease that is caused
by mutations in the ERGIC53 (LMAN1) gene that
affect the transport of proteases such as cathepsin Z85.
Hereditary diseases of proteolysis. We have catalogued
53 diseases that are caused by mutations in protease
genes (TABLES 2,3); most are caused by recessive loss-of-function
mutations. As with other enzymes, the remaining functional allele of a protease gene can often compensate for
the loss of the other copy, and heterozygotes usually have a
mild or no phenotype. However, there are some cases
in which loss-of-function mutations in protease genes
are inherited in a dominant pattern. These include
familial cylindromatosis, which is caused by mutations
in the CYLD cysteine protease gene86, and type II
autoimmune lymphoproliferative syndrome (ALPS),
which is caused by mutations in the caspase-10 (CASP10)
gene87. Several mechanisms might explain the dominant inheritance of these loss-of-function mutations, including
HAPLOINSUFFICIENCY, LOSS OF HETEROZYGOSITY in the case of
cylindromatosis86 and interference with the process of
activation of procaspase-10 by an induced-proximity
mechanism87,88.
The genetic alterations that lead to the development of hereditary diseases that are associated with
protease genes range from single-site mutations to
large chromosomal deletions. Point mutations that
result in the loss of protease function are the most frequent cause of these disorders. These include: limb-girdle muscular dystrophy type 2A, which is caused by inactivating mutations in the calpain-3 (CAPN3) gene89; thrombotic thrombocytopenic purpura, which is caused by mutations in the ADAMTS13 gene90; and nonsyndromic deafness, which results from distinct mutations that affect the transmembrane serine protease TMPRSS3 (REF. 91). There are only a few examples of point mutations that lead to gain of protease function (TABLE 3), such as early-onset familial Alzheimer disease, which is caused by activating mutations in presenilins 1 and 2 (PSEN1 and PSEN2)5,92. Mutations in the non-coding regions of protease genes can also result in abnormal proteolytic activity. Examples include: hyperprothrombinemia, which is caused by mutations in the 3′-UTR of the thrombin gene that lead to an increased secretion of thrombin93; Alzheimer disease, which is caused by a single-base deletion at the splice-donor site of intron 4 of PSEN1 (REF. 94); and haemophilia B Leyden, which is caused by different mutations in the promoter region and the 5′-UTR of the factor IX gene84.

Table 2 | Human hereditary diseases of proteolysis
Protease | Gene | Locus | Disease | OMIM | Dominant/recessive | Function | Animal model
Loss-of-function group
Cathepsin K | CTSK | 1q21 | Pycnodysostosis | 265800 | R | Loss | KO resembles disease
Cathepsin C | CTSC | 11q14 | Papillon-Lefèvre and Haim-Munk syndromes | 245000 | R | Loss | KO does not resemble disease
Calpain 3 | CAPN3 | 15q15 | Limb-girdle muscular dystrophy type 2A | 253600 | R | Loss | KO resembles disease
Cylindromatosis protein | CYLD1 | 16q12 | Cylindromatosis | 132700 | D | Loss | –
Ubiquitin C-terminal hydrolase 1 | UCHL1 | 4p14 | Parkinson disease type V | 191342 | D | Loss | Gad mouse resembles disease
Caspase-8 | CASP8 | 2q33 | Autoimmune lymphoproliferative syndrome (I) | 601859 | R | Loss | KO embryonic lethality
Caspase-10 | CASP10 | 2q33 | Autoimmune lymphoproliferative syndrome (II) | 603909 | D/R | Loss | No mouse orthologue
USP9Y | USP9Y | Yq11 | Azoospermia and hypospermatogenesis | 415000 | D | Loss | –
Gelatinase A | MMP2 | 16q13 | Multicentric osteolysis with arthritis | 605156 | R | Loss | KO does not resemble disease
ADAMTS-13 | ADAMTS13 | 9q34 | Thrombotic thrombocytopenic purpura | 274150 | R | Loss | –
Procollagen I N-endopeptidase | ADAMTS2 | 5q23 | Ehlers-Danlos syndrome type VIIC | 225410 | R | Loss | KO resembles disease
Endothelin-converting enzyme 1 | ECE1 | 1p36 | Hirschsprung disease | 142623 | D | Loss | KO partially resembles disease
PHEX endopeptidase | PHEX | Xp22 | X-linked hypophosphatemia | 307800 | D | Loss | Hyp mouse resembles disease
Carboxypeptidase E | CPE | 4q33 | Hyperproinsulinemia and diabetes | 125853 | R | Loss | Fat mouse resembles disease
Mitochondrial inner membrane protease 2 | IMMP2L | 7q31 | Gilles de la Tourette syndrome | 137580 | D | Loss | –
X-Pro dipeptidase | PEPD | 19q13 | Prolidase deficiency | 170100 | R | Loss | –
Paraplegin | SPG7 | 16q24 | Spastic paraplegia | 607259 | R | Loss | –
Enteropeptidase | PRSS7 | 21q21 | Enteropeptidase deficiency | 226200 | R | Loss | –
Complement component C1r | C1R | 12p13 | C1r deficiency | 216950 | R | Loss | –
Complement component C1s | C1S | 12p13 | C1s deficiency | 120580 | R | Loss | –
Complement component 2 | C2 | 6p21 | C2 deficiency | 217000 | R | Loss | Guinea-pig model resembles disease
Complement factor D | DF | 19p13 | DF deficiency | 134350 | R | Loss | KO resembles disease
Complement factor I | IF | 4q25 | CFI deficiency | 217030 | R | Loss | –
Plasma kallikrein | KLKB1 | 4q35 | Prekallikrein deficiency | 229000 | R | Loss | –
Thrombin | F2 | 11p11 | Hyperprothrombinemia/hypoprothrombinemia | 176930 | D/R | Loss | KO resembles disease
Coagulation factor VIIa | F7 | 13q34 | Factor VIIa deficiency | 227500 | R | Loss | KO lethal, partially resembles disease
Coagulation factor IXa | F9 | Xq27 | Haemophilia B | 306900 | R | Loss | Mouse and dog models resemble disease
Coagulation factor Xa | F10 | 13q34 | Factor X deficiency | 227600 | R | Loss | KO embryonic lethality, fatal neonatal bleeding
D, dominant inheritance; KO, knockout (mouse); R, recessive inheritance.
Short deletions that cause protease truncation might
result in relevant diseases such as the recently described
neurotrypsin mutation, which causes autosomal recessive
non-syndromic mental retardation95. Large deletions
that remove part of protease genes have also been
described, including a 9.5-kb deletion in the paraplegin gene, which causes a form of spastic paraplegia96.
The systematic classification of genetic diseases of proteolysis provides a global perspective on the diversity
of the proteases, mutational mechanisms and pathological
alterations that underlie these human diseases.
This classification of protease deficiencies might provide a useful framework for discussing the possibilities
of creating animal models for the different diseases, and
for evaluating potential therapies.
Mouse models of hereditary diseases of proteolysis. The
development of methods for manipulating the mouse
germline offers new opportunities for investigating
human diseases. At least 20 protease genes that are associated with hereditary diseases of proteolysis have been disrupted in mice (TABLES 2,3). In most cases, these mouse
models have provided valuable information on the molecular and physiological mechanisms that are involved in
the development and progression of the corresponding
human disease. However, there are cases of mutant mice
that do not recapitulate the human disorder that is caused
by mutations in the orthologous protease gene.
The identification of differences in the number of paralogous genes in protease families from both organisms
might sometimes explain these paradoxical situations. So,
a point mutation in human caspase-8 causes ALPS,
whereas mice deficient in this protease die as embryos97.
The finding that caspase-10, the closest paralogue of
caspase-8, is absent in mice indicates that the much
milder phenotype observed in ALPS patients with
caspase-8 mutations might be derived from functional
compensation by caspase-10. The development of mouse
models of diseases that involve protease genes has also
contributed to defining the molecular mechanisms that
are implicated in these diseases. So, data obtained from
mice deficient in neutrophil elastase (Ela2)98 have raised
the hypothesis that mutations in ELA2 that are associated
with congenital neutropenia and cyclic haematopoiesis
are the result of a gain-of-function in this protease99.
The use of knockout mice has also allowed the identification of specific substrates of proteases, such as
prelamin A for the FACE-1/ste24 metalloprotease100 or
syndecan-1 and α-defensin for matrilysin101,102, and the
finding of new and unexpected protease functions in
normal and pathological conditions3,103–106. However,
the generalization of functional studies on human proteases on the basis of data from mouse models might be
hampered by the occurrence of robust proteolytic systems with redundant and compensatory enzymes, as
illustrated by the MMPs. In fact — with the exception of
MT1–MMP-null mice, which have skeletal abnormalities
and die shortly after birth107,108, and MMP-20-null mice,
which have an amelogenesis imperfecta phenotype109 —
all mutant mice that are deficient in specific MMPs that
have been generated so far lack notable alterations during
development or in adult tissues, which points to the
occurrence of functional overlaps between individual
components of this complex proteolytic system110.
It is worth noting that there are some interesting
examples of mouse disorders that are caused by loss-of-function mutations in protease genes whose human
orthologues have not been associated
with an equivalent pathological condition. Among
these are the
neurological defects in ataxia (axJ) mice, which result
from a mutation in the ubiquitin-specific protease
Usp14 (REF. 111).
The generation of animal models has also been
important for studying diseases that involve the gain-of-function of specific proteases. So, transgenic mice that
express mutant variants of human PSEN1 or PSEN2
accumulate Aβ peptide in the brain, which reinforces the
role of these proteins in the pathogenesis of Alzheimer
disease83,92. Transgenic mice have also been used to establish causal relationships between the overexpression of a
certain protease in a specific tissue and the development
of relevant diseases such as arthritis or cancer3,112.
Continuing projects of large-scale mutagenesis in
mice113,114 will be essential to generate appropriate models for understanding the in vivo functions of many proteases, their potential roles in the pathogenesis of human
diseases and their value as new therapeutic targets.
Therapeutic approaches to protease deficiencies. Human
diseases of proteolysis have been traditionally linked to
the overexpression of proteases. Consequently, the corresponding therapeutic strategies have focused on the
development of inhibitors to block the undesired activity of these enzymes115,116. However, as discussed here,
there is growing evidence for genetic diseases that are
caused by the loss of protease function. Furthermore,
there are also relevant disorders, including many conformational diseases such as systemic amyloidosis, prion
encephalopathies and Alzheimer and Huntington
disease, which arise from the accumulation of intermolecular aggregates of specific proteins81,117,118. These
disorders could benefit from protease-based treatments
to replace the deficient enzymes or to enhance the
clearance of the pathological protein aggregates, as
exemplified by the use of tPA and uPA plasminogen
activators for clot dissolution. Accordingly, the therapeutic approach to diseases that are associated with protease deficiencies must be based on knowledge of the
structure, function and regulation of the pathologically
relevant proteases in physiological situations. This focus
is also essential to avoid the undesired effects of nonspecific therapies that can profoundly alter the delicate
balance of endogenous proteolytic systems.
There are important examples that illustrate the successful introduction of protease inhibitors to treat human disease. ACE inhibitors are widely used to treat hypertension and congestive heart failure119, and several drugs have been introduced for blocking the human immunodeficiency virus (HIV) protease120. There are also many inhibitors that have been tested in preclinical models or have advanced to clinical trials2. Special interest has focused on: inhibitors of β- and γ-secretases for Alzheimer disease; VASOPEPTIDASE inhibitors that simultaneously target neprilysin and ACE for hypertension, atherosclerosis and heart failure; MMP inhibitors for cancer and inflammatory disorders; caspase inhibitors for BACTERIAL SEPSIS and autoimmune and degenerative diseases; tryptase inhibitors for asthma; proteasome inhibitors for multiple myeloma; and aggrecanase inhibitors for arthritis. However, other attempts to develop synthetic inhibitors for targeting human proteases of clinical relevance have been accompanied by frequent disappointments, especially in cancer research121. The continuing efforts aimed at the resolution of the 3D structures of proteases and protease-inhibitor complexes might open new avenues for the rational design of an improved generation of inhibitors that are more selective and have better pharmacokinetic properties122–126. Another approach involves using endogenous inhibitors to target the undesired proteolytic activity that is associated with many diseases116. However, this approach suffers from difficulties with compound administration and poor PHARMACOKINETICS. So far, only a few endogenous inhibitors, such as antithrombin III, have been used as therapeutic proteins in diverse inflammatory disorders127.

Table 3 | Human hereditary diseases of proteolysis
Protease | Gene | Locus | Disease | OMIM | Dominant/recessive | Function | Animal model
Loss-of-function group
Coagulation factor XIa | F11 | 4q35 | Factor XI deficiency | 264900 | R | Loss | Cattle and dog models resemble disease
Coagulation factor XIIa | F12 | 5q35 | Factor XII deficiency | 234000 | R | Loss | –
Protein C | PROC | 2q21 | Thrombophilia | 176860 | D/R | Loss | KO resembles disease
Plasmin | PLG | 6q26 | Thrombophilia and ligneous conjunctivitis | 173350 | R | Loss | KO resembles disease
Neurotrypsin | PRSS12 | 4q28 | Nonsyndromic mental retardation | 249500 | R | Loss | –
Proprotein convertase 1 | PCSK1 | 5q15 | Obesity | 600955 | R | Loss | KO does not resemble disease
Transmembrane protease, serine 3 | TMPRSS3 | 21q22 | Deafness | 605316 | R | Loss | –
Lysosomal A carboxypeptidase | PPGB | 20q13 | Galactosialidosis | 256540 | R | Loss | KO resembles disease
Tripeptidyl-peptidase I | CLN2 | 11p15 | Neuronal ceroid lipofuscinosis | 204500 | R | Loss | –
Glycosylasparaginase | AGA | 4q34 | Aspartylglucosaminuria | 208400 | R | Loss | KO resembles disease
Gain-of-function group
Presenilin 1 | PSEN1 | 14q24 | Alzheimer type 3 | 104311 | D | Gain | Transgenic models partially resemble disease
Presenilin 2 | PSEN2 | 1q42 | Alzheimer type 4 | 600759 | D | Gain | Transgenic models partially resemble disease
Collagenase 3 | MMP13 | 11q22 | Spondyloepimetaphyseal dysplasia | 602111 | D | (Gain) | –
Cationic trypsin | PRSS1 | 7q35 | Hereditary pancreatitis/trypsin deficiency | 167800 | D/R | Gain/Loss | –
Neutrophil elastase | ELA2 | 19p13 | Cyclic neutropenia | 162800 | D | Gain | KO more susceptible to bacterial sepsis
Proprotein convertase 9 | PCSK9 | 1p32 | Hyperlipoproteinemia type III | 144400 | D | (Gain) | –
Heterogeneous group*
Indian hedgehog protein | IHH | 2q35 | Brachydactyly type A1 | 112500 | D | Loss | KO resembles disease
Sonic hedgehog protein | SHH | 7q36 | Holoprosencephaly type 3 | 142945 | D | Loss | KO resembles disease
Desert hedgehog protein | DHH | 12q13 | Partial gonadal dysgenesis | 607080 | R | Loss | KO resembles disease
DJ-1 (putative protease) | DJ1 | 1p36 | Parkinson disease type VII | 606324 | R | Loss | –
Reelin (putative protease) | RELN | 7q22 | Lissencephaly syndrome | 257320 | R | Loss | Reeler mouse resembles disease
Dihydropyrimidinase (np) | DPYS | 8q22 | Dihydropyrimidinase deficiency | 222748 | R | Loss | –
Aspartoacylase (np) | ASPA | 17p13 | Canavan disease | 271900 | R | Loss | KO resembles disease
Transferrin receptor 2 protein (np) | TFR2 | 7q22 | Hemochromatosis type 3 | 604250 | R | Loss | KO resembles disease
Haptoglobin-1 (np) | HP | 16q22 | Anhaptoglobinemia | 140100 | R | Loss | –
*Heterogeneous group includes non-protease homologues (np), putative proteases and hedgehog proteins with only autoprocessing activity. D, dominant inheritance; KO, knockout (mouse); R, recessive inheritance.
VASOPEPTIDASE
A protease that is involved in the
regulation of vascular tone.
BACTERIAL SEPSIS
Pathology that is caused by the
spread of bacteria or their
products through the
bloodstream.
The identification of the molecular defects that
underlie diseases that are caused by the loss of function
of protease genes has offered new options for developing specific therapies. The first obvious approach is
based on classical enzyme-replacement therapies (ERT)
that are aimed at replacing the defective enzyme with its
normal counterpart128. There are several examples of
protease-based therapies for genetic diseases of proteolysis, including long-recognized disorders of
the clotting system that are now treatable with
recombinant proteases such as coagulation factors VII and IX
(REF. 129). However, the fact that most diseases of this category have only been recently characterized has hampered
the rapid development of suitable ERT-based therapies.
Protease-based therapies might also be aimed at increasing the turnover of proteins that tend to form intermolecular aggregates, as in the case of Aβ deposits in Alzheimer
disease130. These therapies might suffer from the same
problems and limitations as other ERTs, including the
high doses that are necessary to achieve therapeutic
effects, the inability of recombinant proteins to cross the
blood–brain barrier and the elicitation of immune
responses. For this reason, further therapeutic alternatives
for protease-linked diseases need to be explored. Gene
therapy, bone-marrow transplantation, enzyme enhancement therapies and substrate-deprivation strategies
might offer therapeutic alternatives for human diseases
that are caused by protease deficiencies, and in some
cases, preliminary clinical trials have confirmed their
potential effectiveness for treating specific diseases of
proteolysis128,131.
Conclusion and perspectives
Here we offer the first comparative glimpse of the 553- and 628-member human and mouse degradomes.
Although we have extensively revised the available
annotations for both protease sets and have included
many new members, especially in the case of mouse
enzymes, these numbers are still not definitive.
Continuing efforts that combine bioinformatic predictions, expert manual annotation and curation, and
experimental verification of the new in silico acquisitions for the protease collection will be necessary to
obtain the complete proteolytic portrait of both species.
This global view of the human and mouse protease
worlds has offered some surprises, especially the finding
that the mouse degradome is more complex. Why do
mice require more proteases than humans? The one-by-one comparison between both protease sets indicates
that most differences result from human-specific
inactivation of protease genes or mouse-specific expansion
of protease gene families that are associated
with reproductive or immune processes. To some
degree, this counterintuitive situation parallels that
found after preliminary comparative analysis of the
human and chimpanzee genomes, which has shown
that genetic losses in the human lineage might have
caused some of the differences between these species132.
This comparative genomic analysis might be the starting point for further studies of the biological and pathological relevance of proteases. The resolution of the 3D
structures of proteases, the ascription of functions to their
ancillary domains and non-catalytic homologues, and
the detailed analysis of protease-mediated processes such
as protein ECTODOMAIN SHEDDING133,134, the regulation of
transcription factor activity135,136 and regulated
intramembrane proteolysis137, will provide important
information in the near future. From the clinical point of
view, the availability of a complete protease catalogue will
facilitate the identification of protease genes that are
responsible for genetic diseases that are associated with
protease deficiencies, and the evaluation of new proteases
as drug targets or prognostic markers138. The design of
protease chips for the global analysis of patterns of
expression and activity of human proteases will be helpful for this purpose10. The increased knowledge of the structure, function and regulation of proteases will also provide excellent opportunities to design new generations of therapeutic inhibitors, including those based on endogenous protease inhibitors. The availability of the human and mouse genome sequences also offers the possibility of exploring the complete repertoire of endogenous inhibitors in these organisms. It will be interesting to test whether the mouse protease-inhibitor complement is also more complex than that of human, as an evolutionary attempt to control the expanded murine protease repertoire. Comparative analysis of proteases (BOX 4) will also help to identify regulatory differences that might contribute to defining distinctive aspects of human and murine biology from a protease perspective.

BOX 4 | The protease repertoire of other model organisms
There are many families of human and mouse proteases that are also clearly recognizable in the genomes of Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. This indicates the existence of universal proteolytic routines in these organisms, although the corresponding families are frequently expanded in vertebrates. The MEROPS database has annotated 558 proteases and homologues in Drosophila, 400 in C. elegans and 598 in Arabidopsis. Further comparative analysis shows important differences in the distribution of proteases in these species. The most remarkable finding is the multiplication in the fly genome of a group of S1A serine protease genes, reaching more than 200 members. Because of this family expansion, the total number of proteases in Drosophila is similar to that of vertebrates, despite flies having considerably fewer genes. These Drosophila trypsin-like proteases might be involved in development and innate immune defense141. There are also some protease families that have apparently expanded in other organisms, for example serine carboxypeptidases, pepsin-like and subtilisin-like enzymes in Arabidopsis, and several Zn-metalloproteases in C. elegans, which indicates that there are unique functions that are carried out by specific proteolytic enzymes in the different species142. However, the annotation of the degradome of these species is still preliminary and the functionality of most predicted proteases has not yet been experimentally validated.
PHARMACOKINETICS
The time course of a drug and its
metabolites in the body after
administration.
ECTODOMAIN SHEDDING
The protease-mediated release
from the cell surface of the
extracellular domain of integral
membrane proteins.
The exploration of the human and mouse genomes under a proteolytic prism has disentangled some of the complexities that are derived from the existence of multiple executioners of a single chemical reaction, the hydrolysis of peptide bonds, which lies at the heart of many events on which cell life and death depend. The genomic analysis of human and mouse proteases has also indicated that there are many challenges ahead. It is to be hoped that the continuing comparative analysis of these functionally related genes will illuminate new areas in biology and provide clinical answers to the many diseases of proteolysis.

1. Barrett, A. J., Rawlings, N. D. & Woessner, J. F. Handbook of Proteolytic Enzymes (Academic Press, San Diego, 1998).
An essential book in the protease field that comprehensively lists and describes proteases from many organisms.
2. Hooper, N. M. Proteases in Biology and Medicine (Portland Press, London, 2002).
3. Egeblad, M. & Werb, Z. New functions for the matrix metalloproteinases in cancer progression. Nature Rev. Cancer 2, 161–174 (2002).
This review illustrates the diversity of protease functions in pathological processes such as cancer.
4. Krane, S. M. Elucidation of the potential roles of matrix metalloproteinases in skeletal biology. Arthritis Res. Ther. 5, 2–4 (2003).
5. Esler, W. P. & Wolfe, M. S. A portrait of Alzheimer secretases: new features and familiar faces. Science 293, 1449–1454 (2001).
6. Luttun, A., Dewerchin, M., Collen, D. & Carmeliet, P. The role of proteinases in angiogenesis, heart development, restenosis, atherosclerosis, myocardial ischemia, and stroke: insights from genetic studies. Curr. Atheroscler. Rep. 2, 407–416 (2000).
7. Uría, J. A. & López-Otín, C. Matrilysin-2, a new matrix metalloproteinase expressed in human tumors and showing the minimal domain organization required for secretion, latency, and activity. Cancer Res. 60, 4745–4751 (2000).
8. Geier, E. et al. A giant protease with potential to substitute for some functions of the proteasome. Science 283, 978–981 (1999).
9. Voges, D., Zwickl, P. & Baumeister, W. The 26S proteasome: a molecular machine designed for controlled proteolysis. Annu. Rev. Biochem. 68, 1015–1068 (1999).
10. López-Otín, C. & Overall, C. M. Protease degradomics: a new challenge for proteomics. Nature Rev. Mol. Cell Biol. 3, 509–519 (2002).
This article introduces new concepts and approaches for the global analysis of proteases in normal and pathological conditions, and especially in cancer.
11. Rawlings, N. D., O’Brien, E. & Barrett, A. J. MEROPS: the protease database. Nucleic Acids Res. 30, 343–346 (2002).
A description of a database that is freely available to the academic community, which represents an essential resource for research on proteases.
12. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
13. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
14. Zucker, S. & Chen, W.-T. Cell Surface Proteases (Academic Press, San Diego, 2003).
A compilation of articles that cover recent advances in the functional analysis of membrane-bound proteases, which are a group of enzymes that are of growing relevance in normal and pathological conditions.
15. Cope, G. A. et al. Role of predicted metalloprotease motif of Jab1/Csn5 in cleavage of Nedd8 from Cul1. Science 298, 608–611 (2002).
16. Urban, S., Lee, J. R. & Freeman, M. A family of Rhomboid intramembrane proteases activates all Drosophila membrane-tethered EGF ligands. EMBO J. 21, 4277–4286 (2002).
17. Mariño, G. et al. Human autophagins, a family of cysteine proteinases potentially implicated in cell degradation by autophagy. J. Biol. Chem. 278, 3671–3678 (2003).
18. Weihofen, A., Binns, K., Lemberg, M. K., Ashman, K. & Martoglio, B. Identification of signal peptide peptidase, a presenilin-type aspartic protease. Science 296, 2215–2218 (2002).
19. Makarova, K. S., Aravind, L. & Koonin, E. V. A novel
superfamily of predicted cysteine proteases from eukaryotes,
viruses and Chlamydia pneumoniae. Trends Biochem. Sci.
25, 50–52 (2000).
20. Krylov, D. M. & Koonin, E. V. A novel family of predicted
retroviral-like aspartyl proteases with a possible key role in
eukaryotic cell cycle control. Curr. Biol. 11, 584–587 (2001).
21. Mouse Genome Sequence Consortium. Initial sequencing
and comparative analysis of the mouse genome. Nature
420, 520–562 (2002).
22. Swanson, W. J. & Vacquier, V. D. The rapid evolution of
reproductive proteins. Nature Rev. Genet. 3, 137–144 (2002).
23. Balbín, M. et al. Identification and enzymatic characterization
of two diverging murine counterparts of human interstitial
collagenase (MMP-1) expressed at sites of embryo
implantation. J. Biol. Chem. 276, 10253–10262 (2001).
24. Deussing, J. et al. Identification and characterization of a
dense cluster of placenta-specific cysteine peptidase genes
and related genes on mouse chromosome 13. Genomics
79, 225–240 (2002).
25. Sol-Church, K. et al. Evolution of placentally expressed
cathepsins. Biochem. Biophys. Res. Commun. 293, 23–29
(2002).
26. Yeh, E. T., Gong, L. & Kamitani, T. Ubiquitin-like proteins: new
wines in new bottles. Gene 248, 1–14 (2000).
27. Brachvogel, B. et al. Molecular cloning and expression
analysis of a novel member of the disintegrin and
metalloprotease-domain (ADAM) family. Gene 288, 203–210
(2002).
28. Olsson, A. Y. & Lundwall, A. Organization and evolution of the
glandular kallikrein locus in Mus musculus. Biochem.
Biophys. Res. Commun. 299, 305–311 (2002).
29. Yousef, G. M. & Diamandis, E. P. The new human tissue
kallikrein gene family: structure, function, and association to
disease. Endocr. Rev. 22, 184–204 (2001).
30. Luo, L. Y. et al. The serum concentration of human kallikrein
10 represents a novel biomarker for ovarian cancer diagnosis
and prognosis. Cancer Res. 63, 807–811 (2003).
31. Balk, S. P., Ko, Y. J. & Bubley, G. J. Biology of prostate-specific antigen. J. Clin. Oncol. 21, 383–391 (2003).
32. Caputo, E., Manco, G., Mandrich, L. & Guardiola, J.
A novel aspartyl proteinase from apocrine epithelia and
breast tumors. J. Biol. Chem. 275, 7935–7941 (2000).
33. Yoshida, M., Kaneko, M., Kurachi, H. & Osawa, M.
Identification of two rodent genes encoding homologues to
seminal vesicle autoantigen: a gene family including the gene
for prolactin-inducible protein. Biochem. Biophys. Res.
Commun. 281, 94–100 (2001).
34. Lunderius, C. & Hellman, L. Characterization of the
gene encoding mouse mast cell protease 8 (mMCP-8), and a
comparative analysis of hematopoietic serine protease
genes. Immunogenetics 53, 225–232 (2001).
35. Garnier, G., Circolo, A., Xu, Y. & Volanakis, J. E. Complement
C1r and C1s genes are duplicated in the mouse: differential
expression generates alternative isomorphs in the liver and in
the male reproductive system. Biochem. J. 371, 631–640
(2003).
36. Nishimura, H. et al. The ADAM1a and ADAM1b genes,
instead of the ADAM1 (fertilin-α) gene, are localized on
mouse chromosome 5. Gene 291, 67–76 (2002).
37. Grima, J., Wong, C. C., Zhu, L. J., Zong, S. D. & Cheng, C. Y.
Testin secreted by Sertoli cells is associated with the cell
surface, and its expression correlates with the disruption of
Sertoli-germ cell junctions but not the inter-Sertoli tight
junction. J. Biol. Chem. 273, 21040–21053 (1998).
38. Fischer, H., Koenig, U., Eckhart, L. & Tschachler, E.
Human caspase 12 has acquired deleterious mutations.
Biochem. Biophys. Res. Commun. 293, 722–726
(2002).
39. Grzmil, P. et al. Human cyritestin genes (CYRN1 and
CYRN2) are non-functional. Biochem. J. 357, 551–556
(2001).
40. O’Sullivan, C. M., Liu, S. Y., Karpinka, J. B. & Rancourt, D. E.
Embryonic hatching enzyme strypsin/ISP1 is expressed
with ISP2 in endometrial glands during implantation.
Mol. Reprod. Dev. 62, 328–334 (2002).
41. Kageyama, T. Pepsinogens, progastricsins, and
prochymosins: structure, function, evolution, and
development. Cell. Mol. Life Sci. 59, 288–306 (2002).
42. Rose, S. D. & MacDonald, R. J. Evolutionary silencing of the
human elastase I gene (ELA1). Hum. Mol. Genet. 6,
897–903 (1997).
43. Suzuki, H. & Kumagai, H. Autocatalytic processing of
γ-glutamyltranspeptidase. J. Biol. Chem. 277,
43536–43543 (2002).
44. Paulding, C. A., Ruvolo, M. & Haber, D. A. The Tre2 (USP6)
oncogene is a hominoid-specific gene. Proc. Natl Acad. Sci.
USA 100, 2507–2511 (2003).
45. Fougerousse, F. et al. Human–mouse differences in the
embryonic expression patterns of developmental control
genes and disease genes. Hum. Mol. Genet. 9, 165–173
(2000).
46. Emes, R. D., Goodstadt, L., Winter, E. E. & Ponting, C. P.
Comparison of the genomes of human and mouse lays the
foundation of genome zoology. Hum. Mol. Genet. 12,
701–709 (2003).
An excellent analysis of the differences among human
and mouse genomes and discussion of their
physiological relevance.
47. Salamonsen, L. A. & Nie, G. Proteases at the
endometrial–trophoblast interface: their role in implantation.
Rev. Endocr. Metab. Disord. 3, 133–143 (2002).
48. Fata, J. E., Ho, A. T., Leco, K. J., Moorehead, R. A. &
Khokha, R. Cellular turnover and extracellular matrix
remodeling in female reproductive tissues: functions of
metalloproteinases and their inhibitors. Cell. Mol. Life Sci.
57, 77–95 (2000).
49. Curry, T. E. & Osteen, K. G. Cyclic changes in the matrix
metalloproteinase system in the ovary and uterus.
Biol. Reprod. 64, 1285–1296 (2001).
50. Evans, J. P. Fertilin-β and other ADAMs as integrin ligands:
insights into cell adhesion and fertilization. Bioessays 23,
628–639 (2001).
51. Seals, D. F. & Courtneidge, S. A. The ADAMs family of
metalloproteases: multidomain proteins with multiple
functions. Genes Dev. 17, 7–30 (2003).
52. Ny, T., Wahlberg, P. & Brandstrom, I. J. Matrix remodeling in
the ovary: regulation and functional role of the plasminogen
activator and matrix metalloproteinase systems. Mol. Cell
Endocrinol. 187, 29–38 (2002).
53. Hulboy, D. L., Rudolph, L. A. & Matrisian, L. M. Matrix
metalloproteinases as mediators of reproductive function.
Mol. Hum. Reprod. 3, 27–45 (1997).
54. Vu, T. H. & Werb, Z. Matrix metalloproteinases: effectors of
development and normal physiology. Genes Dev. 14,
2123–2133 (2000).
55. Overall, C. M. Molecular determinants of metalloproteinase
substrate specificity: matrix metalloproteinase substrate
binding domains, modules, and exosites. Mol. Biotechnol.
22, 51–86 (2002).
56. Mahon, P. & Bateman, A. The PA domain: a protease-associated domain. Protein Sci. 9, 1930–1934 (2000).
57. Luo, X. & Hofmann, K. The protease-associated domain:
a homology domain associated with multiple classes of
proteases. Trends Biochem. Sci. 26, 147–148 (2001).
58. Llamazares, M., Cal, S., Quesada, V. & López-Otín, C.
Identification and characterization of ADAMTS-20
defines a novel subfamily of metalloproteinases-disintegrins
with multiple thrombospondin-1 repeats
and a unique GON-domain. J. Biol. Chem. 278,
13382–13389 (2003).
59. Somerville, R. P. et al. Characterization of ADAMTS-9 and
ADAMTS-20 as a distinct ADAMTS subfamily related to
Caenorhabditis elegans GON-1. J. Biol. Chem. 278,
9503–9513 (2003).
60. Hooper, J. D., Clements, J. A., Quigley, J. P. & Antalis, T. M.
Type II transmembrane serine proteases: insights into an
emerging class of cell surface proteolytic enzymes. J. Biol.
Chem. 276, 857–860 (2001).
61. Velasco, G., Cal, S., Quesada, V., Sanchez, L. M. &
Lopez-Otin, C. Matriptase-2, a membrane-bound mosaic
serine proteinase predominantly expressed in human liver
and showing degrading activity against extracellular matrix
proteins. J. Biol. Chem. 277, 37637–37646 (2002).
62. Wex, T., Wex, H. & Bromme, D. The human cathepsin
F gene — a fusion product between an ancestral cathepsin
and cystatin gene. Biol. Chem. 380, 1439–1442 (1999).
63. Nagler, D. K., Sulea, T. & Menard, R. Full-length cDNA of
human cathepsin F predicts the presence of a cystatin
domain at the N-terminus of the cysteine protease zymogen.
Biochem. Biophys. Res. Commun. 257, 313–318 (1999).
64. McQuibban, G. A. et al. Inflammation dampened by
gelatinase A cleavage of monocyte chemoattractant protein-3. Science 289, 1202–1206 (2000).
65. Tam, E. M., Wu, Y. I., Butler, G. S., Stack, M. S. & Overall,
C. M. Collagen binding properties of the membrane type-1
matrix metalloproteinase (MT1–MMP) hemopexin C domain.
The ectodomain of the 44-kDa autocatalytic product of
MT1–MMP inhibits cell invasion by disrupting native type I
collagen cleavage. J. Biol. Chem. 277, 39005–39014 (2002).
66. Ehlers, M. R., Fox, E. A., Strydom, D. J. & Riordan, J. F.
Molecular cloning of human testicular angiotensin-converting
enzyme: the testis isozyme is identical to the C-terminal half
of endothelial angiotensin-converting enzyme. Proc. Natl
Acad. Sci. USA 86, 7741–7745 (1989).
67. Azuma, T., Liu, W. G., Vander Laan, D. J., Bowcock, A. M. &
Taggart, R. T. Human gastric cathepsin E gene. Multiple
transcripts result from alternative polyadenylation of the
primary transcripts of a single gene locus at 1q31–q32.
J. Biol. Chem. 267, 1609–1614 (1992).
68. Freije, J. M. et al. Molecular cloning and expression of
collagenase-3, a novel human matrix metalloproteinase
produced by breast carcinomas. J. Biol. Chem. 269,
16766–16773 (1994).
69. Heuze-Vourc’h, N., Leblond, V. & Courty, Y. Complex
alternative splicing of the hKLK3 gene coding for the tumor
marker PSA (prostate-specific-antigen). Eur. J. Biochem.
270, 706–714 (2003).
70. Rieder, M. J., Taylor, S. L., Clark, A. G. & Nickerson, D. A.
Sequence variation in the human angiotensin converting
enzyme. Nature Genet. 22, 59–62 (1999).
71. Williams, A. G. et al. The ACE gene and muscle
performance. Nature 403, 614 (2000).
72. Niu, T., Chen, X. & Xu, X. Angiotensin converting enzyme
gene insertion/deletion polymorphism and cardiovascular
disease: therapeutic implications. Drugs 62, 977–993 (2002).
73. Van Eerdewegh, P. et al. Association of the ADAM33 gene
with asthma and bronchial hyperresponsiveness. Nature
418, 426–430 (2002).
Together with reference 74, this paper illustrates the
increased susceptibility to common diseases that is
associated with genetic variation in some protease
genes.
74. Horikawa, Y. et al. Genetic variation in the gene encoding
calpain-10 is associated with type 2 diabetes mellitus. Nature
Genet. 26, 163–175 (2000).
75. Devlin, A. M. et al. Glutamate carboxypeptidase II: a
polymorphism associated with lower levels of serum folate
and hyperhomocysteinemia. Hum. Mol. Genet. 9,
2837–2844 (2000).
76. Yamada, Y. et al. Prediction of the risk of myocardial infarction
from polymorphisms in candidate genes. N. Engl. J. Med.
347, 1916–1923 (2002).
77. Murphy, G. et al. Matrix metalloproteinases in arthritic
disease. Arthritis Res. 4 (Suppl.) 39–49 (2002).
78. Yong, V. W., Power, C., Forsyth, P. & Edwards, D. R.
Metalloproteinases in biology and pathology of the nervous
system. Nature Rev. Neurosci. 2, 502–511 (2001).
79. Brinckerhoff, C. E. & Matrisian, L. M. Matrix
metalloproteinases: a tail of a frog that became a prince.
Nature Rev. Mol. Cell Biol. 3, 207–214 (2002).
80. Parks, W. C. & Shapiro, S. D. Matrix metalloproteinases in
lung biology. Respir. Res. 2, 10–19 (2001).
81. Lomas, D. A. & Carrell, R. W. Serpinopathies and the
conformational dementias. Nature Rev. Genet. 3, 759–768
(2002).
82. Carrell, R. W. & Lomas, D. A. α1-antitrypsin deficiency —
a model for conformational diseases. N. Engl. J. Med. 346,
45–53 (2002).
83. Hardy, J. & Selkoe, D. J. The amyloid hypothesis of
Alzheimer’s disease: progress and problems on the road to
therapeutics. Science 297, 353–356 (2002).
84. Bowen, D. J. Haemophilia A and haemophilia B: molecular
insights. Mol. Pathol. 55, 1–18 (2002).
85. Hauri, H. P., Kappeler, F., Andersson, H. & Appenzeller, C.
ERGIC-53 and traffic in the secretory pathway. J. Cell Sci.
113, 587–596 (2000).
86. Bignell, G. R. et al. Identification of the familial cylindromatosis
tumour-suppressor gene. Nature Genet. 25, 160–165
(2000).
87. Wang, J. et al. Inherited human caspase 10 mutations
underlie defective lymphocyte and dendritic cell apoptosis in
autoimmune lymphoproliferative syndrome type II. Cell 98,
47–58 (1999).
88. Boatright, K. M. et al. A unified model for apical caspase
activation. Mol. Cell 11, 529–541 (2003).
89. Huang, Y. & Wang, K. K. The calpain family and human
disease. Trends Mol. Med. 7, 355–362 (2001).
90. Levy, G. G. et al. Mutations in a member of the ADAMTS
gene family cause thrombotic thrombocytopenic purpura.
Nature 413, 488–494 (2001).
91. Guipponi, M. et al. The transmembrane serine protease
(TMPRSS3) mutated in deafness DFNB8/10 activates the
epithelial sodium channel (ENaC) in vitro. Hum. Mol. Genet.
11, 2829–2836 (2002).
92. Citron, M. et al. Mutant presenilins of Alzheimer’s disease
increase production of 42-residue amyloid β-protein in both
transfected cells and transgenic mice. Nature Med. 3, 67–72
(1997).
93. Gehring, N. H. et al. Increased efficiency of mRNA 3′ end
formation: a new genetic mechanism contributing to
hereditary thrombophilia. Nature Genet. 28, 389–392 (2001).
94. De Jonghe, C. et al. Aberrant splicing in the presenilin-1
intron 4 mutation causes presenile Alzheimer’s disease by
increased Aβ42 secretion. Hum. Mol. Genet. 8, 1529–1540
(1999).
95. Molinari, F. et al. Truncating neurotrypsin mutation in
autosomal recessive nonsyndromic mental retardation.
Science 298, 1779–1781 (2002).
96. Casari, G. et al. Spastic paraplegia and OXPHOS impairment
caused by mutations in paraplegin, a nuclear-encoded
mitochondrial metalloprotease. Cell 93, 973–983 (1998).
97. Chun, H. J. et al. Pleiotropic defects in lymphocyte activation
caused by caspase-8 mutations lead to human
immunodeficiency. Nature 419, 395–399 (2002).
98. Belaaouaj, A. et al. Mice lacking neutrophil elastase reveal
impaired host defense against Gram negative bacterial
sepsis. Nature Med. 4, 615–618 (1998).
99. Horwitz, M., Benson, K. F., Person, R. E., Aprikyan, A. G. &
Dale, D. C. Mutations in ELA2, encoding neutrophil elastase,
define a 21-day biological clock in cyclic haematopoiesis.
Nature Genet. 23, 433–436 (1999).
100. Pendás, A. M. et al. Defective prelamin A processing and
muscular and adipocyte alterations in Zmpste24
metalloproteinase-deficient mice. Nature Genet. 31, 94–99
(2002).
Together with references 101 and 102, this paper is an
example of the usefulness of mouse models and
genetic approaches to identify the in vivo substrates of
proteases.
101. Li, Q., Park, P. W., Wilson, C. L. & Parks, W. C. Matrilysin
shedding of syndecan-1 regulates chemokine mobilization
and transepithelial efflux of neutrophils in acute lung injury.
Cell 111, 635–646 (2002).
102. Wilson, C. L. et al. Regulation of intestinal α-defensin
activation by the metalloproteinase matrilysin in innate host
defense. Science 286, 113–117 (1999).
103. Ranger, A. M., Malynn, B. A. & Korsmeyer, S. J. Mouse
models of cell death. Nature Genet. 28, 113–118 (2001).
104. Rakic, J. M. et al. Role of plasminogen activator-plasmin
system in tumor angiogenesis. Cell. Mol. Life Sci. 60,
463–473 (2003).
105. Lund, L. R. et al. Functional overlap between two classes of
matrix-degrading proteases in wound healing. EMBO J. 18,
4645–4656 (1999).
106. Blasi, F. & Carmeliet, P. uPAR: a versatile signalling
orchestrator. Nature Rev. Mol. Cell Biol. 3, 932–943 (2002).
107. Holmbeck, K. et al. MT1–MMP-deficient mice develop
dwarfism, osteopenia, arthritis, and connective tissue disease
due to inadequate collagen turnover. Cell 99, 81–92 (1999).
108. Zhou, Z. et al. Impaired endochondral ossification and
angiogenesis in mice deficient in membrane-type matrix
metalloproteinase I. Proc. Natl Acad. Sci. USA 97,
4052–4057 (2000).
109. Caterina, J. J. et al. Enamelysin (matrix metalloproteinase
20)-deficient mice display an amelogenesis imperfecta
phenotype. J. Biol. Chem. 277, 49598–49604 (2002).
110. Coussens, L. M., Shapiro, S. D., Soloway, P. D. & Werb, Z.
Models for gain-of-function and loss-of-function of MMPs:
transgenic and gene targeted mice. Methods Mol. Biol. 151,
149–179 (2001).
111. Wilson, S. M. et al. Synaptic defects in ataxia mice result from
a mutation in Usp14, encoding a ubiquitin-specific protease.
Nature Genet. 32, 420–425 (2002).
An interesting example of a mouse disease that is
caused by a mutation in a protease gene, the human
orthologue of which has not yet been linked to an
equivalent disorder.
112. Neuhold, L. A. et al. Postnatal expression in hyaline cartilage
of constitutively active human collagenase-3 (MMP-13)
induces osteoarthritis in mice. J. Clin. Invest. 107, 35–44
(2001).
113. Yu, Y. & Bradley, A. Engineering chromosomal
rearrangements in mice. Nature Rev. Genet. 2, 780–790
(2001).
114. Stanford, W. L., Cohn, J. B. & Cordes, S. P. Gene-trap
mutagenesis: past, present and beyond. Nature Rev. Genet.
2, 756–768 (2001).
115. Southan, C. A genomic perspective on human proteases as
drug targets. Drug Discov. Today 6, 681–688 (2001).
A discussion of the relevance of proteases as
therapeutic targets.
116. Overall, C. M. & López-Otín, C. Strategies for MMP inhibition
in cancer: innovations for the post-trial era. Nature Rev.
Cancer 2, 657–672 (2002).
117. Soto, C. Protein misfolding and disease; protein refolding and
therapy. FEBS Lett. 498, 204–207 (2001).
118. Crowther, D. C. Familial conformational diseases and
dementias. Hum. Mutat. 20, 1–14 (2002).
119. Cushman, D. W. & Ondetti, M. A. Design of angiotensin
converting enzyme inhibitors. Nature Med. 5, 1110–1113
(1999).
Together with reference 120, this article represents an
example of the successful introduction of protease
inhibitors to treat human disease.
120. Menendez-Arias, L. Targeting HIV: antiretroviral therapy and
development of drug resistance. Trends Pharmacol. Sci. 23,
381–388 (2002).
121. Coussens, L. M., Fingleton, B. & Matrisian, L. M. Matrix
metalloproteinase inhibitors and cancer: trials and
tribulations. Science 295, 2387–2392 (2002).
An excellent analysis of the lack of success of most
MMP inhibitors developed for treating cancer and
discussion of alternatives for future improvement in
this field.
122. Gomis-Ruth, F. X. et al. Mechanism of inhibition of the human
matrix metalloproteinase stromelysin-1 by TIMP-1. Nature
389, 77–81 (1997).
123. Bode, W. & Huber, R. Structural basis of the endoproteinase–protein inhibitor interaction. Biochim. Biophys. Acta 1477,
241–252 (2000).
124. Vendrell, J., Querol, E. & Aviles, F. X.
Metallocarboxypeptidases and their protein inhibitors:
structure, function and biomedical properties. Biochim.
Biophys. Acta 1477, 284–298 (2000).
125. Morgunova, E., Tuuttila, A., Bergmann, U. & Tryggvason, K.
Structural insight into the complex formation of latent matrix
metalloproteinase 2 with tissue inhibitor of metalloproteinase
2. Proc. Natl Acad. Sci. USA 99, 7414–7419 (2002).
126. Turk, V., Turk, B. & Turk, D. Lysosomal cysteine proteases:
facts and opportunities. EMBO J. 20, 4629–4633 (2001).
127. Anel, R. L. & Kumar, A. Experimental and emerging therapies
for sepsis and septic shock. Expert Opin. Investig. Drugs 10,
1471–1485 (2001).
128. Desnick, R. J. & Schuchman, E. H. Enzyme
replacement and enhancement therapies: lessons from
lysosomal disorders. Nature Rev. Genet. 3, 954–966 (2002).
A comprehensive review that discusses the successes
and shortcomings of present strategies to treat
inherited metabolic disorders.
129. Roth, D. A. et al. Human recombinant factor IX: safety and
efficacy studies in hemophilia B patients previously treated
with plasma-derived factor IX concentrates. Blood 98,
3600–3606 (2001).
130. Selkoe, D. J. Deciphering the genesis and fate of
amyloid β-protein yields novel therapies for Alzheimer
disease. J. Clin. Invest. 110, 1375–1381 (2002).
131. Kay, M. A. et al. Evidence for gene transfer and expression of
factor IX in haemophilia B patients treated with an AAV
vector. Nature Genet. 24, 257–261 (2000).
132. Olson, M. V. & Varki, A. Sequencing the chimpanzee
genome: insights into human evolution and disease. Nature
Rev. Genet. 4, 20–28 (2003).
An excellent analysis of the relevance of comparative
genomics and discussion of the argument that gene
loss might be an important mechanism of rapid
evolutionary change.
133. Kheradmand, F. & Werb, Z. Shedding light on sheddases:
role in growth and development. Bioessays 24, 8–12
(2002).
Together with reference 134, this review describes
the functional relevance of the protease-mediated
process of ectodomain shedding of membrane
proteins.
134. Arribas, J. & Borroto, A. Protein ectodomain shedding.
Chem. Rev. 102, 4627–4638 (2002).
135. Rudner, D. Z., Fawcett, P. & Losick, R. A family of
membrane-embedded metalloproteases involved in
regulated proteolysis of membrane-associated
transcription factors. Proc. Natl Acad. Sci. USA 96,
14765–14770 (1999).
136. Hoppe, T., Rape, M. & Jentsch, S. Membrane-bound
transcription factors: regulated release by RIP or RUP. Curr.
Opin. Cell Biol. 13, 344–348 (2001).
137. Brown, M. S., Ye, J., Rawson, R. B. & Goldstein, J. L.
Regulated intramembrane proteolysis: a control mechanism
conserved from bacteria to humans. Cell 100, 391–398
(2000).
An excellent analysis of the fascinating process that
involves the participation of proteases that hydrolyze
their substrates in the hydrophobic environment of
lipid bilayers.
138. Hopkins, A. L. & Groom, C. R. The druggable
genome. Nature Rev. Drug Discov. 1, 727–730
(2002).
139. McLysaght, A., Hokamp, K. & Wolfe, K. H. Extensive
genomic duplication during early chordate evolution. Nature
Genet. 31, 200–204 (2002).
140. Samonte, R. V. & Eichler, E. E. Segmental duplications and
the evolution of the primate genome. Nature Rev. Genet. 3,
65–72 (2002).
141. Ross, J., Jiang, H., Kanost, M. R. & Wang, Y. Serine
proteases and their homologs in the Drosophila
melanogaster genome: an initial analysis of sequence
conservation and phylogenetic relationships. Gene 304,
117–131 (2003).
142. Lespinet, O., Wolf, Y. I., Koonin, E. V. & Aravind, L.
The role of lineage-specific gene family expansion in the
evolution of eukaryotes. Genome Res. 12, 1048–1059
(2002).
143. Nardi, J. B., Martos, R., Walden, K. K., Lampe, D. J. &
Robertson, H. M. Expression of lacunin, a large multidomain
extracellular matrix protein, accompanies morphogenesis of
epithelial monolayers in Manduca sexta. Insect Biochem.
Mol. Biol. 29, 883–897 (1999).
Acknowledgments
We thank all members of our laboratories for their comments on
the manuscript and apologize for the omission of relevant works
owing to space constraints. Our work is supported by grants from
the Ministerio de Ciencia y Tecnología-Spain, the Gobierno del
Principado de Asturias, Fundación ‘La Caixa’ and the European
Union. C.M.O. is supported by a Canada Research Chair in
Metalloproteinase Biology. The Instituto Universitario de Oncología
is supported by Obra Social Cajastur-Asturias, Spain.
Online links
DATABASES
The following terms in this article are linked online to:
LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink
ACE | ADAM33 | ADAMTS13 | APP | C1r | C1s | CAPN3 | CAPN10
| CASP10 | CYLD | ELA1 | Ela2 | KLK3 | LMAN1 | MMP1 | PSEN1 |
PSEN2 | Ren2 | Uchl4 | Usp14 | USP17
OMIM: http://www.ncbi.nlm.nih.gov/Omim
Alzheimer disease | asthma | congenital neutropenia | cyclic
haematopoiesis | familial cylindromatosis | haemophilia A |
haemophilia B Leyden | Huntington disease |
hyperhomocysteinemia | hyperprothrombinemia | limb-girdle
muscular dystrophy type 2A | multiple myeloma | thrombotic
thrombocytopenic purpura | type II autoimmune
lymphoproliferative syndrome | type-2 diabetes
FURTHER INFORMATION
Celera Discovery System:
http://www.celeradiscoverysystem.com
Chris Overall’s Laboratory: http://www.clip.ubc.ca
Ensembl: http://www.ensembl.org
Interpro: http://www.ebi.ac.uk/interpro
Lopez-Otin’s Laboratory: http://web.uniovi.es/degradome
MEROPS: http://merops.sanger.ac.uk
NCBI: http://www.ncbi.nlm.nih.gov
Pfam: http://www.sanger.ac.uk/Software/Pfam
Protpars: http://evolution.genetics.washington.edu/phylip.html
SMART: http://smart.ox.ac.uk
Supplementary Tables S1–S5
Human and mouse proteases are divided into five classes, which are subdivided into families according to the MEROPS database criteria (Tables S1–S5).
We have provided the MEROPS code for every enzyme for which one is available. In some conflicting cases, different codes had previously been assigned to human and
mouse protease genes that this work shows to be true orthologues; in these cases, the human code is proposed for both orthologues. Genes encoding protease-like
proteins with changes in residues crucial for proteolytic activity are indicated by ‘np’ (non-protease homologue) after the code.
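The code format described above can be made concrete with a small parser. This is a minimal sketch under stated assumptions: the pattern is inferred only from the codes that appear in Tables S1 and S2 (for example A01.001, A02.xxx, Ax1.xxxnp and C01.973np); the letters M, S and T for the metallo, serine and threonine tables are assumed to follow MEROPS usage; and `parse_code` and `ProteaseCode` are hypothetical names, not part of any MEROPS tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed code grammar, inferred from the codes in Tables S1 and S2:
#   class letter + family (two digits, or "x" + digits for a new family)
#   + "." + three-digit identifier (or "xxx" if none is assigned yet)
#   + optional "np" suffix marking a non-protease homologue.
CODE_RE = re.compile(r"^([ACMST])(\d{2}|x\d+)\.(\d{3}|xxx)(np)?$")

# The five catalytic classes of Tables S1-S5; letters assumed to follow MEROPS.
CLASSES = {"A": "aspartic", "C": "cysteine", "M": "metallo",
           "S": "serine", "T": "threonine"}


@dataclass(frozen=True)
class ProteaseCode:
    catalytic_class: str
    family: str                # e.g. "A01" or "Ax1"
    identifier: Optional[str]  # None when the code ends in "xxx"
    non_protease: bool         # True when the code carries the "np" suffix


def parse_code(code: str) -> ProteaseCode:
    """Split a table code such as 'C01.973np' into its parts."""
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognized code: {code!r}")
    letter, family, ident, np_suffix = match.groups()
    return ProteaseCode(
        catalytic_class=CLASSES[letter],
        family=letter + family,
        identifier=None if ident == "xxx" else ident,
        non_protease=np_suffix is not None,
    )


if __name__ == "__main__":
    for code in ["A01.001", "A02.xxx", "Ax1.xxxnp", "C01.973np"]:
        print(code, parse_code(code))
```

Keeping the ‘np’ flag separate from the family makes it easy to list non-protease homologues alongside their parent family rather than in a family of their own.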
The LocusLink or nucleotide accession number is provided for each protease. The information for human enzymes is labelled in green and for mouse enzymes in
yellow. Genes that are absent from human or mouse are labelled in red. Genes that have been inactivated by mutation in one species, but are functional in the
other, are labelled in pink. Although these species-specific pseudogenes have been included in the tables to emphasize the human–mouse differences, they have not
been incorporated into the final counts of protease genes. Genes that have been verified experimentally, but whose sequence is missing from the
available genome sequences, are indicated in red and in parentheses. ‘y’ indicates that the corresponding human and mouse genes are syntenic. The
percentage identity between orthologous proteases is also shown.
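The tables do not state how percentage identity was calculated. The sketch below shows one common convention, assuming that the two orthologous protein sequences have already been aligned (gaps written as '-') and counting identical residues over the columns in which both sequences have a residue. Other conventions divide by the alignment length or by the length of the shorter sequence and give different values. `percent_identity` is a hypothetical helper, and the toy sequences are invented for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues between two pre-aligned sequences.

    Assumption: the denominator is the number of alignment columns in which
    both sequences have a residue; columns containing a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    paired = [(x, y) for x, y in zip(aligned_a, aligned_b)
              if x != gap and y != gap]
    if not paired:
        raise ValueError("no aligned residue pairs")
    identical = sum(1 for x, y in paired if x.upper() == y.upper())
    return 100.0 * identical / len(paired)


if __name__ == "__main__":
    # Toy, pre-aligned sequences (invented, not real orthologues).
    print(round(percent_identity("MKT-LLV", "MKSALLV"), 1))  # prints 83.3
```

Here six columns are gap-free and five of them match, giving 83.3%; dividing by the full alignment length of seven would instead give 71.4%, which is why the convention should be stated alongside any reported value.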
Table S1 | Aspartic proteases
Code
A01.001
A01.003
A01.004
A01.041
A01.006
A01.007
A01.009
A01.010
A01.046
A01.008
Peptidase
pepsin A
pepsin C
β-secretase 1
β-secretase 2
chymosin
renin
cathepsin D
cathepsin E
napsin A
submandibular renin
Human Gene
PGA3/4/5
PGC
BACE
BACE2
#CYMP
REN
CTSD
CTSE
NAP1
LocusLink
5219
5225
23621
25825
1542
5972
1509
1510
9476
Locus
11q12
6p12
11q23
21q22
1p13
1q31
11p15
(1q31)
19q13
Mouse Gene
Pepf
Pgc
Bace
Bace2
Cymp
Ren1
Ctsd
Ctse
Kdap
Ren2
LocusLink
58803
109820
23821
56175
229697
19701
13033
13034
16541
19702
Locus
19B
17C
9B
16C4
3F3
1E4
7F5
1E4
7B2
1E4
Syntenic
y
y
y
y
y
y
y
y
y
Identity
55
73
96
88
A02.059
A02.xxx
A02.xxx
A02.xxx
A02.xxx
DDI-related protease
DNA-damage inducible protein
DNA-damage inducible protein 2
Nuclear recept. interacting prot. 2
Nuclear recept. interacting prot. 3
DDI-RP
DDI1
DDI2
NRIP2
NRIP3
151516
AK093336
BN000122
83714
56675
2p13
11q22
1p36
12p13
11p15
Ddi-rp
Ddi1
Ddi2
Nrip2
Nrip3
67855
71829
BC021415
60345
78593
6D2
9A1
4E1
6F3
7E3
y
y
y
y
y
88
81
96
83
93
69
81
82
71
Code | Peptidase | Human gene | LocusLink | Locus | Mouse gene | LocusLink | Locus | Syntenic | Identity (%)
A22.001 | presenilin 1 | PSEN1 | 5663 | 14q24 | Psen1 | 19164 | 12D3 | y | 92
A22.002 | presenilin 2 | PSEN2 | 5664 | 1q42 | Psen2 | 19165 | 1H4 | y | 95
A22.005 | presenilin homologue 1/SPPL3 | PSH1 | 121665 | 12q24 | Psh1 | 83678 | 5F | y | 96
A22.006 | presenilin homologue 2 | PSH2 | 162540 | 17q21 | Psh2 | 237958 | 11D | y | 70
A22.003 | presenilin homologue 3/SPP | PSH3 | 81502 | 20q11 | Psh3 | 14950 | 2H2 | y | 96
A22.004 | presenilin homologue 4/SPPL2B | PSH4 | 56928 | 19p13 | Psh4 | 73218 | 10C1 | y | 83
A22.007 | presenilin homologue 5 | PSH5 | 84888 | 15q21 | Psh5 | 66552 | 2F2 | y | 83
Ax1.xxx | GCDFP15 | PIP | 5304 | 7q34 | Pip | 18716 | 6B2 | y | 47
Ax1.xxxnp | seminal vesicle antigen | – | – | – | Sva | 20939 | 6B2 | – | –
Ax1.xxxnp | seminal vesicle antigen-like 1 | – | – | – | Sval1 | 71578 | 6B2 | – | –
Ax1.xxxnp | seminal vesicle antigen-like 2 | – | – | – | Sval2 | 84543 | 6B2 | – | –
Ax1.xxxnp | seminal vesicle antigen-like 3 | – | – | – | Sval3 | 232737 | 6B2 | – | –
Aspartic proteases are divided into four families: A01, A02, A22 and Ax1. There are several pepsinogen A isozymogens encoded by highly related genes (>95%
identity) that form part of a cluster located at 11q12. The individual pepsinogen A isozymogens result from haplotypes that contain different numbers of
genes (ranging from 1 to 4)1,2. In agreement with other databases, this region has been annotated as a single gene in human. According to the criteria
discussed above, we have assigned mouse pepsinogen F as the orthologue of human pepsinogen A, despite notable divergence in their structure and
regulation3. Ren2 is absent from some strains of laboratory mice. The gene that encodes prochymosin has been inactivated by mutations and frameshifts in
the human genome and is classified as a pseudogene, although it is functional in mouse and other species4.
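The counting rules in these notes (species-specific pseudogenes listed but not counted, and the pepsinogen A cluster annotated as a single gene) can be illustrated on the A01 rows of Table S1. This sketch assumes that a leading '#' on a gene symbol marks a gene inactivated in that species, as for human CYMP; `count_functional` is a hypothetical helper, and the counts it prints are derived only from the symbols listed in the table.

```python
from typing import Iterable

# Gene symbols copied from the A01 rows of Table S1.
# Assumption: a leading "#" marks a gene inactivated in that species.
HUMAN_A01 = ["PGA3/4/5", "PGC", "BACE", "BACE2", "#CYMP",
             "REN", "CTSD", "CTSE", "NAP1"]
MOUSE_A01 = ["Pepf", "Pgc", "Bace", "Bace2", "Cymp",
             "Ren1", "Ctsd", "Ctse", "Kdap", "Ren2"]


def count_functional(symbols: Iterable[str]) -> int:
    """Count protease genes, skipping '#'-marked pseudogenes.

    Each symbol is one annotated locus, so the PGA3/4/5 cluster counts
    once, mirroring its annotation as a single gene.
    """
    return sum(1 for s in symbols if not s.startswith("#"))


if __name__ == "__main__":
    human = count_functional(HUMAN_A01)
    mouse = count_functional(MOUSE_A01)
    print(f"A01 family: human {human}, mouse {mouse}")
```

This sketch counts Ren2 as a mouse gene even though it is absent from some laboratory strains; a strain-specific count would need an additional filter.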
The genes DDI1, DDI2, DDI-RP, NRIP2 and NRIP3 are included in family A02, which contains predicted retroviral-like aspartic proteases5. All of these have
mouse orthologues at syntenic regions and are not embedded in endogenous retroviral elements. The human and mouse genomes also contain several
aspartic protease-related sequences derived from endogenous retroviruses, but we have not annotated these as human or mouse proteases. Notably,
most of the retroviruses embedded in both genomes have acquired inactivating mutations, which also affect the putative proteases that are
encoded by these viral elements. However, HERV-K113, for example, which is located at 19p13 in ~30% of the human population, has intact open reading
frames for all viral proteins, including the corresponding aspartic protease, and remains capable of reinfecting humans today6. The catalogue of aspartic
proteases also includes a new family that is derived from prolactin-inducible protein/gross cystic disease fluid protein-15 (PIP/GCDFP15), which
has recently been characterized as a protease belonging to this class of enzymes7. The four PIP-related proteins lack residues proposed to be essential for
PIP proteolytic activity and have been classified as non-protease homologues.
Table S2 | Cysteine proteases
Code | Peptidase | Human gene | LocusLink | Locus | Mouse gene | LocusLink | Locus | Syntenic | Identity (%)
C01.060 | cathepsin B | CTSB | 1508 | 8p23 | Ctsb | 13030 | 14C3 | y | 78
C01.070 | cathepsin C | CTSC | 1075 | 11q14 | Ctsc | 13032 | 7E1 | y | 77
C01.018 | cathepsin F | CTSF | 8722 | 11q13 | Ctsf | 56464 | 19A | y | 78
C01.040 | cathepsin H | CTSH | 1512 | 15q24 | Ctsh | 13036 | 9E3 | y | 83
C01.036 | cathepsin K | CTSK | 1513 | 1q21 | Ctsk | 13038 | 3F2 | y | 86
C01.032 | cathepsin L | CTSL | 1514 | 9q21 | Ctsl | 13039 | 13B3 | y | 75
C01.009 | cathepsin L2 | CTSL2 | 1515 | 9q22 | – | – | – | – | –
C01.034 | cathepsin S | CTSS | 1520 | 1q21 | Ctss | 13040 | 3F2 | y | 73
C01.037 | cathepsin W | CTSW | 1521 | 11q13 | Ctsw | 13041 | 19A | y | 68
C01.013 | cathepsin Z | CTSZ | 1522 | 20q13 | Ctsz | 64138 | 2H4 | y | 83
C01.038 | cathepsin J | – | – | – | Ctsj | 26898 | 13B3 | – | –
C01.023 | cathepsin M | – | – | – | Ctsm | 64139 | 13B3 | – | –
C01.051 | cathepsin Q | – | – | – | Ctsq | 104002 | 13B3 | – | –
C01.042 | cathepsin R | – | – | – | Ctsr | 56835 | 13B3 | – | –
C01.016 | cathepsin-1 | – | – | – | Cts1 | 116909 | 13B3 | – | –
C01.031 | cathepsin-2 | – | – | – | Cts2 | 56094 | 13B3 | – | –
C01.053 | cathepsin-3 | – | – | – | Cts3 | 117066 | 13B3 | – | –
C01.045 | cathepsin-6 | – | – | – | Cts6 | 58518 | 13B3 | – | –
C01.973np | tubulointerstitial nephritis antigen | TINAG | 27283 | 6p12 | Tinag | 26944 | 9E1 | y | 85
C01.975np | TINAG related protein | LCN7 | 64129 | 1p35 | Lcn7 | 94242 | 4D3 | y | 90
C01.972np | testin | – | – | – | Cmb22/23 | 214639 | 13B3 | – | –
C01.xxxnp | testin-2 | – | – | – | Cmb24 | 70202 | 13B3 | – | –
C01.xxx | testin-3 | – | – | – | Cmb25 | BY736040 | 13B3 | – | –
C01.084 | bleomycin hydrolase | BLMH | 642 | 17q11 | Blmh | 104184 | 11B5 | y | 93
C02.001
calpain 1
CAPN1
823
11q13
Capn1
12333
19A
y
89
C02.002
C02.004
C02.011
C02.971np
C02.008
C02.007
C02.006
C02.018
C02.013
C02.017
C02.020
C02.xxx
C02.010
calpain 2
calpain 3
calpain 5
calpain 6
calpain 7
calpain 8
calpain 9
calpain 10
calpain 11
calpain 12
calpain 13
calpain 14
calpain 15/Sol protein
CAPN2
CAPN3
CAPN5
CAPN6
CAPN7
CAPN8
CAPN9
CAPN10
CAPN11
CAPN12
CAPN13
CAPN14
SOLH
824
825
726
827
23473
AA043093
10753
11132
11131
147968
92291
114773
6650
1q42
15q15
11q13
Xq23
3p25
(1q42)
1q42
2q37
6p21
19q13
2p23
2p23
16p13
Capn2
Capn3
Capn5
Capn6
Capn7
Capn8
Capn9
Capn10
Capn11
Capn12
Capn13
12334
12335
12337
12338
12339
170725
73647
23830
103998
60594
240159
1H4
2F1
7F1
XF2
14B
1H4
8E2
1D
17C
7A3
17E2
y
y
y
y
y
y
y
y
y
y
y
93
93
92
95
95
72
85
81
83
87
62
Solh
50817
17B1
y
89
C12.001
C12.003
C12.004
C12.005
C12.007
C12.xxx
ubiquitin C-terminal hydrolase 1
ubiquitin C-terminal hydrolase 3
ubiquitin C-term. hydrolase BAP1
ubiquitin C-terminal hydrolase 5
ubiquitin C-terminal hydrolase 4
cylindromatosis protein
UCHL1
UCHL3
BAP1
UCHL5
7345
7347
8314
51377
4p14
13q22
3p21
1q31
16q12
5D
14E2
14B
1F
9D
8C4
94
98
93
96
1540
22223
50933
104416
56207
93841
74256
y
y
y
y
CYLD1
Uchl1
Uchl3
Bap1
Uchl5
Uchl4
Cyld1
y
95
C13.004
C13.xxx
C13.005
legumain
legumain-2
hGPI8
LGMN
LGMN2
PIGK
5641
122199
10026
14q32
13q21
1p31
Lgmn
19141
12F1
y
82
Pigk
66613
3H4
y
94
C14.001
C14.006
C14.003
C14.007
C14.008
C14.005
caspase-1
caspase-2
caspase-3
caspase-4/11
caspase-5
caspase-6
CASP1
CASP2
CASP3
CASP4
CASP5
CASP6
834
835
836
837
838
839
11q22
7q34
4q35
11q22
11q22
4q25
Casp1
Casp2
Casp3
Casp11
12362
12366
12367
12363
9A1
6B2
8B2
9A1
y
y
y
y
62
89
87
60
Casp6
12368
3H1
y
90
C14.004
C14.009
C14.010
C14.011
C14.013
C14.018
C14.026
caspase-7
caspase-8
caspase-9
caspase-10
caspase-12
caspase-14
paracaspase
CASP7
CASP8
CASP9
CASP10
#CASP12
CASP14
MALT1
840
841
842
843
120329
23581
10892
10q25
2q33
1p36
2q33
11q22
19p13
18q21
Casp7
Casp8
Casp9
12369
12370
12371
19D2
1C2
4E1
y
y
y
82
62
72
Casp12
Casp14
Malt1
12364
12365
240354
9A1
10C1
18E1
y
y
y
74
90
C14.020np
C14.971np
C14.975np
homologue ICEY
casper
caspase-14-like
ICEYH
CFLAR
CASP14L
120332
8837
197350
11q22
2q33
16p13
Cflar
Casp14L
12633
1C2
17A3
y
y
68
78
C15.010
C15.011
pyroglutamyl peptidase I
pyroglutamyl-peptidase II
PGPEP1
PGPEP2
65074
145814
19p13
15q26
Pgpi
Pgpep2
66522
78444
8C1
7C
y
y
95
71
C19.019
C19.013
C19.026
C19.010
C19.001
C19.009
C19.016
C19.011
C19.017
C19.028
C19.018
C19.014
C19.020
C19.012
C19.015
C19.022
USP1
USP2
USP3
USP4
USP5
USP6
USP7
USP8
USP9X
USP9Y
USP10
USP11
USP12
USP13
USP14
USP15
USP1
USP2
USP3
USP4
USP5
USP6
USP7
USP8
USP9X
USP9Y
USP10
USP11
USP12
USP13
USP14
USP15
7398
9099
9960
7375
8078
9098
7874
9101
8239
8287
9100
8237
9959
8975
9097
9958
1p31
11q23
15q22
3p21
12p13
17p13
16p13
15q21
Xp11
Yq11
16q24
Xp11
13q12
3q26
18p11
12q14
Usp1
Usp2
Usp3
Usp4
Usp5
230484
53376
235441
22258
22225
4C6
9B
9D
9F2
6F2
y
y
y
y
y
88
95
98
90
98
Usp7
Usp8
Usp9x
Usp9y
Uchrp
Usp11
Ubh1
108732
84092
22284
107868
22224
236733
22217
16A3
2F2
XA1
(Y)
8E1
XA2
5G2
y
y
y
y
y
y
99
82
98
82
83
85
98
Usp14
Usp15
59025
14479
18A1
10D3
y
y
96
94
C19.021
C19.023
C19.xxx
C19.030
C19.024
C19.025
C19.034
C19.035
C19.047
C19.041
C19.046
C19.075
C19.054
C19.040
C19.060
C19.071
C19.044
C19.037
C19.067
C19.059
C19.042
C19.053
C19.056
C19.972np
C19.069
C19.xxx
C19.048
C19.xxx
C19.057
C19.975
C19.052
USP16
USP17
USP17-like
USP18
USP19
USP20
USP21
USP22
USP24
USP25
USP26
USP27
USP28
USP29
USP30
USP31
NY-REN-60
VDU1
USP34
USP35
USP36
USP37
HP43.8KD
SAD1
USP40
USP41
USP42
USP43
USP44
USP45
USP46
USP16
USP17
USP17L
USP18
USP19
USP20
USP21
USP22
USP24
USP25
USP26
USP27
USP28
USP29
USP30
USP31
USP32
USP33
USP34
USP35
USP36
USP37
USP38
USP39
USP40
USP41
USP42
USP43
USP44
USP45
USP46
10600
23661
BN000116
11274
10869
10868
27005
23326
23358
29761
83844
AW851065
57646
57663
84749
57478
84669
23032
9736
57558
57602
57695
84640
10713
55230
150200
84132
124739
84101
85015
64854
21q21
4p16
8p23
22q11
3p11
9q34
1q22
17p11
1p32
21q11
Xq26
Xp11
11q23
19q13
12q23
16p12
17q23
1p31
2p15
11q13
17q25
2q35
4q31
2p11
2q37
22q11
7p22
17p12
12q21
6q16
4q12
Usp16
74112
16C3
y
82
Usp18
Usp19
Usp20
Usp21
Usp22
Usp24
Usp25
Usp26
Usp27
Usp28
Usp29
Usp30
Usp31
Usp32
Usp33
24110
71472
74270
30941
216825
72686
30940
83563
54651
235323
57775
100756
209833
237899
170822
6F2
9F2
2B
1H2
11B4
4C7
16C3
XA3
XA1
9B
7A1
5F
7F2
11B5
3H4
y
y
y
y
y
y
y
y
y
y
y
y
y
y
y
70
79
94
96
93
97
95
36
97
98
45
90
90
94
92
Usp35
Usp36
244144
72344
7E3
12F2
y
y
82
74
Usp38
Usp39
Usp40
74841
28035
227334
8C3
6C3
1C5
y
y
y
72
87
81
Usp42
Usp43
Usp44
Usp45
Usp46
76800
216835
214955
77593
100664
5G2
11B3
10C2
4A3
5C3
y
y
y
y
y
81
76
87
79
99
C19.055
C19.068
C19.073np
C19.058np
C19.065
C19.xxxnp
C19.031
C19.032
C19.xxx
C19.xxx
C19.xxx
C19.xxx
USP47
USP48
USP49
USP50
USP51
USP52
DUB-1
DUB-2
DUB2a
DUB2a-like
DUB2a-like2
DUB6
USP47
USP48
USP49
USP50
USP51
USP52
55031
84196
25862
AI990110
BF741256
9924
11p15
1p36
6p21
15q21
Xp11
12q13
Usp47
Usp48
Usp49
Usp50
320745
170707
224836
75083
7F2
4D3
17C
2F2
y
y
y
y
94
95
80
75
Usp52
Dub1
Dub2
Dub3
Dub4
Dub5
Dub6
103135
13531
13532
AF393638
AF393637
BAC40791
BN000117
10D3
7F2
7F1
7F1
7F1
7F1
7F1
y
97
C26.001
γ-glutamyl hydrolase
GGH
8836
8q12
Ggh
14590
4A3
y
69
C44.001
Gln-PRPP amidotransferase
PPAT
5471
4q12
Ppat
231327
5E1
y
93
C44.971np
C44.972np
C44.973np
Gln-fructose-6-P transamidase 1
Gln-fructose-6-P transamidase 2
Gln-fructose-6-P transamidase 3
GFPT1
GFPT2
GFPT3
2673
9945
203431
2p13
5q35
Xq21
Gfpt1
Gfpt2
#Gfpt3
14583
14584
6D2
11B1
XC3
y
y
y
99
98
C46.002
C46.003
C46.004
sonic hedgehog protein
indian hedgehog protein
desert hedgehog protein
SHH
IHH
DHH
6469
3549
50846
7q36
2q35
12q13
Shh
Ihh
Dhh
20423
16147
13363
5A3
1C3
15F2
y
y
y
92
95
97
C48.002
C48.007
C48.003
C48.008
C48.004
C48.009
sentrin/SUMO protease 1
sentrin/SUMO protease 2
sentrin/SUMO protease 3
sentrin/SUMO protease 5
sentrin/SUMO protease 6
sentrin/SUMO protease 7
SENP1
SENP2
SENP3
SENP5
SENP6
SENP7
29843
59343
26168
205564
26054
57337
12q13
3q27
17p13
3q29
6q14
3q12
Senp1
Senp2
Senp3
Senp5
Senp6
Senp7
223870
75826
80886
AK043171
215351
72869
15F2
16B1
11B4
16B2
9E2
16B1
y
y
y
y
y
y
88
71
95
71
81
87
C48.011
C48.016
C48.013
C48.017
C48.015
C48.xxx
C48.xxx
sentrin/SUMO protease 8
sentrin/SUMO protease 9
sentrin/SUMO protease 11
sentrin/SUMO protease 12
sentrin/SUMO protease 13
sentrin/SUMO protease 14
sentrin/SUMO protease 15
SENP8
123228
15q23
Senp8
Senp9
Senp11
Senp12
Senp13
Senp14
Senp15
71599
236870
216394
208231
114671
278823
278824
9C
XA7
10D3
16B5
10A3
1B
1B
y
92
C50.001
separase
ESPL1
9700
12q13
Espl1
105988
15F3
y
78
C54.003
C54.002
C54.004
C54.005
autophagin-1
autophagin-2
autophagin-3
autophagin-4
AUTL1
AUTL2
AUTL3
AUTL4
23192
115201
84938
84971
2q37
Xq23
1p31
19p13
Autl1
Autl2
Autl3
Autl4
66615
102926
242557
235040
1D
XF1
4C6
9A3
y
y
y
y
92
89
86
86
C56.002
DJ-1
DJ-1
11315
1p36
Dj-1
57320
4E1
y
90
Cx1.xxx
Cx1.xxx
Cx1.xxx
Cx1.xxx
Cx1.xxx
Cx1.xxx
Cx1.xxx
Cx1.xxx
Cx1.xxx
Cx1.xxx
Cx1.xxx
Cx1.xxxnp
Cx1.xxx
Cx1.xxx
Hin-1
Hin-1-like
Hin-2
Hin-3
Hin-4
Hin-5
Hin-6
Hin-7
Otubain-1
Otubain-2
TNFα-induced protein 3/A20
TRAF-binding protein domain
Cezanne
Cezanne-2
HSHIN1
HSHIN1L
HSHIN2
HSHIN3
HSHIN4
HSHIN5
HSHIN6
HSHIN7
OTUB1
OTUB2
TNFAIP3
TRABID
CEZANNE
LOC161725
54726
BN000160
79868
254897
220213
55593
139562
BI829009
55611
78990
7128
54764
56957
161725
4q31
12p13
Xq23
1p36
10p12
Xp11
Xq13
1q32
11q13
14q32
6q23
10q26
1q21
15q13
Hshin1
234484
8C3
y
91
Hshin2
Hshin3
Hshin4
Hshin5
Hshin6
Hshin7
Otub1
Otub2
Tnfaip3
Trabid
Cezanne
AJ430384
245656
73162
71198
54644
236924
226418
107260
68149
21929
BN000126
AAH37040
170711
XF2
4D3
2A2
XA1
XC2
1E4
19A
12F1
10A2
7F4
3F2
7C
y
y
y
y
y
y
y
y
y
y
y
y
65
73
90
91
64
92
99
95
90
99
93
95
Cx1.xxx
Cx1.xxxnp
CGI-77
CGI-77b
CGI77
51633
8q21
Cgi77
Cgi77b
72201
236778
4A2
XA3
y
87
Cx2.xxxnp
HetF-like
HETFL
23331
22q12
Hetfl
209683
5F
y
85
The cysteine proteases belong to 16 different families. They include proteins such as the hedgehog family members, whose protease function is used only for the autolytic processing of their respective precursors8. The C01 family is greatly expanded in the mouse owing to the presence of placental cathepsins and testins. We have annotated two further mouse testins, including testin-3, which is the first member of this subfamily predicted to be a functional protease. There are two functional human cathepsin L-like genes (CTSL and CTSL2) at 9q21 but only a single gene in the mouse, which is more closely related to CTSL2. The cylindromatosis protein contains a ubiquitin C-terminal hydrolase domain and has therefore been included in the C12 family. The genes for calpain 14, caspase 5 and caspase 10 are absent in mice. The human gene for caspase 12 has been inactivated and is therefore classified as a pseudogene. We have also annotated a second human legumain-like gene that is absent in mouse.
The C19 family of ubiquitin-specific proteases (USPs) is large and complex. We have annotated 21 human members (USP30, USP31 and USP34–USP52) and assigned their corresponding mouse orthologues. We have not found mouse orthologues for human USP6, -13, -34, -37, -42 and -51. USP17 is located within the RS447 human megasatellite at 4p159. This region is highly polymorphic in the human genome. It contains a variable number of USP17-related, intronless, tandemly repeated sequences (>95% identical), which have probably been generated by retrotransposition. Forty-four distinct alleles, containing 20–103 copies of the RS447 unit, have been identified among 74 unrelated chromosomes10. We have also identified several USP17-related sequences in a cluster located at 8p23. This cluster appears to contain at least seven USP17-like (USP17L) intronless genes, three of which are classified as non-protease homologues, as well as pseudogenes. In this table, the proteins encoded by these polymorphic and variable regions have been annotated as two single proteases (USP17 and USP17L). The closest relatives of the USP17 genes in the mouse genome are those that code for proteins called DUBs (deubiquitinating enzymes). DUB1, DUB2 and DUB2A have been extensively characterized as members of a novel group of cytokine-inducible deubiquitylating enzymes produced by lymphocytes11–13. We have annotated three further members of this subfamily of haematopoietic proteases. The classification of mouse DUBs as orthologues of human USP17 genes is doubtful because their syntenic relationship is unclear, despite their sequence similarities. Accordingly, we have tentatively classified them as paralogous genes.
We have annotated six members of the C48 family of SUMO-1 proteases in the mouse genome that are absent from the human genome. We have also included a family of recently described cysteine proteases that have deubiquitylating activity and contain the OTU protease domain. These proteases have tentatively been called otubains14,15. This family comprises 14 orthologous pairs plus one species-specific member in each of human and mouse. All of them contain the characteristic features of active proteases, with the exception of TRABID and mouse Cgi77b. The last protease in our list of cysteine proteases is called HetF-like. It forms part of the superfamily of caspase-haemoglobinase fold proteases16. Human and mouse HetF-like have a serine residue in place of the active-site cysteine present in cysteine proteases, and have therefore been classified as non-protease homologues.
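The criterion used above for HetF-like, TRABID and Cgi77b is whether the catalytic residue is retained. In practice, this amounts to checking a single column of a multiple sequence alignment. The following is a minimal sketch, assuming the alignment column of the catalytic cysteine is already known from a reference protease. The function names and the aligned fragments are hypothetical illustrations, not sequences or tools from this study.

```python
from typing import Optional


def ungapped_position(aligned: str, column: int) -> Optional[int]:
    """Map a 0-based alignment column to a 1-based position in the ungapped
    sequence; return None if the sequence has a gap ('-') at that column."""
    if aligned[column] == "-":
        return None
    # Subtract the gaps that precede the column to get the residue number.
    return column + 1 - aligned[:column].count("-")


def classify_by_catalytic_residue(query_aln: str, catalytic_column: int,
                                  expected: str = "C") -> str:
    """Label a sequence 'protease' if it keeps the expected catalytic residue
    at the given alignment column, otherwise 'non-protease homologue'."""
    residue = query_aln[catalytic_column]
    return "protease" if residue == expected else "non-protease homologue"


if __name__ == "__main__":
    # Hypothetical aligned fragments (invented, not real sequences).
    # Alignment column 3 holds the catalytic cysteine in the reference.
    col = 3
    examples = [
        ("reference", "GQ-CGSW"),
        ("active homologue", "GQACGSW"),
        ("HetF-like-type", "GQ-SGSW"),  # serine in place of cysteine
    ]
    for name, seq in examples:
        print(name, classify_by_catalytic_residue(seq, col),
              ungapped_position(seq, col))
```

A single-residue check is only a first filter. A careful annotation would also consider the other catalytic residues and the quality of the alignment around the active site.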
Table S3 | Metalloproteases
Code
M01.003
M01.014
M01.023
M01.001
M01.018
M01.004
M01.008
M01.010
M01.011
M01.022
M01.028
M01.027
M01.972np
Peptidase
aminopeptidase A
aminopeptidase B
aminopeptidase MAMS
aminopeptidase N
aminopeptidase PILS
leukotriene A4 hydrolase
pyroglutamyl-peptidase II
cytosol alanyl aminopeptidase
leucyl-cystinyl aminopeptidase
aminopeptidase B-like 1
aminopeptidase O
aminopeptidase Q
TBP-associated factor 2
Human Gene
ENPEP
RNPEP
AMPEP
ANPEP
ARTS1
LTA4H
TRHDE
NPEPPS
LNPEP
RNPEPL1
AOPEP
AQPEP
TAF2
LocusLink
2028
6051
64167
290
51752
4048
29953
9520
4012
57140
84909
BG623101
6873
Locus
4q26
1q32
5q15
15q25
5q21
12q23
12q21
17q21
5q15
2q37
9q22
5q23
8q24
Mouse Gene LocusLink
Enpep
13809
Rnpep
215615
Locus
3H1
1F
Syntenic
y
y
Identity
77
86
Anpep
Arts1
Lta4h
Trhde
Psa
Lnpep
Rnpepl1
Aopep
Aqpep
Taf2
16790
80898
16993
237553
19155
266720
98480
BAC31943
74574
319944
7D2
13C1
10C2
10D1
11D
13C1
1D
13B3
18C
15D
y
y
y
y
y
y
y
y
y
y
76
85
92
94
97
88
95
72
68
99
M02.001
M02.006
M02.971np
angiotensin-converting enzyme 1
angiotensin-converting enzyme 2
angiotensin-converting enzyme 3
ACE
ACE2
#ACE3
1636
59272
17q23
Xp21
17q23
Ace
Ace2
Ace3
11421
70008
217246
11E1
XF5
11E1
y
y
y
83
82
M03.001
M03.002
M03.006
thimet oligopeptidase
neurolysin
mitochondrial intermediate peptidase
THOP1
NLN
MIPEP
7064
57486
4285
19p13
5q13
13q12
Thop1
Nln
Mipep
50492
75805
70478
10C1
13D1
14C3
y
y
y
89
90
84
M08.003
leishmanolysin-2
LMLN
89782
3q29
Lmln
239833
16B2
y
73
M10.034
M10.001
M10.003
M10.005
M10.008
collagenase-like B
collagenase 1
gelatinase A
stromelysin 1
matrilysin
11q22
16q22
11q22
11q22
Mcolb
Mcola
Mmp2
Mmp3
Mmp7
83996
83995
17390
17392
17393
9A1
9A1
8C5
9A1
9A1
y
y
y
y
59
95
76
70
MMP1
MMP2
MMP3
MMP7
4312
4313
4314
4316
M10.002
M10.004
M10.006
M10.007
M10.009
M10.013
M10.014
M10.015
M10.016
M10.017
M10.021
M10.019
M10.026
M10.022
M10.022
M10.023
M10.024
M10.029
collagenase 2
gelatinase B
stromelysin 2
stromelysin 3
macrophage elastase
collagenase 3
MT1-MMP
MT2-MMP
MT3-MMP
MT4-MMP
MMP19
enamelysin
MMP21
MMP23A
MMP23B
MT5-MMP
MT6-MMP
matrilysin-2
MMP8
MMP9
MMP10
MMP11
MMP12
MMP13
MMP14
MMP15
MMP16
MMP17
MMP19
MMP20
MMP21
MMP23A
MMP23B
MMP24
MMP25
MMP26
4317
4318
4319
4320
4321
4322
4323
4324
4325
4326
4327
9313
118856
8511
8510
10893
64386
56547
11q22
20q13
11q22
22q11
11q22
11q22
14q11
16q22
8q22
12q24
12q13
11q22
10q26
(1p36)
1p36
20q11
16p12
11p15
Mmp8
Mmp9
Mmp10
Mmp11
Mmp12
Mmp13
Mmp14
Mmp15
Mmp16
Mmp17
Mmp19
Mmp20
Mmp21
Mmp23
17394
17395
17384
17385
17381
17386
17387
17388
17389
23948
58223
30800
214766
26561
9A1
2H3
9A1
10B5
9A1
9A1
14C1
8C5
4A3
5F
10D3
9A1
7F4
4E2
y
y
y
y
y
y
y
y
y
y
y
y
y
y
72
72
76
81
61
86
96
87
98
87
78
89
80
83
Mmp24
Mmp25
17391
240047
2H2
17A3
y
y
92
80
M10.027
M10.030
MMP27
epilysin
MMP27
MMP28
64066
79148
11q22
17q12
Mmp27
Mmp28
234911
118453
9A1
11B5
y
y
57
79
M12.002
M12.004
M12.005
M12.016
M12.018
M12.245
meprin α-subunit
meprin β-subunit
procollagen C-protease
mammalian tolloid-like 1 protein
mammalian tolloid-like 2 protein
hatching-metalloprotease
MEP1A
MEP1B
BMP1
TLL1
TLL2
HAMET
4224
4225
649
7092
7093
AJ537600
6p12
18q12
8p21
4q32
10q24
2q11
Mep1a
Mep1b
Bmp1
Tll1
Tll2
Hamet
17287
17288
12153
21892
24087
215095
17C
18A2
14D1
8B3
19D1
2F3
y
y
y
y
y
y
76
77
92
93
91
67
M12.219
M12.201
M12.xxx
DECYSIN
ADAM1a
ADAM1b
ADAMDEC1 27299
#ADAM1
8759
8p21
12q24
Adamdec1
Adam1a
Adam1b
58860
280668
280667
14D1
5F
5F
y
y
65
M12.950np
M12.975np
M12.952np
M12.xxxnp
M12.953np
M12.xxxnp
M12.xxxnp
M12.956np
M12.208
M12.209
M12.210
M12.976np
M12.212
M12.215
M12.217
M12.957np
M12.214
M12.218
M12.234
M12.978np
M12.979np
M12.227
M12.228
M12.229
M12.224
M12.981np
M12.232
M12.960np
M12.244
M12.xxx
M12.xxx
ADAM2/Fertilin-β
ADAM3B
ADAM4
ADAM4B
ADAM5
ADAM6
ADAM6B
ADAM7
ADAM8
ADAM9
ADAM10
ADAM11
ADAM12
ADAM15
ADAM17
ADAM18
ADAM19
ADAM20
ADAM21
ADAM22
ADAM23
testase 1
testase 2
testase 3
ADAM28
ADAM29
ADAM30
ADAM32
ADAM33
testase 4
testase 5
ADAM2
#ADAM3B
#ADAM4
#ADAM4B
#ADAM5
#ADAM6
2515
ADAM7
ADAM8
ADAM9
ADAM10
ADAM11
ADAM12
ADAM15
ADAM17
ADAM18
ADAM19
ADAM20
ADAM21
ADAM22
ADAM23
8756
101
8754
102
4185
8038
8751
6868
8749
8728
8748
8747
53616
8745
8p21
10q26
8p11
15q21
17q21
10q26
1q21
2p25
8p11
5q33
14q24
14q24
7q21
2q33
#ADAM25
137491
8p22
ADAM28
ADAM29
ADAM30
ADAM32
ADAM33
10863
11086
11085
203102
80332
8p21
4q34
1p11
8p11
20p13
8757
8p11
16q12
14q24
14q24
8p11
14q24
Adam2
Adam3
Adam4
Adam4b
Adam5
Adam6
Adam6b
Adam7
Adam8
Adam9
Adam10
Adam11
Adam12
Adam15
Adam17
Adam18
Adam19
11495
11497
11498
AV274161
11499
238406
238405
11500
11501
11502
11487
11488
11489
11490
11491
13524
11492
14D1
8A3
12D3
12D3
8A3
12F2
12F2
14D1
7F5
8A3
9D
11D
7F4
3F1
(12)
8A3
11B3
n
y
y
y
y
y
59
y
y
y
y
y
y
y
y
y
y
66
65
86
96
91
81
80
91
62
82
Adam21
Adam22
Adam23
Adam24
Adam25
Adam26
Adam28
Adam29
Adam30
Adam32
Adam33
Adam34
Adam35
56622
11496
23792
13526
23793
13525
13522
244486
71078
209192
110751
252866
XM_146316
12D3
5A1
1C2
8B1
8B1
8B1
14D1
8B3
3F3
8A3
2F3
8B1
8B1
y
y
y
68
92
94
y
y
y
y
y
y
70
58
63
60
71
M12.xxx
M12.xxx
M12.xxx
M12.247
testase 6
testase 7
testase 8
testase 9
M12.222
M12.301
M12.220
M12.221
M12.225
M12.230
M12.231
M12.226
M12.021
M12.235
M12.237
M12.241
M12.024
M12.025
M12.026
M12.027
M12.028
M12.029
M12.246
ADAMTS1
ADAMTS2
ADAMTS3
ADAMTS4
ADAMTS5/11
ADAMTS6
ADAMTS7
ADAMTS8
ADAMTS9
ADAMTS10
ADAMTS12
ADAMTS13
ADAMTS14
ADAMTS15
ADAMTS16
ADAMTS17
ADAMTS18
ADAMTS19
ADAMTS20
ADAMTS1
ADAMTS2
ADAMTS3
ADAMTS4
ADAMTS5
ADAMTS6
ADAMTS7
ADAMTS8
ADAMTS9
ADAMTS10
ADAMTS12
ADAMTS13
ADAMTS14
ADAMTS15
ADAMTS16
ADAMTS17
ADAMTS18
ADAMTS19
ADAMTS20
9510
9509
9508
9507
11096
11174
11173
11095
56999
81794
81792
11093
140766
170689
170690
170691
170692
171019
80070
M13.001
M13.008
M13.002
M13.003
M13.007
M13.090
neprilysin
neprilysin-2
endothelin-converting enzyme 1
endothelin-converting enzyme 2
DINE peptidase
Kell blood-group protein
MME
MMEL2
ECE1
ECE2
ECEL1
KEL
4311
79258
1889
9718
9427
3792
Adam36
Adam37
Adam38
Adam39
BN000114
BN000115
BN000119
BN000121
8B1
8B1
8B1
8B1
21q21
5q35
4q21
1q23
21q21
5q12
15q24
11q24
3p14
19p13
5p13
9q34
10q22
11q24
5p15
15q26
16q23
5q23
12q12
Adamts1
Adamts2
Adamts3
Adamts4
Adamts5
Adamts6
Adamts7
Adamts8
Adamts9
Adamts10
Adamts12
Adamts13
Adamts14
Adamts15
Adamts16
Adamts17
Adamts18
Adamts19
Adamts20
11504
26550
BAC27597
11505
23794
238832
209798
30806
69070
224698
239227
279028
237360
235130
271127
244028
208937
240324
223838
16C3
11B1
5E2
1H2
16C3
13D1
9E3
9A5
6D3
17B2
15A2
2A3
10B4
9A5
13C1
7C
8E1
18D2
15F1
y
y
y
y
y
y
y
y
y
y
y
y
y
y
y
y
y
y
y
84
88
65
91
91
73
67
81
90
91
88
71
81
91
83
78
73
82
70
3q26
1p36
1p36
3q29
2q37
7q35
Mme
Mell1
Ece1
Ece2
Ecel1
Kel
17380
27390
230857
107522
13599
23925
3E1
4E2
4D3
16B1
1C5
6B2
y
y
y
y
y
y
94
79
93
87
94
74
M13.091
PHEX endopeptidase
PHEX
5251
Xp22
Phex
18675
XF4
y
96
M14.001
M14.002
M14.010
M14.017
M14.020
M14.018
M14.003
M14.009
M14.021
carboxypeptidase A1
carboxypeptidase A2
carboxypeptidase A3
carboxypeptidase A4
carboxypeptidase A5
carboxypeptidase A6
carboxypeptidase B
carboxypeptidase U
carboxypeptidase O
CPA1
CPA2
CPA3
CPA4
CPA5
CPA6
CPB1
CPB2
CPO
1357
1358
1359
51200
93979
57094
1360
1361
130749
7q32
7q32
3q24
7q32
7q32
8q13
3q25
13q14
2q33
Cpa1
Cpa2
Cpa3
Cpa4
Cpa5
Cpa6
Cpb1
Cpb2
#Cpo
109697
232680
12873
215225
76649
329093
76703
56373
269201
6A3
6A3
3A3
6A3
1A3
1A3
3A3
14D2
1C2
y
y
y
y
y
y
y
y
y
74
86
81
84
84
86
72
82
M14.005
M14.004
M14.006
M14.011
M14.012
M14.015np
M14.019np
M14.951np
carboxypeptidase E
carboxypeptidase N
carboxypeptidase M
carboxypeptidase D
carboxypeptidase Z
carboxypeptidase X1
carboxypeptidase X2
adipocyte-enhancer binding prot. 1
CPE
CPN
CPM
CPD
CPZ
CPX1
CPX2
AEBP1
1363
1369
1368
1362
8532
56265
119587
165
4q32
10q25
12q15
17q11
4p16
20p13
10q26
7p13
Cpe
Cpn
Cpm
Cpd
Cpz
Cpx1
Cpx2
Aebp1
12876
93721
70574
12874
242939
56264
55987
11568
8B3
19D1
10D2
11B4
5B1
2F3
7F4
11A1
y
y
y
y
y
y
y
y
97
66
79
93
82
86
89
90
M16.002
M16.003
M16.005
M16.009
M16.971np
M16.973np
M16.974np
M16.976np
insulysin
mitochondrial processing pept. β-sub
nardilysin
pitrilysin metalloprotease 1
mitochondrial processing protease
UCR1
UCR2
mitoch. processing protease-like
IDE
PMPCB
NRD1
PITRM1
INPP5E
UQCRC1
UQCRC2
AMPP
3416
9512
4898
10531
23203
7384
7385
133083
10q24
7q22
1p32
10p15
9q34
3p21
16p12
4q22
Ide
Pmpcb
Nrd1
Pitrm1
Inpp5e
Uqcrc1
Uqcrc2
15925
73078
230598
69617
66865
22273
67003
19C3
5A3
4C7
13A1
2A3
9F2
7F3
y
y
y
y
y
y
y
97
90
93
86
91
88
85
M17.001
leucyl aminopeptidase
LAP3
51056
4p15
Lap3
66988
5B3
y
90
M17.006
aminopeptidase-like 1
NPEPL1
79716
20q13
M18.002
aspartyl aminopeptidase
DNPEP
23549
2q36
Dnpep
13437
1C3
y
90
M19.001
M19.002
M19.004
membrane dipeptidase
membrane dipeptidase 2
membrane dipeptidase 3
DPEP1
DPEP2
DPEP3
1800
64174
64180
16q24
16q22
16q22
Dpep1
Dpep2
Dpep3
13479
244632
71854
8E2
8D2
8D2
y
y
y
73
70
73
M20.005
M20.006
M20.971np
M20.973np
glu-carboxypeptidase-like 1
glu-carboxypeptidase-like 2
HmrA-like protease
aminoacylase
CPGL
CPGL2
HMRALP
ACY1
55748
84735
135293
95
18q22
18q22
6q15
3p21
Cpgl
Cpgl2
Hmralp
Acy1
66054
240478
242377
109652
18E3
18E3
4A5
9F1
y
y
y
y
91
73
83
85
M22.003
M22.004
O-sialoglycoprotein endopeptidase
O-sialoglycoprotein endopeptidase 2
OSGEP
OSGEP2
55644
64172
14q11
2q32
Osgep
Osgep2
66246
72085
14C1
1C1
y
y
93
84
M24.001
M24.002
M24.028
M24.005
M24.007
M24.009
M24.026
M24.973np
M24.974np
methionyl aminopeptidase I
methionyl aminopeptidase II
methionyl aminopeptidase-like 1
X-prolyl aminopeptidase 2
X-Pro dipeptidase
aminopeptidase P1
aminopeptidase P homologue
proliferation-association protein 1
suppressor of Ty 16 homologue
METAP1
METAP2
METAPL1
XPNPEP2
PEPD
XPNPEPL
PEPP
PA2G4
SUPT16H
23173
10988
254042
7512
5184
7511
63929
5036
11198
4q24
12q23
2q31
Xq26
19q13
10q25
22q13
12q13
14q11
Metap1
Metap2
Metapl1
Xpnpep2
Pepd
Xpnpep1
Pepp
Pa2g4
Supt16h
75624
56307
66559
170745
18624
170750
321003
18813
114741
3H2
10C3
2C3
XA3
7B1
19D2
15E3
10D3
14C1
y
y
y
y
y
y
y
y
y
92
88
95
81
90
81
93
98
98
M28.010
M28.011
M28.012
M28.975np
M28.014
glutamate carboxypeptidase II
NAALADASE L peptidase
NAALADASE II
NAALADASE III
plasma Glu-carboxypeptidase
FOLH1
NAALADL
NAALAD2
NAALAD3
PGCP
2346
10004
10003
254827
10404
11p11
11q13
11q14
3q26
8q22
Folh1
NAALADL
Naalad2
Naalad3
Pgcp
53320
BN000129
72560
229149
54381
7E1
19A
9A3
3A3
15B3
y
y
y
y
y
85
80
89
63
93
M28.018
M28.972np
M28.973np
M28.974np
M28.016
Ojeda peptidase
transferrin receptor protein
transferrin receptor 2 protein
glutaminyl cyclase
glutaminyl cyclase 2
OJP
TFRC
TFR2
QPCT
QPCT2
79956
7037
7036
25797
54814
9p24
3q29
7q22
2p22
19q13
Ojp
Trfr
Trfr2
Qpct
Qpct2
BAC38286
22042
50765
70536
67369
19C2
16B3
5G1
17E3
7A2
y
y
y
y
y
87
77
84
81
84
M38.972np
M38.973np
M38.xxxnp
M38.xxxnp
M38.xxxnp
M38.xxxnp
M38.xxxnp
dihydroorotase
dihydropyrimidinase
dihydropyrimidinase-related prot. 1
dihydropyrimidinase-related prot. 2
dihydropyrimidinase-related prot. 3
dihydropyrimidinase-related prot. 4
dihydropyrimidinase-related prot. 5
CAD
DPYS
CRMP1
DPYSL2
DPYSL3
DPYSL4
DPYSL5
790
1807
1400
1808
1809
10570
56896
2p23
8q22
4p16
8p21
5q32
10q26
2p23
Cad
Dpys
Crmp1
Dpysl2
Dpysl3
Dpysl4
Dpysl5
69719
64705
12933
12934
22240
26757
65254
5B1
15C
5B2
14D1
18B3
7F5
5B1
y
y
y
y
y
y
y
94
88
96
98
98
93
98
M41.004
M41.006
M41.010
M41.007
i-AAA protease
paraplegin
Afg3-like protein 1
Afg3-like protein 2
YME1L1
SPG7
#AFG3L1
AFG3L2
10730
6687
172
10939
10p12
16q24
16q24
18p11
Yme1l1
Spg7
Afg3l1
Afg3l2
27377
234847
114896
69597
2A3
8E2
8E2
18E1
y
y
y
y
95
89
M43.004
M43.005
pappalysin-1
pappalysin-2
PAPPA
PLAC3
5069
60676
9q32
1q25
Pappa
Plac3
18491
240848
4C1
1H1
y
y
93
78
M47.001
procol. III N-endopeptidase
PCOLN3
5119
16q24
#Pcoln3
BI690732
8E2
y
M48.003
M48.017
FACE-1/ZMPSTE24
VVML
FACE1
VVML
10269
115209
1p34
1p32
Face1
Vvml
230709
67013
4D1
4C6
y
y
91
71
M49.001
dipeptidyl-peptidase III
DPP3
10072
11q13
Dpp3
75221
19A
y
92
M50.001
S2P protease
MBTPS2
51360
Xp22
Mbtps2
270669
XF4
y
97
94
M67.001
M67.002
M67.xxxnp
M67.xxx
M67.003
M67.004
M67.xxx
M67.005
M67.xxx
M67.xxxnp
M67.xxxnp
M67.xxxnp
M67.xxxnp
M67.xxxnp
M67.xxxnp
Pad1-homologue
JAB1
COPS6
AMSH
AMSH 2
C6.1A
C6.1A-like
jammin-like protease 1
jammin-like protease 2
PSMD7
PRPF8
eukar. translation initiation F3S3
eukar. translation initiation F3S5
eukar. translation initiation F3S5B
IFP38
POH1
COPS5
COPS6
AMSH
AMSH2
C6.1A
10213
10987
10980
10617
57559
79184
2q24
8q13
7q22
2p13
10q23
Xq28
Poh1
Cops5
Cops6
Stambp
Amsh2
C6.1a
C6.1al
Jamml1
Jamml2
Psmd7
Prpf8
Eif3s3
Eif3s5
59029
26754
26893
70527
76630
210766
BN000130
230448
68047
17463
192159
68135
66085
2C3
1A2
5G1
6D1
19C3
XA6
10D1
4C5
17D
8D2
11B4
15D1
7F2
y
y
y
y
y
y
99
98
100
83
89
97
JAMML1
JAMML2
PSMD7
PRPF8
EIF3S3
EIF3S5
EIF3S5B
IFP38
114803
84954
5713
10594
8667
8665
120963
83880
1p32
19p13
16q23
17p13
8q24
11p15
12p13
(2p11)
y
y
y
y
y
y
91
87
97
99
97
93
Mx1.xxx
FACE-2/RCE1
FACE2
9986
11q13
Face2
19671
19A
y
95
Mx2.xxxnp
Mx2.xxxnp
aspartoacylase-2
aspartoacylase-3
ASPA/ACY-2 443
ACY-3
91703
17p13
11q13
Aspa/Acy-2
Acy-3
11484
71670
11B4
19A
y
y
86
68
The metalloproteases belong to 26 distinct families. The M01 family contains 13 members in human and 12 in mouse, which lacks aminopeptidase MAMS. We propose the names aminopeptidase O and aminopeptidase Q for the M01 proteases previously annotated as the human hypothetical proteins FLJ14675 and BG623101, respectively. We have also identified orthologues of these genes on mouse chromosomes 13B3 and 18C. In the M02 family, we have tentatively annotated a mouse gene for a third angiotensin-converting enzyme-like protein (Ace3), located on chromosome 11E1. We have classified Ace3 as a non-protease homologue because it contains the sequence HQMGH instead of the consensus zinc-binding HExxH motif. No expressed sequence tags (ESTs) have been found for mouse Ace3, which could therefore be an inactive pseudogene, although the locus is apparently complete and is conserved in the rat. The corresponding human gene is a pseudogene as a result of the accumulation of stop codons and frameshifts.
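The Ace3 decision rests on a sequence motif: HExxH means His, Glu, any two residues, His. HQMGH fails it because Q replaces the required E. The following is a minimal sketch of such a motif scan using a regular expression. The function names and the short fragments are hypothetical and invented for illustration; they are not real ACE sequences. This is not the annotation pipeline used for these tables. Motif presence alone is necessary but not sufficient evidence of catalytic activity.

```python
import re

# Zinc-binding consensus HExxH: His, Glu, any two residues, His.
# A lookahead is used so that overlapping matches are also reported.
HEXXH = re.compile(r"(?=(HE[A-Z]{2}H))")


def find_hexxh(sequence: str) -> list:
    """Return (1-based position, motif) for every HExxH match in a protein sequence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(seq)]


def has_zinc_motif(sequence: str) -> bool:
    """True if at least one HExxH motif is present."""
    return bool(find_hexxh(sequence))


if __name__ == "__main__":
    # Hypothetical fragments, invented for illustration (not real sequences).
    fragments = {
        "ACE-like (HEMGH)": "AVSAHEMGHIQY",
        "Ace3-like (HQMGH)": "AVSAHQMGHIQY",
    }
    for name, frag in fragments.items():
        print(name, find_hexxh(frag), has_zinc_motif(frag))
```

The lookahead is a design choice. It keeps the scan correct even if two candidate motifs share residues, which a plain `finditer` over `HE..H` would miss.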
There are some differences between the human and mouse members of the M10 family of matrix metalloproteases (MMPs). Mouse McolB, a divergent counterpart of human MMP1, is absent from human. Conversely, human matrilysin-2 (MMP26) is absent from mouse, although there are some gaps in the mouse genome region that could contain this missing gene. MMP23 has recently been duplicated in the human genome17, generating two closely related genes, MMP23A and MMP23B. Owing to the high sequence identity between the two genes, this region is artefactually collapsed in the available public and private genome sequences and is erroneously considered to contain a single gene. There appears to be a single mouse MMP23 gene. However, we cannot rule out that this region is also duplicated in the mouse genome and has likewise been collapsed during computational assembly. In the M12 family, we have annotated a new member within the meprin/tolloid subfamily18.
The ADAM (a disintegrin and metalloprotease) subfamily of M12 metalloproteases shows important differences between the two organisms. The genes for ADAM-1, -3, -4, -5, -6 and -25 are pseudogenes in human but active genes in mouse. ADAM-1 and -6 are duplicated in mouse, whereas ADAM-20 is duplicated in human (ADAM-20 and ADAM-21). The testases, a subgroup of ADAMs located at 8B1, are mouse specific. We have annotated five further members of this family (testases 5–9), although they are intronless and their functional relevance remains to be shown. The group of ADAMTSs (ADAMs with thrombospondin domains) is completed by the inclusion of human and mouse ADAMTS-20. In the M14 family of carboxypeptidases, we have found that mouse carboxypeptidase O has been specifically inactivated by mutation; it is therefore annotated as a pseudogene20. Dihydroorotase and several dihydropyrimidinases have been included as non-protease homologues of bacterial isoaspartyl dipeptidases. The gene that encodes procollagen III N-endopeptidase is inactivated in mouse. This represents an interesting difference between the human and mouse degradomes, as there are no other functional members of the M47 family that could compensate for this specific loss in mouse. We have annotated 14 human and 13 mouse proteins in the recently described M67 family of metalloisopeptidases21,22. All of them contain the JAMM motif. Some, however, lack conserved residues predicted to be essential for proteolytic activity, and these have therefore been classified as non-protease homologues.
There is uncertainty about whether the FACE-2/RCE1 prenyl endopeptidase belongs to the cysteine or the metalloprotease class of enzymes19. In agreement with recent structural comparisons24, however, we have included it as the only human and mouse representative of a new family of membrane-bound metalloproteases. Finally, we have included three aminoacylases in our catalogue of metalloproteases. Strictly speaking, these enzymes are not proteases, because they cleave the amide bond that connects an acyl group to an amino acid25. However, the structure of ACY1 clearly allows its inclusion in the M20 family of metalloproteases. ACY2 and ACY3 have also been proposed to belong to a superfamily of metalloproteases that contains members of the M14 family of carboxypeptidases26.
23
Table S4 | Serine proteases
Code
S01.160
S01.161
S01.162
S01.251
S01.017
S01.236
S01.300
S01.244
S01.307
S01.246
S01.257
S01.020
S01.306
S01.029
S01.081
Peptidase
kallikrein hK1
kallikrein hK2
kallikrein hK3
kallikrein hK4
kallikrein hK5
kallikrein hK6
kallikrein hK7
kallikrein hK8
kallikrein hK9
kallikrein hK10
kallikrein hK11
kallikrein hK12
kallikrein hK13
kallikrein hK14
kallikrein hK15
S01.164
S01.170
S01.066np
S01.037
S01.067
S01.071
S01.041
S01.068
S01.163
S01.038
S01.039
S01.069
S01.070
S01.073
glandular kallikrein mK1
glandular kallikrein mK3
glandular kallikrein mK4
glandular kallikrein mK5
glandular kallikrein mK8
glandular kallikrein mK9
glandular kallikrein mK11
glandular kallikrein mK14
glandular kallikrein mK16
glandular kallikrein mK21
glandular kallikrein mK22
glandular kallikrein mK24
glandular kallikrein mK26
glandular kallikrein mK27
Human gene
KLK1
KLK2
KLK3
KLK4
KLK5
KLK6
KLK7
KLK8
KLK9
KLK10
KLK11
KLK12
KLK13
KLK14
KLK15
LocusLink
3816
3817
354
9622
25818
5653
5650
11202
23579
5655
11012
43849
26085
43847
55554
Locus
19q13
19q13
19q13
19q13
19q13
19q13
19q13
19q13
19q13
19q13
19q13
19q13
19q13
19q13
19q13
Mouse gene LocusLink
mGk6
16612
#mGk25
Locus
7B2
7B2
Syntenic
y
y
Identity
65
mKlk4
mKlk5
mKlk6
mKlk7
mKlk8
mKlk9
mKlk10
mKlk11
mKlk12
mKlk13
mKlk14
mKlk15
56640
68668
19144
23993
259277
73832
69540
56538
69511
13647
233190
XM_145570
7B2
7B2
7B2
7B2
7B2
7B2
7B2
7B2
7B2
7B2
7B2
7B2
y
y
y
y
y
y
y
y
y
y
y
y
69
70
68
75
72
76
68
80
71
79
73
75
mGk1
mGk3
mGk4
mGk5
mGk8
mGk9
mGk11
mGk14
mGk16
mGk21
mGk22
mGk24
mGk26
mGk27
16623
18050
18048
16622
16624
13648
16613
16614
16615
16616
13646
16617
16618
16619
7B2
7B2
7B2
7B2
7B2
7B2
7B2
7B2
7B2
7B2
(7B2)
7B2
7B2
7B2
S01.107
glandular kallikrein mKx
mGkx
76999
7B2
S01.217
S01.215
S01.214
S01.216
S01.213
S01.211
S01.218
S01.979np
S01.212
S01.228
S01.033
S01.998np
thrombin
coagulation factor VIIa
coagulation factor IXa
coagulation factor Xa
coagulation factor XIa
coagulation factor XIIa
protein C
protein Z
plasma kallikrein
hepatocyte growth factor activator
hyaluronan-binding ser-protease
protein C-like
F2
F7
F9
F10
F11
F12
PROC
PROZ
KLKB1
HGFAC
HABP2
PROCL
2147
2155
2158
2159
2160
2161
5624
8858
3818
3083
3026
25891
11p11
13q34
Xq27
13q34
4q35
5q35
2q21
13q34
4q35
4p16
10q25
11p12
F2
F7
F9
F10
F11
F12
Proc
Proz
Klkb1
Hgfac
Habp2
Procl
14061
14068
14071
14058
109821
58992
19123
66901
16621
54426
226243
210622
2E1
8A2
XA5
8A2
8B2
13B2
18B3
8A2
8B2
5B1
19D2
2E3
y
y
y
y
y
y
y
y
y
y
y
y
S01.303
S01.242
S01.242
S01.028
S01.074
S01.075
S01.076
S01.011
S01.252
S01.314
S01.315
S01.098
S01.295
S01.054
S01.143
mastin
tryptase β-1
tryptase β-2
tryptase γ-1
marapsin
tryptase homologue 2
tryptase homologue 3
testisin
brain serine protease 2
implantation serine protease 1
implantation serine protease 2
intestinal serine protease 1
intestinal serine protease 2
tryptase δ-1
tryptase α
#MASTIN
TPSB1
TPSB2
TPSG1
MPN
EOS
TESSP1
PRSS21
PRSS22
257157
7177
64499
25823
83886
260429
BN000124
10942
64063
16p13
16p13
16p13
16p13
16p13
16p13
16p13
16p13
16p13
#ISP2
#DISP
123787
124221
16p13
16p12
Mastin
Mcpt7
Mcpt6
Tpsg1
Mpn
Eos
Tessp1
Prss21
Prss22
Isp1
Isp2
Disp
Disp2
207224
17230
17229
26945
213171
BE646687
71003
57256
70835
114661
114662
30943
69814
17A3
17A3
17A3
17A3
17A3
17A3
17A3
17A3
17A3
17A3
17A3
17A3
17A3
y
y
y
y
y
y
y
y
y
y
y
y
TPSD1
TPS1
23430
7176
16p13
(16p13)
S01.159
prostasin
PRSS8
5652
16p11
Prss8
76560
7F4
y
82
70
82
76
78
72
68
67
76
81
80
90
75
77
73
80
81
62
67
75
77
S01.414 | prostasin-like 1 | PSTL1 | 146547 | 16p11 | PSTL1 | 77613 | 7F4 | y | 62
S01.xxx | prostasin-like 2 | PSTL2 | 79001 | 16p11 | PSTL2 | 27973 | 7F4 | y | 84
S01.xxx | epidermis-specific SP-like | ESSPL | BN000134 | 4q31 | Esspl | BN000135 | 3F1 | y | 68
S01.318 | marapsin 2 | MPN2 | BN000131 | 1q42 | Mpn2 | 216797 | 11B2 | y | 57
S01.993np
S01.317
S01.106
S01.xxx
S01.968np
S01.xxx
testis-specific protein tsp50
testis serine protease 2
testis serine protease 3
testis serine protease 4
testis serine protease 5
testis serine protease 6
TSP50
TESSP2
#TESSP3
#TESSP4
TESSP5
#TESSP6
29122
AJ544583
3p21
3p21
3p21
3p21
3p21
3p21
Tsp50
Tessp2
Tessp3
Tessp4
Tessp5
Tessp6
235631
235628
73336
272643
260408
74306
9F2
9F2
9F2
9F2
9F2
9F2
y
y
y
y
y
y
61
64
62
S01.985np
S01.045
S01.088
TESP1
TESP2
TESP3
#TESP2
#TESP3
2q21
9q22
Tesp1
Tesp2
Tesp3
21755
21756
218304
1B
1B
13B3
y
y
S01.140
S01.010
S01.133
S01.147
S01.141
S01.003
S01.149
S01.254
S01.304
S01.004
S01.xxx
S01.398
S01.399
S01.401
S01.402
S01.xxx
S01.xxx
chymase
granzyme B
cathepsin G
granzyme H
mast cell protease 1
mast cell protease 2
mast cell protease 4
mast cell protease 8
mast cell protease 9
mast cell protease 10
mast cell protease L
granzyme D
granzyme E
granzyme F
granzyme G
granzyme N
granzyme O
Mcpt5
Gzmb
Ctsg
Gzmc
Mcpt1
Mcpt2
Mcpt4
Mcpt8
Mcpt9
Mcpt10
Mcptl
Gzmd
Gzme
Gzmf
Gzmg
Gzmn
Gzmo
17228
14939
13035
14940
17224
17225
17227
17231
17232
AF361939
17233
14941
14942
14943
14944
245839
239106
14C2
14C1
14C1
14C1
14C2
14C1
14C2
14C1
14C1
14C1
14C1
14C1
14C1
14C1
14C1
14C1
14C1
CMA1
GZMB
CTSG
GZMH
BN000137
1215
3002
1511
113155
14q11
14q11
14q11
14q11
y
y
y
y
68
66
74
67
69
60
S01.135 | granzyme A | GZMA | 3001 | 5q11 | Gzma | 14938 | 13D2 | y | 68
S01.146 | granzyme K | GZMK | 3003 | 5q11 | Gzmk | 14945 | 13D2 | y | 70
S01.139 | granzyme M | GZMM | 3004 | 19p13 | Gzmm | 16904 | 10C1 | y | 70
S01.134 | protease 3 | PRTN3 | 5657 | 19p13 | Prtn3 | 19152 | 10C1 | y | 63
S01.131 | neutrophil elastase | ELA2 | 1991 | 19p13 | Ela2 | 50701 | 10C1 | y | 72
S01.971np | azurocidin | AZU1 | 566 | 19p13 | – | – | – | – | –
S01.156
S01.xxx
S01.224
S01.291
S01.301
S01.292
S01.xxx
S01.294
S01.321
S01.xxx
S01.021
S01.019
S01.302
S01.247
S01.079
S01.034
S01.313
S01.308
S01.xxx
S01.298
S01.087
enteropeptidase
enteropeptidase-like
hepsin
HAT-related protease
airway-trypsin-like protease
HAT-like 1
HAT-like 2
HAT-like 3
HAT-like 4
HAT-like 5
DESC1 protease
corin
matriptase
epitheliasin
transmembrane Ser-protease 3
transmembrane Ser-protease 4
spinesin
matriptase-2
matriptase-3
polyserase
membrane-type mosaic Ser-prot.
PRSS7
PRSS7L
HPN
HATRP
HAT
HATL1
#HATL2
#HATL3
#HATL4
HATL5
DESC1
PRSC
MTSP1
TMPRSS2
TMPRSS3
TMPRSS4
TMPRSS5
TMPRSS6
TMPRSS7
TMPRSS8
MSPL
5651
BQ638967
3249
283471
9407
BN000133
Prss7
Prss7l
Hpn
Hatrp
Hat
Hatl1
Hatl2
Hatl3
Hatl4
Hatl5
Desc1
Lpr4
Mtsp1
Tmprss2
Tmprss3
Tmprss4
Tmprss5
Tmprss6
Tmprss7
Tmprss8
Mspl
19146
332474
15451
75002
231382
194597
320454
231381
243083
BAC29606
243084
53419
19143
50528
140765
214523
80893
71753
208171
270749
AAH42878
16C3
1C5
7B1
15F1
5E1
5E1
5E1
5E1
5E1
5E1
5E1
5D
9A5
16C4
17B2
9B
9B
15E1
16B5
10C1
9B
y
y
y
75
89
88
59
66
76
132722
132724
28983
10699
6768
7113
64699
56649
80975
164656
BN000125
AJ488946
84000
21q21
2q37
19q13
12q13
4q13
4q13
4q13
4q13
4q13
4q13
4q13
4p12
11q24
21q22
21q22
11q23
11q22
22q12
3q13
19p13
11q23
S01.320
S01.322
oviductin-like
ovochymase-like
OVTN
OVCH
BN000130
BN000128
11p15
12p11
Ovtn
BN000123
7F2
y
y
y
y
y
y
y
y
y
y
y
y
y
y
y
y
y
y
52
75
82
80
77
82
76
78
84
91
80
90
71
S01.152 | chymotrypsin B | CTRB1 | 1504 | 16q23 | Ctrb | 66473 | 8D3 | y | 85
S01.256 | chymopasin | CTRL | 1506 | 16q22 | Ctrl | 109660 | 8D2 | y | 86
S01.157 | chymotrypsin C | CTRC | 11330 | 1p36 | Ctrc | 76701 | 4E1 | y | 77
S01.127
S01.060
S01.059
S01.258
S01.063
S01.062
S01.xxxnp
S01.989np
S01.174
S01.151
S01.058
S01.061
S01.063
S01.984np
S01.xxx
S01.129
S01.092
S01.105
cationic trypsin
trypsin 3
trypsin 10
anionic trypsin (II)
trypsin C
trypsin 15
trypsin X1
trypsin X2
mesotrypsin
trypsin 1
trypsin 9
trypsin 12
trypsin 16
trypsin X3
trypsin X4
trypsin 4
trypsin V
trypsin X5
PRSS1
#TRY3
#TRY10
PRSS2
#TRY6
#TRY15
#TRYX1
TRYX2
PRSS3
5644
7q34
7q34
7q34
7q34
7q34
7q34
7q34
7q34
9p13
Try4
Try3
Try10
Try2
Try10l
Try15
Tryx1
Tryx2
22074
22073
AAB69058
22072
BN000136
AAB69087
272341
67690
6B2
6B2
6B2
6B2
6B2
6B2
6B2
6B2
y
y
y
y
y
y
y
y
77
Try1
Try9
Try12
Try16
Tryx3
Tryx4
Try4bis
Tryv
Tryx5
67373
BAB25300
AAB69086
114228
194359
194360
73626
232718
73481
6B2
6B2
(6B2)
6B2
6B2
6B2
6B2
6B2
6B2
S01.153 | pancreatic elastase | #ELA1 | 1990 | 12q13 | Ela1 | 109901 | 15F3 | y | –
S01.155 | pancreatic elastase II (IIA) | ELA2A | 63036 | 1p36 | Ela2a | 13706 | 4E1 | y | 75
S01.154 | pancreatic endopeptidase E (A) | ELA3A | 10136 | 1p36 | Ela3a | 242711 | 4D3 | y | 76
S01.205 | pancreatic endopeptidase E (B) | ELA3B | 23436 | 1p36 | Ela3b | 67868 | 4D3 | y | 84
S01.206 | pancreatic elastase II form B | ELA2B | 51032 | 1p36 | – | – | – | – | –
S01.194 | complement component 2 | C2 | 717 | 6p21 | C2 | 12263 | 17B2 | y | 76
S01.196 | complement factor B | BF | 629 | 6p21 | Bf | 14962 | 17B2 | y | 84
S01.995np | complement C1r-homologue | C1RL | 51279 | 12p13 | C1rl | 232371 | 6F2 | y | 73
5645
154754
136242
5646
77
78
S01.192
S01.xxx
S01.193
S01.xxx
S01.191
S01.xxx
S01.199
S01.198
S01.229
S01.237
complement component C1ra
complement component C1rb
complement component C1sa
complement component C1sb
complement factor D
complement factor D-like
complement factor I
MASP1/3
MASP2
neurotrypsin
C1R
715
12p13
C1ra
C1rb
C1sa
C1sb
Df
Df2
If
Masp1/3
Masp2
Prss12
50909
AF459018
50908
317677
11537
270746
12630
17174
17175
19142
6F2
(6F2)
6F2
6F2
10C1
10C1
3H1
16B1
4E1
3G3
y
81
C1S
716
12p13
y
74
DF
DF2
IF
MASP1/3
MASP2
PRSS12
1675
199783
3426
5648
10747
8492
19p13
19p13
4q25
3q29
1p36
4q28
y
y
y
y
y
y
67
79
69
86
81
82
S01.231 | u-plasminogen activator | PLAU | 5328 | 10q22 | Plau | 18792 | 14B | y | 69
S01.232 | t-plasminogen activator | PLAT | 5327 | 8p11 | Plat | 18791 | 8A3 | y | 80
S01.233 | plasminogen | PLG | 5340 | 6q26 | Plg | 18815 | 17A2 | y | 79
S01.976np | hepatocyte growth factor | HGF | 3082 | 7q21 | Hgf | 15234 | 5A3 | y | 91
S01.975np | macrophage-stimulating protein | MSP | 4485 | 3p21 | Msp | 15235 | 9F2 | y | 80
S01.999np | apolipoprotein (a) | LPA | 4018 | 6q26 | – | – | – | – | –
S01.223 | acrosin | ACR | 49 | 22q13 | Acr | 11434 | 15F1 | y | 68
S01.972np | haptoglobin-1 | HP | 3240 | 16q22 | Hp | 15439 | 8D3 | y | 79
S01.974np | haptoglobin-related protein | HPR | 3250 | 16q22 | – | – | – | – | –
S01.277 | osteoblast serine protease | HTRA1 | 5654 | 10q26 | Htra1 | 56213 | 7F4 | y | 91
S01.278 | HTRA2 | HTRA2 | 27429 | 2p12 | Htra2 | 64704 | 6D1 | y | 84
S01.284 | HTRA3 | HTRA3 | 94031 | 4p16 | Htra3 | 78558 | 5B1 | y | 86
S01.285 | HTRA4 | HTRA4 | 203100 | 8p11 | Htra4 | 66943 | 8A3 | y | 66
S01.309 | umbilical vein protease | SPUVE | 11098 | 11q14 | Spuve | 76453 | 7E1 | y | 90
S01.994np | similar to SPUVE | SPUVE2 | 167681 | 6q14 | Spuve2 | 244954 | 9E3 | y | 77
S01.104
S01.415
S01.419
plasma-kallikrein-like 1
plasma-kallikrein-like 2
plasma-kallikrein-like 3
KLKBL1
KLKBL2
#KLKBL3
XP_116753
203074
8p23
8p23
8p23
Klkbl1
Klkbl2
Klkbl3
74215
71037
73382
(14C3)
14C3
14C3
y
y
66
71
S01.992np | plasma-kallikrein-like 4 | KLKBL4 | 221191 | 16q21 | Klkbl4 | BN000132 | 8C5 | y | 62
S01.286 | similar to Arabidopsis serine protease | SASP | 219743 | 10q22 | Sasp | 71767 | 10B4 | y | 80
S01.991np | chymase-like serine protease | – | – | – | Clsp | 75106 | XC3 | – | –
S08.063 | site-1 protease | MBTPS1 | 8720 | 16q23 | Mbtps1 | 56453 | 8E1 | y | 96
S08.039 | proprotein convertase 9 | PCSK9 | 255738 | 1p32 | Pcsk9 | 100102 | 4C7 | y | 73
S08.090 | tripeptidyl-peptidase II | TPP2 | 7174 | 13q33 | Tpp2 | 22019 | 1C1 | y | 95
S08.072 | proprotein convertase 1 | PCSK1 | 5122 | 5q15 | Pcsk1 | 18548 | 13C1 | y | 93
S08.073 | proprotein convertase 2 | PCSK2 | 5126 | 20p12 | Pcsk2 | 18549 | 2H1 | y | 97
S08.071 | furin | PCSK3 | 5045 | 15q26 | Pcsk3 | 18550 | 7D2 | y | 94
S08.074 | proprotein convertase 4 | PCSK4 | 5124 | 19p13 | Pcsk4 | 18551 | 10C1 | y | 82
S08.076 | proprotein convertase 5 | PCSK5 | 5125 | 9q21 | Pcsk5 | 18552 | 19B | y | 92
S08.075 | PACE4 proprotein convertase | PCSK6 | 5046 | 15q26 | Pcsk6 | 18553 | 7C | y | 93
S08.077 | proprotein convertase 7 | PCSK7 | 9159 | 11q23 | Pcsk7 | 18554 | 9B | y | 88
S09.001 | prolyl oligopeptidase | PREP | 5550 | 6q22 | Prep | 19072 | 10B2 | y | 96
S09.015 | prolyl-oligopeptidase 2 | PREP2 | 9581 | 2p21 | Prep2 | 213760 | 17E4 | y | 94
S09.003 | dipeptidyl-peptidase 4 | DPP4 | 1803 | 2q24 | CD26 | 13482 | 2C3 | y | 85
S09.973np | dipeptidyl-peptidase 6 | DPP6 | 1804 | 7q36 | Dpp6 | 13483 | 5A3 | y | 91
S09.018 | dipeptidyl-peptidase 8 | DPP8 | 54878 | 15q23 | Dpp8 | 74388 | 9D | y | 95
S09.019 | dipeptidyl-peptidase 9 | DPP9 | 91039 | 19p13 | Dpp9 | 224897 | 17D | y | 89
S09.974np | dipeptidyl-peptidase 10 | DPP10 | 57628 | 2q14 | Dpp10 | 269109 | 1E2 | y | 88
S09.007 | seprase | FAP | 2191 | 2q24 | Fap | 14089 | 2C3 | y | 90
S09.004 | acylaminoacyl-peptidase | APEH | 327 | 3p21 | Apeh | 235606 | 9F2 | y | 91
S09.055 | CGI-67 protein | CGI-67 | 51104 | 9q21 | Cgi-67 | BN000127 | 19C1 | y | 98
S09.052 | CGI-67-like protease-1 | CGI-67L1 | 81926 | 19p13 | Cgi-67l1 | 216169 | 10C1 | y | 93
S09.053 | CGI-67-like protease-2 | CGI-67L2 | 58489 | 15q25 | Cgi-67l2 | 70178 | 7D3 | y | 97
S09.051 | BEM46-like 1 | BEM46L1 | 84945 | 13q33 | Bem46l1 | 68904 | 8A2 | y | 97
S09.054 | BEM46-like 2 | BEM46L2 | 26090 | 20p11 | Bem46l2 | 76192 | 2H1 | y | 90
S09.xxx | BEM46-like 3 | BEM46L3 | BG74273 | 14q22 | Bem46l3 | 278594 | 12C3 | y | 78
S10.002 | lysosomal carboxypeptidase A | PPGB | 5476 | 20q13 | Ppgb | 19025 | 2H3 | y | 87
S10.003 | vitellogenic carboxypeptidase-L | CPVL | 54504 | 7p15 | Cpvl | 71287 | 6B3 | y | 76
S10.013 | serine carboxypeptidase 1 | RISC | 59342 | 17q23 | Risc | 74617 | 11C | y | 82
S12.004 | β-lactamase | LACTB | 114294 | 15q22 | Lactb | 80907 | 9D | y | 85
S14.003 | endopeptidase Clp | CLPP | 8192 | 19p13 | Clpp | 53895 | 17E1 | y | 87
S16.002 | PIM1 endopeptidase | PRSS15 | 9361 | 19p13 | Prss15 | 74142 | 17E1 | y | 88
S16.006 | PIM2 endopeptidase | PIM2 | 83752 | 16q21 | Pim2 | 66887 | 8C4 | y | 95
S26.009 | signalase 18 kDa component | SPC18 | 23478 | 15q25 | Spc18 | 56529 | 7D2 | y | 98
S26.010 | signalase 21 kDa component | SPC21 | 90701 | 18q21 | Spc21 | 66286 | 18E1 | y | 98
S26.xxx | signalase-like 1 | SPCL1 | 158326 | 9p22 | Spcl1 | 230344 | 4C3 | y | 76
S26.012 | mitochondrial inner membrane protease 2 | IMMP2L | 83943 | 7q31 | Immp2l | 93757 | 12B3 | y | 90
S26.013 | mitochondrial signal peptidase | IMMP1 | 196294 | 11p13 | Immp1 | 66541 | 2E3 | y | 95
S26.xxx | lactotransferrin | LTF | 4057 | 3p21 | Ltf | 17002 | 9F2 | y | 70
S28.001 | lysosomal Pro-X carboxypeptidase | PRCP | 5547 | 11q14 | Prcp | 72461 | 7E2 | y | 77
S28.002 | dipeptidyl-peptidase II | DPP7 | 29952 | (9q24) | Dpp7 | 83768 | 2A3 | y | 80
S28.003 | thymus-specific serine peptidase | PRSS16 | 10279 | 6p21 | Prss16 | 54373 | 13A3 | y | 79
S33.009 | αβ-hydrolase domain-containing 4 | ABHD4 | 63874 | 14q11 | Abhd4 | 105501 | 14C1 | y | 96
S33.971np | epoxide hydrolase | EPHX1 | 2052 | 1q42 | Ephx1 | 13849 | 1H4 | y | 83
S33.972np | mesoderm-specific transcript homologue | MEST | 4232 | 7q32 | Mest | 17294 | 6A3 | y | 97
S33.974np | epoxide hydrolase-related protein | EPHXRP | 253152 | 1p22 | Ephxrp | 243192 | 5E | y | 87
S33.xxxnp | CGI-58 | CGI-58 | 51099 | 3p21 | Cgi-58 | 67469 | 9F4 | y | 94
S53.003 | tripeptidyl-peptidase I | CLN2 | 1200 | 11p15 | Cln2 | 12751 | 7F1 | y | 88
S54.005 | rhomboid-like protein 1 | RHBDL | 9028 | 16p13 | Rhbdl | 214951 | 17B1 | y | 97
S54.002 | rhomboid-like protein 2 | RHBDL2 | 54933 | 1p34 | Rhbdl2 | 230727 | 4D1 | y | 89
S54.006 | rhomboid-like protein 4 | RHBDL4 | 162494 | 17q11 | Rhbdl4 | 246104 | 11B5 | y | 95
S54.xxx | rhomboid-like protein 5 | RHBDL5 | 84236 | 2q36 | Rhbdl5 | 76867 | 1C5 | y | 80
S54.953np | rhomboid-like protein 6 | RHBDL6 | 79651 | 17q25 | Rhbdl6 | 276799 | 11E2 | y | 93
S54.xxxnp | rhomboid-like protein 7 | RHBDL7 | AC005067 | 7q11 | Rhbdl7 | 215160 | 5G1 | y | 88
S54.xxx | presenilin-associated rhomboid-like | PARL | 55486 | 3q27 | Parl | 208159 | 16B1 | y | 80
S54.952np | EGF receptor-related sequence | EGFR-RS | 64285 | 16p13 | Egfr-rs | 13650 | 11A5 | y | 95
Sx1.xxx | reelin | RELN | 5649 | 7q22 | Reln | 19699 | 5A3 | y | 95
Sx2.xxx | tumor rejection antigen (gp96) | TRA1 | 7184 | 12q23 | Tra1 | 22027 | 10C2 | y | 97
Sx2.xxxnp | heat shock 90kDa protein 1, α | HSPCA | 3320 | 14q32 | Hspca | 15519 | 12F2 | y | 99
Sx2.xxxnp | heat shock 90kDa protein 1, β | HSPCB | 3326 | 6p21 | Hsp84-1 | 15516 | 17C | y | 98
Sx2.xxxnp | heat shock protein 75 | TRAP1 | 10131 | 16p13 | Trap1 | 68015 | 16A1 | y | 88
Most of these belong to the S01 family, but the human and mouse degradomes also contain representatives of 13 further serine protease families. All differences between human and mouse serine proteases correspond to changes in members of this densely populated family. The kallikrein subfamily is greatly expanded in mouse, which has 28 members compared with 15 in human. The genes for mastin, implantation serine protease-2 (ISP-2), intestinal serine protease DISP-1, and testis serine proteases TESP-2 and -3 are inactivated in human, hence their classification as pseudogenes. The absence of human genes for DISP-2, ISP-1 and TESP-1, together with the finding that human DISP-1, ISP-2, TESP-2 and TESP-3 are pseudogenes, suggests that the functions performed by ISP, DISP and TESP proteases might be mouse-specific. We have also annotated several new members of the testis-specific serine protease (TESSP) subfamily, of which TESSP-3, -4 and -6 are pseudogenes in human but active genes in mouse. Mast-cell proteases (Mcpt), granzymes (Gzm), trypsins and human airway trypsin-like (HAT-like) proteases are expanded in mouse, whereas two tryptases, an ovochymase-like protease and a form of pancreatic elastase are present only in human. Two well-known non-protease homologues, apolipoprotein (a) (LPA) and haptoglobin-related protein, are absent in mouse. Further characteristic features of the mouse degradome include the duplication of complement factors C1r and C1s, an extra functional member of the plasma-kallikrein-like subfamily (Klkbl3), and a non-protease homologue called Clsp (chymase-like serine protease).
We have included in the catalogue of serine proteases a series of proteins, such as lactoferrin, reelin and tumour rejection antigen (gp96), that have recently been reported to have proteolytic activity (refs 27–29). On the basis of structural analysis, lactoferrin has been tentatively classified as a member of the S26 family of serine proteases, whereas reelin, gp96 and their close relatives have been preliminarily assigned to two Sx families of presently unclassified serine proteases. Gene Ontology annotation of the human proteome also predicts a series of serine proteases with minimal relationship to other members of this class of enzymes. These include torsin, NSP (novel serine protease) and Ufd1L (ubiquitin fusion degradation protein 1 homologue). Because there is insufficient evidence to classify them as serine proteases, they have not been included in the present version of the human and mouse degradomes.
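The comparison in the footnote above is a simple classification over paired human and mouse gene columns. The following minimal Python sketch is an illustration only and not part of the original analysis. It reads a leading "#" on a human symbol as a pseudogene marker; this is an assumption, although it is consistent with the footnote's list of human pseudogenes. Each member is then labelled as conserved or mouse-specific. The `PAIRS` list copies the TESSP, TESP, ISP and DISP rows, and `is_active` and `classify` are hypothetical helper names.

```python
from collections import Counter
from typing import Optional

# (member, human symbol or None, mouse symbol or None), copied from the
# TESSP, TESP, ISP and DISP rows. Assumption: "#" marks a human pseudogene.
PAIRS = [
    ("TSP50", "TSP50", "Tsp50"),
    ("TESSP-2", "TESSP2", "Tessp2"),
    ("TESSP-3", "#TESSP3", "Tessp3"),
    ("TESSP-4", "#TESSP4", "Tessp4"),
    ("TESSP-5", "TESSP5", "Tessp5"),
    ("TESSP-6", "#TESSP6", "Tessp6"),
    ("TESP-1", None, "Tesp1"),
    ("TESP-2", "#TESP2", "Tesp2"),
    ("TESP-3", "#TESP3", "Tesp3"),
    ("ISP-1", None, "Isp1"),
    ("ISP-2", "#ISP2", "Isp2"),
    ("DISP-1", "#DISP", "Disp"),
    ("DISP-2", None, "Disp2"),
]


def is_active(symbol: Optional[str]) -> bool:
    """A gene counts as active if it exists and is not flagged with '#'."""
    return symbol is not None and not symbol.startswith("#")


def classify(human: Optional[str], mouse: Optional[str]) -> str:
    """Label one human/mouse pair by where a functional gene survives."""
    h, m = is_active(human), is_active(mouse)
    if h and m:
        return "conserved"
    if m:
        return "mouse-specific"
    if h:
        return "human-specific"
    return "inactive in both"


summary = Counter(classify(h, m) for _, h, m in PAIRS)
print(summary)  # Counter({'mouse-specific': 10, 'conserved': 3})
```

Only TSP50, TESSP-2 and TESSP-5 remain functional in both species. Every other member of these subfamilies is active only in mouse, which is the pattern the footnote describes.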
Table S5 | Threonine proteases
Code | Peptidase | Human gene | Human LocusLink | Human locus | Mouse gene | Mouse LocusLink | Mouse locus | Syntenic | Identity (%)
T01.010 | proteasome catalytic subunit 1 | PSMB6 | 5694 | 17p13 | Psmb6 | 19175 | 11B4 | y | 97
T01.011 | proteasome catalytic subunit 2 | PSMB7 | 5695 | 9q33 | Psmb7 | 19177 | 2B | y | 96
T01.012 | proteasome catalytic subunit 3 | PSMB5 | 5693 | 14q11 | Psmb5 | 19173 | 14C1 | y | 93
T01.013 | proteasome catalytic subunit 1i | PSMB9 | 5698 | 6p21 | Psmb9 | 16912 | 17B2 | y | 88
T01.014 | proteasome catalytic subunit 2i | PSMB10 | 5699 | 16q23 | Psmb10 | 19171 | 8D2 | y | 88
T01.015 | proteasome catalytic subunit 3i | PSMB8 | 5696 | 6p21 | Psmb8 | 16913 | 17B2 | y | 79
T01.016 | proteasome β-subunit LMP7-like | LMP7L | 122706 | 14q11 | Lmp7l | 73902 | 14C1 | y | 84
T01.986np | proteasome β-1 subunit | PSMB1 | 5689 | 6q27 | Psmb1 | 19170 | 17A2 | y | 93
T01.984np | proteasome β-2 subunit | PSMB2 | 5690 | 1p34 | Psmb2 | 26445 | 4D2 | y | 96
T01.983np | proteasome β-3 subunit | PSMB3 | 5691 | 17q12 | Psmb3 | 26446 | 11D | y | 98
T01.987np | proteasome β-4 subunit | PSMB4 | 5692 | 1q21 | Psmb4 | 19172 | 3F2 | y | 93
T01.976np | proteasome α-1 subunit | PSMA1 | 5682 | 11p15 | Psma1 | 26440 | 7F2 | y | 98
T01.972np | proteasome α-2 subunit | PSMA2 | 5683 | 7p14 | Psma2 | 19166 | 13A2 | y | 99
T01.977np | proteasome α-3 subunit | PSMA3 | 5684 | 14q23 | Psma3 | 19167 | 12C3 | y | 99
T01.973np | proteasome α-4 subunit | PSMA4 | 5685 | 15q24 | Psma4 | 26441 | 9C | y | 99
T01.975np | proteasome α-5 subunit | PSMA5 | 5686 | 1p13 | Psma5 | 26442 | 3F3 | y | 99
T01.971np | proteasome α-6 subunit | PSMA6 | 5687 | 14q13 | Psma6 | 26443 | 12C1 | y | 99
T01.974np | proteasome α-7 subunit | PSMA7 | 5688 | 20q13 | Psma7 | 26444 | 2H4 | y | 99
T01.978np | proteasome α-8 subunit | PSMA8 | 143471 | 18q11 | Psma8 | 73677 | 18A2 | y | 95
T02.001 | glycosylasparaginase | AGA | 175 | 4q34 | Aga | 11593 | 8B3 | y | 82
T02.003 | glycosylasparaginase-2 | ASRGL1 | 80150 | 11q12 | Asrgl1 | 66514 | 19A | y | 77
T02.004 | glycosylasparaginase-3 | AGA3 | 55617 | 20p12 | Aga3 | 75812 | 2G2 | y | 94
T03.006 | γ-glutamyltransferase 1 | GGT1 | 2678 | 22q11 | Ggtp | 14598 | 10B5 | y | 79
T03.017 | γ-glutamyltransferase-like 3 | GGTL3 | 2686 | 20q11 | Ggtl3 | 207182 | 2H2 | y | 95
T03.015 | γ-glutamyltransferase 2 | GGT2 | 2679 | 22q11 | – | – | – | – | –
T03.016 | γ-glutamyltransferase m-3 | GGTL4 | 91227 | 22q11 | – | – | – | – | –
T03.002 | γ-glutamyltransferase 5 | GGTLA1 | 220522 | 22q11 | – | – | – | – | –
The threonine proteases, the most recently identified catalytic class of proteases (ref. 30), are classified into three families: T01, containing the proteasome components; T02, composed of three distinct glycosylasparaginases; and T03, including diverse γ-glutamyltransferases (GGTs). All members of the T01 and T02 families are conserved between human and mouse. There are, however, differences in the number of GGT genes clustered in a region of human chromosome 22 (22q11) that has undergone successive duplications (ref. 31). As a consequence of this dynamic evolution, there are four GGT genes in this region of the human genome but only one in the corresponding region of the mouse genome (10B5). An additional GGT gene, GGTL3, located at 20q11, is conserved in the mouse genome at an equivalent position (2H2).
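The GGT gene counts above can be checked with a minimal Python sketch, offered as an illustration rather than the authors' method. It tallies the T03 rows of Table S5 by chromosomal region. One step is an assumption: pairing mouse Ggtp (10B5) with human GGT1 follows the row order of the extracted table. `GGT_ROWS` is a hypothetical name.

```python
from collections import Counter

# T03 rows from Table S5: (human gene, human locus, mouse gene or None,
# mouse locus or None). Assumption: Ggtp is paired with GGT1 by row order.
GGT_ROWS = [
    ("GGT1", "22q11", "Ggtp", "10B5"),
    ("GGTL3", "20q11", "Ggtl3", "2H2"),
    ("GGT2", "22q11", None, None),
    ("GGTL4", "22q11", None, None),
    ("GGTLA1", "22q11", None, None),
]

# Count human genes per cytogenetic band.
human_by_locus = Counter(h_locus for _, h_locus, _, _ in GGT_ROWS)
# Count only genes that have a mouse orthologue, grouped by mouse band.
mouse_by_locus = Counter(
    m_locus for _, _, m_gene, m_locus in GGT_ROWS if m_gene is not None
)

print(human_by_locus)  # Counter({'22q11': 4, '20q11': 1})
print(mouse_by_locus)  # Counter({'10B5': 1, '2H2': 1})
```

The output shows four human genes at 22q11 against a single mouse gene at 10B5, while the 20q11/2H2 pair is conserved.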
1. Evers, M. P. et al. Nucleotide sequence comparison of five human pepsinogen A (PGA) genes: evolution of the PGA multigene family. Genomics 4, 232–239 (1989).
2. Taggart, R. T., Mohandas, T. K., Shows, T. B. & Bell, G. I. Variable numbers of pepsinogen genes are located in the centromeric region of human chromosome 11 and determine the high-frequency electrophoretic polymorphism. Proc. Natl Acad. Sci. USA 82, 6240–6244 (1985).
3. Chen, X., Rosenfeld, C. S., Roberts, R. M. & Green, J. A. An aspartic proteinase expressed in the yolk sac and neonatal stomach of the mouse. Biol. Reprod. 65, 1092–1101 (2001).
4. Ord, T., Kolmer, M., Villems, R. & Saarma, M. Structure of the human genomic region homologous to the bovine prochymosin-encoding gene. Gene 91, 241–246 (1990).
5. Krylov, D. M. & Koonin, E. V. A novel family of predicted retroviral-like aspartyl proteases with a possible key role in eukaryotic cell cycle control. Curr. Biol. 11, 584 (2001).
6. Turner, G. et al. Insertional polymorphisms of full-length endogenous retroviruses in humans. Curr. Biol. 11, 1531–1535 (2001).
7. Caputo, E., Manco, G., Mandrich, L. & Guardiola, J. A novel aspartyl proteinase from apocrine epithelia and breast tumors. J. Biol. Chem. 275, 7935–7941 (2000).
8. Lee, J. J. et al. Autoproteolysis in hedgehog protein biogenesis. Science 266, 1528–1537 (1994).
9. Gondo, Y. et al. Human megasatellite DNA RS447: copy-number polymorphisms and interspecies conservation. Genomics 54, 39–49 (1998).
10. Okada, T. et al. Unstable transmission of the RS447 human megasatellite tandem repetitive sequence that contains the USP17 deubiquitinating enzyme gene. Hum. Genet. 110, 302–313 (2002).
11. Zhu, Y., Carroll, M., Papa, F. R., Hochstrasser, M. & D'Andrea, A. D. DUB-1, a deubiquitinating enzyme with growth-suppressing activity. Proc. Natl Acad. Sci. USA 93, 3275–3279 (1996).
12. Zhu, Y. et al. DUB-2 is a member of a novel family of cytokine-inducible deubiquitinating enzymes. J. Biol. Chem. 272, 51–57 (1997).
13. Baek, K. H., Mondoux, M. A., Jaster, R., Fire-Levin, E. & D'Andrea, A. D. DUB-2A, a new member of the DUB subfamily of hematopoietic deubiquitinating enzymes. Blood 98, 636–642 (2001).
14. Evans, P. C. et al. A novel type of deubiquitinating enzyme. J. Biol. Chem. (in the press).
15. Balakirev, M. Y., Tcherniuk, S. O., Jaquinod, M. & Chroboczek, J. Otubains: a new family of cysteine proteases in the ubiquitin pathway. EMBO Rep. 4, 517–522 (2003).
16. Aravind, L. & Koonin, E. V. Classification of the caspase-hemoglobinase fold: detection of new families and implications for the origin of the eukaryotic separins. Proteins 46, 355–367 (2002).
17. Gururajan, R. et al. Duplication of a genomic region containing the Cdc2L1-2 and MMP21-22 genes on human chromosome 1p36.3 and their linkage to D1Z2. Genome Res. 8, 929–939 (1998).
18. Bertenshaw, G. P., Norcum, M. T. & Bond, J. S. Structure of homo- and hetero-oligomeric meprin metalloproteases: dimers, tetramers, and high molecular mass multimers. J. Biol. Chem. 278, 2522–2532 (2003).
19. Seals, D. F. & Courtneidge, S. A. The ADAMs family of metalloproteases: multidomain proteins with multiple functions. Genes Dev. 17, 7–30 (2003).
20. Wei, S. et al. Identification and characterization of three members of the human metallocarboxypeptidase gene family. J. Biol. Chem. 277, 14954–14964 (2002).
21. Verma, R. et al. Role of Rpn11 metalloprotease in deubiquitination and degradation by the 26S proteasome. Science 298, 611–615 (2002).
22. Yao, T. & Cohen, R. E. A cryptic protease couples deubiquitination and degradation by the proteasome. Nature 419, 403–407 (2002).
23. Cadiñanos, J. et al. Identification, functional expression and enzymatic analysis of two distinct CaaX proteases from Caenorhabditis elegans. Biochem. J. 370, 1047–1054 (2003).
24. Pei, J. & Grishin, N. V. Type II CAAX prenyl endopeptidases belong to a novel superfamily of putative membrane-bound metalloproteases. Trends Biochem. Sci. 26, 275–277 (2001).
25. Biagini, A. & Puigserver, A. Sequence analysis of the aminoacylase-1 family: a new proposed signature for metalloexopeptidases. Comp. Biochem. Physiol. B 128, 469–481 (2001).
26. Makarova, K. S. & Grishin, N. V. The Zn-peptidase superfamily: functional convergence after evolutionary divergence. J. Mol. Biol. 292, 11–17 (1999).
27. Hendrixson, D. R. et al. Human milk lactoferrin is a serine protease that cleaves Haemophilus surface proteins at arginine-rich sites. Mol. Microbiol. 47, 607–617 (2003).
28. Quattrocchi, C. C. et al. Reelin is a serine protease of the extracellular matrix. J. Biol. Chem. 277, 303–309 (2002).
29. Menoret, A., Li, Z., Niswonger, M. L., Altmeyer, A. & Srivastava, P. K. An endoplasmic reticulum protein implicated in chaperoning peptides to major histocompatibility of class I is an aminopeptidase. J. Biol. Chem. 276, 33313–33318 (2001).
30. Seemüller, E. et al. Proteasome from Thermoplasma acidophilum: a threonine protease. Science 268, 579–582 (1995).
31. Courtay, C., Heisterkamp, N., Siest, G. & Groffen, J. Expression of multiple γ-glutamyltransferase genes in man. Biochem. J. 297, 503–508 (1994).