REVIEWS

HUMAN AND MOUSE PROTEASES: A COMPARATIVE GENOMIC APPROACH

Xose S. Puente*, Luis M. Sánchez*, Christopher M. Overall‡ and Carlos López-Otín*

The availability of the human and mouse genome sequences has allowed the identification and comparison of their respective degradomes, the complete repertoires of proteases that are produced by these organisms. Because of the essential roles of proteolytic enzymes in the control of cell behaviour, survival and death, degradome analysis provides a useful framework for the global exploration of these protease-mediated functions in normal and pathological conditions.

PROTEASOME An intracellular protein complex that is responsible for degrading intracellular proteins that have been tagged for destruction by ubiquitin.
NUCLEOPHILE A chemical group that can donate a pair of electrons in a chemical reaction.

*Departamento de Bioquímica y Biología Molecular, Facultad de Medicina, Instituto Universitario de Oncología, Universidad de Oviedo, 33006–Oviedo, Spain.
‡Departments of Biochemistry and Molecular Biology, and Oral Biological and Medical Sciences, C.I.H.R. Group in Matrix Dynamics, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada.
Correspondence to C.L.-O. e-mail: [email protected]
NATURE REVIEWS | GENETICS | VOLUME 4 | JULY 2003 | www.nature.com/reviews/genetics | © 2003 Nature Publishing Group
doi:10.1038/nrg1111

Proteases perform essential functions in all living organisms. They were initially recognized as gastric juice proteolytic enzymes that were involved in the nonspecific degradation of dietary proteins. However, recent advances have provided a new view of the proteolytic world. As well as mediating nonspecific protein hydrolysis, proteases also act as processing enzymes that perform highly selective, limited and efficient cleavage of specific substrates, which initiates irreversible decisions at the post-translational level that influence many biological processes.
Proteolytic processing events are fundamental in ovulation, fertilization, embryonic development, bone formation, control of homeostatic tissue remodelling, neuronal outgrowth, antigen presentation, cell-cycle regulation, immune and inflammatory cell migration and activation, wound healing, angiogenesis and apoptosis1,2. Accordingly, alterations in the structure and expression patterns of proteases underlie many human pathological processes, including cancer, arthritis, osteoporosis, neurodegenerative disorders and cardiovascular diseases1–6. This impressive diversity in protease functions derives from the evolutionary invention of enzymes with structural designs that range from simple catalytic devices with a minimal domain organization7, through giant proteases such as tripeptidyl peptidase II (REF. 8), to precisely engineered protein-degradation machines such as the PROTEASOME9. In terms of specificity, some proteases show high fidelity, with exquisite specificity for a unique target protein, whereas others are clearly promiscuous, with indiscriminate degradative activity against many substrates. Proteases also use distinct strategies to define their intra- or extracellular localization, and in many cases act within complex networks that comprise distinct proteases, substrates, inhibitors, receptors and binding proteins.

The availability of the human and mouse euchromatic genomic sequences has raised the possibility of addressing global questions about the degradome: the complete set of proteases that are expressed at a specific moment or circumstance by a cell, tissue or organism10. In this review, we present a comparative analysis of the human and mouse degradomes and discuss the potential importance of this global study for understanding and treating the growing group of pathological conditions that involve abnormal or deficient protease function.
The human degradome: 553 and counting

On the basis of the mechanism of catalysis, proteases are classified into five distinct classes: aspartic, metallo, cysteine, serine and threonine proteases. Proteases in the first two classes use an activated water molecule as a NUCLEOPHILE to attack the peptide bond of the substrate, whereas in the remaining classes the nucleophile is a catalytic amino-acid residue (Cys, Ser or Thr, respectively) that is located in the active site and from which the class names derive. The different classes can be further divided into families on the basis of amino-acid sequence comparison, and these families can be assembled into clans on the basis of similarities in their three-dimensional (3D) folding11. By using primary information that was retrieved from public and private sequencing projects12,13, combined with data from the MEROPS, InterPro and Ensembl databases (BOX 1), and our own experimental data, we have annotated a total of 553 genes that encode proteases or protease homologues in the human genome (TABLE 1; BOX 2). Ninety-three of these proteins seem to be catalytically inactive owing to substitutions of specific amino-acid residues in critical active-site regions (TABLE 1). These inactive homologues are abundant in some protease families and might have important roles as regulatory or inhibitory molecules: they could act as dominant negatives by binding substrates through the inactive catalytic or EXOSITE ancillary domains in nonproductive complexes, or titrate inhibitors from the milieu to increase the net proteolytic activity10. We have identified more than 200 protease pseudogenes in the human genome, including processed pseudogenes that have arisen by RETROTRANSPOSITION, and non-processed pseudogenes that have resulted from duplication and the accumulation of frameshifts and stop codons in the duplicated gene. Also, more than 150 sequences are related to aspartic proteases that are embedded in endogenous retroviral elements, but they have not been included in the catalogue of human proteases (see Supplementary Tables S1–S5 online). The most recent release of the MEROPS database (6.2; 24 March 2003) annotates 548 entries for human proteases, although it includes a number of pseudogenes.

HIDDEN MARKOV MODEL (HMM). A probabilistic model that is applied to protein and DNA sequence pattern recognition. HMMs represent a system as a set of discrete states and as transitions between those states. Each transition has an associated probability. HMMs are valuable because they allow a search or alignment algorithm to be built on firm probabilistic bases, and the parameters (transition probabilities) can be easily trained on a known data set.
ORTHOLOGUES Homologous genes that have originated as a result of a speciation event.
PARALOGUES Homologous genes that have originated as a result of a duplication event.
SYNTENY Gene loci on the same chromosome. This term is often used to refer to gene loci in different organisms that are located on a chromosomal region of common evolutionary ancestry.
MEROPS A database that provides a comprehensive catalogue and structure-based classification of proteases and inhibitors from a range of organisms.
EXOSITE A substrate-binding site that lies outside the catalytic domain of a protease and is located on specialized substrate-binding modules or domains.
RETROTRANSPOSITION The incorporation of DNA segments in a genome through a reverse-transcription-mediated mechanism.
AUTOPHAGY A nutritionally and developmentally regulated process that is involved in the intracellular destruction of endogenous proteins and the removal of damaged organelles.

BOX 1 | Using TBLASTN, InterPro and hidden Markov models to identify proteases

Human and mouse genomic sequences were analysed for the presence of unidentified proteases using TBLASTN at the Ensembl genome browser. Each available protease sequence was used to query both genomes, and all hits with P < 10⁻² were analysed using the FASTA program against a custom degradome database. Because ~23% of human and 34% of mouse protease genes are organized in clusters, a 500-kb genomic sequence flanking each protease locus was analysed for the presence of further members of that family. Also, InterPro annotations of the public human and mouse genomes were used to identify putative new members of known families. Ensembl predictions containing InterPro protease motifs were manually inspected to distinguish between true proteases, pseudogenes and false positives. We also built a HIDDEN MARKOV MODEL (HMM) for each of the 63 different protease families that were present in human and mouse. Further HMMs for protease families that were not described in mammals were obtained from the Pfam database and used to screen protein predictions from Ensembl (Releases 12.31.1 and 12.3.1), FANTOM 2.1 DB and RefSeq (version 10 February 2003). The combined application of these strategies led to the identification of 72 human and 137 mouse putative proteases that were not present in MEROPS, although many of these were already present as gene predictions in the National Center for Biotechnology Information (NCBI) databases. Fifteen of the 72 human proteases, and 63 of the 137 mouse proteases, correspond to ORTHOLOGUES of previously known proteases in one of these organisms. Twenty-eight known mouse genes were found to be orthologues of known human proteases, despite being previously classified as PARALOGUES. Orthology assignment was based on four different criteria: SYNTENY, amino-acid sequence identity (>70%), functional conservation and relevant supporting literature.
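Box 1 describes two mechanical filtering steps: keeping TBLASTN hits with P < 10⁻², and rescanning the genomic neighbourhood of each protease locus because many protease genes are clustered. The Python sketch below illustrates only those two steps and assumes hits have already been parsed into simple records. The `Hit` record, the function names and the reading of "500-kb flanking sequence" as 500 kb on each side of the locus are illustrative assumptions, not the authors' pipeline, which used Ensembl TBLASTN, FASTA and HMM searches.

```python
from dataclasses import dataclass

# P cutoff taken from Box 1. Treating the 500-kb window as 500 kb on EACH side
# of the locus is an assumption made for this sketch.
P_CUTOFF = 1e-2
FLANK_BP = 500_000


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one TBLASTN hit: query protease, genomic span, P value."""
    query: str
    chrom: str
    start: int
    end: int
    p_value: float


def significant_hits(hits, cutoff=P_CUTOFF):
    """Keep hits whose P value is strictly below the cutoff (P < 10^-2 in Box 1)."""
    return [h for h in hits if h.p_value < cutoff]


def flanking_window(locus, flank=FLANK_BP):
    """Return (chrom, lo, hi) covering `flank` bp either side of the locus, clipped at 0."""
    return locus.chrom, max(0, locus.start - flank), locus.end + flank


def neighbours_in_window(locus, candidates, flank=FLANK_BP):
    """Other candidate hits on the same chromosome that overlap the locus's flanking window."""
    chrom, lo, hi = flanking_window(locus, flank)
    return [
        c for c in candidates
        if c != locus and c.chrom == chrom and c.start <= hi and c.end >= lo
    ]


if __name__ == "__main__":
    # Toy coordinates for illustration only; they are not real gene positions.
    hits = [
        Hit("protease_A", "11", 2_000_000, 2_010_000, 1e-30),
        Hit("protease_A", "11", 2_300_000, 2_305_000, 1e-8),
        Hit("protease_A", "7", 5_000_000, 5_002_000, 0.5),
    ]
    kept = significant_hits(hits)
    print(len(kept))                            # 2
    print(neighbours_in_window(kept[0], kept))  # the chr11 hit ~300 kb away
```

In practice each neighbour found this way would still need the family-level checks described in Box 1 (FASTA comparison, InterPro motifs, manual inspection) before being called a new protease gene.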
Seventy-two human proteases that are included in our catalogue are absent from MEROPS. The phylogenetic tree that is shown in FIG. 1 reflects the distribution of the 553 annotated human proteases and homologues in the different catalytic classes. Metalloproteases and serine proteases are the most densely populated groups, with 186 and 176 members, respectively, followed by 143 cysteine proteases. Threonine and aspartic proteases are highly specialized and are therefore less numerous, with 27 and 21 members, respectively. Following the family classification that is used in the MEROPS database, we conclude that the human proteases belong to 63 different families, the largest being the 01 family of serine proteases. Other families with many representatives are the ubiquitin-specific proteases (USPs) and the disintegrin and metalloproteases (ADAMs), whereas several families have a single member in the human genome (Supplementary Tables S1–S5 online). Interestingly, 125 (22%) of the catalogued proteases are membrane-bound proteins, which emphasizes the relevance of the proteolytic processes that take place at this cellular interface14. The annotated human degradome is likely to continue to grow in the near future (BOX 3) as new enzymes with unusual structures and catalytic mechanisms are identified and characterized. Recent experimental work has unmasked some of these ‘hidden proteases’, including: JAMM proteins, which are a new family of deubiquitylating metalloproteases15; rhomboid proteins, which are atypical serine proteases that were first described in Drosophila16; autophagins, which are a family of cysteine proteases that are involved in cell death by AUTOPHAGY17; and signal-peptide peptidases, which are aspartic proteases that catalyse the intramembrane proteolysis of signal peptides18. Koonin et al.
have also described new superfamilies of predicted cysteine and aspartic proteases that include several human paralogues19,20. Nevertheless, some newly identified proteases remain predictions without experimental evidence for proteolytic activity, and further work will be necessary to establish their enzymatic properties.

The mouse degradome: increased complexity

The recent availability of the first version of the mouse genome sequence21 will accelerate the structural and functional characterization of the human degradome. The identification of mouse orthologues of human protease genes provides essential information to perform evolutionary studies, identify regulatory elements and create knockout and knock-in models that are useful to examine protease functions in vivo. Although the mouse genome is ~14% smaller than the euchromatic human genome, its degradome is larger: it contains 628 proteases and protease homologues. The distribution of proteases in the different classes is shown in FIG. 1 and TABLE 1. Note that sequences that are related to proteases of endogenous retroviruses and the identified protease pseudogenes were not included in our list of mouse proteases (Supplementary Tables S1–S5 online).

PARALOGONS Chromosomal regions that contain groups of paralogous genes in the same order, which have presumably arisen by the duplication of large genomic fragments.

Table 1 | The human and mouse protease genes and pseudogenes
Human/mouse | Total number | Number of proteases/protease homologues | Aspartic | Cysteine | Metallo | Serine | Threonine
Human proteases | 553 | 460/93 | 21 | 143 | 186 | 176 | 27
Mouse proteases | 628 | 525/103 | 27 | 153 | 197 | 227 | 24
Orthologous pairs | 514 | 429/85 | 21 | 127 | 176 | 166 | 24
Human specific | 35 | 28/7 | 0 | 15 | 8 | 9 | 3
Mouse specific | 85 | 73/12 | 5 | 25 | 12 | 43 | 0
Human gene/mouse pseudogene | 4 | 3/1 | 0 | 1 | 2 | 1 | 0
Mouse gene/human pseudogene | 29 | 22/7 | 1 | 1 | 9 | 18 | 0
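The rows of Table 1 appear to partition each degradome, so the table can be checked arithmetically. The Python sketch below transcribes the per-class counts and verifies that orthologous pairs plus lineage-specific genes plus gene/pseudogene pairs reproduce the human (553) and mouse (628) totals, and that orthologous pairs account for about 82% of the mouse set. The dictionary keys and function names are labels chosen here for illustration.

```python
# Per-class counts transcribed from Table 1, in the order
# (aspartic, cysteine, metallo, serine, threonine). Key names are illustrative.
TABLE1 = {
    "human": (21, 143, 186, 176, 27),
    "mouse": (27, 153, 197, 227, 24),
    "orthologous": (21, 127, 176, 166, 24),
    "human_specific": (0, 15, 8, 9, 3),
    "mouse_specific": (5, 25, 12, 43, 0),
    "human_gene_mouse_pseudogene": (0, 1, 2, 1, 0),
    "mouse_gene_human_pseudogene": (1, 1, 9, 18, 0),
}


def column_sums(*rows):
    """Element-wise sum of per-class count tuples."""
    return tuple(sum(values) for values in zip(*rows))


def rows_partition_totals(table):
    """Check that shared + lineage-specific + gene/pseudogene rows add up to each species."""
    human = column_sums(table["orthologous"], table["human_specific"],
                        table["human_gene_mouse_pseudogene"])
    mouse = column_sums(table["orthologous"], table["mouse_specific"],
                        table["mouse_gene_human_pseudogene"])
    return human == table["human"] and mouse == table["mouse"]


def orthologue_fraction(table, species="mouse"):
    """Fraction of a species' proteases that belong to an orthologous pair."""
    return sum(table["orthologous"]) / sum(table[species])


if __name__ == "__main__":
    print(sum(TABLE1["human"]), sum(TABLE1["mouse"]))  # 553 628
    print(rows_partition_totals(TABLE1))                # True
    print(f"{orthologue_fraction(TABLE1):.0%}")         # 82%
```

The check only covers the per-class columns; the proteases/homologues column is not included in this sketch.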
We include 138 putative mouse proteases and homologues that were absent from the last release of the MEROPS database. Comparative analysis of the human and mouse degradomes indicates that a high percentage (82%) of mouse genes have a strict orthologue in the human genome. The assignation of orthology was mainly based on sequence identities (mean 83%) and location in regions of conserved synteny. In some cases, especially in protease clusters that contain several paralogous genes and pseudogenes, it has been difficult to decide which mouse sequences are bona fide orthologues of the corresponding human genes. There might also be orthologues that are difficult to recognize because of their rapid evolution in one or both lineages22 (low degree of identity, ~50%), such as MMP1 and McolA23. Nevertheless, beyond these difficulties in orthology assignment, there are clear examples of protease genes that are unique to the human or mouse lineage. We could not find a human counterpart for 85 of the 628 mouse genes analysed. Similarly, 35 genes seem to be specific to the human lineage. In principle, these differences might result from specific deletion events or the creation of new protease genes in one of the lineages, although the possibility that the orthologues of some lineage-specific genes are missing owing to the incompleteness of the available genome sequences cannot be ruled out. Nevertheless, detailed analysis of these differences indicates that most derive from changes in the number of paralogous genes in protease gene families that are present in the genomes of both species.

Mouse protease genes that are absent in human. A remarkable example of local gene expansion in the mouse degradome is that of placental cathepsins, a group of eight cysteine protease genes that are present on mouse chromosome 13B3 and absent from the human genome24,25.
There are also three placental cathepsin pseudogenes in this region, which reflects the dynamic nature of gene birth and death during the evolution of this family. Similarly, the SENP family of sentrin-specific proteases26 is expanded in the mouse lineage, with 14 members in mouse but only seven in human. There are also nine testases, a subfamily of testis-specific ADAMs located on mouse chromosome 8B (REF. 27), for which we have not found human orthologues. The family of tissue kallikreins has also been expanded in the mouse genome. This large cluster of serine protease genes on mouse chromosome 7B2 contains 28 genes and several pseudogenes, whereas the equivalent human cluster at 19q13 contains just 15 functional genes28,29. Interest in human kallikreins is growing as a result of their frequently altered expression patterns in tumours30. Indeed, prostate-specific antigen (PSA), one of the most relevant tumour markers, is encoded by KLK3, a member of this gene family31. Human kallikreins are frequently expressed in reproductive organs, although their functions and specific substrates are unknown. Similarly, little information is available about the functional roles of most members of the large family of mouse kallikreins28,29. The prolactin inducible protein (PIP) gene family provides another example of gene-family expansion in mice. This protein has recently been characterized as an aspartic protease32, and is encoded by a single locus on human chromosome 7q34. A highly divergent mouse counterpart is located on chromosome 6B2, and further PIP-related genes are expressed in male reproductive organs in mice33. There are also human–mouse differences in haematopoietic serine proteases. Proteolytic enzymes such as tryptases and chymases are the main SECRETORY-GRANULE constituents in many haematopoietic cell lineages.
At least seven mouse MAST-CELL chymase genes (Mcpt1, Mcpt2, Mcpt4, Mcpt8, Mcpt9, Mcpt10 and McptL) are absent in human34. Because mast-cell proteases might be involved in host defence, especially during bacterial and parasitic infections, the expansion of this gene family in the mouse genome might have important consequences for the development of distinctive immune responses in rodents compared with humans.

BOX 2 | The chromosomal landscape of human protease genes

[Box 2 figure: human chromosomes 1–22, X and Y, with protease genes plotted as coloured dots]

The distribution of protease genes differs widely among human chromosomes, as shown in the figure: green dots represent aspartic proteases, blue dots represent cysteine proteases, red dots represent serine proteases, violet dots represent threonine proteases and black dots represent metalloproteases. Several protease genes occur in clusters, especially on chromosomes 11, 16 and 19. The largest is located in 19q13 (kallikrein locus) and contains 15 functional genes and several pseudogenes. Another densely populated cluster maps at 16p13, in which a primordial serine protease gene duplicated repeatedly during evolution to give rise to 10 trypsin-like genes and three related pseudogenes. Similarly, the matrix metalloprotease (MMP) cluster at 11q22 contains nine genes (MMP1, MMP3, MMP7, MMP8, MMP10, MMP12, MMP13, MMP20 and MMP27) and two pseudogenes. Despite these examples of gene families that have been formed and expanded by local duplications, most protease gene families have been dynamic in their evolution and, after duplication, the different paralogous genes have translocated to different chromosomes. Analysis of the chromosomal landscape of protease genes has also shown some signs of large-scale duplication events in the human genome139,140. For example, a PARALOGON that is composed of a group of related genes at chromosomes 11q and 21q contains several protease genes from distinct families with a conserved arrangement in both locations (USP2, matriptase/ST14, ADAMTS8 and ADAMTS15 at 11q, and USP25, enteropeptidase/PRSS7, ADAMTS1 and ADAMTS5 at 21q).

SECRETORY GRANULE A subcellular vesicle that contains molecules that are destined for secretion.
MAST CELL A specialized cell that initiates the inflammatory response by releasing histamine and cytokines.
GRANZYME A serine protease that is produced by immune-system cells and stored in secretory granules.
COMPLEMENT A set of plasma proteins that form part of a proteolytic cascade, which leads to foreign-cell lysis and phagocytosis.

Several GRANZYMES and trypsin-like enzymes have also been specifically expanded in mice. Remarkably, the COMPLEMENT C1r and C1s serine protease genes are duplicated in the mouse. One set (C1rA and C1sA) corresponds to the murine orthologues of the human genes, whereas the other (C1rB and C1sB) is exclusively expressed in the male genital tract, which indicates a role for these proteases in mouse reproduction that is independent of complement activation35. Local duplications on mouse chromosomes 5F and 9A1 have also generated two copies of the genes that encode the single-copy human metalloproteases ADAM-1 and MMP-1. The resulting pairs are expressed in the mouse testis36 and placenta23, respectively. As well as these large- and small-scale expansions in clusters of mouse gene families, there are single-copy genes or even entire subfamilies that seem to be absent in the human lineage. Ren2, which encodes the aspartic protease known as submandibular renin, and Uchl4, which encodes a ubiquitin C-terminal hydrolase, are examples from the first group.
Among the mouse-specific protease subfamilies, we have identified the testins, which are composed of three cysteine proteases of the 01 family mapped at mouse chromosome 13B3 and expressed in the testis37. There are also several interesting examples of genes that have been specifically inactivated in the human lineage through diverse mechanisms. For example, human caspase 12 has acquired mutations that abrogate its protease function, but its murine orthologue mediates apoptosis in response to endoplasmic-reticulum stress38. Human cyritestins and implantation serine proteases (ISPs) are also inactivated in humans39,40. The gene for the aspartic protease chymosin is inactivated in humans41, but there seems to be a functional mouse orthologue on chromosome 3F3. The ELA1 gene that encodes pancreatic elastase has been transcriptionally silenced in the human genome owing to a mutation that inactivates crucial enhancer and promoter elements42.

[Figure 1 image: protease wheel divided into aspartic, cysteine, metallo, serine and threonine sectors, with family code numbers in the outer ring; legend: human specific, mouse specific]

Figure 1 | The protease wheel. Unrooted phylogenetic tree of human and mouse proteases. Proteases are distributed in five catalytic classes and 63 different families. The code number for each protease family is indicated in the outer ring. Protein sequences that correspond to the protease domain from each family were aligned using the ClustalX program. Phylogenetic trees were constructed for each family using the Protpars program. A global tree was generated using the protease domain from one member of each family, and individual family trees were added at the corresponding positions. The figure shows the non-redundant set of proteases.
Orthologous proteases are shown in light grey, mouse-specific proteases in red and human-specific proteases in blue. Metalloproteases are the most abundant class of enzymes in human, but most lineage-specific differences are in the serine protease class, making this sector wider; in mouse, serine proteases outnumber metalloproteases (TABLE 1). The 01 family of serine proteases can be divided into 22 smaller subgroups on the basis of involvement in different physiological processes, to facilitate the interpretation of differences.

BOX 3 | Assessing the completeness of the human genome sequence

Genomic analysis of human proteases has proved valuable in assessing the completeness and accuracy of the available human genome sequence. We have found six previously known protease genes (CAPN8, CTSE, DPP7, IFP38, MMP23A and TPS1), the existence of which has been experimentally verified, but which are missing from the public version of the human genome. The first four are present in the Celera version of the sequence, although the Celera Discovery System database also lacks several genes that are found in the public genome sequence. Some of these conflicts might derive from gaps in both genome assemblies, or from the artefactual collapse of duplicated regions, which is a frequent problem in the informatic assembly of highly related genomic sequences. Also, there are examples of protease genes that are incorrectly annotated or described, as well as cases of predicted genes that correspond to pseudogenes. Nevertheless, beyond these differences, which are common to all genomic analyses, our study of human protease genes confirms the relative completeness and high quality of the present versions of the human gene catalogue.

Human protease genes that are absent in mouse. Almost all human protease genes that are absent in the mouse lineage correspond to paralogous genes that belong to differentially expanded gene families.
For example, there are four γ-glutamyl transferase genes in a cluster on human chromosome 22q and only one gene in the equivalent region of the mouse genome. These threonine proteases catalyse the degradation of glutathione to glutamic acid and cysteinyl glycine43. Calpain 14, caspases-5 and -10, PSA, matrilysin-2, mesotrypsin, ADAM-20, and several aminopeptidases and USPs are among the human proteases that are absent in mouse. Of special interest is USP6 (Tre2), a member of the ubiquitin-specific protease family that is encoded by a recently characterized hominoid-specific gene44. Beyond changes in protease gene numbers, there are also examples of significant human–mouse differences in the expression pattern of orthologous protease genes23,45. Such regulatory differences might also be fundamental drivers of evolutionary innovation in the protease-mediated functions of each species.

PROSTATE INVOLUTION A process by which the prostate gland reduces its size following androgen depletion.
EXON SHUFFLING The process of non-homologous recombination of exons from different genes.
PRODOMAIN A sequence of amino acids that precedes the catalytic domain in many inactive protease precursors. On removal or conformational change of the prodomain, the protease becomes active.
CHAPERONE A protein that aids the folding of another to prevent it from taking an interactive conformation.

Functional differences in human and mouse degradomes. The observed differences between human and mouse protease sets provide insights into the molecular mechanisms that underlie the changes in physiology and life strategies of both species after their divergence from a common ancestor ~75 million years ago. In line with the global comparative analysis of the human and mouse genomes21,46, most differences between the respective degradomes correspond to proteases that are involved in reproductive and immunological functions.
In relation to reproductive processes, proteases are known to have roles in menstruation, fertilization, ovulation, implantation, placentation, pregnancy, and uterine, breast and PROSTATE INVOLUTION47–54. They might also contribute to the ability of reproductive organs to respond rapidly to changes in the hormonal environment, through the activation of local networks of cytokines and growth factors. The lineage-specific expansion of some reproductive proteases in mice might help to explain many of the pronounced reproductive differences between human and mouse, including variations in oestrous cycles, placental structures, gestation periods and number of descendants per delivery. The changes in immune-related proteases might reflect evolutionary diversification processes aimed at expanding the repertoire of host defence mechanisms in response to new physiological conditions, dietary changes, new sources of pathogens or environmental stress.

The ‘helping hands’ of proteases

As well as mechanisms of protease invention that are based on gene duplication and divergence, the evolution of both human and mouse degradomes has also been driven by EXON SHUFFLING and the duplication of protein modules in protease genes to form new architectures. In this way, proteases might link their catalytic domains to a range of specialized functional modules other than the archetypal sorting signals that direct proteases to intracellular organelles or extracellular environments. Thus, substrate and binding specificities can be altered in an evolutionarily rapid and selectable way that leads to gene-family diversity and results in substrate specificity or diversity and new kinetic, inhibitory, and cell or tissue localization properties55. First, many proteases contain conserved prodomains that serve an auto-inhibitory role to prevent activation at the wrong place or time, and are often required as intramolecular CHAPERONES during protein synthesis and folding.
Prodomains can also act as a contact surface for cell-surface receptors and direct proteases to specific substrates or tissue locations. Second, a large proportion of proteases (40%) have ancillary domains that probably facilitate their interaction with other proteins such as substrates, inhibitors and receptors, or have some kind of regulatory role, as proposed for the protease associated (PA) domain56,57. FIGURE 2 shows the most typical domains that are associated with proteases. Some, such as the EGF domains, have been successful in their adaptation to proteolytic enzymes and are present in a range of proteases from different families, in which they perform different but possibly related functions. Other domains have multiplied in the same enzyme to form long tandem repeats. This is the case with ADAMTSs, which contain as many as 15 thrombospondin (TS) repeats in their structure58,59. Other proteolytic enzymes, such as most membrane-type serine proteases60,61, have a complex mosaic structure with up to six distinct domains encoded by a single gene. This inventive activity has also created several peculiar structures, including proteases with different catalytic units or with protease-inhibitory domains embedded in the same gene62,63. Therefore, it seems that protease genes originally encoded simple single-domain catalytic proteins that underwent gene fusions to generate this extraordinary diversity of multidomain enzymes. This strategy of domain accretion and shuffling has facilitated the evolution from nonspecific primitive degradative proteases to highly selective enzymes that are able to perform subtle reactions of proteolytic processing.

[Figure 2 image: a central catalytic domain surrounded by ancillary domains labelled with Pfam codes: EF_HAND, KR, Igc2, UBA, LDLa, CCP, sushi, RHOD, PDZ, SEA, EGF-like, EXOIII, TBC, PAN_AP, DUSP, LON, zf_MYND, UBQ, FRI, CAP-Gly, ZnF_RBZ, VWA, FU, Death, APPLE, VWC, FN1, DED, KAZAL, SR, SO, IB, IG-like, SIS, FIMAC, ZnF_UBP, EGF_CA, EGF, MATH, FN2, PLAC, ShKT, MAM, Cystatin, IG, PA, CUB, UIM, THYN, DISIN, FZ, SNF7, HX, AAA, TSP1, NL, GON, ACR, MGS, FA58C and LamGL]

Figure 2 | Ancillary domains present in human and mouse proteases. Domains are colour-coded according to the protease catalytic class to which they are linked: yellow, aspartic proteases; blue, cysteine proteases; green, metalloproteases; and red, serine proteases. Domains with two colours have been found in two classes of proteolytic enzymes. Codes for domains correspond to those used in the Pfam domain database, with the exception of recently described domains such as PLAC143 and GON58.

SCISSILE BONDS Peptide bonds that are cleaved by proteolytic enzymes.

Substrate-binding exosites modulate and broaden the substrate-specificity profile of proteases by providing a further contact area that is not influenced by the primary specificity subsites, and might even prevent substrate binding and cleavage by the catalytic domain55. In this way, the function of the protease is refined and can be made more specific or efficient. This phenomenon is also shared by many other vertebrate proteins and has been advantageous for the development of several physiological systems in which proteases are involved, such as the coagulation and complement cascades. Nevertheless, evolution has also progressed in the reverse direction, with some domains that were acquired early in protease families being lost in more recently duplicated members. This is the case for the hemopexin C domain, which is present in most MMPs from different origins but has been specifically lost in members of the matrilysin subfamily (MMP-7 and MMP-26)7. Surprisingly little is known about the function of protease ancillary domains.
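The loss of the hemopexin C domain in matrilysins can be framed as a comparison of domain architectures. The Python sketch below represents each protease as an ordered tuple of domain labels and reports domains that are common in a family (here, present in at least half of its members, an arbitrary threshold chosen for illustration) but missing from a given member. The architectures are simplified, hypothetical labellings, not full Pfam annotations; a companion helper tests two architectures for identity, the kind of check a comparison of orthologous human and mouse proteases would rely on.

```python
from collections import Counter

# Simplified, illustrative domain architectures (not full Pfam annotations).
MMPS = {
    "MMP-1": ("signal", "prodomain", "catalytic", "hemopexin"),
    "MMP-3": ("signal", "prodomain", "catalytic", "hemopexin"),
    "MMP-8": ("signal", "prodomain", "catalytic", "hemopexin"),
    "MMP-7": ("signal", "prodomain", "catalytic"),   # matrilysin
    "MMP-26": ("signal", "prodomain", "catalytic"),  # matrilysin-2
}


def lost_domains(family, member, min_fraction=0.5):
    """Domains present in at least `min_fraction` of family members but absent from `member`."""
    # Count each domain once per protein, even if it is repeated in tandem.
    counts = Counter(domain for arch in family.values() for domain in set(arch))
    common = {d for d, n in counts.items() if n / len(family) >= min_fraction}
    return common - set(family[member])


def same_architecture(arch_a, arch_b):
    """True if two proteins share an identical ordered domain architecture."""
    return tuple(arch_a) == tuple(arch_b)


if __name__ == "__main__":
    print(lost_domains(MMPS, "MMP-7"))  # {'hemopexin'}
    print(lost_domains(MMPS, "MMP-1"))  # set()
```

Because the architectures are ordered tuples, `same_architecture` also distinguishes rearranged domains, whereas `lost_domains` only looks at presence or absence.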
Recent strategies that use the yeast two-hybrid system to find binding partners and functions for these domains have uncovered new families of protease substrates for MMPs64, and using inactive catalytic domains as yeast two-hybrid baits has uncovered further substrates that are cleaved by the active homologue10. These studies also indicate that inactive protease homologues or shed exosite domains might bind and mask SCISSILE BONDS in vivo, thereby reducing substrate cleavage and adding a further layer of control to protease function65. In this regard, we have also examined the possibility that proteases might have selectively accreted new domains, and acquired new functions, in the human or mouse lineage. However, we did not find evidence of differences in domain architecture between pairs of orthologous human and mouse proteases. Accordingly, it seems that these domain assemblies occurred at an early stage of protease evolution, before the separation of the human and mouse lineages, which underscores their importance.

HAPLOINSUFFICIENCY A gene dosage effect that occurs when a diploid organism requires both functional copies of a gene for a wild-type phenotype. An organism that is heterozygous for a haploinsufficient locus does not have a wild-type phenotype.

LOSS OF HETEROZYGOSITY A loss of one of the alleles at a given locus as a result of a genomic change, such as mitotic deletion, gene conversion or chromosome missegregation.

Other sources of variability in proteases
The complexity of the protease repertoire might be further increased by common processes such as alternative splicing and differential polyadenylation, or by polymorphic variants that might have important roles in expanding or modifying protease functions.
There are some examples in which alternative splicing events or the use of alternative promoters have been shown to be functionally important in proteases, as is the case in the generation of endothelial and testicular isoforms of angiotensin-converting enzyme (ACE)66. However, the functional relevance of most alternative splicing events that occur in protease-encoding genes is still unknown. Alternative polyadenylation sites are also used in many protease genes, including those encoding cathepsins, MMPs and kallikreins67–69, but again their functional relevance is unclear. Changes in the repeat number of genes duplicated in tandem, as in USP17 and pepsin A, introduce another level of variability in the human degradome. A final and important layer of protease variability derives from naturally occurring genetic variants that directly affect the expression or activity of proteolytic enzymes. These polymorphic variants might alter the delicate control that operates in proteolytic systems, and might influence physiological functions or facilitate the development of pathological conditions that involve proteases. The ACE gene, which contains many sequence polymorphisms, is a good example70: some of its variants are associated with enhanced muscle performance71, whereas others confer an increased susceptibility to cardiovascular diseases72. Recent genome-wide studies of single nucleotide polymorphisms (SNPs) and other polymorphisms, which have searched for associations with complex disease traits, have identified variations in the promoter and coding sequences of several protease genes that are linked to relevant pathologies. These associations include ADAM33 and asthma, and calpain-10 (CAPN10) and type-2 diabetes73,74. Similarly, a polymorphism in glutamate carboxypeptidase II has been linked to hyperhomocysteinemia75, and variants in MMP genes contribute to an increased susceptibility to cardiovascular diseases or cancer3,76.
Further studies of SNPs mapped in protease loci might help to determine individual susceptibility to common diseases or drug response, and provide further information on the molecular mechanisms that underlie some complex genetic traits.

Diseases associated with protease alterations
The availability of a complete protease catalogue will facilitate the identification of therapeutic targets. Abnormal or deficient protease functions might lead to a range of pathological conditions that can be classified into two general groups: those that are caused by alterations in protease genes, and those that are caused by alterations in other components of proteolytic systems. The first group can be further subdivided into genetic disorders that are caused by mutations in protease genes, and epigenetic or regulatory diseases of proteolysis that are caused by alterations in the spatio-temporal patterns of protease expression. These latter alterations are frequent in certain protease families such as the MMPs, and have been linked to the development and progression of cancer, arthritis and inflammatory diseases3,77–80. In the second group, protease deficiencies derive from alterations in other components of proteolytic systems: inhibitors, substrates, regulatory factors and transport systems. Examples of these abnormalities include: serpinopathies81, which are generated by mutations in serine protease inhibitors such as α1-antitrypsin, deficiency of which causes a common hereditary disorder in Caucasians82; some familial forms of Alzheimer disease, in which mutations in the amyloid precursor protein gene (APP) facilitate accumulation of the amyloid-β (Aβ) peptide83; haemophilia A, which is caused by mutations in the factor VIII gene that result in diminished proteolysis catalysed by factor IX (REF. 84); and finally, the haematological disease that is caused by mutations in the ERGIC53 (LMAN1) gene, which affect the transport of proteases such as cathepsin Z (REF. 85).
Hereditary diseases of proteolysis. We have catalogued 53 diseases that are caused by mutations in protease genes (TABLES 2,3); most are caused by recessive loss-of-function mutations. As with other enzymes, one functional allele can often compensate for the loss of the other, so heterozygotes usually have a mild phenotype or none at all. However, in some cases loss-of-function mutations in protease genes are inherited in a dominant pattern. These include familial cylindromatosis, which is caused by mutations in the CYLD cysteine protease gene86, and type II autoimmune lymphoproliferative syndrome (ALPS), which is caused by mutations in the caspase-10 (CASP10) gene87. Several mechanisms might explain this dominant inheritance of loss-of-function mutations, including HAPLOINSUFFICIENCY, LOSS OF HETEROZYGOSITY in the case of cylindromatosis86, and interference with the activation of procaspase-10 by an induced-proximity mechanism87,88. The genetic alterations that cause hereditary diseases of protease genes range from single-site mutations to large chromosomal deletions. Point mutations that result in the loss of protease function are the most frequent cause of these disorders.
These include: limb-girdle muscular dystrophy type 2A, which is caused by inactivating mutations in the calpain-3 (CAPN3) gene89; thrombotic thrombocytopenic purpura, which is caused by mutations in the ADAMTS13 gene90; and nonsyndromic deafness, which results from distinct mutations that affect the transmembrane serine protease TMPRSS3 (REF. 91). There are only a few examples of point mutations that lead to a gain of protease function (TABLE 3), such as early-onset familial Alzheimer disease, which is caused by activating mutations in presenilins 1 and 2 (PSEN1 and PSEN2)5,92. Mutations in the non-coding regions of protease genes can also result in abnormal proteolytic activity. Examples include: hyperprothrombinemia, which is caused by mutations in the 3′-UTR of the thrombin gene that lead to an increased secretion of thrombin93; a form of Alzheimer disease that is caused by a single-base deletion at the splice-donor site of intron 4 of PSEN1 (REF. 94); and haemophilia B Leyden, which is caused by different mutations in the promoter region and the 5′-UTR of the factor IX gene84.

Table 2 | Human hereditary diseases of proteolysis

| Protease | Gene | Locus | Disease | OMIM | Dominant/recessive | Function | Animal model |
|---|---|---|---|---|---|---|---|
| Loss-of-function group | | | | | | | |
| Cathepsin K | CTSK | 1q21 | Pycnodysostosis | 265800 | R | Loss | KO resembles disease |
| Cathepsin C | CTSC | 11q14 | Papillon-Lefèvre and Haim-Munk syndromes | 245000 | R | Loss | KO does not resemble disease |
| Calpain 3 | CAPN3 | 15q15 | Limb-girdle muscular dystrophy type 2A | 253600 | R | Loss | KO resembles disease |
| Cylindromatosis protein | CYLD1 | 16q12 | Cylindromatosis | 132700 | D | Loss | – |
| Ubiquitin C-terminal hydrolase 1 | UCHL1 | 4p14 | Parkinson disease type V | 191342 | D | Loss | Gad mouse resembles disease |
| Caspase-8 | CASP8 | 2q33 | Autoimmune lymphoproliferative syndrome (I) | 601859 | R | Loss | KO embryonic lethality |
| Caspase-10 | CASP10 | 2q33 | Autoimmune lymphoproliferative syndrome (II) | 603909 | D,R | Loss | No mouse orthologue |
| USP9Y | USP9Y | Yq11 | Azoospermia and hypospermatogenesis | 415000 | D | Loss | – |
| Gelatinase A | MMP2 | 16q13 | Multicentric osteolysis with arthritis | 605156 | R | Loss | KO does not resemble disease |
| ADAMTS-13 | ADAMTS13 | 9q34 | Thrombotic thrombocytopenic purpura | 274150 | R | Loss | – |
| Procollagen I N-endopeptidase | ADAMTS2 | 5q23 | Ehlers-Danlos syndrome type VIIC | 225410 | R | Loss | KO resembles disease |
| Endothelin-converting enzyme 1 | ECE1 | 1p36 | Hirschsprung disease | 142623 | D | Loss | KO partially resembles disease |
| PHEX endopeptidase | PHEX | Xp22 | X-linked hypophosphatemia | 307800 | D | Loss | Hyp mouse resembles disease |
| Carboxypeptidase E | CPE | 4q33 | Hyperproinsulinemia and diabetes | 125853 | R | Loss | Fat mouse resembles disease |
| Mitochondrial inner membrane protease 2 | IMMP2L | 7q31 | Gilles de la Tourette syndrome | 137580 | D | Loss | – |
| X-Pro dipeptidase | PEPD | 19q13 | Prolidase deficiency | 170100 | R | Loss | – |
| Paraplegin | SPG7 | 16q24 | Spastic paraplegia | 607259 | R | Loss | – |
| Enteropeptidase | PRSS7 | 21q21 | Enteropeptidase deficiency | 226200 | R | Loss | – |
| Complement component C1r | C1R | 12p13 | C1r deficiency | 216950 | R | Loss | – |
| Complement component C1s | C1S | 12p13 | C1s deficiency | 120580 | R | Loss | – |
| Complement component 2 | C2 | 6p21 | C2 deficiency | 217000 | R | Loss | Guinea-pig model resembles disease |
| Complement factor D | DF | 19p13 | DF deficiency | 134350 | R | Loss | KO resembles disease |
| Complement factor I | IF | 4q25 | CFI deficiency | 217030 | R | Loss | – |
| Plasma kallikrein | KLKB1 | 4q35 | Prekallikrein deficiency | 229000 | R | Loss | – |
| Thrombin | F2 | 11p11 | Hyperprothrombinemia/hypoprothrombinemia | 176930 | D/R | Loss | KO resembles disease |
| Coagulation factor VIIa | F7 | 13q34 | Factor VIIa deficiency | 227500 | R | Loss | KO lethal, partially resembles disease |
| Coagulation factor IXa | F9 | Xq27 | Haemophilia B | 306900 | R | Loss | Mouse and dog models resemble disease |
| Coagulation factor Xa | F10 | 13q34 | Factor X deficiency | 227600 | R | Loss | KO embryonic lethality, fatal neonatal bleeding |

D, dominant inheritance; KO, knockout (mouse); R, recessive inheritance.
Short deletions that truncate the protease can also cause disease, as with the recently described neurotrypsin mutation that causes autosomal recessive non-syndromic mental retardation95. Large deletions that remove part of a protease gene have also been described, including a 9.5-kb deletion in the paraplegin gene that causes a form of spastic paraplegia96. The systematic classification of genetic diseases of proteolysis provides a global perspective on the diversity of proteases, mutational mechanisms and pathological alterations that underlie these human diseases. This classification of protease deficiencies might also provide a useful framework for discussing the possibilities of creating animal models for the different diseases, and for evaluating potential therapies.

Mouse models of hereditary diseases of proteolysis. The development of methods for manipulating the mouse germline offers new opportunities for investigating human diseases. At least 20 protease genes that are associated with hereditary diseases of proteolysis have been disrupted in mice (TABLES 2,3). In most cases, these mouse models have provided valuable information on the molecular and physiological mechanisms that are involved in the development and progression of the corresponding human disease. However, some mutant mice do not recapitulate the human disorder that is caused by mutations in the orthologous protease gene. Differences between the two organisms in the number of paralogous genes in a protease family might sometimes explain these paradoxical situations. For example, a point mutation in human caspase-8 causes ALPS, whereas mice deficient in this protease die as embryos97. Because caspase-10, the closest paralogue of caspase-8, is absent in mice, the much milder phenotype observed in ALPS patients with caspase-8 mutations might derive from functional compensation by caspase-10.
The development of mouse models of diseases that involve protease genes has also helped to define the molecular mechanisms of these diseases. For example, data obtained from mice deficient in neutrophil elastase (Ela2)98 have led to the hypothesis that the Ela2 mutations associated with congenital neutropenia and cyclic haematopoiesis cause a gain of function in this protease99. The use of knockout mice has also allowed the identification of specific protease substrates, such as prelamin A for the FACE-1/ste24 metalloprotease100 or syndecan-1 and α-defensin for matrilysin101,102, and the discovery of new and unexpected protease functions in normal and pathological conditions3,103–106. However, extrapolating from mouse models to the function of human proteases might be hampered by robust proteolytic systems that contain redundant and compensatory enzymes, as illustrated by the MMPs. In fact, with the exception of MT1-MMP-null mice, which have skeletal abnormalities and die shortly after birth107,108, and MMP-20-null mice, which have an amelogenesis imperfecta phenotype109, all the mutant mice deficient in specific MMPs that have been generated so far lack notable alterations during development or in adult tissues, which points to functional overlap between individual components of this complex proteolytic system110. Interestingly, some mouse disorders are caused by loss-of-function mutations in protease genes whose human orthologues have not been associated with an equivalent pathological condition. One example is the neurological defect in ataxia (axJ) mice, which results from a mutation in the ubiquitin-specific protease Usp14 (REF. 111).
The generation of animal models has also been important for studying diseases that involve a gain of function of specific proteases. For example, transgenic mice that express mutant variants of human PSEN1 or PSEN2 accumulate Aβ peptide in the brain, which reinforces the role of these proteins in the pathogenesis of Alzheimer disease83,92. Transgenic mice have also been used to establish causal relationships between the overexpression of a certain protease in a specific tissue and the development of relevant diseases such as arthritis or cancer3,112. Ongoing large-scale mutagenesis projects in mice113,114 will be essential to generate appropriate models for understanding the in vivo functions of many proteases, their potential roles in the pathogenesis of human diseases and their value as new therapeutic targets.

Therapeutic approaches to protease deficiencies. Human diseases of proteolysis have traditionally been linked to the overexpression of proteases. Consequently, the corresponding therapeutic strategies have focused on developing inhibitors to block the undesired activity of these enzymes115,116. However, as discussed here, there is growing evidence for genetic diseases that are caused by the loss of protease function. Furthermore, several relevant disorders, including many conformational diseases such as systemic amyloidosis, prion encephalopathies and Alzheimer and Huntington disease, arise from the accumulation of intermolecular aggregates of specific proteins81,117,118. These disorders could benefit from protease-based treatments that replace the deficient enzymes or enhance the degradation of the pathological protein aggregates, as exemplified by the use of the tPA and uPA plasminogen activators for clot dissolution.
Accordingly, the therapeutic approach to diseases that are associated with protease deficiencies must be based on knowledge of the structure, function and regulation of the pathologically relevant proteases in physiological situations. This focus is also essential to avoid the undesired effects of nonspecific therapies that can profoundly alter the delicate balance of endogenous proteolytic systems. There are important examples of the successful introduction of protease inhibitors to treat human disease. ACE inhibitors are widely used to treat hypertension and congestive heart failure119, and several drugs that block the human immunodeficiency virus (HIV) protease have been introduced120. Many other inhibitors have been tested in preclinical models or have advanced to clinical trials2.

Table 3 | Human hereditary diseases of proteolysis

| Protease | Gene | Locus | Disease | OMIM | Dominant/recessive | Function | Animal model |
|---|---|---|---|---|---|---|---|
| Loss-of-function group | | | | | | | |
| Coagulation factor XIa | F11 | 4q35 | Factor XI deficiency | 264900 | R | Loss | Cattle and dog models resemble disease |
| Coagulation factor XIIa | F12 | 5q35 | Factor XII deficiency | 234000 | R | Loss | – |
| Protein C | PROC | 2q21 | Thrombophilia | 176860 | D/R | Loss | KO resembles disease |
| Plasmin | PLG | 6q26 | Thrombophilia and ligneous conjunctivitis | 173350 | R | Loss | KO resembles disease |
| Neurotrypsin | PRSS12 | 4q28 | Nonsyndromic mental retardation | 249500 | R | Loss | – |
| Proprotein convertase 1 | PCSK1 | 5q15 | Obesity | 600955 | R | Loss | KO does not resemble disease |
| Transmembrane protease, serine 3 | TMPRSS3 | 21q22 | Deafness | 605316 | R | Loss | – |
| Lysosomal carboxypeptidase A | PPGB | 20q13 | Galactosialidosis | 256540 | R | Loss | KO resembles disease |
| Tripeptidyl-peptidase I | CLN2 | 11p15 | Neuronal ceroid lipofuscinosis | 204500 | R | Loss | – |
| Glycosylasparaginase | AGA | 4q34 | Aspartylglucosaminuria | 208400 | R | Loss | KO resembles disease |
| Gain-of-function group | | | | | | | |
| Presenilin 1 | PSEN1 | 14q24 | Alzheimer type 3 | 104311 | D | Gain | Transgenic models partially resemble disease |
| Presenilin 2 | PSEN2 | 1q42 | Alzheimer type 4 | 600759 | D | Gain | Transgenic models partially resemble disease |
| Collagenase 3 | MMP13 | 11q22 | Spondyloepimetaphyseal dysplasia | 602111 | D | (Gain) | – |
| Cationic trypsin | PRSS1 | 7q35 | Hereditary pancreatitis/trypsin deficiency | 167800 | D/R | Gain/Loss | – |
| Neutrophil elastase | ELA2 | 19p13 | Cyclic neutropenia | 162800 | D | Gain | KO more susceptible to bacterial sepsis |
| Proprotein convertase 9 | PCSK9 | 1p32 | Hyperlipoproteinemia type III | 144400 | D | (Gain) | – |
| Heterogeneous group* | | | | | | | |
| Indian hedgehog protein | IHH | 2q35 | Brachydactyly type A1 | 112500 | D | Loss | KO resembles disease |
| Sonic hedgehog protein | SHH | 7q36 | Holoprosencephaly type 3 | 142945 | D | Loss | KO resembles disease |
| Desert hedgehog protein | DHH | 12q13 | Partial gonadal dysgenesis | 607080 | R | Loss | KO resembles disease |
| DJ-1 (putative protease) | DJ1 | 1p36 | Parkinson disease type VII | 606324 | R | Loss | – |
| Reelin (putative protease) | RELN | 7q22 | Lissencephaly syndrome | 257320 | R | Loss | Reeler mouse resembles disease |
| Dihydropyrimidinase (np) | DPYS | 8q22 | Dihydropyrimidinase deficiency | 222748 | R | Loss | – |
| Aspartoacylase (np) | ASPA | 17p13 | Canavan disease | 271900 | R | Loss | KO resembles disease |
| Transferrin receptor 2 protein (np) | TFR2 | 7q22 | Hemochromatosis type 3 | 604250 | R | Loss | KO resembles disease |
| Haptoglobin-1 (np) | HP | 16q22 | Anhaptoglobinemia | 140100 | R | Loss | – |

*Heterogeneous group includes non-protease homologues (np), putative proteases and hedgehog proteins with only autoprocessing activity. D, dominant inheritance; KO, knockout (mouse); R, recessive inheritance.

VASOPEPTIDASE A protease that is involved in the regulation of vascular tone.

BACTERIAL SEPSIS Pathology that is caused by the spread of bacteria or their products through the bloodstream.
Special interest has focused on: inhibitors of β- and γ-secretases for Alzheimer disease; VASOPEPTIDASE inhibitors that simultaneously target neprilysin and ACE for hypertension, atherosclerosis and heart failure; MMP inhibitors for cancer and inflammatory disorders; caspase inhibitors for BACTERIAL SEPSIS and autoimmune and degenerative diseases; tryptase inhibitors for asthma; proteasome inhibitors for multiple myeloma; and aggrecanase inhibitors for arthritis. However, other attempts to develop synthetic inhibitors that target clinically relevant human proteases have met with frequent disappointments, especially in cancer research121. Continuing efforts to resolve the 3D structures of proteases and protease-inhibitor complexes might open new avenues for the rational design of an improved generation of inhibitors that are more selective and have better pharmacokinetic properties122–126. Another approach involves using endogenous inhibitors to target the undesired proteolytic activity that is associated with many diseases116. However, this approach suffers from difficulties with compound administration and poor PHARMACOKINETICS. So far, a few endogenous inhibitors, such as antithrombin III, have been used as therapeutic proteins in diverse inflammatory disorders127. The identification of the molecular defects that underlie diseases caused by the loss of function of protease genes has offered new options for developing specific therapies. The first obvious approach is based on classical enzyme-replacement therapies (ERT), which aim to replace the defective enzyme with its normal counterpart128.
There are several examples of protease-based therapies for genetic diseases of proteolysis, including long-recognized disorders of the clotting system that are now treatable with recombinant proteases such as coagulation factors VII and IX (REF. 129). However, because most diseases in this category have only recently been characterized, suitable ERT-based therapies have not yet been developed rapidly. Protease-based therapies might also be aimed at increasing the turnover of proteins that tend to form intermolecular aggregates, as in the case of Aβ deposits in Alzheimer disease130. These therapies might suffer from the same problems and limitations as other ERTs, including the high doses that are necessary to achieve therapeutic effects, the inability of recombinant proteins to cross the blood–brain barrier and the elicitation of immune responses. For this reason, further therapeutic alternatives for protease-linked diseases need to be explored. Gene therapy, bone-marrow transplantation, enzyme enhancement therapies and substrate-deprivation strategies might offer therapeutic alternatives for human diseases that are caused by protease deficiencies, and in some cases preliminary clinical trials have indicated their potential effectiveness for treating specific diseases of proteolysis128,131.

Conclusion and perspectives
Here we offer the first comparative glimpse of the 553- and 628-member human and mouse degradomes. Although we have extensively revised the available annotations for both protease sets and have included many new members, especially in the case of mouse enzymes, these numbers are still not definitive. Continuing efforts that combine bioinformatic predictions, expert manual annotation and curation, and experimental verification of newly predicted proteases will be necessary to obtain the complete proteolytic portrait of both species.
This global view of the human and mouse protease worlds has offered some surprises, especially the finding that the mouse degradome is more complex than the human one. Why do mice require more proteases than humans? A one-by-one comparison of the two protease sets indicates that most differences result from human-specific inactivation or mouse-specific expansion of protease genes that are associated with reproductive or immune processes. To some degree, this counterintuitive situation parallels that found in a preliminary comparative analysis of the human and chimpanzee genomes, which indicated that genetic losses in the human lineage might have caused some of the differences between these species132. This comparative genomic analysis might be the starting point for further studies of the biological and pathological relevance of proteases. The resolution of the 3D structures of proteases, the assignment of functions to their ancillary domains and non-catalytic homologues, and the detailed analysis of protease-mediated processes such as protein ECTODOMAIN SHEDDING133,134, the regulation of transcription factor activity135,136 and regulated intramembrane proteolysis137 will provide important information in the near future. From the clinical point of view, the availability of a complete protease catalogue will facilitate the identification of protease genes that are responsible for genetic diseases associated with protease deficiencies, and the evaluation of new proteases as drug targets or prognostic markers138. The design of protease chips for the global analysis of patterns of expression and activity of human proteases will be helpful for this purpose10. The increased knowledge of the structure, function and regulation of proteases will also provide excellent opportunities to design new generations of therapeutic inhibitors, including those based on endogenous protease inhibitors.

PHARMACOKINETICS The time course of a drug and its metabolites in the body after administration.

ECTODOMAIN SHEDDING The protease-mediated release from the cell surface of the extracellular domain of integral membrane proteins.

BOX 4 | The protease repertoire of other model organisms
Many families of human and mouse proteases are also clearly recognizable in the genomes of Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. This indicates the existence of universal proteolytic routines in these organisms, although these families are frequently expanded in vertebrates. The MEROPS database has annotated 558 proteases and homologues in Drosophila, 400 in C. elegans and 598 in Arabidopsis. Further comparative analysis shows important differences in the distribution of proteases in these species. The most remarkable finding is the multiplication in the fly genome of a group of S1A serine protease genes, which reaches more than 200 members. Because of this family expansion, the total number of proteases in Drosophila is similar to that of vertebrates, despite flies having considerably fewer genes. These Drosophila trypsin-like proteases might be involved in development and innate immune defence141. Some protease families have also apparently expanded in other organisms, for example serine carboxypeptidases, pepsin-like and subtilisin-like enzymes in Arabidopsis, and several Zn-metalloproteases in C. elegans, which indicates that specific proteolytic enzymes carry out unique functions in the different species142. However, the annotation of the degradome of these species is still preliminary and the functionality of most predicted proteases has not yet been experimentally validated.
The availability of the human and mouse genome sequence also offers the possibility of exploring the complete repertoire of endogenous inhibitors in these organisms. It will be interesting to test whether the mouse protease-inhibitor complement is also more complex than that of human, as an evolutionary attempt to control the expanded murine protease repertoire. Comparative analysis of proteases (BOX 4) will also help to identify regulatory differences that might contribute to define distinctive aspects of human and murine biology from a protease perspective. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 556 Barrett, A. J., Rawlings, N. D. & Woessner, J. F. Handbook of Proteolytic Enzymes (Academic Press, San Diego, 1998). An essential book in the protease field that comprehensively lists and describes proteases from many organisms. Hooper, N. M. Proteases in Biology and Medicine (Portland Press, London, 2002). Egeblad, M. & Werb, Z. New functions for the matrix metalloproteinases in cancer progression. Nature Rev. Cancer 2, 161–174 (2002). This review illustrates the diversity of protease functions in pathological processes such as cancer. Krane, S. M. Elucidation of the potential roles of matrix metalloproteinases in skeletal biology. Arthritis Res. Ther. 5, 2–4 (2003). Esler, W. P. & Wolfe, M. S. A portrait of Alzheimer secretases — new features and familiar faces. Science 293, 1449–1454 (2001). Luttun, A., Dewerchin, M., Collen, D. & Carmeliet, P. The role of proteinases in angiogenesis, heart development, restenosis, atherosclerosis, myocardial ischemia, and stroke: insights from genetic studies. Curr. Atheroscler. Rep. 2, 407–416 (2000). Uría, J. A. & López-Otín, C. Matrilysin-2, a new matrix metalloproteinase expressed in human tumors and showing the minimal domain organization required for secretion, latency, and activity. Cancer Res. 60, 4745–4751 (2000). Geier, E. et al. 
A giant protease with potential to substitute for some functions of the proteasome. Science 283, 978–981 (1999). Voges, D., Zwickl, P. & Baumeister, W. The 26S proteasome: a molecular machine designed for controlled proteolysis. Annu. Rev. Biochem. 68, 1015–1068 (1999). López-Otín, C. & Overall, C. M. Protease degradomics: a new challenge for proteomics. Nature Rev. Mol. Cell Biol. 3, 509–519 (2002). This article introduces new concepts and approaches for the global analysis of proteases in normal and pathological conditions, and especially in cancer. Rawlings, N. D., O’Brien, E. & Barrett, A. J. MEROPS: the protease database. Nucleic Acids Res. 30, 343–346 (2002). A description of a database that is freely available to the academic community, which represents an essential resource for research on proteases. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001). Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001). Zucker, S. & Chen, W.-T. Cell Surface Proteases (Academic Press, San Diego, 2003). A compilation of articles that cover recent advances in the functional analysis of membrane-bound proteases, which are a group of enzymes that are of growing relevance in normal and pathological conditions. Cope, G. A. et al. Role of predicted metalloprotease motif of Jab1/Csn5 in cleavage of Nedd8 from Cul1. Science 298, 608–611 (2002). Urban, S., Lee, J. R. & Freeman, M. A family of Rhomboid intramembrane proteases activates all Drosophila membrane-tethered EGF ligands. EMBO J. 21, 4277–4286 (2002). Mariño, G. et al. Human autophagins, a family of cysteine proteinases potentially implicated in cell degradation by autophagy. J. Biol. Chem. 278, 3671–3678 (2003). Weihofen, A., Binns, K., Lemberg, M. K., Ashman, K. & Martoglio, B. 
Identification of signal peptide peptidase, The exploration of the human and mouse genomes under a proteolytic prism has disentangled some of the complexities that are derived from the existence of multiple executioners of a single chemical reaction — the hydrolysis of peptide bonds — which lies at the heart of many events on which cell life and death depend. The genomic analysis of human and mouse proteases has also indicated that there are many challenges ahead. It is to be hoped that the continuing comparative analysis of these functionally related genes will illuminate new areas in biology and provide clinical answers to the many diseases of proteolysis. a presenilin-type aspartic protease. Science 296, 2215–2218 (2002). 19. Makarova, K. S., Aravind, L. & Koonin, E. V. A novel superfamily of predicted cysteine proteases from eukaryotes, viruses and Chlamydia pneumoniae. Trends Biochem. Sci. 25, 50–52 (2000). 20. Krylov, D. M. & Koonin, E. V. A novel family of predicted retroviral-like aspartyl proteases with a possible key role in eukaryotic cell cycle control. Curr. Biol. 11, 584–587 (2001). 21. Mouse Genome Sequence Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002). 22. Swanson, W. J. & Vacquier, V. D. The rapid evolution of reproductive proteins. Nature Rev. Genet. 3, 137–144 (2002). 23. Balbín, M. et al. Identification and enzymatic characterization of two diverging murine counterparts of human interstitial collagenase (MMP-1) expressed at sites of embryo implantation. J. Biol. Chem. 276, 10253–10262 (2001). 24. Deussing, J. et al. Identification and characterization of a dense cluster of placenta-specific cysteine peptidase genes and related genes on mouse chromosome 13. Genomics 79, 225–240 (2002). 25. Sol-Church, K. et al. Evolution of placentally expressed cathepsins. Biochem. Biophys. Res. Commun. 293, 23–29 (2002). 26. Yeh, E. T., Gong, L. & Kamitani, T. 
Ubiquitin-like proteins: new wines in new bottles. Gene 248, 1–14 (2000). 27. Brachvogel, B. et al. Molecular cloning and expression analysis of a novel member of the disintegrin and metalloprotease-domain (ADAM) family. Gene 288, 203–210 (2002). 28. Olsson, A. Y. & Lundwall, A. Organization and evolution of the glandular kallikrein locus in Mus musculus. Biochem. Biophys. Res. Commun. 299, 305–311 (2002). 29. Yousef, G. M. & Diamandis, E. P. The new human tissue kallikrein gene family: structure, function, and association to disease. Endocr. Rev. 22, 184–204 (2001). 30. Luo, L. Y. et al. The serum concentration of human kallikrein 10 represents a novel biomarker for ovarian cancer diagnosis and prognosis. Cancer Res. 63, 807–811 (2003). 31. Balk, S. P., Ko, Y. J. & Bubley, G. J. Biology of prostate-specific antigen. J. Clin. Oncol. 21, 383–391 (2003). 32. Caputo, E., Manco, G., Mandrich, L. & Guardiola, J. A novel aspartyl proteinase from apocrine epithelia and breast tumors. J. Biol. Chem. 275, 7935–7941 (2000). 33. Yoshida, M., Kaneko, M., Kurachi, H. & Osawa, M. Identification of two rodent genes encoding homologues to seminal vesicle autoantigen: a gene family including the gene for prolactin-inducible protein. Biochem. Biophys. Res. Commun. 281, 94–100 (2001). 34. Lunderius, C. & Hellman, L. Characterization of the gene encoding mouse mast cell protease 8 (mMCP-8), and a comparative analysis of hematopoietic serine protease genes. Immunogenetics 53, 225–232 (2001). 35. Garnier, G., Circolo, A., Xu, Y. & Volanakis, J. E. Complement C1r and C1s genes are duplicated in the mouse: differential expression generates alternative isomorphs in the liver and in the male reproductive system. Biochem. J. 371, 631–640 (2003). 36. Nishimura, H. et al. The ADAM1a and ADAM1b genes, instead of the ADAM1 (fertilin-α) gene, are localized on mouse chromosome 5. Gene 291, 67–76 (2002). 37. Grima, J., Wong, C. C., Zhu, L. J., Zong, S. D. & Cheng, C. Y.
Testin secreted by Sertoli cells is associated with the cell surface, and its expression correlates with the disruption of Sertoli-germ cell junctions but not the inter-Sertoli tight junction. J. Biol. Chem. 273, 21040–21053 (1998). 38. Fischer, H., Koenig, U., Eckhart, L. & Tschachler, E. Human caspase 12 has acquired deleterious mutations. Biochem. Biophys. Res. Commun. 293, 722–726 (2002). 39. Grzmil, P. et al. Human cyritestin genes (CYRN1 and CYRN2) are non-functional. Biochem. J. 357, 551–556 (2001). 40. O’Sullivan, C. M., Liu, S. Y., Karpinka, J. B. & Rancourt, D. E. Embryonic hatching enzyme strypsin/ISP1 is expressed with ISP2 in endometrial glands during implantation. Mol. Reprod. Dev. 62, 328–334 (2002). 41. Kageyama, T. Pepsinogens, progastricsins, and prochymosins: structure, function, evolution, and development. Cell. Mol. Life Sci. 59, 288–306 (2002). 42. Rose, S. D. & MacDonald, R. J. Evolutionary silencing of the human elastase I gene (ELA1). Hum. Mol. Genet. 6, 897–903 (1997). 43. Suzuki, H. & Kumagai, H. Autocatalytic processing of γ-glutamyltranspeptidase. J. Biol. Chem. 277, 43536–43543 (2002). 44. Paulding, C. A., Ruvolo, M. & Haber, D. A. The Tre2 (USP6) oncogene is a hominoid-specific gene. Proc. Natl Acad. Sci. USA 100, 2507–2511 (2003). 45. Fougerousse, F. et al. Human–mouse differences in the embryonic expression patterns of developmental control genes and disease genes. Hum. Mol. Genet. 9, 165–173 (2000). 46. Emes, R. D., Goodstadt, L., Winter, E. E. & Ponting, C. P. Comparison of the genomes of human and mouse lays the foundation of genome zoology. Hum. Mol. Genet. 12, 701–709 (2003). An excellent analysis of the differences between the human and mouse genomes and discussion of their physiological relevance. 47. Salamonsen, L. A. & Nie, G. Proteases at the endometrial–trophoblast interface: their role in implantation. Rev. Endocr. Metab. Disord. 3, 133–143 (2002). 48. Fata, J. E., Ho, A. T., Leco, K. J., Moorehead, R.
A. & Khokha, R. Cellular turnover and extracellular matrix remodeling in female reproductive tissues: functions of metalloproteinases and their inhibitors. Cell. Mol. Life Sci. 57, 77–95 (2000). 49. Curry, T. E. & Osteen, K. G. Cyclic changes in the matrix metalloproteinase system in the ovary and uterus. Biol. Reprod. 64, 1285–1296 (2001). 50. Evans, J. P. Fertilin-β and other ADAMs as integrin ligands: insights into cell adhesion and fertilization. Bioessays 23, 628–639 (2001). 51. Seals, D. F. & Courtneidge, S. A. The ADAMs family of metalloproteases: multidomain proteins with multiple functions. Genes Dev. 17, 7–30 (2003). 52. Ny, T., Wahlberg, P. & Brandstrom, I. J. Matrix remodeling in the ovary: regulation and functional role of the plasminogen activator and matrix metalloproteinase systems. Mol. Cell. Endocrinol. 187, 29–38 (2002). 53. Hulboy, D. L., Rudolph, L. A. & Matrisian, L. M. Matrix metalloproteinases as mediators of reproductive function. Mol. Hum. Reprod. 3, 27–45 (1997). 54. Vu, T. H. & Werb, Z. Matrix metalloproteinases: effectors of development and normal physiology. Genes Dev. 14, 2123–2133 (2000). 55. Overall, C. M. Molecular determinants of metalloproteinase substrate specificity: matrix metalloproteinase substrate binding domains, modules, and exosites. Mol. Biotechnol. 22, 51–86 (2002). 56. Mahon, P. & Bateman, A. The PA domain: a protease-associated domain. Protein Sci. 9, 1930–1934 (2000). 57. Luo, X. & Hofmann, K. The protease-associated domain: a homology domain associated with multiple classes of proteases. Trends Biochem. Sci. 26, 147–148 (2001). 58. Llamazares, M., Cal, S., Quesada, V. & López-Otín, C. Identification and characterization of ADAMTS-20 defines a novel subfamily of metalloproteinases-disintegrins with multiple thrombospondin-1 repeats and a unique GON-domain. J. Biol. Chem. 278, 13382–13389 (2003). 59. Somerville, R. P. et al.
Characterization of ADAMTS-9 and ADAMTS-20 as a distinct ADAMTS subfamily related to Caenorhabditis elegans GON-1. J. Biol. Chem. 278, 9503–9513 (2003). 60. Hooper, J. D., Clements, J. A., Quigley, J. P. & Antalis, T. M. Type II transmembrane serine proteases: insights into an emerging class of cell surface proteolytic enzymes. J. Biol. Chem. 276, 857–860 (2001). 61. Velasco, G., Cal, S., Quesada, V., Sanchez, L. M. & Lopez-Otin, C. Matriptase-2, a membrane-bound mosaic serine proteinase predominantly expressed in human liver and showing degrading activity against extracellular matrix proteins. J. Biol. Chem. 277, 37637–37646 (2002). 62. Wex, T., Wex, H. & Bromme, D. The human cathepsin F gene — a fusion product between an ancestral cathepsin and cystatin gene. Biol. Chem. 380, 1439–1442 (1999). 63. Nagler, D. K., Sulea, T. & Menard, R. Full-length cDNA of human cathepsin F predicts the presence of a cystatin domain at the N-terminus of the cysteine protease zymogen. Biochem. Biophys. Res. Commun. 257, 313–318 (1999). 64. McQuibban, G. A. et al. Inflammation dampened by gelatinase A cleavage of monocyte chemoattractant protein-3. Science 289, 1202–1206 (2000). 65. Tam, E. M., Wu, Y. I., Butler, G. S., Stack, M. S. & Overall, C. M. Collagen binding properties of the membrane type-1 matrix metalloproteinase (MT1–MMP) hemopexin C domain. The ectodomain of the 44-kDa autocatalytic product of MT1–MMP inhibits cell invasion by disrupting native type I collagen cleavage. J. Biol. Chem. 277, 39005–39014 (2002). 66. Ehlers, M. R., Fox, E. A., Strydom, D. J. & Riordan, J. F. Molecular cloning of human testicular angiotensin-converting enzyme: the testis isozyme is identical to the C-terminal half of endothelial angiotensin-converting enzyme. Proc. Natl Acad. Sci. USA 86, 7741–7745 (1989). 67. Azuma, T., Liu, W. G., Vander Laan, D. J., Bowcock, A. M. & Taggart, R. T. Human gastric cathepsin E gene.
Multiple transcripts result from alternative polyadenylation of the primary transcripts of a single gene locus at 1q31–q32. J. Biol. Chem. 267, 1609–1614 (1992). 68. Freije, J. M. et al. Molecular cloning and expression of collagenase-3, a novel human matrix metalloproteinase produced by breast carcinomas. J. Biol. Chem. 269, 16766–16773 (1994). 69. Heuze-Vourc’h, N., Leblond, V. & Courty, Y. Complex alternative splicing of the hKLK3 gene coding for the tumor marker PSA (prostate-specific-antigen). Eur. J. Biochem. 270, 706–714 (2003). 70. Rieder, M. J., Taylor, S. L., Clark, A. G. & Nickerson, D. A. Sequence variation in the human angiotensin converting enzyme. Nature Genet. 22, 59–62 (1999). 71. Williams, A. G. et al. The ACE gene and muscle performance. Nature 403, 614 (2000). 72. Niu, T., Chen, X. & Xu, X. Angiotensin converting enzyme gene insertion/deletion polymorphism and cardiovascular disease: therapeutic implications. Drugs 62, 977–993 (2002). 73. Van Eerdewegh, P. et al. Association of the ADAM33 gene with asthma and bronchial hyperresponsiveness. Nature 418, 426–430 (2002). Together with reference 74, this paper illustrates the increased susceptibility to common diseases that is associated with genetic variation in some protease genes. 74. Horikawa, Y. et al. Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nature Genet. 26, 163–175 (2000). 75. Devlin, A. M. et al. Glutamate carboxypeptidase II: a polymorphism associated with lower levels of serum folate and hyperhomocysteinemia. Hum. Mol. Genet. 9, 2837–2844 (2000). 76. Yamada, Y. et al. Prediction of the risk of myocardial infarction from polymorphisms in candidate genes. N. Engl. J. Med. 347, 1916–1923 (2002). 77. Murphy, G. et al. Matrix metalloproteinases in arthritic disease. Arthritis Res. 4 (Suppl.) 39–49 (2002). 78. Yong, V. W., Power, C., Forsyth, P. & Edwards, D. R. Metalloproteinases in biology and pathology of the nervous system. Nature Rev. 
Neurosci. 2, 502–511 (2001). 79. Brinckerhoff, C. E. & Matrisian, L. M. Matrix metalloproteinases: a tail of a frog that became a prince. Nature Rev. Mol. Cell Biol. 3, 207–214 (2002). 80. Parks, W. C. & Shapiro, S. D. Matrix metalloproteinases in lung biology. Respir. Res. 2, 10–19 (2001). 81. Lomas, D. A. & Carrell, R. W. Serpinopathies and the conformational dementias. Nature Rev. Genet. 3, 759–768 (2002). 82. Carrell, R. W. & Lomas, D. A. α1-antitrypsin deficiency — a model for conformational diseases. N. Engl. J. Med. 346, 45–53 (2002). 83. Hardy, J. & Selkoe, D. J. The amyloid hypothesis of Alzheimer’s disease: progress and problems on the road to therapeutics. Science 297, 353–356 (2002). 84. Bowen, D. J. Haemophilia A and haemophilia B: molecular insights. Mol. Pathol. 55, 1–18 (2002). 85. Hauri, H. P., Kappeler, F., Andersson, H. & Appenzeller, C. ERGIC-53 and traffic in the secretory pathway. J. Cell Sci. 113, 587–596 (2000). 86. Bignell, G. R. et al. Identification of the familial cylindromatosis tumour-suppressor gene. Nature Genet. 25, 160–165 (2000). 87. Wang, J. et al. Inherited human caspase 10 mutations underlie defective lymphocyte and dendritic cell apoptosis in autoimmune lymphoproliferative syndrome type II. Cell 98, 47–58 (1999). 88. Boatright, K. M. et al. A unified model for apical caspase activation. Mol. Cell 11, 529–541 (2003). 89. Huang, Y. & Wang, K. K. The calpain family and human disease. Trends Mol. Med. 7, 355–362 (2001). 90. Levy, G. G. et al. Mutations in a member of the ADAMTS gene family cause thrombotic thrombocytopenic purpura. Nature 413, 488–494 (2001). 91. Guipponi, M. et al. The transmembrane serine protease (TMPRSS3) mutated in deafness DFNB8/10 activates the epithelial sodium channel (ENaC) in vitro. Hum. Mol. Genet. 11, 2829–2836 (2002). 92. Citron, M. et al. Mutant presenilins of Alzheimer’s disease increase production of 42-residue amyloid β-protein in both transfected cells and transgenic mice. Nature Med.
3, 67–72 (1997). 93. Gehring, N. H. et al. Increased efficiency of mRNA 3′ end formation: a new genetic mechanism contributing to hereditary thrombophilia. Nature Genet. 28, 389–392 (2001). 94. De Jonghe, C. et al. Aberrant splicing in the presenilin-1 intron 4 mutation causes presenile Alzheimer’s disease by increased Aβ42 secretion. Hum. Mol. Genet. 8, 1529–1540 (1999). 95. Molinari, F. et al. Truncating neurotrypsin mutation in autosomal recessive nonsyndromic mental retardation. Science 298, 1779–1781 (2002). 96. Casari, G. et al. Spastic paraplegia and OXPHOS impairment caused by mutations in paraplegin, a nuclear-encoded mitochondrial metalloprotease. Cell 93, 973–983 (1998). 97. Chun, H. J. et al. Pleiotropic defects in lymphocyte activation caused by caspase-8 mutations lead to human immunodeficiency. Nature 419, 395–399 (2002). 98. Belaaouaj, A. et al. Mice lacking neutrophil elastase reveal impaired host defense against Gram negative bacterial sepsis. Nature Med. 4, 615–618 (1998). 99. Horwitz, M., Benson, K. F., Person, R. E., Aprikyan, A. G. & Dale, D. C. Mutations in ELA2, encoding neutrophil elastase, define a 21-day biological clock in cyclic haematopoiesis. Nature Genet. 23, 433–436 (1999). 100. Pendás, A. M. et al. Defective prelamin A processing and muscular and adipocyte alterations in Zmpste24 metalloproteinase-deficient mice. Nature Genet. 31, 94–99 (2002). Together with references 101 and 102, this paper is an example of the usefulness of mouse models and genetic approaches to identify the in vivo substrates of proteases. 101. Li, Q., Park, P. W., Wilson, C. L. & Parks, W. C. Matrilysin shedding of syndecan-1 regulates chemokine mobilization and transepithelial efflux of neutrophils in acute lung injury. Cell 111, 635–646 (2002). 102. Wilson, C. L. et al. Regulation of intestinal α-defensin activation by the metalloproteinase matrilysin in innate host defense. Science 286, 113–117 (1999). 103. Ranger, A. M., Malynn, B. A. & Korsmeyer, S. J. 
Mouse models of cell death. Nature Genet. 28, 113–118 (2001). 104. Rakic, J. M. et al. Role of plasminogen activator-plasmin system in tumor angiogenesis. Cell. Mol. Life Sci. 60, 463–473 (2003). 105. Lund, L. R. et al. Functional overlap between two classes of matrix-degrading proteases in wound healing. EMBO J. 18, 4645–4656 (1999). 106. Blasi, F. & Carmeliet, P. uPAR: a versatile signalling orchestrator. Nature Rev. Mol. Cell Biol. 3, 932–943 (2002). 107. Holmbeck, K. et al. MT1–MMP-deficient mice develop dwarfism, osteopenia, arthritis, and connective tissue disease due to inadequate collagen turnover. Cell 99, 81–92 (1999). 108. Zhou, Z. et al. Impaired endochondral ossification and angiogenesis in mice deficient in membrane-type matrix metalloproteinase I. Proc. Natl Acad. Sci. USA 97, 4052–4057 (2000). 109. Caterina, J. J. et al. Enamelysin (matrix metalloproteinase 20)-deficient mice display an amelogenesis imperfecta phenotype. J. Biol. Chem. 277, 49598–49604 (2002). 110. Coussens, L. M., Shapiro, S. D., Soloway, P. D. & Werb, Z. Models for gain-of-function and loss-of-function of MMPs: transgenic and gene targeted mice. Methods Mol. Biol. 151, 149–179 (2001). 111. Wilson, S. M. et al. Synaptic defects in ataxia mice result from a mutation in Usp14, encoding a ubiquitin-specific protease. Nature Genet. 32, 420–425 (2002). An interesting example of a mouse disease that is caused by a mutation in a protease gene, the human orthologue of which has not yet been linked to an equivalent disorder. 112. Neuhold, L. A. et al. Postnatal expression in hyaline cartilage of constitutively active human collagenase-3 (MMP-13) induces osteoarthritis in mice. J. Clin. Invest. 107, 35–44 (2001). 113. Yu, Y. & Bradley, A. Engineering chromosomal rearrangements in mice. Nature Rev. Genet. 2, 780–790 (2001). 114. Stanford, W. L., Cohn, J. B. & Cordes, S. P. Gene-trap mutagenesis: past, present and beyond. Nature Rev. Genet. 2, 756–768 (2001). 115.
Southan, C. A genomic perspective on human proteases as drug targets. Drug Discov. Today 6, 681–688 (2001). A discussion of the relevance of proteases as therapeutic targets. 116. Overall, C. M. & López-Otín, C. Strategies for MMP inhibition in cancer: innovations for the post-trial era. Nature Rev. Cancer 2, 657–672 (2002). 117. Soto, C. Protein misfolding and disease; protein refolding and therapy. FEBS Lett. 498, 204–207 (2001). 118. Crowther, D. C. Familial conformational diseases and dementias. Hum. Mutat. 20, 1–14 (2002). 119. Cushman, D. W. & Ondetti, M. A. Design of angiotensin converting enzyme inhibitors. Nature Med. 5, 1110–1113 (1999). Together with reference 120, this article represents an example of the successful introduction of protease inhibitors to treat human disease. 120. Menendez-Arias, L. Targeting HIV: antiretroviral therapy and development of drug resistance. Trends Pharmacol. Sci. 23, 381–388 (2002). 121. Coussens, L. M., Fingleton, B. & Matrisian, L. M. Matrix metalloproteinase inhibitors and cancer: trials and tribulations. Science 295, 2387–2392 (2002). An excellent analysis of the lack of success of most MMP inhibitors developed for treating cancer and discussion of alternatives for future improvement in this field. 122. Gomis-Ruth, F. X. et al. Mechanism of inhibition of the human matrix metalloproteinase stromelysin-1 by TIMP-1. Nature 389, 77–81 (1997). 123. Bode, W. & Huber, R. Structural basis of the endoproteinase–protein inhibitor interaction. Biochim. Biophys. Acta 1477, 241–252 (2000). 124. Vendrell, J., Querol, E. & Aviles, F. X. Metallocarboxypeptidases and their protein inhibitors: structure, function and biomedical properties. Biochim. Biophys. Acta 1477, 284–298 (2000). 125. Morgunova, E., Tuuttila, A., Bergmann, U. & Tryggvason, K. Structural insight into the complex formation of latent matrix metalloproteinase 2 with tissue inhibitor of metalloproteinase 2. Proc. Natl Acad. Sci. USA 99, 7414–7419 (2002). 126.
Turk, V., Turk, B. & Turk, D. Lysosomal cysteine proteases: facts and opportunities. EMBO J. 20, 4629–4633 (2001). 127. Anel, R. L. & Kumar, A. Experimental and emerging therapies for sepsis and septic shock. Expert Opin. Investig. Drugs 10, 1471–1485 (2001). 128. Desnick, R. J. & Schuchman, E. H. Enzyme replacement and enhancement therapies: lessons from lysosomal disorders. Nature Rev. Genet. 3, 954–966 (2002). A comprehensive review that discusses the successes and shortcomings of present strategies to treat inherited metabolic disorders. 129. Roth, D. A. et al. Human recombinant factor IX: safety and efficacy studies in hemophilia B patients previously treated with plasma-derived factor IX concentrates. Blood 98, 3600–3606 (2001). 130. Selkoe, D. J. Deciphering the genesis and fate of amyloid β-protein yields novel therapies for Alzheimer disease. J. Clin. Invest. 110, 1375–1381 (2002). 131. Kay, M. A. et al. Evidence for gene transfer and expression of factor IX in haemophilia B patients treated with an AAV vector. Nature Genet. 24, 257–261 (2000). 132. Olson, M. V. & Varki, A. Sequencing the chimpanzee genome: insights into human evolution and disease. Nature Rev. Genet. 4, 20–28 (2003). An excellent analysis of the relevance of comparative genomics and discussion of the argument that gene loss might be an important mechanism of rapid evolutionary change. 133. Kheradmand, F. & Werb, Z. Shedding light on sheddases: role in growth and development. Bioessays 24, 8–12 (2002). Together with reference 134, this review describes the functional relevance of the protease-mediated process of ectodomain shedding of membrane proteins. 134. Arribas, J. & Borroto, A. Protein ectodomain shedding. Chem. Rev. 102, 4627–4638 (2002). 135. Rudner, D. Z., Fawcett, P. & Losick, R.
A family of membrane-embedded metalloproteases involved in regulated proteolysis of membrane-associated transcription factors. Proc. Natl Acad. Sci. USA 96, 14765–14770 (1999). 136. Hoppe, T., Rape, M. & Jentsch, S. Membrane-bound transcription factors: regulated release by RIP or RUP. Curr. Opin. Cell Biol. 13, 344–348 (2001). 137. Brown, M. S., Ye, J., Rawson, R. B. & Goldstein, J. L. Regulated intramembrane proteolysis: a control mechanism conserved from bacteria to humans. Cell 100, 391–398 (2000). An excellent analysis of the fascinating process that involves the participation of proteases that hydrolyze their substrates in the hydrophobic environment of the lipid bilayers. 138. Hopkins, A. L. & Groom, C. R. The druggable genome. Nature Rev. Drug Discov. 1, 727–730 (2002). 139. McLysaght, A., Hokamp, K. & Wolfe, K. H. Extensive genomic duplication during early chordate evolution. Nature Genet. 31, 200–204 (2002). 140. Samonte, R. V. & Eichler, E. E. Segmental duplications and the evolution of the primate genome. Nature Rev. Genet. 3, 65–72 (2002). 141. Ross, J., Jiang, H., Kanost, M. R. & Wang, Y. Serine proteases and their homologs in the Drosophila melanogaster genome: an initial analysis of sequence conservation and phylogenetic relationships. Gene 304, 117–131 (2003). 142. Lespinet, O., Wolf, Y. I., Koonin, E. V. & Aravind, L. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 12, 1048–1059 (2002). 143. Nardi, J. B., Martos, R., Walden, K. K., Lampe, D. J. & Robertson, H. M. Expression of lacunin, a large multidomain extracellular matrix protein, accompanies morphogenesis of epithelial monolayers in Manduca sexta. Insect Biochem. Mol. Biol. 29, 883–897 (1999).

Acknowledgments
We thank all members of our laboratories for their comments on the manuscript and apologize for the omission of relevant works owing to space constraints.
Our work is supported by grants from the Ministerio de Ciencia y Tecnología-Spain, the Gobierno del Principado de Asturias, Fundación ‘La Caixa’ and the European Union. C.M.O. is supported by a Canada Research Chair in Metalloproteinase Biology. The Instituto Universitario de Oncología is supported by Obra Social Cajastur-Asturias, Spain.

Online links

DATABASES
The following terms in this article are linked online to:
LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink
ACE | ADAM33 | ADAMTS13 | APP | C1r | C1s | CAPN3 | CAPN10 | CASP10 | CYLD | ELA1 | Ela2 | KLK3 | LMAN1 | MMP1 | PSEN1 | PSEN2 | Ren2 | Uchl4 | Usp14 | USP17
OMIM: http://www.ncbi.nlm.nih.gov/Omim
Alzheimer disease | asthma | congenital neutropenia | cyclic haematopoiesis | familial cylindromatosis | haemophilia A | haemophilia B Leyden | Huntington disease | hyperhomocysteinemia | hyperprothrombinemia | limb-girdle muscular dystrophy type 2A | multiple myeloma | thrombotic thrombocytopenic purpura | type II autoimmune lymphoproliferative syndrome | type-2 diabetes

FURTHER INFORMATION
Celera Discovery System: http://www.celeradiscoverysystem.com
Chris Overall’s Laboratory: http://www.clip.ubc.ca
Ensembl: http://www.ensembl.org
Interpro: http://www.ebi.ac.uk/interpro
Lopez-Otin’s Laboratory: http://web.uniovi.es/degradome
MEROPS: http://merops.sanger.ac.uk
NCBI: http://www.ncbi.nlm.nih.gov
Pfam: http://www.sanger.ac.uk/Software/Pfam
Protpars: http://evolution.genetics.washington.edu/phylip.html
SMART: http://smart.ox.ac.uk

Supplementary Tables S1–S5
Human and mouse proteases are divided into five classes, which are subdivided into families according to the MEROPS database criteria (Tables S1–S5). We have provided the MEROPS code for every enzyme for which one is available.
There are some conflicting cases in which different codes had previously been assigned to human and mouse protease genes that this work shows to be true orthologues; in these cases, the human code is proposed for both orthologues. Genes encoding protease-like proteins that carry changes in residues crucial for proteolytic activity are marked 'np' (non-protease homologue) after the code. The LocusLink or nucleotide accession number is provided for each protease. Information for human enzymes is labelled in green and for mouse enzymes in yellow. Genes that are absent from the human or the mouse genome are labelled in red. Genes that have been inactivated by mutation in one species but are functional in the other are labelled in pink; although these pseudogenes have been included in the Tables to emphasize the human–mouse difference, they have not been incorporated into the final counts of protease genes. Genes that have been verified experimentally, but whose sequence is missing from the available genome sequences, are indicated in red and in parentheses. 'Y' indicates that the corresponding human and mouse genes are syntenic. The percentage identity between orthologous proteases is also shown.
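The following minimal Python sketch illustrates the coding conventions described above. It is not part of the original study and is not an official MEROPS tool. It splits a table code such as `C19.073np` into its catalytic class, its family and its peptidase identifier, and it flags the 'np' suffix used for non-protease homologues. The sketch assumes only the formats that appear in these Tables:

- a class letter (A, C, M, S or T, the MEROPS letters for the five classes);
- a two-character family, such as `01`, or `x1` for the provisional family Ax1;
- a three-character identifier, written `xxx` when none has yet been assigned;
- an optional `np` suffix.

The names `parse_code` and `ProteaseCode` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# MEROPS class letters for the five catalytic classes covered by Tables S1-S5.
CLASS_NAMES = {
    "A": "aspartic",
    "C": "cysteine",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
}

# Assumed code layout, inferred from the Tables: class letter, two-character
# family ('01' or provisional 'x1'), '.', three-character identifier
# ('001' or unassigned 'xxx'), optional 'np' suffix.
CODE_PATTERN = re.compile(r"^([ACMST])([0-9x][0-9])\.([0-9]{3}|xxx)(np)?$")


@dataclass
class ProteaseCode:
    catalytic_class: str
    family: str
    identifier: Optional[str]  # None when the table shows 'xxx'
    non_protease_homologue: bool


def parse_code(code: str) -> ProteaseCode:
    """Split a table code such as 'C19.073np' into its components."""
    match = CODE_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized code: {code!r}")
    letter, family, ident, np_suffix = match.groups()
    return ProteaseCode(
        catalytic_class=CLASS_NAMES[letter],
        family=letter + family,
        identifier=None if ident == "xxx" else ident,
        non_protease_homologue=np_suffix is not None,
    )


if __name__ == "__main__":
    for code in ["A01.001", "C19.073np", "Ax1.xxxnp", "C48.xxx"]:
        print(code, parse_code(code))
```

A parser like this could, for example, exclude 'np' entries before counting the active proteases in each family. The 'np' flag is the only part of that filtering recoverable from the code itself: the pink pseudogene labels are not encoded in the code string and would need separate handling.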
Table S1 | Aspartic proteases Code A01.001 A01.003 A01.004 A01.041 A01.006 A01.007 A01.009 A01.010 A01.046 A01.008 Peptidase pepsin A pepsin C β-secretase 1 β-secretase 2 chymosin renin cathepsin D cathepsin E napsin A submandibular renin Human Gene PGA3/4/5 PGC BACE BACE2 #CYMP REN CTSD CTSE NAP1 LocusLink 5219 5225 23621 25825 1542 5972 1509 1510 9476 Locus 11q12 6p12 11q23 21q22 1p13 1q31 11p15 (1q31) 19q13 Mouse Gene Pepf Pgc Bace Bace2 Cymp Ren1 Ctsd Ctse Kdap Ren2 LocusLink 58803 109820 23821 56175 229697 19701 13033 13034 16541 19702 Locus 19B 17C 9B 16C4 3F3 1E4 7F5 1E4 7B2 1E4 Syntenic y y y y y y y y y Identity 55 73 96 88 A02.059 A02.xxx A02.xxx A02.xxx A02.xxx DDI-related protease DNA-damage inducible protein DNA-damage inducible protein 2 Nuclear recept. interacting prot. 2 Nuclear recept. interacting prot. 3 DDI-RP DDI1 DDI2 NRIP2 NRIP3 151516 AK093336 BN000122 83714 56675 2p13 11q22 1p36 12p13 11p15 Ddi-rp Ddi1 Ddi2 Nrip2 Nrip3 67855 71829 BC021415 60345 78593 6D2 9A1 4E1 6F3 7E3 y y y y y 88 81 96 83 93 69 81 82 71 A22.001 A22.002 A22.005 A22.006 A22.003 A22.004 A22.007 presenilin 1 presenilin 2 presenilin homologue 1/SPPL3 presenilin homologue 2 presenilin homologue 3/SPP presenilin homologue 4/SPPL2B presenilin homologue 5 PSEN1 PSEN2 PSH1 PSH2 PSH3 PSH4 PSH5 5663 5664 121665 162540 81502 56928 84888 14q24 1q42 12q24 17q21 20q11 19p13 15q21 Psen1 Psen2 Psh1 Psh2 Psh3 Psh4 Psh5 19164 19165 83678 237958 14950 73218 66552 12D3 1H4 5F 11D 2H2 10C1 2F2 y y y y y y y 92 95 96 70 96 83 83 Ax1.xxx GCDFP15 PIP 5304 7q34 Pip 18716 6B2 y 47 Ax1.xxxnp seminal vesicle antigen Sva 20939 6B2 Ax1.xxxnp seminal vesicle antigen-like 1 Sval1 71578 6B2 Ax1.xxxnp seminal vesicle antigen-like 2 Sval2 84543 6B2 Ax1.xxxnp seminal vesicle antigen-like 3 Sval3 232737 6B2 These are divided into four families: A01, A02, A22 and Ax1. There are several pepsinogen A isozymogens encoded by highly related genes (>95% identities) that form part of a cluster located at 11q12. 
The individual pepsinogen A isozymogens result from haplotypes that contain different numbers of genes (ranging from 1 to 4)1,2. In agreement with other databases, this region has been annotated as a single gene in human. According to the criteria discussed above, we have assigned mouse pepsinogen F as the orthologue of human pepsinogen A, despite notable divergence in their structure and regulation3. Ren2 is absent in some strains of laboratory mice. The gene that encodes prochymosin has been inactivated by mutations and frameshifts in the human genome and is classified as a pseudogene, although it is functional in mouse and other species4. The genes DDI1, DDI2, DDI-RP, NRIP2 and NRIP3 are included in family A02, which contains predicted retroviral-like aspartic proteases5. All of these have mouse orthologues at syntenic regions, and none is embedded in endogenous retroviral elements. The human and mouse genomes also contain several aspartic protease-related sequences derived from endogenous retroviruses, but we have not annotated these as human or mouse proteases. In this regard, it is remarkable that most of the retroviruses embedded in both genomes have suffered inactivating mutations, which also affect the putative proteases encoded by these viral elements. However, HERV-K113, for example, which is located at 19p13 in ~30% of the human population, has intact open reading frames for all viral proteins, including the corresponding aspartic protease, and remains capable of reinfecting humans today6. The catalogue of aspartic proteases also includes a new family derived from prolactin-inducible protein/gross cystic disease fluid protein-15 (PIP/GCDFP15), which has recently been characterized as a protease belonging to this class of enzymes7. The four PIP-related proteins lack residues proposed to be essential for PIP proteolytic activity and have therefore been classified as non-protease homologues.
Table S2 | Cysteine proteases Code C01.060 C01.070 C01.018 C01.040 C01.036 C01.032 C01.009 C01.034 C01.037 C01.013 C01.038 C01.023 C01.051 C01.042 C01.016 C01.031 C01.053 C01.045 Peptidase cathepsin B cathepsin C cathepsin F cathepsin H cathepsin K cathepsin L cathepsin L2 cathepsin S cathepsin W cathepsin Z cathepsin J cathepsin M cathepsin Q cathepsin R cathepsin-1 cathepsin-2 cathepsin-3 cathepsin-6 Human Gene CTSB CTSC CTSF CTSH CTSK CTSL CTSL2 CTSS CTSW CTSZ LocusLink 1508 1075 8722 1512 1513 1514 1515 1520 1521 1522 Locus 8p23 11q14 11q13 15q24 1q21 9q21 9q22 1q21 11q13 20q13 Mouse Gene Ctsb Ctsc Ctsf Ctsh Ctsk LocusLink 13030 13032 56464 13036 13038 Locus 14C3 7E1 19A 9E3 3F2 Syntenic y y y y y Identity 78 77 78 83 86 Ctsl Ctss Ctsw Ctsz Ctsj Ctsm Ctsq Ctsr Cts1 Cts2 Cts3 Cts6 13039 13040 13041 64138 26898 64139 104002 56835 116909 56094 117066 58518 13B3 3F2 19A 2H4 13B3 13B3 13B3 13B3 13B3 13B3 13B3 13B3 y y y y 75 73 68 83 C01.973np C01.975np tubulointerstitial nephritis antigen TINAG related protein TINAG LCN7 27283 64129 6p12 1p35 Tinag Lcn7 26944 94242 9E1 4D3 y y 85 90 C01.972np C01.xxxnp C01.xxx testin testin-2 testin-3 Cmb22/23 Cmb24 Cmb25 214639 70202 BY736040 13B3 13B3 13B3 C01.084 bleomycin hydrolase BLMH 642 17q11 Blmh 104184 11B5 y 93 C02.001 calpain 1 CAPN1 823 11q13 Capn1 12333 19A y 89 C02.002 C02.004 C02.011 C02.971np C02.008 C02.007 C02.006 C02.018 C02.013 C02.017 C02.020 C02.xxx C02.010 calpain 2 calpain 3 calpain 5 calpain 6 calpain 7 calpain 8 calpain 9 calpain 10 calpain 11 calpain 12 calpain 13 calpain 14 calpain 15/Sol protein CAPN2 CAPN3 CAPN5 CAPN6 CAPN7 CAPN8 CAPN9 CAPN10 CAPN11 CAPN12 CAPN13 CAPN14 SOLH 824 825 726 827 23473 AA043093 10753 11132 11131 147968 92291 114773 6650 1q42 15q15 11q13 Xq23 3p25 (1q42) 1q42 2q37 6p21 19q13 2p23 2p23 16p13 Capn2 Capn3 Capn5 Capn6 Capn7 Capn8 Capn9 Capn10 Capn11 Capn12 Capn13 12334 12335 12337 12338 12339 170725 73647 23830 103998 60594 240159 1H4 2F1 7F1 XF2 14B 1H4 8E2 1D 17C 7A3 17E2 y y 
y y y y y y y y y 93 93 92 95 95 72 85 81 83 87 62 Solh 50817 17B1 y 89 C12.001 C12.003 C12.004 C12.005 C12.007 C12.xxx ubiquitin C-terminal hydrolase 1 ubiquitin C-terminal hydrolase 3 ubiquitin C-term. hydrolase BAP1 ubiquitin C-terminal hydrolase 5 ubiquitin C-terminal hydrolase 4 cylindromatosis protein UCHL1 UCHL3 BAP1 UCHL5 7345 7347 8314 51377 4p14 13q22 3p21 1q31 16q12 5D 14E2 14B 1F 9D 8C4 94 98 93 96 1540 22223 50933 104416 56207 93841 74256 y y y y CYLD1 Uchl1 Uchl3 Bap1 Uchl5 Uchl4 Cyld1 y 95 C13.004 C13.xxx C13.005 legumain legumain-2 hGPI8 LGMN LGMN2 PIGK 5641 122199 10026 14q32 13q21 1p31 Lgmn 19141 12F1 y 82 Pigk 66613 3H4 y 94 C14.001 C14.006 C14.003 C14.007 C14.008 C14.005 caspase-1 caspase-2 caspase-3 caspase-4/11 caspase-5 caspase-6 CASP1 CASP2 CASP3 CASP4 CASP5 CASP6 834 835 836 837 838 839 11q22 7q34 4q35 11q22 11q22 4q25 Casp1 Casp2 Casp3 Casp11 12362 12366 12367 12363 9A1 6B2 8B2 9A1 y y y y 62 89 87 60 Casp6 12368 3H1 y 90 C14.004 C14.009 C14.010 C14.011 C14.013 C14.018 C14.026 caspase-7 caspase-8 caspase-9 caspase-10 caspase-12 caspase-14 paracaspase CASP7 CASP8 CASP9 CASP10 #CASP12 CASP14 MALT1 840 841 842 843 120329 23581 10892 10q25 2q33 1p36 2q33 11q22 19p13 18q21 Casp7 Casp8 Casp9 12369 12370 12371 19D2 1C2 4E1 y y y 82 62 72 Casp12 Casp14 Malt1 12364 12365 240354 9A1 10C1 18E1 y y y 74 90 C14.020np C14.971np C14.975np homologue ICEY casper caspase-14-like ICEYH CFLAR CASP14L 120332 8837 197350 11q22 2q33 16p13 Cflar Casp14L 12633 1C2 17A3 y y 68 78 C15.010 C15.011 pyroglutamyl peptidase I pyroglutamyl-peptidase II PGPEP1 PGPEP2 65074 145814 19p13 15q26 Pgpi Pgpep2 66522 78444 8C1 7C y y 95 71 C19.019 C19.013 C19.026 C19.010 C19.001 C19.009 C19.016 C19.011 C19.017 C19.028 C19.018 C19.014 C19.020 C19.012 C19.015 C19.022 USP1 USP2 USP3 USP4 USP5 USP6 USP7 USP8 USP9X USP9Y USP10 USP11 USP12 USP13 USP14 USP15 USP1 USP2 USP3 USP4 USP5 USP6 USP7 USP8 USP9X USP9Y USP10 USP11 USP12 USP13 USP14 USP15 7398 9099 9960 7375 8078 9098 7874 9101 
8239 8287 9100 8237 9959 8975 9097 9958 1p31 11q23 15q22 3p21 12p13 17p13 16p13 15q21 Xp11 Yq11 16q24 Xp11 13q12 3q26 18p11 12q14 Usp1 Usp2 Usp3 Usp4 Usp5 230484 53376 235441 22258 22225 4C6 9B 9D 9F2 6F2 y y y y y 88 95 98 90 98 Usp7 Usp8 Usp9x Usp9y Uchrp Usp11 Ubh1 108732 84092 22284 107868 22224 236733 22217 16A3 2F2 XA1 (Y) 8E1 XA2 5G2 y y y y y y 99 82 98 82 83 85 98 Usp14 Usp15 59025 14479 18A1 10D3 y y 96 94 C19.021 C19.023 C19.xxx C19.030 C19.024 C19.025 C19.034 C19.035 C19.047 C19.041 C19.046 C19.075 C19.054 C19.040 C19.060 C19.071 C19.044 C19.037 C19.067 C19.059 C19.042 C19.053 C19.056 C19.972np C19.069 C19.xxx C19.048 C19.xxx C19.057 C19.975 C19.052 USP16 USP17 USP17-like USP18 USP19 USP20 USP21 USP22 USP24 USP25 USP26 USP27 USP28 USP29 USP30 USP31 NY-REN-60 VDU1 USP34 USP35 USP36 USP37 HP43.8KD SAD1 USP40 USP41 USP42 USP43 USP44 USP45 USP46 USP16 USP17 USP17L USP18 USP19 USP20 USP21 USP22 USP24 USP25 USP26 USP27 USP28 USP29 USP30 USP31 USP32 USP33 USP34 USP35 USP36 USP37 USP38 USP39 USP40 USP41 USP42 USP43 USP44 USP45 USP46 10600 23661 BN000116 11274 10869 10868 27005 23326 23358 29761 83844 AW851065 57646 57663 84749 57478 84669 23032 9736 57558 57602 57695 84640 10713 55230 150200 84132 124739 84101 85015 64854 21q21 4p16 8p23 22q11 3p11 9q34 1q22 17p11 1p32 21q11 Xq26 Xp11 11q23 19q13 12q23 16p12 17q23 1p31 2p15 11q13 17q25 2q35 4q31 2p11 2q37 22q11 7p22 17p12 12q21 6q16 4q12 Usp16 74112 16C3 y 82 Usp18 Usp19 Usp20 Usp21 Usp22 Usp24 Usp25 Usp26 Usp27 Usp28 Usp29 Usp30 Usp31 Usp32 Usp33 24110 71472 74270 30941 216825 72686 30940 83563 54651 235323 57775 100756 209833 237899 170822 6F2 9F2 2B 1H2 11B4 4C7 16C3 XA3 XA1 9B 7A1 5F 7F2 11B5 3H4 y y y y y y y y y y y y y y y 70 79 94 96 93 97 95 36 97 98 45 90 90 94 92 Usp35 Usp36 244144 72344 7E3 12F2 y y 82 74 Usp38 Usp39 Usp40 74841 28035 227334 8C3 6C3 1C5 y y y 72 87 81 Usp42 Usp43 Usp44 Usp45 Usp46 76800 216835 214955 77593 100664 5G2 11B3 10C2 4A3 5C3 y y y y y 81 76 87 79 99 C19.055 C19.068 
C19.073np C19.058np C19.065 C19.xxxnp C19.031 C19.032 C19.xxx C19.xxx C19.xxx C19.xxx USP47 USP48 USP49 USP50 USP51 USP52 DUB-1 DUB-2 DUB2a DUB2a-like DUB2a-like2 DUB6 USP47 USP48 USP49 USP50 USP51 USP52 55031 84196 25862 AI990110 BF741256 9924 11p15 1p36 6p21 15q21 Xp11 12q13 Usp47 Usp48 Usp49 Usp50 320745 170707 224836 75083 7F2 4D3 17C 2F2 y y y y 94 95 80 75 Usp52 Dub1 Dub2 Dub3 Dub4 Dub5 Dub6 103135 13531 13532 AF393638 AF393637 BAC40791 BN000117 10D3 7F2 7F1 7F1 7F1 7F1 7F1 y 97 C26.001 γ-glutamyl hydrolase GGH 8836 8q12 Ggh 14590 4A3 y 69 C44.001 Gln-PRPP amidotransferase PPAT 5471 4q12 Ppat 231327 5E1 y 93 C44.971np C44.972np C44.973np Gln-fructose-6-P transamidase 1 Gln-fructose-6-P transamidase 2 Gln-fructose-6-P transamidase 3 GFPT1 GFPT2 GFPT3 2673 9945 203431 2p13 5q35 Xq21 Gfpt1 Gfpt2 #Gfpt3 14583 14584 6D2 11B1 XC3 y y y 99 98 C46.002 C46.003 C46.004 sonic hedgehog protein indian hedgehog protein desert hedgehog protein SHH IHH DHH 6469 3549 50846 7q36 2q35 12q13 Shh Ihh Dhh 20423 16147 13363 5A3 1C3 15F2 y y y 92 95 97 C48.002 C48.007 C48.003 C48.008 C48.004 C48.009 sentrin/SUMO protease 1 sentrin/SUMO protease 2 sentrin/SUMO protease 3 sentrin/SUMO protease 5 sentrin/SUMO protease 6 sentrin/SUMO protease 7 SENP1 SENP2 SENP3 SENP5 SENP6 SENP7 29843 59343 26168 205564 26054 57337 12q13 3q27 17p13 3q29 6q14 3q12 Senp1 Senp2 Senp3 Senp5 Senp6 Senp7 223870 75826 80886 AK043171 215351 72869 15F2 16B1 11B4 16B2 9E2 16B1 y y y y y y 88 71 95 71 81 87 C48.011 C48.016 C48.013 C48.017 C48.015 C48.xxx C48.xxx sentrin/SUMO protease 8 sentrin/SUMO protease 9 sentrin/SUMO protease 11 sentrin/SUMO protease 12 sentrin/SUMO protease 13 sentrin/SUMO protease 14 sentrin/SUMO protease 15 SENP8 123228 15q23 Senp8 Senp9 Senp11 Senp12 Senp13 Senp14 Senp15 71599 236870 216394 208231 114671 278823 278824 9C XA7 10D3 16B5 10A3 1B 1B y 92 C50.001 separase ESPL1 9700 12q13 Espl1 105988 15F3 y 78 C54.003 C54.002 C54.004 C54.005 autophagin-1 autophagin-2 autophagin-3 
autophagin-4 AUTL1 AUTL2 AUTL3 AUTL4 23192 115201 84938 84971 2q37 Xq23 1p31 19p13 Autl1 Autl2 Autl3 Autl4 66615 102926 242557 235040 1D XF1 4C6 9A3 y y y y 92 89 86 86 C56.002 DJ-1 DJ-1 11315 1p36 Dj-1 57320 4E1 y 90 Cx1.xxx Cx1.xxx Cx1.xxx Cx1.xxx Cx1.xxx Cx1.xxx Cx1.xxx Cx1.xxx Cx1.xxx Cx1.xxx Cx1.xxx Cx1.xxxnp Cx1.xxx Cx1.xxx Hin-1 Hin-1-like Hin-2 Hin-3 Hin-4 Hin-5 Hin-6 Hin-7 Otubain-1 Otubain-2 TNFα-induced protein 3/A20 TRAF-binding protein domain Cezanne Cezanne-2 HSHIN1 HSHIN1L HSHIN2 HSHIN3 HSHIN4 HSHIN5 HSHIN6 HSHIN7 OTUB1 OTUB2 TNFAIP3 TRABID CEZANNE LOC161725 54726 BN000160 79868 254897 220213 55593 139562 BI829009 55611 78990 7128 54764 56957 161725 4q31 12p13 Xq23 1p36 10p12 Xp11 Xq13 1q32 11q13 14q32 6q23 10q26 1q21 15q13 Hshin1 234484 8C3 y 91 Hshin2 Hshin3 Hshin4 Hshin5 Hshin6 Hshin7 Otub1 Otub2 Tnfaip3 Trabid Cezanne AJ430384 245656 73162 71198 54644 236924 226418 107260 68149 21929 BN000126 AAH37040 170711 XF2 4D3 2A2 XA1 XC2 1E4 19A 12F1 10A2 7F4 3F2 7C y y y y y y y y y y y y 65 73 90 91 64 92 99 95 90 99 93 95 Cx1.xxx Cx1.xxxnp CGI-77 CGI-77b CGI77 51633 8q21 Cgi77 Cgi77b 72201 236778 4A2 XA3 y 87 Cx2.xxxnp HetF-like HETFL 23331 22q12 Hetfl 209683 5F y 85

The cysteine proteases belong to 16 different families and include proteins such as the hedgehog family members, whose protease function is used only for the autolytic processing of their respective precursors8. The C01 family is greatly expanded in the mouse owing to the presence of placental cathepsins and testins. We have annotated two further mouse testins, including testin-3, the first member of this subfamily predicted to be a functional protease. There are two functional human cathepsin L-like genes (CTSL and CTSL2) at 9q21, but only a single gene in the mouse, which is more closely related to CTSL2. The cylindromatosis protein contains a ubiquitin C-terminal hydrolase domain and has therefore been included in the C12 family.
The genes for calpain 14, caspase-5 and caspase-10 are absent in mouse, and the human gene for caspase-12 has been inactivated and is therefore classified as a pseudogene. We have annotated a second human legumain-like gene that is absent in mouse. The C19 family of ubiquitin-specific proteases (USPs) is large and complex. We have annotated 21 human members (USP30, -31 and -34 to -52) and assigned their corresponding mouse orthologues. We have not found mouse orthologues for human USP6, -13, -34, -37, -42 and -51. USP17 is located at 4p15, within the RS447 human megasatellite9. This region is highly polymorphic in the human genome and contains a variable number of USP17-related, intronless, tandemly repeated sequences (>95% identical) that have probably been generated by retrotransposition. Forty-four distinct alleles, containing 20–103 copies of the RS447 unit, have been identified among 74 unrelated chromosomes10. We have also identified several USP17-related sequences in a cluster located at 8p23. This cluster appears to contain at least seven USP17-like (USP17L) intronless genes (three of which are classified as non-protease homologues), as well as pseudogenes. The proteins encoded by these polymorphic and variable regions have each been annotated as a single protease (USP17 and USP17L, respectively) in this table. The closest relatives of the USP17 genes in the mouse genome are those encoding the DUBs (deubiquitylating enzymes). DUB1, DUB2 and DUB2A have been extensively characterized as members of a novel group of cytokine-inducible deubiquitylating enzymes produced by lymphocytes11–13. We have annotated three further members of this subfamily of haematopoietic proteases. The classification of the mouse DUBs as orthologues of the human USP17 genes is doubtful because, despite their sequence similarity, their syntenic relationship is unclear. Accordingly, we have tentatively classified them as paralogous genes.
We have annotated six members of the C48 family of SUMO-1 proteases in the mouse genome that are absent from the human genome. We have also included a family of recently described cysteine proteases with deubiquitylating activity that contain the OTU protease domain and have tentatively been called otubains14,15. This family appears to comprise 14 orthologous pairs plus one species-specific member in each of human and mouse. All of them contain the characteristic features of active proteases, with the exception of TRABID and murine Cgi77b. The last entry in our list of cysteine proteases, HetF-like, forms part of the superfamily of caspase-haemoglobinase fold proteases16. Human and mouse HetF-like have a serine residue in place of the active-site cysteine of cysteine proteases and have therefore been classified as non-protease homologues.

Table S3 | Metalloproteases Code M01.003 M01.014 M01.023 M01.001 M01.018 M01.004 M01.008 M01.010 M01.011 M01.022 M01.028 M01.027 M01.972np Peptidase aminopeptidase A aminopeptidase B aminopeptidase MAMS aminopeptidase N aminopeptidase PILS leukotriene A4 hydrolase pyroglutamyl-peptidase II cytosol alanyl aminopeptidase leucyl-cystinyl aminopeptidase aminopeptidase B-like 1 aminopeptidase O aminopeptidase Q TBP-associated factor 2 Human Gene ENPEP RNPEP AMPEP ANPEP ARTS1 LTA4H TRHDE NPEPPS LNPEP RNPEPL1 AOPEP AQPEP TAF2 LocusLink 2028 6051 64167 290 51752 4048 29953 9520 4012 57140 84909 BG623101 6873 Locus 4q26 1q32 5q15 15q25 5q21 12q23 12q21 17q21 5q15 2q37 9q22 5q23 8q24 Mouse Gene LocusLink Enpep 13809 Rnpep 215615 Locus 3H1 1F Syntenic y y Identity 77 86 Anpep Arts1 Lta4h Trhde Psa Lnpep Rnpepl1 Aopep Aqpep Taf2 16790 80898 16993 237553 19155 266720 98480 BAC31943 74574 319944 7D2 13C1 10C2 10D1 11D 13C1 1D 13B3 18C 15D y y y y y y y y y y 76 85 92 94 97 88 95 72 68 99 M02.001 M02.006 M02.971np angiotensin-converting enzyme 1 angiotensin-converting enzyme 2 angiotensin-converting enzyme 3 ACE ACE2 #ACE3 1636 59272 17q23 Xp21 17q23 Ace
Ace2 Ace3 11421 70008 217246 11E1 XF5 11E1 y y y 83 82 M03.001 M03.002 M03.006 thimet oligopeptidase neurolysin mitochondrial intermediate peptidase THOP1 NLN MIPEP 7064 57486 4285 19p13 5q13 13q12 Thop1 Nln Mipep 50492 75805 70478 10C1 13D1 14C3 y y y 89 90 84 M08.003 leishmanolysin-2 LMLN 89782 3q29 Lmln 239833 16B2 y 73 M10.034 M10.001 M10.003 M10.005 M10.008 collagenase-like B collagenase 1 gelatinase A stromelysin 1 matrilysin 11q22 16q22 11q22 11q22 Mcolb Mcola Mmp2 Mmp3 Mmp7 83996 83995 17390 17392 17393 9A1 9A1 8C5 9A1 9A1 y y y y 59 95 76 70 MMP1 MMP2 MMP3 MMP7 4312 4313 4314 4316 M10.002 M10.004 M10.006 M10.007 M10.009 M10.013 M10.014 M10.015 M10.016 M10.017 M10.021 M10.019 M10.026 M10.022 M10.022 M10.023 M10.024 M10.029 collagenase 2 gelatinase B stromelysin 2 stromelysin 3 macrophage elastase collagenase 3 MT1-MMP MT2-MMP MT3-MMP MT4-MMP MMP19 enamelysin MMP21 MMP23A MMP23B MT5-MMP MT6-MMP matrilysin-2 MMP8 MMP9 MMP10 MMP11 MMP12 MMP13 MMP14 MMP15 MMP16 MMP17 MMP19 MMP20 MMP21 MMP23A MMP23B MMP24 MMP25 MMP26 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 9313 118856 8511 8510 10893 64386 56547 11q22 20q13 11q22 22q11 11q22 11q22 14q11 16q22 8q22 12q24 12q13 11q22 10q26 (1p36) 1p36 20q11 16p12 11p15 Mmp8 Mmp9 Mmp10 Mmp11 Mmp12 Mmp13 Mmp14 Mmp15 Mmp16 Mmp17 Mmp19 Mmp20 Mmp21 Mmp23 17394 17395 17384 17385 17381 17386 17387 17388 17389 23948 58223 30800 214766 26561 9A1 2H3 9A1 10B5 9A1 9A1 14C1 8C5 4A3 5F 10D3 9A1 7F4 4E2 y y y y y y y y y y y y y y 72 72 76 81 61 86 96 87 98 87 78 89 80 83 Mmp24 Mmp25 17391 240047 2H2 17A3 y y 92 80 M10.027 M10.030 MMP27 epilysin MMP27 MMP28 64066 79148 11q22 17q12 Mmp27 Mmp28 234911 118453 9A1 11B5 y y 57 79 M12.002 M12.004 M12.005 M12.016 M12.018 M12.245 meprin α-subunit meprin β-subunit procollagen C-protease mammalian tolloid-like 1 protein mammalian tolloid-like 2 protein hatching-metalloprotease MEP1A MEP1B BMP1 TLL1 TLL2 HAMET 4224 4225 649 7092 7093 AJ537600 6p12 18q12 8p21 4q32 10q24 2q11 Mep1a Mep1b Bmp1 
Tll1 Tll2 Hamet 17287 17288 12153 21892 24087 215095 17C 18A2 14D1 8B3 19D1 2F3 y y y y y y 76 77 92 93 91 67 M12.219 M12.201 M12.xxx DECYSIN ADAM1a ADAM1b ADAMDEC1 27299 #ADAM1 8759 8p21 12q24 Adamdec1 Adam1a Adam1b 58860 280668 280667 14D1 5F 5F y y 65 M12.950np M12.975np M12.952np M12.xxxnp M12.953np M12.xxxnp M12.xxxnp M12.956np M12.208 M12.209 M12.210 M12.976np M12.212 M12.215 M12.217 M12.957np M12.214 M12.218 M12.234 M12.978np M12.979np M12.227 M12.228 M12.229 M12.224 M12.981np M12.232 M12.960np M12.244 M12.xxx M12.xxx ADAM2/Fertilin-β ADAM3B ADAM4 ADAM4B ADAM5 ADAM6 ADAM6B ADAM7 ADAM8 ADAM9 ADAM10 ADAM11 ADAM12 ADAM15 ADAM17 ADAM18 ADAM19 ADAM20 ADAM21 ADAM22 ADAM23 testase 1 testase 2 testase 3 ADAM28 ADAM29 ADAM 30 ADAM 32 ADAM 33 testase 4 testase 5 ADAM2 #ADAM3B #ADAM4 #ADAM4B #ADAM5 #ADAM6 2515 ADAM7 ADAM8 ADAM9 ADAM10 ADAM11 ADAM12 ADAM15 ADAM17 ADAM18 ADAM19 ADAM20 ADAM21 ADAM22 ADAM23 8756 101 8754 102 4185 8038 8751 6868 8749 8728 8748 8747 53616 8745 8p21 10q26 8p11 15q21 17q21 10q26 1q21 2p25 8p11 5q33 14q24 14q24 7q21 2q33 #ADAM25 137491 8p22 ADAM28 ADAM29 ADAM30 ADAM32 ADAM33 10863 11086 11085 203102 80332 8p21 4q34 1p11 8p11 20p13 8757 8p11 16q12 14q24 14q24 8p11 14q24 Adam2 Adam3 Adam4 Adam4b Adam5 Adam6 Adam6b Adam7 Adam8 Adam9 Adam10 Adam11 Adam12 Adam15 Adam17 Adam18 Adam19 11495 11497 11498 AV274161 11499 238406 238405 11500 11501 11502 11487 11488 11489 11490 11491 13524 11492 14D1 8A3 12D3 12D3 8A3 12F2 12F2 14D1 7F5 8A3 9D 11D 7F4 3F1 (12) 8A3 11B3 n y y y y y 59 y y y y y y y y y y 66 65 86 96 91 81 80 91 62 82 Adam21 Adam22 Adam23 Adam24 Adam25 Adam26 Adam28 Adam29 Adam30 Adam32 Adam33 Adam34 Adam35 56622 11496 23792 13526 23793 13525 13522 244486 71078 209192 110751 252866 XM_146316 12D3 5A1 1C2 8B1 8B1 8B1 14D1 8B3 3F3 8A3 2F3 8B1 8B1 y y y 68 92 94 y y y y y y 70 58 63 60 71 M12.xxx M12.xxx M12.xxx M12.247 testase 6 testase 7 testase 8 testase 9 M12.222 M12.301 M12.220 M12.221 M12.225 M12.230 M12.231 M12.226 M12.021 M12.235 M12.237 
M12.241 M12.024 M12.025 M12.026 M12.027 M12.028 M12.029 M12.246 ADAMTS1 ADAMTS2 ADAMTS3 ADAMTS4 ADAMTS5/11 ADAMTS6 ADAMTS7 ADAMTS8 ADAMTS9 ADAMTS10 ADAMTS12 ADAMTS13 ADAMTS14 ADAMTS15 ADAMTS16 ADAMTS17 ADAMTS18 ADAMTS19 ADAMTS20 ADAMTS1 ADAMTS2 ADAMTS3 ADAMTS4 ADAMTS5 ADAMTS6 ADAMTS7 ADAMTS8 ADAMTS9 ADAMTS10 ADAMTS12 ADAMTS13 ADAMTS14 ADAMTS15 ADAMTS16 ADAMTS17 ADAMTS18 ADAMTS19 ADAMTS20 9510 9509 9508 9507 11096 11174 11173 11095 56999 81794 81792 11093 140766 170689 170690 170691 170692 171019 80070 M13.001 M13.008 M13.002 M13.003 M13.007 M13.090 neprilysin neprilysin-2 endothelin-converting enzyme 1 endothelin-converting enzyme 2 DINE peptidase Kell blood-group protein MME MMEL2 ECE1 ECE2 ECEL1 KEL 4311 79258 1889 9718 9427 3792 Adam36 Adam37 Adam38 Adam39 BN000114 BN000115 BN000119 BN000121 8B1 8B1 8B1 8B1 21q21 5q35 4q21 1q23 21q21 5q12 15q24 11q24 3p14 19p13 5p13 9q34 10q22 11q24 5p15 15q26 16q23 5q23 12q12 Adamts1 Adamts2 Adamts3 Adamts4 Adamts5 Adamts6 Adamts7 Adamts8 Adamts9 Adamts10 Adamts12 Adamts13 Adamts14 Adamts15 Adamts16 Adamts17 Adamts18 Adamts19 Adamts20 11504 26550 BAC27597 11505 23794 238832 209798 30806 69070 224698 239227 279028 237360 235130 271127 244028 208937 240324 223838 16C3 11B1 5E2 1H2 16C3 13D1 9E3 9A5 6D3 17B2 15A2 2A3 10B4 9A5 13C1 7C 8E1 18D2 15F1 y y y y y y y y y y y y y y y y y y y 84 88 65 91 91 73 67 81 90 91 88 71 81 91 83 78 73 82 70 3q26 1p36 1p36 3q29 2q37 7q35 Mme Mell1 Ece1 Ece2 Ecel1 Kel 17380 27390 230857 107522 13599 23925 3E1 4E2 4D3 16B1 1C5 6B2 y y y y y y 94 79 93 87 94 74 M13.091 PHEX endopeptidase PHEX 5251 Xp22 Phex 18675 XF4 y 96 M14.001 M14.002 M14.010 M14.017 M14.020 M14.018 M14.003 M14.009 M14.021 carboxypeptidase A1 carboxypeptidase A2 carboxypeptidase A3 carboxypeptidase A4 carboxypeptidase A5 carboxypeptidase A6 carboxypeptidase B carboxypeptidase U carboxypeptidase O CPA1 CPA2 CPA3 CPA4 CPA5 CPA6 CPB1 CPB2 CPO 1357 1358 1359 51200 93979 57094 1360 1361 130749 7q32 7q32 3q24 7q32 7q32 8q13 3q25 13q14 
2q33 Cpa1 Cpa2 Cpa3 Cpa4 Cpa5 Cpa6 Cpb1 Cpb2 #Cpo 109697 232680 12873 215225 76649 329093 76703 56373 269201 6A3 6A3 3A3 6A3 1A3 1A3 3A3 14D2 1C2 y y y y y y y y y 74 86 81 84 84 86 72 82 M14.005 M14.004 M14.006 M14.011 M14.012 M14.015np M14.019np M14.951np carboxypeptidase E carboxypeptidase N carboxypeptidase M carboxypeptidase D carboxypeptidase Z carboxypeptidase X1 carboxypeptidase X2 adipocyte-enhancer binding prot. 1 CPE CPN CPM CPD CPZ CPX1 CPX2 AEBP1 1363 1369 1368 1362 8532 56265 119587 165 4q32 10q25 12q15 17q11 4p16 20p13 10q26 7p13 Cpe Cpn Cpm Cpd Cpz Cpx1 Cpx2 Aebp1 12876 93721 70574 12874 242939 56264 55987 11568 8B3 19D1 10D2 11B4 5B1 2F3 7F4 11A1 y y y y y y y y 97 66 79 93 82 86 89 90 M16.002 M16.003 M16.005 M16.009 M16.971np M16.973np M16.974np M16.976np insulysin mitochondrial processing pept. β-sub nardilysin pitrilysin metalloprotease 1 mitochondrial processing protease UCR1 UCR2 mitoch. processing protease-like IDE PMPCB NRD1 PITRM1 INPP5E UQCRC1 UQCRC2 AMPP 3416 9512 4898 10531 23203 7384 7385 133083 10q24 7q22 1p32 10p15 9q34 3p21 16p12 4q22 Ide Pmpcb Nrd1 Pitrm1 Inpp5e Uqcrc1 Uqcrc2 15925 73078 230598 69617 66865 22273 67003 19C3 5A3 4C7 13A1 2A3 9F2 7F3 y y y y y y y 97 90 93 86 91 88 85 M17.001 leucyl aminopeptidase LAP3 51056 4p15 Lap3 66988 5B3 y 90 M17.006 aminopeptidase-like 1 NPEPL1 79716 20q13 M18.002 aspartyl aminopeptidase DNPEP 23549 2q36 Dnpep 13437 1C3 y 90 M19.001 M19.002 M19.004 membrane dipeptidase membrane dipeptidase 2 membrane dipeptidase 3 DPEP1 DPEP2 DPEP3 1800 64174 64180 16q24 16q22 16q22 Dpep1 Dpep2 Dpep3 13479 244632 71854 8E2 8D2 8D2 y y y 73 70 73 M20.005 M20.006 M20.971np M20.973np glu-carboxypeptidase-like 1 glu-carboxypeptidase-like 2 HmrA-like protease aminoacylase CPGL CPGL2 HMRALP ACY1 55748 84735 135293 95 18q22 18q22 6q15 3p21 Cpgl Cpgl2 Hmralp Acy1 66054 240478 242377 109652 18E3 18E3 4A5 9F1 y y y y 91 73 83 85 M22.003 M22.004 O-sialoglycoprotein endopeptidase O-sialoglycoprotein endopeptidase 2 OSGEP 
OSGEP2 55644 64172 14q11 2q32 Osgep Osgep2 66246 72085 14C1 1C1 y y 93 84 M24.001 M24.002 M24.028 M24.005 M24.007 M24.009 M24.026 M24.973np M24.974np methionyl aminopeptidase I methionyl aminopeptidase II methionyl aminopeptidase-like 1 X-prolyl aminopeptidase 2 X-Pro dipeptidase aminopeptidase P1 aminopeptidase P homologue proliferation-association protein 1 suppressor of Ty 16 homologue METAP1 METAP2 METAPL1 XPNPEP2 PEPD XPNPEPL PEPP PA2G4 SUPT16H 23173 10988 254042 7512 5184 7511 63929 5036 11198 4q24 12q23 2q31 Xq26 19q13 10q25 22q13 12q13 14q11 Metap1 Metap2 Metapl1 Xpnpep2 Pepd Xpnpep1 Pepp Pa2g4 Supt16h 75624 56307 66559 170745 18624 170750 321003 18813 114741 3H2 10C3 2C3 XA3 7B1 19D2 15E3 10D3 14C1 y y y y y y y y y 92 88 95 81 90 81 93 98 98 M28.010 M28.011 M28.012 M28.975np M28.014 glutamate carboxypeptidase II NAALADASE L peptidase NAALADASE II NAALADASE III plasma Glu-carboxypeptidase FOLH1 NAALADL NAALAD2 NAALAD3 PGCP 2346 10004 10003 254827 10404 11p11 11q13 11q14 3q26 8q22 Folh1 NAALADL Naalad2 Naalad3 Pgcp 53320 BN000129 72560 229149 54381 7E1 19A 9A3 3A3 15B3 y y y y y 85 80 89 63 93 M28.018 M28.972np M28.973np M28.974np M28.016 Ojeda peptidase transferrin receptor protein transferrin receptor 2 protein glutaminyl cyclase glutaminyl cyclase 2 OJP TFRC TFR2 QPCT QPCT2 79956 7037 7036 25797 54814 9p24 3q29 7q22 2p22 19q13 Ojp Trfr Trfr2 Qpct Qpct2 BAC38286 22042 50765 70536 67369 19C2 16B3 5G1 17E3 7A2 y y y y y 87 77 84 81 84 M38.972np M38.973np M38.xxxnp M38.xxxnp M38.xxxnp M38.xxxnp M38.xxxnp dihydroorotase dihydropyrimidinase dihydropyrimidinase-related prot. 1 dihydropyrimidinase-related prot. 2 dihydropyrimidinase-related prot. 3 dihydropyrimidinase-related prot. 4 dihydropyrimidinase-related prot. 
5 CAD DPYS CRMP1 DPYSL2 DPYSL3 DPYSL4 DPYSL5 790 1807 1400 1808 1809 10570 56896 2p23 8q22 4p16 8p21 5q32 10q26 2p23 Cad Dpys Crmp1 Dpysl2 Dpysl3 Dpysl4 Dpysl5 69719 64705 12933 12934 22240 26757 65254 5B1 15C 5B2 14D1 18B3 7F5 5B1 y y y y y y y 94 88 96 98 98 93 98 M41.004 M41.006 M41.010 M41.007 i-AAA protease paraplegin Afg3-like protein 1 Afg3-like protein 2 YME1L1 SPG7 #AFG3L1 AFG3L2 10730 6687 172 10939 10p12 16q24 16q24 18p11 Yme1l1 Spg7 Afg3l1 Afg3l2 27377 234847 114896 69597 2A3 8E2 8E2 18E1 y y y y 95 89 M43.004 M43.005 pappalysin-1 pappalysin-2 PAPPA PLAC3 5069 60676 9q32 1q25 Pappa Plac3 18491 240848 4C1 1H1 y y 93 78 M47.001 procol. III N-endopeptidase PCOLN3 5119 16q24 #Pcoln3 BI690732 8E2 y M48.003 M48.017 FACE-1/ZMPSTE24 VVML FACE1 VVML 10269 115209 1p34 1p32 Face1 Vvml 230709 67013 4D1 4C6 y y 91 71 M49.001 dipeptidyl-peptidase III DPP3 10072 11q13 Dpp3 75221 19A y 92 M50.001 S2P protease MBTPS2 51360 Xp22 Mbtps2 270669 XF4 y 97 94 M67.001 M67.002 M67.xxxnp M67.xxx M67.003 M67.004 M67.xxx M67.005 M67.xxx M67.xxxnp M67.xxxnp M67.xxxnp M67.xxxnp M67.xxxnp M67.xxxnp Pad1-homologue JAB1 COPS6 AMSH AMSH 2 C6.1A C6.1A-like jammin-like protease 1 jammin-like protease 2 PSMD7 PRPF8 eukar. translation initiation F3S3 eukar. translation initiation F3S5 eukar. 
translation initiation F3S5B IFP38 POH1 COPS5 COPS6 AMSH AMSH2 C6.1A 10213 10987 10980 10617 57559 79184 2q24 8q13 7q22 2p13 10q23 Xq28 Poh1 Cops5 Cops6 Stambp Amsh2 C6.1a C6.1al Jamml1 Jamml2 Psmd7 Prpf8 Eif3s3 Eif3s5 59029 26754 26893 70527 76630 210766 BN000130 230448 68047 17463 192159 68135 66085 2C3 1A2 5G1 6D1 19C3 XA6 10D1 4C5 17D 8D2 11B4 15D1 7F2 y y y y y y 99 98 100 83 89 97 JAMML1 JAMML2 PSMD7 PRPF8 EIF3S3 EIF3S5 EIF3S5B IFP38 114803 84954 5713 10594 8667 8665 120963 83880 1p32 19p13 16q23 17p13 8q24 11p15 12p13 (2p11) y y y y y y 91 87 97 99 97 93 Mx1.xxx FACE-2/RCE1 FACE2 9986 11q13 Face2 19671 19A y 95 Mx2.xxxnp Mx2.xxxnp aspartoacylase-2 aspartoacylase-3 ASPA/ACY-2 443 ACY-3 91703 17p13 11q13 Aspa/Acy-2 Acy-3 11484 71670 11B4 19A y y 86 68

The metalloproteases belong to 26 distinct families. The M01 family contains 13 members in human and 12 in mouse, which lacks aminopeptidase MAMS. We propose the names aminopeptidase O and aminopeptidase Q for the M01 proteases previously annotated as the human hypothetical proteins FLJ14675 and BG623101. We have also identified orthologues of these genes at mouse chromosome locations 13B3 and 18C. In the M02 family, we have tentatively annotated a mouse gene for a third angiotensin-converting enzyme-like protein (Ace3), located on chromosome 11E1. We have classified Ace3 as a non-protease homologue because it contains the sequence HQMGH instead of the consensus zinc-binding HExxH motif. No expressed sequence tags (ESTs) have been found for mouse Ace3, which could be an inactive pseudogene, although the locus is apparently complete and is conserved in the rat. The corresponding human gene is a pseudogene owing to the accumulation of stop codons and frameshifts. There are some differences between the human and mouse members of the M10 family of matrix metalloproteases (MMPs).
Mouse McolB, a divergent counterpart of human MMP1, is absent from the human genome, whereas human matrilysin-2 (MMP26) is absent from the mouse genome, although there are some gaps in the relevant mouse genome region that could contain this missing gene. MMP23 has recently been duplicated in the human genome17, generating two closely related genes, MMP23A and MMP23B. Because of the high sequence identity between the two genes, this region is artefactually collapsed in the available public and private genome sequences and is erroneously considered to contain a single gene. There is apparently a single mouse MMP23 gene, although we cannot rule out the possibility that this region is also duplicated in the mouse genome and has likewise been collapsed during computational assembly. In family M12, we have annotated a new member of the meprin/tolloid subfamily18. The ADAM (a disintegrin and metalloprotease) subfamily of M12 metalloproteases shows important differences between the two organisms. The genes for ADAM-1, -3, -4, -5, -6 and -25 are pseudogenes in human but active genes in mouse. ADAM-1 and -6 are duplicated in mouse, whereas ADAM-20 is duplicated in human (ADAM-20 and ADAM-21). In addition, the testases, a subgroup of ADAMs located at 8B1, are mouse-specific. We have annotated five further members of this subgroup (testases 5–9), although they are intronless and their functional relevance remains to be shown. The ADAMTS (ADAMs with thrombospondin domains) group is completed by the inclusion of human and mouse ADAMTS-20. In the M14 family of carboxypeptidases, we found that mouse carboxypeptidase O has been specifically inactivated by mutation, so it is annotated as a pseudogene20. Dihydroorotase and several dihydropyrimidinases have been included as non-protease homologues of bacterial isoaspartyl dipeptidases.
The gene encoding procollagen III N-endopeptidase is inactivated in mouse. This is an interesting difference between the human and mouse degradomes, as there are no other functional members of the M47 family that could compensate for this specific loss in mouse. We have annotated 14 human and 13 mouse proteins in the recently described M67 family of metalloisopeptidases21,22. All of them contain the JAMM motif, although some lack conserved residues that are predicted to be essential for proteolytic activity and have therefore been classified as non-protease homologues. The assignment of the FACE-2/RCE1 prenyl endopeptidase to the cysteine or the metalloprotease class of enzymes remains uncertain19; however, in agreement with recent structural comparisons24, we have included it as the only human and mouse representative of a new family of membrane-bound metalloproteases. Finally, we have included three aminoacylases in our catalogue of metalloproteases. Strictly speaking, these enzymes are not proteases, because they cleave amide bonds that link an acyl group to an amino acid25. However, the structure of ACY1 clearly allows its inclusion in the M20 family of metalloproteases, and ACY2 and ACY3 have been proposed to belong to a superfamily of metalloproteases that also contains members of the M14 family of carboxypeptidases26.
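The criterion used in these notes to set Ace3 aside as a non-protease homologue (the sequence HQMGH in place of the consensus zinc-binding HExxH motif) amounts to a simple sequence-pattern test. The Python sketch that follows illustrates that test with a regular expression. It is an assumption-laden illustration, not the annotation procedure used for these tables: the helper names `find_hexxh` and `flag_zincin_motif` are hypothetical, the fragments in the usage example are toy strings rather than real Ace/Ace3 sequences, and a real assignment would also weigh the alignment context and the remaining zinc ligand.

```python
import re

# Hypothetical helper, for illustration only: scans a protein sequence for the
# zinc-binding HExxH consensus (His-Glu-any-any-His). The lookahead lets
# overlapping candidate sites all be reported.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched motif) for every HExxH site in seq."""
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(seq.upper())]


def flag_zincin_motif(seq: str) -> str:
    """Crude first-pass flag based only on the presence of an HExxH site."""
    if find_hexxh(seq):
        return "HExxH present: candidate metalloprotease"
    return "HExxH absent: possible non-protease homologue"


if __name__ == "__main__":
    # Toy fragments (not real Ace/Ace3 sequences) built around HEMGH vs HQMGH.
    for name, fragment in [("toy_active", "VSHEMGHIQY"), ("toy_variant", "VSHQMGHIQY")]:
        print(name, find_hexxh(fragment), flag_zincin_motif(fragment))
```

For the first toy fragment the scan reports a single HEMGH site at position 3. The HQMGH variant has Gln where the consensus requires Glu, so it yields no hit and is flagged as a possible non-protease homologue. The lookahead is a deliberate choice: a plain pattern would consume the shared His and miss the second site in a string such as HEAAHEAAH.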
23 Table S4 | Serine proteases Code S01.160 S01.161 S01.162 S01.251 S01.017 S01.236 S01.300 S01.244 S01.307 S01.246 S01.257 S01.020 S01.306 S01.029 S01.081 Peptidase kallikrein hK1 kallikrein hK2 kallikrein hK3 kallikrein hK4 kallikrein hK5 kallikrein hK6 kallikrein hK7 kallikrein hK8 kallikrein hK9 kallikrein hK10 kallikrein hK11 kallikrein hK12 kallikrein hK13 kallikrein hK14 kallikrein hK15 S01.164 S01.170 S01.066np S01.037 S01.067 S01.071 S01.041 S01.068 S01.163 S01.038 S01.039 S01.069 S01.070 S01.073 glandular kallikrein mK1 glandular kallikrein mK3 glandular kallikrein mK4 glandular kallikrein mK5 glandular kallikrein mK8 glandular kallikrein mK9 glandular kallikrein mK11 glandular kallikrein mK14 glandular kallikrein mK16 glandular kallikrein mK21 glandular kallikrein mK22 glandular kallikrein mK24 glandular kallikrein mK26 glandular kallikrein mK27 Human gene KLK1 KLK2 KLK3 KLK4 KLK5 KLK6 KLK7 KLK8 KLK9 KLK10 KLK11 KLK12 KLK13 KLK14 KLK15 LocusLink 3816 3817 354 9622 25818 5653 5650 11202 23579 5655 11012 43849 26085 43847 55554 Locus 19q13 19q13 19q13 19q13 19q13 19q13 19q13 19q13 19q13 19q13 19q13 19q13 19q13 19q13 19q13 Human gene LocusLink mGk6 16612 #mGk25 Locus 7B2 7B2 Syntenic y y Identity 65 mKlk4 mKlk5 mKlk6 mKlk7 mKlk8 mKlk9 mKlk10 mKlk11 mKlk12 mKlk13 mKlk14 mKlk15 56640 68668 19144 23993 259277 73832 69540 56538 69511 13647 233190 XM_145570 7B2 7B2 7B2 7B2 7B2 7B2 7B2 7B2 7B2 7B2 7B2 7B2 y y y y y y y y y y y y 69 70 68 75 72 76 68 80 71 79 73 75 mGk1 mGk3 mGk4 mGk5 mGk8 mGk9 mGk11 mGk14 mGk16 mGk21 mGk22 mGk24 mGk26 mGk27 16623 18050 18048 16622 16624 13648 16613 16614 16615 16616 13646 16617 16618 16619 7B2 7B2 7B2 7B2 7B2 7B2 7B2 7B2 7B2 7B2 (7B2) 7B2 7B2 7B2 S01.107 glandular kallikrein mKx mGkx 76999 7B2 S01.217 S01.215 S01.214 S01.216 S01.213 S01.211 S01.218 S01.979np S01.212 S01.228 S01.033 S01.998np thrombin coagulation factor VIIa coagulation factor IXa coagulation factor Xa coagulation factor XIa coagulation factor XIIa protein C 
protein Z plasma kallikrein hepatocyte growth factor activator hyaluronan-binding ser-protease protein C-like F2 F7 F9 F10 F11 F12 PROC PROZ KLKB1 HGFAC HABP2 PROCL 2147 2155 2158 2159 2160 2161 5624 8858 3818 3083 3026 25891 11p11 13q34 Xq27 13q34 4q35 5q35 2q21 13q34 4q35 4p16 10q25 11p12 F2 F7 F9 F10 F11 F12 Proc Proz Klkb1 Hgfac Habp2 Procl 14061 14068 14071 14058 109821 58992 19123 66901 16621 54426 226243 210622 2E1 8A2 XA5 8A2 8B2 13B2 18B3 8A2 8B2 5B1 19D2 2E3 y y y y y y y y y y y y S01.303 S01.242 S01.242 S01.028 S01.074 S01.075 S01.076 S01.011 S01.252 S01.314 S01.315 S01.098 S01.295 S01.054 S01.143 mastin tryptase β-1 tryptase β-2 tryptase γ-1 marapsin tryptase homologue 2 tryptase homologue 3 testisin brain serine protease 2 implantation serine protease 1 implantation serine protease 2 intestinal serine protease 1 intestinal serine protease 2 tryptase δ-1 tryptase α #MASTIN TPSB1 TPSB2 TPSG1 MPN EOS TESSP1 PRSS21 PRSS22 257157 7177 64499 25823 83886 260429 BN000124 10942 64063 16p13 16p13 16p13 16p13 16p13 16p13 16p13 16p13 16p13 #ISP2 #DISP 123787 124221 16p13 16p12 Mastin Mcpt7 Mcpt6 Tpsg1 Mpn Eos Tessp1 Prss21 Prss22 Isp1 Isp2 Disp Disp2 207224 17230 17229 26945 213171 BE646687 71003 57256 70835 114661 114662 30943 69814 17A3 17A3 17A3 17A3 17A3 17A3 17A3 17A3 17A3 17A3 17A3 17A3 17A3 y y y y y y y y y y y y TPSD1 TPS1 23430 7176 16p13 (16p13) S01.159 prostasin PRSS8 5652 16p11 Prss8 76560 7F4 y 82 70 82 76 78 72 68 67 76 81 80 90 75 77 73 80 81 62 67 75 77 S01.414 S01.xxx S01.xxx S01.318 prostasin-like 1 prostasin-like 2 epidermis-specific SP-like marapsin 2 PSTL1 PSTL2 ESSPL MPN2 146547 79001 BN000134 BN000131 16p11 16p11 4q31 1q42 PSTL1 PSTL2 Esspl Mpn2 77613 27973 BN000135 216797 7F4 7F4 3F1 11B2 y y y y 62 84 68 57 S01.993np S01.317 S01.106 S01.xxx S01.968np S01.xxx testis-specific protein tsp50 testis serine protease 2 testis serine protease 3 testis serine protease 4 testis serine protease 5 testis serine protease 6 TSP50 TESSP2 #TESSP3 
#TESSP4 TESSP5 #TESSP6 29122 AJ544583 3p21 3p21 3p21 3p21 3p21 3p21 Tsp50 Tessp2 Tessp3 Tessp4 Tessp5 Tessp6 235631 235628 73336 272643 260408 74306 9F2 9F2 9F2 9F2 9F2 9F2 y y y y y y 61 64 62 S01.985np S01.045 S01.088 TESP1 TESP2 TESP3 #TESP2 #TESP3 2q21 9q22 Tesp1 Tesp2 Tesp3 21755 21756 218304 1B 1B 13B3 y y S01.140 S01.010 S01.133 S01.147 S01.141 S01.003 S01.149 S01.254 S01.304 S01.004 S01.xxx S01.398 S01.399 S01.401 S01.402 S01.xxx S01.xxx chymase granzyme B cathepsin G granzyme H mast cell protease 1 mast cell protease 2 mast cell protease 4 mast cell protease 8 mast cell protease 9 mast cell protease 10 mast cell protease L granzyme D granzyme E granzyme F granzyme G granzyme N granzyme O Mcpt5 Gzmb Ctsg Gzmc Mcpt1 Mcpt2 Mcpt4 Mcpt8 Mcpt9 Mcpt10 Mcptl Gzmd Gzme Gzmf Gzmg Gzmn Gzmo 17228 14939 13035 14940 17224 17225 17227 17231 17232 AF361939 17233 14941 14942 14943 14944 245839 239106 14C2 14C1 14C1 14C1 14C2 14C1 14C2 14C1 14C1 14C1 14C1 14C1 14C1 14C1 14C1 14C1 14C1 CMA1 GZMB CTSG GZMH BN000137 1215 3002 1511 113155 14q11 14q11 14q11 14q11 y y y y 68 66 74 67 69 60 S01.135 S01.146 granzyme A granzyme K GZMA GZMK 3001 3003 5q11 5q11 Gzma Gzmk 14938 14945 13D2 13D2 y y 68 70 S01.139 S01.134 S01.131 S01.971np granzyme M protease 3 neutrophil elastase azurocidin GZMM PRTN3 ELA2 AZU1 3004 5657 1991 566 19p13 19p13 19p13 19p13 Gzmm Prtn3 Ela2 16904 19152 50701 10C1 10C1 10C1 y y y 70 63 72 S01.156 S01.xxx S01.224 S01.291 S01.301 S01.292 S01.xxx S01.294 S01.321 S01.xxx S01.021 S01.019 S01.302 S01.247 S01.079 S01.034 S01.313 S01.308 S01.xxx S01.298 S01.087 enteropeptidase enteropeptidase-like hepsin HAT-related protease airway-trypsin-like protease HAT-like 1 HAT-like 2 HAT-like 3 HAT-like 4 HAT-like 5 DESC1 protease corin matriptase epitheliasin transmembrane Ser-protease 3 transmembrane Ser-protease 4 spinesin matriptase-2 matriptase-3 polyserase membrane-type mosaic Ser-prot. 
PRSS7 PRSS7L HPN HATRP HAT HATL1 #HATL2 #HATL3 #HATL4 HATL5 DESC1 PRSC MTSP1 TMPRSS2 TMPRSS3 TMPRSS4 TMPRSS5 TMPRSS6 TMPRSS7 TMPRSS8 MSPL 5651 BQ638967 3249 283471 9407 BN000133 Prss7 Prss7l Hpn Hatrp Hat Hatl1 Hatl2 Hatl3 Hatl4 Hatl5 Desc1 Lpr4 Mtsp1 Tmprss2 Tmprss3 Tmprss4 Tmprss5 Tmprss6 Tmprss7 Tmprss8 Mspl 19146 332474 15451 75002 231382 194597 320454 231381 243083 BAC29606 243084 53419 19143 50528 140765 214523 80893 71753 208171 270749 AAH42878 16C3 1C5 7B1 15F1 5E1 5E1 5E1 5E1 5E1 5E1 5E1 5D 9A5 16C4 17B2 9B 9B 15E1 16B5 10C1 9B y y y 75 89 88 59 66 76 132722 132724 28983 10699 6768 7113 64699 56649 80975 164656 BN000125 AJ488946 84000 21q21 2q37 19q13 12q13 4q13 4q13 4q13 4q13 4q13 4q13 4q13 4p12 11q24 21q22 21q22 11q23 11q22 22q12 3q13 19p13 11q23 S01.320 S01.322 oviductin-like ovochymase-like OVTN OVCH BN000130 BN000128 11p15 12p11 Ovtn BN000123 7F2 y y y y y y y y y y y y y y y y y y 52 75 82 80 77 82 76 78 84 91 80 90 71 S01.152 S01.256 S01.157 chymotrypsin B chymopasin chymotrypsin C CTRB1 CTRL CTRC 1504 1506 11330 16q23 16q22 1p36 Ctrb Ctrl Ctrc 66473 109660 76701 8D3 8D2 4E1 y y y 85 86 77 S01.127 S01.060 S01.059 S01.258 S01.063 S01.062 S01.xxxnp S01.989np S01.174 S01.151 S01.058 S01.061 S01.063 S01.984np S01.xxx S01.129 S01.092 S01.105 cationic trypsin trypsin 3 trypsin 10 anionic trypsin (II) trypsin C trypsin 15 trypsin X1 trypsin X2 mesotrypsin trypsin 1 trypsin 9 trypsin 12 trypsin 16 trypsin X3 trypsin X4 trypsin 4 trypsin V trypsin X5 PRSS1 #TRY3 #TRY10 PRSS2 #TRY6 #TRY15 #TRYX1 TRYX2 PRSS3 5644 7q34 7q34 7q34 7q34 7q34 7q34 7q34 7q34 9p13 Try4 Try3 Try10 Try2 Try10l Try15 Tryx1 Tryx2 22074 22073 AAB69058 22072 BN000136 AAB69087 272341 67690 6B2 6B2 6B2 6B2 6B2 6B2 6B2 6B2 y y y y y y y y 77 Try1 Try9 Try12 Try16 Tryx3 Tryx4 Try4bis Tryv Tryx5 67373 BAB25300 AAB69086 114228 194359 194360 73626 232718 73481 6B2 6B2 (6B2) 6B2 6B2 6B2 6B2 6B2 6B2 S01.153 S01.155 S01.154 S01.205 S01.206 pancreatic elastase pancreatic elastase II (IIA) pancreatic 
endopeptidase E (A) pancreatic endopeptidase E (B) pancreatic elastase II form B #ELA1 ELA2A ELA3A ELA3B ELA2B 1990 63036 10136 23436 51032 12q13 1p36 1p36 1p36 1p36 Ela1 Ela2a Ela3a Ela3b 109901 13706 242711 67868 15F3 4E1 4D3 4D3 y y y y 75 76 84 S01.194 S01.196 S01.995np complement component 2 complement factor B complement C1r-homologue C2 BF C1RL 717 629 51279 6p21 6p21 12p13 C2 Bf C1rl 12263 14962 232371 17B2 17B2 6F2 y y y 76 84 73 5645 154754 136242 5646 77 78 S01.192 S01.xxx S01.193 S01.xxx S01.191 S01.xxx S01.199 S01.198 S01.229 S01.237 complement component C1ra complement component C1rb complement component C1sa complement component C1sb complement factor D complement factor D-like complement factor I MASP1/3 MASP2 neurotrypsin C1R 715 12p13 C1ra C1rb C1sa C1sb Df Df2 If Masp1/3 Masp2 Prss12 50909 AF459018 50908 317677 11537 270746 12630 17174 17175 19142 6F2 (6F2) 6F2 6F2 10C1 10C1 3H1 16B1 4E1 3G3 y 81 C1S 716 12p13 y 74 DF DF2 IF MASP1/3 MASP2 PRSS12 1675 199783 3426 5648 10747 8492 19p13 19p13 4q25 3q29 1p36 4q28 y y y y y y 67 79 69 86 81 82 S01.231 S01.232 S01.233 S01.976np S01.975np S01.999np u-plasminogen activator t-plasminogen activator plasminogen hepatocyte growth factor macrophage-stimulating protein apolipoprotein PLAU PLAT PLG HGF MSP LPA 5328 5327 5340 3082 4485 4018 10q22 8p11 6q26 7q21 3p21 6q26 Plau Plat Plg Hgf Msp 18792 18791 18815 15234 15235 14B 8A3 17A2 5A3 9F2 y y y y y 69 80 79 91 80 S01.223 S01.972np S01.974np acrosin haptoglobin-1 haptoglobin-related protein ACR HP HPR 49 3240 3250 22q13 16q22 16q22 Acr Hp 11434 15439 15F1 8D3 y y 68 79 S01.277 S01.278 S01.284 S01.285 osteoblast serine protease HTRA2 HTRA3 HTRA4 HTRA1 HTRA2 HTRA3 HTRA4 5654 27429 94031 203100 10q26 2p12 4p16 8p11 Htra1 Htra2 Htra3 Htra4 56213 64704 78558 66943 7F4 6D1 5B1 8A3 y y y y 91 84 86 66 S01.309 S01.994np umbilical vein protease similar to SPUVE SPUVE SPUVE2 11098 167681 11q14 6q14 Spuve Spuve2 76453 244954 7E1 9E3 y y 90 77 S01.104 S01.415 S01.419 
plasma-kallikrein-like 1 plasma-kallikrein-like 2 plasma-kallikrein-like 3 KLKBL1 KLKBL2 #KLKBL3 XP_116753 203074 8p23 8p23 8p23 Klkbl1 Klkbl2 Klkbl3 74215 71037 73382 (14C3) 14C3 14C3 y y 66 71 S01.992np plasma-kallikrein-like 4 KLKBL4 221191 16q21 Klkbl4 BN000132 8C5 y 62 S01.286 S01.991np similar to Arabidopsis Ser-prot. chymase-like serine protease SASP 219743 10q22 Sasp Clsp 71767 75106 10B4 XC3 y 80 S08.063 S08.039 S08.090 site-1 protease proprotein convertase 9 tripeptidyl-peptidase II MBTPS1 PCSK9 TPP2 8720 255738 7174 16q23 1p32 13q33 Mbtps1 Pcsk9 Tpp2 56453 100102 22019 8E1 4C7 1C1 y y y 96 73 95 S08.072 S08.073 S08.071 S08.074 S08.076 S08.075 S08.077 proprotein convertase 1 proprotein convertase 2 furin proprotein convertase 4 proprotein convertase 5 PACE4 proprotein convertase proprotein convertase 7 PCSK1 PCSK2 PCSK3 PCSK4 PCSK5 PCSK6 PCSK7 5122 5126 5045 5124 5125 5046 9159 5q15 20p12 15q26 19p13 9q21 15q26 11q23 Pcsk1 Pcsk2 Pcsk3 Pcsk4 Pcsk5 Pcsk6 Pcsk7 18548 18549 18550 18551 18552 18553 18554 13C1 2H1 7D2 10C1 19B 7C 9B y y y y y y y 93 97 94 82 92 93 88 S09.001 S09.015 prolyl oligopeptidase prolyl-oligopeptidase 2 PREP PREP2 5550 9581 6q22 2p21 Prep Prep2 19072 213760 10B2 17E4 y y 96 94 S09.003 S09.973np S09.018 S09.019 S09.974np S09.007 dipeptidyl-peptidase 4 dipeptidyl-peptidase 6 dipeptidyl-peptidase 8 dipeptidyl-peptidase 9 dipeptidyl-peptidase 10 Seprase DPP4 DPP6 DPP8 DPP9 DPP10 FAP 1803 1804 54878 91039 57628 2191 2q24 7q36 15q23 19p13 2q14 2q24 CD26 Dpp6 Dpp8 Dpp9 Dpp10 Fap 13482 13483 74388 224897 269109 14089 2C3 5A3 9D 17D 1E2 2C3 y y y y y y 85 91 95 89 88 90 S09.004 acylaminoacyl-peptidase APEH 327 3p21 Apeh 235606 9F2 y 91 S09.055 S09.052 S09.053 S09.051 CGI-67 protein CGI-67-like protease-1 CGI-67-like protease-2 BEM46-like 1 CGI-67 CGI-67L1 CGI-67L2 BEM46L1 51104 81926 58489 84945 9q21 19p13 15q25 13q33 Cgi-67 Cgi-67l1 Cgi-67l2 Bem46l1 BN000127 216169 70178 68904 19C1 10C1 7D3 8A2 y y y y 98 93 97 97 S09.054 S09.xxx BEM46-like 2 
BEM46-like 3 BEM46L2 BEM46L3 26090 BG74273 20p11 14q22 Bem46l2 Bem46l3 76192 278594 2H1 12C3 y y 90 78 S10.002 S10.003 S10.013 lysosomal carboxypeptidase A vitellogenic carboxypeptidase-L serine carboxypeptidase 1 PPGB CPVL RISC 5476 54504 59342 20q13 7p15 17q23 Ppgb Cpvl Risc 19025 71287 74617 2H3 6B3 11C y y y 87 76 82 S12.004 β-lactamase LACTB 114294 15q22 Lactb 80907 9D y 85 S14.003 endopeptidase Clp CLPP 8192 19p13 Clpp 53895 17E1 y 87 S16.002 S16.006 PIM1 endopeptidase PIM2 endopeptidase PRSS15 PIM2 9361 83752 19p13 16q21 Prss15 Pim2 74142 66887 17E1 8C4 y y 88 95 S26.009 S26.010 S26.xxx S26.012 S26.013 signalase 18 kDa component signalase 21 kDa component signalase-like 1 mitoc. inner membrane protease 2 mitochondrial signal peptidase SPC18 SPC21 SPCL1 IMMP2L IMMP1 23478 90701 158326 83943 196294 15q25 18q21 9p22 7q31 11p13 Spc18 Spc21 Spcl1 Immp2l Immp1 56529 66286 230344 93757 66541 7D2 18E1 4C3 12B3 2E3 y y y y y 98 98 76 90 95 S26.xxx lactotransferrin LTF 4057 3p21 Ltf 17002 9F2 y 70 S28.001 S28.002 S28.003 lysosomal Pro-X carboxypeptidase dipeptidyl-peptidase II thymus-specific serine peptidase PRCP DPP7 PRSS16 5547 29952 10279 11q14 (9q24) 6p21 Prcp Dpp7 Prss16 72461 83768 54373 7E2 2A3 13A3 y y y 77 80 79 S33.009 S33.971np S33.972np S33.974np S33.xxxnp αβ-hydrolase dom. containing 4 epoxyde hydrolase Mesoderm specific transcript hom. 
epoxyde hydrolase related protein CGI-58 ABHD4 EPHX1 MEST EPHXRP CGI-58 63874 2052 4232 253152 51099 14q11 1q42 7q32 1p22 3p21 Abhd4 Ephx1 Mest Ephxrp Cgi-58 105501 13849 17294 243192 67469 14C1 1H4 6A3 5E 9F4 y y y y y 96 83 97 87 94 S53.003 tripeptidyl-peptidase I CLN2 1200 11p15 Cln2 12751 7F1 y 88 S54.005 S54.002 S54.006 S54.xxx S54.953np S54.xxxnp S54.xxx S54.952np rhomboid-like protein 1 rhomboid-like protein 2 rhomboid-like protein 4 rhomboid-like protein 5 rhomboid-like protein 6 rhomboid-like protein 7 Presenilins associated rhomboid like EGF Receptor Related Sequence RHBDL RHBDL2 RHBDL4 RHBDL5 RHBDL6 RHBDL7 PARL EGFR-RS 9028 54933 162494 84236 79651 AC005067 55486 64285 16p13 1p34 17q11 2q36 17q25 7q11 3q27 16p13 Rhbdl Rhbdl2 Rhbdl4 Rhbdl5 Rhbdl6 Rhbdl7 Parl Egfr-rs 214951 230727 246104 76867 276799 215160 208159 13650 17B1 4D1 11B5 1C5 11E2 5G1 16B1 11A5 y y y y y y y y 97 89 95 80 93 88 80 95 Sx1.xxx Reelin RELN 5649 7q22 Reln 19699 5A3 y 95 Sx2.xxx tumor rejection antigen (gp96) TRA1 7184 12q23 Tra1 22027 10C2 y 97 Sx2.xxxnp HSPCA 3320 14q32 Hspca 15519 12F2 y 99 heat shock 90kDa protein 1, α Sx2.xxxnp HSPCB 3326 6p21 Hsp84-1 15516 17C y 98 heat shock 90kDa protein 1, β Sx2.xxxnp heat shock protein 75 TRAP1 10131 16p13 Trap1 68015 16A1 y 88

Most of these proteases belong to the S01 family, but the human and mouse degradomes also contain representatives of 13 other serine protease families. All differences between the human and mouse serine proteases correspond to changes in members of this densely populated S01 family. The kallikreins are almost entirely duplicated in mouse: there are 28 members in mouse but only 15 in human. The genes for mastin, implantation serine protease-2 (ISP-2), intestinal serine protease (DISP-1) and the testis serine proteases TESP-2 and TESP-3 are inactivated in human, hence their classification as pseudogenes.
The absence of human genes for DISP-2, ISP-1 and TESP-1, together with the finding that human DISP-1, ISP-2, TESP-2 and TESP-3 are pseudogenes, indicates that the functions performed by ISP, DISP and TESP proteases might be mouse-specific. We have also annotated several new members of the testis-specific serine protease (TESSP) subfamily; TESSP-3, -4 and -6 are pseudogenes in human but active genes in mouse. Mast-cell proteases (Mcpt), granzymes (Gzm), trypsins and human-airway trypsin-like (HAT-like) proteases are expanded in mouse, whereas two tryptases, an ovochymase-like protease and a form of pancreatic elastase are present only in human. Two well-known non-protease homologues, apolipoprotein (a) (LPA) and haptoglobin-related protein, are absent from mouse. Further characteristic features of the mouse degradome include the duplication of complement factors C1r and C1s, an extra functional member of the plasma-kallikrein-like subfamily (Klkbl3) and a non-protease homologue called Clsp (chymase-like serine protease). We have included in the catalogue of serine proteases several proteins, such as lactoferrin, reelin and tumour rejection antigen (gp96), that have recently been reported to have this kind of proteolytic activity27–29. On the basis of structural analysis, lactoferrin has been tentatively classified as a member of the S26 family of serine proteases, whereas reelin, gp96 and their close relatives have been provisionally assigned to two Sx families of currently unclassified serine proteases. Gene Ontology annotation of the human proteome also predicts a series of serine proteases with minimal relationship to other members of this class of enzymes.
These include torsin, NSP (novel serine protease) and Ufd1L (ubiquitin fusion degradation protein 1 homologue), but because there is insufficient evidence to support their assignment as serine proteases, they have not been included in the present version of the human and mouse degradomes.

Table S5 | Threonine proteases
Code | Peptidase | Human gene | LocusLink | Locus | Mouse gene | LocusLink | Locus | Syntenic | Identity
T01.010 | proteasome catalytic subunit 1 | PSMB6 | 5694 | 17p13 | Psmb6 | 19175 | 11B4 | y | 97
T01.011 | proteasome catalytic subunit 2 | PSMB7 | 5695 | 9q33 | Psmb7 | 19177 | 2B | y | 96
T01.012 | proteasome catalytic subunit 3 | PSMB5 | 5693 | 14q11 | Psmb5 | 19173 | 14C1 | y | 93
T01.013 | proteasome catalytic subunit 1i | PSMB9 | 5698 | 6p21 | Psmb9 | 16912 | 17B2 | y | 88
T01.014 | proteasome catalytic subunit 2i | PSMB10 | 5699 | 16q23 | Psmb10 | 19171 | 8D2 | y | 88
T01.015 | proteasome catalytic subunit 3i | PSMB8 | 5696 | 6p21 | Psmb8 | 16913 | 17B2 | y | 79
T01.016 | proteasome β-subunit LMP7-like | LMP7L | 122706 | 14q11 | Lmp7l | 73902 | 14C1 | y | 84
T01.986np | proteasome β-1 subunit | PSMB1 | 5689 | 6q27 | Psmb1 | 19170 | 17A2 | y | 93
T01.984np | proteasome β-2 subunit | PSMB2 | 5690 | 1p34 | Psmb2 | 26445 | 4D2 | y | 96
T01.983np | proteasome β-3 subunit | PSMB3 | 5691 | 17q12 | Psmb3 | 26446 | 11D | y | 98
T01.987np | proteasome β-4 subunit | PSMB4 | 5692 | 1q21 | Psmb4 | 19172 | 3F2 | y | 93
T01.976np | proteasome α-1 subunit | PSMA1 | 5682 | 11p15 | Psma1 | 26440 | 7F2 | y | 98
T01.972np | proteasome α-2 subunit | PSMA2 | 5683 | 7p14 | Psma2 | 19166 | 13A2 | y | 99
T01.977np | proteasome α-3 subunit | PSMA3 | 5684 | 14q23 | Psma3 | 19167 | 12C3 | y | 99
T01.973np | proteasome α-4 subunit | PSMA4 | 5685 | 15q24 | Psma4 | 26441 | 9C | y | 99
T01.975np | proteasome α-5 subunit | PSMA5 | 5686 | 1p13 | Psma5 | 26442 | 3F3 | y | 99
T01.971np | proteasome α-6 subunit | PSMA6 | 5687 | 14q13 | Psma6 | 26443 | 12C1 | y | 99
T01.974np | proteasome α-7 subunit | PSMA7 | 5688 | 20q13 | Psma7 | 26444 | 2H4 | y | 99
T01.978np | proteasome α-8 subunit | PSMA8 | 143471 | 18q11 | Psma8 | 73677 | 18A2 | y | 95
T02.001 | glycosylasparaginase | AGA | 175 | 4q34 | Aga | 11593 | 8B3 | y | 82
T02.003 | glycosylasparaginase-2 | ASRGL1 | 80150 | 11q12 | Asrgl1 | 66514 | 19A | y | 77
T02.004 | glycosylasparaginase-3 | AGA3 | 55617 | 20p12 | Aga3 | 75812 | 2G2 | y | 94
T03.006 | γ-glutamyltransferase 1 | GGT1 | 2678 | 22q11 | Ggtp | 14598 | 10B5 | y | 79
T03.017 | γ-glutamyltransferase-like 3 | GGTL3 | 2686 | 20q11 | Ggtl3 | 207182 | 2H2 | y | 95
T03.015 | γ-glutamyltransferase 2 | GGT2 | 2679 | 22q11 | – | – | – | – | –
T03.016 | γ-glutamyltransferase m-3 | GGTL4 | 91227 | 22q11 | – | – | – | – | –
T03.002 | γ-glutamyltransferase 5 | GGTLA1 | 220522 | 22q11 | – | – | – | – | –

The most recently identified catalytic class of proteases, the threonine proteases30, is classified into three families: T01, which contains the proteasome components; T02, which is composed of three distinct glycosylasparaginases; and T03, which includes diverse γ-glutamyltransferases (GGTs). All members of the T01 and T02 families are conserved between human and mouse. There are, however, differences in the number of GGT genes clustered in a region of chromosome 22 (22q11) that has undergone successive duplications31. As a consequence of this dynamic evolution, there are four GGT genes in this region of the human genome but only one in the corresponding region of the mouse genome (10B5). An additional GGT gene located at 20q11 is conserved in the mouse genome at an equivalent position (2H2).

1. Evers, M. P. et al. Nucleotide sequence comparison of five human pepsinogen A (PGA) genes: evolution of the PGA multigene family. Genomics 4, 232–239 (1989).
2. Taggart, R. T., Mohandas, T. K., Shows, T. B. & Bell, G. I. Variable numbers of pepsinogen genes are located in the centromeric region of human chromosome 11 and determine the high-frequency electrophoretic polymorphism. Proc. Natl Acad. Sci. USA 82, 6240–6244 (1985).
3. Chen, X., Rosenfeld, C. S., Roberts, R. M. & Green, J. A. An aspartic proteinase expressed in the yolk sac and neonatal stomach of the mouse. Biol. Reprod. 65, 1092–1101 (2001).
4. Ord, T., Kolmer, M., Villems, R. & Saarma, M. Structure of the human genomic region homologous to the bovine prochymosin-encoding gene. Gene 91, 241–246 (1990).
5. Krylov, D. M. & Koonin, E. V. A novel family of predicted retroviral-like aspartyl proteases with a possible key role in eukaryotic cell cycle control. Curr. Biol. 11, 584 (2001).
6. Turner, G. et al. Insertional polymorphisms of full-length endogenous retroviruses in humans. Curr. Biol. 11, 1531–1535 (2001).
7. Caputo, E., Manco, G., Mandrich, L. & Guardiola, J. A novel aspartyl proteinase from apocrine epithelia and breast tumors. J. Biol. Chem. 275, 7935–7941 (2000).
8. Lee, J. J. et al. Autoproteolysis in hedgehog protein biogenesis. Science 266, 1528–1537 (1994).
9. Gondo, Y. et al. Human megasatellite DNA RS447: copy-number polymorphisms and interspecies conservation. Genomics 54, 39–49 (1998).
10. Okada, T. et al. Unstable transmission of the RS447 human megasatellite tandem repetitive sequence that contains the USP17 deubiquitinating enzyme gene. Hum. Genet. 110, 302–313 (2002).
11. Zhu, Y., Carroll, M., Papa, F. R., Hochstrasser, M. & D'Andrea, A. D. DUB-1, a deubiquitinating enzyme with growth-suppressing activity. Proc. Natl Acad. Sci. USA 93, 3275–3279 (1996).
12. Zhu, Y. et al. DUB-2 is a member of a novel family of cytokine-inducible deubiquitinating enzymes. J. Biol. Chem. 272, 51–57 (1997).
13. Baek, K. H., Mondoux, M. A., Jaster, R., Fire-Levin, E. & D'Andrea, A. D. DUB-2A, a new member of the DUB subfamily of hematopoietic deubiquitinating enzymes. Blood 98, 636–642 (2001).
14. Evans, P. C. et al. A novel type of deubiquitinating enzyme. J. Biol. Chem. (in the press).
15. Balakirev, M. Y., Tcherniuk, S. O., Jaquinod, M. & Chroboczek, J. Otubains: a new family of cysteine proteases in the ubiquitin pathway. EMBO Rep. 4, 517–522 (2003).
16. Aravind, L. & Koonin, E. V. Classification of the caspase-hemoglobinase fold: detection of new families and implications for the origin of the eukaryotic separins. Proteins 46, 355–367 (2002).
17. Gururajan, R. et al. Duplication of a genomic region containing the Cdc2L1-2 and MMP21-22 genes on human chromosome 1p36.3 and their linkage to D1Z2. Genome Res. 8, 929–939 (1998).
18. Bertenshaw, G. P., Norcum, M. T. & Bond, J. S. Structure of homo- and hetero-oligomeric meprin metalloproteases: dimers, tetramers, and high molecular mass multimers. J. Biol. Chem. 278, 2522–2532 (2003).
19. Seals, D. F. & Courtneidge, S. A. The ADAMs family of metalloproteases: multidomain proteins with multiple functions. Genes Dev. 17, 7–30 (2003).
20. Wei, S. et al. Identification and characterization of three members of the human metallocarboxypeptidase gene family. J. Biol. Chem. 277, 14954–14964 (2002).
21. Verma, R. et al. Role of Rpn11 metalloprotease in deubiquitination and degradation by the 26S proteasome. Science 298, 611–615 (2002).
22. Yao, T. & Cohen, R. E. A cryptic protease couples deubiquitination and degradation by the proteasome. Nature 419, 403–407 (2002).
23. Cadiñanos, J. et al. Identification, functional expression and enzymatic analysis of two distinct CaaX proteases from Caenorhabditis elegans. Biochem. J. 370, 1047–1054 (2003).
24. Pei, J. & Grishin, N. V. Type II CAAX prenyl endopeptidases belong to a novel superfamily of putative membrane-bound metalloproteases. Trends Biochem. Sci. 26, 275–277 (2001).
25. Biagini, A. & Puigserver, A. Sequence analysis of the aminoacylase-1 family: a new proposed signature for metalloexopeptidases. Comp. Biochem. Physiol. B 128, 469–481 (2001).
26. Makarova, K. S. & Grishin, N. V. The Zn-peptidase superfamily: functional convergence after evolutionary divergence. J. Mol. Biol. 292, 11–17 (1999).
27. Hendrixson, D. R. et al. Human milk lactoferrin is a serine protease that cleaves Haemophilus surface proteins at arginine-rich sites. Mol. Microbiol. 47, 607–617 (2003).
28. Quattrocchi, C. C. et al. Reelin is a serine protease of the extracellular matrix. J. Biol. Chem. 277, 303–309 (2002).
29. Menoret, A., Li, Z., Niswonger, M. L., Altmeyer, A. & Srivastava, P. K. An endoplasmic reticulum protein implicated in chaperoning peptides to major histocompatibility of class I is an aminopeptidase. J. Biol. Chem. 276, 33313–33318 (2001).
30. Seemüller, E. et al. Proteasome from Thermoplasma acidophilum: a threonine protease. Science 268, 579–582 (1995).
31. Courtay, C., Heisterkamp, N., Siest, G. & Groffen, J. Expression of multiple γ-glutamyltransferase genes in man. Biochem. J. 297, 503–508 (1994).