The human gene for mannan-binding lectin-associated

Genes and Immunity (2001) 2, 119–127
 2001 Nature Publishing Group All rights reserved 1466-4879/01 $15.00
www.nature.com/gene
The human gene for mannan-binding lectin-associated
serine protease-2 (MASP-2), the effector component of
the lectin route of complement activation, is part of a
tightly linked gene cluster on chromosome 1p36.2–3
C Stover1, Y Endo2, M Takahashi2, NJ Lynch1, C Constantinescu1, T Vorup-Jensen3, S Thiel3,
H Friedl4, T Hankeln4, R Hall5, S Gregory5, T Fujita2 and W Schwaeble1
1
Department of Microbiology and Immunology, University of Leicester, Leicester, UK; 2Department of Biochemistry, Fukushima
Medical University, School of Medicine, Fukushima, Japan; 3Department of Medical Microbiology and Immunology, The Bartholin
Building, University of Århus, Århus, Denmark; 4Genterprise, Institut für Molekulargenetik und Gentechnische Sicherheit, Johannes
Gutenberg University Mainz, Mainz, Germany; 5The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
The proteases of the lectin pathway of complement activation, MASP-1 and MASP-2, are encoded by two separate
genes. The MASP1 gene is located on chromosome 3q27, the MASP2 gene on chromosome 1p36.23–31. The genes for
the classical complement activation pathway proteases, C1r and C1s, are linked on chromosome 12p13. We have shown
that the MASP2 gene encodes two gene products, the 76 kDa MASP-2 serine protease and a plasma protein of 19 kDa,
termed MAp19 or sMAP. Both gene products are components of the lectin pathway activation complex. We present the
complete primary structure of the human MASP2 gene and the tight cluster that this locus forms with non-complement
genes. A comparison of the MASP2 gene with the previously characterised C1s gene revealed identical positions of
introns separating orthologous coding sequences, underlining the hypothesis that the C1s and MASP2 genes arose by
exon shuffling from one ancestral gene. Genes and Immunity (2001) 2, 119–127.
Keywords: complement; lectin pathway; serine proteases; gene cluster; MASP2 gene
Introduction
The complement system is an integral part of the innate
antimicrobial immune defence and mediates humoral
and cellular interactions within the immune response,
including chemotaxis, phagocytosis, cell adhesion, and B
cell differentiation.1 Complement may be activated via
three different routes, the classical pathway, the alternative pathway, and the recently described lectin pathway.
The classical activation pathway is initiated by binding of
the hexaoligomeric recognition molecule C1q to immune
complexes via its globular heads. This effects a conformational change of its collagenous stalks which in turn
leads to the activation of the C1q-associated serine protease complexes C1r2 and C1s2. Of these, C1r cleaves and
activates the zymogen C1s. Activated C1s subsequently
cleaves C4 and C4b bound C2 to generate the C3 conCorrespondence: Dr WJ Schwaeble, Department of Microbiology and
Immunology, University of Leicester, University Road, Leicester LE1
9HN, UK. E-mail: ws5얀le.ac.uk
This work was supported by the Wellcome Trust, the German
Research Foundation (Deutsche Forschungsgemeinschaft), the
Japanese Society for the Promotion of Science, and the EMDO
Foundation, Switzerland.
Sequence data from this article have been deposited with the
EMBL/GenBank Data Libraries under accession numbers
AL109811, AB033742, AJ297949, AJ299718, and AJ300188.
Received 4 December 2000; accepted 1 February 2001
vertase, C4b2b, and the C5 convertase, C4b2b(C3b)n. The
alternative pathway forms an amplification loop of complement activation and is initiated by binding of the complement activation product C3b to Factor B. The subsequent cleavage of Factor B by Factor D forms an
enzymatically active complex, C3bBb, which converts
native C3 to C3b. Accumulation of multiple C3b molecules in proximity of the alternative pathway convertase
C3bBb allows the complex to cleave and activate the fifth
component of complement, C5.
The lectin pathway can be activated in the absence of
immune complexes and is initiated by the recognition of
carbohydrate structures present on microbial organisms
via macromolecular complexes present in body fluids.
These complexes are composed of a multivalent pattern
recognition subunit and associated serine proteases
which initiate complement activation upon binding of the
recognition subunits to microbial oligosaccharide structures. To date, two pattern recognition components of the
lectin activation pathway have been described, the well
characterised plasma glycoprotein mannan-binding lectin
(MBL)2 and ficolin p35,3 both members of the collectin
family,4,5 yet differing in their binding specificities to
microbial surface oligosaccharides. As these vary considerably amongst members of the collectin family, the
presence of more than one recognition subunit may result
in a wider spectrum of microbial surfaces recognised by
the lectin pathway activation complexes. MBL forms a
Organisation of the human MASP2 gene locus
C Stover et al
120
hexoidal macromolecular complex of six structural subunits each composed of three identical polypeptide
chains. This associates with two distinct serine proteases,
the MBL-associated serine proteases-1 and -2 (ie, MASP1, MASP-2), and a 19 kDa MASP-2 related plasma protein, termed MAp19 or sMAP.6 No detailed information
is yet available on the stoichiometry of lectin pathway
activation complexes formed with ficolin p35. It is, however, clear that ficolin p35 complexes associate with
MASP-1, MASP-2 and MAp19/sMAP and initiate the lectin pathway via the aforementioned serine proteases.3 In
vitro, purified and recombinant MASP-2 was shown to
cleave the fourth and second component of complement
(ie, C4 and C2) in absence of MASP-1.6–8 The sequential
proteolytic cleavage of C4 and C4b bound C2 to form the
C3 convertase, C4b2b, and the C5 convertase,
C4b2b(C3b), respectively, is the essential step by which
the lectin pathway activation complex as well as the
classical pathway activation complex initiate further
downstream activation of complement.7,8 The potential of
MASP-2 to activate complement in absence of MASP-1 in
vitro is supported by the analysis of sera of a recently
established MASP-1 deficient mouse strain, which exhibits effective lectin pathway activation under physiological
conditions.9 Thus, MASP-2 is the effector component of
the lectin pathway of complement activation. MASP-1
and MAp19 are constitutive components of the lectin
pathway activation complexes, their specific functional
role(s), however, have yet to be defined.
We have previously shown that a single structural
MASP2 gene of approximately 20 kb on chromosome
1p36.23–31 is transcribed and processed to generate two
gene products, a 2.6 kb MASP-2 mRNA, encoding the 76
kDa MASP-2 serine protease, or an abundant 1.0 kb
mRNA transcript, encoding a truncated MASP-2 related
plasma protein of 19 kDa devoid of a serine protease
domain, termed MAp19 or sMAP.10–13
The present report amends and completes our previously presented organisational map (based solely on
genomic PCR)10,11 by a comparative analysis of overlapping sequences obtained from human genomic DNA isolated as independent YAC, BAC and ␭FixII clones. These
represent approximately 222 kb of the chromosomal area
1p36.2–3. Furthermore, we present putative promoter
elements responsible for the strict hepatic expression pattern of MASP-2 and MAp19 mRNA.
Results
Human MASP2 gene organisation
The revised multi-exon structure of the human MASP2
gene (encompassing approximately 20 kb) is presented as
derived from sequence analysis of six overlapping clones
(Figure 1a and b).
The 5⬘ boundary of exon a (62 bp) was determined by
mapping the transcription initiation start site. From two
independent 5⬘ RACE reactions performed on human
liver cDNA, it was determined to lie 57 bp upstream of
the translation initiation codon (Figure 2). In comparison
to the published sequences for human MASP-2 and
MAp19/sMAP mRNA species, the 5⬘ UT region of
human MASP-2 mRNA (Y09926) could be extended by
37 bp and that of human MAp19/sMAP mRNA by 41
bp (Y18281) and 31 bp (AB008047), respectively. The
Genes and Immunity
sequence coding for the signal peptide is split by an
intron of 83 bp (Figure 1a). Exon b codes for the remainder of the signal peptide (R)LLTLLGLLCGSVA and the
N-terminal 63 residues of the CUB I domain. The C-terminal 59 residues of the CUB I domain are encoded by
exon c, separated from exon b by 157 bp and from exon
d, the single exon coding for the EGF-like domain, by
1016 bp. The alternative splice/polyadenylation exon e
(which contains coding sequence for the four C-terminal
amino acids of MAp19 not found in MASP-2, namely glutamic acid, glutamine, serine, and leucine, and sequence
of the 3⬘ UT region specific for MAp19 mRNA) follows
after 454 bp of intronic sequence. The 3⬘ boundary of
exon e is determined by the position of the poly(A)
addition site14 noted in the human MAp19 mRNA transcripts with the accession numbers Y18281–Y18284. 1262
bp further downstream, sequence for the second CUB
structural domain is split by an intron of 316 bp (exon f,
197 bp; exon g, 148 bp, respectively). An intron of 1997
bp follows (AJ299718). The two CCP domains of MASP2 are encoded by two exons each over a stretch of 7.6 kb,
exon h (119 bp) and exon i (79 bp), exon j (135 bp) and
exon k (75 bp), respectively. 2527 bp further downstream,
exon l encodes the serine protease domain of MASP-2
and contains the 3⬘ UT region specific for MASP-2
mRNA. The 3⬘ boundary of exon l is determined from
the position of the poly (A) addition site14 found in cDNA
clones Y09926, phl-1, phl-2, phl-3.15 Exons b-l are preceded by polypyrimidine rich tracts. Sequence comparison of the clones aligned in Figure 1 revealed only two
variants (transitions) within the coding sequence of the
MASP2 gene: the codon for the amino acid position 362
of the mature MASP-2 protein is GTG (valine) in
AL109811 and GCG (alanine) in AJ300188, respectively;
the codon for amino acid position 478 is TCC (serine) in
AL109811 and TCT (serine) in AJ300188, respectively.
The codon phases (positions of introns within the
codon) are given in Table 1. They are identical for the
human MASP2, MASP1, and C1s genes in the orthologous exonic composition represented in MASP2 exons a,
b, c, d, f, g, h, i, j, and k.16 The codon phase at the 5⬘ end
of exon l is also identical for the orthologous position in
the human C1r gene.17
A GC level of 50.37% was determined for sequence
encompassing the MASP2 gene (−2137 from transcription
start site to 167 bp past exon l, ie a total of 19980 bp;
http://ftp.genome.washington.edu/cgi-bin/RepeatMasker).
In keeping with this high GC content, this area is characterised by the abundance of ‘short interspersed nuclear
elements’, especially ALU repeats (36.01% of the
sequence) and MIR (1.59%), and scarce ‘long interspersed
nuclear elements’, LINE2 (1.46%).
Within 1.2 kb 5⬘ of the determined transcription start
site three potential promoter sites (with a score cut-off of
0.9, 0.99, and 1.0, respectively) are predicted by a human
promoter analysis programme (http://www.fruitfly.
org/seqFtools/promoter.html).18 There are two putative
GC-box motifs.19 Several potential transcription factor
binding sequences are revealed in their vicinities
(http://bimas.dcrt.nih.gov/molbio/signal/)20,21 (Figure 2).
Most importantly, among the latter, consensus sequences
for liver-specific (LF-A1 and TGT3) or -enriched transcription factors (HNF4, and HNF5, respectively,
http://www.cbil.upenn.edu/tess/) are contained which
Organisation of the human MASP2 gene locus
C Stover et al
121
Figure 1 (a) Organisation of the human MASP2 gene localised on chromosome 1p36.23–31. Exons are denoted as boxes and enumerated
a through l, their products in terms of structural and functional modules conserved in the serine proteases C1r, C1s, MASP-1, MASP-2
are indicated. Introns are indicated as lines and shortened to scale; their sizes are given in brackets. 1 indicates the transcription start site
determined by 5⬘ RACE analysis (see Figure 2), 17.675 kb indicates the position of the Poly(A) addition site of the MASP-2 mRNA. M,
methionine; SP, signal peptide. (b) Alignment of overlapping clones. AL109811 represents a human PAC clone of approximately 150 kb
which contains all of the MASP2 gene. The intronic sequence g/h (1997 bp) is deposited separately under AJ299718. ø4–12 and ø2–2 are
two clones isolated from a ␭FixII genomic library in this work which have been characterised by restriction mapping, Southern blot and
partial sequence analyses. AB033742 represents sequence (2819 bp) of the 5⬘ part of the gene giving rise to the abundant plasma protein
MAp19 (exons a through e). AJ300188 is a human genomic clone isolated from a RP11 BAC library with partial representation of the
MASP2 gene (see Figure 3). AJ297949 (2577 bp) represents an amplification product obtained from human genomic DNA spanning the
end of coding sequence for the EGF-like domain up to the sequence coding for the C-terminal part of the CUB II domain and encompasses
sequence for the alternative splice and polyadenylation exon e.
may play a role in determining the liver specific
expression of MASP-2 and MAp19 mRNA.10,12
Clone RP11–99P18 (AJ300188) only covers sequence for
the 3⬘ end of the MASP2 gene including exons l, k, and
j, and 1.5 kb intronic sequence prior to exon j (Figure 1).
As indicated in Figure 3, sequence encompassing the promoter area of the MASP2 gene (Figure 2) and the entire
genomic sequence for exons a through i (including the
alternative splice exon e) are replaced by a shortened,
unrelated sequence of 1.3 kb. At its boundaries with the
known MASP2 gene sequence (100% identical in overlapping parts), there is an increasing number of nucleotide
mismatches towards a sequence which is composed of
48% ALU repeats and 6.3% LINE2 repeats (http://
ftp.genome.washington.edu/cgi-bin/RepeatMasker).
Genetic neighbourhood of the MASP2 gene
Figure 3 (a and b) gives an overview of the MASP2 gene
cluster, as determined from sequence of overlapping
clones characterised in this work. The MASP2 gene is
located in close vicinity to non-complement genes.
FRAP is a phospatidylinositol kinase conserved in
yeast,22 cryptococcus neoformans,23 rat,24 and bovidae25
and is involved in G1 cell cycle progression.26 Its gene,
FRAP1, is located in the vicinity of a cytogenetic breakpoint.27 It has been mapped to chromosomal location
1p36.228 and 1p36.2–3,29 respectively. The gene bank
entries U88966 and NMF004958 represent two alternatively polyadenylated transcripts of this one gene differing only in sequence established within a GC rich
region of 46 bp.
PM-Scl 100 kDa nucleolar autoantigen30,31 has a yeast
homologue, Rrp6p, which is required for processing of
5.8 S rRNA prior to ribosomal subunit assembly.32 Its
gene, PMSCL2, has been located to human chromosome
1p36 and to mouse chromosome 4q27 (which is syntenic
to the human gene location).33 Two gene bank entries for
mRNA transcripts of this gene, X66113 and L01457, sharing the same 5⬘ and 3⬘ UT regions and differing in the
presence or absence of coding sequence for 25 amino
acids were analysed in comparison with the complete
transcription unit of PMSCL2. The two mRNA transcripts
are alternative products from the same gene arising from
in-frame inclusion or skipping of the nineteenth exon
(codon phase 0).
TAR DNA binding protein of 43 kDa is a transcription
factor that binds specifically to pyrimidine-rich motifs
contained in the HIV-1 long terminal repeat and acts to
repress gene expression.34 Its gene is termed TARDBP.
The International Radiation Hybrid Mapping consortium
has mapped the TARDBP gene to chromosome 20
(http://www.ncbi.nlm.nih.gov/omim/). Indeed, databank search (http://www.sanger.ac.uk/HGP/Chr20/)
using the full-length mRNA sequence (AL050265)
retrieves the following human bacterial artificial chromosome (BAC) clone, RP11–318P23 (AL359954). However,
the degree of identity is between 86 and 95%, whereas
the TARDBP mRNA sequence (AL050265) is 98–100%
identical to corresponding sequence represented in PAC
clone RP4–635E18 (AL109811), mapping to chromosome
1p36.11–31.
The chromosomal localisation of the gene for ‘corneaderived transcript 6’, CDT6 (Y16132), has not previously
been assigned. The single report in the literature identGenes and Immunity
Organisation of the human MASP2 gene locus
C Stover et al
122
Figure 2 Promoter region, transcriptional regulatory elements, and transcription initiation start site of the human MASP2 gene. Consensus
sequences of putative binding sites for transcription factors are underlined and indicated and predicted promoter sequences are in bold.
Putative GC boxes are marked. The transcription initiation start site as defined by 5⬘ RACE analysis is indicated as +1. Exon a (containing
the 5⬘ UT region of MAp19/MASP-2 mRNA transcripts and their translation initiation codon) and the 5⬘ portion of exon b (coding for the
remainder of the signal peptide and the N-terminal part of CUB I) are shown.
ifies CDT6 as a novel gene whose translational product
shows homology to angiopoietins 1 and 2 and is characterised by a coiled-coil structural domain and a fibrinogen-like domain.35 The gene for CDT6 lies within a 25
kb intron of the FRAP1 gene, in opposite transcriptional
orientation (Figure 3).
Discussion
The MASP2 gene extends over approximately 20 kb and
is much smaller than the MASP1 gene (about 72 kb as
seen from sequence made available by the Human Genome Sequencing Project, accession number AC007920)
owing to smaller introns. The human C1s gene comprises
approximately 13 kb.16
Genes and Immunity
Like the mRNA for the serine protease C1s, the MASP2 mRNA transcript is encoded by 12 exons. There is one
additional exon in the MASP2 gene to allow generation
of the alternative splice and polyadenylation product,
MAp19/sMAP. In the C1s and the MASP2 gene, each of
the two CUB structural domains as well as each of the
two CCP modules are encoded by two exons each. The
EGF-like domain is encoded by one exon only. Likewise,
the enzymatically active protease domain is encoded by
one exon only and clearly distinguishes the C1s and
MASP2 genes from the MASP1 gene, in which the protease domain is encoded by six exons.16 A previously
published comparative analysis of the genomic organisation of the serine protease domain of MASP-2 in mouse
and Xenopus and of MASPs in carp, shark, and lamprey
Organisation of the human MASP2 gene locus
C Stover et al
123
Table 1 Codon phases within the human MASP2 gene
Splice donor and acceptor sequences of all exons are compiled. The arrows indicate the alternative polyadenylation/splicing event that
takes place in the hnRNA to generate either the MAp19 or MASP-2 mRNA. The position of the intron within the respective codon is given
to the right; codon phase 0 and I denote a position of the intronic sequence before and after the first base, respectively.
revealed that these MASP genes belong to the so-called
AGY lineage of thrombin-like serine proteases. The members of this lineage are characterised by an AGY codon
at the active site serine and a single exon coding for the
protease domain.16 C1r was also shown to possess an
intronless protease domain,36 its active site serine residue
is also encoded by an AGY triplet. The close proximity
of the human C1r and C1s genes, both of the AGY lineage
(in 5⬘ → 3⬘ (C1r gene) and 3⬘ → 5⬘ (C1s gene) orientation,
at an intergenic distance of 9.5 kb37), suggested that both
genes are the result of a gene duplication event38 and
have remained closely linked on chromosome 12p13.39
MASP-1, by contrast, belongs to the TCN lineage of trypsin-like serine proteases.16 Taking this into account,40 a
model for the evolution of the MASP1, MASP2, C1r, and
C1s genes was proposed which involves events such as
mutation, retrotransposition, and duplication from an
ancestral gene of the TCN type lineage showing a split
gene organisation for the protease domain.16,17,41,42 The
identical codon phases in the orthologous coding
Genes and Immunity
Organisation of the human MASP2 gene locus
C Stover et al
124
Figure 3 (a) The MASP2 Gene Cluster on chromosome 1p36.2–3. The transcription units for the individual genes in the vicinity of the
MASP2 gene are presented as boxes and arrows indicate orientation of transcription. Names of the gene products are included. The position
of this analysed chromosomal area with respect to telomere and centromere of the short (p) arm is indicated. (b) Coverage of this chromosomal area by overlapping clones (AL109811, AJ300188, AL049653). The sequence area denoted by an asterisk represents the divergent
sequence of 1340 bp solely composed of repetitive elements found in AJ300188.
sequences of the MASP1, MASP2, C1s genes, and the partially characterised C1r gene underline the likelihood of
such a genesis. The phenomenon of exon shuffling may
have to be added to these events:43 symmetric exons, ie
exons for which the phase of the 5⬘ intron is identical to
the phase of the 3⬘ intron, will spread more easily by exon
shuffling than nonsymmetric ones.44 Within the MASP2
gene, exons d, e, j, and k are symmetric, ie phase correlated, exons (coding for the EGF-like domain, the MAp19
specific C-terminus, and all of CCP II). By contrast,
introns that lie in phase 0 (ie between codons), are said
to be ancient, ie originally present in the progenote, and
are probably more related to structural modules of the
protein.45 In the MASP2 gene, phase 0 introns are found
between the exons coding for CUB I, CUB II, and CCP
I, respectively.
The MASP2 gene lies in a gene-rich area of chromosome 1,46 in close proximity to non-complement genes,
such as the gene coding for TAR DNA binding protein,
the gene coding for the 100 kDa Polymyositis-Scleroderma autoantigen, the gene coding for FRAP kinase
which harbours in one of its introns the gene for CDT6.
All of these genes have a split exon architecture.
In a wider genetic context, the detection on 1p35–36.2
of several and polymorphic pseudogenes of the macrophage stimulating protein gene has lead to the concept
that recombination may occur frequently between these
repeated sequences.47 In addition, a PAC clone from this
chromosomal area (Z98257) contains sequence with 81–
96% identity to a human endogenous retrovirus type C
oncovirus sequence (M74509). An adenoviral integration
site has been mapped to 1p36.1.48 The cytogenetic aberration described in neuroblastomas is a deletion of the
short arm of chromosome 1.49 This deletion encompasses
the MASP2 gene locus (the MASP2 gene is at 5.66 cR from
the marker D1S54813) as seen from an analysis of several
framework
maps
(http://www.ncbi.nlm.nih.gov/
genemap/map.cgi for chromosome 1).28,49 The significance of this is not yet clear. Here, it is of interest to note
the substitution of part of the MASP2 gene observed in
clone RP11–99P18 (AJ300188), which encompasses the
Genes and Immunity
promoter region of the MASP2 gene, exons a through i,
and includes most of the intron prior to exon j which
codes for the N-terminal part of CCP II. Analysis of this
sequence area reveals that there is sequence identity over
a stretch of 1.5 kb for the 3⬘ part of the intron i/j of the
MASP2 gene and a stretch of 4.3 kb for the 5⬘ part of
the intergenic region to the PMSCL2 gene, respectively.
Approaching the divergent sequence of 1.3 kb (solely
composed of repetitive elements) from either side, there
is an increasing number of nucleotide mismatches. The
origin of the divergent organisation of the MASP2 gene
locus in this BAC clone is presently investigated by
sequencing two overlapping BACs of the same human
BAC library from which clone RP11–99P18 was isolated.
The serine proteases of the classical and lectin pathway
of complement activation have a similarity score between
39 and 52%.15 Their genes belong to a multigene family.
Genes coding for various types of trypsinogen are also
members of this family. Trypsin, composed solely of a
serine protease domain (of the TCN type and encoded
by five exons, the boundaries differ from the ones found
in the orthologous part of the MASP1 gene coding for the
serine protease domain), is thought to be the primordial
enzyme present before the emergence of complement serine proteases, and novel substrate specificities are
thought to have been introduced by the evolutionary
anteposition of certain modular domain structures found
in, eg, MASP-2, MASP-1, C1r, and C1s. Recent work
emphasises the importance of residue 225 (chymotrypsin
numbering) in serine proteases.50 While degradative
enyzmes such as trypsin have a proline residue at this
position, P225, a tyrosine is found in MASP-1, MASP-2,
C1r, C1s (and thrombin), Y225. Y225 allows Na+ binding
near the primary specificity site and thus enhances catalytic activity. It is thought that the mutation P225 to Y225
occurred in prevertebrates in the face of evolutionary
pressures, prior to the conversion of the codon for the
active site serine (TCN to AGY) in more specialised serine
proteases found in vertebrates.50
The concomitant presence of members of unrelated
multigene families in the vicinity of the MASP1, MASP2,
Organisation of the human MASP2 gene locus
C Stover et al
125
Table 2 Compilation of positional overlap of human genes belonging to four different multigene families
Presented on the left are the genes for the serine proteases MASP-2, MASP-1, C1r, C1s, and two trypsins; these are compared against
members of three unrelated gene families, a glucose transporter gene family (SLC2A), a chloride channel gene family (CLCN), and the
tumor necrosis factor receptor gene family (TNFR), and their respective chromosomal locations are indicated.
C1r, C1s, and trypsinogen genes would lend support to
the hypothesis of a joint ancestry of these genes and
weaken the theoretical possibility of a convergent evolution of unrelated genes. In search of these, strikingly
overlapping chromosomal positions have been found for
the serine protease genes of interest to this work, two of
the trypsinogen genes,51 the genes for certain isoforms of
a glucose transporter52 and of a chloride channel53 as well
as for the genes for the two TNF receptors (http://www.
ncbi.nlm.nih.gov/omim/,
http://www.ncbi.nlm.nih.
gov/genemap99/)54,55 (Table 2). We propose that the consideration of such neighbouring genes which are equally
members of multigene families should be included in
phylogenetic analyses.
Materials and methods
RACE
Initiation of transcription of the human MASP2 gene10,11
was mapped by RACE technology using 5⬘ RACE System
for Rapid Amplification of cDNA Ends (Version 2.0) from
Life Technologies (Paisley, UK). Total RNA prepared
from human liver tissue56 was transcribed to cDNA using
a MASP-2 mRNA specific antisense oligonucleotide (5⬘
GGAGAGTTTGGGATACGGCCGTGG GTA 3⬘, bp pos.
617–643, accession number Y09926). The single strand
cDNA was purified and dC-tailed according to the manufacturer’s protocol. An anchor primer annealing to the
homopolymeric tail and two other, nested MASP2/MAp19 specific oligonucleotides (5⬘ GCGTGGCCAG
CACCTTGGCCCC 3⬘, bp pos. 265–286 of Y18281, 5⬘
GCGA GTAGAAAGTGTCCTTGC 3⬘, bp pos. 326–346 of
Y18281) positioned 3⬘ of the above oligonucleotide were
used for two independent cyclic amplification reactions.
Two distinct amplification products of 320 bp and 380
bp, respectively were subcloned in pGEMTeasy vector
(Promega, Southampton, UK) and analysed by double
stranded cycle sequencing.
Cloning of the MASP2 gene
Human bacterial artificial chromosome (BAC) clones
were isolated by screening a human RP11 BAC library
using a radiolabeled cDNA probe specific for the coding
region of the serine protease domain of MASP-2
(Research Genetics, Huntsville, Alabama, USA). The
clones were verified by colony PCR using MASP-2 specific oligonucleotides (5⬘ ACCT GGTGATTTTCCTTGG
3⬘, bp. pos. 1372–1390 of Y09926 and 5⬘ GTCACTCT
CGTGGTTTATG 3⬘, bp. pos. 2278–2296 of Y09926) and
Southern blotting. Partial primary structure for the
MASP2 gene was established from overlapping subclones
obtained from a shotgun library constructed of one BAC
clone, RP11–99P18. Double stranded sequencing and
analysis were performed using automated sequencer ABI
377 and Sequencher 3.0 programme (Gene Codes Corporation, MI, USA), respectively. In parallel, two clones, ø4–
12 and ø2–2, were isolated from a human ␭ Fix II genomic
library using a MASP-2 cDNA probe and analysed by
restriction mapping and sequencing. One clone,
AJ297949, was obtained by cyclic amplification of human
genomic DNA using MASP-2 specific oligonucleotides (5⬘
GCGCCCACCTGCGACCACCAC 3⬘, bp pos. 461–481 of
Y09926 and 5⬘ TCCTGATTCATCTGTGACAAAGG 3⬘, bp
pos. 846–868 of Y09926) and characterised by restriction
mapping and sequencing.
Acknowledgements
We wish to thank The International Institute for the
Advancement of Medicine, University of Leicester for
kindly providing the non-tumorous human liver specimen used in this study. We acknowledge the expertise of
Cathy Summer (Research Genetics, Huntsville, Alabama,
USA) in isolating the MASP-2 specific human BAC clone
RP11–99P18 (AJ300188). The authors also wish to thank
the Sanger Mapping and Sequence groups for the
sequence production of RP4–635E18 (AL109811). Guelnihal Yueksekdag is acknowledged for excellent technical
support. We especially thank Dr Jim Kaufman (Institute
of Animal Health, Compton, England) for his inspiring
comments.
References
1 Whaley K, Schwaeble W. Complement and complement
deficiencies. Semin Liver Dis 1997; 17: 297–310.
Genes and Immunity
Organisation of the human MASP2 gene locus
C Stover et al
126
2 Lu JH, Thiel S, Wiedemann H, Timpl R, Reid KB. Binding of
the pentamer/hexamer forms of mannan-binding protein to
zymosan activates the proenzyme C1r2C1s2 complex, of the
classical pathway of complement, without involvement of C1q.
J Immunol 1990; 144: 2287–2294.
3 Matsushita M, Endo Y, Fujita T. Complement-activating complex of ficolin and mannose-binding lectin-associated serine protease. J Immunol 2000; 164: 2281–2284.
4 Hansen S, Holmskov U. Structural aspects of collectins and
receptors for collectins. Immunobiology 1998; 199: 165–189.
5 Hoppe H, Reid K. Collectins--soluble proteins containing collagenous regions and lectin domains--and their roles in innate
immunity. Protein Sci 1994; 3: 1143–1158.
6 Thiel S, Petersen S, Vorup-Jensen T et al. Interaction of C1q and
Mannan-Binding Lectin (MBL) with C1r, C1s, MBL-associated
serine proteases 1 and 2, and the MBL-associated protein
MAp19. J Immunol 2000; 165: 878–887.
7 Vorup-Jensen T, Petersen S, Hansen A et al. Distinct pathways of
mannan-binding lectin (MBL)- and C1 complex autoactivation
revealed by reconstitution of MBL with recombinant MBLassociated serine protease-2. J Immunol 2000; 165: 2093–2100.
8 Matsushita M, Thiel S, Jensenius J, Terai I, Fujita T. Proteolytic
activities of two types of mannose-binding lectin-associated
serine protease. J Immunol 2000; 165: 2637–2642.
9 Takahashi M, Miura S, Ishii N et al. An essential role of MASP1 in activation of the lectin pathway. Immunopharmacology 2000;
49: 3 (Abstract 003).
10 Stover C, Thiel S, Thelen M, Lynch N, Vorup-Jensen T, Jensenius
J, Schwaeble W. Two constituents of the initiation complex of
the Mannan-Binding Lectin activation pathway of complement
are encoded by a single structural gene. J Immunol 1999; 162:
3481–3490.
11 Takahashi M, Endo Y, Fujita T, Matsushita M. A truncated form
of mannose-binding lectin-associated serine protease (MASP)-2
expressed by alternative polyadenylation is a component of the
lectin complement pathway. Int Immunol 1999; 11: 859–863.
12 Stover C, Thiel S, Lynch N, Schwaeble W. The rat and mouse
homologues of MASP-2 and MAp19, components of the mannan-binding lectin activation pathway of complement. J Immunol
1999; 63: 6848–6859.
13 Stover C, Schwaeble W, Lynch N, Thiel S, Speicher M. Assignment of the gene encoding Mannan-Binding Lectin-associated
serine protease-2 (MASP-2) to human chromosome 1p36.2–3 by
in situ hybridization and somatic cell hybrid analysis. Cytogenet
Cell Genet 1999; 84: 148–149.
14 Cooke C, Alwine J. The cap and the 3⬘ splice site similarly affect
polyadenylation efficiency. Mol Cell Biol 1996; 16: 2579–2584.
15 Thiel S, Vorup-Jensen T, Stover C et al. A second serine protease
associated with mannan-binding lectin that activates complement. Nature 1997; 386: 506–510.
16 Endo Y, Takahashi M, Nakao M et al. Two lineages of mannosebinding lectin-associated serine protease (MASP) in vertebrates.
J Immunol 1998; 161: 4924–4930.
17 Endo Y, Sato T, Matsushita M, Fujita T. Exon structure of the
gene encoding the human mannose-binding protein-associated
serine protease light chain: comparison with complement C1r
and C1s genes. Int Immunol 1996; 8: 1355–1358.
18 Reese MG, Eeckman FH. Novel Neural Network Algorithms for
Improved Eukaryotic Promoter Site Recognition. The 7th international Genome sequencing and analysis conference, Hilton
Head Island, South Carolina, 16–20 September 1995.
19 Bucher P. Weight matrix descriptions of four eukaryotic RNA
polymerase II promoter elements derived from 502 unrelated
promoter sequences. J Mol Biol 1990; 212: 563–578.
20 Prestridge DS. SIGNAL SCAN: A computer program that scans
DNA sequences for eukaryotic transcriptional elements.
CABIOS 1991; 7: 203–206.
21 Ghosh D. New developments of a transcription factors database.
Trends Biochem Sci 1991; 16: 445–447.
22 Kunz, J, Henriquez R, Schneider U, Deuter-Reinhard M, Movva
N, Hall M. Target of rapamycin in yeast, TOR2, is an essential
Genes and Immunity
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
phosphatidylinositol kinase homolog required for G1 progression. Cell 1993; 73: 585–596.
Cruz M, Cavallo L, Gorlach J et al. Rapamycin antifungal action
is mediated via conserved complexes with FKBP12 and TOR
kinase homologs in cryptococcus neoformans. Mol Cell Biol 1999;
19: 4101–4112.
Sabatini D, Erdjument-Bromage H, Lui M, Tempst P, Snyder S.
RAFT1: a mammalian protein that binds to FKBP12 in a rapamycin-dependent fashion and is homologous to yeast TORs. Cell
1994; 78: 35–43.
Brown E, Albers M, Shin T et al. A mammalian protein targeted
by G1-arresting rapamycin-receptor complex. Nature 1994; 369:
756–758.
Vilella-Bach M, Nuzzi P, Fang Y, Chen J. The FKBP12-rapamycin-binding domain is required for FKBP12-rapamycin-associated protein kinase activity and G1 progression. J Biol Chem 1999;
274: 4266–4272.
Onyango P, Lubyova B, Gardellin P, Kurzbauer R, Weith A.
Molecular cloning and expression analysis of five novel genes
in chromosome 1p36. Genomics 1998; 50: 187–198
Lench N, Macadam R, Markham A. The human gene encoding
FKBP-rapamycin associated protein (FRAP) maps to chromosomal band 1p36.2. Hum Genet 1997; 99: 547–549.
Moore P, Rosen C, Carter K. Assignment of the human FKBP12rapamycin-associated protein (FRAP) gene to chromosome 1p36
by fluorescence in situ hybridization. Genomics 1996; 33: 331–332.
Gelpi C, Algueró A, Angeles Martinez M, Vidal S, Juarez C,
Rodriguez-Sanchez J. Identification of protein components
reactive with anti-PM/Scl autoantibodies. Clin Exp Immunol
1990; 81: 59–64.
Ge Q, Wu Y, Trieu E, Targoff I.Analysis of the specificity of antiPM-Scl autoantibodies. Arthritis Rheum 1994; 37: 1445–1452.
Briggs M, Burkard K, Butler J. Rrp6p, the yeast homologue of
the human PM-Scl 100-kDa autoantigen, is essential for efficient
5.8 S rRNA 3⬘ end formation. J Biol Chem 1998; 273: 13255–13263.
Bliskovski V, Liddell R, Ramsay E, Miller M, Mock B. Structure
and localization of mouse Pmscl1 and Pmscl2 genes. Genomics
2000; 64: 106–110.
Ou SH, Wu F, Harrich D, Garcia-Martinez L, Gaynor R. Cloning
and characterization of a novel cellular protein, TDP-43, that
binds to human immunodeficiency virus type 1 TAR DNA
sequence motifs. J Virol 1995; 69: 3584–3596.
Peek R, van Gelderen BE, Bruinenberg M, Kijlstra A. Molecular
cloning of a new angiopoietin-like factor from the human cornea. Invest Ophthalmol Vis Sci 1998; 39: 1782–1788.
Tosi M, Duponchel C, Meo T, Couture-Tosi E. Complement
genes C1r and C1s feature an intronless protease domain closely
related to haptoglobin. J Mol Biol 1989; 208: 709–714.
Kusumoto H, Hirosawa S, Salier J, Hagen F, Kurachi K. Human
genes for complement C1r and C1s in a close tail-to-tail arrangement. Proc Natl Acad Sci 1988; 85: 7307–7311.
Tosi M, Duponchel C, Meo T, Julier C. Complete cDNA
sequence of human complement C1s and close physical linkage
of the homologous genes C1s and C1r. Biochemistry 1987; 26:
8516–8524.
Van Cong N, Tosi M, Gross M et al. Assignment of the complement serine protease genes C1r and C1s to chromosome 12
region 12p13. Hum Genet 1988; 78: 363–368.
Dang Q, DiCera E. Residue 225 determines the Na+-induced
allosteric regulation of catalytic activity in serine proteases. Proc
Natl Acad Sci USA 1996; 93: 10653–10656.
Lawson P, Reid K. A novel PCR-based technique using
expressed sequence tags and gene homology for murine genetic
mapping: localization of the complement genes. Int Immunology
1999; 12: 231–240.
Irwin D. Evolution of an active-site codon in serine proteases.
Nature 1988; 336: 429–430.
Long M, Rosenberg C, Gilbert W. Intron phase correlations and
the evolution of the intron/exon structure of genes. Proc Natl
Acad Sci USA 1995; 92: 12495–12499.
Organisation of the human MASP2 gene locus
C Stover et al
44 Gilbert W, de Souza S, Long M. Origin of genes. Proc Natl Acad
Sci USA 1997; 94: 7698–7703.
45 De Souza S, Long M, Klein R, Roy S, Lin S, Gilbert W. Toward
a resolution of the introns early/late debate: only phase zero
introns are correlated with the structure of ancient proteins. Proc
Natl Acad Sci 1998; 95: 5094–5099.
46 Saccone S, De Sario A, Della Valle G, Bernardi G. The highest
gene concentrations in the human genome are in telomeric
bands of metaphase chromosomes. Proc Natl Acad Sci 1992; 89:
4913–4917.
47 van der Drift P, Chan A, Zehetner G, Westerveld A, Versteeg
R. Multiple MSP pseudogenes in a local repeat cluster on 1p36.2:
an expanding genomic graveyard? Genomics 1999; 62: 74–81.
48 Romani M, Baldini A, Volpi E, Casciano I, Nobile C, Muresu
R, Siniscalco M. Concurrent mapping of an adeovirus 5/SV40
integration site and the U1 snRNA cluster (RNU1) within 400 kb
of the chromosome 1p36.1. Cytogenet Cell Genet 1994; 67: 37–40.
49 White P, Maris J, Beltinger C et al. A region of consistent deletion
in neuroblastoma maps within human chromosome 1p36.2–36.3.
Genetics 1995; 92: 5520–5524.
50 Guinto ER, Caccia S, Rose T, Futterer K, Waksman G, Di Cera
E. Unexpected crucial role of residue 225 in serine proteases.
Proc Natl Acad Sci USA 1999; 96: 1852–1857.
51 Emi M, Nakamura Y, Ogawa M et al. Cloning, characterization
and nucleotide sequences of two cDNAs encoding human pancreatic trypsinogens. Gene 1986; 41: 305–310.
52 Saier M. Families of transmembrane sugar transport proteins.
Mol Microbiol 2000; 35: 699–710.
53 Jentsch TJ, Friedrich T, Schriever A, Yamada H. The CLC chloride channel family. Pflugers Arch 1999; 437: 783–795.
54 Fuchs P, Strehl S, Dworzak M, Himmler A, Ambros P. Structure
of the human TNF receptor 1 (p60) gene (TNFR1) and localization to chromosome 12p13. Genomics 1992; 13: 219–224.
55 Kemper O, Derre J, Cherif D, Engelmann H, Wallach D, Berger
R. The gene for the type II (p75) tumor necrosis factor receptor
(TNF-RII) is localized on band 1p36.2-p36.3. Hum Genet 1991;
87: 623–624.
56 Chirgwin J, Przybyla A, MacDonald R, Rutter W. Isolation of
biologiocally active ribonucleic acid from sources enriched in
ribonucleases. Biochemistry 1979; 18: 5294–5299.
127
Genes and Immunity