Evolution of the Large Secreted Gel-Forming Mucins
Jean-Luc Desseyn,*† Jean-Pierre Aubert,*‡ Nicole Porchet,*‡§ and Anne Laine*
*Unité 377 INSERM, Lille, France; †Department of Pharmacology, University of Washington; ‡Laboratoire de Biochimie et
de Biologie Moléculaire de l’Hôpital C. Huriez, CHRU de Lille, Lille, France; and §Faculté de Médecine, Université de Lille
II, Lille, France
Mucins, the major component of mucus, contain tandemly repeated sequences that differ from one mucin to another.
Considerable advances have been made in recent years in our knowledge of mucin genes. The availability of the
complete genomic and cDNA sequences of MUC5B, one of the four human mucin genes clustered on chromosome
11, provides an exemplary model for studying the molecular evolution of large mucins. The emerging picture is
one of expansion of mucin genes by gene duplications, followed by internal repeat expansion that strictly preserves
the reading frame. Computational and phylogenetic analyses have permitted the proposal of an evolutionary history of the
four human mucin genes located on chromosome 11 from an ancestor gene common to the human von Willebrand
factor gene and the suggestion of a model for the evolution of the repeat coding portion of the MUC5B gene from
a hypothetical ancestral minigene. The characterization of MUC5B, a member of the large secreted gel-forming
mucin family, offers a new model for the comparative study of the structure-function relationship within this
important family.
Introduction
Mucus protects the underlying epithelium from
chemical, enzymatic, and mechanical damage. It consists mainly of mucins, which are heterogeneous, highly
glycosylated proteins produced from epithelial cells (Ho
and Kim 1991). All mucins contain a central part which
carries numerous oligosaccharide chains. This part, rich
in Ser, Thr, and Pro, is composed of tandem repeats. The
number of repeats and the amino acid (aa) sequence of
each repeat depend on the mucin gene. The central part
is flanked at both ends by unique domains with aa composition different from that of the repeat domain.
Sequences of the mucin cDNAs are rarely full-length because of the highly repetitive structure and the
extremely large size of some mucin messengers. To date,
eight human mucin genes, MUC1–MUC7 (including
MUC5AC and MUC5B), have been well characterized,
and each mucosa or secretory epithelium expresses a
characteristic pattern of mucin genes. Mucins are usually subdivided into two groups, the secreted mucins
(gel-forming and non–gel-forming) and the membrane-anchored mucins. The second group consists of the two
large mucins MUC3 and MUC4, containing EGF-like
motifs, and the small mucin MUC1. MUC6, MUC2,
MUC5AC, and MUC5B are the secreted gel-forming
mucins, and their four genes are contained within a single 400-kb genomic DNA fragment on chromosome 11
band p15.5 (Pigny et al. 1996a). At least MUC2,
MUC5AC, and MUC5B have a common ancestor (Desseyn et al. 1998a) and define a subclass of mucins.
cDNA sequences flanking the central part of this subclass of human mucins (MUC2, MUC5AC, MUC5B)
and animal mucins (RMuc2, FIM-B.1, and PSM) code
for cysteine-rich domains which are similar to the cysteine-rich domains that flank the three consecutive A
(A1-A2-A3) domains of von Willebrand factor (vWF)
(Probst, Gertzen, and Hoffmann 1997; Eckhardt et al.
1991, 1997; Gum et al. 1992, 1994; Xu et al. 1992;
Ohmori et al. 1994; Lesuffleur et al. 1995; Desseyn et
al. 1997a, 1998b; Joba and Hoffmann 1997; Li et al.
1998; van de Bovenkamp et al. 1998). These cysteine-rich domains are named D (D1-D2-D′-D3 upstream of
the central part in mucins and upstream of the A1-A2-A3 domains in vWF and D4 downstream of the central
part in mucins and downstream of the A1-A2-A3 domains in vWF), B, C, and CK (cystine knot; fig. 1).
Partial genomic and cDNA sequences available for the
other mucin genes showed that the 3′ ends of MUC6
(Toribara et al. 1997), RMuc5ac (Inatomi et al. 1997),
and BSM (Bhargava et al. 1990) are similar to the C-terminal regions of the three human mucins MUC2,
MUC5AC, and MUC5B.

Abbreviations: aa, amino acid(s); bp, base pair(s); BSM, bovine
submaxillary mucin; CK, cystine knot; FIM, frog integumentary mucin; kb, kilobase(s); NDP, Norrie disease protein; nt, nucleotide(s);
PSM, porcine submaxillary mucin; vWF, von Willebrand factor.
Key words: gel-forming mucin, 11p15, tandem repeat, evolution,
cystine knot.
Address for correspondence and reprints: Jean-Luc Desseyn, Department of Pharmacology, P.O. Box 357750, University of Washington, Seattle, Washington 98195-7750. E-mail: [email protected].
Mol. Biol. Evol. 17(8):1175–1184. 2000
© 2000 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

The CK domain is 85 aa in length in MUC5B and
contains 11 cysteine residues. This domain is similar to
norrin (also called NDP, for Norrie disease protein),
and its three-dimensional structure is similar to that of
TGF-β family proteins (Meitinger et al. 1993). It has
been found that vWF (Voorberg et al. 1991), NDP (Perez-Vilar and Hill 1997), PSM (Perez-Vilar, Eckhardt,
and Hill 1996; Perez-Vilar and Hill 1998a), and RMuc2
(Bell et al. 1998) form disulfide-linked dimers through
their respective carboxyl-terminal domains, and vWF
and PSM form disulfide-linked multimers through their
respective amino-terminal D domains (Mayadas and
Wagner 1992; Perez-Vilar and Hill 1998b). Following
dimerization, multimerization, and glycosylation (N- and O-), most mucins are stored in secretory granules
before being secreted on the luminal surfaces of epithelia as large oligomeric molecules.

FIG. 1. Schematic comparison of the mucins MUC5B, MUC5AC, MUC2, FIM-B.1, and PSM with the human von Willebrand factor
(vWF). The four D domains of vWF (D1, D2, D3, and D4) are 351–375 aa in length. The D′ domain is 89 aa in length. The three B domains and
the two C domains of vWF are 34, 25, 24, 116, and 118 aa in length, respectively. The three A domains are about 220 aa in length. Ovals
represent Cys-subdomains (108 aa, 10 Cys). Mucin-type domains (tandem repeats rich in Ser, Thr, and Pro) are hatched.
The four mucin subdomains RI–RIV of MUC5B are followed by a mucin-type polypeptide called R-End (111 aa). The two Cys-subdomains
of MUC2 (ovals) flank a nonpolymorphic mucin-type region. Variations in the inner cores (VNTR) of mucin regions are indicated by slashes.
The mucin region of FIM-B.1 is interrupted at least three times by an SCR (short consensus repeat of about 60 aa) motif. The central regions
of MUC5AC, MUC2, and PSM are composed of tandem repeats of 8, 23, and 81 aa in length, respectively. The number of Cys-subdomains
within the central part of MUC5AC (ovals) has not yet been determined. The two rectangles with asterisks in MUC5B, MUC2, and MUC5AC
represent the two motifs coded by two small exons in MUC5B which flank the central part of the three mucins.

A 108-aa subdomain, rich in cysteine residues (10
Cys) and called the “Cys-subdomain,” has also been
found to interrupt, several times, the central repetitive
parts of several mucins. This subdomain has been found
seven times in MUC5B (Desseyn et al. 1997a), twice in
MUC2 (Toribara et al. 1991), at least six times in
MUC5AC (Meerzaman et al. 1994; Guyonnet Dupérat
et al. 1995; Klomp, Van Rens, and Strous 1995), and
several times in various homologous animal mucins
(Hansson et al. 1994; Ohmori et al. 1994; Shekels et al.
1995; Turner et al. 1995; Inatomi et al. 1997). This Cys-subdomain has been well conserved throughout evolution. The Cys residues and some other amino acid residues are absolutely conserved (Desseyn et al. 1997b,
1998a), and one putative C-mannosylation consensus
sequence (W-x-x-W; Krieg et al. 1998) is always found
in the amino-terminal region of this domain. Because
this subdomain is found in humans, mice, and rats, it is
likely to have an important function, such as packaging
or trafficking, or it may interact with other
components of the mucus.
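As an illustration only, a motif of this kind can be located in a protein sequence with a short pattern search. The sketch below is not part of the original analysis: the function name `find_wxxw` and the toy peptide are hypothetical, and a hit marks only a candidate site, not demonstrated C-mannosylation.

```python
import re

# Putative C-mannosylation consensus W-x-x-W: Trp, any two residues, Trp.
# A lookahead is used so that overlapping occurrences are all reported.
WXXW = re.compile(r"(?=(W..W))")


def find_wxxw(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position, tetrapeptide) for every W-x-x-W occurrence."""
    return [(m.start() + 1, m.group(1)) for m in WXXW.finditer(protein.upper())]


# Hypothetical peptide, NOT a real mucin sequence.
toy_peptide = "MACWSEWTPCSWAAWKKW"
for position, motif in find_wxxw(toy_peptide):
    print(position, motif)
```

The lookahead matters here because two motifs can share a tryptophan (W-x-x-W-x-x-W); a plain `W..W` pattern would report only the first of such a pair.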
Evolutionary studies of mucin genes can help to
define their structure-function relationship and elucidate
their individual biological roles. The genomic organizations of the two small mucin genes MUC1 and MUC7
have previously been reported (Lancaster et al. 1990;
Bobek et al. 1996). We recently published the complete
genomic sequence of the large secreted mucin MUC5B
(Desseyn et al. 1997a, 1997b, 1998b). Another group
reported a 39-aa-longer amino-terminal region (Offner
et al. 1998) with an additional first exon (which we call
0) and a longer exon corresponding to our exon 1 (which
we call 1′) for the MUC5B gene. Comparison between
the full-length (15.8 kb) cDNA sequence and the corresponding genomic sequence (39 kb) revealed a total
of 49 exons and 48 introns. The additional intron we
call 0, between exon 0 and exon 1′, is 2.4 kb long (unpublished results) and is a phase 1 intron. Since MUC5B
is the only large mucin gene for which both complete
cDNA and genomic sequences have been determined, it
provides an excellent model for the investigation of mucin evolution.
Materials and Methods
The accession number of the central part (protein)
of MUC5B is CAA70926. Precise boundaries of the different repeats have previously been determined (Desseyn et al. 1997b). Nucleotide sequences of CK domains
are available from the EMBL database with the following accession numbers, and the sequences used for
alignment and analysis are defined as follows: human
NDP (hNDP): NM_000266, nt 571–792; mouse NDP
(mNDP): X92394, nt 588–809; MUC5AC: AJ001402,
nt 2917–3123; RMuc5ac: U83139, nt 3078–3284;
MUC5B: Y09788, nt 9117–9970 (join 9172 to 9829);
MUC2: M94132, nt 2680–2877; RMuc2: M81920, nt
2236–2433; BSM: M36192, nt 1524–1721; PSM:
M61883, nt 3226–3423; FIM-B.1: J02910, nt 967–1164;
human vWF (hvWF): NM_000552, nt 8215–8418;
MUC6: U97698, nt 1033–1242.
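The coordinates listed above can be checked for internal consistency. The following sketch (Python, not part of the original methods) computes the length of each region and tests whether it spans a whole number of codons. It assumes that the coordinates are 1-based and inclusive and, for MUC5B, that "join 9172 to 9829" denotes two exonic segments (9117–9172 and 9829–9970) separated by an intron. Both readings are assumptions, as are the names `CK_REGIONS` and `region_length`.

```python
# Coordinates as listed in Materials and Methods (assumed 1-based, inclusive).
CK_REGIONS = {
    "hNDP": [(571, 792)],
    "mNDP": [(588, 809)],
    "MUC5AC": [(2917, 3123)],
    "RMuc5ac": [(3078, 3284)],
    # Assumption: the genomic stretch between 9172 and 9829 is intronic,
    # so only the two flanking segments are counted.
    "MUC5B": [(9117, 9172), (9829, 9970)],
    "MUC2": [(2680, 2877)],
    "RMuc2": [(2236, 2433)],
    "BSM": [(1524, 1721)],
    "PSM": [(3226, 3423)],
    "FIM-B.1": [(967, 1164)],
    "hvWF": [(8215, 8418)],
    "MUC6": [(1033, 1242)],
}


def region_length(segments):
    """Total number of nucleotides covered by inclusive (start, end) segments."""
    return sum(end - start + 1 for start, end in segments)


for name, segments in CK_REGIONS.items():
    nt = region_length(segments)
    codons, remainder = divmod(nt, 3)
    print(f"{name:8s} {nt:4d} nt  {codons:3d} codons  remainder {remainder}")
```

Under these assumptions every region is a multiple of 3 nt long, which is what one would expect for segments delimited by codons for conserved cysteine residues.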
The multiple-sequence alignments were made using the CLUSTAL X program (Thompson, Higgins, and
Gibson 1994), and the resulting trees were displayed with TREEVIEW (Page
1996).
Results and Discussion
Evolutionary History of the Unusual Large Central
Exon of MUC5B
The entire mucin MUC5B gene has been cloned
within two overlapping cosmid clones (Desseyn et al.
1997a, 1997b, 1998b). The mucin-type region (rich in
Ser, Thr, and Pro) is composed of irregular tandem repeats of 29 aa (87 bp) in domains called RI–RV (Desseyn et al. 1997b). This mucin-type region is interrupted
four times by two associated nontandemly repeated sequences (fig. 1). This allowed us to design primers to
amplify overlapping cDNAs corresponding to the central part of MUC5B. cDNA cloning and sequencing, together with genomic subcloning and sequencing, allowed us to establish that the central part of MUC5B
does not contain any intronic sequence, whether unique or tandemly repeated. We therefore conclude that the central
part of MUC5B is coded by a single unusually large
exon of 10,713 bp, and it is likely that other large
mucins also have their central parts coded by a single exon.
Moreover, this suggests that the central part arose
through internal duplications rather than through exon
shuffling. The availability of both complete cDNA and
genomic sequences of the central part of MUC5B (Desseyn et al. 1997b) now allows us to trace its evolutionary history. The deduced peptide is composed of three
kinds of subdomains, Cys-subdomains (108 aa, 10 Cys),
R subdomains (309–657 aa, composed of irregular tandem repeats of 29 aa), and R-End subdomains (111 aa).
The first three Cys-subdomains (fig. 1) are followed by
four super repeats. Each super repeat is composed of an
R subdomain followed by an R-End subdomain and ending with a Cys-subdomain. Each R-subdomain is composed of 11 (the first two and the last one) or 17 tandem
repeats of the irregular motif of 29 aa (87 bp) rich in
Ser and Thr (fig. 2A). The four R-End subdomains are
rich in Ser and Thr and are very similar to each other,
but they do not exhibit any similarity to any other sequence. Another R-subdomain of 23 irregular repeats of
29 aa follows the fourth super repeat. The presence of
repeats at different levels suggests that the central part
of the gene evolved by successive duplications. Multiple-sequence alignments and phylogenetic trees (figs. 2A
and B) together allow us to propose a model showing
how the five tandem repeat blocks RI–RV have been
made up. The repeat RV-14 has an extra pentapeptide,
TTTPT (fig. 2A), that probably arose through partial duplication of the RV-14 sequence. New multiple-sequence alignments and phylogenetic trees were then
constructed without this pentapeptide. This shows that
the block made up of the five repeats RV-3–RV-8 is
highly similar to the block made up of the five repeats
RV-18–RV-23. Further alignments without either RV-18–RV-23 or RV-3–RV-8 show that the block RIII-1–
RIII-6 and the block RV-1–RV-6 are similar to blocks
made up of the six repeats of other subdomains. Moreover, alignments using RIII blocks or RV blocks and
phylogenetic trees (data not shown) show that blocks
RIII/V-1–RIII/V-6 are more similar to blocks RIII/V-6–
RIII/V-12 than to other blocks of six repeats. Thus, these
analyses and the order of the subdomains that we defined within the central part of MUC5B allow us to propose a diagram showing the evolution of the repeated
sequences (fig. 3). This scheme is in agreement with our
previous model showing evolution from a single ancestral gene of the three human mucin genes MUC5B,
MUC5AC, and MUC2 (Desseyn et al. 1998a). A part of
an ancestral gene encoding a primordial Cys-subdomain
triplicated to give rise to three Cys-subdomains (fig. 3a).
The resulting gene, composed of these three subdomains
flanked by unique sequences rich in Cys and found in
the vWF gene (see below), duplicated into the two ancestor genes of MUC5AC and MUC5B (Desseyn et al.
1998a). The primordial repeat of 87 bp of MUC5B duplicated several times to form a block composed of 11
irregular repeats of 87 bp, followed by a unique sequence rich in Ser and Thr coding for 111 aa. The ancestral super repeat (Cys/R/R-End subdomains) duplicated into two super repeats (fig. 3b). Then, the first six
repeats of the second block of 11 repeats duplicated (fig.
3c). This event was followed by a further duplication en
bloc of a region composed of the third Cys-subdomain,
the block of 11 repeats, the R-End subdomain, the last
Cys-subdomain, and the block of 17 repeats of 29 aa
(fig. 3d). Finally, the block composed of the first 11
repeats of 87 bp, the following R-End subdomain, and
the Cys-subdomain duplicated en bloc (fig. 3e). The five
repeats RV-3–RV-7 duplicated into the two blocks RV-3–RV-7 and RV-18–RV-23 (fig. 3f). A sequence encoding the pentapeptide TTTPT of RV-14 duplicated (fig.
3g).
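The present-day arrangement of subdomains described in this section can be written down as a simple ordered list, which makes the repeat counts easy to verify. The sketch below is only a bookkeeping aid under the description given above (three Cys-subdomains, four super repeats of R, R-End, and Cys, then RV); the names `REPEATS_PER_BLOCK` and `central_exon_layout` are hypothetical, and the irregular lengths of individual repeats are ignored.

```python
from collections import Counter

REPEAT_AA = 29  # nominal length of one repeat; real repeats are irregular
REPEATS_PER_BLOCK = {"RI": 11, "RII": 11, "RIII": 17, "RIV": 11, "RV": 23}


def central_exon_layout():
    """Ordered (kind, name) subdomains of the MUC5B central part as described."""
    layout = [("Cys", None)] * 3                 # first three Cys-subdomains
    for block in ("RI", "RII", "RIII", "RIV"):   # four super repeats
        layout += [("R", block), ("R-End", None), ("Cys", None)]
    layout.append(("R", "RV"))                   # final block of repeats
    return layout


layout = central_exon_layout()
kinds = Counter(kind for kind, _ in layout)
total_repeats = sum(REPEATS_PER_BLOCK[name] for kind, name in layout if kind == "R")

print(dict(kinds))        # number of subdomains of each kind
print(total_repeats)      # total number of 29-aa repeats
print(REPEAT_AA * 3)      # nominal repeat length in bp
```

The total agrees with the 73 tandem repeats aligned in figure 2A, and seven Cys-subdomains agrees with the count given in the Introduction.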
Relationships Among Cystine Knot (CK) Motifs and
Pattern of Mucin Gene Evolution
Since our previous model of evolution of the three
human mucin genes MUC2, MUC5AC, and MUC5B
(Desseyn et al. 1998a), several new mucin cDNAs encoding a CK motif have been analyzed. Sequences coding the carboxy-terminal peptide between the 10 last
conserved cysteine residues of the CK motif of MUC5B
(Desseyn et al. 1997a), MUC2 (Gum et al. 1992),
MUC6 (Toribara et al. 1997), MUC5AC (Buisine et al.
1998a), RMuc2 (Xu et al. 1992), RMuc5ac (Inatomi et
al. 1997), frog integumentary mucin FIM-B.1 (Probst,
Gertzen, and Hoffmann 1990), porcine and bovine submaxillary mucins PSM (Eckhardt et al. 1991) and BSM
(Bhargava et al. 1990), the human Norrie disease protein
(hNDP; Berger et al. 1992), the mouse NDP (mNDP;
Berger et al. 1996) and the human vWF gene (Mancuso
et al. 1989) were aligned using the CLUSTAL X program (Thompson, Higgins, and Gibson 1994). The
alignment was optimized based on the nine cysteine residues conserved in the 12 sequences (fig. 4A). The phylogenetic tree (fig. 4B) reveals three subfamilies of mucin genes: MUC6 alone, FIM-B.1, PSM, and BSM together, and MUC5AC, MUC5B, and MUC2 together
grouped with their animal homologous mucin genes.

FIG. 2. A, Sequence alignment of the 73 tandem repeats of MUC5B. Alignment gaps are indicated by dashes. Multiple alignments were
performed with the CLUSTAL X program (Thompson, Higgins, and Gibson 1994). B, Deduced phylogenetic tree. The neighbor-joining tree is
displayed by the TREEVIEW program (Page 1996).

It is noticeable that Cys-subdomains of 108 aa containing
10 Cys have been found in all the members of this last
subfamily. Moreover, out of the four mucin genes of
chromosome 11, MUC6 is closer to FIM-B.1, PSM, or
BSM. Another observation reinforces this idea: the D4
and B domains found in vWF, MUC2, MUC5B, and
MUC5AC are missing in MUC6, PSM, and BSM (fig.
1). Central parts of MUC2, MUC5B, and MUC5AC, in
contrast to PSM and FIM-B.1, are flanked by two domains not found in vWF and encoded, as shown at least
for MUC5B, by two small exons (indicated with asterisks in fig. 1) of 198 and 182 bp, respectively. We can
then speculate that the common ancestor gene of the
mammalian genes MUC2, MUC5AC, and MUC5B and
of their animal homologous mucin genes contained a sequence coding for one Cys-subdomain, flanked at
both its ends by these exons.

FIG. 3. Evolution scheme of the MUC5B central part. Ovals
represent Cys-subdomains. Each long rectangle denotes an irregular
motif of 29 aa. Empty small rectangles represent R-End subdomains
of 111 aa. The order of the three events e, f, and g is undetermined,
and they are represented as a single step.

In addition to the genomic sequence of MUC5B,
the genomic organization is available for the region
downstream of the repetitive part of MUC6, and it
showed that the central domain, rich in Ser and Thr, is
followed by the CK domain, which is the last domain
found in the three other mucins of human chromosome
11. This suggests that during evolution, MUC6 lost several exons coding for the D4, B, and C domains. This
may have been possible, since the intron between the
central part and the CK domain of MUC6 and the two
introns (introns 30 and 46) of MUC5B flanking its domains which are not found in MUC6 have the same class
(class 1), and deletion of the genomic part flanked by
two introns with the same class preserved the downstream reading frame in MUC6. This may have happened by reverse transcription of a spliced variant
mRNA which replaced the endogenous genomic copy
through homologous recombination. This is a simple
mechanism by which contiguous blocks of introns/exons
are removed in one event (Frugoli et al. 1998). Because
the central part of MUC6 does not seem to contain any
Cys-subdomain, we can now propose a general scheme
of the evolutionary history of the four human mucin
genes of chromosome 11 (fig. 5). Our interpretation is
that the present MUC5B and MUC5AC genes evolved
from a common ancestor, which we termed the
MUC5ACB progenitor. This progenitor and the present
MUC2 gene derived, by duplication involving the entire gene,
from a common progenitor (MUC2-5ACB).
This hypothetical progenitor contained the initial Cys-subdomain and itself had a progenitor in common with
the present MUC6 gene. This last progenitor contained
the D1-D2-D′-D3 and D4-B-C-CK domains inherited
from an ancestor gene shared with the vWF gene. In the
vWF gene, the B and C domains triplicated and duplicated, respectively, while the MUC6 gene lost several
exons coding for the D4, B, and C domains. This putative evolution scheme takes into account the genomic
organization, as well as sequence similarities and the
order of the four human mucin genes on chromosome
11. The amino- and carboxy-terminal domains rich in
Cys are found conserved in mammalian mucins and frog
mucin. These regions are most likely preserved from the
early ancestor, whereas the lack of similarity in the central part carrying the carbohydrate chains suggests
changes in sequence and structural organization that occurred after the amphibian/mammalian divergence.
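The argument about intron class made above for MUC6 can be illustrated numerically. The class of an intron (commonly called its phase) is the number of coding nucleotides upstream of it, modulo 3. Removing everything between two introns therefore preserves the downstream reading frame exactly when the two introns have the same class. The sketch below uses hypothetical exon lengths, not MUC5B or MUC6 data, and the function names are illustrative.

```python
from itertools import accumulate


def intron_classes(exon_lengths):
    """Class (phase) of the intron following each exon except the last.

    Lengths are coding nucleotides only, counted from the start codon.
    """
    return [total % 3 for total in accumulate(exon_lengths[:-1])]


def deleted_coding_nt(exon_lengths, i, j):
    """Coding nt lost when the DNA between intron i and intron j is removed.

    Introns are numbered from 0 and i < j; intron k lies between exon k
    and exon k + 1, so exons i + 1 through j are deleted.
    """
    return sum(exon_lengths[i + 1 : j + 1])


def frame_preserved(exon_lengths, i, j):
    """True if the deletion removes a whole number of codons."""
    return deleted_coding_nt(exon_lengths, i, j) % 3 == 0


# Hypothetical exon lengths, chosen only to give introns of mixed classes.
toy_exons = [100, 50, 61, 90, 40]
classes = intron_classes(toy_exons)
for i in range(len(classes)):
    for j in range(i + 1, len(classes)):
        same_class = classes[i] == classes[j]
        print(i, j, same_class, frame_preserved(toy_exons, i, j))
```

In every pair the two printed booleans agree, which is the general rule: the difference between the two classes equals the number of deleted coding nucleotides modulo 3.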
The 11p15 mucin gene family arose from rare
events. Three mechanisms may have acted (reviewed in
Danielson and Dores 1999): gene amplification through
amplicon formation, gene duplication through chromosomal breakage and ligation, and gene duplication
through unequal crossing over at repeated elements. One
of the recent and unexpected findings is that the large
mucin genes are differentially expressed, spatially and
temporally, between the embryo and normal adults
(Buisine et al. 1998b, 1999; Reid, Gould, and Harris
1997; Reid and Harris 1998). Although very little information is available concerning the regulatory elements, it is likely that the cis elements differ among
mucin genes. A better understanding of their expression
pattern, and of the mechanisms that control cell specificity
and temporal expression, may come from study of their regulatory regions.
After mucin gene multiplication, further recent duplications within their central part, as suggested above
for MUC5B, led to the present genomic organization of
the four human mucins. Of special interest is the observation that the tandemly repeated coding sequences of
mucins which contain most of the potential O-glycosylation sites are not conserved within species and between
species. The fact that each mucin has its tandem repeats
more or less conserved strongly suggests that (1) the
central part has evolved with a selective pressure to keep
Ser and Thr codons corresponding to the O-glycosylation sites and (2) each tandem repeat portion has arisen
through internal successive duplications.
The most recent major event is probably the formation of the central repetitive region of each mucin
gene, since the repeated sequences are not conserved
among mucin genes. The single large exon of mucin
genes is highly variable in size and sequence between
species and between individuals of the same species. Tandem repeats expanded through replication slippage, unequal
sister chromatid exchanges, and gene conversion (Vinall
et al. 1998). This does not allow any frameshift changes
but allows variability among individuals in the number
of repeats, although peptides between two consecutive
Cys-subdomains are always about 400 aa long (Desseyn
1997; Wickstrom et al. 1998), which correlates well with
previous electron microscopy studies showing the heterogeneity of mucin glycoproteins (Sheehan et al. 1991).
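The statement that repeat expansion changes the number of repeats without shifting the frame follows from the repeat length: 87 bp is exactly 29 codons. A minimal sketch of this point is given below, using a hypothetical coding sequence and a hypothetical 87-nt unit (neither is taken from MUC5B); an 86-nt unit is included only as a counterexample.

```python
def codons(seq):
    """Split a sequence into codons, ignoring any incomplete trailing codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_repeats(cds, unit, pos, copies):
    """Insert `copies` tandem copies of `unit` at 0-based offset `pos`."""
    return cds[:pos] + unit * copies + cds[pos:]


def downstream_frame_kept(cds, unit, pos, copies):
    """True if the codons downstream of `pos` are read unchanged.

    `pos` is assumed to lie on a codon boundary (a multiple of 3).
    """
    tail = codons(cds[pos:])
    expanded = codons(insert_repeats(cds, unit, pos, copies))
    return expanded[len(expanded) - len(tail):] == tail


# Hypothetical sequences for illustration only.
toy_cds = "ATG" + "GCT" * 10 + "TAA"   # 36 nt
unit_87 = "ACC" * 29                   # 87 nt, i.e. 29 codons
unit_86 = unit_87[:-1]                 # 86 nt, not a multiple of 3

for copies in (1, 2, 3):
    print(copies, downstream_frame_kept(toy_cds, unit_87, 15, copies))
print(downstream_frame_kept(toy_cds, unit_86, 15, 1))
```

Any number of 87-nt copies leaves the downstream codons intact, whereas a single 86-nt insertion does not.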
Genomic Organization of the MUC5B, MUC5AC, and
vWF Genes
Comparison of the genomic DNA sequence of
MUC5B with its cDNA sequence allowed us for the first
time to determine the genomic organization of a large
mucin gene (Desseyn et al. 1997a, 1997b, 1998b; Offner
et al. 1998). These studies revealed a total of 48 introns,
30 introns upstream and 18 introns downstream of the
large central exon (exon 30). The work on the MUC5B
gene shows that 23 out of the 51 introns of the vWF
gene have the same position and class in the MUC5B
gene (table 1), and 9 other introns of MUC5B may be
conserved with introns of the vWF gene. Few introns
found in MUC5B are not found in vWF, and vice versa.
Genomic organization of other human and animal mucin
genes (MUC2, MUC5AC, RMuc2, Muc5ac, and FIM-B.1) may be helpful in determining which introns have
been gained and which introns have been lost during
evolution. Determination of the genomic organization of
MUC5B facilitated the determination of the genomic organization of the 3′ end of the MUC5AC gene (Buisine
et al. 1998a), which showed that all of the introns found
in MUC5AC are conserved (table 1) compared with
MUC5B (position and class). However, unique tandemly
repeated sequences identified in some introns of
MUC5B have not been found in MUC5AC. This is probably due to weak selective pressure on the intronic sequences. In contrast to exonic regions, intron
sizes are not conserved between the two genes. Nevertheless, sequences surrounding splice junctions are more
or less perfectly conserved between the MUC5B and
MUC5AC genes; for example, the first 7 bp of some introns are
identical. This probably reflects the fact
that intronic splice junctions may not accumulate mutations at the same rate as the rest of the intronic sequences. Although almost no data are available concerning the exon-intron organization of other gel-forming mucin genes, we anticipate that they all probably
share the same overall genomic organization.

FIG. 5. Hypothetical diagram showing the evolution of the four
human mucin genes clustered on chromosome 11p15 from a common ancestor of the human vWF gene.

Many intron classes and positions and, by implication, splice-site consensus sequences are conserved
between vWF and MUC5B or MUC5AC, but it is very
noticeable that introns of the vWF gene are longer than
the corresponding introns of the MUC5B gene and the
MUC5AC gene. Although intron comparisons between
MUC5B, MUC5AC, and vWF failed to show any conserved regions except splicing sites, we think that some
of these introns have functions. Intron 36 of MUC5B is
made up almost entirely of perfect direct repeats of 59
bp. The number of repeats is variable among individuals, ranging from three to eight repeats (Desseyn, Rousseau, and Laine 1999). Each repeat contains one binding
site that leads to a specific interaction with a nuclear
factor from mucus-secreting cells (Pigny et al. 1996b).
As shown for some factors (Nakamura, Koyama, and
Matsushima 1998), this factor may play a role in splicing and/or in pre-mRNA stability.
Conclusions
The four mucins clustered on human chromosome
11 have a CK domain and thus form a mucin subfamily.
This subfamily can be divided into two subfamilies depending on the presence or absence of Cys-subdomains
interrupting the large O-glycosylated domain. The number of Cys-subdomains is characteristic of each mucin,
and studies on this domain will help to elucidate physiological functions of mucins.
Further identification of novel mucin genes, cloning of new mucin cDNAs, and determination of mucin
gene structure will help to characterize the structure-function relationship of mucins. Conserved domains in
mucin peptides are most likely to have functional significance, but unique polypeptides should be considered to
have been formed during evolution due to differing biological constraints.
Acknowledgments
This work was supported by le Comité du Nord de
la Ligue Nationale contre le Cancer and l’Association
de Recherche contre le Cancer. J.-L.D. was supported
by a fellowship from the Ministère de l’Education Supérieure et de la Recherche.
FIG. 4. A, Alignment of CK sequences between the last nine cysteine residues. hNDP and mNDP are human and mouse Norrie disease
proteins. Codons coding for cysteine residues and for conserved amino acids are shown in bold. Alignment gaps are indicated by dashes.
Asterisks indicate positions which have a single conserved nucleotide. Plus signs indicate positions which have two conserved nucleotides. The
alignment was constructed using the CLUSTAL X program (Thompson, Higgins, and Gibson 1994). B, Deduced phylogenetic tree. The neighbor-joining tree is displayed by the TREEVIEW program (Page 1996).
1182
Desseyn et al.
Table 1
Similar Introns (Position and Class) Between vWF, MUC5B, and MUC5AC

vWF                           MUC5B                    MUC5AC
Name    Class  Size (kb)      Name   Class  Size (kb)  Name  Class  Size (kb)
4       2      0.283          3      2      0.154
5       1      ~15.0          5      1      0.274
6       0      ~19.9          6      0      0.786
7       1      ~1.6           7      1      0.403
9       2      ~1.0           9      2      0.196
14      1      ~0.82          14     1      0.290
15      1      ~4.1           16     1      0.242
16      2      ~5.3           17     2      0.523
19      2      ~1.6           20     2      0.393
20      0      ~3.1           21     0      0.098
22      0      ~3.0           23     0      0.357
24      0      ~1.8           24     0      1.703
25      1      ~0.74          25     1      0.412
27      2      ~2.1           27     2      0.100
33      0      0.292          31/B   0      0.283      B     0      0.502
34      1      ~16.4          32/C   1      1.118      C     1      0.447
36      1      0.211          34/E   1      0.159      E     1      0.340
37      1      ~2.0           36/G   1      0.538      G     1      0.333
38      0      ~6.5           39/J   0      0.258      J     0      0.558
40, 41  1, 1   1.81, 1.157    40/K   1      0.389      K     1      0.263
45      1      ~1.0           41/L   1      0.126      L     1      0.263
46      0      0.524          42/M   0      0.574      M     0      0.567
43, 47  0, 0   ~4.5, ~14.2    43/N   0      0.696      N     0      0.090

LITERATURE CITED
BELL, S. L., I. A. KHATRI, G. XU, and J. F. FORSTNER. 1998.
Evidence that a peptide corresponding to the rat Muc2 C-terminus undergoes disulphide-mediated dimerization. Eur.
J. Biochem. 253:123–131.
BERGER, W., A. MEINDL, T. J. VAN DE POL et al. (14 coauthors). 1992. Isolation of a candidate gene for Norrie disease by positional cloning. Nat. Genet. 2:84.
BERGER, W., D. VAN DE POL, D. BACHNER, F. OERLEMANS, H.
WINKENS, H. HAMEISTER, B. WIERINGA, W. HENDRIKS, and
H. H. ROPERS. 1996. An animal model for Norrie disease
(ND): gene targeting of the mouse ND gene. Hum. Mol.
Genet. 5:51–59.
BHARGAVA, A. K., J. T. WOITACH, E. A. DAVIDSON, and V. P.
BHAVANANDAN. 1990. Cloning and cDNA sequence of a
bovine submaxillary gland mucin-like protein containing
two distinct domains. Proc. Natl. Acad. Sci. USA 87:6798–
6802.
BOBEK, L. A., J. LIU, S. N. SAIT, T. B. SHOWS, Y. A. BOBEK,
and M. J. LEVINE. 1996. Structure and chromosomal localization of the human salivary mucin gene, MUC7. Genomics 31:277–282.
BUISINE, M. P., J. L. DESSEYN, N. PORCHET, P. DEGAND, A.
LAINE, and J. P. AUBERT. 1998a. Genomic organization of
the 3′-region of the human MUC5AC mucin gene: additional evidence for a common ancestral gene for the
11p15.5 mucin gene family. Biochem. J. 332:729–738.
BUISINE, M. P., L. DEVISME, M. C. COPIN, M. DURAND-REVILLE, B. GOSSELIN, J. P. AUBERT, and N. PORCHET. 1999.
Developmental mucin gene expression in the human respiratory tract. Am. J. Respir. Cell Mol. Biol. 20:209–218.
BUISINE, M. P., L. DEVISME, T. C. SAVIDGE, C. GESPACH, B.
GOSSELIN, N. PORCHET, and J. P. AUBERT. 1998b. Mucin
gene expression in human embryonic and fetal intestine.
Gut 43:519–524.
DANIELSON, P. B., and R. M. DORES. 1999. Molecular evolution of the opioid/orphanin gene family. Gen. Comp. Endocrinol. 113:169–186.
DESSEYN, J. L. 1997. Genomic organization of the human mucin MUC5B. Molecular basis of a new classification of mucins. Ph.D. thesis, University of Lille, France.
DESSEYN, J. L., J. P. AUBERT, I. VAN SEUNINGEN, N. PORCHET,
and A. LAINE. 1997a. Genomic organization of the 3′ region of the human mucin gene MUC5B. J. Biol. Chem. 272:
16873–16883.
DESSEYN, J. L., M. P. BUISINE, N. PORCHET, J. P. AUBERT, P.
DEGAND, and A. LAINE. 1998a. Evolutionary history of the
11p15 human mucin gene family. J. Mol. Evol. 46:102–
106.
DESSEYN, J. L., M. P. BUISINE, N. PORCHET, J. P. AUBERT, and
A. LAINE. 1998b. Genomic organization of the human mucin gene MUC5B. cDNA and genomic sequences upstream
of the large central exon. J. Biol. Chem. 273:30157–30164.
DESSEYN, J. L., V. GUYONNET-DUPERAT, N. PORCHET, J. P. AUBERT, and A. LAINE. 1997b. Human mucin gene MUC5B,
the 10.7-kb large central exon encodes various alternate
subdomains resulting in a super-repeat. Structural evidence
for a 11p15.5 gene family. J. Biol. Chem. 272:3168–3178.
DESSEYN, J. L., K. ROUSSEAU, and A. LAINE. 1999. Fifty-nine
bp repeat polymorphism in the uncommon intron 36 of the
human mucin gene MUC5B. Electrophoresis 20:493–496.
ECKHARDT, A. E., C. S. TIMPTE, J. L. ABERNETHY, Y. ZHAO,
and R. L. HILL. 1991. Porcine submaxillary mucin contains
a cystine-rich, carboxyl-terminal domain in addition to a
highly repetitive, glycosylated domain. J. Biol. Chem. 266:
9678–9686.
ECKHARDT, A. E., C. S. TIMPTE, A. W. DELUCA, and R. L.
HILL. 1997. The complete cDNA sequence and structural
polymorphism of the polypeptide chain of porcine submaxillary mucin. J. Biol. Chem. 272:33204–33210.
FRUGOLI, J. A., M. A. MCPEEK, T. L. THOMAS, and C. R.
MCCLUNG. 1998. Intron loss and gain during evolution of
the catalase gene family in angiosperms. Genetics 149:355–
365.
GUM, J. R. JR., J. W. HICKS, N. W. TORIBARA, E. M. ROTHE,
R. E. LAGACE, and Y. S. KIM. 1992. The human MUC2
intestinal mucin has cysteine-rich subdomains located both
upstream and downstream of its central repetitive region. J.
Biol. Chem. 267:21375–21383.
GUM, J. R. JR., J. W. HICKS, N. W. TORIBARA, B. SIDDIKI, and
Y. S. KIM. 1994. Molecular cloning of human intestinal mucin (MUC2) cDNA. Identification of the amino terminus
and overall sequence similarity to prepro-von Willebrand
factor. J. Biol. Chem. 269:2440–2446.
GUYONNET DUPÉRAT, V., J. P. AUDIE, V. DEBAILLEUL, A. LAINE, M. P. BUISINE, S. GALIEGUE-ZOUITINA, P. PIGNY, P. DEGAND, J. P. AUBERT, and N. PORCHET. 1995. Characterization of the human mucin gene MUC5AC: a consensus cysteine-rich domain for 11p15 mucin genes? Biochem. J. 305:
211–219.
HANSSON, G. C., D. BAECKSTROM, I. CARLSTEDT, and K. KLINGA-LEVAN. 1994. Molecular cloning of a cDNA coding for
a region of an apoprotein from the ‘insoluble’ mucin complex of rat small intestine. Biochem. Biophys. Res. Commun. 198:181–190.
HO, S. B., and Y. S. KIM. 1991. Carbohydrate antigens on
cancer-associated mucin-like molecules. Semin. Cancer
Biol. 2:389–400.
INATOMI, T., A. S. TISDALE, Q. ZHAN, S. SPURR-MICHAUD, and
I. K. GIPSON. 1997. Cloning of rat Muc5AC mucin gene:
comparison of its structure and tissue distribution to that of
human and mouse homologues. Biochem. Biophys. Res.
Commun. 236:789–797.
JOBA, W., and W. HOFFMANN. 1997. Similarities of integumentary mucin B.1 from Xenopus laevis and prepro-von Willebrand factor at their amino-terminal regions. J. Biol.
Chem. 272:1805–1810.
KLOMP, L. W., L. VAN RENS, and G. J. STROUS. 1995. Cloning
and analysis of human gastric mucin cDNA reveals two
types of conserved cysteine-rich domains. Biochem. J. 308:
831–838.
KRIEG, J., S. HARTMANN, A. VICENTINI, W. GLASNER, D. HESS,
and J. HOFSTEENGE. 1998. Recognition signal for C-mannosylation of Trp-7 in RNase 2 consists of sequence Trp-xx-Trp. Mol. Biol. Cell 9:301–309.
LANCASTER, C. A., N. PEAT, T. DUHIG, D. WILSON, J. TAYLOR-PAPADIMITRIOU, and S. J. GENDLER. 1990. Structure and
expression of the human polymorphic epithelial mucin
gene: an expressed VNTR unit. Biochem. Biophys. Res.
Commun. 173:1019–1029.
LESUFFLEUR, T., F. ROCHE, A. S. HILL, M. LACASA, M. FOX,
D. M. SWALLOW, A. ZWEIBAUM, and F. X. REAL. 1995.
Characterization of a mucin cDNA clone isolated from HT-29 mucus-secreting cells. The 3′ end of MUC5AC? J. Biol.
Chem. 270:13665–13673.
LI, D., M. GALLUP, N. FAN, D. E. SZYMKOWSKI, and C. B.
BASBAUM. 1998. Cloning of the amino-terminal and 5′-flanking region of the human MUC5AC mucin gene and
transcriptional up-regulation by bacterial exoproducts. J.
Biol. Chem. 273:6812–6820.
MANCUSO, D. J., E. A. TULEY, L. A. WESTFIELD, N. K. WORRALL, B. B. SHELTON-INLOES, J. M. SORACE, Y. G. ALEVY,
and J. E. SADLER. 1989. Structure of the gene for human
von Willebrand factor. J. Biol. Chem. 264:19514–19527.
MAYADAS, T. N., and D. D. WAGNER. 1992. Vicinal cysteines
in the prosequence play a role in von Willebrand factor
multimer assembly. Proc. Natl. Acad. Sci. USA 89:3531–
3535.
MEERZAMAN, D., P. CHARLES, E. DASKAL, M. H. POLYMEROPOULOS, B. M. MARTIN, and M. C. ROSE. 1994. Cloning
and analysis of cDNA encoding a major airway glycoprotein, human tracheobronchial mucin (MUC5). J. Biol.
Chem. 269:12932–12939.
MEITINGER, T., A. MEINDL, P. BORK, B. ROST, C. SANDER, M.
HAASEMANN, and J. MURKEN. 1993. Molecular modelling
of the Norrie disease protein predicts a cystine knot growth
factor tertiary structure. Nat. Genet. 5:376–380.
NAKAMURA, Y., K. KOYAMA, and M. MATSUSHIMA. 1998.
VNTR (variable number of tandem repeat) sequences as
transcriptional, translational, or functional regulators. J.
Hum. Genet. 43:149–152.
OFFNER, G. D., D. P. NUNES, A. C. KEATES, N. H. AFDHAL,
and R. F. TROXLER. 1998. The amino-terminal sequence of
MUC5B contains conserved multifunctional D domains:
implications for tissue-specific mucin functions. Biochem.
Biophys. Res. Commun. 251:350–355.
OHMORI, H., A. F. DOHRMAN, M. GALLUP, T. TSUDA, H. KAI,
J. R. GUM JR., Y. S. KIM, and C. B. BASBAUM. 1994. Molecular cloning of the amino-terminal region of a rat MUC
2 mucin gene homologue. Evidence for expression in both
intestine and airway. J. Biol. Chem. 269:17833–17840.
PAGE, R. D. 1996. TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12:357–358.
PEREZ-VILAR, J., A. E. ECKHARDT, and R. L. HILL. 1996. Porcine submaxillary mucin forms disulfide-bonded dimers between its carboxyl-terminal domains. J. Biol. Chem. 271:
9845–9850.
PEREZ-VILAR, J., and R. L. HILL. 1997. Norrie disease protein
(norrin) forms disulfide-linked oligomers associated with
the extracellular matrix. J. Biol. Chem. 272:33410–33415.
———. 1998a. The carboxyl-terminal 90 residues of porcine
submaxillary mucin are sufficient for forming disulfide-bonded dimers. J. Biol. Chem. 273:6982–6988.
———. 1998b. Identification of the half-cystine residues in
porcine submaxillary mucin critical for multimerization
through the D-domains. Roles of the CGLCG motif in the
D1- and D3-domains. J. Biol. Chem. 273:34527–34534.
PIGNY, P., V. GUYONNET-DUPERAT, A. S. HILL et al. (14 coauthors). 1996a. Human mucin genes assigned to 11p15.5:
identification and organization of a cluster of genes. Genomics 38:340–352.
PIGNY, P., I. VAN SEUNINGEN, J. L. DESSEYN, S. NOLLET, N.
PORCHET, A. LAINE, and J. P. AUBERT. 1996b. Identification
of a 42-kDa nuclear factor (NF1-MUC5B) from HT-29
MTX cells that binds to the 3′ region of human mucin gene
MUC5B. Biochem. Biophys. Res. Commun. 220:186–191.
PROBST, J. C., E. M. GERTZEN, and W. HOFFMANN. 1990. An
integumentary mucin (FIM-B.1) from Xenopus laevis homologous with von Willebrand factor. Biochemistry 29:
6240–6244.
REID, C. J., S. GOULD, and A. HARRIS. 1997. Developmental
expression of mucin genes in the human respiratory tract.
Am. J. Respir. Cell. Mol. Biol. 17:592–598.
REID, C. J., and A. HARRIS. 1998. Developmental expression
of mucin genes in the human gastrointestinal system. Gut
42:220–226.
SHEEHAN, J. K., R. P. BOOT-HANDFORD, E. CHANTLER, I.
CARLSTEDT, and D. J. THORNTON. 1991. Evidence for
shared epitopes within the ‘naked’ protein domains of human mucus glycoproteins. A study performed by using
polyclonal antibodies and electron microscopy. Biochem. J.
274:293–296.
SHEKELS, L. L., C. LYFTOGT, M. KIELISZEWSKI, J. D. FILIE, C.
A. KOZAK, and S. B. HO. 1995. Mouse gastric mucin: cloning and chromosomal localization. Biochem. J. 311:775–
785.
THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
TORIBARA, N. W., J. R. GUM JR., P. J. CULHANE, R. E. LAGACE,
J. W. HICKS, G. M. PETERSEN, and Y. S. KIM. 1991. MUC2 human small intestinal mucin gene structure. Repeated
arrays and polymorphism. J. Clin. Invest. 88:1005–1013.
TORIBARA, N. W., S. B. HO, E. GUM, J. R. GUM JR., P. LAU,
and Y. S. KIM. 1997. The carboxyl-terminal sequence of the
human secretory mucin, MUC6. Analysis of the primary
amino acid sequence. J. Biol. Chem. 272:16398–16403.
TURNER, B. S., K. R. BHASKAR, M. HADZOPOULOU-CLADARAS,
R. D. SPECIAN, and J. T. LAMONT. 1995. Isolation and characterization of cDNA clones encoding pig gastric mucin.
Biochem. J. 308:89–96.
VAN DE BOVENKAMP, J. H., C. M. HAU, G. J. STROUS, H. A.
BULLER, J. DEKKER, and A. W. EINERHAND. 1998. Molecular cloning of human gastric mucin MUC5AC reveals conserved
cysteine-rich D-domains and a putative leucine zipper motif. Biochem. Biophys. Res. Commun. 245:853–859.
VINALL, L. E., A. S. HILL, P. PIGNY, W. S. PRATT, N. TORIBARA, J. R. GUM, Y. S. KIM, N. PORCHET, J. P. AUBERT,
and D. M. SWALLOW. 1998. Variable number tandem repeat
polymorphism of the mucin genes located in the complex
on 11p15.5. Hum. Genet. 102:357–366.
VOORBERG, J., R. FONTIJN, J. CALAFAT, H. JANSSEN, J. A. VAN
MOURIK, and H. PANNEKOEK. 1991. Assembly and routing
of von Willebrand factor variants: the requirements for disulfide-linked dimerization reside within the carboxy-terminal 151 amino acids. J. Cell Biol. 113:195–205.
WICKSTROM, C., J. R. DAVIES, G. V. ERIKSEN, E. C. VEERMAN,
and I. CARLSTEDT. 1998. MUC5B is a major gel-forming,
oligomeric mucin from human salivary gland, respiratory
tract and endocervix: identification of glycoforms and C-terminal cleavage. Biochem. J. 334:685–693.
XU, G., L. J. HUAN, I. A. KHATRI, D. WANG, A. BENNICK, R.
E. FAHIM, G. G. FORSTNER, and J. F. FORSTNER. 1992.
cDNA for the carboxyl-terminal region of a rat intestinal
mucin-like peptide. J. Biol. Chem. 267:5401–5407.
CLAUDIA KAPPEN, reviewing editor
Accepted March 31, 2000