Prevertebrate Local Gene Duplication Facilitated Expansion of the

Prevertebrate Local Gene Duplication Facilitated Expansion of
the Neuropeptide GPCR Superfamily
Seongsik Yun,y,1 Michael Furlong,y,1 Mikang Sim,2 Minah Cho,2 Sumi Park,1 Eun Bee Cho,1
Arfaxad Reyes-Alcaraz,1 Jong-Ik Hwang,1 Jaebum Kim,2 and Jae Young Seong*,1
1
Graduate School of Medicine, Korea University, Seoul, Republic of Korea
Department of Animal Biotechnology, Konkuk University, Seoul, Republic of Korea
y
These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected].
Associate editor: David Irwin
2
Abstract
In humans, numerous genes encode neuropeptides that comprise a superfamily of more than 70 genes in approximately
30 families and act mainly through rhodopsin-like G protein-coupled receptors (GPCRs). Two rounds of whole-genome
duplication (2R WGD) during early vertebrate evolution greatly contributed to proliferation within gene families; however, the mechanisms underlying the initial emergence and diversification of these gene families before 2R WGD are
largely unknown. In this study, we analyzed 25 vertebrate rhodopsin-like neuropeptide GPCR families and their cognate
peptides using phylogeny, synteny, and localization of these genes on reconstructed vertebrate ancestral chromosomes
(VACs). Based on phylogeny, these GPCR families can be divided into five distinct clades, and members of each clade tend
to be located on the same VACs. Similarly, their neuropeptide gene families also tend to reside on distinct VACs.
Comparison of these GPCR genes with those of invertebrates including Drosophila melanogaster, Caenorhabditis elegans,
Branchiostoma floridae, and Ciona intestinalis indicates that these GPCR families emerged through tandem local duplication during metazoan evolution prior to 2R WGD. Our study describes a presumptive evolutionary mechanism and
development pathway of the vertebrate rhodopsin-like GPCR and cognate neuropeptide families from the urbilaterian
ancestor to modern vertebrates.
Key words: neuropeptide, GPCR, coevolution, gene duplication, whole-genome duplication, evolutionary history.
Introduction
ß The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
Mol. Biol. Evol. 32(11):2803–2817 doi:10.1093/molbev/msv179 Advance Access publication September 3, 2015
2803
Fast Track
reverse pharmacological approaches in insects (Hauser et al.
2006; Jiang et al. 2013), nematodes (Lindemans et al. 2009),
and vertebrates (Civelli et al. 2006) have facilitated discovery
of a substantial number of neuropeptide-GPCR systems.
Recently, phylogenetic analysis of neuropeptide GPCRs revealed subtrees containing mixed sets of protostomian and
deuterostomian sequences, which indicates that many vertebrate and arthropod GPCR families that were previously
thought to be phyla specific actually have a bilaterian origin
(Mirabeau and Joly 2013). Recent studies indicate that the
continuous emergence of new genes through duplication in
ancestral species followed by neo-/subfunctionalization or
gene loss (Krishnan et al. 2012; Assis and Bachtrog 2013;
Hwang et al. 2013) during metazoan evolution established
the current diversified range of neuropeptide-GPCR families.
Syntenic analysis of vertebrate genome fragments and
comparison of whole chromosomes of evolutionarily distant
taxa support the concept of two rounds of whole-genome
duplication (2R WGD) during early vertebrate emergence
(Ohno 1970). These events produced, on average, four paralogous chromosomes that share related sets of genes, defined
as ohnologs (Dehal and Boore 2005; Meyer and Van de Peer
2005; Nakatani et al. 2007; Putnam et al. 2008). The quadruplication of genes by 2R WGD facilitated the rapid proliferation of genes within vertebrate gene families. Additionally,
local gene duplications shortly before, during, and after 2R
Article
Neuropeptides play a plethora of roles in the nervous system
and peripheral organs, mediating a number of physiological
processes including reproduction, growth, homeostasis, metabolism, food intake, sleep, and social and sexual behaviors
(Cho et al. 2007; Mirabeau and Joly 2013; Vaudry and Seong
2014). Mature neuropeptides are derived from large precursor
proteins that share typical structures in their primary amino
acid sequences. These structures include a signal peptide sequence at the N-terminus; an evolutionary conserved mature
neuropeptide sequence with canonical dibasic processing
sites, which are cleaved by prohormone convertases
(Steiner 1998); and often a glycine residue, which can be
subjected to amidation, at the C-terminus of the mature
peptide (Eipper et al. 1992). To date, more than 100 neuropeptides have been identified in humans, and the vast majority of these act through G protein-coupled receptors
(GPCRs) (Fredriksson et al. 2003; Oh et al. 2006), which are
characterized by their possession of seven transmembrane
helices. Phylogenetic analysis indicates that the majority of
neuropeptide GPCRs are clustered in the and groups of
the rhodopsin-like (class A) or in the secretin-like (class B)
GPCR families (Fredriksson et al. 2003).
Rapid accumulation of protostomian and deuterostomian
data, recent advances of bioinformatics tools for neuropeptide and GPCR identification (Mirabeau et al. 2007), and
MBE
Yun et al. . doi:10.1093/molbev/msv179
WGD further contributed to the proliferation within gene
families (Lundin 1993; Holland et al. 1994; Larhammar and
Salaneck 2004; Hwang et al. 2013, 2014; Kim et al. 2014;
Sefideh et al. 2014). More recent local duplications within
gene families are easier to identify, as these duplications
tend to share higher sequence similarity and be located
close to each other. However, genes that emerged earlier in
metazoan and vertebrate evolution have accumulated mutations, leading to greater deviation in function and residue
sequence. Furthermore, chromosomal shuffling has translocated these genes to new inter- or intrachromosomal regions
(Nakatani et al. 2007; Hwang et al. 2013). Thus, phylogenetically related genes created by local duplication are often located on different chromosomes, despite originating from the
same ancestral location. Large-scale gene loss and occasional
single gene translocation further complicate syntenic analysis
(Sefideh et al. 2014).
The evolutionary relationships of gene families can be examined by phylogenetic analysis; however, in the case of neuropeptide genes phylogenetic analysis often does not provide
sufficiently reliable results to ascertain the order of emergence
of, or relationships within, neuropeptide gene families
(Cardoso et al. 2010; Hwang et al. 2013; Kim et al. 2014). In
general, signal peptide sequences are not conserved, and propeptide sequences, other than the mature peptide, are highly
variable, as these sequences are free from evolutionary selective pressure (Lee et al. 2009). Sequence comparison of the
short, conserved mature peptides is not sufficient to extrapolate reliable relational information. Furthermore, paralogous
peptide genes that arose by local gene duplication prior to
vertebrate emergence display considerable variation, even in
the mature peptide sequences (Cardoso et al. 2010; Hwang
et al. 2013). In contrast to peptide genes, evolutionary
relationships among related GPCR families can be more accurately assessed by phylogenetic analysis. GPCR transmembrane domains are reasonably well conserved across
vertebrate and invertebrate species, and the amino acid sequences are long enough to generate relatively reliable phylogenetic trees. Thus, exploration of relationships among
related GPCRs can help us to trace the evolution of the cognate neuropeptides (Kim et al. 2014; Roch et al. 2014).
In addition to phylogenetic analysis, mapping related genes
onto reconstructed pre-2R vertebrate ancestral chromosomes (VACs) can be used to further explore the relationships
among related gene families (Yegorov and Good 2012; Hwang
et al. 2013). This method is based on the fact that paralogous
genes that emerged through local duplication of an ancestral
gene prior to and during vertebrate emergence tend to be
located in the same vicinity on VAC linkage groups (Kim et al.
2014). Pre-2R VACs and post-2R gnathostome chromosomes
(GACs) are mapped by building linkage groups through
whole-genome comparison of paralogous regions in multiple
vertebrate taxa (Nakatani et al. 2007; Putnam et al. 2008). The
Nakatani model shows that linkage groups representing a
single GAC can be mapped to multiple regions of multiple
chromosomes in extant vertebrates, as chromosomal shuffling has fractured ancestral GACs into linkage groups and
dispersed them around the genomes of evolving vertebrates
2804
(Nakatani et al. 2007). Therefore, contiguous reconstructed
chromosomes show that phylogenetically related genes are
indeed present on the same GACs/VACs despite being located
on different chromosomes (Hwang et al. 2013; Kim et al. 2014;
Sefideh et al. 2014).
Previous detailed analyses of the evolution of individual
neuropeptide-GPCR families in vertebrates have predicted
pre-2R progenitor genes (Cerda-Reverter and Larhammar
2000; Lagerstrom et al. 2005; Dores 2013; Osugi et al. 2014).
However, few articles have analyzed the evolution of neuropeptide-GPCR families from a more ancient and holistic perspective than the putative vertebrate ancestor (Hwang et al.
2013; Mirabeau and Joly 2013; Yun et al. 2014). In this study,
the development of vertebrate rhodopsin-like neuropeptide
GPCRs was analyzed through comparison with nonvertebrate
chordate and protostomian GPCRs. Phylogenetic analysis
combined with extensive synteny comparison revealed that
rhodopsin-like neuropeptide GPCRs can be divided into five
clades and that members in each clade tend to be located on
the same one or two putative VACs, with their cognate neuropeptides following a parallel pattern. This study suggests
that local duplication prior to the emergence of vertebrates
produced the current diversity of neuropeptide and GPCR
superfamilies.
Results
Data Retrieval and Phylogeny of the Rhodopsin-Like
Neuropeptide GPCR Superfamily
The amino acid sequences of rhodopsin-like neuropeptide
GPCRs and cognate neuropeptides from representative taxa
were obtained using the ENSEMBL and NCBI genome browsers. Representative taxa consisted of tetrapods (human,
mouse, chicken, anole lizard, and Xenopus), a lobe-finned
fish (sarcopterygian coelacanth), ray-finned fish (basal actinopterygian spotted gar, zebrafish, medaka, stickleback, and
tetraodon), a cartilaginous fish (elephant shark Callorhinchus
milii), a vertebrate of the Agnatha clade (sea lamprey
Petromyzon
marinus),
early
chordates
(lancelet
Branchiostoma floridae, and sea squirt Ciona intestinalis),
and protostomes (Caenorhabditis elegans and Drosophila melanogaster). Using phylogenetic and syntenic analyses, orthologous and paralogous relationships among the GPCR genes
(supplementary tables S1–S5 and S11, Supplementary
Material online) and their corresponding ligands (supplementary tables S6–S10, Supplementary Material online) were
identified. Table 1 displays the size of extant gene families
in humans, coelacanth, spotted gar, and elephant shark. It
reveals that duplications prior to 2R WGD have been followed
by widespread gene loss, and occasional additional 2R duplications, throughout subsequent vertebrate evolution.
To further elucidate the evolutionary relationships within
this GPCR superfamily, representative vertebrate GPCRs from
human (if the human genes were absent, representative
orthologs from coelacanth, spotted gar, or elephant shark
were used), representative GPCRs from the nonvertebrate
chordates Ci. intestinalis and B. floridae, and representative
GPCRs from the protostomes C. elegans and D. melanogaster
MBE
Local Gene Duplication of GPCR and Cognate Neuropeptide Superfamilies . doi:10.1093/molbev/msv179
Table 1. Paralog Numbers of Neuropeptide GPCRs and Their Cognate Neuropeptides in Vertebrates.
Clade
GPCR Family
Number of Paralogs
Peptide Family
1
NPYR
PRLHR
PROKR
TACR
NPFFR
HCRTR
CCKR
QRFPR
GPR83
Pre-2R
3
3
1
2
1
1
2
2
1 or 2a
ES
6
4
1
4
1
1
4
0
1(1)
SG
7
5
1
3
3
2
2
3
2
Co
6
4
2
3
3
2
2
3
2(1)
Hu
4
1
1(1)
3
2
2
2
1
1
Total
7
5
2(1)
5
3
2
4
4
2(1)
2
OPR
MCR
NPBWR
SSTR
UTS2R
MCHR
GALR
1
4
1
3 or 2a
3
4
2
4
5
2
5
1
2
3
4
5
1
6
4
4
4
4
3
2
6
4
5
5
4
5
2
5
1
2
3
4
5
2
7
5
7
5
KiSSR
1 or 2a
3
4
4
1
4
3
EDNR
NMBR/GRPR/BRS3
GPR37
1
1
1
3
2
2
3
3
2
3
3
2
2
3
2
4
TRHR
NTSR
GHSR/MLNR
NMUR
NMUR4
GPR39
GPR139/142
1
1
1
1
1
1
1
3
1
3(1)
1
0
2
2
3
1
2
2
1
1
2
3
2
3
3
1
2
2
5
OXT/AVPR
NPSR
GnRHR
GPR19
GPR150
3
1
3
1
1
4
0
2
1
1
5
0
2
1
1
5
0
4
1
2
Number of Paralogs
NPY
PRLH
PROK
TAC
NPFF/VF
HCRT
CCK/GAST
QRFP
Pre-2R
1
1
1
1
1
1
1
1
ES
SG
2(1)
2
2
2
2
2
2
2
1
2
1
2
2
2
0
1
Unknown
Co
2(1)
2
3
2
2
1
2
2
Hu
2(1)
1
2
3
2
1
2
1
Total
2(1)
2
3
3
2
2
2
2
OP/POMC“
1
3(1)
3(1)
3(1)
3(1)
3(1)
NPBW
SST/CORT
UTS/URP
PMCH
SPX
GAL
KISS
1
1
1 or 2a
1
1
1
1
2
0
0
1
2
2
3
1
3(3)
3(1)
1
2
2
3
2
3(1)
3(1)
1
2
2
1
2
2
2
1
1
2
1
2
3(3)
3(1)
2
2
2
3
3
3
2
EDN
NMB/GRP
1
1
4
4
2
2
Unknown
3
2
3
2
4
2
1
2
2
2
0
1
2
3
2
3(1)
3
1
2
2
TRH
NTS
GHRL/MLN
NMU/S
1
1
1
1
1(1)
1
1
1
1
2
1
1
Unknown
1
2
2
1
1
1
2
2
1(1)
2
2
2
4
1
1
1
1
7
1
6
1
2
OXT/AVP
NPS
GnRH
2
1
1
2
2(1)
0
0
2
2
Unknown
2
0
3
2
1
2
2(1)
1
3
NOTE.—Gene families are grouped into their respective clades. The number of genes in each family is displayed for vertebrate representatives including the pre-2R ancestor; ES,
elephant shark; SG, spotted gar; Co, coelacanth; Hu, humans. The total number of extant genes in these species is shown in the “Total” column, in which all known 2R-generated
GPCRs can be found. The total is often higher than any individual species as different GPCR orthologs have been conserved in different lineages. Additional, post-2R, duplications
in the representative species are indicated in parentheses. The absence of elephant shark, and to a lesser extent coelacanth, genes may be due to incomplete genome mapping.
a
Lack of clarity in the quantity of pre-2R progenitors.
were subjected to phylogenetic analysis. Rhodopsin-like neuropeptide GPCRs can be classified into five clades containing
mixed protostomian and deuterostomian (vertebrate and
nonvertebrate chordate) sequences (fig. 1), indicating that
these clades, and many families within these clades, originated
before the divergence of protostomes and deuterostomes. It
should be noted that the bootstrap values of the more ancient branches in the phylogenetic trees derived from figure 1
are low, likely because the time of divergence of these ancient
GPCR families far predate the divergence of protostomes and
deuterostomes, potentially over a billion years ago
(Nordstrom et al. 2011). The bootstrap values of branches
that represent relatively more recent divergence tend to be
much higher as these gene families have had less time to
differentiate. Despite the low bootstrap values of ancient
branches, phylogenetic trees that show similar clades were
consistently obtained, regardless of the use of complete GPCR
sequences or partial sequences consisting of only the transmembrane regions or the use of a variety of alignment options. The phylogenetic tree was further confirmed by
Bayesian phylogenetic analysis. General topology was similar
between the maximum likelihood and Bayesian trees except
for the GPR139/142 family, which is located out of the clades
in the Bayesian tree (supplementary fig. S1, Supplementary
Material online).
Syntenic Analysis of Clade 1 GPCR and Neuropeptide
Genes
Clade 1 GPCRs include the neuropeptide Y (NPYR), prolactinreleasing peptide (PRLHR), orphan G-protein-coupled receptor 83 (GPR83), prokineticin (PROKR), tachykinin (TACR),
2805
Yun et al. . doi:10.1093/molbev/msv179
MBE
FIG. 1. Maximum-likelihood phylogenetic tree of the rhodopsin-like neuropeptide GPCR superfamily. Representative vertebrate GPCRs from human,
representative nonvertebrate chordate GPCRs from Ciona intestinalis and Branchiostoma floridae, and representative protostome GPCRs from
Caenorhabditis elegans and Drosophila melanogaster were subjected to phylogenetic analysis. Coelacanth, spotted gar, or elephant shark GPCRs,
respectively, were used if the genes are absent in human. Colored regions highlight distinct GPCR groups according to taxon grouping, which is
also indicated as an abbreviation in front of the GPCR names (v, vertebrate; c, chordate; p, protostome). Yellow represents vertebrate species, green
represents nonvertebrate chordate species, orange represents protostome species, and purple represents orthologs only found in elephant shark. The
symbols () represent the branches of the five individual clades. The number of paralogs in each given family is indicated in parentheses.
neuropeptide FF (NPFFR), hypocretin (HCRTR), cholecystokinin (CCKR), and pyroglutamylated RFamide peptide (QRFPR)
families. In the previous phylogenetic tree by Mirabeau and
Joly (2013), CCKRs and OxR(HCRTRs) are on independent
branches from clade 1 whereas they are within the branch of
clade 1 in the present tree. The tree by Fredriksson et al.
(2003) also shows the clustering of these GPCRs in the
clade 1 branch. Individual GPCR and neighboring genes in
this clade were subjected to syntenic analysis to compare
their locations in human, chicken, and spotted gar genomes
(fig. 2). To explore the evolutionary relationships among the
vertebrate clade 1 GPCR genes, we examined the location of
each gene on reconstructed GACs, as proposed by Nakatani
et al. (2007). The Nakatani model suggests the presence of ten
pre-2R VACs, defined as A-J. These VACs underwent 2R WGD
to generate approximately 40 post-2R GACs A0-J1. Most clade
1 GPCRs are located on chromosomal regions with reliable
synteny that correspond to GAC_Cs. For instance, NPY4R,
TACR2, NPFFR1, QRFPR3, and PRLHR1 are located in the
44–120 Mb region of human chromosome 10 that corresponds to GAC_C0 (fig. 2A). CCKAR, NPFFR2, TACR3,
QRFPR1, NPY2R, NPY1R, and NPY5R are located in the
2806
72–169 Mb region of human chromosome 4 that is part of
GAC_C1 (fig. 2B). NPY5R and NPY7R are in human chromosome 5 on the GAC_C2 block (fig. 2C). The GAC_C3 block,
containing PROKR1, QRFPR4, TACR1, QRFPR2, PRLHR3,
NPFFR3 and NPY3R, is split between two human chromosomal regions (68–75 Mb region of chromosome 2 and 20–
41 Mb region of chromosome 8); however in chicken, this
synteny block is located on a single short region (0.2–2 Mb)
of chromosome 22 (fig. 2D), which indicates that this GAC_C3
linkage group is more conserved in chicken.
Although most clade 1 GPCR genes are located on
GAC_Cs, there are a few exceptions. The Nakatani model
places HCRTR1 and HCRTR2 along with the neighboring
COL genes on GAC_Bs (fig. 2E). The fact that paralogous
COL genes are found near clade 1 GPCR genes on GAC_C
and GAC_B suggests a translocation of this region from
VAC_C to VAC_B prior to 2R WGD. PRLHR4 and PRLHR5,
which appear to be ohnologs, are located on GAC_G2 and
GAC_G0, respectively (Fig 2E). The PRLHR4/5 progenitor gene
was also likely translocated to VAC_G prior to 2R WGD, as
other PRLHR paralogs are located on GAC_Cs. CCKBR and
PROKRs 2 and 3 are located on GAC_D1 and GAC_G1,
Local Gene Duplication of GPCR and Cognate Neuropeptide Superfamilies . doi:10.1093/molbev/msv179
MBE
FIG. 2. Synteny for chromosomal regions containing the clade 1 GPCR genes. Conserved synteny for clade 1 GPCR genes on GAC_C0 and GAC_F3 (A),
GAC_C1 (B), GAC_C2 (C), and GAC_C3 and GAC_F4 (D) linkage groups is presented. The GPCRs and their neighboring genes in human, chicken, and
spotted gar genome fragments are matched on GAC linkage groups. GPCR genes are presented as yellow boxes with blue names, whereas select
neighboring genes are presented in differently colored boxes with black names. Orthologous genes are represented by identically colored boxes
connected by blue lines between species, which indicate linkage group movement. Genes are listed sequentially in each species based on chromosome
number, which is indicated above each gene, and the location (megabase), which is indicated below each gene. Boxes with dashed outlines indicate the
absence of these genes in each species, and their putative locations were deduced by syntenic analysis. (E) Arrangement of human chromosomal regions
onto post-2R GAC linkage groups. Ohnologs are aligned in columns with identically colored boxes. The presence of post-2R local duplications in human,
chicken, or gar lineages is indicated with red arrows. Encapsulating dashed rectangles and red GAC annotation indicate linkage group translocation or
misannotation of the Nakatani model.
respectively, but their ohnologs and the paralogs of their
neighboring genes are located on GAC_Cs. Thus, CCKBR
and PROKRs 2 and 3 were translocated subsequent to 2R
WGD. Another translocation is found in the region containing GPR83. GPR83 1, 2, and 3 are located on GAC_F3 (the 94–
100 Mb region of human chromosome 11) and GAC_F4 (the
66–94 Mb region of human chromosome X), but ohnologs of
their neighboring genes (MAML, ARHGAP, and NR3C) are
located on GAC_C1, and GAC_C2. This result indicates that
between the first and second rounds of WGD a translocation
from one of the two GAC_Cs to a GAC_F occurred. The
second round of WGD then produced ohnologous regions
on GAC_C1 and GAC_C2, and two translocated GAC_F3 and
GAC_F4 regions; however, the possibility of misannotation of
the regions in the Nakatani model cannot be excluded.
To further corroborate the results shown in figure 2E,
which uses only a small number of GPCR and neighboring
genes, all protein-encoding genes found on chromosome 4
were compared against the protein-coding genes in the rest of
the human genome to identify genes with the highest sequence similarity (cut off < 1e-10; fig. 3). Chromosome 4
was chosen as the reference because the Nakatani model
describes this chromosome as being entirely GAC_C1, and
most clade 1 GPCR families are represented on this chromosome. The results show that chromosomes 2, 5, 8, 10, 11, and
X contain regions in which genes have the highest similarity to
the genes on chromosome 4. These regions are postulated to
contain GAC_C linkage groups, and in which the GPCR and
neighboring genes presented in figure 2E can be found. In
chromosome 5 these sequences are concentrated in the 140–
160 Mb region which indicates preservation of the linkage
group, but in other chromosomes the regions with high sequence similarity are less condensed which indicates intrachromosomal translocation and possible hybridization.
This whole-genome analysis supports the results of the
small-scale comparison shown in figure 2E.
Synteny analysis of the cognate neuropeptide genes for
clade 1 GPCR families was also performed to investigate
2807
Yun et al. . doi:10.1093/molbev/msv179
MBE
FIG. 3. Whole-genome analysis for chromosomal regions which frequently encode proteins with high sequence similarity to proteins encoded on
chromosome 4, in which GAC_C1 is found in its entirety. Regions with high sequence similarity to chromosome 4 on chromosomes 2, 5, 8, 10, 11, and X
are connected by colored lines which match their destination chromosomes (chromosomal representation based on the UCSC chromosome browser).
Protein matches that correspond to genes presented in figure 2E are color coded to match.
intrafamily relationships and for assignment to GAC linkage
groups (fig. 4). The HCRT, NPY, NPFF, TAC, and CCK family
genes and PRLH2 were allocated to GAC_Es. PRLH1, PROKs,
and QRFPs (fig. 4E) are located on GAC_F, GAC_Ds, and
GAC_As, respectively. The fact that PRLH1’s ohnolog and
the paralogs of neighboring genes are located on GAC_Es
indicates a misannotation in the Nakatani model and also
indicates that PRLH1 was present on GAC_E0 post-2R WGD.
The PROK and QRFP genes, however, were likely translocated
from VAC_E to VAC_D and VAC_A, respectively, prior to 2R
WGD.
Relationship between Syntenic and Phylogenetic
Analyses of Clade 1 GPCR and Neuropeptide Genes
Clade 1 GPCR progenitors were aligned on VACs (fig. 5) based
on their position in the phylogenetic tree derived from
figure 1. Peptide progenitors were placed on VACs in parallel
to their cognate GPCRs (fig. 5), as phylogenetic trees for neuropeptides are inherently unreliable. Two progenitors were
predicted for each of the QRFPR, TACR, and CCKR families,
and three pre-2R progenitors were predicted for the NPYR
2808
and PRLHR families. In contrast to the GPCR families, multiple
pre-2R progenitors could not be identified in the neuropeptide families.
Paralogous and ohnologous relationships among family
members can be deduced from figure 5. For instance,
NPYRs 1, 3, 4, and 6 are grouped in a single branch of the
phylogenetic tree, which splits and then splits again, and are
distributed on different GAC_Cs, that is, C1, C3, C0, and C2,
respectively. This pattern is iconic of ohnologs, which are
produced from a single progenitor gene through 2R WGD.
NPY5R from the neighboring branch is also located on
GAC_C1 but because it is phylogenetically more distant it
can be deduced that separate progenitor genes produced
the NPYR1/3/4/6 and NPY5R groups. These progenitor
genes can be postulated to have been generated through
local duplication of a common ancestral gene indicated by
the node at which these two branches separate. Further back,
the vertebrate (black) and protostome (orange) lineages join
indicating the point at which the deuterostome and protostome lineages diverged. NPY2R and NPY7R are grouped
more distantly in the phylogenetic tree and distributed on
GAC_C1 and GAC_C2, respectively. Thus, NPY2R and NPY7R
Local Gene Duplication of GPCR and Cognate Neuropeptide Superfamilies . doi:10.1093/molbev/msv179
MBE
FIG. 4. Synteny for chromosomal regions containing the clade 1 peptide genes. This figure was generated as in figure 2 with some alterations and
additions, as described below. Conserved genomic synteny of clade 1 peptide genes is presented on GAC_E0 (A), GAC_E1 (B), GAC_E2 (C), and GAC_E3
(D) linkage groups. In some cases genes are located on scaffolds and, thus, the last four digits of the respective scaffold numbers are shown (e.g.,
JH375467 is indicated as 5467) above the gene instead of the chromosome number. Additionally, question marks indicate that synteny analysis could
not place absent genes in putative locations. (E) Arrangement of human chromosomal regions onto post-2R GAC linkage groups follows the description
provided in figure 2.
are ohnologous to each other and originate from an NPY2/7R
progenitor. Therefore, it can be postulated that the vertebrate
NPYR family emerged from three separate pre-2R progenitor
genes, NPY1/3/4/6R, NPY5R, and NPY2/7R, which all emerged
through prior local duplication. The number of NPYR family
genes then expanded through 2R WGD, although many of
these genes have been subsequently lost. The pattern of gene
loss is specific to species. For instance, coelacanth retains
seven members whereas humans retain only four members
(table 1).
Phylogenetic analysis demonstrates that protostomes contain NPYR-like, TACR-like, and CCKR-like GPCRs (fig. 5). Thus,
these families are likely to constitute the most ancient GPCR
groups in this clade. Nonvertebrate chordates possess QRFPRlike, PRLHR-like, PROKR-like, NPFFR-like, and HCRTR-like GPCR
families, whereas GPR83-like genes were only identified in
2809
Yun et al. . doi:10.1093/molbev/msv179
MBE
FIG. 5. Maximum-likelihood phylogenetic tree and schematic diagram for the evolutionary relationships among clade 1 GPCRs with their corresponding
peptides aligned parallel. The phylogenetic tree contains coelacanth, spotted gar, or elephant shark GPCR genes, if the orthologs are absent in human.
Chordate and protostome putative paralogous genes of clade 1 GPCRs are included. Nonhuman GPCRs are indicated with prefixes (co, coelacanth; sg,
spotted gar; es, elephant shark; bf, Branchiostoma floridae; ci, Ciona intestinalis; ce, Caenorhabditis elegans; dm, Drosophila melanogaster). Closely related
invertebrate GPCRs, found in a single species, are condensed and the gene quantities are presented in parentheses. GPCRs are distinguished by different
colors according to their taxonomic group (black: vertebrate; green: nonvertebrate chordate; orange: protostome; purple: elephant shark-specific
orthologs). Black circles within the phylogenetic tree represent local duplications within gene families prior to 2R WGD. The GAC placement of
individual genes is presented under the phylogenetic tree, where ohnologous genes are grouped together. Pre-2R GPCR progenitor genes, represented
by black circles with vertical arrows, are placed on a green bar representing VAC_C. Peptide progenitors, represented as black circles, are aligned parallel
on a blue bar representing VAC_E. These VACs are considered to be the original chromosomes in which clade 1 emerged. Progenitor genes that were
translocated to new VACs prior to 2R WGD are marked with colored boxes and annotated with their respective VAC. Neuropeptide progenitors are
connected to their cognate GPCR progenitors by dashed lines, whereas solid lines represent multiple GPCR interactions. Putative neuropeptide
progenitors are included and annotated with red circles.
vertebrates. It could be postulated that these families were
generated in deuterostome/early chordate and early vertebrate development; however, as demonstrated in the NPYR
family, in which vertebrate and protostomian GPCRs but no
other chordate GPCRs could be found, gene families may
become extinct in various lineages. Further detailed information on the evolutionary history of each GPCR family is described in the supplementary information, Supplementary
Material online.
Syntenic and Phylogenetic Analysis of Clade 2 GPCR
and Neuropeptide Genes
Clade 2 GPCRs include the opioid (OPR), neuropeptide-B/W
(NPBWR), somatostatin (SSTR), melanin-concentrating hormone (MCHR), urotensin-2 (UTS2R), galanin/spexin (GALR),
kisspeptin (KISSR), and, nominally, melanocortin (MCR) families. Fredriksson et al. (2003) placed the clade 2 GPCRs in a
distinct branch of the gamma-group of rhodopsin-like GPCRs
because they did not find close structural similarity between
the clade 2 GPCRs and beta-group GPCRs. However, the present phylogenetic tree (fig. 1) is comprised both beta-group
and clade 2 GPCRs because the clade 2 family, like the betagroup, interacts with neuropeptide ligands that are produced
through a maturation process by prohormone convertases.
2810
The MCR family is phylogenetically distant from the clade 2
GPCR families and is classified as part of the -group of rhodopsin-like GPCRs (Fredriksson et al. 2003). The MCR family is
only included in this clade because the cognate MSH/ACTH
peptides are derived from the POMC gene that also produces
a peptide ligand for the clade 2 OPRs. Syntenic and phylogenetic analysis to place clade 2 GPCR and cognate neuropeptide genes onto GACs and their progenitors onto VACs was
conducted with the same methods and rationale as presented
for clade 1. Syntenic analysis and GAC placement of clade 2
GPCR and cognate peptide genes are found in supplementary
figures S2 and S3, Supplementary Material online. VAC placement of the clade 2 GPCR progenitors parallel to their positions on the clade 2 phylogenetic tree is presented in figure 6.
Clade 2 GPCRs are mostly split between VAC_B and
VAC_I, which contain five and eight pre-2R progenitor
genes, respectively. Coincidentally, the MCR family also has
four pre-2R progenitors on VAC_B (fig. 6). The translocation
that separated the MCHR and SSTR families between VAC_I
and VAC_B likely occurred subsequent to the divergence of
the vertebrate and nonvertebrate chordate lineages but prior
to 2R WGD, similarly to the MCHR6/7 translocation to
VAC_C. Because the KISSR family emerged prior to chordate
and vertebrate divergence, estimating the time of translocation from VAC_I to VAC_A is more difficult. Further syntenic
Local Gene Duplication of GPCR and Cognate Neuropeptide Superfamilies . doi:10.1093/molbev/msv179
MBE
FIG. 6. Phylogenetic and schematic diagram for the evolutionary relationships among clade 2 GPCRs with their corresponding peptides aligned parallel.
This diagram was generated as in figure 5. The original VAC location for this clade cannot be clearly delineated between VAC_I and VAC_B. Therefore,
the GPCRs have been placed on a single bar representing both VAC_B and VAC_I, colored blue and purple, respectively. Translocations from these VACs
prior to 2R WGD are colored differently. The peptides are placed on three VAC bars, as the identity of a single original VAC is difficult to ascertain;
VAC_B and VAC_I, VAC_F, and VAC_D are colored blue and pink, orange, and gray, respectively.
and phylogenetic analysis of the clade 2 GPCRs focused on
translocations that occurred between the first and second
rounds of WGD and ancestral gene family emergence is discussed in the supplementary information, Supplementary
Material online.
The neuropeptide genes in this clade were allocated to
VAC_B, VAC_I, VAC_F, and VAC_D (fig. 6). The presence of
the PDYN/ENK/NOC/OMC on VAC_B, while NPB/W is located
on VAC_I, raises the possibility that these progenitors
emerged prior to the translocation from VAC_I to VAC_B,
possibly being translocated with their cognate GPCR progenitors. Therefore, the cognate NPBWR1/2 and OPRL/K/M/D
progenitors, which appear to be vertebrate specific, may
have emerged and diversified prior to the VAC_I to VAC_B
translocation. The closely related URP and SST progenitors
likely emerged from a single ancestral gene on VAC_F. This is
consistent with a recent study demonstrating that somatostatin and urotensin peptides and receptors are derived from
a single ancestral ligand-receptor pair before 2R WGD
(Tostivint et al. 2014). Likewise, the SSTRs and UTS2Rs are
mainly located on VAC_I, suggesting that these GPCRs likely
belong to the same family. The remaining ligands PMCH1,
SPX, GAL, and KISS are all located on VAC_D. The presence
of half of the clade 2 neuropeptide progenitors on VAC_D,
the split of their neighboring genes VAMP1, 2, and 3 over both
GAC_F and GAC_D, and the split of AURKA/B/C over GAC_D,
GAC_B, and GAC_F implicates VAC_D as the original location
of the clade 2 peptides, as all related gene groups can be found
on VAC_D (supplementary fig. S3, Supplementary Material
online).
Syntenic and Phylogenetic Analysis of Clade 3 GPCR
and Neuropeptide Genes
Clade 3 GPCRs include the orphan GPR37, endothelin
(EDNR), and neuromedin-B (NMBR)/bombesin receptor subtype (BRS)/gastrin-releasing peptide (GRPR) receptor families.
Syntenic analysis and GAC placement of clade 3 GPCRs and
neuropeptides are found in supplementary figures S4 and S5,
Supplementary Material online, respectively. VAC placement
against the clade 3 phylogenetic tree is presented in figure 7A.
The NMBR/GRPR/BRS3 gene family and the paralogs of neighboring gene families are divided over GAC_F3, GAC_F4,
GAC_B2, and GAC_B3, indicating an intra-2R translocation
from GAC_F to GAC_B. The EDNR and neighboring gene
families are split over GAC_F3, GAC_F4, and GAC_C1, indicative of mislabeling of the GAC_C1 region in the Nakatani
model, and the GPR37 progenitor was likely translocated from
VAC_F to VAC_D prior to 2R WGD. The GPR37 and EDNR
progenitors appear to have arisen from a local duplication of
an ancestral gene prior to vertebrate and nonvertebrate chordate divergence. This GPR37/EDNR ancestral gene originated
from a duplication of an even older ancestral gene which also
produced the GRP/BRS/NMB progenitor which, barring gene
loss in the protostome lineage, occurred after protostome/
deuterostome divergence but shares ancestry with the protostomian drCCHA1R and ceNPR14 GPCRs (fig. 7A).
Syntenic analysis indicates that a translocation prior to 2R
WGD also separated the clade 2 peptide progenitors over two
VACs with EDN1/2/3/4 and GRP/NMB being located on
VAC_B and VAC_A, respectively (supplementary fig. S5,
2811
Yun et al. . doi:10.1093/molbev/msv179
MBE
FIG. 7. Phylogenetic and schematic diagram for the evolutionary relationships among clade 3 (A) and clade 4 (B) GPCRs with their corresponding
peptides aligned parallel. This diagram was generated as in figures 5 and 6. Clade 3 GPCR progenitors are located on VAC_D and VAC_F and colored
gray and orange, respectively. The clade 3 peptides are split between VAC_B and VAC_A and are colored blue and red, respectively. Clade 4 GPCR
progenitors are located on VAC_I, VAC_B, VAC_E, and VAC_F and colored purple, blue, sky-blue, and orange, respectively. Clade 4 cognate peptides are
located on VAC_D and colored gray. Syntenic analysis was not performed for the MMOGP/GPR139-like genes due to the fractured nature of the
elephant shark genomic database.
Supplementary Material online). The unknown progenitor
peptide for GPR37 has been tentatively placed on VAC_B as
the ligand of GPR37’s closest relative, EDN1/2/3/4, is found on
VAC_B.
Syntenic and Phylogenetic Analysis of Clade 4 GPCR
and Neuropeptide Genes
Clade 4 GPCRs include the orphan GPR139/142, thyrotropinreleasing hormone (TRHR), neurotensin (NTSR), orphan
GPR39, growth hormone secretagogue (GHSR)/motilin
(MLNR), and neuromedin-U (NMUR) families. Additionally,
the NMOGPR, coNMUR4 (which, despite the name, is not part
of the NMUR family), and elephant shark-specific CAPAR/
GPR139-like families are also included. Syntenic analysis and
GAC placement of clade 4 GPCRs and neuropeptides are described in supplementary figures S6 and S7, Supplementary
Material online. VAC placement of clade 4 genes against the
clade 4 phylogenetic tree is presented in figure 7B.
Mixed vertebrate-protostome branches indicate that the
NMUR, TRHR, and coNMUR4 families emerged prior to the
urbilatarian ancestor (fig. 7B). The absence of protostome
representatives for the GHSR/MLNR/GPR39/NTSR and
GPR139/142/NMOGPR families indicates gene loss in the protostome lineage, in addition to wide-spread nonvertebrate
chordate gene loss.
Clade 4 GPCRs are located on VAC_B, VAC_E, VAC_F, and
VAC_I. Due to the isolation of GPR139/142, the timing of the
translocation to VAC_I that separated it from the rest of clade
4 cannot be estimated. The GHSR/MLNR and NTSR/GPR39
lineages both contain nonvertebrate chordate representatives. Therefore, the time of translocation that separated
the two lineages over VAC_F (GHSR/MLNR), VAC_B (NTSR)
and VAC_E (GPR39) cannot be defined except that it likely
2812
occurred after protostome and deuterostome divergence.
Conversely, the duplication that created the GPR39 and
NTSR progenitors from an ancestral gene appears to be vertebrate specific, thus, the translocation that placed the GPR39
progenitor onto VAC_E likely occurred subsequent to the
divergence of vertebrate and nonvertebrate chordate lineages
but prior to 2R WGD (fig. 7B). The CAPAR/GPR139-like,
NMOGPR, and coNMUR4 families do not appear to have amniote representatives, which indicates loss of these entire gene
families in both nonvertebrate and amniote lineages.
In contrast to the GPCR family, the majority of the clade 4
cognate neuropeptides are located on VAC_D, and the NMU/
S progenitor is located on VAC_C (fig. 7B). The presence of
the related GRM genes on GAC_Ds and GAC_C2 and the
MGAT4 family genes, on GAC_Cs and GAC_D2, shows close
evolutionary history between these VAC_D and VAC_C linkage groups (supplementary fig. S7, Supplementary Material
online). Therefore, prior to 2R WGD, the NMU/S progenitor
gene, along with neighboring genes, was likely translocated
from VAC_D to VAC_C. As the cognate neuropeptides for the
GPCRs phylogenetically closest to GPR39 are located on
VAC_D, the cognate neuropeptide for GPR39 is also likely
located in a VAC_D linkage group, possibly sharing sequence
similarity with NTS1/2. As obestatin, a product of the GHRL
gene, was claimed not to be a ligand for GPR39 (Zhang et al.
2005; Chartrel et al. 2007), we did not make a connection
between GHRL and GPR39.
Syntenic and Phylogenetic Analysis of Clade 5 GPCR
and Neuropeptide Genes
Clade 5 GPCRs include the orphan GPR19, arginine vasopressin (AVP)/oxytocin (OTR), neuropeptide-S (NPSR), orphan
GPR150, and gonadotropin-releasing hormone (GnRHR)
Local Gene Duplication of GPCR and Cognate Neuropeptide Superfamilies . doi:10.1093/molbev/msv179
receptor families. Syntenic analysis and GAC placement of
clade 5 GPCR and peptide genes are provided in supplementary figures S8 and S9, Supplementary Material online. VAC
placement against the clade 5 phylogenetic tree is presented
in figure 8.
VAC placement showed that the genes in this clade are
split between VAC_D and VAC_A (fig. 8). The GnRHR, AVPR,
and NPSR families possess closely related protostomian
GPCRs, indicating that each of these lineages emerged in a
preurbilaterian ancestor. The lack of a protostomian GPR150
indicates either loss of this branch in protostomes or emergence from a common ancestor of the NPSR family during
deuterostome development, followed by rapid differentiation.
The translocation that separated the related GPR150 and
NPSR progenitors between VAC_D and VAC_A occurred subsequent to the generation of these two families. The GPR19
family, which is also present on VAC_D, is somewhat distant
from the rest of clade 5, indicating loss of this distinct branch
in the protostome lineage.
Synteny analysis revealed that all the discovered cognate
neuropeptides for clade 5 GPCRs are located on VAC_C
(fig. 8). The Nakatani model labels AVP, OXT, and GnRH2
with unknown synteny, but our analysis confirmed that this
linkage group is located on GAC_C1. The two orphan GPCR
progenitors in this clade are GPR19 and GPR150, and our
analysis supports the notion that the cognate neuropeptides
for each of these may also be found on VAC_C (supplementary fig. S9, Supplementary Material online).
Discussion
Although neuropeptides and their GPCRs are well documented in vertebrates, the evolutionary mechanisms for the
MBE
development and interfamily relationships of neuropeptides
and GPCRs remain unclear. In this study, we retrieved rhodopsin-like GPCR and cognate neuropeptide gene sequences
from representative vertebrate and invertebrate species and
analyzed their phylogenetic relationships and the locations of
progenitor genes on reconstructed VACs. The most important finding of this study is that the progenitors of phylogenetically related families tend to be located on the same VACs,
which strongly supports the notion that these gene families
emerged through sequential local duplications during early
metazoan evolution. Previous studies have shown proliferation within gene families by local duplication, such as the
duplications within the NPYR and SSTR families prior to 2R
WGD (Lagerstrom et al. 2005; Liu et al. 2010; Larhammar and
Bergqvist 2013; Tostivint et al. 2014). The current analysis
reveals a more ancient pattern of local duplications that
gave rise to new families. This pattern is most evident in
clade 1, in which 13 of the 16 GPCR family progenitors are
located on VAC_C and 7 of the 9 cognate neuropeptide progenitors are located on VAC_E. Additionally, the majority of
clade 4 and 5 peptide progenitors are also located on single
VACs. This observation reveals that the cognate neuropeptide
progenitor genes of phylogenetically related GPCR families
tend to be located on parallel VACs, strongly suggesting
that neuropeptide families also multiplied through local duplications by the same pattern as their cognate GPCRs.
In contrast to clade 1, the majority of GPCR progenitors in
clades 2, 3, and 5 are split between two VACs. The split of a
single clade between two, or more, chromosomes prior to
vertebrate emergence can occur by the same mechanisms by
which translocations subsequent to 2R WGD lead to the
splitting of single GACs over multiple locations in extant
FIG. 8. Phylogenetic and schematic diagram for the evolutionary relationships among clade 5 GPCRs with their corresponding peptides aligned parallel.
This diagram was generated as in figure 5. Clade 5 GPCRs are placed on VAC_D and VAC_A and colored gray and red, respectively. All known peptides
can be mapped to VAC_C, which is colored green.
2813
Yun et al. . doi:10.1093/molbev/msv179
genomes (Nakatani et al. 2007). The Nakatani model only
rebuilds putative VACs dated shortly before 2R WGD; therefore, the model does not account for prior translocations.
Phylogenetic and syntenic analysis can be used to predict
the time periods of some translocations such as the clade 2
VAC_I-VAC_B and clade 4 VAC_B-VAC_E translocations that
occurred subsequent to the divergence of the vertebrate and
nonvertebrate chordate lineages.
In addition, chromosomal translocations between the first
and second WGDs can account for the spread of ohnologous
genes, originally located on a single VAC, across multiple distinct GACs. Furthermore, distinct linkage groups that have
been translocated next to each other may undergo secondary
translocations which hybridize the two linkage groups. These
complications can lead to misannotation in the Nakatani
model. Confirmation of chromosomal translocation through
in-depth synteny of invertebrate genomes and detailed analysis of the veracity of the Nakatani model are both beyond the
scope of this article but may be addressed in the future.
This study also reveals a highly dynamic process of emergence and loss of GPCR gene families through metazoan evolution, demonstrated in the clade 4 coNMUR4, GPR139/142,
and NMOGP/GPR139-like families (independent branch in
Bayesian phylogenetic analysis) which proliferated, separately,
in the urbilatarian, protostome, and vertebrate ancestors, but
have been subsequently lost in nonvertebrate chordates and
suffered extensive gene loss in amniotes. Ancient genes, as a
result of local duplication followed by subsequent diversification during metazoan evolution gave rise to a variety of gene
families. Emerging gene families undergo intrafamily and interfamily competition which can result in the loss of a family
member, gene family, or entire class of gene (Krishnan et al.
2012; Assis and Bachtrog 2013). Therefore, in addition to lineage-specific duplications, gene loss in protostomes may have
generated what appear to be deuterostome-specific gene
families, and vice versa. This pattern of duplication, differentiation, and lineage-specific gene loss produced the progenitors for the current diversity of vertebrate gene families
(Nordstrom et al. 2009). More recently, 2R of WGD in the
vertebrate ancestor led to, on average, a 4-fold increase in
gene family members. Subsequent duplications, gene loss, and
neo-/subfunctionalization have given rise to the current
number and diversity of vertebrate gene family members.
The conventional process of GPCR family expansion is
based on gene duplication followed by gradual differentiation
and functionalization in which interaction is preserved within
established GPCR-neuropeptide gene family systems.
However, there are other ways of differentiation; one example
is the MCR-ACTH system found on VAC_B in clade 2. This
system does not share strong phylogenetic ancestry with
other neuropeptide GPCRs, despite its cognate neuropeptides
being located in the same gene, POMC, as a neuropeptide for
the opioid GPCRs. The MCRs appear to be more closely related to the -group of rhodopsin-like GPCRs (Fredriksson
et al. 2003). By chance, the MCR progenitor likely arose by
local duplication of a member of the MECA gene group and
came under interaction with the ACTH peptide prior to 2R
WGD (Haitina et al. 2007). A local duplication of the PNOC
2814
MBE
gene, likely subsequent to 2R WGD, placed the opioid neuropeptide into the same region as the ACTH gene, creating a
hybrid gene. Subsequently, intragene duplications of the MSH
subunit of ACTH, along with proliferation of the MCR family,
led to the creation of an entirely new system that has only
been found in vertebrates (Harris et al. 2014).
The rates of gene loss through various stages of vertebrate
evolution can be estimated based on the number of pre-2R
progenitors and extant GPCR ohnologs in various vertebrate
species (table 1). These data demonstrate the loss of many
ohnologous genes subsequent to 2R WGD. For instance, one
may postulate that the presence of three pre-2R NPYR paralogs may give rise to 12 ohnologs post-2R; however, humans
have only four members, coelacanth and spotted gar have
seven, and elephant shark has six. In general, loss of ohnologous genes in the human lineage throughout GPCR and cognate neuropeptide families is greater than that of lineages
which have undergone lower rates of genomic change, such
as coelacanth (Amemiya et al. 2010), spotted gar (Amores
et al. 2011), and elephant shark (Venkatesh et al. 2014). When
tracing the evolutionary history of a gene family the more
retained paralogs that we find the better understanding we
acquire. Therefore, choosing specific representative species is
very important. Of mammalians, although humans are not
the most conserved the genome is manually annotated and
the sequences are more conserved than many other mammals (Burt et al. 1999). Of the tetrapods, avians, chicken in
particular, have some of the best preserved chromosomes
(Nishida et al. 2008). Teleost fish, that underwent a third
WGD, are exceedingly difficult to analyze due to the doubling
of linkage groups and large-scale gene loss. However, spotted
gar has recently been considered the best vertebrate representative as this species represents an ancestral pre-3R teleost
and has undergone fewer translocations than other vertebrates with mapped genomes (Amores et al. 2011). In the
future a fully mapped coelacanth genome could replace
chicken and human as the tetrapod representative, and
may be even more genomically conserved than gar.
Additionally, the elephant shark, which is both a cartilaginous
fish and the slowest evolving vertebrate known (Venkatesh
et al. 2014), is ideal for vertebrate genomic evolution research.
Additionally, lamprey could provide an interesting perspective on the early vertebrate genome because it may have
diverged from the jawed vertebrate lineage shortly after 2R
WGD (Caputo Barucchi et al. 2013), although there is uncertainty about the number of WGDs that it has undergone
(Mehta et al. 2013). Finally, although B. floridae is a useful
basal chordate representative (Elphick and Mirabeau 2014)
an even slower evolving basal chordate, Asymmetron lucayanum, exists (Yue et al. 2014). Unfortunately, the genomes of
coelacanth, elephant shark, lamprey, and A. lucayanum
remain unmapped, their chromosomes broken into many
separate scaffolds. Arrangement of these scaffolds into full
chromosomes would provide a great benefit to many areas
of genomic research.
An advantage of this study is that it provides clues to
identify novel neuropeptide genes for orphan GPCRs. In our
analysis, we included orphan GPCRs that are phylogenetically
Local Gene Duplication of GPCR and Cognate Neuropeptide Superfamilies . doi:10.1093/molbev/msv179
close to established neuropeptide GPCRs. It was found that
many of these orphan GPCRs reside on VACs with their related neuropeptide GPCRs. By aligning the known cognate
neuropeptide genes against GPCR genes on VACs, the relative
positions of cognate neuropeptides for orphan GPCRs can be
postulated. While speculative, this determination can help to
target gene discovery both by identifying the likely linkage
group of the unidentified peptide genes and by showing the
sequences of the closest known relatives of the peptides. For
example, the ligand for GPR150 is currently unknown; however, based on our analysis, the ligand is most likely on a
VAC_C linkage group and likely has sequence similarity to
NPS. Using this method, spexin was correctly predicted to
be a natural ligand for galanin GPCRs (Kim et al. 2014), and
linkage group analysis of secretin peptide and GPCR family
genes (Hwang et al. 2013) was helpful for the identification of
a receptor for a novel GCRP peptide (Park et al. 2013). In
addition, once a new gene is identified, comparison of the
gene to its closest evolutionary relatives allows for faster analysis of its putative functions.
Taken together, this study suggests a provisional model for
the evolutionary history of rhodopsin-like neuropeptide
GPCRs. Multiple duplications of an ancestral rhodopsin-like
GPCR, which likely arose in early metazoan-fungal evolution
from a duplication of a cyclic-AMP receptor (Krishnan et al.
2012), during early metazoan evolution gave rise to at least
five progeny each of which, through further duplications,
acted as the progenitors of the five clades of rhodopsin-like
GPCRs found in extant vertebrates. Concurrently, chromosomal shuffling spread these initial five GPCR genes, and their
subsequent duplications, around the genomes of successive
vertebrate ancestors. Correspondingly, cognate neuropeptide
gene families have coevolved in a manner similar to that of
GPCRs. In the future, complete assembled genomes of invertebrate and early vertebrate species as well as reconstructed
vertebrate linkage groups with higher resolution will provide
better means for further delineation of vertebrate genome
evolution.
Materials and Methods
Data Retrieval
The amino acid sequences of the rhodopsin-like neuropeptide GPCRs and cognate peptides were downloaded from the
Ensembl Genome Browser (http://www.ensembl.org), the
NCBI GenBank database (http://www.ncbi.nlm.nih.gov/
Entrez/), or the B. floridae genome database (http://
genome.jgi-psf.org/Brafl1/Brafl1.home.html). Search tools provided by the Ensembl Genome Browser were used for the
initial identification of orthologous and paralogous genes
from vertebrate and invertebrate species. This data set was
used to identify the vertebrate and invertebrate members of
rhodopsin-like neuropeptide GPCRs that were not already
registered in these databases. The genome databases of
human, mouse, chicken, anole lizard, Xenopus tropicalis, coelacanth, zebrafish, medaka, fugu, stickleback, tetraodon, spotted gar, elephant shark, lamprey, Ci. intestinalis, B. floridae,
C. elegans, and Drosophila were manually searched with the
MBE
TBLASTN algorithm. Database GPCR prediction was reasonably reliable, requiring only occasional manual editing; however, neuropeptide prediction was less reliable due to low
sequence conservation. Thus, many recorded sequences
were manually revised for incorrect splice sites and stop/
start codons. To determine full-length open-reading frame
sequences, exons and splice junctions were defined manually
using both the Ensembl protein report service and an
HMMgene (v.1.1) program (http://www.cbs.dtu.dk/services/
HMMgene/) that was provided by the CBS Prediction
Service. Putative signal peptides were predicted using
SignalP 3.0 (http://www.cbs.dtu.dk/services/SignalP/). Duplicates were removed from this data set using phylogenetic
analysis. Orthology and paralogy of the genes were determined based on phylogeny and synteny analyses. Accession
numbers of the sequences are listed in supplementary tables
S1–S11, Supplementary Material online.
Phylogenetic Analysis of Rhodopsin-Like Neuropeptide
GPCRs
Complete GPCR amino acid sequences were aligned using
MUSCLE (multiple sequence comparison by log-expectation)
as implemented in MEGA 6.06 (Tamura et al. 2013).
Maximum-likelihood trees were constructed using the
Jones–Taylor–Thornton model with 100 bootstrap replicates
and partial deletion with 95% site coverage cut-off selected in
the gaps/missing data treatment. Numbers on the nodes of
the trees indicate the bootstrap replicates corresponding to
each branch. Bayesian analysis was conducted with the same
maximum-likelihood database with Bayesian Evolutionary
Analysis Sampling Tree (BEAST) (Drummond and Rambaut
2007). BEAUti v1.8.2 was used to create BEAST input file using
Blosum62 substitution, the Gamma site heterogeneity model,
a strict clock, and a Markov Chain Monte Carlo length of 106
Genome Synteny Analysis and Tracing the
Duplication History of Rhodopsin-Like Neuropeptide
and GPCR Families
Multispecies genome synteny analysis was performed by
comparing genome regions, which contain the rhodopsin
family neuropeptide and GPCR genes. The chromosomal locations of the orthologs and paralogs of neighboring genes
were obtained from the Ensembl Genome Browser to create
synteny blocks (Cunningham et al. 2015). According to the
method of Yegorov and Good (2012), chromosome fragments with reliable synteny were matched to reconstructed,
putative pre-2R VAC models created by Nakatani et al. (2007).
The Nakatani model provides putative GAC linkage groups
that are displayed alongside their location in individual
human chromosomes and also indicates the location of
human orthologs and ohnologs in medaka, chicken, and
mouse (Nakatani et al. 2007). Thus, we were able to compare
the chromosome fragments containing our genes of interest
against VAC linkage groups to resolve the positions of the
genes of interest at consecutive stages of vertebrate genome
evolution (pre-2R VACs and post-2R GACs). Other species
were analyzed by first matching the synteny blocks with
2815
Yun et al. . doi:10.1093/molbev/msv179
their human orthologs and then comparing against the
Nakatani model.
Whole-Genome Analysis for Chromosomal Regions
with High Sequence Similarity to Chromosome 4
The amino acid sequences of known proteins in human chromosome 4 were downloaded from the Ensembl biomart database (http://www.ensembl.org). Sequence similarity analysis
against all other proteins encoded in the human genome
was then performed using a custom BLASTP pipeline implemented in the Perl programming language. For each query
protein a target protein, which had the lowest matching e
value, was identified. Chromosomal regions with a match
density of less than 3 per 10 Mb were discarded. The syntenic
relationships between chromosomes were plotted using the
Circos software package (Krzywinski et al. 2009).
Supplementary Material
Supplementary information, figures S1–S9, and tables S1–S11
are available at Molecular Biology and Evolution online (http://
www.mbe.oxfordjournals.org/).
Acknowledgments
This work was supported by grants from the Brain Research
Program (2011-0019205) and Basic Science Research Program
(2013R1A1A2010481) of the National Research Foundation
of Korea funded by the Ministry of Education, Science, and
Technology.
References
Amemiya CT, Powers TP, Prohaska SJ, Grimwood J, Schmutz J, Dickson
M, Miyake T, Schoenborn MA, Myers RM, Ruddle FH, et al. 2010.
Complete HOX cluster characterization of the coelacanth provides
further evidence for slow evolution of its genome. Proc Natl Acad Sci
U S A. 107:3622-3627.
Amores A, Catchen J, Ferrara A, Fontenot Q, Postlethwait JH. 2011.
Genome evolution and meiotic maps by massively parallel DNA
sequencing: spotted gar, an outgroup for the teleost genome duplication. Genetics 188:799-808.
Assis R, Bachtrog D. 2013. Neofunctionalization of young duplicate
genes in Drosophila. Proc Natl Acad Sci U S A. 110:17409-17414.
Burt DW, Bruley C, Dunn IC, Jones CT, Ramage A, Law AS, Morrice DR,
Paton IR, Smith J, Windsor D, et al. 1999. The dynamics of chromosome evolution in birds and mammals. Nature 402:411-413.
Caputo Barucchi V, Giovannotti M, Nisi Cerioni P, Splendiani A. 2013.
Genome duplication in early vertebrates: insights from agnathan
cytogenetics. Cytogenet Genome Res. 141:80-89.
Cardoso JC, Vieira FA, Gomes AS, Power DM. 2010. The serendipitous
origin of chordate secretin peptide family members. BMC Evol Biol.
10:135.
Cerda-Reverter JM, Larhammar D. 2000. Neuropeptide Y family of peptides: structure, anatomical expression, function, and molecular evolution. Biochem Cell Biol. 78:371-392.
Chartrel N, Alvear-Perez R, Leprince J, Iturrioz X, Reaux-Le Goazigo A,
Audinot V, Chomarat P, Coge F, Nosjean O, Rodriguez M, et al. 2007.
Comment on “Obestatin, a peptide encoded by the ghrelin gene,
opposes ghrelin’s effects on food intake.” Science 315:766; author
reply 766.
Cho HJ, Acharjee S, Moon MJ, Oh DY, Vaudry H, Kwon HB, Seong JY.
2007. Molecular evolution of neuropeptide receptors with regard to
maintaining high affinity to their authentic ligands. Gen Comp
Endocrinol. 153:98-107.
2816
MBE
Civelli O, Saito Y, Wang Z, Nothacker HP, Reinscheid RK. 2006. Orphan
GPCRs and their ligands. Pharmacol Ther. 110:525-532.
Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, CarvalhoSilva D, Clapham P, Coates G, Fitzgerald S, et al. 2015. Ensembl 2015.
Nucleic Acids Res. 43:D662–D669.
Dehal P, Boore JL. 2005. Two rounds of whole genome duplication in the
ancestral vertebrate. PLoS Biol. 3:e314.
Dores RM. 2013. Observations on the evolution of the melanocortin
receptor gene family: distinctive features of the melanocortin-2 receptor. Front Neurosci. 7:28.
Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis
by sampling trees. BMC Evol Biol. 7:214.
Eipper BA, Stoffers DA, Mains RE. 1992. The biosynthesis of neuropeptides: peptide alpha-amidation. Annu Rev Neurosci. 15:57-85.
Elphick MR, Mirabeau O. 2014. The evolution and variety of RFamidetype neuropeptides: insights from deuterostomian invertebrates.
Front Endocrinol. 5:93.
Fredriksson R, Lagerstrom MC, Lundin LG, Schioth HB. 2003. The Gprotein-coupled receptors in the human genome form five main
families. Phylogenetic analysis, paralogon groups, and fingerprints.
Mol Pharmacol. 63:1256-1272.
Haitina T, Takahashi A, Holmen L, Enberg J, Schioth HB. 2007. Further
evidence for ancient role of ACTH peptides at melanocortin (MC)
receptors; pharmacology of dogfish and lamprey peptides at dogfish
MC receptors. Peptides 28:798-805.
Harris RM, Dijkstra PD, Hofmann HA. 2014. Complex structural and
regulatory evolution of the pro-opiomelanocortin gene family. Gen
Comp Endocrinol. 195:107-115.
Hauser F, Cazzamali G, Williamson M, Blenau W, Grimmelikhuijzen CJ.
2006. A review of neurohormone GPCRs present in the fruitfly
Drosophila melanogaster and the honey bee Apis mellifera. Prog
Neurobiol. 80:1-19.
Holland PW, Garcia-Fernandez J, Williams NA, Sidow A. 1994. Gene
duplications and the origins of vertebrate development. Dev
Suppl. 125-133.
Hwang JI, Moon MJ, Park S, Kim DK, Cho EB, Ha N, Son GH, Kim K,
Vaudry H, Seong JY. 2013. Expansion of secretin-like G protein-coupled receptors and their peptide ligands via local duplications before
and after two rounds of whole-genome duplication. Mol Biol Evol.
30:1119-1130.
Hwang JI, Yun S, Moon MJ, Park CR, Seong JY. 2014. Molecular
evolution of GPCRs: GLP1/GLP1 receptors. J Mol Endocrinol.
52:T15–T27.
Jiang H, Lkhagva A, Daubnerova I, Chae HS, Simo L, Jung SH, Yoon YK,
Lee NR, Seong JY, Zitnan D, et al. 2013. Natalisin, a tachykinin-like
signaling system, regulates sexual activity and fecundity in insects.
Proc Natl Acad Sci U S A. 110:E3526–E3534.
Kim DK, Yun S, Son GH, Hwang JI, Park CR, Kim JI, Kim K, Vaudry H,
Seong JY. 2014. Coevolution of the spexin/galanin/kisspeptin family:
spexin activates galanin receptor type II and III. Endocrinology
155:1864-1873.
Krishnan A, Almen MS, Fredriksson R, Schioth HB. 2012. The origin of
GPCRs: identification of mammalian like Rhodopsin, Adhesion,
Glutamate and Frizzled GPCRs in fungi. PLoS One 7:e29817.
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones
SJ, Marra MA. 2009. Circos: an information aesthetic for comparative
genomics. Genome Res. 19:1639-1645.
Lagerstrom MC, Fredriksson R, Bjarnadottir TK, Fridmanis D, Holmquist
T, Andersson J, Yan YL, Raudsepp T, Zoorob R, Kukkonen JP, et al.
2005. Origin of the prolactin-releasing hormone (PRLH) receptors:
evidence of coevolution between PRLH and a redundant neuropeptide Y receptor during vertebrate evolution. Genomics 85:688-703.
Larhammar D, Bergqvist CA. 2013. Ancient grandeur of the vertebrate
neuropeptide Y system shown by the coelacanth Latimeria chalumnae. Front Neurosci. 7:27.
Larhammar D, Salaneck E. 2004. Molecular evolution of NPY receptor
subtypes. Neuropeptides 38:141-151.
Lee YR, Tsunekawa K, Moon MJ, Um HN, Hwang JI, Osugi T, Otaki N,
Sunakawa Y, Kim K, Vaudry H, et al. 2009. Molecular evolution of
Local Gene Duplication of GPCR and Cognate Neuropeptide Superfamilies . doi:10.1093/molbev/msv179
multiple forms of kisspeptins and GPR54 receptors in vertebrates.
Endocrinology 150:2837-2846.
Lindemans M, Liu F, Janssen T, Husson SJ, Mertens I, Gade G, Schoofs L.
2009. Adipokinetic hormone signaling through the gonadotropinreleasing hormone receptor modulates egg-laying in Caenorhabditis
elegans. Proc Natl Acad Sci U S A. 106:1642-1647.
Liu Y, Lu D, Zhang Y, Li S, Liu X, Lin H. 2010. The evolution of somatostatin in vertebrates. Gene 463:21-28.
Lundin LG. 1993. Evolution of the vertebrate genome as reflected in
paralogous chromosomal regions in man and the house mouse.
Genomics 16:1-19.
Mehta TK, Ravi V, Yamasaki S, Lee AP, Lian MM, Tay BH, Tohari S, Yanai
S, Tay A, Brenner S, et al. 2013. Evidence for at least six Hox clusters
in the Japanese lamprey (Lethenteron japonicum). Proc Natl Acad Sci
U S A. 110:16044-16049.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific
genome duplication (FSGD). Bioessays 27:937-945.
Mirabeau O, Joly JS. 2013. Molecular evolution of peptidergic signaling systems in bilaterians. Proc Natl Acad Sci U S A. 110:E2028–
E2037.
Mirabeau O, Perlas E, Severini C, Audero E, Gascuel O, Possenti R, Birney
E, Rosenthal N, Gross C. 2007. Identification of novel peptide hormones in the human proteome by hidden Markov model screening.
Genome Res. 17:320-327.
Nakatani Y, Takeda H, Kohara Y, Morishita S. 2007. Reconstruction of
the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res. 17:1254-1265.
Nishida C, Ishijima J, Kosaka A, Tanabe H, Habermann FA, Griffin DK,
Matsuda Y. 2008. Characterization of chromosome structures of
Falconinae (Falconidae, Falconiformes, Aves) by chromosome painting and delineation of chromosome rearrangements during their
differentiation. Chromosome Res. 16:171-181.
Nordstrom KJ, Lagerstrom MC, Waller LM, Fredriksson R, Schioth HB.
2009. The Secretin GPCRs descended from the family of Adhesion
GPCRs. Mol Biol Evol. 26:71-84.
Nordstrom KJ, Sallman Almen M, Edstam MM, Fredriksson R, Schioth
HB. 2011. Independent HHsearch, Needleman–Wunsch-based, and
motif analyses reveal the overall hierarchy for most of the G proteincoupled receptor families. Mol Biol Evol. 28:2471-2480.
Oh DY, Kim K, Kwon HB, Seong JY. 2006. Cellular and molecular biology
of orphan G protein-coupled receptors. Int Rev Cytol. 252:163-218.
Ohno S. 1970. Evolution by gene duplication. Berlin (Germany):
Springer-Verlag.
MBE
Osugi T, Ubuka T, Tsutsui K. 2014. Review: evolution of GnIH and
related peptides structure and function in the chordates. Front
Neurosci. 8:255.
Park CR, Moon MJ, Park S, Kim DK, Cho EB, Millar RP, Hwang JI, Seong
JY. 2013. A novel glucagon-related peptide (GCRP) and its receptor
GCRPR account for coevolution of their family members in vertebrates. PLoS One 8:e65420.
Putnam NH, Butts T, Ferrier DE, Furlong RF, Hellsten U, Kawashima T,
Robinson-Rechavi M, Shoguchi E, Terry A, Yu JK, et al. 2008. The
amphioxus genome and the evolution of the chordate karyotype.
Nature 453:1064-1071.
Roch GJ, Tello JA, Sherwood NM. 2014. At the transition from invertebrates to vertebrates, a novel GnRH-like peptide emerges in amphioxus. Mol Biol Evol. 31:765-778.
Sefideh FA, Moon MJ, Yun S, Hong SI, Hwang JI, Seong JY. 2014. Local
duplication of gonadotropin-releasing hormone (GnRH) receptor
before two rounds of whole genome duplication and origin of the
mammalian GnRH receptor. PLoS One 9:e87901.
Steiner DF. 1998. The proprotein convertases. Curr Opin Chem Biol. 2:3139.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6:
Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol.
30:2725-2729.
Tostivint H, Ocampo Daza D, Bergqvist CA, Quan FB, Bougerol M,
Lihrmann I, Larhammar D. 2014. Molecular evolution of GPCRs:
somatostatin/urotensin II receptors. J Mol Endocrinol. 52:T61–T86.
Vaudry H, Seong JY. 2014. Neuropeptide GPCRs in neuroendocrinology.
Front Endocrinol (Lausanne). 5:41.
Venkatesh B, Lee AP, Ravi V, Maurya AK, Lian MM, Swann JB, Ohta Y,
Flajnik MF, Sutoh Y, Kasahara M, et al. 2014. Elephant shark genome
provides unique insights into gnathostome evolution. Nature
505:174-179.
Yegorov S, Good S. 2012. Using paleogenomics to study the evolution of
gene families: origin and duplication history of the relaxin family
hormones and their receptors. PLoS One 7:e32923.
Yue JX, Yu JK, Putnam NH, Holland LZ. 2014. The transcriptome of an
amphioxus, Asymmetron lucayanum, from the Bahamas: a window
into chordate evolution. Genome Biol Evol. 6:2681-2696.
Yun S, Kim DK, Furlong M, Hwang JI, Vaudry H, Seong JY. 2014. Does
kisspeptin belong to the proposed RF-amide peptide family? Front
Endocrinol (Lausanne) 5:134.
Zhang JV, Ren PG, Avsian-Kretchmer O, Luo CW, Rauch R, Klein C,
Hsueh AJ. 2005. Obestatin, a peptide encoded by the ghrelin gene,
opposes ghrelin’s effects on food intake. Science 310:996-999.
2817