Genomic V-gene repertoire in reptiles

bioRxiv preprint first posted online Feb. 12, 2014; doi: http://dx.doi.org/10.1101/002618. The copyright holder for this preprint (which was
not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
David N. Olivieri1, Bernardo von Haeften2, Christian Sánchez-Espinel3, Jose Faro2,4,5, Francisco Gambón-Deza4,6
1 School of Computer Science, University of Vigo, Ourense 32004, Spain.
2 Area of Immunology, Faculty of Biology, and Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain.
3 Nanoimmunotech SL, Pza. Fernando Conde, Montero Ríos 9, 3620, Vigo, Spain.
4 Instituto Biomédico de Vigo, 36310 Vigo, Spain.
5 Instituto Gulbenkian de Ciência, 2781-901 Oeiras, Portugal.
6 Servicio Gallego de Salud (SERGAS), Unidad de Inmunología, Hospital do Meixoeiro, 36210 Vigo, Spain.
[email protected], [email protected]
Abstract
Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary
lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one
gave rise to Squamata, while the other gave rise to Testudines, Crocodylia and birds. In this
study, we determined the genomic variable (V)-gene repertoire in reptiles corresponding to the
three main immunoglobulin (Ig) loci and the four main T-cell receptor (TCR) loci. We show
that Squamata lack the TCR γ/δ genes and snakes lack the Vκ genes. In representative species
of Testudines and crocodiles, the seven major Ig and TCR loci are maintained. As in mammals,
genes of the Ig loci can be grouped into well-defined clans through a multi-species phylogenetic
analysis. We show that the reptile VH and Vλ genes are distributed amongst the established
mammalian clans, while their Vκ genes are found within a single clan, nearly exclusive from
the mammalian sequences. The reptile and mammal V-genes of the TRA locus cluster into six
common evolutionary clans. In contrast, the reptile V-genes from the TRB locus cluster into three
clans, which have few mammalian members. In this locus, the V-gene sequences from mammals
appear to have undergone different evolutionary diversification processes that occurred outside
these shared reptile clans.
Keywords: Immunologic Repertoire, Reptile Evolution
1. Introduction
The order Reptilia is believed to have evolved from amphibian-like creatures, the labyrinthodonts,
which date back to the Carboniferous period some 312 million years ago (Mya) (Laurin & Reisz,
1995; Laurin & Girondot, 1999). The descendants of those precursors became progressively more
terrestrial and independent of their aquatic origins, venturing out on a larger expanse of land with
many new habitats (Benton & Donoghue, 2007; deBraga & Rieppel, 1997; van Tuinen
& Hadly, 2004; Reisz & Müller, 2004). Evolving from this primordial amphibious ancestor, a
lineage began that would later split into two separate lines, the Sauropsids and the Synapsids, the
latter of which would eventually give rise to the line of extant mammals (Benton, 2005; Hedges &
Kumar, 2009). Shortly after this separation of the Sauropsids and Synapsids, the Sauropsids further
divided into two separate lineages. One gave rise to the present-day Squamata (lizards and snakes)
and Tuatara, while the other gave rise very early to Testudines, later to Crocodylia, and
finally to birds (Benton, 2005; Hedges & Kumar, 2009).
The first jawed vertebrates developed an adaptive immune system consisting of B and T lymphocytes together with their associated antigen receptors, immunoglobulins (Ig) and T-cell receptors (TCR), respectively. This adaptive immune system evolved in parallel within Reptilia and the
sister group of mammals from their common amphibian ancestor (Janes et al., 2010).
Although Igs and TCRs share a basic modular structure (Charlemagne et al., 1998; Schroeder
& Cavacini, 2010; Narciso et al., 2011), Igs are made up of four polypeptide chains (two identical larger or heavy (H) chains and two identical smaller or light (L) chains), while TCRs are
heterodimers (Schroeder & Cavacini, 2010). The H chains of Igs consist typically of four or more
globular domains, with the N-terminal domain varying amongst the Igs of different B-cell clones
(termed variable (V) domain), and the remaining domains being shared by many B-cell clones
(termed constant (C) domains). In contrast, L chains of Igs and TCR polypeptide chains consist of
two domains, an N-terminal V domain, and a C domain. In addition, there are two different kinds
of TCRs, one composed of α/β heterodimers and the other of γ/δ heterodimers. In both Igs and
TCRs, only the V domains interact with antigen.
Each chain of the Igs and TCRs is encoded in a different locus. Jawed vertebrates have three
main loci corresponding to Ig genes (IGH, coding for H chains, and IGK and IGL, coding for κ
and λ L chains, respectively), and four main loci associated with the TCR genes (TRA, TRB, TRG
and TRD, coding for TCR α, β, γ and δ chains, respectively) (Cannon et al., 2004; Danilova &
Amemiya, 2009; Flajnik & Kasahara, 2010; Sun et al., 2012). In all these loci, multiple unique V
gene sequences (V-genes) exist that share a common homology. Each Ig and TCR V domain is
encoded by a unique combination of a V-gene, a D segment (only in Ig H and TCR β and δ chains)
and a J segment, the largest contribution to the V domain amino acid sequence being from the V-gene. Thus, identifying the V-gene repertoire within the genome of a species provides information
about its potential for mounting an adaptive immune response against antigen.
The Ig and TCR loci may have evolved independently, because they are located
on different chromosomes, or in widely separated regions within the same chromosome. The one exception to this general trend is the TRD locus, which is embedded within the TRA locus. This
independent evolution hypothesis is supported by the formation of V-gene clusters in phylogenetic
trees constructed from these sequences (Olivieri et al., 2013). If true, this explanation would imply
the existence of an ancestral gene (or genes) that define each of the extant Ig and TCR V-genes.
In reptiles, published studies of the Ig genes, together with the functionality of their products,
are scarce (Gambón-Deza et al., 2012; Magadán-Mompó et al., 2013; Magadán-Mompó et al.,
2013). At present, there are no studies of V-genes in the TCR loci of reptiles. In Squamata,
all the species studied so far have IgM, an IgD consisting of eleven domains, and IgY (Sun et al.,
2011; Gambón-Deza et al., 2012). In Anole, the Vκ and Vλ chains have been described and
the absence of the IGK locus in snakes (Wei et al., 2009; Gambón-Deza et al., 2009) has been
confirmed. Testudines have a more complex IGH locus structure (Li et al., 2012; Xu et al., 2009;
Magadán-Mompó et al., 2013), having one IgM gene, one IgD gene encoding eleven constant
domains, and multiple genes for IgY and IgD2. Previous studies of the IGH locus in crocodiles
have described three genes for IgM, one gene for IgD encoding seven viable constant domains,
three IgA genes, three IgY genes that are similar to IgY in birds, and two IgD2 genes, which
encode chimeric Igs with IgA and IgD related constant domains (Magadán-Mompó et al., 2013).
These studies have also found that the crocodile IgA heavy chain is similar to the IgX of
amphibians and to the IgA of birds.
In contrast to the growing literature related to Igs in reptiles, there are fewer studies of TCRs in
these species. In this article, we uncover the genomic V-gene repertoire of both immunologic receptors in species representative of the three main orders of extant Reptilia: Squamata, Testudines,
and Crocodylia, and show relations that provide a deeper understanding of the evolution of this
large vertebrate lineage (Modesto & Anderson, 2004; Benton, 2005).
2. Methods
For these studies, we used genome data from whole genome shotgun (WGS) assemblies that
are publicly available at the NCBI and provided in FASTA format in the form of contigs. Until now, complete V-gene annotations have only been available in two species (i.e., human and
mouse), while incomplete annotations have existed in a few species, and even in these cases, only
within certain loci. We used our VgenExtractor software tool (Olivieri et al., 2013) to obtain the
V-gene sequences from genome files. Our software algorithm searches and extracts the genome
sequences of functional V-genes based upon known motifs. In particular, the algorithm scans large
genome files and extracts candidate exon sequences that fulfill specific criteria: they are flanked by
recombination signal sequences, they have a valid reading frame without stop codons, they are
at least 280 bp long, and they contain two canonical cysteines and a tryptophan at specific
positions.
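As a rough illustration of the criteria listed above, the following Python sketch checks a candidate exon for minimum length, an uninterrupted reading frame, and conserved residues. It is not the VgenExtractor implementation: the function names, the spacing windows between the conserved cysteines and the tryptophan, and the omission of the recombination signal sequence check are all assumptions made for illustration.

```python
# Hypothetical sketch of a V-exon candidate filter; NOT the VgenExtractor code.
# The residue spacing windows (trp_gap, cys_gap) are illustrative assumptions.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def is_candidate_v_exon(dna, min_len=280, trp_gap=(10, 17), cys_gap=(55, 75)):
    """Return True if the exon passes the length, frame and residue checks."""
    if len(dna) < min_len:
        return False
    protein = translate(dna)
    if "*" in protein:  # stop codon inside the reading frame
        return False
    for i, aa in enumerate(protein):
        if aa != "C":
            continue
        # Look for a tryptophan and a second cysteine downstream of this cysteine.
        has_trp = "W" in protein[i + trp_gap[0]: i + trp_gap[1] + 1]
        has_cys = "C" in protein[i + cys_gap[0]: i + cys_gap[1] + 1]
        if has_trp and has_cys:
            return True
    return False
```

A real implementation would also need to scan both strands and all reading frames and to locate the flanking recombination signal sequences, which this sketch leaves out.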
A fraction of the translated candidate exons produced by the VgenExtractor tool could pass
the initial filter purely by chance, yet have structures that are quite
different from a functional V-gene. Such sequences are easily discarded by subjecting all the candidate sequences to a BLASTP comparison against a V-gene consensus with a modest threshold
(e.g., E-value = 10^-15). This operation produces a set of exons possessing the structural requirements for being functional V-genes.
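The chance-hit filter can be sketched as a small parser over BLASTP results. This sketch assumes the search was run with the standard 12-column tabular output (`-outfmt 6`), in which the E-value is the eleventh column; the function name and the example identifiers are hypothetical.

```python
def passing_queries(blast_tab_lines, max_evalue=1e-15):
    """Return the set of query IDs with at least one hit at or below max_evalue.

    Each line is expected in BLAST tabular format: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    keep = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            keep.add(query_id)
    return keep


# Hypothetical example: one strong hit and one weak hit against a consensus.
example = [
    "cand1\tVcons\t45.0\t100\t55\t0\t1\t100\t1\t100\t1e-30\t120",
    "cand2\tVcons\t25.0\t40\t30\t0\t1\t40\t1\t40\t0.002\t30",
]
print(passing_queries(example))
```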
Once viable V-genes are obtained, we need to classify them into their respective loci. The
corresponding amino acid sequences were analyzed with the suite of tools provided by Galaxy
(Goecks et al., 2010). The analysis workflow of the amino acid sequences consists of a multiple
BLASTP alignment against a set of consensus sequences, each representing one of the seven main
Ig and TCR V-gene loci. Each candidate V-gene sequence is assigned to the locus having the
highest score with respect to the corresponding consensus.
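The assignment rule described above amounts to taking the locus whose consensus gives the highest score. A minimal sketch follows, assuming the per-locus BLASTP scores have already been collected into a dictionary; the function name and the example scores are hypothetical.

```python
LOCI = ("IGHV", "IGKV", "IGLV", "TRAV", "TRBV", "TRGV", "TRDV")


def assign_locus(scores):
    """Pick the locus with the highest score.

    scores: dict mapping a locus name to the BLASTP score of the candidate
    against that locus consensus. Returns None if no known locus was scored.
    """
    known = {locus: s for locus, s in scores.items() if locus in LOCI}
    if not known:
        return None
    return max(known, key=known.get)


# Hypothetical scores for one candidate sequence.
print(assign_locus({"IGHV": 80.5, "IGLV": 120.0, "TRAV": 60.2}))
```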
We developed another independent algorithm for classifying V-genes. In this method, given
a V-gene sequence, we determine to which locus this sequence most likely belongs by obtaining
a heuristic score calculated from the results of a BLASTP comparison against the NR protein
database. The score is a weighted combination of information from sequence similarity results
(or hits). In particular, the score is constructed from the following: (a) the relevance and content
words from the protein description, (b) the similarity score, and (c) the sequence coverage. A score
is calculated for each locus, and the V-gene sequence is assigned to the most likely locus. This
method has been validated against known annotations in human and mouse (Olivieri et al., 2013).
This technique is used to show the consistency of results obtained with the simpler workflow
described above and can be used to resolve ambiguous locus predictions of the simpler method,
which may arise in rare cases.
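One way such a heuristic score could be assembled is sketched below. The source states only that the score is a weighted combination of description words, similarity, and coverage; the weights, the keyword lists, the use of fractional identity as the similarity term, and all function names here are assumptions chosen for illustration.

```python
# Hypothetical weights and keywords; the source does not give the actual values.
WEIGHTS = {"keywords": 0.5, "similarity": 0.3, "coverage": 0.2}
KEYWORDS = {
    "IGHV": ("immunoglobulin", "heavy"),
    "IGKV": ("immunoglobulin", "kappa"),
    "IGLV": ("immunoglobulin", "lambda"),
    "TRAV": ("t-cell receptor", "alpha"),
    "TRBV": ("t-cell receptor", "beta"),
    "TRGV": ("t-cell receptor", "gamma"),
    "TRDV": ("t-cell receptor", "delta"),
}


def hit_score(description, identity, coverage, locus):
    """Score one database hit for one locus; identity and coverage are in [0, 1]."""
    text = description.lower()
    words = KEYWORDS[locus]
    keyword_fraction = sum(w in text for w in words) / len(words)
    return (WEIGHTS["keywords"] * keyword_fraction
            + WEIGHTS["similarity"] * identity
            + WEIGHTS["coverage"] * coverage)


def classify(hits):
    """hits: iterable of (description, identity, coverage) tuples.

    Returns (best_locus, totals), where totals maps each locus to its summed score.
    """
    totals = {locus: 0.0 for locus in KEYWORDS}
    for description, identity, coverage in hits:
        for locus in KEYWORDS:
            totals[locus] += hit_score(description, identity, coverage, locus)
    return max(totals, key=totals.get), totals
```

Because every locus receives the same similarity and coverage terms for a given hit, the description keywords are what separate the loci in this sketch; a different weighting scheme could let hit quality modulate the keyword evidence instead.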
To compare the repertoire of V-genes in extant species, we carried out single and multi-species
phylogenetic analysis. The sequences for this study are available at a new public database repository, vgenerepertoire.org (Olivieri & Gambón-Deza, 2014). At present, approximately 20,000
V-gene sequences from 82 mammals and 12 reptiles are indexed. For both the single and multi-species phylogenetic analysis, sequences were aligned using Clustal Omega (Sievers & Higgins,
2014) and trees were constructed with FastTree (Price et al., 2010) (using maximum likelihood
and WAG matrix). We used MEGA5 (Tamura et al., 2011) for visualization.
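The alignment and tree-building steps can be sketched as two command-line calls driven from Python. This assumes that the `clustalo` and `FastTree` executables are installed and on the PATH; the file names and function names are hypothetical, and the exact options used in this study are not stated beyond maximum likelihood with the WAG matrix.

```python
import subprocess


def build_commands(fasta_in, aln_out):
    """Return the Clustal Omega and FastTree command lines as argument lists."""
    # Align protein sequences and write the alignment in FASTA format.
    align_cmd = ["clustalo", "-i", fasta_in, "-o", aln_out,
                 "--outfmt=fasta", "--force"]
    # Build an approximate maximum-likelihood tree with the WAG model.
    tree_cmd = ["FastTree", "-wag", aln_out]
    return align_cmd, tree_cmd


def run_pipeline(fasta_in, aln_out, tree_out):
    """Run alignment, then tree construction; FastTree writes Newick to stdout."""
    align_cmd, tree_cmd = build_commands(fasta_in, aln_out)
    subprocess.run(align_cmd, check=True)
    with open(tree_out, "w") as handle:
        subprocess.run(tree_cmd, check=True, stdout=handle)


# Example call (hypothetical file names):
# run_pipeline("vgenes.faa", "vgenes.aln.fasta", "vgenes.nwk")
```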
In Crocodylia, we used the NCBI accession numbers for the following species: American
alligator (Alligator mississippiensis; AKHW000000001) and Chinese alligator (Alligator sinensis;
AVPB0000000001), while for the following species we used assemblies found at www.crocgenomes.org:
saltwater crocodile (Crocodylus porosus; croc sub2.assembly) and gharial (Gavialis gangeticus; PRJNA172383). In Testudines, we used the NCBI accession numbers for the following
species: spiny softshell turtle (Apalone spinifera; APJP00000000.1), green sea turtle (Chelonia
mydas; AJIM00000000.1), Chinese soft-shelled turtle (Pelodiscus sinensis; AGCU00000000.1), and
painted turtle (Chrysemys picta bellii; AHGY00000000.1). In Squamata, we used the NCBI accession numbers for the following species: green anole (Anolis carolinensis; AAWZ00000000.2),
Burmese python (Python bivittatus; AEQU000000000.2), and king cobra (Ophiophagus hannah;
AZIM00000000.1), together with data from the Boa Assemblathon 2 (Bradnam et al., 2013) (Boa constrictor; scaffold 6C file).
3. Results
From the WGS data obtained from the NCBI repositories for twelve reptile species, we used
the VgenExtractor tool to obtain functional V-genes and classified them with respect to the locus
to which they belong. Table 1 presents a summary of all the functional V-genes encountered in
the analysis. For many of these species, this is the first time that Ig and TCR V-genes have been
described, and these results complement the sparse data presently available for these genes.
3.1. Squamata
Within the Squamata order, we studied the genomes of one Iguanidae (i.e., Anolis carolinensis), and three snakes (i.e., Python bivittatus, Ophiophagus hannah and Boa constrictor). The
phylogenetic trees of the V-genes for these species are shown in Figure 1.
Within the antibody V genes of the Anole, we did not detect the Vκ genes, contradicting
previous studies (Alfoldi, 2011; Wei et al., 2009). A further analysis showed that we did not
detect these Vκ sequences because they lack tryptophan at position 41 (Figure 2). This result
is surprising, since the presence and position of this amino acid is a general characteristic of all
functional V domains described so far, implying a structural alteration in this chain. The snake
species, which arose later in the evolution of this lineage, no longer possess the κ chain genes.
A possible explanation for this loss in snakes is this structural modification of the κ
[Figure 1 image (Squamata): four phylogenetic tree panels (Anolis carolinensis, Python bivittatus, Boa constrictor, Ophiophagus hannah) with node support values; legend: IGHV, IGKV, IGLV, TRAV, TRBV, TRDV, TRGV.]
Figure 1: V-gene repertoire in Squamata. Trees were constructed with the amino acid sequences obtained from the
VgenExtractor tool. These sequences were aligned with Clustal Omega. The phylogenetic trees were constructed with a
maximum likelihood algorithm using the WAG matrix, and 500 bootstrap replicates were performed for validation. Rooting was performed at the midpoint, and the linearization provided by MEGA5 was applied to improve the visualization
of the trees. For each sequence, two independent classification techniques (see Methods) were used
to determine its respective locus, thereby confirming that clades of the tree correspond to independent loci of Ig and
TCR.
Table 1: Distribution of V-genes across the main Ig and TCR loci.
Order/species          IGHV  IGKV  IGLV  TRAV  TRBV  TRGV  TRDV   All
Squamata
  A. carolinensis        89   N/D    30    23     7     0     0   149
  P. bivittatus          71     0    34    27     2     0     0   134
  B. constrictor         54     0    17    13     1     0     0    85
  O. hannah              77     0    28    25     1     0     0   131
Testudines
  C. picta bellii       342   109    83   113    19    25    11   702
  P. sinensis           117    87    45    60    16    18    10   353
  A. spinifera           58    78    17    19    14    12     2   200
  C. mydas               57    49    25    40    13    16     7   207
Crocodylia
  A. mississippiensis    74    19    40    77    26    13     9   259
  A. sinensis           112    28    60    77    29    15    12   334
  C. porosus             46    17    22    29    19    16     3   152
  G. gangeticus         103    20    24    55    21    15     7   245
sequence found in older Squamata. For the case of the IGH locus in the Anole, the results agree
with those we described previously by an independent method (Gambón-Deza et al., 2009).
With respect to TCR genes, our analysis showed that the Anole possesses a low number of V
genes in the TRB locus. We also found that V genes corresponding to the γ and δ chains (TRGV
and TRDV genes) are absent. Because of this anomalous finding, we searched the Anole genome
for genes that encode the constant (C) regions of the TCR γ and δ chains, with negative results. We
also analyzed transcriptome sequences for this species obtained from the liver and kidney (available from the NCBI public repository: Anole liver: SRA064777, trace SRR648821; Anole kidney:
SRA059267, trace SRR579557). While there were many Ig and TCR α/β mRNA sequences found
in these organs, we obtained negative results when searching for TCR γ/δ sequences. It is possible that cells having TCR γ/δ may be abundant in other specialized tissues where transcriptome
data are lacking. Nonetheless, our studies, carried out with data from different tissues and different
sequencing methods, produced null results for searches of V-genes or constant gene sequences for
TCR γ and δ chains. Thus, these results strongly suggest that these species have lost the TCR γ/δ
receptor.
V1   TSGDIVVTQTPSSLQKNPGDRVDVQCKTSSSVTSSSYSYMNFYQLLPGQKPKLLIYYATNRFEGMPDRFSGRGSGTDFTFTINGVRPEDEGEYYCGQSYDSPL
V2   SSGN-VITQTPASLQKNPGDRVEIQCKTSSSVTSSYTNFINFYQILPGQKPKLLIYYATRRFEGTPDRFSGRGSGTDFTFTIDGVRPKDEGEYYCGQVNNSPL
V3   SSGDIVVTQTPSSLQKNPGDRVDVQCNTSSSVTGSSYSYMNFYQLRPGQKPKLLIYYATNRFEGTPDRFSGRGSGTDFTFTINVVRPEDKGEYYCGQGYSHPL
V4   SSGEIVVTQTPASLQKNPGDRVDIQCKGSSSLDG----YINFFQLIPGQKPKLLIYYATNRFQGTPHRFSGQGSGTDYTFSINGVRPEDEGEYYCGHAISYPL
V5   SSGDIVVTQTPASLQKNPGDRVDIQCKTSSTVS----TYMNFYKLIPGQKPKLLIYYVTSRFEGTPDRFSGSGSGTDFTFTISVVCPDDEGEYYCGQGYDYPL
V6   STGQIVITQTPALIQANPGDKVIFRCQTSSSMGS----YMALYRLKDNQKPKLLIYDVTNRFEGTPYRFSGQGSGTDFTFTINGVQSEDEGEYYCCQRQSYPL
V7   SSGQIVITQTPASLHVNPGDRVTIQCKASSSMSDD----MALYQFIPGKNPKLLLYDSSSRFTGTPDRFSGSYSGTDFTFTISGVRPEDEREYYCGQHYSYPL
V8   SSGQIVITQTPASLHVNPGDRVIIQCKAPSSMSND----MALYQFIPGKNPKLLLYESSTRFTGTPDRFSGSYSGTDFTFTISGVHPEDEAKYYCGQYYSRPL
V9   SSGQIVITQAPASLHVNPGDRVSIQCKTSSSISND----MHLYQFIPGQNPKLLIYYATGLFEGTPDRFSGSYSGTDFTFTISGVRPEDEAEYYCGQSNSRPL
V10  SSGEILITQTPVSLHVNPRDRVTLQCKASSSMSDD----MALYQFIPGKNPILLLYESSTCFTGSPNHFSGSYSGTDFTFTISGVCPEDEGEYYCGQYYNQIL
V11  SSGQIVVTQSPASLHVNPGDTVTIKCKASSSISNE----MDLYQFIPGQKPKLLIYHATYRFEGTPDRFSGSYSGTDFTFTISGVRPEDEAEYYCGQDYSTPL
V12  SSGQIVITQTPASLHVNPGDRVTIQCKASSSMSDD----MALIQFILGQKPKLLIHDGDTRFEGTPDRFSGSYSGTDFTFTISGVRPEDAAEYYCGQDYTTPL
V13  SSGQIVITQTPASLHVNPGDRVTIQCKASSSISNY----MHLYQFIPGQKPKLLIYRATSRFEGTPDRFSGSYSGTDFTFTISGVRPEDAAEYYCGQGYSTPL
V14  SSGQIVITQTPASLHVNPGDRVTIQCKASSSMSND----MALYQFILGQKPKLLIHDATSRFTGTPDHFSGSYSGTDFTFTISGVRPEDAAEYYCGQGYSTPL
V15  SSGQIVITQTPASLHVNPGDTVTIRCKASSSMSND----MQLYQFIPGQNPKLLIYDATSRFTGTPDRFSGSYSGTDFTFTINGVRPEDEGEYYCGQDYSRPL
[Figure 2 annotations in the original image: CDR1, W absence, CDR2; alignment positions 1 to 103.]
Figure 2: Alignment of amino acid sequences deduced from V-genes obtained manually in the genome of A. carolinensis. As can be seen, the amino acid tryptophan (W) does not occupy the canonical position 41, representing
an anomaly as compared with other functional V-genes. We produced the alignment and graphical output with the
Seaview software program (Gouy et al., 2010).
[Figure 3 image: duplication tracks along the Anole IGH locus (610 kb; IGHV and IGHC regions marked) and the TRAV region (140 kb).]
Figure 3: Recent duplications in IGHV (top) and TRAV (bottom) loci in the Anole. Duplication tracks are
drawn in green. Constraints for duplications in (a) the IGH locus are: segments ≥ 1 kb, identity ≥ 90%; (b) the TRA
locus are: segments ≤ 0.3 kb, identity ≥ 90%. Duplications exist only in segments where V-genes are located (not in
constant regions), and in the case of the IGH locus, some coincide with the position of V-genes, indicated by the red
lines.
In the three snakes studied, there are few V-genes (Table 1). Most of the sequences from
these species belong to the IGH and IGL loci (Figure 1). In the three snakes, we found that the
IGK locus does not exist. From a phylogenetic analysis, we found that V sequences from the
IGH locus belong to a single group or clan that has undergone a recent expansion. This result is
different from that found in A. carolinensis, where many V-gene clans are found in the IGH locus.
In the case of the IGL locus, the number of V-genes in Serpentes is similar to that in the Anole.
Finally, just as with A. carolinensis, no V-gene sequences corresponding to the TRG and TRD loci
were found and only one or two V genes were identified in the snake TRB locus.
From studies we carried out previously in mammals, we found a higher birth and death rate
amongst the V-genes of the Ig loci as compared to that in the TCR loci. Trees of V-gene sequences
in reptiles have the same structure, suggesting that the same mechanisms exist. Figure 3 shows
recent duplications that occurred within the IGH and TRA loci. These loci are contained within
chromosome segments of length 610 kb and 140 kb, respectively. In this figure, duplication tracks
are drawn in green and represent those sequences having a similarity greater than 90 percent for
segment sizes > 1 kb. As can be seen, there are many duplications in the IGH locus, and these
duplication processes occur specifically in the IGHV region and not in the constant (i.e., IGHC)
region. By applying the same similarity criteria, few duplication tracks are visible in the TRAV
region, and those that are visible have smaller segment sizes: ≤ 0.3 kb. Finally, we can observe
that some duplications coincide with V-genes (indicated with red lines) in the IGHV region, while
no V-genes coincide with the duplication tracks shown in the TRAV region.
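The duplication criteria can be expressed as a simple filter over self-alignment matches. The sketch below assumes that such matches are already available as (start, end, percent identity) tuples; how they were computed for Figure 3 is not described in the text, and the function name and example coordinates are hypothetical. The thresholds follow the constraints given in the Figure 3 caption for the IGH locus.

```python
def recent_duplications(hits, min_identity=90.0, min_length=1000):
    """Keep self-alignment matches that are long and similar enough.

    hits: iterable of (start, end, percent_identity), coordinates inclusive.
    """
    return [(start, end, pid) for start, end, pid in hits
            if pid >= min_identity and (end - start + 1) >= min_length]


# Hypothetical matches: only the first is both >= 1 kb and >= 90% identical.
example = [(1, 1500, 95.0), (2000, 2200, 99.0), (5000, 7000, 85.0)]
print(recent_duplications(example))
```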
[Figure 4 image (Testudines): four phylogenetic tree panels (Chelonia mydas, Pelodiscus sinensis, Chrysemys picta bellii, Apalone spinifera) with node support values; legend: IGHV, IGKV, IGLV, TRAV, TRBV, TRDV, TRGV.]
Figure 4: V-genes in Testudines. The trees were constructed in the same way as described in Figure 1.
3.2. Testudines and Crocodylia
In the differentiation of the reptile lineage to birds, the Testudines and Crocodylia branches
emerged. Four Testudines genomes are available: those of three freshwater turtles, Chrysemys picta
bellii, Pelodiscus sinensis and Apalone spinifera, and one marine turtle, Chelonia mydas. Figure
4 shows the phylogenetic analysis of the V-gene sequences from these species. For the western
painted turtle, C. picta bellii, we found more than 700 V-genes, making it the species with the most
V-genes that we have studied to date. By comparison, the marine turtle has ≈ 1/3 as many genes
(Table 1). In all four turtles, there is a preponderance of Ig V-genes as compared with the number
of TCR V-genes. In the three Ig loci, there is an expansion of multiple V-gene clans, similar
to what happens in the Anole, suggesting that the structure of these genes is more diverse than
that found in mammals (Gambón-Deza et al., 2009). Within the Ig loci, there are more Ig heavy
chain genes than those for light chains; the λ light chain has the fewest Ig V-genes. With respect
to TCR V-genes, the TRA locus has the largest set, while the TRB locus has on average far
fewer genes. Such a low number of Vβ genes is observed in all reptiles, raising the possibility that
an underlying process exists. In contrast to the Squamata, each of the turtle species we studied
possesses TCR γ and δ V-genes, albeit with a wide variance amongst them. From the phylogenetic
study of these V-genes, the TRGV genes are included in one specific clan, while the TRDV genes
are within the clan of the TRAV genes (Figure 4).
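The preponderance of Ig over TCR V-genes in the turtles can be checked directly from the Table 1 counts. The following sketch copies the four Testudines rows from Table 1; the variable and function names are hypothetical.

```python
# V-gene counts for the four turtles, copied from Table 1
# (order: IGHV, IGKV, IGLV, TRAV, TRBV, TRGV, TRDV).
TESTUDINES = {
    "C. picta bellii": (342, 109, 83, 113, 19, 25, 11),
    "P. sinensis": (117, 87, 45, 60, 16, 18, 10),
    "A. spinifera": (58, 78, 17, 19, 14, 12, 2),
    "C. mydas": (57, 49, 25, 40, 13, 16, 7),
}


def ig_tcr_totals(counts):
    """Split a seven-locus count tuple into (Ig total, TCR total)."""
    return sum(counts[:3]), sum(counts[3:])


for species, counts in TESTUDINES.items():
    ig, tcr = ig_tcr_totals(counts)
    print(f"{species}: Ig={ig}, TCR={tcr}, Ig fraction={ig / (ig + tcr):.2f}")
```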
In the order Crocodylia, there are four publicly available genomes (Alligator mississippiensis,
Alligator sinensis, Crocodylus porosus, and Gavialis gangeticus). In each species, we found
V-regions of the three main Ig loci and four TCR loci (Table 1). There are more V-genes for Ig
than for TCR. Amongst the Ig V-genes, the majority are within the IGH locus. All four
Crocodylia species have more V-genes in the IGL locus than in the IGK locus. Figure
5 shows the phylogenetic trees of the V-genes for each of these species.
[Figure 5 panels: Alligator sinensis, Alligator mississippiensis, Gavialis gangeticus, Crocodylus porosus. Legend: IGHV, IGKV, IGLV, TRAV, TRBV, TRDV, TRGV.]
Figure 5: V-genes in Crocodylia. The trees were constructed in the same way as described in Figure 1.
3.3. Clans in V-gene sequences
We studied the phylogenetic relationships of V-genes from the 82 mammal and 12 reptile
species available in Vgenerepertoire.org (Olivieri & Gambón-Deza, 2014). From the large
set of immunoglobulin V gene sequences in this study, there is evidence for distinct groups or
clans (Lefranc & Lefranc, 2001) of related sequences from different species amongst the clades.
Previous studies have identified such clans in humans and mice; the IMGT (Lefranc et al., 2009;
Giudicelli & Lefranc, 2012) has annotated three clans amongst the VH genes, three in Vκ, and at
least five clans in Vλ (Lefranc & Lefranc, 2001; Giudicelli & Lefranc, 1999).
Figure 6 shows multi-species trees obtained from alignments of the V-genes from the Ig loci.
Since the purpose of these trees was to explore the clan structure, we used all available sequences
of reptiles and mammals instead of trying to retain the same number of species from each class.
In these plots, the sequences corresponding to mammals are represented in blue, while those for
reptiles are drawn in red. The nodes having sequences which belong to clans annotated by the
IMGT are labelled in the trees. As seen in the trees of Figure 6, many sequences do not belong
to a previously defined clan. Since the clan studies to date have been based upon a limited set
of species, the set of identified clans has necessarily been incomplete. It is only now, with the
availability of a large number of V-gene sequences from a broad representation of mammal and
reptile species (Olivieri & Gambón-Deza, 2014), that it is possible to define new clans.
In the IGHV tree of Figure 6, the V-gene sequences of reptiles are grouped together and these
groups are interspersed within mammal clans. For example, Clan-II contains sequences from
Testudines, Crocodylia, and Squamata that together form subgroups, and these subgroups are
interposed with several mammalian subgroups comprising many sequences. The tree for the IGLV
sequences shows that the reptile Vλ sequences are also grouped in several subgroups and these are
distributed amongst the five mammalian clans previously defined by the IMGT. In the IGK locus,
more Vκ clans can be identified in mammals than previously described, so that the present clan
nomenclature is not sufficient to group all the taxa. All the reptile IGKV genes cluster into a single
group forming an independent clan. This clan also contains only a small group of mammal V-gene
sequences, indicating that this locus evolved differently in mammals and reptiles.
V-gene clans have not been previously identified within the TCR loci. The multi-species
trees of Figure 7, consisting of TCR sequences, demonstrate that the clan concept is also relevant
amongst these loci. Indeed, in the TRA locus, six clans can be identified, each having
sequences from both reptiles and mammals. As can be seen in the figure, the roots of each of the
clans correspond to reptile sequences, indicating that these sequences have been maintained over longer
periods of time than those found in mammals. The clans are numbered according to the number
of taxa in the clan, ranked from largest (Clan I) to smallest (Clan VI). The tree also demonstrates
that clans I and II diversified significantly in mammals, while in others, such as clan V, there was
less diversification in mammals than in reptiles. Also, we can deduce the presence of
additional clans in reptiles that retain remnants of sequences that were later lost in mammals.
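The numbering rule described above (clans ranked by the number of taxa they contain, largest first) can be sketched as follows. This is an illustrative sketch only: the function name `number_clans`, the cluster identifiers, and the sequence names are hypothetical, and ties are broken alphabetically by cluster identifier, which is an assumption not stated in the text.

```python
from typing import Dict, List

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]


def number_clans(clusters: Dict[str, List[str]]) -> Dict[str, str]:
    """Assign 'Clan I', 'Clan II', ... to clusters by descending size.

    `clusters` maps an arbitrary cluster id to the list of taxa
    (sequences) it contains. Ties in size are broken by cluster id
    (an assumption made here for determinism).
    """
    if len(clusters) > len(ROMAN):
        raise ValueError("more clusters than available Roman numerals")
    ranked = sorted(clusters, key=lambda cid: (-len(clusters[cid]), cid))
    return {cid: f"Clan {ROMAN[rank]}" for rank, cid in enumerate(ranked)}


# Hypothetical clusters with made-up sequence names.
toy_clusters = {
    "a": ["seq1", "seq2"],
    "b": ["seq3", "seq4", "seq5"],
    "c": ["seq6"],
}
labels = number_clans(toy_clusters)
```

In this toy input, cluster "b" has the most taxa and therefore receives the label Clan I.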
In the tree for the TRB locus in Figure 7, three distinct clans can be identified that group
mammal and reptile taxa. Nonetheless, the majority of mammalian sequences in this locus do
not share the same clan with those from reptiles. Few mammal sequences are found within the
reptile clans, while those grouped in separate clans appear to have undergone an evolutionary
diversification different from that of reptiles within this locus.
Trees from the TCR γ/δ loci have a similar structure. In the TRDV tree, three clans can be
identified that contain sequences from mammals, Testudines, and Crocodylia (this locus does not exist
in Squamata, as described previously). Similar results are seen in the multi-species tree for the
TRG locus, but in this case, the majority of sequences from reptiles belong to a single clan.
[Figure 6 panels: IGHV, IGKV, IGLV. Clan labels range from Clan I to Clan V; reptile groups are labelled Test, Croc, and Squa; scale ticks run from 0.0 to 0.8.]
Figure 6: Trees were generated from 11 reptiles and 50 mammals using the V-gene sequences of the three Ig loci. The
clans with reptile taxa are marked in red while those for mammals are indicated in blue. Previous studies (Lefranc
& Lefranc, 2001; Giudicelli & Lefranc, 1999) defined the clans indicated in the plots. The
amino acid sequences were aligned with Clustal Omega and the trees were constructed using FastTree
with the WAG matrix. Plots were constructed using MEGA v5.2 with the linearized branch option. The labels of the
Reptilia orders correspond to Testudines (Test), Crocodylia (Croc), and Squamata (Squa).
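The alignment and tree-building steps named in this caption (Clustal Omega, then FastTree with the WAG matrix) could be scripted roughly as below. This is a hedged sketch rather than the authors' pipeline: the helper names and file names are hypothetical, the executables are assumed to be installed as `clustalo` and `FastTree` (the FastTree binary name varies between installations), and only the input/output, output-format, overwrite, and `-wag` options are used. The MEGA plotting step is not covered.

```python
import subprocess
from typing import List


def clustalo_cmd(fasta_in: str, aln_out: str) -> List[str]:
    """Command line for a Clustal Omega protein alignment in FASTA format."""
    return ["clustalo", "-i", fasta_in, "-o", aln_out,
            "--outfmt=fasta", "--force"]


def fasttree_cmd(aln_in: str) -> List[str]:
    """Command line for FastTree on a protein alignment using the WAG model.

    FastTree writes the Newick tree to standard output.
    """
    return ["FastTree", "-wag", aln_in]


def build_tree(fasta_in: str, aln_out: str, tree_out: str) -> None:
    """Align the sequences, then infer a tree and save it as Newick."""
    subprocess.run(clustalo_cmd(fasta_in, aln_out), check=True)
    with open(tree_out, "w") as handle:
        subprocess.run(fasttree_cmd(aln_out), stdout=handle, check=True)


# Hypothetical file names; calling build_tree requires both tools on PATH.
example_align = clustalo_cmd("ighv_all_species.fasta", "ighv_all_species.aln")
example_tree = fasttree_cmd("ighv_all_species.aln")
```

Keeping command construction separate from execution makes the exact invocation easy to inspect and log before running it on a large multi-species alignment.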
[Figure 7 panels: TRAV, TRBV, TRDV, TRGV. Labels include Clan I to Clan VI and "Mammal Clans"; reptile groups are labelled Squa, Tes, and Cro; scale ticks run from 0.0 to 1.5.]
Figure 7: Trees were generated from 11 reptiles and 50 mammals using the V-gene sequences of the four TCR loci.
The clans in reptile taxa are marked in red while those for mammals are in blue. The nodes that root the mammal
clans with the reptile clans are marked by (*), while others that are exclusively from mammals or reptiles are marked
with (o). The alignment and tree construction were performed in a similar way to that of Figure 6. The labels of the
Reptilia orders correspond to Testudines (Test), Crocodylia (Croc), and Squamata (Squa).
4. Discussion
How did the genes that code for Igs and TCRs arise and evolve in lineages of reptiles? It
is known that extant reptiles and mammals are separate vertebrate lineages that descended from
a common ancestor. These vertebrates must have inherited their antigen recognition machinery,
namely Ig and TCR genes, from this common ancestor. These vertebrates evolved on land in
different habitats, and underwent further speciation. After the divergence from mammals, the
reptile lineage split in two. One of these lines led to present day Squamata, while the other gave
way to Testudines and Crocodylia. We found that the Ig V-gene repertoire in reptiles is similar
in structure to that found in mammals. Just as in mammals, the data suggest that Ig V-genes
underwent a diversification process with high birth/death rates.
Another striking result is that the Squamata species lost the TCRγ/δ locus. Neither V- nor
C-gene sequences were found in either the genome or several transcriptome datasets, indicating
the absence of these genes. In each of the Squamata species we studied, there is a low number
of V-genes in both the TRA and TRB loci, perhaps underscoring a particular characteristic of
their TCRα/β. This study also revealed several relevant facts with regard to the evolution of
the Ig loci. From A. carolinensis, we found that the Vκ sequences have an anomalous alteration
in the canonical tryptophan, whose presence is a general characteristic of all functional V-genes
described so far. This alteration observed in Iguanidae may be related to the loss of this locus in the
suborder of Serpentes. It is known that birds also lost the IGK locus. However, in the evolutionary
line that gave rise to Testudines and Crocodylia, which is shared with birds, we did not detect this
loss of the canonical tryptophan at position 41. It is thus surprising that two evolutionary lines,
snakes and birds, which diverged ∼ 275 Mya, independently lost the IGK locus.
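A check for the canonical tryptophan discussed above can be sketched as follows. The sketch assumes that each V-region amino acid sequence has already been gapped to a fixed numbering scheme in which the conserved tryptophan falls in column 41 (as in the IMGT unique numbering); that preprocessing step is an assumption and is not shown. The function names and the toy sequences are hypothetical and are not real V-gene sequences.

```python
from typing import Dict, List

TRP_POSITION = 41  # 1-based column of the conserved tryptophan (assumed numbering)


def residue_at(gapped_seq: str, position: int) -> str:
    """Return the character in a 1-based column of a gapped sequence."""
    if position < 1 or position > len(gapped_seq):
        raise ValueError("position lies outside the sequence")
    return gapped_seq[position - 1]


def has_canonical_trp(gapped_seq: str) -> bool:
    """True if the conserved tryptophan (W) is present at column 41."""
    return residue_at(gapped_seq, TRP_POSITION).upper() == "W"


def flag_noncanonical(seqs: Dict[str, str]) -> List[str]:
    """Names of sequences lacking W at column 41, sorted alphabetically.

    A gap character in that column also counts as non-canonical.
    """
    return sorted(name for name, seq in seqs.items()
                  if not has_canonical_trp(seq))


# Toy strings built only to exercise the check (not real sequences).
toy_seqs = {
    "toy_canonical": "A" * 40 + "W" + "A" * 20,
    "toy_altered": "A" * 40 + "R" + "A" * 20,
    "toy_gapped": "A" * 40 + "." + "A" * 20,
}
flagged = flag_noncanonical(toy_seqs)
```

Applied to a set of numbered Vκ sequences, such a filter would list the sequences that carry a substitution at this position.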
In the Testudines and Crocodylia lineage, there is no such gene loss as seen in the Squamata.
Table 1 shows the V-gene distribution across all seven major Ig and TCR loci. While the
average number of TRB V-genes is larger than that found in Squamata, it is still less than the
average number found in mammals (Olivieri & Gambón-Deza, 2014). For the TRG
and TRD loci, the average number of V-genes was similar to that found in mammals.
At the evolutionary level, there are studies that show that the VH genes are grouped into three
clans (Lefranc & Lefranc, 2001; Giudicelli & Lefranc, 1999; Lefranc et al., 2009). These studies
distributed the sequences in clans from limited information based upon a small number of mammal
species. In this work, we presented multi-species trees formed from the V-genes of more than 90
mammal and reptile species. These trees show that additional clans exist beyond those described
from previous annotations.
In a previous analysis of A. carolinensis (Gambón-Deza et al., 2009), we showed that its VH
gene sequences cluster into five distinct groups, one of which is quite different from the rest. The
need to define new clans is apparent in the reptilian VH and Vκ sequences. In the Vκ genes
of Testudines and Crocodylia, there exists a single clan different from those of mammals and
within which mammals have only a few V-gene sequences (Figure 6). This suggests that reptiles
generated κ chains from a homogeneous ancestral group, from which mammals either diversified
very little or lost most of the variants that were generated. In both snakes and birds, the κ chain
was lost, indicating that the Vκ genes expressed in reptiles may be dispensable. In contrast, the
IGKV tree of Figure 6 demonstrates the large κ chain diversification that took place in mammals,
corresponding to the most common Ig light chain in these species, as is the case in human and mouse.
The retention of the five clans in the Vλ gene sequences, as described previously in mammals,
is significant because it suggests a positive selection mechanism. The reptile sequences maintain
this distribution of five V-gene clans found in mammals, and each is rooted with a reptile clan.
The differences in sequence (which may be structural) that determined the grouping within clans
have been maintained in both lineages for more than 300 million years, indicating a functional significance
not yet understood.
Instead of recognizing antigen directly, TCRs recognize a complex of an antigen-derived peptide
and a major histocompatibility complex (MHC) molecule. Thus, this restriction could be
accounted for by co-evolution of the TCR V-genes with MHC molecules. At
present, few comprehensive studies exist that characterize the MHC molecules in reptiles
(Glaberman et al., 2008; Glaberman & Caccone, 2008; Glaberman et al., 2009).
In a previous publication (Olivieri et al., 2013), we commented on the presence of long branches
in the trees corresponding to the TCR V-gene sequences of mammals. This same phenomenon is
also observed in the TCR V-gene trees for reptiles. Owing to the ability to study a large number of
V-gene sequences in these loci, we also describe for the first time the presence of well-defined
clans in the multi-species trees. In the TRA locus, reptiles and mammals share common clans
and the roots of the clades are from both reptiles and mammals, indicating a common origin. One
explanation for this observation is a structural or functional preservation of the products of
this locus in species that diverged more than 300 Mya.
The TRBV multi-species tree of Figure 7 describes an evolution between reptiles and mammals
that is quite different from that seen in the TRAV tree. The Vα sequences evolved from common
ancestors, as seen by the shared mammal and reptile root nodes. The Vα
sequences are also evenly distributed amongst the clans. This same homogeneous distribution
is not seen in the TRBV tree. In particular, the vast majority of mammal Vβ sequences have
diversified into separate clans (i.e., mammal clans). In this way, the Vβ sequences of reptiles group
within three old clusters with little representation from mammals. Thus, contrary to Vα sequences,
reptile and mammal Vβ sequences are segregated from each other, in a way reminiscent of Vκ
sequences. While the common evolutionary features of Vα in reptiles and mammals support a
coevolution hypothesis between TCR and MHC, the evolutionary differences observed in the Vβ
sequences suggest that this chain may have a different role in reptiles and mammals. Studies of MHC evolution
in reptiles could provide clues that clarify this issue.
References
Alföldi, J., et al. (2011). The genome of the green anole lizard and a comparative analysis with birds and mammals.
Nature, 477, 587–591. doi:10.1038/nature1039.
Benton, M. (2005). Vertebrate Palaeontology. Oxford, UK: Blackwell.
Benton, M., & Donoghue, P. (2007). Paleontological evidence to date the tree of life. Mol. Biol. Evol., 24, 26–53.
doi:10.1093/molbev/msl150.
Bradnam, K., et al. (2013). Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate
species. Gigascience, 2, 10. doi:10.1186/2047-217X-2-10.
Cannon, J., Haire, R., Rast, J., & Litman, G. (2004). The phylogenetic origins of the antigen-binding receptors and
somatic diversification mechanisms. Immunological Reviews, 200, 12–22.
Charlemagne, J., Fellah, J., Guerra, A., Kerfourn, F., & Partula, S. (1998). T-cell receptors in ectothermic vertebrates.
Immunological Reviews, 166, 87–102.
Danilova, N., & Amemiya, C. T. (2009). Going adaptive. Annals of the New York Academy of Sciences, 1168,
130–155.
deBraga, M., & Rieppel, O. (1997). Reptile phylogeny and the interrelationships of turtles. Zool. J. Linnean Soc.,
120, 281–354.
Flajnik, M., & Kasahara, M. (2010). Origin and evolution of the adaptive immune system: genetic events and selective
pressures. Nat Rev Genet., 11, 47–59. doi:10.1038/nrg2703.
Gambón-Deza, F., Sánchez-Espinel, C., Mirete-Bachiller, S., & Magadán-Mompó, S. (2012). Snakes antibodies. Dev Comp
Immunol., 38, 1–9. doi:10.1016/j.dci.2012.03.001.8.
Gambón-Deza, F., Sánchez-Espinel, C., & Magadán-Mompó, S. (2009). The immunoglobulin heavy chain locus in
the reptile anolis carolinensis. Mol Immunol., 46, 1679–87. doi:10.1016/j.molimm.2009.02.019.
Giudicelli, V., & Lefranc, M. (1999). Ontology for immunogenetics: the imgt-ontology. Bioinformatics, 15, 1047–54.
Giudicelli, V., & Lefranc, M. (2012). Imgt-ontology 2012. Front Genet., 3, 79. doi:10.3389/fgene.2012.00079.
Glaberman, S., & Caccone, A. (2008). Species-specific evolution of class i mhc genes in iguanas (order: Squamata;
subfamily: Iguaninae). Immunogenetics, 60, 371–82. doi:10.1007/s00251-008-0298-y.
Glaberman, S., DuPasquier, L., & Caccone, A. (2008). Characterization of a nonclassical class i mhc gene in a
reptile, the galápagos marine iguana (amblyrhynchus cristatus). PLoS One, 3, e2859.
doi:10.1371/journal.pone.0002859.
Glaberman, S., Moreno, M., & Caccone, A. (2009). Characterization and evolution of mhc class ii b genes in
galápagos marine iguanas (amblyrhynchus cristatus). Dev Comp Immunol, 33, 939–47.
doi:10.1016/j.dci.2009.03.003.
Goecks, J., Nekrutenko, A., Taylor, J., & Galaxy-Team. (2010). Galaxy: a comprehensive approach for supporting
accessible, reproducible, and transparent computational research in the life sciences. Genome Biol, 11, R86.
doi:10.1186/gb-2010-11-8-r86.
Gouy, M., Guindon, S., & Olivier, G. (2010). Seaview version 4: a multiplatform graphical user interface for sequence
alignment and phylogenetic tree building. Molecular Biology and Evolution, 27, 221–224.
Hedges, S., & Kumar, S. (2009). The Timetree of Life. New York, USA: Oxford Univ. Press.
Janes, D., Organ, C., Fujita, M., Shedlock, A., & Edwards, S. (2010). Genome evolution in reptilia, the sister group
of mammals. Annu. Rev. Genom. Human Genet., 11, 239–264. doi:10.1146/annurev-genom-082509-141646.
Laurin, M., & Girondot, M. (1999). Embryo retention in sarcopterygians, and the origin of the extra-embryonic
membranes of the amniotic egg. Ann. Sci. Nat. Zool., 20, 99–104.
Laurin, M., & Reisz, R. (1995). A reevaluation of early amniote phylogeny. Zoological Journal of the Linnean
Society, 113, 165–223. doi:10.1111/j.1096-3642.1995.tb00932.x.
Lefranc, M., Giudicelli, V., Ginestoux, C., Jabado-Michaloud, J., Folch, G., Bellahcene, F., Wu, Y., Gemrot, E.,
Brochet, X., Lane, J., Regnier, L., Ehrenmann, F., Lefranc, G., & Duroux, P. (2009). Imgt, the international
immunogenetics information system. Nucleic Acids Res, 37, D1006–12. doi:10.1093/nar/gkn838.
Lefranc, M., & Lefranc, G. (2001). The Immunoglobulin FactsBook. London, UK: Academic Press.
Li, L., Wang, T., Sun, Y., Cheng, G., Yang, H., Wei, Z., Wang, P., Hu, X., Ren, L., Meng, Q., Zhang, R., Guo,
Y., Hammarström, L., Li, N., & Zhao, Y. (2012). Extensive diversification of igd-, igy-, and truncated igy(δfc)-encoding
genes in the red-eared turtle (trachemys scripta elegans). J Immunol., 189, 3995–4004.
doi:10.4049/jimmunol.1200188.
Magadán-Mompó, S., Sánchez-Espinel, C., & Gambón-Deza, F. (2013). Igh loci of american alligator and saltwater
crocodile shed light on iga evolution. Immunogenetics, 65, 531–41. doi:10.1007/s00251-013-0692-y.
Magadán-Mompó, S., Sánchez-Espinel, C., & Gambón-Deza, F. (2013). Immunoglobulin genes of the turtles.
Immunogenetics, 65, 227–37. doi:10.1007/s00251-012-0672-7.
Modesto, S., & Anderson, J. (2004). The phylogenetic definition of reptilia. Systematic Biology, 53, 815–821.
doi:10.1080/10635150490503026.
Narciso, J., Uy, I., Cabang, A., Chavez, J., Lorenzo, J., Padilla-Concepcion, G., & Padlan, E. (2011). Analysis of the
antibody structure based on high-resolution crystallographic studies. New Biotechnology, 28, 435–47.
Olivieri, D., Faro, J., von Haeften, B., Sánchez-Espinel, C., & Gambón-Deza, F. (2013). An automated algorithm for
extracting functional immunologic v-genes from genomes in jawed vertebrates. Immunogenetics, 65, 691–702.
doi:10.1007/s00251-013-0715-8.
Olivieri, D., & Gambón-Deza, F. (2014). Vgenerepertoire.org identifies and stores variable genes of immunoglobulins
and t-cell receptors from the genomes of jawed vertebrates. bioRxiv. doi:10.1101/002139.
Price, M., Dehal, P., & Arkin, A. (2010). Fasttree 2–approximately maximum-likelihood trees for large alignments.
PLoS One, 5, e9490. doi:10.1371/journal.pone.0009490.
Reisz, R., & Müller, J. (2004). Molecular timescales and the fossil record: a paleontological perspective. Trends
Genet., 20, 237–41.
Schroeder, H., & Cavacini, L. (2010). Structure and function of immunoglobulins. Journal of Allergy and Clinical
Immunology, 125, S41 – S52.
Sievers, F., & Higgins, D. (2014). Clustal omega, accurate alignment of very large numbers of sequences. Methods
Mol Biol., 1079, 105–16. doi:10.1007/978-1-62703-646-7_6.
Sun, Y., Wei, Z., Hammarstrom, L., & Zhao, Y. (2011). The immunoglobulin δ gene in jawed vertebrates: a comparative overview. Dev Comp Immunol., 35, 975–81. doi:10.1016/j.dci.2010.12.010.
Sun, Y., Wei, Z., Li, N., & Zhao, Y. (2012). A comparative overview of immunoglobulin genes and the generation of
their diversity in tetrapods. Developmental & Comparative Immunology, 39, 103–109.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., & Kumar, S. (2011). Mega5: molecular evolutionary
genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular
Biology and Evolution, 28, 2731–2739. doi:10.1093/molbev/msr121.
van Tuinen, M., & Hadly, E. (2004). Error in estimation of rate and time inferred from the early amniote fossil record
and avian molecular clocks. J Mol Evol., 59, 267–76.
Wei, Z., Wu, Q., Ren, L., Hu, X., Guo, Y., Warr, G., Hammarström, L., Li, N., & Zhao, Y. (2009). Expression of igm,
igd, and igy in a reptile, anolis carolinensis. J. Immunol., 183, 3858–64. doi:10.4049/jimmunol.0803251.
Xu, Z., Wang, G., & Nie, P. (2009). Igm, igd and igy and their expression pattern in the chinese soft-shelled turtle
pelodiscus sinensis. Mol Immunol., 46, 2124–32. doi:10.1016/j.molimm.2009.03.028.