5. Phylogenetic analysis using COI gene 5.1. Introduction In zoological and ecological literature, identification of unknown specimens based on cytochrome oxidase I (COI) has become known as DNA barcoding (Floyd et al., 2002; Hebert et al., 2003; Remigio and Hebert, 2003; Moritz and Cicero, 2004). DNA barcoding has found a wide range of applications, from identification of specimens in conservation biology and molecular ecology. DNA barcoding system for animal life could be based upon sequence diversity in 5′ section of COI gene. COI exhibits a greater range of phylogenetic signal than any other mitochondrial genes (Hebert et al., 2003). As in other protein-coding genes, the third-position nucleotides of COI show a high incidence of base substitution, leading to a rate of molecular evolution that is about three times greater than that of 12S or 16S rRNA (Knowlton and Weigt, 1998). They also argued that 12S and 16S rRNA genes are having multiple insertions and deletions so they pose potential problems in their alignment. This problem would apply as well to the nuclear 28S rRNA and internal transcribed spacer regions (ITS). The classical procedure for such molecular identification has been the use of Blast searches (Altschul et al., 1997). Blast offers no information to help researchers choose among multiple close matches. Whereas the local alignment problem can be circumvented using global alignments, the remaining two problems cannot be addressed without a statistical evaluation of the phylogenetic associations among species. This question is difficult to address as the evolutionary relationship among genetic markers may not truly reflect the evolutionary 57 relationship among species. In cases, where reciprocal monophyly cannot safely be assumed, an analysis quantifying between-species and between-genera genetic variation forms a more correct basis of assignment. Such analyses, however, require comprehensive different methods of phylogenetic coverage that is generally not available to the biologist. This chapter address the species problem but instead attempt to devise and display different methods of phylogenetic analysis for the assignment of 16 species of catfishes to their family taxa. Different phylogenetic methods lead to improved accuracy and importantly, it provides a measure of statistical confidence associated with the barcoding assignment. 5.2. Materials and methods 5.2.1. Samples The DNA samples used for the phylogenetic analysis using COI marker was extracted as mentioned previously in general Materials and methods section. 5.2.2. DNA barcode region The standard cytochrome oxidase I (COI) gene was selected as a DNA barcode region. Polymerase chain reactions were performed to amplify the COI gene using universal primers. 58 5.2.3. Primers Universal COI gene primers were selected from previous studies (Ward et al., 2005) (Table 3). Table 3. Primer name and sequences of COI gene Name of the primer Sequence of the primer FishF1 5′- TCAACCAACCACAAAGACATTGGCAC - 3′ FishR1 5′ – TAGACTTCTGGGTGGCCAAAGAATCA – 3′ 5.2.4. Polymerase chain reaction condition Mastermix components Distilled water - 18 µL 10x MgCl2 Buffer - 2.5 µL dNTP (each 2.5mM) - 2 µL Taq DNA polymerase (3U/ µL) - 0.5 µL Primer mix - 1 µL Template DNA - 1 µL Polymerase chain reactions were performed in following temperature and timing condition programmed in TechGeneT M, thermal cycler. Initial denaturation at 94 ºC for 5 min. Number of cycles – 35 Denaturation at 94 ºC for 30 sec. 59 Annealing at 54 ºC for 30 sec. Extension at 72 ºC for 1min. Final extension at 72 ºC for 10 min. The amplified products were checked on 1.5% Agarose gel electrophoresis and the molecular weight was assessed using molecular weight marker (100 bp ladder). 5.2.5. Sequencing of the amplified products The PCR amplified products were successfully readout bidirectionally using an ABI 3730 capillary sequencer following manufacturer’s instructions. The sequence with good peak clarities were selected for further analysis. 5.2.6. Sequence characterization All the sequences showed length ranging from 588 to 655 bp and partial at both 5′ and 3′ ends. There were no insertions, deletions or stop codons in any of these sequences in final dataset. All the sequences were conceptually translated into proteins using vertebrate mitochondrial genetic code translation pattern with 1st frame. 5.2.7. Species confirmation by BLAST All the sequences showed 99% identity and null E-values of more than 90% query coverage with previously published COI sequences of their respective species in NCBI’s nucleotide database. Subsequently, morphological identity was taken as affirmative confirmation for those first published sequences of NCBI’s nucleotide database inspite of their <90% identity and higher E-values. It was the maximum identical percentage for the present sequences with Genbank sequences. 60 It suggested that the present sequences amplified by universal COI primers confirmed that sequences belong to the COI gene of their respective individual species. 5.2.8. Genbank submission NCBI’s BankIt protocol was properly followed for submitting the sequences. All the sequences were submitted with unique identification name and got accession numbers as given in Table 4. 5.2.9. Phylogenetic analysis The probability of rejecting the null hypothesis that sequences have evolved with the same pattern of substitution, as judged from the extent of differences in base composition biases between sequences (Disparity Index test, (Kumar and Gadagkar, 2001)). A Monte Carlo test (500 replicates) was used to estimate the P-values (Kumar and Gadagkar, 2001). P-values smaller than 0.05 are considered significant (marked with yellow highlights). The estimates of the disparity index per site are shown for each sequence pair. The analysis involved 45 (Ariidae), 16 (Plotosidae) and 52 (Bagridae) nucleotide sequences. Codon positions included were 1st +2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair. There were 598 (Ariidae), 640 (Plotosidae) and 514 (Bagridae) positions in final dataset. Evolutionary analyses were conducted in MEGA5 (Kumar et al., 2011). Phylogenetic reconstructions were performed under MP, ML, and BI criteria. For each of the three families of studied 16 catfish species, other COI nucleotide sequences of the respective families of Ariidae (45 taxa), Bagridae (52 taxa) and 61 Plotosidae (14 taxa) were retrieved from the Genbank for the phylogenetic tree construction. Their accession numbers are listed in Table 4. The final alignment of the mitochondrial 5′ section of COI partial gene included 639bp (Ariidae), 589bp (Bagridae) and 640bp (Plotosidae). The MP reconstructions were conducted in PAUP v. 4.0b10 (Swofford, 2002) via heuristic searches with random addition (RA) of sequences and treebisection-reconnection (TBR); clade support was evaluated using non-parametric bootstrapping with RA and TBR. For ML and BI, the best-fit models of sequence evolution were estimated using the Akaike information criterion (AIC) in Model Test v. 3.7 (Posada and Crandall, 1998). All analyses were run unpartitioned. The ML analyses were performed in program RAxML v.7.04 (Stamatakis, 2006). RAxML searches were run in CIPRES portal v. 1.13 under default configurations. ML nodal support was evaluated in RAxML using the rapid bootstrapping algorithm with automatic estimation of runs. For RAxML searches, several runs from random starting seeds were performed to check convergence of likelihood scores. Model parameters were estimated simultaneously (i.e., unfixed). Remaining settings were left at their default values. The BI analyses were performed in MrBayes v. 3.1.2 (Ronquist and Huelsenbeck, 2003) via Markov chain Monte Carlo (MCMC) iterations. The MCMC analyses were conducted in triplicate using four chains and sampling trees every 100 generations. Conservatively, 25% of the first trees sampled in each MCMC run were discarded as burn-in. Marginal probabilities of summary 62 parameters, consensus phylograms, and posterior probabilities of nodes were estimated from the postburn-in samples of the three independent runs. To confirm that post-burn-in trees were sampled from the actual MCMC posterior distribution, marginal parameters (MrBayes log file) were analyzed using the Effective Sample Size (ESS) statistic in program Tracer (Drummond et al., 2007); ESS greater than 200 suggests that MCMC searches were run long enough to accurately represent the posterior distribution (Drummond et al., 2007). 5.2.10. Statistical analysis One-way ANOVA was performed to test the significance of taxa groupings in different subsets for taxonomic hierarchy (species, genus and family level) with Duncan in SPSS 21 (SPSS, Chicago, IL) software package. From this package, the basic module was employed for calculating mean and variance parameters, as well as those for one way of parametric analysis of variance (ANOVA). Statistical output were then made graphically to display on means plot and box plot. 63 5.3. Results 5.3.1. Isolation of genomic DNA The genomic DNA was isolated from 16 species of catfishes by salt extraction method that showed clear bands with mild shearing in 0.8% agarose gel electrophoresis as shown in Figure 6. Figure 6. Agarose gel electrophoresis of total genomic DNA. (Lane 1 – 16: Mystus bleekeri, Mystus dibrugarensis, Mystus cavasius, Mystus armatus, Mystus gulio, Neotropius atherinoides, Arius arius, Arius jella, Arius maculatus, Arius gagora, Arius subrostratus, Plicofollis tenuispinis, Plicofollis platystomus, Plotosus lineatus, Plotosus limbetus and Plotosus canius) 5.3.2. DNA quantification The amount of DNA present in all the isolated samples were ranging from 1 to 2 µg/µl. Finally, the DNA was diluted to 50 ng per µL with distilled water and used for all PCRs. 64 5.3.3. PCR amplification of DNA barcoding region COI gene was successfully amplified using universal primers (FishF1 and FishR1) for all 16 species. Figure 7 shows the amplified fragments of COI gene. 6 6 7 8 9 7 Figure 7. Agarose gel electrophoresis of amplified COI gene (Upper Lane 1 – 9: Mystus bleekeri, Mystus dibrugarensis, Mystus atrifasciatus, Mystus albolineatus, Mystus gulio, Neotropius atherinoides, Plotosus lineatus, Plotosus limbetus and Plotosus canius; Lower Lane 1 – 7: Arius arius, Arius jella, Arius maculatus, Arius gagora, Arius subrostratus, Plicofollis tenuispinis and Plicofollis platystomus; M – 100 bp ladder) From the gel analysis, the single highly intense band without smearing strongly indicated the amplification of COI gene. Their base pair length was predicted as approximately 700 bp by comparing with standard molecular marker (100 bp ladder). All the species displayed good amplified products (Figure 7). 65 5.3.4. Sequences utilized in this study Table 4. List of accession numbers and voucher codes given for the studied species with COI gene S.No 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. NCBI Accession number KF824816 JX460965 KF824794 KF824798 KF824803 KF824808 KF824813 JN228949 HQ219109 JN228946 HQ219128 FJ459517 HQ009492 EU490863 JX177677 JX177678 JQ289146 FJ170791 JQ289145 EU490857 JQ289151 EU490856 JX260836 HQ009500 EU490858 HQ009503 JQ289150 - KF824819 - KF824797 - KF824802 - KF824807 - KF824812 - KF824815 Species Neotropius atherinoides Neotropius khavalchor Mystus bleekeri Mystus atrifasciatus Mystus dibrugarensis Mystus albolineatus Mystus gulio Mystus vittatus Mystus malabaricus Mystus cavasius Mystus montanus Mystus tengara Mystus oculatus Mystus bocourti Mystus multiradiatus Mystus rhegma Mystus singaringan Mystus horai Mystus mysticetus Bagrus docmac Bagrichthys macracanthus Bagrichthys macropterus Batasio tengana Batasio travancoria Batasio tigrinus Horabagrus nigricollaris Pseudomystus siamensis 28. JX460967 Horabagrus brachysoma 29. 30. 31. 32. 33. 34. 35. 36. 37. Bagrus filamentosus_ Bagrus bajad Schilbe mystus Schilbe intermedius Arius arius Arius jella Arius maculatus Arius gagora Arius subrostratus HM882793 HM882791 HM882942 HM882962 KF824820 - KF824822 KF824823 - KF824825 KF824826 - KF824828 KF824829 - KF824831 KF824832 - KF824834 66 Source This study Unpublished Family Scheilbeidae This study Unpublished Bagridae Unpublished Katwate et al., 2012 Nwani et al., 2011 This study Scheilbeidae Ariidae 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. HQ682626 HQ682609 JX198217 KF824835 - KF824837 KF824838 - KF824840 KF447876 JX198180 FJ918912 GU702401 GU225557 Arius manillensis Arius dispar Arius venosus Plicofollis tenuispinis Plicofollis platystomus Plicofollis polystaphylodon Plicofollis argyropleuron Ariopsis felis Bagre bagre Bagre marinus Santos and Quilang, 2011 Unpublished This study Unpublished 48. JQ365264 Cathorops spixii Ribeiro et al., 2012 49. JX515603 50. JF493496 51. JF493494 Cephalocassis jatia Galeichthys feliceps Galeichthys ater Unpublished 52. JQ365364 Genidens genidens 53. JX124786 Genidens barbus 54. EF609287 Neoarius midgleyi 55. EF609286 Neoarius graeffei 56. EF607328 57. JX124821 Netuma thalassina Notarius grandicassis 58. JQ365228 Notarius luniscutis 59. EF609566 Osteogeneiosus militaris 60. EU751952 Potamarius nelsoni 61. GU225657 Sciades assimilis 62. FJ418755 Ageneiosus inermis 63. 64. 65. 66. 67. Ageneiosus ucayalensis Plotosus lineatus Plotosus limbetus Plotosus canius Plotosus nkunga EU490849 KF824841 - KF824843 KF824844 - KF824846 KF824847 - KF824849 JF494191 68. HM006974 Neosilurus hyrtlii 69. HM006990 Tandanus tandanus 70. 71. 72. 73. Cnidoglanis macrocephalus Porochilus rendahli Pangasius bocourti Pangasius bocourti EF609335 HM006980 JN021312 JX997836 67 Ribeiro et al., 2012 Unpublished Ward and Holmes, 2007 Ward and Holmes, 2007 Zhang, 2011 Unpublished Ribeiro et al., 2012 Lakra et al., 2011 ValdezMoreno et al., 2009 Unpublished Ardura et al., 2010 Unpublished Ageneiosidae This study Unpublished Page and Hughes 2010 Page and Hughes 2010 Plotosidae Unpublished Pangasidae 5.3.5. Estimation of base composition bias The difference in base composition bias per site (Kumar and Gadagkar, 2001) is shown in Figure 8. The analysis involved 45 (Ariidae), 16 (Plotosidae) and 52 (Bagridae) nucleotide sequences. Codon positions included were 1st +2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair. There were 598 (Ariidae), 640 (Plotosidae) and 514 (Bagridae) positions in final dataset. Evolutionary analyses were conducted in MEGA5 (Kumar et al., 2011). The results were plotted for its taxonomic grouping (Figure 8). Note that even when the substitution patterns are homogeneous among lineages, the compositional distance will correlate with the number of differences between sequences as shown in Figure 8. 68 A) Plotosidae 69 to be continued… B) Ariidae to be continued… 70 C) Bagridae Figure 8. Categorized distributions and mean of base composition distance for COI gene. (Above: Box and whisker plot; Below: line plot). Significance of variation for ANOVAs: F= 19.79, d.f.= 3; 116, P< 0.0001 (Plotosidae); F= 44.21, d.f.= 3; 986, P< 0.0001 (Ariidae) and F= 24.83, d.f.= 3; 1322, P< 0.0001 (Bagridae). 71 5.3.6. Evolutionary test of the homogeneity of substitution patterns In order to obtain best estimate of the underlying pattern of substitution, this study confined its analysis to singletons, parsimony informative sites (Pi), fourfold degenerate sites, transition/transversion bias and interspecific to family level comparisons at 1st +2nd+3rd codon positions as shown in Table 5. The results were dot plotted for its statistical significance (Figure 9). Correlation of disparity index estimates between all species pairs relative to homogeneity in substitution pattern with statistical significance (P-value) were plotted in given Figure 9. Comparisons between groups (populations, species within genera, genera, family) with gradual increase in P-values are indicated by dotted circles plotted with the trendline. 72 Table 5. Codon-wise nucleotide substitution patterns. All frequencies are averages (rounded) over all taxa Position T C A G 1st 2nd 3rd For 639bp 16 41 30 29.2 26.4 29.0 29.8 28.4 25.5 15.0 34.2 24.9 31.8 14.6 6.0 17.5 1st 2nd 3rd For 639bp 16 41 27 28.1 26.4 29 32.8 29.4 25.4 15 33.7 24.7 31.8 14.7 6.7 17.8 1st 2nd 3rd For 639bp 16 42 33 30.2 26.1 29.0 26.0 27.0 25.5 14.6 36.5 25.5 32.1 14.8 4.7 17.2 1st 2nd 3rd For 545bp 20 42 24 28.4 23.8 29.0 30.5 27.8 24.7 15.2 38.6 26.2 31.5 14.2 7.1 17.6 A+T R Ariidae 41.5 5.07 56.0 0.75 64.2 5.55 54.1 4.786 Arius genus 41.4 1.16 56 0 50.7 7.47 52.8 5 Plicofollis genus 41.5 3.74 56.6 1.00 69.9 2.22 55.7 8.00 Plotosidae 44.72 5.44 56.78 62.36 1.46 54.61 31.00 Plotosus genus 73 Singleton 2-fold 4-fold Pi sites 11 3 22 36 9 0 65 74 0 0 105 105 23 9 179 211 3 0 17 20 11 0 75 86 0 0 125 125 4 3 81 88 3 0 32 35 16 0 74 90 0 0 124 124 4 4 27 35 2 0 40 42 18 0 66 84 0 0 106 106 25 1 134 160 1st 2nd 3rd For 545bp 20 41 23 28.1 23.4 29.0 32.3 28.2 24.4 15.0 38.3 25.9 31.9 14.6 6.8 17.8 1st 2nd 3rd For 589bp T 22 35 34 30.4 C 23.2 28.4 29.2 26.9 A 26.5 18.6 29.6 24.9 G 28.0 18.4 6.9 17.8 1st 2nd 3rd For 589bp 23 34 34 30.5 23.3 28.5 29.2 27.0 26.2 18.5 29.8 24.8 27.7 18.6 6.6 17.6 1st 2nd 3rd For 589bp T 23 34 33 29.7 C 21.8 28.8 28.3 26.3 A 27.7 19.9 31.3 26.3 G 27.9 17.2 7.9 17.7 44.69 5.39 56.48 60.94 1.54 54.02 22.00 Bagridae A+T R 48.5 2.20 43.6 2.10 63.6 1.57 55.3 31 Mystus genus 49.2 2.53 42.5 1.89 63.8 1.73 55.3 26 Scheilbidae Neotropius genus A+T R 40.7 2.69 43.9 2.88 44.3 1.49 46 13.00 74 4 0 37 41 20 0 73 93 0 0 114 114 13 1 84 98 Singleton 26 8 19 53 2-fold 2 0 49 51 4-fold 0 0 0 73 Pi sites 72 22 133 227 22 7 19 48 5 0 58 63 0 0 78 78 62 21 116 199 Singleton 26 15 48 89 2-fold 11 0 76 87 4-fold 0 0 0 96 Pi sites 3 0 0 3 Disparity index estimate Ariidae Ariidae P-value Disparity index estimate Bagridae Pagridae P-value Disparity index estimate Plotosidae Plotosidae P-value Figure 9. Correlation of disparity index based on COI gene sequence 75 5.3.7. Phylogenetic analysis at genus and family level No significant indels were observed in COI dataset. Start codon ATG was present in all the COI datasets in first site. Optimality results obtained under different analyses and model testing on the three data partitions are summarized in Table 6. Among the three reconstruction methods conducted (MP, ML-RAxML, BI) on the mitochondrial dataset, BI analysis resulted in least resolved tree. The consensus tree of three methods is shown in Figure 10 – 12. Bootstrap values and posterior probability values are labelled and indicated with gradient colour scheme for each of the respective node in format BI/ML/MP. 76 Table 6. Summary of initial conditions and results obtained in phylogenetic reconstructions and model testing Ariidae 45 taxa, 639bp Analysis Bagridae 52 taxa, 589bp Plotosidae 14 taxa, 640bp MP RA replicates 10 10 100 Bootstrap replicates 1000 1000 1000 Optimal trees retained 24 4 8 Optimal tree score (steps) 852 508 514 Consistency index 0.3751 0.4388 0.5920 Consensus type 50% majority rule 50% majority rule 50% majority rule Initial Model Bayesian HKY+G+I information criterion HKY+G TPM1uf+G Number of substitution rate parameters 2 2 2 ML Search replicates 10 10 100 Optimal tree score (InL) -4586.02 -6560.97 -3057.71 Bootstrap replicates 1000 1000 1000 BI Search replicates 1 1 3 Generations 1 x106 1 x106 6.75x106 Burn-in 1 x105 1 x105 3x104 Mean InL -4649.615 -6644.135 -3000.576 Effective Sample >200 size (all parameters) >200 >200 Consensus type 50% majority rule 50% majority rule 50% majority rule 77 A) B) Figure 10. Phylogeny of 43 Arioid species derived from the COI gene Fifty percent majority rule consensus on ~1 × 10 6 post-burn-in trees (mean lnL4586.02). (A) cladogram (vertical); Colored node labels indicate posterior probability and bootstrap values (BI/ML/MP) that are congruent with MP and ML. Asterisks (*) designate insufficient clade support; Highlighted (yellow) taxa labels indicate species utilized in this study. (B) phylogram (horizontal) elucidating the branch lengths with scales measuring the rate variation across lineages (taxon arrangement follows the same order as in cladogram). 78 A) B) Figure 11. Phylogeny of 50 Bagrid species derived from the COI gene. Fifty percent majority rule consensus on 1 × 10 6 post-burn-in trees (mean lnL-6560.97). (A) cladogram (vertical); Colored node labels indicate posterior probability and bootstrap values (BI/ML/MP) that are congruent with MP and ML. Asterisks (*) designate insufficient clade support; Highlighted (yellow) taxa labels indicate species utilized in this study. (B) phylogram (horizontal) elucidating the branch lengths with scales measuring the rate variation across lineages (taxon arrangement follows the same order as in cladogram). 79 A) B) Figure 12. Phylogeny of 14 Plotosid species derived from the COI gene. Fifty percent majority rule consensus on ~6.75 × 10 6 post-burn-in trees (mean lnL-3057.71). (A) cladogram (vertical); Colored node labels indicate posterior probability and bootstrap values (BI/ML/MP) that are congruent with MP and ML. Asterisks (*) designate insufficient clade support; Highlighted (yellow) taxa labels indicate species utilized in this study. (B) phylogram (horizontal) elucidating the branch lengths with scales measuring the rate variation across lineages (taxon arrangement follows the same order as in cladogram). 80 5.4. Discussion COI as a DNA marker has been able to discriminate between species phylogenetic relationships appropriately. While choosing as outgroup for a family level relationships, a phylogeneticist normally prefers a distantly related family taxa. Hence in this case, Ageneiosidae (for Ariidae), Scheilbeidae (for Bagridae) and Pangasidae (for Plotosidae) have been chosen as outgroups as observed from the previous literature (Sullivan et al., 2006). Robust genomic DNA was isolated from 16 species with repeated modifications from the standard procedure that showed clear bands. All these 16 catfishes and their habitual locations salinity differs that subsequently varieties the physicochemical nature of dorsal fin tissues. In particular, ethanol wash steps of standard salt extraction protocol were altered from three to five rounds of centrifugal wash to eliminate the excess salt from the isolated genomic DNA. The resulting bands of genomic DNA agarose gel shown in Figure 6 reflects the variation in intensity of the bands. The length of the COI sequence of all the individuals of 16 cafish species ranged from 545 to 645 bp long. The sequence which consists more than 600 bp long suggested that PCR amplified products belongs to mitochondrial COI gene not to nuclear mitochondrial DNA (NUMT). Nuclear insertions of mitochondrial origin, are found throughout the human genome and are believed to have arisen from DNA transfer between the mitochondrial and the nuclear genomes during evolution. Numts suggested to be less than 600bp long, show high sequence identity with the mitochondrial genome DNA sequence and can be large in size, the largest numts identified was >14.6 kb, with 15 out of 296 numts being greater 81 than 5.8 kb (Mourier et al., 2001). The remaining sequences that consists less than 600 bp might be the effect of lower PCR amplification even though they belong to COI gene confirmed by sequence identity and usage of vertebrate mitochondrial translation pattern. All sequences were conceptually translated into protein sequences. It is important to assess the pseudogene status of amplified products. Moreover, there was no evidence for the presence of NUMTs in Actinopterygii (Bensasson et al., 2001). COI nucleotide sequence data provide an opportunity to examine the rate of evolution and amount of phylogenetic information across various taxonomic levels. COI study was primarily interested in comparative series of taxa of different rank. The data presented in Figure 8. demonstrate that COI gene show a trend of increasing mean base composition distances with increasing rank of the groups compared, from within species to families. Heterogeneity of COI gene evolution rate, also significant in present data is widely known from earlier studies (Machordom and Macpherson, 2004). A one-way ANOVA (model with random effects for groups of the same size) showed that mean distances in four groups analysed were significantly different for three catfish families, F= 19.79, d.f. = 3; 116, P < 0.0001 (Plotosidae); F= 44.21, d.f.= 3; 986, P < 0.0001 (Ariidae) and F= 24.83, d.f.= 3; 1322, P < 0.0001 (Bagridae). However, to remember, this comparison was not quite correct for all of the DNA sequences compared, because it includes heterogeneous groups of catfish families of different size and sequence length. A common assumption in comparative sequence analysis is that the sequences have evolved with the same pattern of nucleotide substitution (homogeneity of the evolutionary process). Violation of this assumption is known to 82 affect the accuracy of phylogenetic inference and tests of evolutionary hypotheses. In this research, disparity index, ID have been proposed to measure the observed difference in evolutionary patterns for a pair of sequences. Based on this index, all the COI sequences of three catfish families were evaluated to test the homogeneity of the observed patterns. This test does not require a priori knowledge of the pattern of substitutions, extent of rate heterogeneity among sites, or the evolutionary relationship among sequences. Homogeneity assumption was tested by calculating the probability of observing a composition distance greater than that expected under the null hypothesis of homogeneity (i.e., ID more than zero). Relative saturation rates of different sites and types of substitution were assessed to estimate substitution patterns. In order to use coding sequences to reconstruct phylogenies accurately, it is important to adjust for the relative rates at which different codon sites and different types of substitution (e.g., transitions vs. transversions) saturate. This study compared the relative rates at which transitions and transversions saturate across taxonomic categories by counting substitutions in all pairwise comparisons between sequences from the following categories: within species, between species within a genus, between genera within a family and away from outgroup. Phylogenetic reconstruction from DNA or amino acid sequences relies heavily on suitable distance measures. COI gene region analyzed here have been frequently used to address phylogenetic relationships, but only at the generic level or below had its relationships successfully resolved. At the family level, this gene rarely gives satisfactory resolution (Brown et al., 1994; Miura et al., 1998) and often proves to be unreliable (Dowton and Austin, 1997; Mardulyn and Whitfield, 1999), 83 while at still higher levels; cytochrome oxidase sequences are not suitable for resolving relationships (Liu and Beckenbach, 1992; Howland and Hewitt, 1995; Frati et al., 1997). COI data set in this study in comparison to these higher-level phylogenetic relationships is a first step towards this objective. Ariidae The monophyly of the Ariidae has not been seriously questioned and is strongly supported on both molecular and morphological grounds (Sullivan et al., 2006; Kailola et al., 2004; Acero and Betancur, 2007). The group is divided into two subfamilies, the monogeneric Galeichthyinae (four species) which predominantly belong to marine species and the Ariinae (remaining taxa) (Acero and Betancur, 2007), so that apart from Ariinae, Galiechthys feliceps and Galiechthys ater were the only utilized species in this study. These contribute 106 singletons, 89 two-fold and 123 four-fold degenerative sites from the whole Ariidae family. Ti/tv bias was twofold higer in 1st codon position (9.00) than the 3rd codon position (4.65). In Arius genus, this becomes inverse with ti/tv bias, which was six-fold higher in 3rd codon position (7.47) than the 1st codon position (1.46). But in whole Ariidae family, ti/tv bias was nearly equal in 1st (5.07) and 3rd (5.55) codon positions (Table 5). From the phylogram of whole Ariidae, Galeichthyinae branches shows it had recently diverged with longer branch length in G. feliceps (Figure 10). Their base composition distance fall within thin limits of 5% level significance of Standard Error and does not affect the different levels of taxa (Figure 8). Different patterns of substitution have been attributed for G. feliceps against all the Ariid species (ID between 0.25 and 0.68) except A. jella and A. subrostratus at 5% level of significance (Figure 10). This might be the reason for the inverse relationship of ti/tv 84 bias at 1st and 3rd codon positions as explained in aforementioned statement. Even then, G. feliceps exhibit different substitution pattern (ID = 0.1522) against the same species G. ater, which was highly significant contributing the longer branch length in phylogram. The whole Arius genus comprises 8 species, out of which 5 are utilized from this study and the remaining two (A. dispar and A. manillensis) from Santos and Quilang, (2011). There are totally 531 conserved and 108 variant sites in Arius genus that consist 88 pi sites in total out of which 81 are at 3 rd postion of a codon. Especially, in 40th nucleotide base, adenosine triphosphate (ATP) was present in Arius arius, whereas all other species contain cytosine. Initially ti/tv bias for the Ariidae family was found to be 10. But with addition of outgroups, according to the HKY+G+I model tested through BIC, ti/tv bias found to be 7.19. The ti/tv bias, A+T content, Pi sites at all the 1+2+3 coding sites observed for all the species of Arius genus maintain balancing selection at the nucleotide level without any drastic change (Table 5). The base composition distance for all the species of Arius genus fell within limits of same species distance (<0.007) with the exception for Arius gagora (0.022). Against family comparison, Arius genus exhibited highest base composition distances when compared with Bagre bagre (Figure 8). Homogeneity pattern of substitution has been maintained for all the species of Arius genus accepting the null hypothesis with the exception between A. jella and A. gagora but with the lowest disparity index estimate (ID = 0.0819) (Figure 9). The cladogram from Figure 10A for Arius genus displayed monophyletic group with higher bootstrap and posterior probability values for all the species except A. subrostratus and the other species that are not sampled in this study. The phylogram shows that the branch lengths of the entire Arius genus showed negligible branch lengths depicting least substitution rates 85 all throughout the Ariid family based on COI dataset. Although the basal arioid clades are well defined, much controversy has arisen regarding the phylogeny and classification of ariid taxa, particularly within diverse Ariinae (Betancur, 2009a). The whole Plicofollis genus comprises 4 species, out of which 2 are utilized from this study and the remaining two (P. tenuispinis and P. argyropleuron) from unpublished study as reported in NCBI database. The A+T content at 3rd codon position was nearly 70% and the Ti/tv bias was higher at 1st coding site (3.74) than 3rd coding site (2.22). Other than this, no peculiar phenomenon could be observed on singletons, Pi sites and 2-fold and 4-fold degeneracy sites at all the 1st + 2nd + 3rd coding sites observed for all the species that may maintain balancing selection at the nucleotide level without any drastic change (Table 5). The base composition distance for all the species of Plicofollis genus fell within limits of same species distance (<0.002), but with the exception for Plicofollis platystomus (0.015) (Figure 8). Homogeneity pattern of substitution has been maintained for all the species of Plicofollis accepting the null hypothesis with the exception between Plicofollis argyropleuron with Plicofollis tenuispinis (ID = 0.084 and 0.096) and Plicofollis platystomus (ID = 0.176) (Figure 9). The cladogram constructed by Maximum Parsimony, Maximum Likelihood and Bayesian inference from Figure 10A for Plicofollis genus displayed monophyletic group with higher bootstrap and posterior probability values for all the species except two species of Neoarius, which intervened separating P. polystaphylodon. While this interruption was reported with less bootstrap value from ML construction. The phylogram shows that the branch lengths of all the Plicofollis genus showed <0.1 depicting moderate substitution rates all throughout the Ariid family. 86 Plotosidae Plotosid fishes have been called blunt-tail catfishes or eeltail catfishes which contributes very few reports for molecular marker studies. Previous results suggest that the family was originally marine, with an invasion into freshwaters. (Page and Hughes, 2010). As currently defined the family has 9 genera and ~40 species. Five of those genera are restricted to freshwaters in Australia and New Guinea, while the remaining four genera are marine. Of the marine genera, all but one is restricted to seas around Australia and New Guinea. The last marine genus, Plotosus is extremely widespread, occurring from South Africa and Japan to Australia. The whole Plotosus genus comprises 4 species, out of which 3 are utilized from this study and the remaining one species (P. nkunga) from unpublished study as reported in NCBI database. The A+T content at 1st and 3rd codon position was around 60% and the Ti/tv bias was higher at 1st coding site (5.39) than 3rd coding site (1.46) (Table 5). Other than this, abnormal extremities were not observed on singletons, Pi sites, two-fold and four-fold degeneracy sites at all the 1st + 2nd + 3rd coding sites observed for all the species of Plotosus genus that may maintain balancing selection at the nucleotide level without any drastic change (Table 5). There are 160 Pi sites in whole Plotosidae family and reduces to 98 sites when restricted to Plotosus genus. In that case, only one Pi site was observed in 2nd codon position at the 479 nucleotide site in this genus. The base composition distance for all the species of Plotosus fell within limits of same species distance (<0.006). Higher values of base composition distance (~0.25) across all the genus within Plotosidae family beyond the values of outgroup comparisons (~0.08) were clearly noted (Figure 8). Two outliers and two far outlier values were observed for 87 Porochilus rendahli and Cnidoglanis macrocephalus against the Plotosus genus. Homogeneity pattern of substitution has been maintained for all the species of Plotosus genus accepting the null hypothesis with the exception between Porochilus rendahli with all the species of Plotosus tenuispinis (ID = 0.35 and 0.43) (Figure 9). The cladogram constructed by Maximum Parsimony, Maximum Likelihood and Bayesian inference shown in Figure 12A for Plotosus genus displayed monophyletic group with higher bootstrap and posterior probability values for all the species The phylogram shows that the branch lengths of all the Plotosus genus showed branch lengths <0.1 depicting moderate substitution rates throughout this genus as per COI dataset. Bagridae Bagrid catfishes constitute a very important group among siluriformes having immense commercial importance from inland fisheries and aquaculture farming in south-east countries. Bagridae family, comprising of 27 genera (six in Indian region) is widely distributed in Asia and Africa (Talwar and Jhingran, 1991). With the available COI sequences, six genera have been utilized in this study with the importance given to Mystus genus. More pi characters (243) was observed on Bagridae for Maximum Parsimony tree construction. Minimized number of optimal trees (8) were retained as observed in Bagridae and highest (24 trees) for Ariidae. There are other examples of bursts of fish evolution, documented by molecular markers (Rutaisire et al., 2004; Duftner et al., 2005). The whole Mystus genus comprises 17 species, out of which 5 are utilized from this study and the remaining species from unpublished studies as reported in NCBI database. The 22 out of 48 singletons were observed in 1st codon position 88 which is higher than the 3rd codon position (19). Most singletons observed in Mystus genus were observed from the other species that were not utilized in this study. There are totally 189 variant sites in Mystus genus that consist 199 pi sites in total out of which 116 are at 3rd postion of a codon. Initially ti/tv bias for the Ariidae family was found to be 10. But with addition of outgroups, according to the HKY+G model tested through BIC, ti/tv bias found to be 1.963. The ti/tv bias was higher at 1st and 2nd coding site than 3rd coding site seeking out alternative explanations for neutral theory of natural selection. A+T content and singletons reported higher values in 1st codon position. Pi sites (199) was highest when compared to the other catfish families at all the 1st + 2nd + 3rd coding sites observed for all the species of Mystus that enable enrichment for the phylogenetic tree construction (Table 5). The base composition distance for all the species of Mystus changes dynamically (0 – 0.07) for highest value of same species distance (0.0720) between individuals of Mystus albolineatus and the null values arose from individuals of Mystus bleekeri and Mystus atrifasciatus. Between species base composition distance values lies between 0.058 – 1.523 where the highest value was noted between Mystus atrifasciatus and Mystus monatanus and the lowest between Mystus gulio and Mystus cavasius. Against family comparison, those values lies between 0.0136 1.0759 (Table 5), Mystus atrifasciatus exhibited highest base composition distances when compared with Bagrus docmac and the lowest between Mystus albolineatus and Bagricthys macracanthus (Figure 8). Homogeneity pattern of substitution has not been maintained for all the species of Mystus rejecting the null hypothesis with the highest disparity index estimate (ID = 1.5798) between Mystus montanus (Figure 9). The cladogram through MP, ML and BI phylogenetic methods from Figure 11A for Mystus genus displayed monophyletic group with higher bootstrap and posterior 89 probability values for all the species except Mystus montanus forming a distant and distinct clade whereas Mystus tengara collides into monophyletic clade when Neotropius genus was removed. The phylogram shows that the branch lengths of all the Mystus genus showed longer branch lengths compared to other catfish families utilized in this study accumulating intense substitution rates all throughout the Bagrid family. This may alter balancing selection under neutral theory of natural selection. Longest branch length was denoted from Mystus montanus, which could affect the total phylogram through long branch attraction. This phylogenetic analysis is a first step toward this objective, although it still needs more comparative ecological data for a comprehensive analysis of the evolution of feeding habits. This study establish a phylogenetic hypothesis for the three catfish families and examine the monophyly of the subfamilies and several genera. 90
© Copyright 2026 Paperzz