ASIAN J. EXP. BIOL. SCI. VOL 2(2) 2011: 201-206
© Society of Applied Sciences
ORIGINAL ARTICLE

COBALT: Multiple Protein Sequences Alignment Tool In Analysis of Cytochrome b Based Protein Diversity of Osteichthyes

K. S. Dangi*, Geeta and S.N. Mishra
Department of Biosciences, Maharshi Dayanand University, Rohtak, 124001 Haryana (INDIA).
Centre for Biotechnology, Maharshi Dayanand University, Rohtak, 124001 Haryana (INDIA).

ABSTRACT
A tool that simultaneously aligns multiple protein sequences, automatically utilizes information about protein domains and offers a good compromise between speed and accuracy will have practical advantages over current tools. The study of Osteichthyes is vast; here, the mitochondrial cytochrome b enzyme was analysed for evolutionary relationships using COBALT. In most classification systems Osteichthyes (bony fishes) are paraphyletic unless land vertebrates are included, because tetrapods are among their descendants. The 75% majority-rule consensus tree for twenty-five freshwater bony fishes, built using BLASTp, shows that only a few species fall in the most primitive category and provide the lineage from which the other species evolved. COBALT provided the correlation and phylogenetic relationships of the cytochrome b protein sequences, showing that Ompok species, Siluriformes species, Kryptopterus apogon, Silurus glanis and Wallago attu have higher parsimony than the other species. The results show that COBALT has reasonable runtime performance and alignment accuracy, and can serve as a tool for a broad range of problems in phylogenetic analysis.

KEYWORDS: Osteichthyes, Phylogenetic analysis, Cytochrome b, COBALT

INTRODUCTION
The simultaneous alignment of multiple sequences (multiple alignment) serves as a building block in several fields of computational biology, such as phylogenetic studies [1], detection of conserved motifs [2], prediction of functional residues and secondary structure, prediction of correlations and even quality assessment of protein sequences [3].
The development of algorithms that can automatically produce biologically plausible multiple alignments is a subject of very active research [4]. Unfortunately, finding a multiple alignment that rigorously optimizes the commonly used 'sum-of-pairs' scoring measure is computationally hard and not practical when more than a few sequences are involved [5]. COBALT has a general framework that uses progressive multiple alignment to combine pairwise constraints from different sources into a multiple alignment. COBALT does not attempt to use all available constraints but uses only a high-scoring consistent subset that can change as the alignment progresses, where a set of constraints is called consistent if all of the constraints in the set can be simultaneously satisfied by a multiple alignment. Furthermore, the Conserved Domain Database (CDD) contains auxiliary information that allows COBALT to create partial profiles for input sequences before progressive alignment begins, which avoids computationally expensive procedures for building profiles.

In India, freshwater bony fishes are abundant, with about 111 species in the Krishna River. Indian freshwater Osteichthyes species include Labeo rohita (Rohu), Labeo calbasu (Calbasu), Catla catla (Catla) and Cyprinus carpio (carp). Labeo rohita includes about 29 species, Labeo calbasu about 28 species, Catla catla about 32 species and Cyprinus carpio about 21 species in the Krishna River of Karnataka. Labeo rohita, Labeo calbasu, Catla catla and Cyprinus carpio are common edible freshwater species. Bony fishes are classified based on comparative anatomy, embryology, genetics, molecular biology and the fossil record.
Cytochrome b is a highly conserved protein across the spectrum of species, found in plants, animals and many unicellular organisms. The cytochrome b molecule has been studied for the glimpse it gives into evolutionary biology [6, 7]. Genetic variation and specificity of cytochrome b, which vary with climatic conditions, confer genetic complexity, so cytochrome b can be taken as a suitable marker enzyme for phylogenetic analysis.

MATERIALS AND METHODS

Collection of DNA sequences
The sequences of cytochrome b were taken from the following sources: Amblyceps mangois, gi:110293492; Amblypharyngodon mola, gi:299832307; Bagarius yarrelli, gi:110293496; Barilius barna, gi:167887443; Barilius bendelisis, gi:300136784; Catla catla, gi:194173358; Chanda nama, gi:203287172; Channa punctata, gi:167887437; C. striata, gi:254055103; Devario aequipinnatus, gi:299832341; D. malabaricus, gi:289546477; Danio rerio, gi:261337155; Esomus danricus, gi:299832369; E. longimanus, gi:299832371; Wallago attu, gi:203287200, etc. These were used for phylogenetic analysis by selecting the gene of the respective species of bony fishes. Each protein sequence was converted to Pearson (FASTA) format for further analysis with the COBALT software. Sequences were retrieved from the protein database by a single-pass BLASTp search (e-value < 0.1) using the aligned sequences; as a result, 555 sequence hits passed the e-value cutoff. Sequences identical by 75% or more were clustered using the NCBI BLASTClust program (http://www.ncbi.nlm.nih.gov/blast/docs/blastclust.html).

BLAST
Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences [8]. The BLAST algorithm is heuristic and performs local alignments. It is available to users through the BLAST web service provided by NCBI.
There are several variants of the algorithm, each optimized for a particular type of query; BLASTp searches a protein database using a peptide query. BLAST works by first making a look-up table of all the "words" (short subsequences, which for proteins are three letters long by default) and "neighboring words", i.e. similar words, in the query sequence. The sequence database is then scanned for these "hot spots". When a match is identified, it is used to initiate gap-free and gapped extensions of the "word" [9].

Scoring a multiple alignment
The traditional method of scoring an alignment between a pair of sequences is to use a score matrix based on log-odds scores, such as the PAM or BLOSUM series of matrices. When there are more than a few sequences, the information content in a multiple alignment can be measured by its entropy. A small number of sequences, or correlations between residues, can have a harmful effect on an entropy-based scoring measure. Partitioning residues into a smaller number of residue classes can sometimes dampen this effect. The intuition used by Tharakaraman et al. [10] is to combine the ideas behind log-odds and entropy-based scoring.

Algorithm
COBALT is a flexible tool for simultaneously aligning a given set of protein sequences, where users can directly specify pairwise constraints and/or ask COBALT to generate the constraints using sequence similarity, (optional) CDD searches and (optional) PROSITE pattern searches. COBALT will optionally create partial profiles for input sequences based on any CDD search results. Aside from these features, the COBALT algorithm is similar to that of other progressive multiple alignment tools:
Step 1: Find alignments for generating constraints.
Step 2: Find partial profiles and a pairwise consistent set of constraints.
Step 3: Generate a guide tree.
Step 4: Create a multiple alignment using the current set of constraints and guide tree.
Step 5: Create bipartitions and realign.
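To make the idea of a "pairwise consistent set of constraints" in Step 2 more concrete, the following is a minimal sketch, not COBALT's actual implementation. It assumes a simplified, hypothetical representation in which each constraint ties one position of sequence A to one position of sequence B and carries a non-negative score; two such constraints are consistent when they do not cross. The function name `best_consistent_subset` and the tuple layout are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical representation: (position in A, position in B, score).
# COBALT itself works with aligned ranges across many sequences; single
# positions in one sequence pair are used here only to keep the sketch short.
Constraint = Tuple[int, int, float]


def best_consistent_subset(constraints: List[Constraint]) -> List[Constraint]:
    """Return a highest-scoring subset whose positions increase in both sequences.

    A set of point constraints between two sequences can be satisfied by one
    alignment only if no two constraints cross, i.e. both coordinates are
    strictly increasing. This is a maximum-weight chain problem, solved here
    with a simple O(n^2) dynamic programme.
    """
    ordered = sorted(constraints)          # sort by position in A, then B
    n = len(ordered)
    if n == 0:
        return []
    best = [c[2] for c in ordered]         # best[k]: top score of a chain ending at k
    prev = [-1] * n                        # back-pointers used to rebuild the chain
    for k in range(n):
        ik, jk, sk = ordered[k]
        for m in range(k):
            im, jm, _ = ordered[m]
            if im < ik and jm < jk and best[m] + sk > best[k]:
                best[k] = best[m] + sk
                prev[k] = m
    k = max(range(n), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(ordered[k])
        k = prev[k]
    return chain[::-1]


if __name__ == "__main__":
    example = [(1, 5, 2.0), (2, 3, 1.5), (4, 6, 1.0), (3, 4, 1.0), (5, 2, 3.0)]
    print(best_consistent_subset(example))
```

In this example the single constraint with the highest score, (5, 2, 3.0), is dropped because it crosses every other constraint, while three lower-scoring constraints that agree with one another are kept.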
Computing the multiple sequence alignment
The progressive multiple alignment does a depth-first traversal of the tree generated in Step 3. At each node of the tree, profiles are generated for both subtrees, and these profiles are aligned to produce a multiple alignment for all of the sequences. A variant of ordinary Needleman-Wunsch dynamic programming computes a global alignment of two profiles. The alignment process uses well-known techniques to reduce memory consumption [4] and includes two variations that are specific to profile alignments. The first variation is the choice of profile-profile score function [11].

Refinement by finding new constraints
The second refinement phase begins by finding conserved columns in the output from Step 5. A column in a multiple alignment is considered to be high scoring if its score exceeds a cutoff (set to 0.67 by default), and groups of at least two adjacent high-scoring columns are considered conserved. Iteration continues as long as the number of conserved columns increases. Before iterating, the set of constraints found in Step 2 is replaced by constraints that encompass alignment decisions based on conserved columns, pattern matches and user-specified pairs. COBALT uses an all-against-all collection of pairwise constraints to represent each group of conserved columns. Conserved columns may contain gaps, but sequences that contain gaps in a conserved column do not participate in pairwise constraints for that column. This exception allows conserved columns to be used for most profile-profile alignments, while generating pressure on slightly misaligned sequences to shift position.
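The rule for conserved columns described above (a score above the 0.67 cutoff, in runs of at least two adjacent columns) can be illustrated with a short sketch. It assumes the per-column scores have already been computed by some scoring function, which is not specified here; the function name `conserved_groups` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def conserved_groups(
    column_scores: List[float],
    cutoff: float = 0.67,   # default cutoff quoted in the text
    min_run: int = 2,       # at least two adjacent high-scoring columns
) -> List[Tuple[int, int]]:
    """Return (first, last) column indices of each conserved group.

    A column is high scoring when its score exceeds `cutoff`; a maximal run of
    adjacent high-scoring columns counts as conserved only if it contains at
    least `min_run` columns.
    """
    groups: List[Tuple[int, int]] = []
    start = None  # index where the current high-scoring run began, if any
    for idx, score in enumerate(column_scores):
        if score > cutoff:
            if start is None:
                start = idx
        else:
            if start is not None and idx - start >= min_run:
                groups.append((start, idx - 1))
            start = None
    # Close a run that extends to the final column.
    if start is not None and len(column_scores) - start >= min_run:
        groups.append((start, len(column_scores) - 1))
    return groups


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.68, 0.95, 0.99]
    print(conserved_groups(scores))
```

Here column 3 scores above the cutoff but stands alone, so it is not reported. An iterative refinement loop could compare the total number of columns covered by these groups between rounds and stop once that number no longer increases.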
RESULTS
BLASTp analysis based on the score matrix shows a close association among O. bimaculatus, O. pabda and Wallago attu, which appear widely separated from the other freshwater bony fishes (Fig. 1). The present work may therefore aid interspecific hybridization in bony fishes, and variation released through climate change may enrich their genetic diversity, although part of this pattern may be attributable to limitations of the BLASTp algorithm used. Although the relationships varied among species depending on the criteria used, for most traits the Ompok species and Wallago attu were very closely associated. The phylogenetic tree constructed from cluster analysis with the COBALT software has better precision (Fig. 2).

Fig. 1 BLASTp protein sequence score table. The output is a summary of sequences producing significant alignments, along with both normalized scores and E values (see text for further details; only the few highest-scoring hits are shown).

Fig. 2 Neighbor-joining phylogenetic tree for progressive alignment using COBALT. Here we show the same set of twenty-five sequences aligned according to a neighbor-joining tree.

An unrooted tree was therefore prepared from one discrete genetic variable for twenty-five species of bony fish. The findings revealed that a few species (Ompok pabda, Ompok bimaculatus, P. anguloides, Wallago attu, etc.) are more primitive than the others (Amblyceps mangois, Bagarius yarrelli, Catla catla, Labeo rohita, Danio rerio, etc.). Among these, the Ompok species and Wallago attu were found to be the most primitive, and the phylogenetic relationships among the bony fishes suggest a complex evolutionary trend (Fig. 3).

Fig. 3 Pairwise sequence alignment of bony fish, mitochondrial cytochrome b protein sequence analysis by BLASTp.
The phylogenetic tree shown here indicates the genetic diversity of bony fishes.

The minimum and maximum p-distances indicate that the cytochrome b enzyme is highly conserved in Ompok bimaculatus, Ompok pabda and Wallago attu, whereas others, such as Amblyceps mangois, Amblypharyngodon mola, Bagarius yarrelli, Catla catla, Labeo rohita and Channa striata, are quite divergent. The 75% majority-rule consensus tree for the twenty-five freshwater bony fishes is shown as a phylogram (Fig. 2). When the cytochrome b sequences of the twenty-five species are related to one another phylogenetically, Ompok species, Siluriformes species, Kryptopterus apogon, Silurus glanis and Wallago attu show higher parsimony than the other species.

DISCUSSION
The results show that COBALT has a better chance of producing a biologically meaningful multiple alignment than tools that do not utilize protein-domain and constraint information. Additional information, such as secondary structure alignments computed with recent algorithms [12] and short, highly conserved motifs found with de novo methods [13], may also be applicable. Progressive multiple alignment algorithms all have difficulty with highly divergent sequence inputs, and so COBALT may also benefit from incorporating alignment algorithms that explicitly process more than two sequences or sequence collections at a time [14]. Figure 2 shows that the sequences from twenty of the twenty-five freshwater bony fishes evolved considerably more slowly than those of the remaining five species. These twenty species formed one cluster apart from the other sequences, and the other five sequences evolved at nearly the same rate. Accordingly, the function of the cytochrome b enzyme may vary from species to species, possibly owing to climate change, and such change may enrich genetic diversity in bony fish species.
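The p-distance referred to in the Results is the proportion of aligned sites at which two sequences differ. The sketch below shows one common way to compute it from two rows of a multiple alignment, assuming gaps are written as "-" and that sites with a gap in either sequence are skipped (pairwise deletion). The function name `p_distance`, the gap convention and the short example strings are illustrative assumptions, not data or settings from this study.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are ignored (pairwise deletion),
    which is an assumption of this sketch rather than a documented setting.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped sites
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


if __name__ == "__main__":
    # Made-up toy fragments, not sequences from the paper.
    print(p_distance("MTNIRKTHPL", "MTNLRKTHPL"))
    print(p_distance("MT-IRK", "MTNLRK"))
```

A small p-distance between two species corresponds to a highly conserved pair of sequences, and a large value to a divergent pair.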
The vast majority of cytochrome b disorders are linked to mutations in nuclear-encoded proteins referred to as assembly factors, or assembly proteins. These assembly factors contribute to cytochrome b structure and functionality, and are involved in several essential processes, including transcription and translation of mitochondrion-encoded subunits, processing of preproteins and membrane insertion, and cofactor biosynthesis and incorporation [15]. Mutations in these proteins can result in altered functionality of sub-complex assembly, copper transport or translational regulation. Each gene mutation is associated with functional changes in species and ultimately leads to species diversity.

CONCLUSION
The results show that when the rate of evolution varies extensively from branch to branch, many methods may fail to recover the true topology. In the phylogenetic analysis of the cytochrome b enzyme from twenty-five freshwater bony fishes, the COBALT software aligned all the protein sequences, and the resulting phylogram shows the evolutionary relationships among all the species. From this comparison it is concluded that five species are more primitive than the other twenty freshwater bony fish species.

ACKNOWLEDGEMENT
We are grateful to the University Grants Commission, New Delhi, India, for providing a Meritorious Research Fellowship to K.S. Dangi.

REFERENCES
[1]. Fleissner, R. et al. (2005). Simultaneous statistical multiple alignment and phylogeny reconstruction. Syst. Biol., 54: 548-561.
[2]. Frith, M.C. et al. (2004). Finding functional sequence elements by multiple local alignment. Nucleic Acids Res., 32: 189-200.
[3]. Socolich, M. et al. (2005). Evolutionary information for specifying a protein fold. Nature, 437(7058): 512-518.
[4]. Edgar, R.C. (2004). MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5: 113.
[5]. Li, M. et al. (2000). Near optimal multiple alignment within a band in polynomial time.
In Proceedings of the 32nd ACM Symposium on Theory of Computing, pp. 425-434.
[6]. Ambler, R.P. (1991). Sequence variability in bacterial cytochromes c. Biochim. Biophys. Acta, 1058 (1): 42-47.
[7]. Silveira, P.C., Streck, E.L. and Pinho, R.A. (2005). Cellular effects of low power laser therapy can be mediated by nitric oxide. Lasers Surg. Med., 36 (4): 307-314.
[8]. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990). Basic local alignment search tool. J. Mol. Biol., 215: 403-410.
[9]. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25: 3389-3402.
[10]. Tharakaraman, K. et al. (2005). Alignments anchored on genomic landmarks can aid in the identification of regulatory elements. Bioinformatics, 21 (1): i440-i448.
[11]. Wang, G. and Dunbrack, R.L. (2004). Scoring profile-to-profile sequence alignments. Protein Science, 13: 1612-1626.
[12]. Zhou, H. and Zhou, Y. (2005). SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures. Bioinformatics, 21: 3615-3621.
[13]. Rigoutsos, I. and Floratos, A. (1998). Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm. Bioinformatics, 14: 55-67.
[14]. Zhang, X. and Kahveci, T. (2006). A new approach for alignment of multiple proteins. Pac. Symp. Biocomput., 11: 339-350.
[15]. Zee, J.M. and Glerum, D.M. (2006). Defects in cytochrome oxidase assembly in humans: lessons from yeast. Biochem. Cell Biol., 84: 859-869.

Correspondence to Author: K. S. Dangi, Department of Biosciences, Maharshi Dayanand University, Rohtak, 124001 Haryana (INDIA). Email: [email protected]