Studies On The Role Of Protein Structural Disorder On The Evolutionary Features Of Prokaryotic And Eukaryotic Genomes

THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY (Sc.) IN BIOPHYSICS, MOLECULAR BIOLOGY AND BIOINFORMATICS

By
Arup Panda
Department of Biophysics, Molecular Biology and Bioinformatics
University of Calcutta
2015

Declaration

I do hereby declare that the dissertation entitled "Studies On The Role Of Protein Structural Disorder On The Evolutionary Features Of Prokaryotic And Eukaryotic Genomes", submitted to the evaluation committee, is a record of original research done by me under the guidance of Professor Tapash Chandra Ghosh, Bioinformatics Center, Bose Institute, Kolkata. I certify that this work contains no material that has been accepted for the award of any other degree or diploma in my name in any university or other tertiary institution, and that no part of this work will in future be used for any other degree or diploma in any university or other tertiary institution without prior approval from the committee. I further declare that all the results presented in this thesis, whether as statements, tables or figures, are the outcome of my own research, and that this thesis contains no material previously published or written by any other person, except where due reference is given in the appropriate context.

Date:
Place: Kolkata
(Arup Panda)

Acknowledgements

I would like to express my deepest sense of gratitude to my revered supervisor, Dr. Tapash Chandra Ghosh, Professor, Bioinformatics Center, Bose Institute, Kolkata, for allowing me to pursue my Ph.D. under his guidance. His unfeigned care, spontaneous help, valuable suggestions and constant encouragement have played a key role in the timely completion of this dissertation. Further, I owe my sincere gratitude to Professor Pinakpani Chakraborty, Head of the Department, BIC, Bose Institute, and to the other faculty members of BIC, Bose Institute: Dr. Sudipto Saha, Dr. Subhra Ghosh Dastidar and Dr. Jhumur Ghosh. I would like to take this opportunity to thank all my respected teachers of the Department of Biophysics, Molecular Biology and Bioinformatics and the Department of Biochemistry, Calcutta University. I am indebted to all my seniors, lab members and staff members of BIC, Bose Institute. Finally, I would like to express my gratitude to my parents and my family members.

Abstract

Intrinsically disordered proteins (IDPs) are a class of proteins that lack stable three-dimensional structures under physiological conditions. IDPs have been found to play intriguing roles in cellular signalling, regulation of cell division, and transcriptional and translational control. Owing to their extensive functional importance, particular interest has been paid to the roles of IDPs in the origin and evolution of prokaryotic and eukaryotic systems. In this thesis, we explored the role of disordered proteins in various evolutionary features of prokaryotic and eukaryotic genomes. First, we analyzed whether disordered proteins have been exploited for microbial adaptation to the aerobic environment. Our analysis of prokaryotes from four oxygen-requirement groups revealed that aerobic proteomes contain a high proportion of disordered residues, irrespective of selection on any other genomic or proteomic attribute. We analyzed the functional significance of disordered proteins in aerobic proteomes and proposed that high protein disorder is an adaptive opportunity for aerobic microbes to cope with the genomic and functional complexities of the aerobic lifestyle. Based on the inherent differences in genome organization between cold- and warm-blooded vertebrates, it was previously proposed that warm-blooded vertebrates had undergone a significant increase in GC content.
This genome transition was thought to increase the thermodynamic and structural stability of proteins through a selective increase in protein hydrophobicity. However, our study showed that the GC transition between vertebrate genomes increases the disorder content of warm-blooded proteomes, promoting functional diversity among proteins encoded by GC-rich genes. To evaluate how disordered residues influence the evolution of human disease genes, we analysed the evolutionary rates of genes associated with human neurodegenerative diseases (NDDs). We observed that human NDD genes are more evolutionarily conserved than non-disease genes. To explain the conserved nature of NDD genes, we examined several determinants of protein evolutionary rate, such as protein connectivity, 3′ UTR length, relative aggregation propensity (RAP) and the nature of hub proteins (singlish- or multi-interface). The relative importance of these determinants was confirmed by categorical regression analysis. Our investigation clarifies the role of protein disorder content in the evolutionary features of NDD genes and explores its interconnection with the other determinants of protein evolutionary rates.
Abbreviations

ADDA: Automatic Domain Decomposition Algorithm
ANOVA: ANalysis Of VAriance
AUC: Area Under the Curve
BKL: Biobase Knowledge Library
CAI: Codon Adaptation Index
CBP: CREB-Binding Protein
CD: Circular Dichroism spectroscopy
CH-Plot: Charge-Hydrophobicity Plot
COA: COrrespondence Analysis
COG: Clusters of Orthologous Groups
dN/dS: ratio of non-synonymous to synonymous substitution rates (evolutionary rate)
dN: Non-synonymous substitution rate
dS: Synonymous substitution rate
FTIR: Fourier Transform Infrared spectroscopy
GAD: Genetic Association Database
gBGC: GC-Biased Gene Conversion
GC3: GC content at the third codon position
GO: Gene Ontology
HGMD: Human Gene Mutation Database
HMM: Hidden Markov Model
IDPs: Intrinsically Disordered Proteins
ITC: Isothermal Titration Calorimetry
IUPs: Intrinsically Unstructured Proteins
KID: Kinase-Inducible transcriptional activation Domain
LHGR: Long Homogeneous Genome Regions
MCC: Matthews Correlation Coefficient
MD: Molecular Dynamics
NDD: Neurodegenerative Disease
OMIM: Online Mendelian Inheritance in Man
PDB: Protein Data Bank
PIR: Protein Information Resource
PSRC: PhotoSynthetic Reaction Center
RAP: Relative Aggregation Propensity
RNaseA: RiboNuclease A
ROC: Receiver Operating Characteristic
RSCU: Relative Synonymous Codon Usage
SANS: Small-Angle Neutron Scattering
SAXS: Small-Angle X-ray Scattering
smFRET: single-molecule Fluorescence Resonance Energy Transfer
tAI: tRNA Adaptation Index

Contents

1. Chapter 1: General introduction
1.1. Preface
1.2. Early history of protein structure determination and discovery of disordered proteins
1.3. Amino acid composition bias of disordered proteins
1.4. Nucleotide composition of disordered proteins
1.5. Characterization of intrinsically disordered proteins
1.6. Identification of disordered regions by CH-plot analysis
1.7. Identification of disordered regions by prediction algorithms
1.8. Abundance of disordered proteins
1.9. Databases of disordered proteins
1.10. Functional annotations of disordered proteins
1.11. Disordered proteins and disease association
1.12. Evolution and disordered proteins
1.13. Origin of the proposal
1.14. General organization of the thesis
2. Chapter 2: Resources and Methods
2.1. NCBI database
2.2. Ensembl database
2.3. UCSC genome browser
2.4. Gene Expression Atlas
2.5. Human disease gene databases
2.6. MicroRNA.org database
2.7. BioGRID database
2.8. Pfam database
2.9. UniProt and DBD databases
2.10. AgBase GOanna web server
2.11. IUPred algorithm
2.12. FoldIndex
2.13. ANCHOR algorithm
2.14. NuPoP web server
2.15. CpG Island Searcher
2.16. TANGO algorithm
2.17. CodonW
2.18. ClustalW
2.19. BLAST
2.20. Statistical analysis
3. Chapter 3: Prevalent structural disorder carries signature of prokaryotic adaptation to oxic atmosphere
Chapter summary
3.1. Introduction
3.2. Methods
3.2.1. Collection of dataset
3.2.2. Prediction of disordered residues
3.2.3. Calculation of GC content and amino acid frequencies
3.2.4. Disorder content of aerobic and anaerobic COGs
3.2.5. Prediction of disordered binding sites and transcription factors
3.2.6. Statistical analysis
3.3. Results
3.3.1. Predicted protein disorder in prokaryotic genomes
3.3.2. High protein disorder in aerobic prokaryotes and other covariates
3.3.3. High protein disorder in aerobic prokaryotes and their functional implications
3.4. Discussion
3.5. Conclusions
4. Chapter 4: GC-made protein disorder sheds new light on vertebrate evolution
Chapter summary
4.1. Introduction
4.2. Results
4.2.1. Compositional transition within vertebrates: effects on protein intrinsic disorder content
4.2.2. Correlation between GC and protein disorder: significance of amino acid choice
4.2.3. Confounding factors that can modulate protein intrinsic disorder in transition and non-transition groups
4.2.4. Disorder content evolution in human proteins: functional advantages
4.3. Discussion
4.3.1. Potential caveats
4.4. Conclusion
4.5. Materials and methods
4.5.1. Collection of dataset
4.5.2. Prediction of protein intrinsic disorder content
4.5.3. Mapping of genes to their corresponding isochores
4.5.4. Analysis of nucleosome occupancy and CpG islands between ordered and disordered regions
4.5.5. Association between GC-changing substitutions and disorder-promoting amino acid mutations
4.5.6. Analysis of multifunctionality
4.5.7. Analysis of aggregation propensity and hydrophobicity
4.5.8. Statistical analysis
5. Chapter 5: Insights into the Evolutionary Features of Human Neurodegenerative Diseases
Chapter summary
5.1. Introduction
5.2. Materials and Methods
5.2.1. Dataset Preparation for Evolutionary Rate Estimation
5.2.2. Determining Gene Expression Level and Expression Width
5.2.3. Protein-Protein Interaction Data
5.2.4. Identification of Nature of Hub Proteins
5.2.5. microRNA Targeting and 3′ UTR Length Calculation
5.2.6. Estimation of Protein Disorder Content
5.2.7. Computing Protein Relative Aggregation Propensity (RAP)
5.2.8. Statistical Analyses
5.3. Results
5.3.1. Gene Expression Level Constraining the Evolutionary Rates of NDD Genes
5.3.2. Examining Protein Connectivity and miRNA Targeting as Influential Factors of Protein Evolutionary Rates
5.3.3. Protein Intrinsic Disorder Content and Nature of Hub Proteins as the Functions of Protein Evolutionary Rates
5.3.4. Relative Aggregation Propensity Negatively Steers Protein Evolutionary Rates
5.3.5. Independent Forces of Protein Evolutionary Rates Using Categorical Regression Model
5.4. Discussion
6. Chapter 6: Summary and general conclusions
7. References
8. Publications
8.1. List of publications
8.2. Reprints