Searching for the diamonds in the ocean of sequencing reads

Maggie C.Y. Lau (Geosciences)

The biological sciences have faced a "Big Data" problem since the emergence of next-generation sequencing, which can generate terabytes of data in less than a week. The hundreds of millions of reads (nucleotide sequences) reveal which organisms are present in the studied samples, what they can do, what they actually do, and how. Next-generation sequencing has been applied to study human genomes, microbiomes, sewage treatment plants, air, soils, etc. My research employs this technology:

- To investigate the response of microorganisms in Arctic and Antarctic terrestrial systems to global warming
- To reveal the metabolic potential and activity of microorganisms that are analogs of life in early Earth history and on other planets
- To examine the role of microorganisms in biogeochemical cycles and their distribution patterns
- To discover the metabolic capabilities of yet-to-be-cultivated organisms

Research computing is necessary for processing this large quantity of data into useful information that addresses the scientific questions above. The procedure includes, but is not limited to, basic text manipulation, quality filtering, sequence assembly (i.e. joining short reads into longer contiguous sequences), gene prediction, sequence annotation (i.e. assigning taxonomic and functional identity) and phylogenetic analysis. The large input files alone demand a growing amount of storage space, and the memory required to analyze these complex datasets exceeds the capacity of a personal computer. Moreover, many commercial and open-source algorithms available for bioinformatics analyses are designed to use multiple processors to shorten the run time of memory-hungry and time-consuming tasks.
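As an illustration of the quality-filtering step in the procedure described above, the following is a minimal sketch in Python, not the pipeline used in this research. It assumes reads are stored as four-line FASTQ records with Phred+33 quality encoding; the function names (parse_fastq, mean_phred, filter_reads) and the thresholds are hypothetical and chosen only for the example. Production work would normally rely on dedicated, multi-threaded tools rather than a script like this.

```python
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record as (read name, sequence, quality string).
# Assumption: qualities use Phred+33 encoding.
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from four-line FASTQ text; blank lines are skipped.

    A trailing incomplete record (fewer than four lines) is ignored.
    """
    stripped = [ln.strip() for ln in lines if ln.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record number {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Return the mean Phred score of a quality string (0.0 if empty)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records: Iterable[Record],
                 min_mean_q: float = 20.0,
                 min_len: int = 50) -> List[Record]:
    """Keep reads that are long enough and have a high enough mean quality.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    return [
        (name, seq, qual)
        for name, seq, qual in records
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q
    ]


if __name__ == "__main__":
    # Two toy reads: r1 has uniformly high quality ('I'), r2 uniformly low ('!').
    toy = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "!!!!"]
    kept = filter_reads(parse_fastq(toy), min_mean_q=20.0, min_len=4)
    print([name for name, _, _ in kept])
```

With real datasets of hundreds of millions of reads, a filter like this would be streamed from disk and split across processors, which is where access to research computing resources becomes essential.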
Access to high-performance computing resources has enabled multiple bioinformatics projects to run in parallel and analytical approaches to be customized for each project. These projects have yielded significant findings that advance our knowledge in the areas listed above. Bioinformatics is an indispensable tool in all biological sciences. Research computing is expected to play a critical role in the success of our research in geomicrobiology, as well as in the related courses. Our demand for high-speed data transfer and large-scale computing resources will continue to grow in the coming years.