Abstract - National Computational Infrastructure

IN53A-3798: The NCI High Performance Computing
(HPC) and High Performance Data (HPD) Platform to
Support the Analysis of Petascale Environmental Data
Collections (invited)
The National Computational Infrastructure (NCI) has co-located a
priority set of national data assets within an HPC research platform. This
powerful in-situ computational platform has been created to help serve
and analyse the massive amounts of data across the spectrum of
environmental collections – in particular the climate, observational data
and geoscientific domains. This paper examines the infrastructure,
innovation and opportunity for this significant research platform.
NCI currently manages nationally significant data collections (10+ PB)
categorised as 1) earth system sciences, climate and weather model
data assets and products, 2) earth and marine observations and
products, 3) geosciences, 4) terrestrial ecosystem, 5) water
management and hydrology, and 6) astronomy, social science and
biosciences. The data is largely sourced from the NCI partners (who
include the custodians of many of the national scientific records), major
research communities, and collaborating overseas organisations. Co-locating
these large, valuable data assets has opened new opportunities to harmonise
the data collections, creating a powerful transdisciplinary
research platform.
The data is accessible within an integrated HPC-HPD environment: a
1.2 PFlop supercomputer (Raijin), an HPC-class 3000-core OpenStack
cloud system and several highly connected, large-scale, high-bandwidth
Lustre filesystems.
New scientific software, cloud-scale techniques, server-side
visualisation and data services have been harnessed and integrated
into the platform, so that analysis is performed seamlessly across the
traditional boundaries of the underlying data domains. Characterisation
of the techniques along with performance profiling ensures scalability of
each software component, all of which can either be enhanced or
replaced through future improvements.
A Development-to-Operations (DevOps) framework has also been
implemented to manage the sheer scale of the software complexity.
This ensures that software is both upgradable and maintainable,
can be readily reused within complex integrated systems, and can become
part of the growing set of globally trusted community tools for
cross-disciplinary research.
Authors
Ben Evans
Australian National University
Tim Pugh
Bureau of Meteorology
Lesley Wyborn
Australian National University
David Porter
Australian National University
Chris Allen
Australian National University
Jon Smillie
Australian National University
Joseph Antony
Australian National University
Claire Trenham
Australian National University
Bradley Evans
Macquarie University
Duan Beckett
Bureau of Meteorology
Tim Erwin
Commonwealth Scientific and Industrial Research Organisation - CSIRO
Edward King
CSIRO Marine and Atmospheric Research Hobart
Jonathan Hodge
Commonwealth Scientific and Industrial Research Organisation - CSIRO
Robert Woodcock
CSIRO Land and Water Canberra
Ryan Fraser
CSIRO
David Lescinsky
Geoscience Australia