
CASE FOR SUPPORT: e-Science Round 2
Grid ENabled Integrated Earth system model (GENIE):
A Grid-based, modular, distributed and scaleable Earth
System Model for long-term and paleo-climate studies
Summary
Whole Earth System modelling requires the integration of a number of specialised components. Current
computing technologies are not well suited for constructing, executing and effectively utilising such a
model. However, the Grid and associated component-based application construction techniques should
provide a natural solution. To achieve this, a structured, multi-disciplinary and multi-institutional
collaboration is needed for model development and use, and to share the large volumes of output data
from integrated simulation runs. We propose a challenging use of the Grid to unify widely distributed UK
expertise and to generate a new kind of Earth System Model (ESM). Our scientific focus is on long-term
and paleo-climate change, especially through the last glacial maximum (~20kyr BP) to the present
interglacial, and the future long-term response of the Earth system to human activities. A realistic ESM
for this purpose must include models of the atmosphere, ocean, sea-ice, marine sediments, land surface,
vegetation and soil, ice sheets and the energy, biogeochemical and hydrological cycling within and
between components.
We propose to develop, integrate and deploy a Grid-based system which will allow us: (i) to flexibly
couple together state-of-the-art components to form a unified ESM, (ii) to execute the resulting ESM on
the Grid, (iii) to share the distributed data produced by simulation runs, and (iv) to provide high-level open
access to the system, creating and supporting virtual organisations of Earth System modellers. The project
will deliver both a flexible Grid-based architecture, which will provide substantial long-term benefits to
the Earth system modelling community (and others who need to combine disparate models into a coupled
whole), and also new scientific understanding from versions of the ESM generated and applied in the
project. The components will be supplied by recognised centres of excellence at Reading, SOC, UEA,
CEH and Bristol (all university departments being graded 5 or 5* in the 2001 RAE). The Grid-based
architecture will leverage significant ongoing activity and experience in the e-Science centres at
Southampton and Imperial College (both 5*). The project will fill important gaps in an emerging
spectrum of Earth System Models, and represents a rare example of using the Grid for a truly multidisciplinary modelling activity.
Technological Challenge (Grid Stretch)
Earth system science is by its nature interdisciplinary. No conventional disciplinary institute is capable of
delivering all the expertise necessary to develop a complete Earth system model. There are two
approaches to solving this problem. The ‘conventional’ approach is to form a new interdisciplinary
research institute and transfer expertise (e.g. Potsdam Institute for Climate Impact Research, PIK).
Alternatively, the Grid enables the creation of a virtual organisation (Foster et al. 2001) that links the
necessary resources and expertise and facilitates the sharing of results that are obtained. This new
approach has the advantages of lower cost and greater flexibility. Participants continue to work within
centres of excellence in their disciplines, thus benefiting from access to a wide base of specialist
knowledge. Furthermore, new ideas and disciplines can be engaged and integrated with comparative ease.
To realise this new approach we will significantly stretch and extend existing Grid technologies to:
• Encapsulate existing state-of-the-art, computationally efficient models as components, enabling them
to be coupled together effectively to produce a unified Earth system model.
• Provide efficient execution strategies for such a system, ranging from a single run at a single location
to executions automatically distributed and coordinated across physically distributed resources.
• Develop user-level access to such a system that will enable Earth system scientists to explore varying
scenarios and perform modelling experiments without needing to be concerned with the low-level
details of the models employed or of their implementation.
• Provide a framework to collaboratively share and post-process all the distributed data produced by
such simulations.
GENIE Scenario
A simplified example of how the GENIE system we envisage will be used is as follows: The system will
be accessed via a portal, which will allow a user to compose, execute and analyse the results from an
Earth system simulation. After authenticating with the portal, the user will have access to a
library of components that can each model different aspects of the Earth system (for example, ocean,
atmosphere) at different resolutions. Intelligent selection from the library is made possible by reference to
metadata supplied by the component author. The selected components, along with suitable mesh
conversion tools to allow data exchange at model boundaries, other data necessary to initialise the model
and an event queue to sequence the data exchange between the components and specify how often to
archive data, are composed. From this an intelligent meta-scheduler determines the resource requirements
and maps the processing required to a distributed Grid of compute resources using middleware such as
Globus and Condor. At runtime each component produces distributed data, which can be monitored
during execution and is also archived automatically as specified by the user. From the portal it is possible
to browse this archive of results using post-processing visualization tools and re-use results from the
archive to seed new calculations.
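To make this scenario concrete, the sketch below shows, in illustrative Java, how a portal back-end might assemble a composition from selected components and the converters and exchange schedule that join them. All class and method names here (ComponentDescriptor, Coupling and so on) are hypothetical placeholders for illustration, not existing GENIE code; in the full system the composition would also carry initialisation data and an archiving policy and be handed to the meta-scheduler.

```java
// Illustrative sketch only: hypothetical types standing in for the GENIE portal's
// composition step (component library -> coupled configuration -> scheduler).
import java.util.ArrayList;
import java.util.List;

class ComponentDescriptor {
    final String name;      // e.g. "EMBM atmosphere"
    final String gridSpec;  // e.g. "18x18 lon-lat" (taken from the component's meta-data)
    ComponentDescriptor(String name, String gridSpec) {
        this.name = name;
        this.gridSpec = gridSpec;
    }
}

class Coupling {
    final ComponentDescriptor from, to;
    final String converter;        // mesh/unit conversion applied at the domain boundary
    final int exchangeIntervalHrs; // how often the event queue triggers a data exchange
    Coupling(ComponentDescriptor from, ComponentDescriptor to, String converter, int hrs) {
        this.from = from; this.to = to; this.converter = converter; this.exchangeIntervalHrs = hrs;
    }
}

public class ComposeSimulation {
    public static void main(String[] args) {
        // 1. Select components from the library (normally driven by their meta-data).
        ComponentDescriptor atmos = new ComponentDescriptor("EMBM atmosphere", "18x18 lon-lat");
        ComponentDescriptor ocean = new ComponentDescriptor("Frictional geostrophic ocean", "18x18 lon-lat, 8 levels");

        // 2. Wire them together with converters and an exchange schedule.
        List<Coupling> couplings = new ArrayList<>();
        couplings.add(new Coupling(atmos, ocean, "surface-flux regridder", 24));
        couplings.add(new Coupling(ocean, atmos, "SST regridder", 24));

        // 3. In the real system this composition would be passed to the meta-scheduler;
        //    here we simply print it.
        for (Coupling c : couplings) {
            System.out.printf("%s -> %s via %s every %d h%n",
                    c.from.name, c.to.name, c.converter, c.exchangeIntervalHrs);
        }
    }
}
```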
How does the Grid activity enable the science?
The Grid-enabled, component-based open modular framework that we propose will provide unique
capabilities to Earth system scientists, enabling, for the first time, realistic whole Earth system simulations
at a range of spatial and temporal resolutions to be constructed, executed and analysed. The Grid activity
will enable the science in the following ways:
• It is the best way to construct a holistic model able to incorporate all sub-systems thought to be
capable of influencing long-term and paleo-climatic change.
• It allows the system to be used in a variety of scenarios without recoding or internal modification.
• It supports an open community of Earth system scientists, enabling new models to be incorporated
into the framework and existing models combined in a variety of ways to test alternative hypotheses.
• It is the most cost-effective way of achieving the computing power required to perform long-term
(multi-millennial) simulations of the complete Earth system at moderate resolution, or for parameter-space studies of shorter timescales.
• It will facilitate the collaborative sharing of the data produced by distributed simulations for the
benefit of the whole community.
Scientific Research Challenge
The scientific driver for this project is to
understand the astonishing and, as yet,
unexplained natural variability of past climate in
terms of the dynamic behaviour of the Earth as a
whole system. Such an understanding is an
essential pre-requisite to increase confidence in
predictions of long-term future climate change.
The figure on the left shows the changes in carbon
dioxide, temperature and methane over the last
four glacial cycles recorded in the Vostok ice core
(Petit et al. 1999). The causes of these major
glacial-interglacial cycles that have dominated the
past few million years of Earth history remain highly uncertain. However, it is clear that changes in many
components of the Earth system appear to have amplified rather weak orbital forcing. These include: land
ice, sea ice and vegetation cover affecting Earth’s albedo (reflectivity), CO2, CH4 and water vapour
affecting the ‘greenhouse effect’, and ocean circulation affecting heat transport. Previous modelling and
data studies (e.g. Imbrie et al. 1993, Shackleton, 2000, Berger et al. 1998) have revealed that non-linear
feedbacks are important, and that these feedbacks extend beyond the physical subsystem to include
biological and geochemical processes. For example, changes in the marine carbon cycle (Watson et al.
2000) and terrestrial vegetation cover (de Noblet et al. 1996, Claussen et al. 1999) are fundamental
contributors to past climate change.
Hence our working hypothesis is that realistic simulations of long-term climate change require a
complete Earth system model that includes, as a minimum, components representing the atmosphere,
ocean, sea-ice, marine sediments, land surface, vegetation, soil, and ice sheets and the energy,
hydrological and biogeochemical cycling within and between components. The model must be capable of
integration over multi-millennial time-scales. The design of the system will allow other components, such
as atmospheric chemistry, to be added at a later stage.
At present, state-of-the-art models of the essential components of the Earth climate system exist mostly as
separate entities. Where several components have been coupled, as in the more elaborate versions of the
Hadley Centre model (Cox et al. 2000), they are computationally too demanding for long-term or
ensemble simulations. Conversely, existing efficient models of the complete system (Petoukhov et al.
2000) employ highly idealised models of the individual components, with reduced dimensionality and
low spatial resolution.
Our objectives are to build a model of the complete Earth system which is capable of numerous long-term
(multi-millennial) simulations, using components which are traceable to state-of-the-art models, are
scaleable (so that high resolution versions can be compared with the best available, and there is no barrier
to progressive increases in spatial resolution as computer power permits), and modular (so that existing
models can be replaced by alternatives in future). Data archiving, sharing and visualisation will be
integral to the system. The model will be used to quantitatively test hypotheses for the causes of past
climate change and to explore the future long-term response of the Earth system to human activities.
All of the necessary component models have already been developed within the NERC community
(representing a considerable investment of resources) and by our collaborators at the Hadley Centre.
Further work will be required to produce compatible, computationally efficient components for Grid
coupling and to represent the hydrological and biogeochemical cycling within and between components.
Our initial scientific focus will be on one
fundamental transition of the Earth system:
from the last glacial maximum to the present
interglacial warm period (the Holocene). This
interval has been chosen because it
encapsulates both gradual and rapid climate
changes, and high-resolution data records
exist against which to test the model.
The figure on the left is a high resolution
snow accumulation and temperature record
from the Greenland ice core, showing the rich
behaviour of the Earth system during the last deglaciation (Kapsner et al. 1995).
Specifically, we will use the Earth System model to investigate:
• The timing of the Bolling-Allerod warm phase: General Circulation Model (GCM) based
simulations using a simple ocean model suggest that this warming occurs earlier than would be
expected from orbital theory alone.
• The magnitude and extent of the Younger-Dryas cold phase, and the anti-phase climate variations
recorded in Antarctic and Greenland ice cores (Blunier et al. 1998): the links between the
hemispheres and the degree to which the Younger-Dryas extended beyond the Atlantic remain
uncertain.
• The changes in vegetation and carbon storage during the Holocene (Claussen et al. 1999).
• The minimum complexity (in terms of system components, processes within components, and
resolution) required to simulate these changes in the system.
• The predictability (or otherwise, due to chaotic behaviour) of the fully coupled system: will small
changes in initial conditions result in major changes to the glacial-interglacial transition?
• The robustness of predictions of carbon cycle feedback on global warming (Cox et al. 2000;
Lenton 2000), and long-term (multi-millennial) projections of climate change and carbon cycling
(Archer et al. 1998).
How the research underpins NERC’s broader vision of Earth System Science
The draft NERC Science and Innovation Strategy document explicitly discusses the need to understand
the behaviour of the Earth system revealed by the Vostok ice core, and highlights the need for a new
Earth System science approach. Our project will provide the innovative methodologies desired, and
address many of the detailed research needs in the Key Science Themes of Climate Change and
Biogeochemical Cycles. Our aim is to help realise the NERC vision for the development of coupled
Earth System Models. GENIE will provide an ideal tool for investigating rapid changes in climate such
as the Dansgaard-Oeschger and Heinrich events during the ice ages. This is one focus of the RAPID
climate change thematic programme and the model will be available in the latter stages of that
programme. GENIE will provide a superior alternative to the climate models currently used in integrated
assessment models. The development of an Integrated Assessment Model is a core programme of the
Tyndall Centre. The manager of that programme, Dr. Jonathan Koehler, has agreed to collaborate.
GENIE will contribute to the new programme on Quantifying the Earth System (QUEST), especially the
hierarchy of coupled models and the proposed ‘virtual laboratory’. Grid training is an important
component of this project and it will produce a group of Grid-aware environmental scientists.
Methodology and Detailed Plan of Research
In the following plan of research we identify specific work-packages, e.g. “EMBM1”; the timetable
for executing these is given in Appendix 1.
Steps toward a complete model
In GENIE, computational ‘components’ will correspond to models of ‘components’ of the Earth system.
These components will be developed from existing code, with the addition of meta-data that enables their
flexible interfacing. GENIE will provide a methodology for coupling components in order to test
hypotheses concerning the processes and feedback mechanisms that are important for long-term changes
to the environment. The system will be flexible and allow users to add new components and evaluate their
importance within the whole Earth System. However, to develop such a system we must start from a set
of exemplar components. These components have been chosen because they (i) satisfy the scientific aims
of the proposal for a model capable of simulating long-term change, (ii) have physical representations that
can be directly traced to more complex components used within General Circulation Models, (iii) already
exist and are well tested so that this proposal can focus on their coupling and Grid-enabling.
The development programme is structured around three milestones:
Component Set 1 (“GENIE-Trainer”, 12 months):
To enable a rapid development of the key Grid techniques, a simple set of model components will be
made available at the start of the proposal. These will be the appropriate basis for the development and
implementation of the basic set of Grid technologies and facilitate training of environmental scientists in
the techniques required to Grid-enable other components in the system. The initial model will consist of
just two components: (a) an energy-moisture balance atmosphere model coupled to (b) a 3-D ocean model
at very low-resolution. This 2-part model was developed as part of the NESMI (NERC Earth System
Modelling Initiative), is available now, and will provide a simple core test-bed for using the Grid.
Component Set 2 (“GENIE-Mini”, 18 months):
The next step will be to couple (b) the 3D ocean component to (c) a 3D planetary wave atmosphere, (d)
the land surface, and (e) sea-ice. This will require scientific effort defining the coupling method between
the atmosphere, land surface and sea-ice components. The 3D ocean component will be the same as used
in component set 1 but at higher resolution, and with the inclusion of (f) marine biogeochemistry.
Component set 2 will also help us learn about the necessary methods and implications of flexibility,
modularity and scalability because we will have two alternative atmosphere components (energy-moisture
balance and 3D planetary wave atmosphere), and two alternative resolutions of the ocean. Much of the
initial assembly and interfacing of the atmosphere, ocean and sea-ice will have been undertaken as part of
a recently funded NERC COAPEC project (NER/T/S/2001/00191).
Component Set 3 (“GENIE-Grid”, 30 months):
The final step will be to couple the remaining components, (g) marine sediments, and (h) ice-sheets, and
include more complete representations of biogeochemical and hydrological cycling. This will require
scientific and technological effort in defining and achieving the asynchronous coupling of ice sheets and
marine sediments to the rest of the model.
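The asynchronous coupling pattern itself is simple to state: the fast components (atmosphere, ocean, sea-ice) are integrated for many of their own steps between each step of a slow component (ice sheet or sediments), exchanging time-averaged fields. The sketch below illustrates this in schematic Java; the interfaces, field names and step ratios are illustrative assumptions for exposition, not GENIE code.

```java
// Schematic illustration of asynchronous coupling between fast components
// (atmosphere/ocean, short time step) and a slow component (ice sheet or sediments,
// long time step). All interfaces and numbers here are illustrative assumptions.
public class AsynchronousCoupling {

    interface FastComponent {
        void step(double slowState);   // advance one short step given the current slow-component state
        double surfaceClimate();       // field the slow component needs (e.g. surface temperature)
    }

    interface SlowComponent {
        void step(double meanClimate); // advance one long step given time-averaged forcing
        double state();                // field fed back to the fast components (e.g. ice extent)
    }

    static void run(FastComponent fast, SlowComponent slow,
                    int slowSteps, int fastStepsPerSlowStep) {
        double slowState = slow.state();
        for (int s = 0; s < slowSteps; s++) {
            double accumulated = 0.0;
            // Integrate the fast components, accumulating the forcing the slow one needs.
            for (int f = 0; f < fastStepsPerSlowStep; f++) {
                fast.step(slowState);
                accumulated += fast.surfaceClimate();
            }
            // Hand the slow component a time-averaged climate, then update its state.
            slow.step(accumulated / fastStepsPerSlowStep);
            slowState = slow.state();
        }
    }
}
```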
Details of Earth system modelling components
The following components will be integrated in various realisations of the GENIE model. All components
are comparable to, or significant advances over, most existing intermediate complexity models.
(a) Energy-moisture balance atmosphere (EMBM): A standard 2-D diffusive model of atmospheric heat
and moisture transport, incorporating radiation and bulk transfer formulae for air-sea and air-land surface
fluxes of heat and moisture (Weaver et al. 2001) will be used in the “GENIE-Trainer”. SOC will isolate
the source code for this component (currently tied to the ocean) and add meta-data (EMBM1).
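For orientation, the balance such a model integrates can be written schematically as below. This is only the generic form of a diffusive energy-moisture balance atmosphere, with our own grouping and naming of the flux terms; the component itself uses the specific coefficients and bulk formulae of Weaver et al. (2001).

```latex
% Schematic (generic) form of a 2-D energy-moisture balance atmosphere:
% heat and moisture budgets with diffusive transport plus radiative and surface source terms.
% T_a, q_a: near-surface air temperature and specific humidity; C_a, h_q: effective heat and
% moisture capacities; kappa_T, kappa_q: diffusivities; alpha: albedo; E, P: evaporation, precipitation.
\begin{align*}
  C_a \frac{\partial T_a}{\partial t} &= \nabla\!\cdot\!\big(C_a\,\kappa_T \nabla T_a\big)
      + Q_{\mathrm{SW}}(1-\alpha) - Q_{\mathrm{LW}} + Q_{\mathrm{sens}} + Q_{\mathrm{lat}}, \\
  h_q \frac{\partial q_a}{\partial t} &= \nabla\!\cdot\!\big(h_q\,\kappa_q \nabla q_a\big)
      + E - P.
\end{align*}
```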
(b) 3-D Ocean (OCEAN): A 3-D, non-eddy-resolving, frictional geostrophic model (Edwards et al 1998,
Edwards and Shepherd in press) will be used throughout. This allows much longer time-steps than a
conventional ocean GCM by neglecting acceleration and momentum transport, to obtain the large-scale,
long-term circulation only. More than 5 man-years of effort have been invested in developing this model.
It was coupled to the energy-moisture balance atmosphere as part of NESMI (R. Marsh, SOC). SOC will
isolate a coarse-resolution version (18x18 longitude-latitude grid points and 8 depth levels) and add metadata (OCEAN1) for inclusion in the “GENIE-Trainer”. A variable-resolution version with more
sophisticated coupling to the 3-D atmosphere will be produced for “GENIE-Mini” (OCEAN2).
(c) 3-D Atmosphere (ATMOS): A 3-D, non-transient-eddy-resolving, stationary wave model (Valdes &
Hoskins 1989) will be used in “GENIE-Mini” and “GENIE-Grid”. This uses the same equation set as in a
conventional atmospheric GCM but allows a much longer time-step (~1 month rather than 30 minutes),
by parameterising baroclinic instability. This model was developed during a 3-year NERC-funded project
and subsequently used in several other NERC and EU projects. Moisture transport, clouds and
precipitation are being included using conventional methods, in a project funded by COAPEC
(NER/T/S/2001/00191). Reading will isolate the code, help define coupling to the ocean, land surface and
sea-ice and add appropriate meta-data for inclusion in “GENIE-Mini” (ATMOS1). Then a variable-resolution version will be enabled and scaleability issues addressed (ATMOS2).
(d) Land surface, hydrology and biogeochemistry (LAND): A simplified version of the MOSES land-surface scheme (Cox et al. 1999), already developed by P. M. Cox of the Hadley Centre, will be used in
“GENIE-Mini”. This allows a longer time step than full MOSES (~12 hours rather than 30 minutes) by
excluding fast processes (e.g. canopy interception). The ‘TRIFFID’ model from the Hadley GCM will be
used to capture vegetation dynamics and their effect on land surface properties. CEH Wallingford will
isolate the source codes, help define the coupling to the atmosphere and runoff to the ocean, and add
meta-data (LAND1). Next, carbon and nitrogen cycling will be switched on and biogeochemical coupling
to the atmosphere and ocean included (LAND2). Finally, a fully ‘traceable’ (to the GCM) land-surface
scheme will be developed for “GENIE-Grid” (LAND3). This will retain all MOSES processes but
explicitly time-average to achieve a fast version with a long time-step.
(e) Sea-ice (ICE): A 2-D sea-ice model, incorporating standard thermodynamics (Hibler 1979) and
elastic-viscous-plastic dynamics (Hunke and Dukowicz 1997) will be included in “GENIE-Mini” and
“GENIE-Grid”. At the time of writing, as part of the aforementioned COAPEC project, the energy-moisture balance atmosphere (a) and ocean (b) models have been coupled to a thermodynamic free-drift
version of this sea ice model (R. Marsh, SOC). SOC will isolate the code, help define coupling to the
atmosphere and ocean, add meta-data, and include this component in “GENIE-Mini” (ICE1).
(f) Ocean biogeochemistry (BIO): Existing representations of marine carbon and nutrient (phosphate,
silicic acid, and iron) cycling (Ridgwell 2001) that have been successfully applied to questions of past
(Watson et al. 2000) and future carbon cycle behaviour, will be integrated into the 3-D ocean model by
UEA (BIO1). This work has already begun as part of the Tyndall Centre Integrated Assessment Model
programme. Next, the oceanic nitrogen cycle and its influence on carbon cycling, and the fractionation of
a variety of stable (13C/12C, 15N/14N, 30Si/28Si, 87Sr/86Sr) and radiogenic (14C/12C) isotopes, and trace
elements (Ge/Si, Cd/Ca) will be included (BIO2). This will facilitate the simulation of paleo-oceanographic records and thus aid model testing.
(g) Marine sediments (SEDS): A model of the interaction of the deep-sea geochemical sedimentary
reservoir with the overlying ocean will be included in “GENIE-Grid”. This will be derived from an
existing representation of opal diagenesis developed at UEA (Ridgwell 2001) and from standard schemes
for dissolution of calcium carbonate and remineralisation of organic matter (SEDS1). It will enable
GENIE to capture the ‘slow’ (>1 thousand years) response of the ocean carbon cycle to perturbation, and
facilitate model testing by comparing predicted sediment core records with actual records. UEA will
define the asynchronous coupling of the sediments to the ocean, and add meta-data to the component code
(SEDS2).
(h) Ice sheets (SHEET): An existing ice sheet model (Payne 1999) will be applied to simulate glacial
maximum ice sheets and deglaciation in “GENIE-Grid”. It will operate at finer spatial scales (currently
scaleable over a range of 20-100+ km grid cells), but at longer (decadal) time steps than other components.
The model has been developed under two 2-year NERC grants totalling ~£170k. It now includes fast ice
flow and can be used to study large-scale surging, important in phenomena such as Heinrich events. The
code is currently being parallelised and this will be complete at the start of the project. Bristol will first
develop a stand-alone simulation of full glaciation (SHEET1). Then the atmosphere (and ocean) will be
coupled in an asynchronous fashion and the coupled model implemented at coarse resolution to aid mass-balance parameterisation (SHEET2). Spatial and temporal changes in ice-sheet extent and thickness,
surface albedo, topographic blocking effects on atmospheric circulation, and the output of freshwater to
the ocean will all be simulated. Finally, the model will be implemented at finer resolution to capture the
flow physics more accurately (SHEET3).
Grid technologies required
The GENIE system will require the application and development of a number of Grid technologies.
(1) Component Wrapping (WRAP): A component repository will be set up to store the meta-data and
source code relating to the ‘wrapped’ science components (see below). The science source code (e.g.
FORTRAN 77/90, C and C++) will be packaged into individual software components. IC and Soton will
develop an XML Schema to capture the meta-data for the science components (WRAP 1). This will
include the exposed methods and arguments, the inputs, outputs and behaviour of the component, as well
as describing the scientific capability of the component. IC will develop a simple GUI to define the XML
and allow a scientist to integrate a science module within a component through a few lines of manually
written code. Soton will use the XML Schema to define the database structure to allow automatic
insertion and extraction of the generated datasets (see DB below). Once the initial XML Schemas have
been defined, further work packages will wrap the “component set 1” science modules: Atmosphere at IC
(WRAP 2) and Ocean at Soton (WRAP 3), and write the data interchange modules (WRAP 4: IC and
Soton), which will allow components to exchange information at domain boundaries, taking into account
coordinate-system transforms.
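As an indication of what a data interchange module involves, the stand-alone sketch below bilinearly regrids a 2-D field from one regular lat-lon grid to another. It is an illustrative example only, not the WRAP 4 code: the real modules must also handle conservative remapping, land-sea masks and the coordinate-system transforms mentioned above.

```java
// Illustrative bilinear regridding of a 2-D field between two regular lat-lon grids.
// Stand-alone sketch only; real data interchange modules (WRAP 4) must also handle
// conservation, land-sea masks and coordinate-system transforms.
public class Regrid {

    /** Bilinearly interpolate src (defined on srcLats x srcLons, both ascending)
     *  onto the points (dstLats x dstLons). Values outside the source range are clamped. */
    static double[][] bilinear(double[][] src, double[] srcLats, double[] srcLons,
                               double[] dstLats, double[] dstLons) {
        double[][] dst = new double[dstLats.length][dstLons.length];
        for (int i = 0; i < dstLats.length; i++) {
            for (int j = 0; j < dstLons.length; j++) {
                int i0 = lowerIndex(srcLats, dstLats[i]);
                int j0 = lowerIndex(srcLons, dstLons[j]);
                double ty = frac(srcLats[i0], srcLats[i0 + 1], dstLats[i]);
                double tx = frac(srcLons[j0], srcLons[j0 + 1], dstLons[j]);
                dst[i][j] = (1 - ty) * ((1 - tx) * src[i0][j0]     + tx * src[i0][j0 + 1])
                          +      ty  * ((1 - tx) * src[i0 + 1][j0] + tx * src[i0 + 1][j0 + 1]);
            }
        }
        return dst;
    }

    /** Index of the largest source coordinate not greater than x (clamped so i+1 stays valid). */
    static int lowerIndex(double[] coords, double x) {
        int i = 0;
        while (i < coords.length - 2 && coords[i + 1] <= x) i++;
        return i;
    }

    /** Fractional position of x between a and b, clamped to [0, 1]. */
    static double frac(double a, double b, double x) {
        return Math.min(1.0, Math.max(0.0, (x - a) / (b - a)));
    }
}
```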
(2) Computation (COMP): Computational resources within the collaboration will be formed into a
virtual organisation using middleware such as Globus. These resources will comprise traditional
supercomputers, Beowulf clusters and Condor pools. The wrapped “component set 1” modules will be
tested on computational resources at Soton and IC to verify their functionality and the basic integrity of
the framework for GENIE Trainer (COMP1 and COMP2 – IC and Soton respectively). Work will
continue to wrap the “component set 2” science modules using the technology developed in WRAP 1.
This will ensure the prototype wrapping technology is viable for use by the environmental science team.
IC will integrate its science modules (described using the XML Schema) into the application framework
using wrapping code generated in Java. Soton will use the same XML Schema to wrap science modules
using web services technology. We envisage using Web services as the communication mechanism
between resources while using Java wrapped components within a computational resource. Such
distinctions will be transparent to the end user, but mirror closely recent developments in Grid
computing research (announced by the Globus team, Edinburgh, Jan 2002).
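To illustrate what Java wrapping of a science module might look like, the sketch below defines a hypothetical component interface together with a trivial implementation. The interface, its method names and the pairing with the XML meta-data are our illustrative assumptions for exposition, not the WRAP 1 specification.

```java
// Hypothetical shape of a Java-wrapped science component. In the real system the wrapping
// code is generated from the component's XML meta-data (WRAP 1); the names below are illustrative.
import java.util.Map;

interface GenieComponent {
    void initialise(Map<String, double[]> initialFields);      // e.g. restart or spin-up state
    Map<String, double[]> step(Map<String, double[]> inputs);  // advance one coupling interval
    String metadataId();                                       // key into the component repository
}

// Trivial example implementation: relaxes its state towards an input forcing field.
class ToyComponent implements GenieComponent {
    private double[] state;

    public void initialise(Map<String, double[]> initialFields) {
        state = initialFields.get("state").clone();
    }

    public Map<String, double[]> step(Map<String, double[]> inputs) {
        double[] forcing = inputs.get("forcing");
        for (int i = 0; i < state.length; i++) {
            state[i] = 0.9 * state[i] + 0.1 * forcing[i];
        }
        return Map.of("state", state);
    }

    public String metadataId() {
        return "toy-component-v1";
    }
}
```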
(3) Meta-Scheduler (SCHED): The meta-scheduler collects information relating to the currently
available computational resources, the science components, and the application definition (provided by
the user) to minimise the overall execution time by instantiating components on the most appropriate
execution platforms. The user generates the application definition by browsing the existing component
meta-data stored within the distributed component repositories. The performance of a component on a
particular platform is obtained by interrogating its performance database, generated and enhanced
whenever a component is executed. By understanding the application structure and exposing the data
flows between components we are able to optimally map components to potentially distributed resources
and re-distribute the components should circumstances dictate, e.g. the availability of better resources.
This work is in an advanced state of development under an EPSRC ‘High Performance Scientific
Software Components’ grant (GR N/13371) and will be developed further through funded work within
the Reality Grid (EPSRC Pilot Project).
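A minimal sketch of the resource-selection step is given below: for each component it consults a performance table (held in memory here) and picks the platform with the smallest predicted run time. Everything named in the sketch, including the platform identifiers and timings, is an illustrative assumption; the real meta-scheduler also weighs the data flows between components, current resource availability and the option of re-mapping at run time.

```java
// Minimal illustration of meta-scheduling: choose, for each component, the platform with the
// lowest predicted execution time from a performance table. Illustrative only; platform names
// and timings are invented for the example.
import java.util.HashMap;
import java.util.Map;

public class MetaSchedulerSketch {
    public static void main(String[] args) {
        // Predicted run time (hours per simulated millennium) per component per platform.
        Map<String, Map<String, Double>> perf = new HashMap<>();
        perf.put("ocean",      Map.of("soton-beowulf", 4.0, "ic-sunfire", 6.5));
        perf.put("atmosphere", Map.of("soton-beowulf", 3.0, "ic-sunfire", 2.0));

        Map<String, String> placement = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> component : perf.entrySet()) {
            String best = null;
            double bestTime = Double.MAX_VALUE;
            for (Map.Entry<String, Double> platform : component.getValue().entrySet()) {
                if (platform.getValue() < bestTime) {
                    bestTime = platform.getValue();
                    best = platform.getKey();
                }
            }
            placement.put(component.getKey(), best);
        }
        System.out.println("placement = " + placement);
    }
}
```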
(4) Automated Data archiving, querying and post-processing (DB): This will facilitate collaborative
sharing of simulation results between partners in the project. Sharing, re-use and exploitation of these data
sets requires the ability to locate, assimilate, retrieve and analyse large volumes of data produced at
distributed locations. We will use open standards to provide transparent access to the data along with
open source/ commercial database systems to provide a robust, secure and distributed back-end for the
data handling. The key requirements for the database system are (i) setting up the databases and
developing standards, and (ii) writing high-level database post-processing tools, which apply functions to
the database and deliver the processed data to the user and back to the database. An early prototype of
such a system was developed at Southampton as part of the UK Turbulence Consortium activities in 1998
and has most recently been applied in other engineering domains for automated data archiving (e.g. Cox
2001). This work package will be developed at Southampton and be integrated into the GENIE portal,
leveraging expertise from two recently funded Grid projects: GEODISE (Grid based optimisation:
EPSRC) and a BBSRC project to deliver a Grid-based bio-molecular database.
Database system (DB1): The underlying database system will use the XML and XML Schema
(developed in WRAP1) to specify the portable database infrastructure that underlies our system, and
binary formats for the bulk data. This will allow for automated generation and population of the
underlying open source/ commercial database system (e.g. Storage Resource Broker, DB2, SQL Server,
Oracle, Tamino), whilst retaining the flexibility to add new metadata dynamically as part of the post-processing analysis. The post-processing facility will be integrated into the GENIE Portal and will allow
for user queries to be made to the distributed databases, and re-use of simulation results to seed new
calculations.
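The sketch below indicates, with a hypothetical in-memory stand-in for the distributed back-end, how run meta-data might key archived results so that a query can locate data to seed a new calculation. The record fields, resolutions and query style are illustrative assumptions, not the DB1 schema.

```java
// Illustrative stand-in for the archive: run meta-data records that can be queried to locate
// results for re-use. In the real system the bulk data sit in binary files behind a distributed
// database; the field names and values here are assumptions, not the DB1 schema.
import java.util.List;
import java.util.stream.Collectors;

public class ArchiveQuerySketch {

    record RunRecord(String runId, String componentSet, String oceanResolution,
                     int startYearBP, int endYearBP, String dataLocation) {}

    public static void main(String[] args) {
        List<RunRecord> archive = List.of(
            new RunRecord("run-001", "GENIE-Trainer", "18x18x8", 21000, 20000, "soton:/archive/run-001"),
            new RunRecord("run-002", "GENIE-Mini",    "36x36x8", 21000, 11000, "ic:/archive/run-002"));

        // Find runs covering the last glacial maximum at the higher ocean resolution,
        // e.g. to seed a new deglaciation experiment from their end state.
        List<RunRecord> hits = archive.stream()
            .filter(r -> r.oceanResolution().equals("36x36x8") && r.startYearBP() >= 20000)
            .collect(Collectors.toList());

        hits.forEach(r -> System.out.println("seed candidate: " + r.runId() + " at " + r.dataLocation()));
    }
}
```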
Data post-processing and database integration (DB2): Analysis and post-processing tools for the
distributed data resulting from Grid-based simulation runs will allow for new information and knowledge
to be deduced from simulation data. For visualisation we will use the tools developed under the “Grid for
environmental systems diagnostics and visualisation” project. These are ideal for our purposes, and K. Haines has agreed
to collaborate fully. We will also liaise with the NERC Datagrid proposal, if it is successful. Much of the
paleo-climate data is held at alternative data centres (e.g. http://www.ngdc.noaa.gov/paleo and
http://www.pangaea.de) but the Datagrid proposal will also be considering model output.
(5) GENIE Portal (PORTAL): The portal will be the web-based mechanism for authenticating users,
browsing the component and data repositories, composing simulations, executing them and analysing the
results. It will leverage significant and ongoing activity at Soton in Problem Solving Environment
development (funded by EPSRC) and at IC in the EPIC (“e-Science Portal at Imperial College”) project
(a LeSC Centre funded by DTI). It will also enable users to monitor an ongoing simulation. The portal
will be developed by IC and Soton with IC focussing on the component integration (PORTAL1) and
Soton focussing on database integration (PORTAL2). Integration with Globus and Condor, and security
issues will be undertaken by Southampton and Imperial (PORTAL3).
Deliverables
12-month: GENIE-Trainer: Grid-based 3D ocean and energy-moisture balance atmosphere model. Proof
of concept for Grid coupling of components. Used to train environmental science RAs about Grid
methodology.
18-month: GENIE-Mini: Grid-based coupled 3D ocean, 3D atmosphere, land and sea-ice model.
Comparison of multi-decadal simulations with results of conventional (COAPEC) modelling approach.
Past ~30kyr simulation with imposed ice sheets and greenhouse gas forcing. Collaborators will be able to
access and make simple queries of distributed simulation results over the Grid.
24-month: GENIE-Mini with interactive carbon cycle. Assessment of robustness of carbon cycle-climate
feedback predictions over the 21st century. Extension of this assessment to coming centuries. Assessment
of the predictability of paleo-climate events (e.g. Younger Dryas).
30-month: GENIE-Grid: Grid-based complete model with interactive ice sheets and marine sediments.
Fully coupled simulations of the last deglaciation. Simulations of the long-term (next ~20 kyr) response
of the Earth system to addition of fossil fuel carbon. Collaborators will be able to make sophisticated
queries of the distributed simulation results and visualise running simulations.
36-month: Presentation of results derived from Grid-based simulations of (i) inter-comparisons with
alternative models, (ii) last glacial termination, and (iii) the long-term response of the Earth system to
human activities. The full model and infrastructure will be made available to the community and training
sessions will be held (in collaboration with the National Institute for Environmental eScience (NIEeS)).
Nature of the Research Team
Principal Investigator: PJ Valdes (Reading). Co-Investigators: MGR Cannell (CEH Edinburgh), SJ Cox
(Southampton), J Darlington (Imperial College), RJ Harding (CEH Wallingford), AJ Payne (Bristol), JG
Shepherd (SOC) and AJ Watson (UEA). Recognised Researchers: TM Lenton (CEH Edinburgh), AJ
Ridgwell (UEA). Collaborators: PM Cox (Hadley Centre), RM Marsh (SOC). Allied Researchers: NR
Edwards (Bern), K Haines (Reading), J Koehler (Tyndall Centre). In addition to the 10 posts requested
below, Southampton e-Science centre will contribute a funded PhD student who will assist in the Grid
technologies for this project.
Maturity of partnership, experience in delivering large, complex projects
The investigators are all recognised leaders in their fields, and have a track record of collaboration. The
project will leverage expertise at the London Regional e-Science Centre at Imperial College (directed by
Darlington), and Southampton’s Regional e-Science Centre for which SJ Cox is technical director. Cox is
PI for the ‘Grid Enabled Optimisation and Design Search (GEODISE)’ EPSRC funded e-Science testbed
project, which will share key Grid-based technologies with GENIE (e.g. in the Database workpackage).
All scientific partners in the project have been meeting at workshops over the last 3 years, to plan various
Earth system modelling activities. Harding co-ordinated the NERC Earth System Modelling Initiative, in
which Lenton, Cannell and Marsh participated. Shepherd, Ridgwell and Lenton collaborate on a Tyndall
Centre project (IT1.31). Watson supervised the PhDs of Lenton and Ridgwell. Harding has worked
extensively with PM Cox and colleagues at the Hadley Centre. Lenton co-ordinates a Research Network
in Systems Theory (http://www.cogs.susx.ac.uk/daisyworld) that includes PM Cox and Watson. Shepherd
and Valdes have a COAPEC project together, on which Marsh is a recognised researcher, and which also
includes Bristol. Shepherd co-ordinates the Earth system modelling initiative (ESMI) at SOC. SJ Cox and
Payne have collaborated on a variety of applications of high-performance computing to environmental
problems. In particular, they have developed one of the first ice-sheet models to use a parallel-processing
architecture (Takeda et al., in press); they also have a joint NERC-funded project. The e-Science Centres
at Southampton and Imperial College are working together on a number of projects and meet regularly at
National and International Grid meetings.
Justification for Resources
We request 10 full-time posts (some staggered) to undertake the science, e-science and co-ordination,
plus 4 part-time posts to enable efficient management and operation of the project. The work-packages
for which we need each member of the Science and Grid teams are detailed above and in Appendix 1.
Science Co-ordinator and Project Manager. (1) T. Lenton (SSO, CEH Edinburgh) for 3 years. TL will
report to the PI and directly manage achieving the project milestones and coordinate the activities of the
research team. He will be an integral part of the professional management of this large project, and, in
consultation with our industrial partners, we will require 50% of his time to fulfil this demanding role. In
the other 50% of his time, TL will synthesise the science in the project, including helping in the
modelling of global biogeochemical cycles and designing and implementing the simulations. Grid
training will be a series of visits to Imperial and Southampton.
Science Team. These posts will provide expertise in each of the key science components.
(2) PDRA at Reading funded for 2 years (starting at month 12), managed by Valdes. The PDRA will be
responsible for the execution of the long simulations using GENIE-Mini and GENIE-Grid, with a
particular focus on the predictability of the glacial-interglacial transition. Grid training will involve
extended visits to Imperial during the first 6 months.
(3) PDRA at SOC funded for 2 years (starting from month 0), managed by Shepherd. Grid training will
involve an extended placement (~6 months during the 1st year) at Southampton e-science centre.
(4) A. Ridgwell (PDRA, UEA) for 2.5 years (starting from month 0), managed by Watson. AR will make
a vital contribution with the ocean biogeochemical and sediment schemes he developed during his PhD
and his skills in Earth system modelling. Grid training will consist of visits to Imperial.
(5) HSO at CEH Wallingford for 2 years (starting at month 6), managed by Harding. They will work
closely with Peter Cox of the Hadley Centre, and the Joint Centre for Hydro-Meteorological Research
provides an ideal setting for this collaboration. Grid training will involve visits to Southampton.
(6) PDRA at Bristol for 2 years (starting at month 12), managed by Payne. Grid training will consist of a
~3 month placement at the Southampton e-science centre.
The environmental science PDRAs are requested at salary point 6 and we believe that we will be able to
recruit suitable staff to this exciting proposal at this pay scale.
Grid Team. These posts will provide expertise in each of the key technologies.
(7,8) 2 Grid PDRAs at the Southampton Regional e-Science Centre for 3 years each, managed by Cox. They
will liaise closely with the team at Imperial and the environmental scientists at SOC, Bristol, and CEH
Edinburgh & Wallingford.
(9,10) 2 Grid PDRAs at Imperial for 3 and 2.5 years respectively, managed by Darlington. They will liaise closely
with the team at Southampton and the environmental scientists at Reading, UEA and CEH Edinburgh.
We have requested salary points between 10 and 15 for the 4 e-Science RAs. This high level of salary
relates directly to the pay-scales of the IT/application professionals who can deliver, to fixed deadlines,
the high-quality software engineering we need to deliver GENIE, and who have skills in e.g. XML (and W3C
protocols), databases, C/C++, Java, PSE development, UDDI, CORBA, and applied Grid technologies.
Our survey of sites advertising IT jobs with similar skills indicates that salaries at this level should allow
us to recruit in this extremely competitive market.
Support Staff. We have requested 20% of a full-time system programmer at point 15 on the ADC2 scale,
again targeted at the level required to recruit a highly competent member of professional IT staff in this
competitive market. This job will involve administering our distributed Grid-based system, including a
variety of parallel and distributed cluster computing resources; installing, maintaining and patching our
Grid middleware (e.g. Globus, Condor); and maintaining the operating systems (e.g. Linux, Windows). Of
particular importance and relevance to this project will be keeping up-to-date security patches applied to
web services to ensure the integrity of our systems. These duties represent a significant specialist load for a
member of staff that considerably exceeds the base-level provision of computing infrastructure that is
provided by the relevant service providers at each site.
Secretarial and Administrative Staff. This project envisages significant and effective interaction with the
various academic and industrial partners. The degree of reporting, correspondence and administrative
arrangements will be higher than for a simple stand-alone research project: we are therefore requesting to
purchase at several sites 10-20% of the time of our existing skilled secretarial and administrative staff.
Due to the distributed nature of the project, this support is naturally distributed amongst the sites.
Travel Costs. To ensure full benefit is obtained from this project, it is essential that we exchange ideas
with other workers in the field by presenting our work and being represented at appropriate national and
international conferences and Grid/ e-Science forums/ meetings. We have requested funds to allow each
of the requested staff to attend 1-2 meetings each year. This is in line with the number of such meetings
that the investigators have attended over the last several years. It is also considered vital that international
travel be supported since much e-Science activity will be happening in the USA and mainland Europe.
We have requested funds to allow the science PDRAs to spend significant periods at the e-science sites
and the e-science PDRAs to visit the scientists, and for travel to our quarterly project review meetings and
biannual review meetings with our industrial partners. This is an integral part of the management of the
project and will speed up delivery of the various components of GENIE, along with dissemination to the
academic and wider communities. This level of support is based directly on the amounts spent over
recent years by the investigators in other comparable projects for on- and off-campus meetings at a
variety of sites.
Computing Facilities. The staff employed on the grant will need dedicated, high quality computers and
associated equipment to enable them to function in-office and to provide facilities when travelling around
to other academic partners. The modest additional infrastructure costs will provide for essential machines
to develop and host the repositories and data archives (particularly important for testing out Grid middleware and distributed web services before deployment), disk file servers, and tape backup facilities
related directly to the project and its staff. These are in addition to the facilities offered at each of the
partner sites, which are detailed in Appendix 2.
Office consumables. This includes specialist books (on e.g. Web technologies which tend to be expensive
due to their target IT market and limited life expectancy), specialist journal purchases, printer
consumables (for binding copies of documentation and information from the web, where appropriate),
paper, photocopying, storage media (e.g. CDs), telephone, and fax services. The sum requested is in line
with the level of spending that the investigators have required in delivering on the Grid/ e-Science/ Web
technology projects they have worked on to date.
Management
In a project with this complexity of interlocking parts it is important both that progress is planned
realistically and that each task is monitored from the start. Each PI has responsibility for a specific
software development item and/or model component. Overall management of the project will be the
responsibility of the management committee, which will be chaired by the PI and comprise all co-PIs and the scientific co-ordinator/project manager (T. Lenton). It will meet quarterly and set project
goals. An Earth System science steering group, chaired by Valdes, will ensure that the scientific goals are
met and that the timescale for the building and adaptation of the component models is maintained. An
architecture group, chaired by Cox, will ensure that the Grid infrastructure for the project is delivered to
the correct time schedule. Both PIs have considerable experience in managing large research groups. T.
Lenton’s duties will include active coordination of the Earth System science tasks, as well as liaising with
the architecture group on the Grid tasks. Quarterly project meetings will be held, in which all participants
and interested parties will meet to exchange information on scientific and Grid-framework issues and
ensure that the project deliverables are on schedule. Intervening meetings will use the Access Grid
facilities at the e-Science Centres in Edinburgh, London and Southampton. The part-time project
administrator will co-ordinate meetings and paperwork. Financial management will be the responsibility
of the PI’s institution (Reading). The PI will report to NERC at the times of deliverables.
Connectivity
We will hold a workshop at ca 24 months into the project, after GENIE-Mini is delivered. This will
provide an opportunity to involve industry and interested scientists. We have had informal discussions
with the director of the new National Institute for Environmental eScience (NIEeS) and have proposed
that a summer school and/or workshop on Earth system modelling should be held. This will help
promote the subject area and train and build a community of eScience-educated Earth system modellers.
Industry involvement: Industrial support in-kind is provided by a joint team of Intel and Compusys staff
(see attached letters of support), who view this project as an important exemplar of Grid based
technologies. Intel is the world’s largest computer hardware company, who will assist in the provision of
state-of-the art systems technology throughout the lifetime of the project to integrate into our Grid
testbed. Intel is also supporting the teams developing some of the Grid middleware that we will use (e.g.
Condor, who Cox is working with on a variety of projects). Compusys are one of Europe’s leading High
Performance Cluster integrators, and recently supplied a 324-node commodity system to the University of
Southampton. Their collaboration on this project will further assist our Southampton University
Computing Service in delivering a Grid-enabled service on this new facility.
International links: We will link to the ‘Bern group’ led by Thomas Stocker and other groups in Europe
and the USA. In particular, our collaborator Neil Edwards, the developer of the 3-D ocean model, has a 4-year fellowship in Bern on ‘Efficient Earth System Models and the role of surface feedbacks in decadal to
centennial variability’. We will engage fully in the international Earth system modelling community,
including contributing to the IGBP Global Analysis, Integration and Modelling (GAIM) programme
(Lenton is on the task force). The closest Grid activity to the proposed work is the Earth System Grid
(“Turning Climate Model Datasets Into Community Resources”) in the USA, which is much more
focussed on sharing large volumes of data. In principle GENIE will complement this work, with its focus
on a rich set of components (beyond ocean and atmosphere coupling), whose development has been
supported by NERC over many years, and on the resulting studies of long-term and paleo-climate change.
Growth, outreach and exploitation
Our approach is specifically designed to build capacity in this area, engage newcomers to Earth system
modelling, and thus encourage growth in this multi-disciplinary subject area. There already exists a large
community that has an interest in using and expanding Earth system models. The work proposed in this
project will be disseminated via the availability of working systems (GENIE-Trainer, GENIE-Mini,
GENIE-Grid) to the academic partners. This will be backed up by publications in academic journals and
at conferences in the normal way. The academic partners will share experiences with other national and
international e-Science activities at conferences and by active collaboration with other UK e-Science
consortia. Once in place, the GENIE modelling framework will be made available to the wider NERC
community. A major route for exploitation will be via community use and extension of the framework.
Scientists who are not necessarily experts in modelling or computational techniques will be able to create
and execute sophisticated whole Earth system model simulations. The distributed data from such
simulations will be accessible for visualisation by the whole community. Modellers will be able to wrap
new science components and contribute them to the repository. In this way, the usability, flexibility and
extensibility of the GENIE system will enable a dynamic virtual organisation of Earth system scientists,
and will greatly ease the construction of future generations of Earth system models.
References
Archer, D., et al. Global Biogeochemical Cycles 12, 259-276 (1998).
Berger, A., et al. Climate Dynamics 14, 615-629 (1998).
Blunier, T., et al. Nature 394, 739-743 (1998).
Claussen, M., et al. Geophysical Research Letters 26, 2037-2040 (1999).
Cox, P.M., et al. Climate Dynamics 15, 183-203 (1999).
Cox, P.M., et al. Nature 408, 184-187 (2000).
Cox, S.J., et al. Proc. ICPP 2001, IEEE Computer Society Press (2001).
de Noblet, N.I., et al. Geophysical Research Letters 23, 3191-3194 (1996).
Edwards, N.R., et al. Journal of Physical Oceanography 28, 756-778 (1998).
Edwards, N.R. & Shepherd, J.G. Climate Dynamics (in press).
Foster, I., et al. International Journal of High Performance Computing Applications 15, 200-222 (2001).
Hibler, W.D. Journal of Physical Oceanography 9, 815-846 (1979).
Hunke, E.C. & Dukowicz, J.K. Journal of Physical Oceanography 27, 1849-1867 (1997).
Imbrie, J., et al. Paleoceanography 8, 699-735 (1993).
Kapsner, W.R., et al. Nature 373, 52-54 (1995).
Lenton, T.M. Tellus 52B, 1159-1188 (2000).
Payne, A.J. Climate Dynamics 15, 115-125 (1999).
Petit, J.R., et al. Nature 399, 429-436 (1999).
Petoukhov, V., et al. Climate Dynamics 16, 1-17 (2000).
Ridgwell, A.J. PhD thesis, UEA, Norwich, UK (2001).
Shackleton, N.J. Science 289, 1897-1902 (2000).
Takeda, A.L., et al. Computers and Geosciences (in press).
Valdes, P.J. & Hoskins, B.J. Journal of the Atmospheric Sciences 46, 2509-2527 (1989).
Watson, A.J., et al. Nature 407, 730-733 (2000).
Weaver, A.J., et al. Atmosphere-Ocean 39, 361-428 (2001).
Appendix 1: Timetable of work packages
[Timetable grid: rows are the project staff (IC 1, IC 2, Soton 1, Soton 2, Edinburgh, Reading, SOC, UEA, Wallingford, Bristol); columns are six-month periods (months 0-6, 6-12, 12-18, 18-24, 24-30, 30-36). The scheduled work packages are WRAP1-WRAP4, COMP1-COMP3, SCHED, DB1-DB2 and PORTAL1-PORTAL3 for the Grid team, and EMBM1, OCEAN1-OCEAN2, ATMOS1-ATMOS2, ICE1, BIO1-BIO2, LAND1-LAND3, SEDS1-SEDS2 and SHEET1-SHEET3 for the science team, together with planning, co-ordination, simulations, integration, deployment, presentation, dissemination and documentation activities.]
Appendix 2: Existing Computing Resources from Partners
Southampton Regional e-Science Centre and SOC: Access to University SGI Origin 2000 (24 node);
324-processor Intel Linux Beowulf cluster; SGI Origin 3000 planned. London Regional e-Science
Centre at Imperial: 24-processor Sun E6800; 32-processor Compaq Alpha cluster; 22-processor Linux
cluster; resources being expanded over the next three years through £3M SRIF. Reading Meteorology:
6-processor SGI origin 2000; access to University SGI machine; Condor-based system of >50 Sun
workstations being established. UEA: 3 dual-processor Compaq Alpha DS20 machines. Bristol: Access
to a 160-processor Beowulf cluster with fast connections. CEH Edinburgh: ~100-processor Beowulf
cluster (Intel 1.8 GHz). CEH Wallingford: ~50-processor cluster (Sun 700 MHz). Local Area Network
(LAN) connections: 100 Mbps at all sites with maximum capacity 1 Gbps. Wide Area Network (WAN)
connections: SOC and Reading have 34 Mbps connections with plans to increase to 1 Gbps. Bristol is on
the SuperJanet4 backbone with 2.5 Gbps links to Reading and Edinburgh. SuperJanet3 connections at
Bristol and UEA are 155 Mbps. CEH sites have 2 Mbps connections, with increases under negotiation.
Imperial College is a point of presence on the London MAN with a direct 1 Gbps connection. Southampton
also has a 1 Gbps connection. We will feed back any additional or ongoing networking requirements that
arise as a result of the project to the Grid Network Team, which has close links to Cox and Darlington.