DAL for theorists: Implementation of the
SNAP service for the TVO
Claudio Gheller, Giuseppe Fiameni
Interuniversity Computing Center CINECA, Bologna
Ugo Becciani, Alessandro Costa
Astrophysical Observatory of Catania
Victoria, May 2006
The Simple Numerical Access Protocol Service
The Snap service extracts or "cuts out" rectangular (spherical or even irregular)
regions of some larger theory dataset, returning a subset of the requested
size to the client.
Snap basic components:
• DATA
• SNAP code
• SERVICE
1. Data and Data Model
To analyze the requirements of data produced by numerical simulations, we have considered a wide spectrum of applications:
• Particle based cosmological simulations
• Grid based cosmological simulations
• Magnetohydrodynamics simulations
• Planck mission simulated data
• ...
(thanks to V. Antonuccio, G. Bodo, S. Borgani, N. Lanza, L. Tornatore)
At the moment, we consider only RAW data
1. Data
In general, data produced by numerical simulations are:
• Large (GB to TB scale)
• Monolithic (a few files contain most of the data)
• Incompressible
• Non-standard (proprietary formats are the rule)
• Non-portable (dependent on the simulation machine)
• Poorly annotated (little or no metadata)
• Heterogeneous in units (often code units)
Data: the HDF5 format
HDF5 (http://hdf.ncsa.uiuc.edu) represents a possible solution for dealing with such data.

HDF5 is:
• Portable across most modern platforms
• High performance
• Well supported
• Well documented
• Rich in tools

HDF5 data files are:
• Platform independent (portable)
• Well organized
• Self-describing
• Metadata enriched
• Efficiently accessible

HDF5 drawbacks:
• Requires some expertise and skill to use
• Information is difficult to access directly
• Can be subject to major library changes (see the HDF4 to HDF5 transition)
Data: our HDF5 implementation
Each file represents an output time
The structure is simple: all the data objects are at the root level:

/BmMassDensity    Dataset {512, 512, 512}
/BmTemperature    Dataset {512, 512, 512}
/BmVelocity       Dataset {512, 512, 512, 3}
/DmMassDensity    Dataset {512, 512, 512}
/DmPosition       Dataset {134217728, 3}
/DmVelocity       Dataset {134217728, 3}

Data objects (at the moment) can be:
• Structured grid: rank 4 (scalars or vectors)
• Unstructured points: rank 2 (scalars or vectors)

HDF5 metadata make the file completely self-consistent.

Structural metadata (strictly required by the library):
• Rank
• Dimensionality

Annotation metadata (required by our implementation):
• Data object name
• Data object description
• Unit
• Formula
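
As a purely illustrative sketch, the annotation metadata above can be attached to a dataset as HDF5 attributes. The attribute names and value strings below are assumptions for illustration (the slides do not show the actual code), and the calls use the HDF5 1.6-era C API:

// Sketch: attaching SNAP-style annotation metadata to an HDF5 dataset.
// Attribute names ("name", "description", "unit", "formula") follow the
// list above; the value strings are assumed examples.
#include <hdf5.h>
#include <cstring>

static void write_string_attr(hid_t dset, const char *name, const char *value) {
    hid_t space = H5Screate(H5S_SCALAR);      // one scalar string per attribute
    hid_t type  = H5Tcopy(H5T_C_S1);          // fixed-length C string type
    H5Tset_size(type, strlen(value) + 1);
    hid_t attr  = H5Acreate(dset, name, type, space, H5P_DEFAULT);
    H5Awrite(attr, type, value);
    H5Aclose(attr); H5Tclose(type); H5Sclose(space);
}

int main() {
    // "snapshot.h5" is an assumed file name for one output time.
    hid_t file = H5Fopen("snapshot.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    hid_t dset = H5Dopen(file, "/BmMassDensity");
    write_string_attr(dset, "name",        "BmMassDensity");
    write_string_attr(dset, "description", "Baryonic matter mass density");
    write_string_attr(dset, "unit",        "g/cm^3");      // assumed unit
    write_string_attr(dset, "formula",     "rho_b");       // assumed formula tag
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}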
Data Model schema
Implementation of the model
The database is at present implemented on a PostgreSQL Linux installation.
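
As a purely illustrative sketch (the actual schema is not shown in the slides, so every table and column name here is an assumption), the per-object annotation metadata could be mirrored in a PostgreSQL table created through the libpq C API:

// Hypothetical sketch only: a possible metadata table for the SNAP archive.
#include <libpq-fe.h>
#include <cstdio>

int main() {
    PGconn *conn = PQconnectdb("dbname=snap_archive");   // assumed DB name
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }
    // One row per data object, mirroring the HDF5 annotation metadata.
    PGresult *res = PQexec(conn,
        "CREATE TABLE data_object ("
        "  id          SERIAL PRIMARY KEY,"
        "  filename    TEXT NOT NULL,"   /* HDF5 file holding the object */
        "  name        TEXT NOT NULL,"   /* e.g. /BmMassDensity          */
        "  description TEXT,"
        "  unit        TEXT,"
        "  formula     TEXT)");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "CREATE TABLE failed: %s", PQerrorMessage(conn));
    PQclear(res);
    PQfinish(conn);
    return 0;
}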
The Snap Code: overview
The Snap code acts on large datafiles on different platforms. Therefore it
has been implemented according to the following requirements:
• Efficiency
• Robustness
• Portability
• Extensibility
We have adopted the C++ programming language on top of the HDF5 format and APIs.
(Diagram: a source HDF5 file containing datasets 1..N is fed to the SNAP service, which produces a "snapped" HDF5 file containing the selected datasets 1..M, available for download.)
The code is compiled under Linux (GNU compiler) and AIX (xlC compiler).
The Snap Code
Goal: select all the data that fall inside a pre-defined region. At present the region can only be rectangular.
Input:
• Data filename
• Data objects (one or more)
• Spatial units
• Box center
• Box size
• Output filename
• Data object names

Output:
• One or more HDF5 files with the same descriptive metadata as the original dataset.
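
Gathering these parameters, a minimal sketch of what a cutout request could look like in the C++ code (field names are illustrative assumptions, not the actual SNAP interface):

// Illustrative grouping of the input parameters listed above.
#include <string>
#include <vector>

struct SnapRequest {
    std::string dataFilename;              // source HDF5 file
    std::vector<std::string> dataObjects;  // e.g. "/BmMassDensity"
    std::string spatialUnits;              // e.g. "Mpc" (assumed)
    double boxCenter[3];                   // center of the rectangular region
    double boxSize[3];                     // extent of the region per axis
    std::string outputFilename;            // "snapped" HDF5 file to write
};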
The Snap Code
Data geometry and topology: at present we support regular mesh based data and unstructured data (particles). The data structure is crucial for the features of the Snap implementation.

Mesh based data:
• Selection is performed using the HDF5 hyperslab selection functions; only the necessary data are loaded in memory (see the sketch after this list).
• Selection is extremely fast.

Particle based data:
• Particle positions are loaded in memory.
• Particles inside the selected region are identified and their ids are stored (linked list).
• The other particle based datasets are loaded in memory and the list is used to select the target particles.
• Selected particles are written to the output file.
• The procedure can become "heavy".

Future upgrades:
• Support for spherical (or even irregular) regions
• Support for periodic boundary conditions
• Parallel implementation
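
A minimal sketch of the mesh based selection, assuming a rank 3 scalar dataset laid out as in the example file shown earlier; the file name, dataset name, and offsets are illustrative assumptions, and the calls use the HDF5 1.6-era C API:

// Sketch: cutting a rectangular sub-cube out of a rank-3 grid dataset with
// an HDF5 hyperslab selection, so that only the selected region is read.
#include <hdf5.h>
#include <vector>

int main() {
    hid_t file = H5Fopen("snapshot.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen(file, "/BmMassDensity");
    hid_t filespace = H5Dget_space(dset);

    // Region derived from box center and box size: here a 64^3 cube
    // starting at cell (128, 128, 128) of the 512^3 grid (assumed values).
    hsize_t start[3] = {128, 128, 128};
    hsize_t count[3] = {64, 64, 64};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);

    // Memory dataspace matching the selected region.
    hid_t memspace = H5Screate_simple(3, count, NULL);
    std::vector<float> cutout(64 * 64 * 64);
    H5Dread(dset, H5T_NATIVE_FLOAT, memspace, filespace,
            H5P_DEFAULT, &cutout[0]);      // reads only the hyperslab

    H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file);
    return 0;
}

The particle based path instead reads the full position dataset, filters it against the box in memory, and reuses the resulting id list to subset the remaining particle datasets.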
Access to the Archive (the service)
The archive can be accessed in two complementary ways:
• Via the web and a web portal
• Via a web service and high level applications

(Diagram: web clients reach the Web Portal (PHP, Java…) over the web; applications such as VisIVO and other user applications go through the Web Service (Tomcat+Axis, OGSA-DAI). Both paths access the Data Archive (data + metadata + apps).)