A Framework for the inclusion of theory data in the VO

Theory in the Virtual Observatory
(TVO)
Goals of Euro-VO DCA WP4
Gerard Lemson, GAVO
ARI-ZAH, Heidelberg
MPE, Garching
Theory in the VO, Garching, 7.4.2008
Overview
• Recap VO
• Why “Theory in the VO”?
• Theory in the IVOA
– Simple Numerical Access Protocol
• Intro to this workshop.
Recap VO
• Reminder, what is VO about?
– “Universe on your desktop”
– All astronomical resources available online
– Behind friendly interfaces
– Interoperable
• What is an “astronomical resource”?
– data (all stored results of astronomical experiments)
– software packages (IRAF, AIPS)
– (web) services (Simbad, NED)
– publications (LANL, ADS)
– people (you)
Web helps to access resources
• Interesting astronomical resources may be
– unavailable
– unknown
– not here
– large (the farther away, the larger!)
– complex
• Web technologies help:
– Discovery: search engines, Google-like or structured
– Documentation: HTML
– Retrieval: relatively easy access
– Filtering: server-side reduction of data streams
– Web applications: services as resources
• Main issue, understanding each other...
Esperanto
• Standardisation
– Discovery (registries)
– Data description (“meta-data”)
– Data formats (FITS, VOTable)
– Protocols
– (Web) Application Interfaces
– Query language
• Organised in IVOA
VO’s Esperanto
Observations in the VO
• Most IVOA standardisation efforts concentrate on
observational data sets
– image archives
– source catalogues
– spectra
• Standards observationally biased
– Sky-based query protocols: cone search, SIAP, SSAP
– Source catalogue combination: ADQL, XMatch
– Data models: spectra, STC, characterisation
(sky/time/energy/flux)
Theory in the VO: issues
• Good reasons for emphasis on observations
– simple observables: photons detected at a certain time from a
certain area on the sky in a certain wavelength interval
– pre-existing (meta-)data format standards (FITS, “csv”)
– long history of archiving
– valuable over long time (digitising 80yr old plates)
• Simulations not so simple
– more varied “observables”: anything that can be modelled is
explicitly there
– no standardisation (not even HDF5)
– archiving ad hoc, for local use
– Moore’s law makes useful lifetime relatively short: few years later
can do better
“Moore’s law” for N-body
simulations
Courtesy Simon White
Interoperability
• Current IVOA standards not always relevant
• Distributed resources hard to join
– no common sky
– no common objects
– no common observables
– data models tailored to observations
• Complex data structures, not supported by
messaging format standards
– AMR, trees, graphs, Voronoi tessellations
• Individual data products often VERY LARGE
and not obviously reduced without explicit user
interaction.
So why bother?
• Simulations are interesting:
– For many cases only way to see processes in action
– Others can think of science cases you may not have thought of
– Complex observations require sophisticated models for
interpretation
• Bridging gap in specialisations: not everyone has
required expertise or resources to create simulations,
though they can analyse them.
• Many use cases do not require the latest/greatest
– exposure time calculator
– survey design
John Hibbard http://www.cv.nrao.edu/~jhibbard/n4038/n4038.html
Toomre & Toomre, 1972
Courtesy Volker Springel
NASA/CXC/SAO/G. Fabbiano et al.
Di Matteo, Springel
and Hernquist, 2005
IVOA: Theory Interest Group
• http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/IvoaTheory
• “Provide a forum for discussing theory
specific issues in a VO context.”
• Use cases for working groups.
• Projects
– Semantics
– Micro-simulations
– Simple Numerical Access Protocol (SNAP)
SNAP
• Goal:
– create a VO protocol for discovering, querying and
retrieving simulation data
– Similar to other S*AP protocols
• Restricted to 3+1D simulations:
– At least some common elements
– Challenging
• large
• complex
• diverse
• no support in IVOA (compare theory spectra)
Data access protocols
1. Find standard services in registry (say SIAP or
SSAP)
• Filter on type of service, sky-footprint, wavelength.
2. Query these services using protocol syntax, in
general based on location on the sky.
• Spectra in a circle on the sky, images overlapping a
certain rectangle
• Results in VOTable, providing some metadata per
image/spectrum.
3. Retrieve desired results in standardised format
– FITS for images or spectra
– VOTable or other XML representation for spectra or
source lists.
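The query step above can be sketched in Python. This is a minimal illustration, not a reference client: the service address `http://example.org/siap` is a hypothetical placeholder for a URL that would normally come from a registry search, and the function name `build_siap_query` is invented here. The `POS`, `SIZE` and `FORMAT` parameter names follow the Simple Image Access (version 1) convention. No network request is made; the sketch only builds the HTTP GET URL.

```python
from urllib.parse import urlencode


def build_siap_query(base_url, ra_deg, dec_deg, size_deg, image_format="image/fits"):
    """Build a SIAP-style HTTP GET URL for images overlapping a sky region."""
    params = {
        "POS": f"{ra_deg},{dec_deg}",   # region centre, decimal degrees
        "SIZE": str(size_deg),          # region width, decimal degrees
        "FORMAT": image_format,         # desired format of the returned images
    }
    separator = "&" if "?" in base_url else "?"
    # Keep "," and "/" unescaped so the URL stays readable.
    return base_url + separator + urlencode(params, safe=",/")


# Hypothetical service URL (assumption); a real one is found via a registry.
service_url = "http://example.org/siap"
query_url = build_siap_query(service_url, 180.0, -1.5, 0.2)
print(query_url)
```

A real client would fetch this URL, parse the returned VOTable, and follow the per-image access references to retrieve the FITS files.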
SNAP 1: registry
• Different motivations for querying a
simulation registry.
– no “interesting patch in the sky”
– no object about which more information is
desired
– no standard set of variables
• How do we classify simulation archives?
• Need new features for describing SNAP
services.
SNAP 2: query protocol
• Is it possible to conceive of queries that
make sense for all simulation access
services?
– No common-sky based simple query to send
to lots of simulation archives
• Need new model to describe simulations
and base queries on.
• Less is known, more abstract model.
SNAP Data Model
• Goal: assist in description and retrieval.
• Meta-data model.
– We only know that part of space is evolved in time.
– Properties, objects, dimensions, coordinate systems, units all flexible.
– Compare to (RA/DEC, JD, λ, Flux)
• Should answer common questions about simulations, such as
– What type of object is being simulated?
– What physics is included?
– What “observables” are available?
– What are the typical dimensions?
– How are the objects represented?
– What numerical algorithms were used?
• Support:
– “Locate simulations that contain a galaxy cluster of about 10^14
Msun and used SPH-type hydrodynamics”
– etc
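The kind of query this model should support can be sketched in Python. This is only an illustration under stated assumptions: the class and field names (`SimulatedObject`, `SimulationRecord`, `physics`, `algorithms`) are invented here and are not the SNAP data model, the catalogue entries are made up, and "about 10^14 Msun" is interpreted as within 0.5 dex.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SimulatedObject:
    """An object type represented in a simulation (illustrative structure)."""
    object_type: str   # e.g. "galaxy cluster"
    mass_msun: float   # typical mass in solar masses


@dataclass
class SimulationRecord:
    """Minimal metadata record; field names are assumptions, not the SNAP model."""
    name: str
    physics: list      # e.g. ["gravity", "hydrodynamics"]
    algorithms: list   # e.g. ["SPH"]
    objects: list = field(default_factory=list)


def find_simulations(records, object_type, mass_msun, algorithm, tolerance_dex=0.5):
    """Return names of records that used `algorithm` and contain an object of
    `object_type` whose mass lies within `tolerance_dex` of `mass_msun`."""
    matches = []
    for rec in records:
        if algorithm not in rec.algorithms:
            continue
        for obj in rec.objects:
            close = abs(math.log10(obj.mass_msun) - math.log10(mass_msun)) <= tolerance_dex
            if obj.object_type == object_type and close:
                matches.append(rec.name)
                break
    return matches


# Invented example entries, for illustration only.
records = [
    SimulationRecord("run_A", ["gravity", "hydrodynamics"], ["SPH"],
                     [SimulatedObject("galaxy cluster", 2e14)]),
    SimulationRecord("run_B", ["gravity", "hydrodynamics"], ["AMR"],
                     [SimulatedObject("galaxy cluster", 1e14)]),
    SimulationRecord("run_C", ["gravity", "hydrodynamics"], ["SPH"],
                     [SimulatedObject("galaxy", 1e12)]),
]

result = find_simulations(records, "galaxy cluster", 1e14, "SPH")
print(result)
```

Only `run_A` matches: `run_B` has a cluster of the right mass but a different algorithm, and `run_C` uses SPH but contains no cluster.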
Poster
Bourges et al
SNAP Registry
• Difficult to separate steps 1 and 2.
• Registries not fine-grained.
• Individual institutes may lack expertise to deal with
complex data model.
– Metadata describing simulations not easy to fit in “flat” table.
– S*AP-like HTTP GET queries not flexible
• SNAP Registry
– Few centers acting as registries for fine grained simulation data
– Registration and browsing interfaces
– Possibly an ADQL query interface based on SNAP data model
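Why simulation metadata does not fit a single “flat” table, and what a structured query interface could offer instead, can be sketched with an in-memory relational database. The sketch below uses SQLite's SQL dialect through Python's `sqlite3` module as a stand-in for ADQL; the table and column names (`simulation`, `physics`, `process`) and the entries are hypothetical and are not taken from the SNAP data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE simulation (id INTEGER PRIMARY KEY, name TEXT);
    -- One simulation can include many physical processes, hence a second
    -- table rather than a fixed set of columns in a flat table.
    CREATE TABLE physics (sim_id INTEGER REFERENCES simulation(id), process TEXT);
""")

# Invented example entries, for illustration only.
conn.executemany("INSERT INTO simulation VALUES (?, ?)",
                 [(1, "run_A"), (2, "run_B")])
conn.executemany("INSERT INTO physics VALUES (?, ?)",
                 [(1, "gravity"), (1, "hydrodynamics"), (1, "cooling"),
                  (2, "gravity")])


def simulations_with(connection, process):
    """Return the names of simulations that include the given physical process."""
    rows = connection.execute(
        "SELECT s.name FROM simulation AS s "
        "JOIN physics AS p ON p.sim_id = s.id "
        "WHERE p.process = ? ORDER BY s.name",
        (process,),
    )
    return [name for (name,) in rows]


hydro_runs = simulations_with(conn, "hydrodynamics")
gravity_runs = simulations_with(conn, "gravity")
print(hydro_runs, gravity_runs)
```

A join of this kind is awkward to express as a fixed set of S*AP-style HTTP GET parameters, which is one reason to consider a query-language interface.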
SNAP 3: data retrieval
• Often very large datasets.
• Need server-side filtering to reduce size of transferred
byte streams:
– cut-out (how to decide which part of box?), projection, gridding,
cluster finder, visualisation (full virtual telescopes?)
• What data formats?
– FITS, binary VOTable, HDF5?
– how about more complex data structures?
• Server side analysis
– 2pt correlations, power spectra, density profiles, ...
• For now concentrate on discovery and links to web
services.
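A server-side cut-out, the simplest of the filters listed above, might look like the following Python sketch. It assumes particle positions stored as `(x, y, z)` tuples in a cubic box with periodic boundaries (an assumption made here for illustration, common in cosmological runs but not stated in the slides); the function name `cutout` and all numbers are hypothetical.

```python
def cutout(particles, centre, half_width, box_size):
    """Select particles inside a cubic sub-volume of a periodic box.

    particles  : iterable of (x, y, z) positions, each in [0, box_size)
    centre     : (x, y, z) centre of the cut-out
    half_width : half the side length of the cut-out cube
    box_size   : side length of the full simulation box
    """
    selected = []
    for pos in particles:
        inside = True
        for x, c in zip(pos, centre):
            d = abs(x - c)
            d = min(d, box_size - d)   # shortest distance across the periodic boundary
            if d > half_width:
                inside = False
                break
        if inside:
            selected.append(pos)
    return selected


# Invented positions in a box of side 100 (arbitrary length units).
particles = [(1.0, 1.0, 1.0), (99.0, 99.0, 99.0), (50.0, 50.0, 50.0), (10.0, 1.0, 1.0)]
subset = cutout(particles, centre=(0.0, 0.0, 0.0), half_width=5.0, box_size=100.0)
print(subset)
```

Only the selected subset would then be serialised and sent to the user, which reduces the transferred volume. How a user decides which part of the box to request remains the open question raised above.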
Theory in the Euro-VO DCA
• Work package 4: theory in the VO.
• Deliverable
– this workshop
– whitepaper
A Framework for the inclusion of theory data
in the VO
• Theory Experts Group (this workshop’s
SAC)
This workshop: goals
• Use cases
– Science with TVO-like aspects.
– (How) might TVO facilitate work?
• Early implementations
• Presentations on VO-like facilities
• Discussions
• Questionnaire
• Whitepaper
This workshop: sessions
• 3+1D simulations
• micro-simulations
• theory-theory interoperability
• theory-observational interface
• computational infrastructure
Simulation types
• 3+1D simulations
– Subject of SNAP
• Overview (V. Springel)
• Projects (H. Wozniak, J. Schaye)
• VO efforts (R. Wagner, P. Hennebelle)
• Micro-simulations
– individually small
– different use cases from SNAP-like simulations
• parameter space sampling
• MANY parameters
• MANY observables
• on-line simulations feasible
– Large variety (all speakers)
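Parameter space sampling with an individually cheap model can be sketched as follows. The model here is a toy stand-in chosen only because it runs instantly: a spherical blackbody with two parameters (temperature, radius) and two “observables” (Wien peak wavelength, Stefan-Boltzmann luminosity). It is not one of the micro-simulation codes presented at the workshop, and the names `toy_model` and `sample_grid` are invented for this illustration.

```python
import itertools
import math

WIEN_B = 2.897771955e-3     # Wien displacement constant, m K
SIGMA_SB = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
R_SUN = 6.957e8             # nominal solar radius, m


def toy_model(temperature_k, radius_rsun):
    """Toy stand-in for a micro-simulation: a spherical blackbody."""
    radius_m = radius_rsun * R_SUN
    return {
        "peak_wavelength_m": WIEN_B / temperature_k,
        "luminosity_w": 4.0 * math.pi * radius_m**2 * SIGMA_SB * temperature_k**4,
    }


def sample_grid(parameter_axes):
    """Run the toy model on every combination of the given parameter values."""
    names = list(parameter_axes)
    results = []
    for values in itertools.product(*(parameter_axes[n] for n in names)):
        params = dict(zip(names, values))
        results.append((params, toy_model(**params)))
    return results


grid = sample_grid({
    "temperature_k": [4000.0, 5800.0, 10000.0],
    "radius_rsun": [1.0, 2.0],
})
print(len(grid))
```

With many parameters a full grid grows exponentially with the number of axes, so a service may prefer to run the model on demand for the requested parameter values rather than store every combination.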
Interoperability
• Theory-theory
– Not as straightforward as for observations.
– Examples
• MODEST (P. Teuben)
• Code comparisons (I.Iliev):
– Santa Barbara Cluster Comparison Project
– Aspen-Amsterdam Void Finder Comparison Project
• Data reuse (S.Charlot)
• Theory-observational
– Assist observers to use theoretical resources (vice versa?).
– Use cases.
• survey planning, exposure time calculator
• analysis of detailed observations, using detailed models
– Where do theory and observations meet? (Qi Guo)
• virtual telescopes (E.Bertin, S. Borgani)
• analyse observations as far as possible and compare physical properties
(G.Kauffmann)
Detailed observations
electron density
gas pressure
gas temperature
Courtesy Alexis Finoguenov, Ulrich Briel, Peter Schuecker, (MPE)
Detailed predictions
Courtesy Volker Springel
Computational infrastructure
• New technologies can assist, or may be
required to implement these ideas.
• Real life examples:
– Grid (M. Steinmetz, M. Spaans)
– Algorithms (L.M.Sarro)
– Relational databases (J. Blaizot)
– VO aware visualisation tools (G. Caniglia)
Discussions
• Address “typical” VO issues
– what is it good for, why should I participate,
doesn’t it lead to bad science...?
• Possibly formalised in questionnaire to be
sent around to participants after workshop
• Feedback for WP4 whitepaper.
Thank you.
Questions I
• Which resources are important?
– raw simulation results, post-processed, analysis, virtual
observations
– services to produce these
• What considerations should we apply to decide what
types of resources should be available?
– reproducibility, (re-)usability (by ...), ...
• What should accompany resources published on line?
– documentation, software readable metadata, help desk
– ...
• What scientific content is of most interest?
– large vs small
Questions II
• What questions do you want to ask of a
registry?
– content, methods, physics, characterisations
• What dangers do you see in publishing
resources online?
– quality control, bad science
• What reasons do you have to publish
resources online?
– what reasons to not do this?