
The Gaia Data Access and Analysis System
J. Torra (1), F. Figueras (1), C. Jordi (1), X. Luri (1), C. Fabricius (2), E. Masana (1), B. López Martí (1), P. Llimona (1)
(1)Universitat de Barcelona – IEEC, E-08028 Barcelona, Spain
(2) Københavns Universitet, DK-2100 København Ø, Danmark
The Gaia mission will greatly improve our knowledge of the Galaxy. Providing positions, proper motions and parallaxes of unprecedented precision (a few µas) for a very large number of stars, and capable of simultaneously determining radial velocities and photometric data, it will have an enormous impact on several branches of astrophysics. The Gaia scientific data must be extracted from five years of observations performed from a rotating satellite scanning the sky at a constant rate. The TDI (Time Delay Integration) technique allows tracking the stars as they cross the focal plane, covered by an array of CCDs, where the images of two parts of the sky separated by a large angle are superimposed (for details, see Carrasco et al., this conference).
Gaia will observe about one billion stars plus several million other objects, from galaxies to minor planets. The treatment of five years of observations of such a large amount of data is challenging. The data reduction is a self-consistent process, the Global Iterative Solution (GIS), in which astrophysical data, calibration data, the satellite attitude and several parameters modelling the observations are determined simultaneously; it is a cornerstone of the mission. The Gaia database will contain about one Petabyte of data, and the reduction processes may need some 10^19 floating-point operations.
GDAAS (Gaia Data Access and Analysis Study) aims to prove the feasibility of the mission from the point of view of the data treatment. In a first phase (2000-2002), GDAAS1, a first prototype of the system was built. The database was filled with telemetry data provided by the Gaia simulator, and some crucial processes, together with a first version of GIS, were run. We are now developing a second phase, GDAAS2 (2002-2004). Our goals are to prove the reduction process, that is, the GIS convergence, using more evolved core algorithms, as well as to implement some algorithms devoted to the treatment of specific objects or phenomena. The third phase, GDAAS3, will involve a deeper scientific validation, plus a technical design of the operational concept.
GDAAS is developed by the GDAAS Consortium, constituted by GMV, UB and CESCA.
GDAAS Prototype
The prototype chains the following processing stages:
• The Gaia simulator: generation of the telemetry stream ingested into the database (see below).
• Initial Data Treatment (IDT): the telemetry is decoded and the elementary data, the observations, are created. Each observation is characterized by a time and the field angles along and across the scanning direction.
• Cross-matching: links together all the observations of a given source.
• Global Iterative Solution: determination of the scientific and calibration data.
• Algorithms for specific objects and phenomena (variability, binaries, ...).
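As an illustration of the cross-matching step, the sketch below links an observation to the nearest catalogued source within a search radius. This is a minimal toy example: the source list, the search radius and the nearest-neighbour criterion are illustrative assumptions, not the actual GDAAS algorithm.

```python
import math

# Hypothetical catalogue of known source positions (RA, Dec in degrees)
sources = {"S1": (10.684, 41.269), "S2": (10.690, 41.300)}

def ang_sep(p, q):
    """Angular separation in degrees between two (RA, Dec) points."""
    ra1, de1 = map(math.radians, p)
    ra2, de2 = map(math.radians, q)
    cos_s = (math.sin(de1) * math.sin(de2)
             + math.cos(de1) * math.cos(de2) * math.cos(ra1 - ra2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_s))))

def cross_match(obs_pos, radius_deg=0.01):
    """Link an observation to the nearest known source within the radius,
    or return None if no source is close enough (a new detection)."""
    best = min(sources, key=lambda s: ang_sep(sources[s], obs_pos))
    return best if ang_sep(sources[best], obs_pos) <= radius_deg else None

print(cross_match((10.6841, 41.2691)))  # -> S1
```

In the real system the matching must of course use efficient spatial indexing over the whole database rather than a linear scan.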
Gaiasimu modules:
Gaiasimu is composed of four main components:
• A tool box, containing among other things numerical and astronomical methods.
• GASS (Gaia System Simulator), a specialised module generating simulated telemetry as expected to be received from Gaia.
• GOG (Gaia Object Generator), a sky simulator able to generate a full variety of celestial objects (stars, QSOs, minor planets, etc.) – in development.
• GIBIS (Gaia Instrument and Basic Image Simulator), a specialised module to simulate CCD images as produced by Gaia instruments.
[Diagram: data flow of the GDAAS prototype. The Gaia simulator produces sources and the telemetry stream; the Initial Data Treatment writes observations and attitude into the GAIA Database; cross-matching links sources and observations; the Global Iterative Solution updates sources, observations, attitude, calibration and global data in the database.]
Goals:
Simulations are needed for many important aspects of Gaia: mission and instrument design, data reduction preparation and scientific assessment, among others.
To cater for these needs, the Gaia Simulation Working Group (SWG) was created soon after the mission approval. This group has been developing Gaiasimu (the Gaia simulator) for several years. Gaiasimu has generated a wide variety of simulated data (from realistic CCD images to mission telemetry) used during the design phases of Gaia. Its development is continuing in order to increase its realism and level of detail for the next phases of development and to prepare the scientific exploitation of the Gaia data.
In particular, Gaiasimu provides the data used in GDAAS.
Gaiasimu is being developed using an object-oriented approach and implemented in Java. See the SWG web page at
http://gaia.am.ub.es/SWG/ for more information.
Global Iterative Solution
Demonstrating the principle of GIS
The reduction process, based on the principle of the Global Iterative Solution
(GIS), involves all the observations of "well behaved" sources and, by solving
a minimisation problem on a linearized system of equations, provides the
attitude reconstruction, the derivation of the geometric and photometric
calibration, and the determination of the astrometric and global parameters.
The GIS consists of four steps, each one of them an iterative least-squares
process:
•Attitude updating: determination of the smooth attitude, using all the sources and observations in a given time interval.
•Global updating: determination of the parameters of the model (e.g. the PPN γ), using all the sources over the mission time.
•Calibration updating: instrumental geometric and photometric calibration, using all the sources and observations in a given time interval.
•Source updating: positions, proper motions and parallaxes of the sources, using all the observations of a given source.
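The structure of the four updating steps can be illustrated with a toy block-iteration: a model in which each observation is the sum of a "source" parameter and an "attitude" offset, solved by alternately updating one block of unknowns with the other held fixed. This is a deliberately simplified sketch of the iterative scheme, with made-up numbers and only two of the four blocks, not the real linearized least-squares system.

```python
# Toy GIS-like iteration: observation (i, j) of source j at "time" i
# measures s_j + a_i exactly (no noise in this sketch).
true_s = [5.0, -3.0, 1.5]          # "source" parameters
true_a = [0.0, 0.4, -0.2, 0.1]     # "attitude" offsets; a[0] fixes the gauge
obs = {(i, j): true_s[j] + true_a[i] for i in range(4) for j in range(3)}

s = [0.0] * 3   # current source estimates
a = [0.0] * 4   # current attitude estimates

for it in range(50):                      # outer GIS-like loop
    # Source updating: each s_j from all its observations, attitude fixed
    for j in range(3):
        s[j] = sum(obs[i, j] - a[i] for i in range(4)) / 4
    # Attitude updating: each a_i from the observations at that "time",
    # sources fixed; a[0] stays pinned to 0 as the reference
    for i in range(1, 4):
        a[i] = sum(obs[i, j] - s[j] for j in range(3)) / 3

print([round(x, 6) for x in s])  # -> [5.0, -3.0, 1.5]
```

Each pass shrinks the remaining error by a constant factor, which is the behaviour the convergence tests below look for in the full system.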
Second GIS testing campaign (May-September, 2004)
Goals:
Proof of GIS convergence: starting with departures
from the nominal model, recover the model used to
generate the telemetry data, both in the ideal case
(no additional noise) and in the realistic one
(scatter in the observations).
Data and Astrometric Model:
- 18 months of mission data: 2 × 10^8 observations of
200,000 stars with V < 13 (only about two per mille of
the real mission).
- Global astrometry formulated in a relativistic
framework (see Anglada et al., this conference).
The barycentric coordinate velocity of the star is
parameterised by the standard six astrometric
parameters (α, δ, π, µα, µδ, µr).
- The astrometric residuals are tested for
discrepancies with the prescriptions of General
Relativity (the γ term – space curvature – is one of
the parameters to be solved).
- Photometry and radial velocity data are not
included.
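As an illustration of how the astrometric parameters describe a star's motion, the sketch below linearly propagates a position by proper motion over a time interval. It ignores parallax, radial motion and all the relativistic terms handled by the real model; the function, the numerical values and the µα* = µα cos δ convention are illustrative assumptions.

```python
import math

def propagate(alpha_deg, delta_deg, mu_alpha_star_mas, mu_delta_mas, dt_yr):
    """Linear propagation of (alpha, delta) by proper motion over dt years.
    mu_alpha_star_mas is mu_alpha * cos(delta) in mas/yr (assumed convention);
    mu_delta_mas is in mas/yr."""
    mas_to_deg = 1.0 / 3.6e6                      # 3,600,000 mas per degree
    delta = delta_deg + mu_delta_mas * dt_yr * mas_to_deg
    alpha = alpha_deg + (mu_alpha_star_mas * dt_yr * mas_to_deg
                         / math.cos(math.radians(delta_deg)))
    return alpha, delta

# Illustrative high-proper-motion star, propagated over 5 years
a, d = propagate(269.45, 4.69, -800.0, 10300.0, 5.0)
```

Over an 18-month or 5-year data span these displacements reach many arcseconds for nearby stars, which is why the source parameters must be solved jointly with attitude and calibration.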
Results:
The latest results indicate GIS convergence (see
figures below).
Further testing is needed to improve the
weighting system, accelerate the convergence,
improve the astrometric model and refine the
time codification, among other issues.
The first fully operational system used to process GIS:
Hardware at CESCA: a Compaq AlphaServer HPC320; 16 of its 32 processors, distributed in 8 nodes,
have been extensively used (2001-2004), with five TB of hard disk space installed exclusively for
the GIS testing.
The GIS testing campaign has so far consumed a total of about 5000 hours of elapsed computer
time, up to 400 GB (80%) of the available disk storage and 30 tape cartridges, containing a total of
about 600 GB.
Processing framework: the core of the system has been extensively tested. The input/output data of
all the algorithms entering the astrometric reduction process have been verified at the µas accuracy
level. Portability to other hardware/software platforms has been successfully tested.
[Figures] Left panel: mean differences between the obtained and the
theoretical value of the parallax π after each GIS iteration.
Right panel: mean angular separation ||∆r|| between the
obtained and the expected position at the central time of the
considered mission period (18 months) after each GIS
iteration. The system is clearly approaching the theoretical
(simulated) values.
Near Future: GDAAS 3 (2005-2006)
Scaling to the real mission:
The testing campaign uses a scaled-down version of the full GIS expected to be run in the actual reduction of Gaia. To estimate the scaling
factors to extrapolate to the full mission one should take into account:
1. The observing period: The longest GIS will correspond to the full mission duration (5 years).
2. Number of objects: Gaia will observe about 10^9 objects, but only about 10^8 of them will be used to run GIS. The testing campaign was
based on approximately 200,000 objects.
The telemetry ingestion, database initialisation, cross-matching and GIS of the full mission will require about 4 × 10^18 FLOPs and 3 × 10^9
s of CPU time. The treatment of the Spectro instrument will still need additional resources.
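The quoted figures allow a quick back-of-envelope check of the scaling. The snippet below reproduces the arithmetic, assuming, purely for illustration, that the cost scales linearly with the number of stars and with the observing period.

```python
# Back-of-envelope scaling from the testing campaign to the full mission,
# using the figures quoted above; linear scaling in star count and mission
# length is an assumption for illustration only.
test_stars, mission_stars = 2.0e5, 1.0e8       # stars entering GIS
test_months, mission_months = 18, 60           # observing period
flops_full = 4.0e18                            # quoted estimate, full mission
cpu_seconds_full = 3.0e9                       # quoted estimate

scale = (mission_stars / test_stars) * (mission_months / test_months)
sustained_gflops = flops_full / cpu_seconds_full / 1e9
cpu_years = cpu_seconds_full / (3600 * 24 * 365)

print(f"scale factor ~ {scale:.0f}x")                      # ~1667x the test campaign
print(f"implied sustained rate ~ {sustained_gflops:.1f} GFLOP/s")  # ~1.3 GFLOP/s
print(f"CPU time ~ {cpu_years:.0f} CPU-years")             # ~95 CPU-years
```

Numbers of this order are what motivate the distributed processing framework and the operational design foreseen for GDAAS3.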