ALMA Archive Operations

Impact on the ARC Facilities
Data volume
1 year of nominal ALMA operation is 200 TB.
We have planned for 200 TB in total during the ‘early’ years.
No backups! The ARCs are the backups.
Replication to the ARCs using media if necessary, and over the network once possible.
Deliverables
ARCs will receive the software and support for the procurement of the
hardware and the Oracle licenses. We are also including the ARCs in the
planning of the archive operational concepts under the assumption that
both the hardware and the software are very similar.
The database is Oracle. If an ARC for whatever reason decides to deviate from
this, it has to carry the development and additional operational costs.
Essentially this means that deviating is technically feasible, but...
Hardware is essentially the same story, although a bit more relaxed (in
particular if we manage to replicate over the network).
The NGAS software plays an essential role in the operational concepts.
High level concepts
The SCO is the hub for bulk data and meta-data.
The OSF archive is hidden. Data are first replicated to the SCO and from there to the ARCs.
In general everything is replicated to the ARCs; in practice part of the monitor
and log data might be irrelevant.
Proposals are submitted to the SCO and replicated to the ARCs. The OT submission
interface talks to the SCO.
Nominal Operations
Database replication to the various sites will be done using Oracle
Streams replication technology. This means that meta-data will be
available at the ARCs within seconds.
Bulk data replication will be done using the NGAS mirroring service
(network) or the NGAS cloning service (hard disks).
The NGAS archives at the ARCs are virtually independent from the
SCO NGAS, i.e. they don’t share DB or any other resources, but they
‘know’ of each other.
Network transfer: During periods of average data rate the bulk data
should arrive at the ARCs within a few minutes as well (network
bandwidth limitation).
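As a rough cross-check of the network requirement, the 200 TB per year quoted earlier can be converted into an average sustained data rate. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a perfectly even data flow over the year; the data set size and link speed in the second call are hypothetical values chosen only for illustration, not ALMA figures.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mbit_per_s(tb_per_year: float) -> float:
    """Average sustained rate in Mbit/s for a yearly volume in decimal TB."""
    bytes_per_second = tb_per_year * 1e12 / SECONDS_PER_YEAR
    return bytes_per_second * 8 / 1e6


def transfer_minutes(size_gb: float, link_mbit_per_s: float) -> float:
    """Minutes needed to move size_gb (decimal GB) over a link of the given speed."""
    return size_gb * 1e9 * 8 / (link_mbit_per_s * 1e6) / 60


# 200 TB/year from the data volume estimate.
print(round(average_rate_mbit_per_s(200), 1))  # 50.7

# Hypothetical: a 10 GB data set over a 1 Gbit/s link.
print(round(transfer_minutes(10, 1000), 1))  # 1.3
```

The average rate is only a lower bound on the bandwidth needed, since peak data rates and protocol overhead are not modelled.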
Media transfer: Assuming that we send media twice a week and that
they are delivered within one week, the maximum delay would be
1.5 weeks after the observation (up to half a week waiting for the
next shipment plus one week in transit). The shipping procedure
still has to be defined and implemented.
Important data could still be replicated through the network.
Access to data is always transparently possible, i.e. a user accessing an
ARC can request data even if they have not yet been replicated.
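The media-transfer delay quoted above follows from simple arithmetic: the longest wait for the next shipment plus the transit time. A minimal sketch, assuming shipments are evenly spaced through the week (the function name is hypothetical):

```python
def max_media_delay_weeks(shipments_per_week: float, transit_weeks: float) -> float:
    """Worst-case delay between observation and arrival of the media at an ARC.

    Assumes evenly spaced shipments, so data recorded just after a shipment
    leaves waits one full shipment interval before it is sent.
    """
    wait_for_next_shipment = 1.0 / shipments_per_week
    return wait_for_next_shipment + transit_weeks


# Two shipments per week, delivery within one week.
print(max_media_delay_weeks(2, 1.0))  # 1.5
```

Shipping more often only shortens the waiting term; the transit time sets the floor on the delay.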
Hardware
A full-blown ALMA ARC will consist of four 19” racks:
3 x 8 NGAS servers (4 HU, 24 disks)
3 NGAS disk handling and front-end servers (3 HU, 16 disks)
2 database machines (3 HU, 16 disks)
Potentially an additional disk array for the database
Terminal server for all machines
Network switch
This equipment requires sufficient cooling and good, sturdy racks.
One of the 4 HU NGAS servers weighs more than 100 kg, i.e. each of the
three NGAS racks (8 servers each) will weigh approximately 1 ton.
Since one rack with current disk capacity holds about 1 year of
ALMA data, it should be possible to keep the total space required
by the full ALMA archive stable by replacing disks with new
double-capacity disks after 2-3 years (assuming no increase of the data rate).
This saves not only space, but also power and maintenance.
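The rack arithmetic can be sketched as follows. The first part checks that one rack of 8 NGAS servers with 24 disks of 1 TB each holds roughly one year of data. The second part is an illustrative model, not the planned procedure: it assumes that one rack holds exactly one year of data at current capacity, that disk capacity doubles every 3 years, and that all disks are swapped at each doubling.

```python
import math

SERVERS_PER_RACK = 8
DISKS_PER_SERVER = 24
DISK_TB = 1  # current disk capacity

rack_capacity_tb = SERVERS_PER_RACK * DISKS_PER_SERVER * DISK_TB
print(rack_capacity_tb)  # 192, i.e. about one year at 200 TB/year


def racks_needed(year: int, doubling_period_years: int = 3) -> int:
    """Racks needed to hold `year` years of data.

    Assumption: one rack holds one year of data at today's capacity, and
    every `doubling_period_years` all disks are replaced by double-capacity ones.
    """
    capacity_factor = 2 ** (year // doubling_period_years)
    return math.ceil(year / capacity_factor)


for year in range(1, 13):
    print(year, racks_needed(year))
```

Under these assumptions the archive never needs more than the three NGAS racks over the first 12 years.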
Deployment
Layout of the ARCs is essentially the same as for the SCO.
We have just asked for prices for the first full hardware installation at the OSF,
which is very similar to 0.75 of an ARC. The price is about 135,000 $ including
110 TB of disk space: 8 x 24-slot machines plus 14 x 16-slot machines,
with only 35% of the slots filled.
1 TB disk ~ 300 $ (really low dollar :-( )
Prices
Totals per TB: 300 $/disk + 240 $/slot in computer = 540 $/TB
including auxiliary machines (disk handling, DB, front-end).
No infrastructure included (racks, cooling, network, UPS...)
By the time we have to procure the hardware for the ARCs,
this should have gone down to about 270 $/TB. Initially we have to buy
about 1 year's worth of capacity (200 TB), which by then should
fit in half the number of machines/slots.
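The price estimate can be worked through as a short calculation. This is a sketch using only the figures quoted above; the halving of the per-TB price is modelled as a doubling of disk capacity at constant disk and slot prices, which is an assumption made here for illustration. Infrastructure (racks, cooling, network, UPS) is excluded, as in the figures above.

```python
DISK_USD = 300  # price of one disk
SLOT_USD = 240  # share of machine cost per disk slot, incl. auxiliary machines


def cost_per_tb(disk_usd: float, slot_usd: float, disk_tb: float) -> float:
    """Cost per TB when each slot holds one disk of disk_tb terabytes."""
    return (disk_usd + slot_usd) / disk_tb


def procurement_usd(capacity_tb: float, usd_per_tb: float) -> float:
    """Hardware cost for a given capacity, excluding infrastructure."""
    return capacity_tb * usd_per_tb


now = cost_per_tb(DISK_USD, SLOT_USD, disk_tb=1.0)
later = cost_per_tb(DISK_USD, SLOT_USD, disk_tb=2.0)  # assumed 2 TB disks
print(now, later)  # 540.0 270.0

# One year's worth of capacity (200 TB) at the projected price.
print(procurement_usd(200, later))  # 54000.0
```

With double-capacity disks the same 200 TB needs half as many slots, which matches the expectation that the initial purchase fits in half the machines.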