Enabling File-Transparent Event Access from the Grid

Grid Collector: Enabling File-Transparent
Object Access For Analysis
Wei-Ming Zhang
Kent State University
John Wu, Alex Sim, Junmin Gu and Arie Shoshani
Lawrence Berkeley National Lab
In collaboration with
Jerome Lauret, Victor Perevoztchikov,
Valeri Faine, Jeff Porter, Sasha Vanyashin
Brookhaven National Laboratory
Goals
• Transparent object access
– No need for analysts to manage files and disk space
– No need for analysts to access remote mass storage systems
• Select objects based on their attribute values
– E.g., production=P03ia & numberOfPrimaryTracks>200
• Improve analysis system’s throughput by
– Eliminating the need to read all objects in a file
– Providing optimized disk space management and automatic
garbage collection
– Automating the retrieval of files from remote storage systems
• Interactive analysis of data distributed on the GRID
– Providing quick partial answers
– Enabling users to transparently share files in disk caches
June 2003
Grid Collector
Previous Work
Storage Resource Access Coordination System (STACS) and the GCA client software for STAR
Strength –
• Transparent event access for one storage site
• Efficient evaluation of selection conditions through bitmap index
• Interactive estimation of the selection size
Weakness –
• GCA client software was designed for Objectivity data
– Need to access ROOT files now
• STACS only accesses one HPSS
– STAR data is to be distributed on the Grid
[Diagram: STACS architecture. Labels: User’s Application, Query Estimator (QE), Bitmap index, Query Monitor (QM), Caching Policy Module, Cache Manager (CM), File Catalog (FC), Disk Cache. Labeled flows: query estimation / execution requests; open, read, close; file caching request; file caching.]
Grid Collector
Grid Collector is a collection of modules that provides the functionality of
STACS and the GCA client
New features –
• Integrate with STAR analysis framework to extract events from ROOT
files
• Use Storage Resource Manager for disk (DRM) and HPSS (HRM)
– GRID enabled, capable of accessing multiple sites
• More efficient implementation of bitmap index
[Diagram: Grid Collector architecture. Labels: Logical Request, Analysis, Event Iterator, Bitmap Index, File Catalog, File scheduler, DRM, HRM, Disk Cache, BNL, LBNL.]
The Building Blocks
• Bitmap Index
– Indexes each event
– Efficient for partial range queries
• Storage Resource Manager
– Manages disk cache
– Automatic retrieval of needed files from the Grid
• File Scheduler
– Coordinates file accesses
• File Catalog
– Provides location information about files
• Index Feeder
– Digests ROOT files to extract information about events (tags)
• Event Iterator
– Feeds events to analysis code in a stream
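The bitmap index idea above can be illustrated with a minimal, uncompressed sketch. This is not the Grid Collector implementation: the class name BitmapIndex and the helpers bitAnd and selectedEvents are hypothetical, and a production index would normally compress its bitmaps. The sketch keeps one bitmap per distinct attribute value, answers an equality or ">=" condition by combining bitmaps, and ANDs the per-attribute results to find the events that satisfy the whole condition.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Bitmap = std::vector<bool>;  // one bit per event

// Hypothetical equality-encoded index: one bitmap per distinct value.
template <typename T>
class BitmapIndex {
public:
    explicit BitmapIndex(const std::vector<T>& column) : n_(column.size()) {
        for (std::size_t i = 0; i < column.size(); ++i) {
            auto it = bitmaps_.find(column[i]);
            if (it == bitmaps_.end())
                it = bitmaps_.emplace(column[i], Bitmap(n_, false)).first;
            it->second[i] = true;  // event i carries this value
        }
    }

    // Events whose attribute equals v.
    Bitmap equals(const T& v) const {
        auto it = bitmaps_.find(v);
        return it == bitmaps_.end() ? Bitmap(n_, false) : it->second;
    }

    // Events whose attribute is >= v: OR the bitmaps of all keys >= v.
    Bitmap atLeast(const T& v) const {
        Bitmap out(n_, false);
        for (auto it = bitmaps_.lower_bound(v); it != bitmaps_.end(); ++it)
            for (std::size_t i = 0; i < n_; ++i)
                if (it->second[i]) out[i] = true;
        return out;
    }

private:
    std::size_t n_;
    std::map<T, Bitmap> bitmaps_;
};

// Combine two conditions with a logical AND, bit by bit.
Bitmap bitAnd(const Bitmap& a, const Bitmap& b) {
    assert(a.size() == b.size());
    Bitmap out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] && b[i];
    return out;
}

// Turn a result bitmap into the list of selected event numbers.
std::vector<std::size_t> selectedEvents(const Bitmap& hits) {
    std::vector<std::size_t> ids;
    for (std::size_t i = 0; i < hits.size(); ++i)
        if (hits[i]) ids.push_back(i);
    return ids;
}
```

With one index on production and one on numberOfPrimaryTracks, a condition such as production=P03ia & numberOfPrimaryTracks>=200 becomes bitAnd(prod.equals("P03ia"), tracks.atLeast(200)), evaluated without opening any event file.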
Using Grid Collector
• Existing practice
– Specify a list of files or directories containing the desired events
– Analyze all events in the files
• Reading more events than needed
– Files have to be on disk before analysis
• User has to manage the files and space
• All files have to be present at the same time
• Using Grid Collector
– Specify the conditions characterizing the desired events, such as
“production=P03ia & numberOfPrimaryTracks>=200”
– Analyze only events satisfying the conditions
• By reading only the events selected using the bitmap index
– Files are retrieved and managed by the Grid Collector
• User does not have to know about the files
• Files are retrieved in a stream, reducing the disk space required
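The contrast above can be sketched as follows, under simplifying assumptions. Given the selection bitmap and the location of each event, only files that contain selected events are requested, and they are visited one at a time. EventLocation, ReadPlan, planReads and streamEvents are hypothetical names for illustration, not Grid Collector or STAR interfaces, and the fetch and analyze callbacks stand in for the storage manager and the analysis code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Where one event lives (hypothetical layout, not the STAR catalog schema).
struct EventLocation {
    std::string file;   // logical file name
    std::size_t entry;  // position of the event inside that file
};

// For each file that holds selected events, the entries to read from it.
using ReadPlan = std::map<std::string, std::vector<std::size_t>>;

// Keep only the selected events and group them by file.
ReadPlan planReads(const std::vector<EventLocation>& where,
                   const std::vector<bool>& selected) {
    assert(where.size() == selected.size());
    ReadPlan plan;
    for (std::size_t i = 0; i < where.size(); ++i)
        if (selected[i]) plan[where[i].file].push_back(where[i].entry);
    return plan;
}

// Visit the plan file by file: fetch a file, analyze its selected
// entries, then move on. Returns the number of events analyzed.
template <typename Fetch, typename Analyze>
std::size_t streamEvents(const ReadPlan& plan, Fetch fetch, Analyze analyze) {
    std::size_t count = 0;
    for (const auto& item : plan) {
        fetch(item.first);
        for (std::size_t entry : item.second) {
            analyze(item.first, entry);
            ++count;
        }
    }
    return count;
}
```

In this simplified model a file with no selected events never appears in the plan, so it is never retrieved, and the files that do appear are processed in sequence rather than all being staged at once.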
Detailed Use Case
• Using a sample analysis script called doEvents.C
• Analyzing first 100 events from production P03ia with 200 or more
primary tracks
– .x doEvents.C(100, "select production=P03ia & numberOfPrimaryTracks>=200")
• To analyze all events, set the first argument to a negative integer
• To try different conditions without analyzing them, a separate
command is available
• Creating your own script to use the Grid Collector
– Load StGridCollector library
– Create an object of type StGridCollector
– Initialize the object with a select statement
– Pass the object to StIOMaker just like a StFile object, the rest of the code is exactly the same as using StFile
Status and Future Plans
• Current state
– Grid Collector is ready to be used
– Currently (June, 2003), we are populating the bitmap index for a
STAR user (John Amonett, Kent State University) to do flow
analyses
• Future plans
– Speed up the index building process
– Enable parallel and distributed analyses for large jobs
– Provide capability for users to analyze events in a specified order
– Make it into a Grid-enabled service
– Collaborate with other experiments (?)
• Contact information
– John Wu <[email protected]>
– Wei-Ming Zhang <[email protected]>
– Jerome Lauret <[email protected]>