
Advanced topic: The SRM protocol and the StoRM implementation
Ezio Corso (EGRID Project, ICTP)
Advanced topic on data management

 I will briefly describe how the classic SE works.
 I'll then talk about the SRM protocol:
   Its origin: allowing tape resources to be accessed from the GRID.
   Particular attention to design differences with the classic SE.
   SRM's transition to an interface for disk storage resources.
   Highlighting design points and their consequences for file security.
   File security: POSIX-like ACL access to files from the GRID.
   Differences with tape-based systems.
 I'll finally talk about StoRM: an SRM implementation that allows POSIX-like ACL access.
I. Classic SE
Classic SE
 It allows disk resources to be accessed from the GRID.
 What makes a machine into an SE? Three components are needed:
   A component that publishes information and tells the GRID that it is an available storage resource.
   The usual framework for authentication: GSI.
   A component that actually moves the files around: the characterizing feature!
Classic SE
 Component that allows the GRID to be aware of its presence, i.e. to be included in the GRID information system:
   An LDAP server publishes information about the SE, organised according to the GlueSchema: specifically by the GlueSEUniqueID entity.
     Information describing the SE, such as its name and the listening port of the service.
     Information specific to each VO that the SE is serving, such as the local path to the file-holding directory, available space, etc.
   Part of the information is updated dynamically, especially that concerning the disk space available and occupied. This is done through LDAP providers found in /opt/lcg/libexec: the providers periodically run scripts which update the dynamic information.
   Finally, the rest of the grid information system periodically polls the information made available by the SEs present there.
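As a rough illustration of what such a published record looks like, the sketch below builds a GlueSchema-style LDIF entry in Python. The attribute names (GlueSEUniqueID, GlueSEPort, GlueSARoot) come from the GlueSchema; the host, port, VO and path values are invented examples, not taken from a real SE.

```python
# Sketch: build a GlueSchema-style LDIF entry for an SE, roughly as an
# LDAP information provider would publish it. Attribute names follow the
# GlueSchema; host, port, VO and paths are invented examples.
def glue_se_ldif(unique_id, port, vo_paths):
    lines = [
        f"dn: GlueSEUniqueID={unique_id},mds-vo-name=local,o=grid",
        f"GlueSEUniqueID: {unique_id}",
        f"GlueSEPort: {port}",
    ]
    for vo, path in vo_paths.items():
        # one storage-area root per supported VO, in "VO:local-path" form
        lines.append(f"GlueSARoot: {vo}:{path}")
    return "\n".join(lines)

entry = glue_se_ldif("storage.egrid.it", 2811,
                     {"egrid": "/flatfiles/SE00/egrid"})
print(entry)
```

The dynamic providers in /opt/lcg/libexec would regenerate the space-related attributes of such an entry on each run.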
Classic SE
 User authentication: the Grid Security Infrastructure (GSI).
   Core of the GLOBUS 2.4 libraries, used by the service in charge of moving files around! i.e. /opt/globus/lib/libglobus_gsi_credential_gcc32dbg.so.0, /opt/globus/lib/libglobus_gsi_proxy_core_gcc32dbg.so.0, etc.
   A set of scripts run by cron jobs manages the pool accounts:
     /opt/edg/sbin/edg-mkgridmap creates a gridmap file by reading a local configuration file that specifies the sources of allowed credentials: an LDAP server or a specific file.
     /opt/edg/sbin/lcg-expiregridmapdir removes the mapping to local credentials when a grid user is no longer working on that machine.
     /opt/edg/sbin/edg-fetch-crl retrieves the revocation lists of invalid certificates.
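A minimal sketch of how a grid-mapfile of the kind produced by edg-mkgridmap can be read: each line pairs a quoted certificate subject DN with a local account, where a leading dot conventionally marks a pool account. The sample DN and account name below are invented.

```python
# Sketch: parse a grid-mapfile mapping certificate subject DNs to local
# accounts. A leading dot on the account marks a pool account
# (e.g. ".egrid" -> egrid001, egrid002, ...). Sample content is invented.
def parse_gridmap(text):
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        # line format: "/subject DN" account
        dn, _, account = line.rpartition('" ')
        mapping[dn.lstrip('"')] = account
    return mapping

sample = '"/C=IT/O=ICTP/CN=Some User" .egrid\n'
gridmap = parse_gridmap(sample)
```

lcg-expiregridmapdir works on the other end of this mapping, recycling the pool account once the grid user's lease expires.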
Classic SE
 Component that carries out the functionality of moving files around the GRID.
 In general it is just any implementation of a transport protocol that implements GSI!
   GridFTP: the most common!
   RFIO
   Anything that somebody comes up with, as long as it is GSI-enabled: it is just a matter of who will adopt it and use it!
Classic SE
 GridFTP:
   Essentially an FTP server extended/optimized for large data transfers:
     Parallel streams for speed.
     Checkpoints during file transfers, for later resuming.
   Authentication through GSI certificates instead of user name + password.
Classic SE
 Central point:
   It is FTP! A user can do whatever an FTP client allows to be done!
   There is no separation between what can be done from the grid and the actual transport protocol.
   There is no explicit, separate list of the file manipulation operations that can be done from the grid!
   There is no uniform view of the possible file manipulations: they are tied to the underlying transport protocol!
     Depending on the protocol, you may not have the same functionality.
     For the same functionality, the specific protocol must be used: it may not be possible to access all SEs seamlessly!
Classic SE
 Compare with CEs, which have the LRMS interface to forked jobs or to batch jobs.
 It is an abstraction layer over the kinds of computations that can be done.
 LRMS may not be a great protocol (gLite CEs are somewhat different)… yet it is an attempt to introduce an abstraction.
Classic SE
A more serious consequence of the lack of abstraction is how to
apply POSIX ACL like control on files, from the grid. It is left up
to the transport protocol!

For GridFTP:



It is FTP modified for GSI.
FTP allows file manipulation compatible with underlying Unix
filesystem permissions.
If grid control on files is needed, it is the underlying filesystem that
must be carefully managed!




Map users to specific local accounts: not pool accounts. Each grid user
can be controlled individually once it gets into the machine.
Partition local accounts into especially created groups: reflects data
access patterns.
Carefully crafted directory tree guides data access.
So a grid user with no access rights to a file is stopped because the
GridFTP server gets stopped on its track by the local filesystem!
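The filesystem-level enforcement just described can be sketched as follows: directories are created with permission bits that admit only the owner and the owning group, so a GridFTP operation by an account outside that group fails directly at the filesystem. The directory names are invented, and a real setup would additionally assign Unix group ownership with chgrp.

```python
# Sketch: enforce grid access policy purely through the local filesystem,
# as a classic SE with GridFTP must. Directory names are invented; a real
# setup would also chgrp each directory to its data-access group.
import os
import stat
import tempfile

root = tempfile.mkdtemp()

# Directory reserved to one group of accounts: rwx for owner and group,
# nothing for others. GridFTP acting for an outside account stops here.
private = os.path.join(root, "stock-data")
os.mkdir(private)
os.chmod(private, 0o770)

# World-readable area: any mapped account may read and traverse it.
public = os.path.join(root, "public")
os.mkdir(public)
os.chmod(public, 0o755)

private_mode = stat.S_IMODE(os.stat(private).st_mode)
```

Note that os.chmod is used after os.mkdir because the mode argument of mkdir is filtered by the process umask.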
Classic SE
 In any case the proposed solution is problematic, because data may be present in several SEs:
   Users must have the same UID across all SEs.
   The directory structure must be replicated/synchronised across all SEs.
   Users must be supplied with tools to manage permissions coherently across all SEs.
Classic SE
 Central point:
   The GRID lacked the concept of access control within the same VO.
   It was only to be found when passing to the local machine, which had the means to enforce it: users + group membership.
   Security is therefore set up behind the scenes, at the implementation level!
   No GRID concept is involved! No GRID abstraction is available to:
     Express fine-grained authorization.
     Express what can be accessed.
     Check GRID credentials.
Classic SE
 VOMS proxies and GridFTP:
   VOMS allows roles and groups to be defined: it therefore allows fine-tuning of who the GRID user is.
   It is up to the system receiving these detailed credentials to decide what local resources to use.
   For an SE there is still the same problem of explicitly listing what these resources are: the dependency on the transport protocol remains, as stated.
II. The SRM protocol
The SRM protocol
 Storage Resource Manager protocol:
   Originally devised to allow grid access to tape-based resources that had a disk area acting as a cache.
   Staging of files:
     A request for a file arrives.
     If the file is in the cache, it is returned right away.
     Otherwise it is first fetched from tape, copied to disk and then returned.
     The system takes care of consistency between cache and tape.
     Staging is needed to offset the latency due to the robotic arm switching tapes.
The SRM protocol
 SRM was designed to handle that tape/disk-cache scenario from the GRID.
 1. The presence of the cache area introduces the concept of file type:
   Volatile: files get written in the cache, and the system then removes them automatically after their lifetime expires.
   Permanent: files that get into the cache are not removed automatically by the system.
   Durable: files do have a lifetime that may expire, but the system does not remove them; instead it sends an e-mail notification to the user.
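A toy model of the three file types and of what the system does when a lifetime expires; the type names mirror the protocol, everything else is illustrative.

```python
# Sketch: the three SRM file types and the system's reaction to a
# lifetime expiry. Type names mirror the protocol; the action strings
# are invented labels for illustration.
import enum

class FileType(enum.Enum):
    VOLATILE = "volatile"    # removed automatically on lifetime expiry
    DURABLE = "durable"      # kept on expiry; the user is notified
    PERMANENT = "permanent"  # never removed by the system

def on_lifetime_expired(ftype):
    """What the storage system does when a file's lifetime runs out."""
    if ftype is FileType.VOLATILE:
        return "remove"
    if ftype is FileType.DURABLE:
        return "notify-user"
    return "keep"  # PERMANENT: nothing happens
```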
The SRM protocol
 2. File staging introduces the concept of asynchronous calls to get or put a file:
   An SRM request is issued to get a file.
   The server replies immediately, without waiting for staging to complete.
   The server returns a Request Token, which the client uses to periodically poll the request's status.
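The asynchronous pattern can be sketched with a toy server that "stages" a file over several polls. The class and method names are invented, loosely echoing srmPrepareToGet and srmStatusOfGetRequest; the status strings follow SRM conventions.

```python
# Sketch: the SRM request-token / polling pattern. A toy server that
# finishes "staging" after a fixed number of polls; class and method
# names are invented, loosely modelled on the protocol calls.
import itertools

class ToySrmServer:
    def __init__(self, staging_polls=3):
        self._tokens = itertools.count(1)
        self._pending = {}           # token -> polls left before ready
        self._staging_polls = staging_polls

    def prepare_to_get(self, surl):
        # Reply immediately: staging happens "in the background".
        token = next(self._tokens)
        self._pending[token] = self._staging_polls
        return token                 # the Request Token

    def status_of_get_request(self, token):
        left = self._pending[token]
        if left == 0:
            return "SRM_SUCCESS"
        self._pending[token] = left - 1
        return "SRM_REQUEST_INPROGRESS"

server = ToySrmServer()
token = server.prepare_to_get("srm://storage.egrid.it:8334/old-stocks/NYSE.txt")
while server.status_of_get_request(token) != "SRM_SUCCESS":
    pass  # a real client would sleep between polls
```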
The SRM protocol
 3. The cache area also introduces a partition of the file namespace:
   Tape must store files: there have to be names that uniquely identify a file on tape!
   The cache area must serve files:
     It may return a path to fetch the file on disk that is different from the name that uniquely identifies the file on tape.
     It can easily support different fetching mechanisms… that is, different transport protocols!
   SRM reflects this distinction in the concepts of SURL and TURL:
     SURL: Storage URL. A name that identifies a grid file in SRM storage: it is what the GRID sees!
       srm://storage.egrid.it:8334/old-stocks/NYSE.txt
     TURL: Transfer URL. A name that identifies a transport protocol and the path to fetch the file: it is how the GRID moves the file around!
       gridftp://storage.egrid.it:2110/home/ecorso/examples/2005/data.txt
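A sketch of the SURL-to-TURL translation: the host stays the same, while a transport protocol, a transport port and a physical disk path replace the SRM view. The mapping rule used here, prefixing a local mount point, is an invented simplification; a real SRM server consults its own namespace mapping.

```python
# Sketch: translate a SURL (grid view) into a TURL (transport view).
# The prefix-a-mount-point rule, the port and the mount path are
# invented; a real SRM server looks the mapping up internally.
from urllib.parse import urlparse

def surl_to_turl(surl, protocol="gridftp", port=2110, mount="/flatfiles"):
    parts = urlparse(surl)
    if parts.scheme != "srm":
        raise ValueError("not a SURL: " + surl)
    # same host, but transport protocol/port and a physical path on disk
    return f"{protocol}://{parts.hostname}:{port}{mount}{parts.path}"

turl = surl_to_turl("srm://storage.egrid.it:8334/old-stocks/NYSE.txt")
```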
The SRM protocol
 Central point:
   SRM introduces an abstraction that separates the transfer protocol from the file operation itself.
   Although introduced to handle the cache area, it also solves the classic SE issues!
   It decouples file operations from the transfer protocol!
The SRM protocol
 Direct consequence:
   SRM servers do not move files in and out of GRID storage!
   They only return TURLs!
   Once it gets a TURL, it is up to the SRM client to call a GridFTP/RFIO/etc. client to move the files!
   SRM acts only as a broker for file management requests!
   Transfer is decoupled from data presentation!
The SRM protocol
 Extra features and concepts in the protocol:
   The big issue of not running out of space during a large file transfer:
     The system is used by the HEP community to store/manage huge amounts of data from the LHC.
   SRM introduced a space management and reservation interface.
The SRM protocol

It distinguishes three types of reserved disk space:




warned.
Space type and file type cannot be mixed in arbitrary ways:



Volatile: will be freed by the system as soon as its lifetime expires.
Permanent: will not be freed by the system.
Durable: will not be freed but the user that allocated it will be
Permanent space will be able to host all three types of files.
Volatile space can only host Volatile files.
The general way of working:




Space request is made.
Server returns a SpaceToken.
All subsequent SRM calls made by the client pass on the token.
The SRM server keeps track tokens and recognises allocated space.
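The SpaceToken workflow and the space-type/file-type rule can be sketched as below. The slide does not state which file types Durable space may host, so that entry in the table is an assumption; the class and method names are invented.

```python
# Sketch: SpaceToken bookkeeping plus the space-type/file-type rule.
# Permanent and Volatile rows come from the slide; the Durable row is
# an assumption. Class/method names are invented.
import uuid

COMPATIBLE = {
    "permanent": {"volatile", "durable", "permanent"},  # hosts all three
    "durable":   {"volatile", "durable"},               # assumption
    "volatile":  {"volatile"},                          # volatile files only
}

class ToySpaceManager:
    def __init__(self):
        self._spaces = {}  # token -> (space type, size in bytes)

    def reserve_space(self, space_type, size_bytes):
        token = str(uuid.uuid4())  # the SpaceToken returned to the client
        self._spaces[token] = (space_type, size_bytes)
        return token

    def may_store(self, token, file_type):
        # the server recognises the token on every subsequent call
        space_type, _ = self._spaces[token]
        return file_type in COMPATIBLE[space_type]

spaces = ToySpaceManager()
vol_tok = spaces.reserve_space("volatile", 10 * 1024**3)
```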
The SRM protocol
 The protocol calls: Data Transfer Functions
   A misnomer… no data is moved by an SRM server!
   srmPrepareToPut, srmPrepareToGet: for putting a file into GRID storage or getting one out.
   srmStatusOfPutRequest, srmStatusOfGetRequest: for polling!
   They work on SURLs!
The SRM protocol
 The protocol calls: Cache area management
   srmExtendFileLifeTime: for extending the lifetime of volatile files.
   srmRemoveFiles: to remove permanent files.
   srmReleaseFiles, srmPutDone: to force early lifetime expiry.
The SRM protocol
 The protocol calls: Directory functions to manage files on tape
   srmRmdir
   srmMkdir
   srmRm
   srmLs
   They work on SURLs!
The SRM protocol
 The protocol calls: Space management functions
   srmReserveSpace
   srmReleaseSpace
   srmGetSpaceMetaData
   The returned SpaceToken is used with all Data Transfer Functions.
III. SRM applied to disk storage!
SRM applied to disk storage!
 SRM addresses the issues of the classic SE: it is natural to use it for disk resources too.
 There was also another important driving force for its adoption:
   Many facilities were in place for the LHC analysis of data coming from the experiments' production centres.
   These facilities had high-performance storage solutions in place, employing parallel disk file systems such as GPFS and Lustre.
   With the advent of GRID technologies it became necessary to adapt the existing installations to the GRID.
SRM applied to disk storage!
 The context of operation is now different: there is no tape with a cache in between.
 In general all concepts are kept, with slight semantic adjustments:
   The SURL/TURL distinction is kept: it decouples the transfer protocol from data presentation, as stated.
   The three file types are kept: some files may be copied and live just for a certain amount of time.
   Space reservation is kept: it is an important functionality.
   Directory functions are kept.
SRM applied to disk storage!
Some compromises:
 Asynchronous nature of srmPrepareToGet,
srmPrepareToPut and srmCopy, remain
although don’t make sense.
 SpaceType distinction makes less sense:



Arguably the whole disk can be seen as
permanent space, and so allow all three file types.
Akin to tapes that are permanent by their nature.
Releasing of file and lifetime extension remain
for volatile files; srmRemoveFiles for
managing cache files does not make sense
IV. StoRM SRM implementation
StoRM SRM implementation
 The result of a collaboration between:
   INFN, Grid.IT Project, from the Physics community
   ICTP, EGRID Project: building a pilot national grid facility for research in Economics and Finance (www.egrid.it)
StoRM SRM implementation
 StoRM's implementation of SRM 2.1.1 is meant to meet three important requirements from the Physics community:
   Large volumes of data straining disk resources: Space Reservation is paramount.
   Boosted performance for data management: direct POSIX I/O calls.
   Security on data as expressed by VOMS: strategic integration with VOMS proxies.
StoRM SRM implementation
 EGRID requirements:
   Data comes from Stock Exchanges, with very strict, legally binding disclosure policies: POSIX-like ACL access is needed from the GRID environment.
   Promiscuous file access: the existing file organisation on disk must be seamlessly available from the grid, and files entering from the grid must blend seamlessly with the existing file organisation. Very challenging, and probably only partly achievable!
 StoRM: a disk-based storage resource manager that allows controlled access to files. A major opportunity for low-level intervention during the implementation.
StoRM SRM implementation
 How StoRM solves POSIX-like ACL access from the GRID:
   All file requests are brokered with the SRM protocol.
   When StoRM receives an SRM request for a file, it asks the policy source for the access rights of the given grid credentials to the given SURL.
   The check is made at the grid credential level, not on the local user as before! And it is done on a grid view of the file, as identified by its SURL!
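A minimal sketch of that brokered check: the policy source is modelled here as an in-memory table keyed by grid subject and SURL, whereas in StoRM it is an external GRID service. All DNs, SURLs and function names below are invented.

```python
# Sketch: the access check performed per SRM request, at the grid
# credential level and on the SURL. The policy source is modelled as a
# dict; in StoRM it is an external GRID service. All values are invented.
POLICIES = {
    ("/C=IT/O=ICTP/CN=Some User",
     "srm://storage.egrid.it:8334/old-stocks/NYSE.txt"): {"read"},
}

def authorize(grid_subject, surl, operation):
    # No local user and no transport protocol are involved at this stage:
    # the decision uses only the grid credential and the grid file name.
    return operation in POLICIES.get((grid_subject, surl), set())

allowed = authorize("/C=IT/O=ICTP/CN=Some User",
                    "srm://storage.egrid.it:8334/old-stocks/NYSE.txt",
                    "read")
```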
StoRM SRM implementation



The only part of the implementation outside of the protocol is
the Policy Source: a GRID service that is able to
formulate/express physical access rules to resources.
StoRM leverages grid’s LogicalFileCatalogue (LFC) as policy
source: it is intended for Logical Names! StoRM therefore
stretches its use. Still, it is very GRID-friendly: it is not a
proprietary solution!
It would be better to have it explicitly in the SRM protocol: SRM
2.1.1 does have some Permission functions but their expressive
power is weak, and in the next version of the protocol they will
be re-addressed (srmSetPermission, srmReassignToUser,
srmCheckPermission).
StoRM SRM implementation

A last note: physical enforcement
through JustInTime ACL setup.




All files have no ACLs setup: no user can
access files.
Local Unix account corresponding to grid
credentials is determined.
ACL granting requested access set up for
local user.
ACL removed when file no longer needed.
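The four steps can be followed end to end in this toy model. Real StoRM manipulates POSIX ACLs on the underlying filesystem (setfacl-style); here the ACL table is kept in memory, and the gridmap, account names and paths are invented.

```python
# Sketch: the Just-in-Time ACL lifecycle. The ACL table is in memory;
# real StoRM sets POSIX ACLs on the filesystem. Gridmap entries,
# account names and paths are invented.
class JitAclManager:
    def __init__(self, gridmap):
        self._gridmap = gridmap  # grid subject DN -> local Unix account
        self._acls = {}          # (path, account) -> granted permission

    def grant(self, grid_subject, path, perm):
        account = self._gridmap[grid_subject]  # step 2: map the credential
        self._acls[(path, account)] = perm     # step 3: set the ACL
        return account

    def revoke(self, grid_subject, path):
        account = self._gridmap[grid_subject]
        self._acls.pop((path, account), None)  # step 4: remove the ACL

    def can_access(self, account, path, perm):
        # step 1: by default no ACL exists, so access is denied
        return self._acls.get((path, account)) == perm

acl = JitAclManager({"/C=IT/O=ICTP/CN=Some User": "egrid001"})
assert not acl.can_access("egrid001", "/flatfiles/NYSE.txt", "r")
acl.grant("/C=IT/O=ICTP/CN=Some User", "/flatfiles/NYSE.txt", "r")
```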
Advanced topic on data management
Thank-you!