GridPP Public Service Summit

GridPP
Prof. David Britton
UK-T0 Meeting
21st Oct 2015
David Britton, University of Glasgow
IET, Oct 09
GridPP Project leader
University of Glasgow
Slide
1
Outline
• Overview of GridPP
• Past and future growth
• GridPP5
• Key messages
GridPP in 2015
GridPP is ~10% of WLCG
18 sites; 57k CPU; 32 PB Disk; 14 PB Tape
WLCG: 170+ sites in 42 countries
515k CPU; 290 PB Disk; 150 PB Tape
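As a rough cross-check of the ~10% figure, the quoted resource totals can be compared directly. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the numbers are simply those quoted on this slide.

```python
# Resource totals as quoted on this slide (CPU in thousands, storage in PB).
GRIDPP = {"cpu_k": 57, "disk_pb": 32, "tape_pb": 14}
WLCG = {"cpu_k": 515, "disk_pb": 290, "tape_pb": 150}


def share_percent(part, whole):
    """Return each resource in `part` as a percentage of the same resource in `whole`."""
    return {key: 100.0 * part[key] / whole[key] for key in part}


if __name__ == "__main__":
    for resource, pct in share_percent(GRIDPP, WLCG).items():
        print(f"{resource}: {pct:.1f}% of WLCG")
```

With these figures the CPU and disk shares come out at about 11% and the tape share at about 9%, consistent with the ~10% quoted.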
Evolving Architecture
[Figure: evolution of computing models, with panels labelled Hierarchy and Mesh]
• Network capabilities and data access technologies have significantly
improved our ability to use resources independent of location.
• Jobs still go to the data, but can fall back to remote data access if local
copies are not available.
• Tier-1 and Tier-2 still offer different capabilities and levels of service.
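The "jobs to data, with remote fall-back" policy in the second bullet can be sketched as below. This is a toy model, not GridPP or experiment middleware: the `Replica` type, the site names, the example URLs, and the `choose_replica` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    """Hypothetical record of one copy of a file held at a site."""
    site: str
    url: str


def choose_replica(job_site: str, replicas: List[Replica]) -> Optional[Replica]:
    """Prefer a copy at the site running the job; otherwise fall back to a remote copy."""
    for replica in replicas:
        if replica.site == job_site:
            return replica  # local access: the job has been sent to the data
    # No local copy: fall back to remote access (first remote replica, if any).
    return replicas[0] if replicas else None


if __name__ == "__main__":
    # Made-up sites and URLs, for illustration only.
    copies = [Replica("SITE-A", "root://site-a.example/f1"),
              Replica("SITE-B", "root://site-b.example/f1")]
    print(choose_replica("SITE-B", copies))  # local copy at SITE-B
    print(choose_replica("SITE-C", copies))  # no local copy: remote copy at SITE-A
```

A real system would rank remote replicas by network cost or site load rather than taking the first one; that choice is left out here to keep the fall-back logic visible.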
GridPP Hardware
• Based mostly on off-the-shelf hardware
– Compute: typically x86_64 Intel Xeon architecture; minimum 2 GB
per core; 30-50 GB scratch per core; 1 or 10 Gbps networking.
– Storage: 2/4 TB disks in RAID-6 configuration; 10 Gbps networking.
• Dedicated Tape Storage facility at the RAL Tier-1.
• ~12 PB disk at Tier-1 and ~20 PB at the Tier-2 sites.
• ~16k logical CPU at Tier-1 and ~40k at the Tier-2 sites.
• Tier-1 has a dedicated (redundant) 2x10G optical link to
CERN (OPN) and 2x30G links to JANET.
• Tier-2 sites typically have 10G connections to JANET.
RAL Tier-1 Resource Growth
[Figure: resource growth plot annotated with Scale Tests, LHC turn-on, Higgs Discovery and Shutdown]
Similar growth in storage.
Scale of Global Operation
Performance (ATLAS Experiment)
2M files transferred per day
1 PB transferred per day
1 PB deleted per day
4M files deleted per day
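To put these daily totals in context, they can be converted to sustained averages. The sketch below is a back-of-envelope calculation that assumes decimal units (1 PB = 10^15 bytes) and a perfectly even load over 24 hours; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_PB = 10**15  # decimal petabyte (assumption; the slide does not specify)


def average_gbps(petabytes_per_day: float) -> float:
    """Sustained average rate in gigabits per second for a given daily volume."""
    bits_per_day = petabytes_per_day * BYTES_PER_PB * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


def average_file_size_mb(petabytes_per_day: float, files_per_day: float) -> float:
    """Mean file size in megabytes implied by a daily volume and a daily file count."""
    return petabytes_per_day * BYTES_PER_PB / files_per_day / 1e6


if __name__ == "__main__":
    print(f"Transfer rate: {average_gbps(1):.1f} Gbps")
    print(f"Files per second: {2_000_000 / SECONDS_PER_DAY:.1f}")
    print(f"Mean transferred file: {average_file_size_mb(1, 2_000_000):.0f} MB")
    print(f"Mean deleted file: {average_file_size_mb(1, 4_000_000):.0f} MB")
```

Under these assumptions, 1 PB per day corresponds to a sustained average of roughly 93 Gbps in aggregate, with a mean transferred file size of about 500 MB.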
LHC and More
9% of Tier-2 CPU and 4% of Tier-1 CPU were delivered
to 32 non-LHC VOs between Jan 2012 and Dec 2014.
Looking Ahead
• Data Volume
Complexity
Z → μμ decay with 25 vertices (April 15th 2012)
Simulated event display at 140 PU (102 vertices)
Our Challenge
Volume: >200 PB at present (on disk for ATLAS alone) will grow by a factor
of >10 by Run-4.
Complexity: pile-up increasing from 23 to 140 by Run-4 makes the
computational problem grow superlinearly.
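The two growth factors can be made concrete with a small calculation. The sketch below takes the quoted numbers as lower bounds and models "superlinear" as a power law in pile-up; the power-law form and the exponents are purely illustrative assumptions, not measured values, and the function name is hypothetical.

```python
def relative_cost(pileup_now: float, pileup_future: float, exponent: float) -> float:
    """Relative per-event compute cost if cost scales as pile-up ** exponent.

    The power-law form and the exponent are illustrative assumptions;
    an exponent above 1 is what makes the growth superlinear.
    """
    return (pileup_future / pileup_now) ** exponent


if __name__ == "__main__":
    # Volume: >200 PB today growing by a factor of >10 gives a lower bound.
    print(f"Run-4 volume: > {200 * 10 / 1000:.0f} EB")

    # Complexity: pile-up from 23 to 140.
    print(f"Pile-up ratio: {140 / 23:.1f}x")
    for exponent in (1.0, 1.5, 2.0):  # hypothetical exponents
        print(f"exponent {exponent}: {relative_cost(23, 140, exponent):.0f}x cost per event")
```

The pile-up ratio alone is about 6.1; any exponent above 1 gives a larger increase in per-event cost, on top of the growth in data volume.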
Service Decomposition
GridPP’s value is the support and provision of a service layer that lies
between the hardware and the software and enables the LHC computing
models.
• The majority of GridPP effort is not used to run hardware.
• GridPP staff effort is not used to write experiment software.
(AiBM)
EGI Services
Services required by WLCG that are provided via the EGI project. The UK leads two of these,
co-leads a third, and makes contributions to two more.
WLCG Services
WLCG services not provided via EGI: The UK contributes to 13 services and co-leads
two. Most WLCG partners contribute to most of these shared international tasks.
UK Services
UK Services: These are services that every country needs to perform for the benefit
of its own infrastructure. These 14 tasks are core business for GridPP. Some
of these services do (and increasingly can) benefit other UK infrastructures.
LHC, LSST, SKA, ......
VO Management, Reconstruction, Data Manag., Analysis
Services: Federated HTC Clusters; Federated Data Storage; Virtual platform; Monitoring; Accounting; Incident reporting; AAAI; VO tools; Tape Archive; .......
Share in common where it makes sense to do so
Note: this does NOT mean centralised
GridPP5
• The GridPP5 proposal has just been funded (from April 2016, for 4 years).
• We presented ambitious plans to evolve our Tier-2 infrastructure
to a sustainable model based on 5 large sites, complemented by
smaller sites that can be run with little (0.5 FTE) manpower.
• We recognised the opportunity, particularly at the Tier-1, to
optimise productivity by sharing the e-Infrastructure to improve
cost efficiency and reduce duplication.
• We believe there are mutually beneficial opportunities in
engaging with existing and emerging groups that require
“e-infrastructure” under the UK-T0 and EU-T0 monikers.
Key Messages -1
• GridPP runs an HTC grid infrastructure primarily for LHC
data, but the HW is sufficiently generic that it can be of
wide use; middleware is getting less bespoke!
• There is a lot of knowledge/skills within GridPP,
particularly around handling large volumes of data
(transporting; metadata; processing).
• There is a lot of experience within the LHC Experiments in
running global-scale complex infrastructures.
• We don’t suggest that others should necessarily do things
our way; but we do have expert knowledge in some, but
not all, areas that might be exploited.
Key Messages -2
• We have some limited hardware capacity that you can use
for free to develop/test/scale-test things.
• We have some (very) limited manpower that can work with
you to get you going.
• We are keen to explore economies of scale by aligning our
infrastructure with yours where it makes mutual sense to
do so.
• We are keen to enter peer-relationships with other groups,
particularly STFC-funded projects, under the auspices of
UK-T0, to show a return on investment in GridPP (aka
"Impact") that goes beyond the production of our headline
science.