clouds - University of Victoria

Clouds at other sites
T2-type computing
Randall Sobie
University of Victoria
Randall Sobie IPP/Victoria
Overview
•  Clouds are used in a variety of ways for Tier-2 type computing
–  MC simulation, production and analysis
–  Commercial/private, in-house/distributed
•  Motivation for using clouds
–  Ease of use, reduced manpower costs, resource sharing
–  Separation of application and system administration
–  Leverage software development by the commercial world
•  How are clouds being used?
–  VM provisioning, job management, benchmarks, storage, networking, monitoring
Cloud computing in HEP
•  Cloud computing in HEP typically provides 5-20% of the processing of current projects
•  “Dedicated” clouds (owned by HEP): dedicated virtual cluster
•  “Opportunistic” clouds (private and commercial)
Cloud deployments
•  Traditional bare-metal
•  Specific purpose cloud (e.g. LTDA BaBar, HLT clouds)
•  Standalone/private cloud (e.g. PNNL, NorduGrid)
•  Distributed clouds (e.g. UK, Canada, Australia, INFN Clouds)
•  Bare-metal or in-house cloud with external cloud (e.g. CERN, BNL)
Examples of cloud deployments
(meant to illustrate our use of clouds)
Australian Belle II Grid Site
•  A single CREAM CE services the ATLAS Tier-2 (Torque) and the Belle II site (Dynamic Torque)
•  Australia-ATLAS Tier 2: TORQUE + Maui, 14,000 HEPSpec (~1400 cores)
•  Belle II site (LCG.Melbourne.au): TORQUE + Maui, with the CREAM CE distributing jobs via SSH
•  Dynamic Torque controls VMs on the Research Cloud (currently 700 cores)
Why private cloud?
•  Chosen for flexibility and efficient use of compute resources for services
•  Provides easy load-balancing and availability features
•  Provides templating features
–  Easy re-use of templates to test and instantiate new server instances
•  Non-systems staff can provision their own instances of services
•  Software Defined Networking is more malleable than physical networking and encourages better networking practices, including security
Lessons learned
•  VMs and/or containers provide the flexibility needed to support multiple collaborations and different user needs
•  Ceph storage is very robust and flexible
•  VMs impose a 15%-20% performance penalty on HEP compute workloads without careful tuning
–  Move to containers on bare metal planned
•  OpenStack features do not help us make sure that a certain number of instances are up, healthy and consistent
–  Kubernetes looks appealing in this respect
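The wish to keep "a certain number of instances up and healthy" is essentially a reconciliation loop: compare the desired count with what is actually running and derive corrective actions. The sketch below only illustrates that idea. It does not use the OpenStack or Kubernetes APIs, and the `Instance` type and `reconcile` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Instance:
    """Hypothetical view of one service instance."""
    name: str
    healthy: bool


def reconcile(desired: int, instances: List[Instance]) -> Dict[str, object]:
    """Return the actions needed to end up with `desired` healthy instances.

    Unhealthy instances are always scheduled for deletion; surplus healthy
    instances are deleted and missing ones are counted for creation.
    """
    unhealthy = [i.name for i in instances if not i.healthy]
    healthy = [i.name for i in instances if i.healthy]
    surplus = healthy[desired:]                # empty if there is no surplus
    missing = max(0, desired - len(healthy))   # instances still to be created
    return {"delete": unhealthy + surplus, "create": missing}


if __name__ == "__main__":
    state = [Instance("svc-a", True), Instance("svc-b", False), Instance("svc-c", True)]
    print(reconcile(3, state))
```

A controller would run such a function periodically and apply the resulting plan; an orchestrator such as Kubernetes provides this behaviour built in, which is presumably the appeal noted above.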
GridPP (P. Love / A. McNab)
University OpenStack instances
•  Clouds at HEP institutions (Oxford/Imperial).
•  The ECDF cloud in Edinburgh has recently been made available to HEP.
UK Vacuum deployments
•  Key to our light-weight Tier-2 strategy, where we operate with minimal manpower at the site (<1000 cores).
DataCentred commercial OpenStack
•  Scale of a Tier-2 facility.
•  Free access to their system (ATLAS) whilst they were commissioning things; paid-for access when funds are available.
•  Network connectivity to the UK academic network is only 1 Gbit, but they have plans to upgrade.
Italy (INFN; Massimo Sgaravatto et al.)
Private OpenStack cloud (Padova-Legnaro) called Cloud Area Padovana
Used by ~25 user groups/projects that contributed financially to the resources
Batch processing
•  HTCondor batch clusters are instantiated using the elastiq framework.
•  These batch clusters are 'dynamic': worker nodes are automatically added or removed depending on load.
•  The CMS Cloud project is integrated with the local Tier-2.
•  E.g. CMS VMs can access the T2 storage (dCache) using the same local protocol (dCAP) used by the T2 WNs.
•  Plans to deploy the Synergy service, which manages resource allocation using a fair-share approach, without a static partitioning of the resources among the relevant user communities.
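The 'dynamic' behaviour of the batch clusters (worker nodes added or removed depending on load) can be illustrated with a small scaling rule. This is a minimal sketch under stated assumptions, not the algorithm used by elastiq: the function name `scaling_delta`, the parameters `jobs_per_worker` and `max_workers`, and their default values are all hypothetical.

```python
import math


def scaling_delta(idle_jobs: int, running_workers: int, busy_workers: int,
                  jobs_per_worker: int = 8, max_workers: int = 50) -> int:
    """Return the number of workers to add (positive) or remove (negative).

    Assumptions: each worker runs `jobs_per_worker` jobs at once, and the
    cluster may never exceed `max_workers` workers in total.
    """
    needed = math.ceil(idle_jobs / jobs_per_worker)
    if needed > 0:
        # Queued jobs are waiting: grow, but stay within the quota.
        return max(0, min(needed, max_workers - running_workers))
    # Nothing queued: release the workers that are not running any job.
    return -(running_workers - busy_workers)


if __name__ == "__main__":
    print(scaling_delta(idle_jobs=20, running_workers=10, busy_workers=10))
    print(scaling_delta(idle_jobs=0, running_workers=10, busy_workers=6))
```

A real deployment would also apply a grace period before removing idle workers, so that short gaps in the queue do not cause VMs to be deleted and immediately recreated.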
NorduGrid
Bern, Switzerland
SWITCHengines: Swiss NREN commercial cloud (OpenStack)
(free during development phase)
Canada
Distributed cloud system for ATLAS and Belle II
•  Integrated into Panda/DIRAC
•  In production for 3-4 years
•  Also used by Canadian astronomy
•  uCernVM, CVMFS, Squid-discovery (Shoal)
•  Distributed VM image repository
•  Data written to local storage and transferred
•  Benchmarks run at VM boot
•  VM time measurements for accounting
•  Reasonable monitoring
•  Updating system for OpenNebula
•  Studying data federations (e.g. Dynafed)
•  Context-awareness
[Diagram: the user submits a job script to HTCondor; CloudScheduler starts VMs on the compute clouds using images from the image repository; HTCondor starts the job on a VM]
•  10-15 clouds managed by HTCondor/CloudScheduler (4000-5000 cores)
•  800-1000 cores (each) on EC2/Azure (egress fees waived)
•  Challenges include managing resources across many administrative domains
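The workflow on this slide (jobs queue in HTCondor, a scheduler starts VMs on whichever clouds have room) can be sketched as a simple placement rule. The code below is an illustration only and is not CloudScheduler's actual logic: the function `plan_vm_starts`, the cloud names, and the assumption of one single-core job per core are all hypothetical.

```python
from typing import Dict


def plan_vm_starts(idle_jobs: int, cores_per_vm: int,
                   free_cores: Dict[str, int]) -> Dict[str, int]:
    """Decide how many VMs to boot on each cloud for the queued jobs.

    `free_cores` maps a cloud name to its unused core quota. Clouds with
    the most free capacity are filled first. Assumes one single-core job
    per core.
    """
    plan: Dict[str, int] = {}
    remaining = idle_jobs
    ordered = sorted(free_cores.items(), key=lambda kv: kv[1], reverse=True)
    for cloud, free in ordered:
        if remaining <= 0:
            break
        fits = free // cores_per_vm                # VMs the quota allows
        needed = -(-remaining // cores_per_vm)     # ceiling division
        count = min(fits, needed)
        if count > 0:
            plan[cloud] = count
            remaining -= count * cores_per_vm
    return plan


if __name__ == "__main__":
    quotas = {"cloud-a": 16, "cloud-b": 8, "cloud-c": 4}
    print(plan_vm_starts(idle_jobs=20, cores_per_vm=8, free_cores=quotas))
```

With many administrative domains involved, the free-quota figures themselves are often the hard part to obtain reliably, which is one face of the resource-management challenge noted above.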
Canadian WLCG “cloud” (includes Australian T2)
Snapshot from Friday October 6
Cloud resources: 10 clouds, 4300 cores
Job scheduling / VM provisioning
•  Variety of methods for running HEP workloads on clouds
–  VM-DIRAC (LHCb and Belle II)
–  VAC/Vcycle (UK)
–  HTCondor/CloudScheduler (Canada)
–  HTC/GlideinWMS (FNAL), HTC/VM (PNNL), HTC/APR (BNL)
–  Dynamic-Torque (Australia)
–  Cloud Area Padovana (INFN)
–  ARC (NorduGrid)
•  Each method has its own merits and was often designed to integrate clouds into an existing infrastructure (e.g. local, WLCG and experiment)
Commercial and private clouds
•  Commercial cloud use
–  Primarily Amazon EC2 and Microsoft Azure (with grants)
–  ATLAS discussing use of GCE
–  Other commercial OpenStack clouds
•  DataCentred (UK), SWITCHengines (Switzerland)
–  CERN commercial cloud procurement
•  Private clouds
–  OpenStack and OpenNebula research-funded clouds but not involved in HEP
Network connectivity
•  Amazon and Microsoft clouds are connected to the research networks in North America (probably GCE as well)
–  Egress charges can be waived upon request
•  Trans-border or trans-ocean traffic can be an issue
–  Has become an important discussion topic in the LHCONE meetings
•  Private opportunistic clouds
–  Traffic flows over the research network but not the LHCONE network
CPU Benchmarks
New suite of “fast” benchmarks
–  HEPiX Benchmark Working Group
–  Suite available includes “fastHS” (LHCb) and Whetstone benchmarks
•  Write to ElasticSearch DB
–  Run benchmarks in the pilot job or during the boot of the VM
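Running a fast benchmark at VM boot and recording the result could look roughly like the sketch below. It is a stand-in only: the timing loop is not the HEPiX suite, “fastHS” or Whetstone, the field names and the `example-cloud` and `vm-0001` identifiers are hypothetical, and the actual write to the ElasticSearch DB is left out, so the record is only serialized to JSON.

```python
import json
import time
from typing import Dict


def fast_benchmark(iterations: int = 200_000) -> float:
    """Time a fixed floating-point loop and return iterations per second.

    This is a placeholder workload, not a real HEP benchmark.
    """
    start = time.perf_counter()
    acc = 0.0
    for i in range(1, iterations + 1):
        acc += (i % 7) * 1.0001 / i
    elapsed = time.perf_counter() - start
    return iterations / elapsed


def benchmark_record(cloud: str, vm_id: str) -> Dict[str, object]:
    """Build the document that would be sent to the results database."""
    return {
        "cloud": cloud,                       # hypothetical cloud label
        "vm_id": vm_id,                       # hypothetical VM identifier
        "timestamp": time.time(),             # seconds since the epoch
        "score_iter_per_s": fast_benchmark(),
    }


if __name__ == "__main__":
    print(json.dumps(benchmark_record("example-cloud", "vm-0001"), indent=2))
```

Storing one such record per VM boot makes it possible to compare clouds and to weight VM run time by measured performance for accounting.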
Data storage
–  Data written to local storage on the node and then transferred to the selected SE
–  UK group has done some work integrating their object store with ATLAS
–  BNL using S3 storage on EC2 for T2-SE
Monitoring
•  Cloud or site monitor
–  Cloud system monitor: Sensu, Munin, RabbitMQ, MongoDB, Ganglia
•  Application
–  Benchmarks and accounting: ElasticSearch DB
–  Application monitor: Panda monitoring
Summary
•  Clouds at HEP sites
–  Typically integrated into an existing infrastructure
–  Seen as a way to better manage multi-user resources
–  Cloud R&D funding opportunities
•  Opportunistic research clouds
–  Easy way to utilize clouds at non-HEP research computing facilities
–  No requirement for on-site application specialists or complex software
•  Commercial clouds
–  EC2/Azure/GCE dominate, but other OpenStack clouds are also used
–  Grant and some contracted resources
–  Trans-border network connectivity being addressed