Belle computing upgrade
Ichiro Adachi
22 April 2005
Super B workshop in Hawaii

Slide 1: Belle's computing goal
• Data processing: 3 months to reprocess the entire data set accumulated so far, using all of the KEK computing resources (efficient use of resources, flexibility)
• Successful (I think, at least): 1999-2004, all data processed and used for analysis at the summer conferences (good or bad?)
• Example: DsJ(2317), from David Brown's CHEP04 talk
  - BaBar discovery paper: Feb 2003
  - Belle confirms DsJ(2317): Jun 2003 (also validates software reliability)
  - Belle discovers B → DsJ(2317) D: Oct 2003
  - BaBar confirms B → DsJ(2317) D: Aug 2004
• "How can we keep computing power?"

Slide 2: Present Belle computing system
[System diagram; recoverable labels: tape library 500 TB DTF2; HSM with 4 TB disk and tape library 120 TB DTF2; 155 TB disk plus tape library 1.29 PB S-AIT; disks of 8 TB, 50 TB, and 50 TB IDE; servers: Sparc 0.5 GHz, Pentium III 1.26 GHz, Athlon 1.67 GHz (×2), Xeon 0.7 / 2.8 / 3.2 / 3.4 GHz]
• Two major components:
  - a system under rental contract, started in 2001
  - Belle's own system

Slide 3: Computing resources evolving
• Purchased what we needed as we accumulated integrated luminosity
[Charts: CPU power (GHz), disk capacity (TB), and HSM volume (TB), all growing from Feb 2001 to Jan 2005]
• Processing power in 2005: 7 fb⁻¹/day (5 fb⁻¹/day in 2004)
• The rental system contract expires in Jan 2006 and has to be replaced with a new one

Slide 4: New rental system
• Specifications based on Oide's luminosity scenario (roughly ×6 more data over the rental period); 6-year contract to Jan 2012; in the middle of the bidding process
• 40,000 SPECint2000_rate of compute servers in 2006
• 5 (1) PB tape (disk) storage system, with extensions
• Network connection fast enough to read/write data at 2-10 GB/s (2 for DST, 10 for physics analysis)
• A user-friendly and efficient batch system that can be used collaboration-wide
• In a single 6-year lease contract we hope to double the resources in the middle, assuming Moore's law holds in the IT commodity market (a rough scaling sketch follows Slide 6)

Slide 5: Lessons and remarks
• Data size and access
• Mass storage: hardware, software
• Compute server

Slide 6: Data size & access
• Raw data: 1 PB for 1 ab⁻¹ (at least); read once or twice a year; keep in the archive
[Plot: Belle raw-data volume per year (TB) vs. integrated luminosity per year (fb⁻¹), 2000-2004; detector and accelerator upgrades can change this slope]
• Compact beam data for analysis ("mini-DST"): 60 TB for 1 ab⁻¹; accessed frequently and (almost) randomly, so keep on disk; easy access preferable
• MC: 180 TB for 1 ab⁻¹ (3× the beam data, Belle's rule of thumb); on disk? where to go? Most users read through all of the data files
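To make the storage arithmetic on Slide 6 concrete, here is a minimal back-of-the-envelope sketch (a hypothetical illustration, not Belle software): the constants are the per-ab⁻¹ figures quoted on the slide, and the target luminosities are arbitrary examples.

```python
# Back-of-the-envelope storage estimate from the per-ab^-1 figures on Slide 6.
# Hypothetical illustration only: the constants come from the slide, the
# target luminosities are arbitrary examples.

RAW_TB_PER_AB = 1000.0   # raw data: ~1 PB per ab^-1 (at least), kept on tape
MDST_TB_PER_AB = 60.0    # mini-DST: 60 TB per ab^-1, kept on disk
MC_FACTOR = 3.0          # MC is ~3x the beam data (Belle's rule of thumb)

def storage_estimate(lumi_ab: float) -> dict:
    """Return rough storage needs (TB) for a given luminosity in ab^-1."""
    mdst = MDST_TB_PER_AB * lumi_ab
    return {
        "raw (tape, TB)": RAW_TB_PER_AB * lumi_ab,
        "mini-DST (disk, TB)": mdst,
        "MC (disk, TB)": MC_FACTOR * mdst,
    }

if __name__ == "__main__":
    for lumi in (1.0, 5.0):  # e.g. 1 ab^-1 now, 5 ab^-1 in a Super B era
        print(f"{lumi} ab^-1:", storage_estimate(lumi))
```

At 1 ab⁻¹ this reproduces the slide's own numbers: 1 PB raw on tape, 60 TB mini-DST and 180 TB of MC on disk.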
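The hope on Slide 4 of doubling the resources in the middle of the 6-year lease is Moore's-law compounding. Below is a hedged sketch of that scaling, assuming a fixed price/performance doubling period of about two years; the doubling period is an assumption about the IT commodity market, not a Belle figure.

```python
# Moore's-law projection behind the "double the resources mid-contract" hope
# on Slide 4. Purely illustrative: the ~2-year doubling period is an assumed
# property of the IT commodity market, not a measured Belle number.

BASE_CAPACITY = 40_000.0   # SPECint2000_rate installed in 2006 (Slide 4)

def projected_capacity(years: float, doubling_years: float = 2.0) -> float:
    """Capacity affordable at constant cost after `years`, if
    price/performance doubles every `doubling_years`."""
    return BASE_CAPACITY * 2.0 ** (years / doubling_years)

if __name__ == "__main__":
    # A mid-contract replacement ~3 years in, and the contract end at 6 years.
    for t in (3.0, 6.0):
        print(f"year {t:.0f}: ~{projected_capacity(t):,.0f} SPECint2000_rate")
```

Under a 2-year doubling period, a mid-contract (year 3) replacement buys roughly 2.8× the 2006 capacity for the same money; even a slower 3-year doubling period still gives about 2×, which is the basis of the hope stated on Slide 4.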
Slide 7: Mass storage, hardware
• The central system in the coming computing
• Lesson from Belle: we have been using SONY DTF drive technology since 1999. SONY DTF2 has no roadmap for future development: a dead end. SONY's next technology choice is S-AIT, so the vendor's trend matters, in cost and in time. We have been testing an S-AIT tape library since 2004. Everything recorded so far sits on 5000 DTF2 tapes; we have to move.
• The back-end S-AIT system: a SONY PetaSite tape library system in a 7-rack-wide space; the main system (12 drives) plus 5 cassette consoles, with a total capacity of 1.3 PB (2500 tapes)
• The front-end disks: 18 dual-Xeon PC servers, each with two SCSI channels, behind a 2 Gbit FC switch; 8 (10) of them each connect a 16-disk 320 (400) GB IDE RAID system; total capacity 56 (96) TB

Slide 8: Mass storage, software
• 2nd lesson: we are moving from direct tape access to a hierarchical storage system (HSM). We have learned that automatic file migration is quite convenient, but we need a lot of capacity so that we do not need operators to mount tapes.
• Most users go through all of the (MC) data available in the HSM, and each access from a user is random, not controlled at all. Each access requires a tape reload to copy data onto disk. The number of reloads per tape is hitting its limit!
• In our usage, the HSM is not an archive but a big cache. We need optimization in both HSM control and user I/O; a huge disk may help? (A toy cache model follows the Summary.)

Slide 9: Compute server
• 40,000 SPECint2000_rate in 2006; assume Moore's law is still valid for the coming years
• A bunch of PCs is difficult for us to manage: limited human resources at Belle, plus Belle software distribution
• The "space" problem: one floor of Tsukuba experimental hall B3 (~10 m × 20 m) was cleared and floored in 2002 and is full in 2005. No more space! An air-conditioning system must also be installed.
• The "electricity" problem: ~500 W for dual 3.5 GHz CPUs. Moore's law alone is not enough to solve this problem.

Slide 10: Software
• Simulation & reconstruction: a Geant4 framework for the Super Belle detector is underway; simulation with beam background is being done
• For reconstruction, robustness against background can be a key

Slide 11: Grid
• Distributed computing at Belle: MC production carried out at 20 sites outside KEK; ~45% of MC events produced at remote institutes since 2004
• Infrastructure: Super-SINET, 1 Gbps to major universities inside Japan; improvements needed for other sites
• Grid should help us: effort with the KEK Computing Research Center on SRB (Storage Resource Broker), and on Gfarm at the Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology (AIST)

Slide 12: Summary
• Computing for physics output: try to keep the present goal
• Rental system: renew from Jan 2006
• Mass storage: PB scale; not only the size but also the type of access matters; technology choice and the vendor's roadmap
• CPU: Moore's law alone does not solve the "space" problem
• Software: Geant4 simulation underway
• Grid: infrastructure getting better in Japan (Super-SINET)
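As a footnote to Slide 8: the "HSM is a big cache" observation can be illustrated with a toy LRU model of tape mounts under uncontrolled random access. All parameters below are invented for illustration (only the 2500-tape figure echoes the S-AIT library on Slide 7); this is not a model of the real Belle HSM.

```python
from collections import OrderedDict
import random

# Toy model of the tape-reload problem on Slide 8. The 2500-tape library size
# echoes Slide 7; every other number is made up for illustration.

def simulate_reloads(n_tapes: int, cache_tapes: int, n_accesses: int,
                     seed: int = 0) -> int:
    """Count tape mounts under uniformly random access, with an LRU disk
    cache that can hold `cache_tapes` tapes' worth of staged files."""
    rng = random.Random(seed)
    cache: OrderedDict = OrderedDict()  # keys: tape ids; order tracks recency
    reloads = 0
    for _ in range(n_accesses):
        tape = rng.randrange(n_tapes)
        if tape in cache:
            cache.move_to_end(tape)        # hit: refresh recency, no mount
        else:
            reloads += 1                   # miss: tape must be mounted
            if len(cache) >= cache_tapes:
                cache.popitem(last=False)  # evict least recently used
            cache[tape] = None
    return reloads

if __name__ == "__main__":
    # With random access over 2500 tapes, a small cache barely helps; a disk
    # big enough to hold most of the active data set does.
    for frac in (0.02, 0.2, 0.8):
        mounts = simulate_reloads(2500, int(2500 * frac), 100_000)
        print(f"cache = {frac:.0%} of tapes -> {mounts} mounts")
```

With uniformly random access the miss rate is roughly 1 − (cache size)/(library size), which is why only a disk comparable in size to the active data set, the "huge disk" of Slide 8, really suppresses tape reloads; coordinating user I/O attacks the same problem from the access-pattern side.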