
Report about HEPIX
Rome, April 3-7
SFT Group Meeting
May 5, 2006
René Brun
CERN
http://hepix.caspur.it/spring2006/
HEPIX Spring 2006 in Rome
• The meeting was held at the Italian National Research Council
(CNR), in a very comfortable auditorium, although the networking
should have been more stable than it was all week. Initially there
were hardware problems, but by mid-week it was the presence of
locally broadcasting nodes in the room, which could not be traced,
that created much instability.
• Alongside the traditional HEPiX sessions, there were a number of
special meetings such as the LCG GDB, the OPN working
group and others, so the total registration count was over 120,
although not all were present all week.
• Unlike previous meetings, this one was mostly separated into
topics, with a convener appointed for each topic. Also, as in the
past two HEPiX meetings in Europe, this meeting attracted a
noticeable number of representatives of LCG Tier 2 sites,
especially from across Europe.
Highlights
• Computer room cooling and air conditioning systems were mentioned in a
majority of site reports. Several sites are having to build or equip new
computer rooms to get around capacity restrictions in existing facilities.
• As usual at recent HEPiX meetings, a number of benchmarks were
presented, with very detailed overheads well worth a look if you are
interested in performance or costs.
• New format for HEPiX: half-day sessions on dedicated topics such as
networking, performance optimization and databases were new to HEPiX,
with corresponding invited speakers.
• Collaboration on and re-use of HEP-developed tools was not particularly
emphasized. On the other hand, there were, as is often the case, a few
examples of wheels being re-invented for no obvious reason.
• Also some random tools CERN/IT might want to look at: Imperia for web
page content management (PSI site report); Subversion, mentioned
several times by DES Group as a possible replacement for CVS for code
management, seems to have arrived on at least a couple of HEP sites.
• Virtualisation, Virtualisation, Virtualisation
• What to do about bird flu, by Bob Cowles (security talk)
Site Reports
• TRIUMF, CASPUR, RAL, CERN, DESY, FZK, CNAF,
JLAB, LAL, NIKHEF, PSI, RZG, SLAC, BNL
• Nearly all sites are installing thousands of Opteron
machines.
Plenary talks
• LCG status by Les
• CPU technologies (Bernd Panzer)
• Power consumption issues (Yannick Perret, IN2P3)
• Dual-Core Batch Nodes (Manfred Alef, FZK)
• Benchmarking AMD64 and EM64T (Ian Fisk)
• Networking technologies
INTEL and AMD roadmaps
 INTEL has now moved to 65 nm fabrication
 new micro-architecture based on mobile processor development,
the Merom design (Israel)
 Woodcrest (Q3) claims +80% performance compared with the 2.8 GHz part,
with a 35% power decrease; some focus on SSE improvements
(included in the 80%)
 AMD will move to 65 nm fabrication only next year
 focus on virtualization and security integration
 needs to catch up in the mobile processor area
 currently AMD processors are about 25% more power efficient
 INTEL and AMD offer a wide variety of processor types;
hard to keep track of the new code names
Multi core developments
 dual-core dual-CPU systems available right now
 quad-core dual-CPU expected at the beginning of 2007
 8-core CPU systems are under development, but not expected
to come to market before 2009
(http://www.multicore-association.org/)
 need to cope with a change in programming paradigm: multi-threading, parallelism
Heterogeneous and dedicated multi-core systems
 Cell processor system: PowerPC + 8 DSP cores
 Vega 2 from Azul Systems: 24/48 cores for Java and .Net
 CSX600 from ClearSpeed (PCI-X, 96 cores, 25 GFLOPS, 10 W)
Rumor: AMD is in negotiations with ClearSpeed to use their processor board
 revival of the co-processor!?
Game machines
Microsoft Xbox 360 (available, ~450 CHF)
• PowerPC based, 3 cores (3.2 GHz each), 2 hardware threads per core
• 512 MB memory
• peak performance ~1000 GFLOPS
Sony Playstation 3 (Nov 2006)
• Cell processor: PowerPC + 8 DSP cores
• 512 MB memory
• peak performance ~1800 GFLOPS
Problems for High Energy Physics:
 Linux on Xbox
 focus is on floating-point calculations and graphics manipulation
 limited memory, no upgrades possible
INTEL P4 3.0 GHz ~12 GFLOPS
ATI X1800XT graphics card ~120 GFLOPS
• use the GPU as a co-processor: 32-node cluster at Stony Brook
• CPU for task parallelism, GPU for data parallelism
• a compiler exists, quite some code already ported
www.gpgpu.org
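The GFLOPS figures quoted above follow from a simple back-of-the-envelope formula: peak = clock rate × floating-point operations per cycle × number of cores. A minimal sketch, where the flops-per-cycle value is an illustrative assumption rather than a vendor specification:

```python
# Theoretical peak = clock (GHz) x flops per cycle per core x cores.
# Flops-per-cycle figures here are illustrative assumptions.
def peak_gflops(clock_ghz, flops_per_cycle, cores=1):
    return clock_ghz * flops_per_cycle * cores

# Intel P4 at 3.0 GHz, assuming 4 single-precision flops/cycle via SSE:
print(peak_gflops(3.0, 4))  # 12.0 -- matching the ~12 GFLOPS above
```

The same formula shows why multi-core and SIMD width, not clock rate alone, drive the peak numbers claimed for the game machines.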
Market trends
 The combined market share of AMD + INTEL in the desktop PC, notebook PC
and server segments is about 98% (21% + 77%)
 On the desktop the relative share is INTEL = 18%, AMD = 82%
(this is the inverse ratio of their respective total revenues)
 In the notebook area INTEL leads with 63%
 AMD's share of the server market is growing, currently 14%
 The largest growth capacity is in the notebook (mobile) market
Batch Systems
• ATLAS (Laura Perini)
• CMS (Stefano Belforte)
• LHCb (Andrei Tsaregorodtsev)
• ALICE (Federico Carminati)
Databases (convener Dirk)
• Introduction: Dirk described how LCG databases are kept
up to date via asynchronous replication using Oracle Streams. He
compared the concerns of local and central site managers
and how these must be reconciled to provide an overall
reliable service.
• Database Service for Physics at CERN (Luca Canali)
• Database Deployment at CNAF (Barbara Martelli)
• Database Deployment at RAL (Gordon Brown)
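The asynchronous-replication idea behind the talks above can be reduced to a toy model: changes are captured into a queue at the source and applied to the replica later, so local commits never wait on the remote site. This sketch is only the concept, not Oracle Streams itself; all names are invented for illustration:

```python
# Toy model of asynchronous replication: local writes commit immediately
# and are staged in a queue; a replica drains the queue later in batches.
from collections import deque

class Source:
    def __init__(self):
        self.data = {}
        self.log = deque()          # capture queue of staged changes

    def put(self, key, value):
        self.data[key] = value      # local commit returns immediately
        self.log.append((key, value))

class Replica:
    def __init__(self):
        self.data = {}

    def apply_pending(self, source):
        while source.log:           # propagation happens asynchronously
            key, value = source.log.popleft()
            self.data[key] = value

src, dst = Source(), Replica()
src.put("run", 2006)                # replica lags behind at this point
dst.apply_pending(src)
print(dst.data)                     # {'run': 2006}
```

The lag between `put` and `apply_pending` is exactly the window where local and central site managers can see different data, which is the reconciliation concern Dirk raised.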
Optimisation and Bottlenecks
(Convener Wojciech Wojcik)
• Performance and Bottleneck Analysis (Sverre Jarp): this is work done
in the framework of the CERN openlab collaboration with industry. One
of the first choices to make is which compiler gets the best
performance from your chip, and then which compiler parameters have
which effect. Having explained the methodology and emphasized the
importance of selecting good tools, of knowing the chip architecture and
of how your algorithm maps to it, he presented some results
obtained from the openlab collaboration with Intel.
• Code/Compiler Problems (René Brun): threading and the importance
of making programs thread-safe in order to take full advantage of
multi-core chips.
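The thread-safety point above can be made concrete with a minimal sketch: a shared counter updated from several threads, correct only because each read-modify-write holds a lock.

```python
# Minimal illustration of thread safety: the shared counter is only
# deterministic because every increment holds the lock.
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:              # serialise the read-modify-write
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                  # 40000 on every run; without the lock,
                                # updates can be lost between threads
```

The same lost-update hazard applies to any shared state (histograms, caches, global buffers) in code moved onto multi-core machines.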
• Controlling Bottlenecks with BQS (Julien Deveny)
• Optimising dCache and the DPM (Greg Cowan): each Tier 2 site has
unique policies and constraints, which leads to various combinations of
middleware components. The University of Edinburgh chose dCache
and the LCG DPM (Disk Pool Manager). Using XFS showed
noticeably better performance in the DPM tests but not in the
dCache tests.
Storage day (1)
• Tape Technology (Don Petravick): at Fermilab, tape capacity doubles
every 18-24 months; LTO-3 drives currently store 400 GB, and there is
no inherent tape density limit as there is for disc technology. In
summary, he claims tape offers high-quality retention technology and
simple, reliable units of expansion, but it does complicate Hierarchical
Storage Management data handling and requires specialised skills
to manage and operate. The future roadmap appears to face no
fundamental engineering limitations.
• Disc Technology (Martin Gasthuber): he presented various disc
configurations such as FC SAN, SCSI FC and others. Important
components are not only the discs themselves but also the
interconnects and the disc and network controllers. Expected
performance is 40 MB/s throughput per TB of storage. He listed
issues to consider when acquiring discs: discs are getting just too slow,
and price per GB is flattening out. He offered some predictions:
no further increase in FC use, but rather Serial Attached SCSI (SAS),
which will come with smaller form factors; SATA will be around for
a while, but there will be no real improvement in performance. He
ended by describing Object Storage Devices (OSD), which he believes
will arrive in the coming years: storage in a box, offering multiple
protocols.
Storage day (2)
• Hardware Potpourri (Andrei Maslennikov): Andrei described what he called a
fat disc server contender. He compared what CERN requires for CASTOR
performance with what his configuration can achieve, and he believes it could
satisfy the needs of CASTOR at a cheaper price.
• GPFS and StoRM (Luca dell'Agnello)
• Local File Systems (Peter Kelemen): comparison of XFS and ext3.
• AFS/OSD Project (Ludovico Giammarino): this is being developed at CASPUR
in conjunction with CERN and FZK. The principal goal is to improve AFS
performance and scalability.
• WAN Access to a Distributed File System (Hartmut Reuter)
• Disk to Tape Migration Introduction (Michael Ernst)
• CASTOR 2 (Sebastien Ponce): a quick overview of CASTOR 2 and how it has
changed from version 1.
• dCache (Patrick Fuhrmann)
• HPSS (Andrei Moskalenko)
Virtual Servers for Windows (Alberto Pace)
• Alberto started with a demo of creating a couple of virtual systems on his
desktop (one Windows, one Linux using SLC) and, while they were being
created, began the presentation with a history of how virtual computers
have long been a dream of computer scientists.
• As the Intel x86 architecture is becoming by far the most commonly-found
system in our environments, running virtual x86 systems on real x86
systems is more attractive than previous implementations of virtual
computers.
• At CERN there is an ever-increasing number of requests for dedicated
servers running individual applications or services, but limitations of space,
management overhead and the often-underused CPU on many of these
servers make virtualisation an interesting option.
• The CERN team has built a number of different configurations of Windows
2003-based servers and Linux (both SLC3 and SLC4) virtual systems which
can be called up on demand. The scheme uses the Microsoft Virtual Hosting
Server. The user can configure the hardware down to the size of memory,
the presence of a floppy or CD/DVD drive, the number of discs, etc. He or she
can request use of the server for a finite time or long term, and more options
will be offered in the future.
Why Virtual servers
• More and more requests for dedicated servers in the CERN
computer centre
• Excellent network connectivity, to the internet and to the CERN
backbone (10 Gbit/s)
• Uninterruptible power supply
• 24x365 monitoring with operator presence
• Daily backup with fast tape drives
• Hardware maintenance, transparent for the “customer”
• Operating system maintenance, patches, security scans
• The “customer” focuses only on “his application”
• Customers are not willing to share their server with others, but are
ready to pay a lot of $$, €€, CHF
• Frame for this server hosting service:
http://cern.ch/Win/Help/?kbid=251010
However, after an inside look …
• Installing and maintaining custom servers is time
consuming
• Lots of management overhead
• Space in the computer centre is a scarce resource
• Several of these servers are underused:
hardly more than 2-3% CPU usage
• Excellent candidates for virtualization
Goal of virtualization
• Clear separation of hardware management from server (software)
management
  • could even be done by independent teams
• Hardware management
  • ensure enough server hardware is globally available to satisfy the global
CPU + storage demand
  • manage a large pool of identical machines
  • hardware maintenance
• Server (software) management
  • manage server configuration
  • allocate server images to machines in the pool
  • plenty of optimization possible: automatic reallocation to different HW
according to past performance
• Little overhead
  • emulation of a PC on a real PC is very efficient
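The pool-management scheme above (a shared pool of identical machines, with server images allocated to them and released back) can be sketched as follows; the class and method names are invented for illustration and are not CERN's actual tooling:

```python
# Toy sketch: allocate server images to machines from a shared hardware
# pool, release them back for reuse. Names are illustrative only.
class HardwarePool:
    def __init__(self, machines):
        self.free = list(machines)
        self.running = {}                 # image name -> machine

    def allocate(self, image):
        if not self.free:
            raise RuntimeError("no free hardware in the pool")
        machine = self.free.pop()
        self.running[image] = machine     # image now owns this box
        return machine

    def release(self, image):
        # the box returns to the pool; any image can claim it next
        self.free.append(self.running.pop(image))

pool = HardwarePool(["node01", "node02", "node03"])
box = pool.allocate("win2003-iis")        # hardware chosen transparently
pool.release("win2003-iis")               # box available again
```

Because images never name a specific machine, the hardware team can swap, retire or add boxes without the server owners noticing, which is exactly the separation the slide argues for.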
Server on Demand
• Choose from a set of “predefined” images:
  • Windows Server 2003
  • Windows Server 2003 + IIS + SOAP + streaming
  • Windows Server 2003 + Terminal Server Services
  • …
  • Scientific Linux CERN 3 or 4
  • …
• Takes resources from the pool of available HW
• Multiple, different OSes can be hosted in the same box
• Available within 10 minutes
• Before: between one week and one month
• Cost: much cheaper, especially in manpower
• Performance: no noticeable difference
What’s next ?
• We can expect requests for more “server types”
  • various combinations of OS and applications
• We can expect requests for custom server types
  • the user creates and manages his own server images
• Future server on demand:
  • “I need 20 servers with this image for one month”
  • “I need an image for this server replicated 10 times”
  • “I need more CPU / memory for my server”
  • “I do not need my server for 2 months, give me an image I can reuse
later”
  • “I need a test environment, OS version n+1, to which I can migrate my
current production services”
  • “I need 10 Macintosh instances …”
  • …
Conclusion
• Server virtualization is a strategic direction for (Windows) server
management at CERN
• HW and SW management can be independent
• We can expect consequences also for traditional batch systems
  • instead of allocating CPU time for jobs submitted to a rigid OS
configuration, one could allocate bare “virtual PC time”
  • the user would submit a “PC image hosting the job”; the farm becomes
independent of the OS, with fewer security implications (for the farm
management) and unprecedented flexibility for users
Scientific Linux
• Status and Plans (Troy Dawson): current usage of SL is at least
16,000 installations (SL3 and SL4 combined). Fermilab itself is
standardising on SLF 4.2 and trying to phase out all the
unsupported distributions (those before SL3). They are gearing
up for SL5, although they are bound by Red Hat's release date for
RHEL 5, and they realise it will not arrive in time to be
packaged and deployed before LHC startup. He asked if there
is a long-term need for Itanium releases or any other
architecture; the answer, at least from this audience, was no.
• SLC (Jarek Polok): 2100 individual SLC3 installations, 3559
centrally-managed installations and 2400 SLC3 installations
outside CERN. SLC 4.3 is just coming into use after its official
release at the beginning of April. As explained above, the
projected release date of RHEL 5 (only next year) means that
SLC4 will be the officially-supported release for LHC startup. It
is planned to start migrating the central clusters to it in
September this year.
Security (Bob Cowles)
• Bob covered a range of topics, starting with the
dangers and risks of Skype, especially of becoming a
Supernode when connected to a powerful network;
apparently this does not happen to systems behind
NAT boxes.
• Skype is banned at CERN and monitored at SLAC.
• Turning to topical matters: service providers should
be concerned about the risks of a bird flu epidemic.
If people start seriously to get infected and have to stay
home, how do you run the operation, and what happens if
they use infected home PCs to log in?
• He displayed the list of some 30 passwords he had
sniffed during the week from among the HEPiX
attendees.
• He listed 10 tips to improve security (see overheads).
Passwords
• Sniffed during the week from HEPiX attendees over SMTP, POP3,
IMAP, ICQ and FTP (the slide's column layout mapping each password
to its protocol has not survived):
• kastela3, Romania2, ecdMJee4dD, baum2kid, ghbghb, 1@roma06,
ubc789, 84relax, 4q63wbg, light2484, tDsfCxJs, Dadoes63, cal1pat0,
dnow12i, Bruck5BD, hoFK87, 1etsg0, filipch, ckmckmir, obheyto,
authum1808, R2gsumb0, rugbybear, v3sm9r-EGEE, k7u0na,
Dad123Red345, 123456, Tuesday, ippin, lworib4u, iosara44, tuesday,
ha66il33, gg147231, lalamisi, xircom12, power0, 123stell, B7A8
Next Meeting
• Next meeting at Jefferson Lab, 9th October,
• followed by DESY in Spring 2007.