Introduction to High Performance Computing
Jon Johansson
Academic ICT
University of Alberta
Copyright 2008, University of Alberta
Agenda
• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??
High Performance Computing
• HPC is the field that concentrates on developing supercomputers and software to run on supercomputers
• a main area of this discipline is developing parallel processing algorithms and software
  • programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors
High Performance Computing
• HPC is about “big problems”, i.e. problems that need:
  • lots of memory
  • many CPU cycles
  • big hard drives
• no matter what field you work in, your research might benefit from making problems “larger”
  • 2d → 3d
  • finer mesh
  • increased number of elements in the simulation
Grand Challenges
• weather forecasting
• economic modeling
• computer-aided design
• drug design
• exploring the origins of the universe
• searching for extra-terrestrial life
• computer vision
• nuclear power and weapons simulations
Grand Challenges – Protein
To simulate the folding of a 300 amino acid protein in water:
  • # of atoms: ~32,000
  • folding time: 1 millisecond
  • # of FLOPs: 3 × 10^22
  • machine speed: 1 PetaFLOP/s
  • simulation time: 1 year
(Source: IBM Blue Gene Project)
IBM’s answer: the Blue Gene Project – US$100 M of funding to build a 1 PetaFLOP/s computer
[Figures: Ken Dill and Kit Lau’s protein folding model; protein image courtesy Charles L. Brooks III, Scripps Research Institute]
Grand Challenges - Nuclear
• National Nuclear Security Administration
  • http://www.nnsa.doe.gov/
• use supercomputers to run three-dimensional codes to simulate instead of test
• address critical problems of materials aging
• simulate the environment of the weapon and try to gauge whether the device continues to be usable
• stockpile science, molecular dynamics and turbulence calculations
http://archive.greenpeace.org/comms/nukes/fig05.gif
Grand Challenges - Nuclear
• March 7, 2002: first full-system three-dimensional simulations of a nuclear weapon explosion
• the simulation used more than 480 million cells (a 780x780x780 grid, if the grid is a cube)
• 1,920 processors on IBM ASCI White at the Lawrence Livermore National Laboratory
• 2,931 wall-clock hours, or 122.5 days
• 6.6 million CPU hours
[Photos: ASCI White; test shot “Badger”, Nevada Test Site, Apr. 1953, yield 23 kilotons]
http://nuclearweaponarchive.org/Usa/Tests/Upshotk.html
Grand Challenges - Nuclear
• Advanced Simulation and Computing Program (ASC)
• http://www.llnl.gov/asc/asc_history/asci_mission.html
Agenda
• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??
What is a “Mainframe”?
• large and reasonably fast machines
  • speed isn't their most important characteristic
• high-quality internal engineering and resulting proven reliability
• expensive but high-quality technical support
• top-notch security
• strict backward compatibility for older software
What is a “Mainframe”?
• these machines can, and do, run successfully for years without interruption (long uptimes)
• repairs can take place while the mainframe continues to run
• the machines are robust and dependable
• IBM coined a term to advertise the robustness of their mainframe computers:
  • Reliability, Availability and Serviceability (RAS)
What is a “Mainframe”?
• Introducing IBM System z9 109
  • designed for the On Demand Business
  • IBM is delivering a holistic approach to systems design
  • designed and optimized with a total systems approach
  • helps keep your applications running with enhanced protection against planned and unplanned outages
  • extended security capabilities for even greater protection
  • increased capacity with more available engines per server
What is a Supercomputer??
• at any point in time the term “supercomputer” refers to the fastest machines currently available
• a supercomputer this year might be a mainframe in a couple of years
• a supercomputer is typically used for scientific and engineering applications that must do a great amount of computation
What is a Supercomputer??
• the most significant difference between a supercomputer and a mainframe:
  • a supercomputer channels all its power into executing a few programs as fast as possible
    • if the system crashes, restart the job(s) – no great harm done
  • a mainframe uses its power to execute many programs simultaneously
    • e.g. a banking system
    • must run reliably for extended periods
What is a Supercomputer??
• to see the world’s “fastest” computers look at
  • http://www.top500.org/
• performance is measured with the Linpack benchmark
  • http://www.top500.org/lists/linpack.php
  • solve a dense system of linear equations
  • the performance numbers give a good indication of peak performance
Terminology
• combining a number of processors to run a program is called variously:
  • multiprocessing
  • parallel processing
  • coprocessing
Terminology
• parallel computing – harnessing a bunch of processors on the same machine to run your computer program
  • note that this is one machine
  • generally a homogeneous architecture
    • same processors, memory, operating system
  • all the machines in the Top 500 are in this category
Terminology
• cluster:
  • a set of generally homogeneous machines
  • originally built using low-cost commodity hardware
  • to increase density, clusters are now commonly built with 1U rack servers or blades
  • can use a standard network interconnect or a high-performance interconnect such as Infiniband or Myrinet
  • cluster hardware is becoming quite specialized
  • thought of as a single machine with a name, e.g. “glacier” – glacier.westgrid.ca
Terminology
• distributed computing – harnessing a bunch of processors on different machines to run your computer program
  • heterogeneous architecture
    • different operating systems, cpus, memory
  • the terms “parallel” and “distributed” computing are often used interchangeably
  • the work is divided into sections so each processor does a unique piece
Terminology
• some distributed computing projects are built on BOINC (Berkeley Open Infrastructure for Network Computing):
  • SETI@home – Search for Extraterrestrial Intelligence
  • Proteins@home – deduces DNA sequence, given a protein
  • Hydrogen@home – enhance clean energy technology by improving hydrogen production and storage (in beta now)
Terminology
• “Grid” computing
  • a Grid is a cluster of supercomputers
  • in the ideal case:
    • we submit our job with resource requirements
    • the job is run on a machine with available resources
    • we get results back
  • NOTE: we don’t care where the resources are, just that the job is run
Terminology
• “Utility” computing
  • computation and storage facilities are provided as a commercial service
  • charges are for resources actually used – “pay and use” computing
• “Cloud” computing
  • aka “on-demand computing”
  • any IT-related capability can be provided as a “service”
  • repackages grid computing and utility computing
  • users access computing resources in the “Cloud” – i.e. out on the Internet
How to Measure Speed?
• count the number of “floating point operations” required to solve the problem
  • +, −, ×, /
• benchmark results are reported as so many Floating point Operations Per Second (FLOPS)
• a supercomputer is a machine that can provide a very large number of FLOPS
Floating Point Operations
• multiply two 1000x1000 matrices
  • for each resulting array element
    • 1000 multiplies
    • 999 adds
  • do this 1,000,000 times
  • ~10^9 operations needed
• increasing the array size increases the number of operations as O(N^3) (see the sketch below)

$$
\begin{bmatrix}
1 & 2 & \cdots & N \\
2 &   &        &   \\
\vdots & & & \\
N &   &        &
\end{bmatrix}
\begin{bmatrix}
1 & 2 & \cdots & N \\
2 &   &        &   \\
\vdots & & & \\
N &   &        &
\end{bmatrix}
$$
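The operation count above comes straight from the triple loop of the naive algorithm. A minimal sketch in C (not from the slides; the function name is illustrative):

    #define N 1000

    /* Naive N x N matrix multiply: each of the N*N output elements costs
     * N multiplies and N-1 adds, so total work grows as O(N^3). */
    void matmul(const double *a, const double *b, double *c)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)      /* 1000 multiplies, 999 adds */
                    sum += a[i * N + k] * b[k * N + j];
                c[i * N + j] = sum;              /* one of 1,000,000 elements */
            }
    }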
Agenda
• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??
High Performance Computing
• supercomputers use many CPUs to do the work
• note that all supercomputing architectures have
  • processors and some combination of cache
  • some form of memory and IO
  • the processors are separated from the other processors by some distance
• there are major differences in the way that the parts are connected
• some problems fit some architectures better than others
High Performance Computing
• increasing the computing power available to researchers allows
  • increasing problem dimensions
  • adding more particles to a system
  • increasing the accuracy of the result
  • improving experiment turnaround time
Flynn’s Taxonomy
• Michael J. Flynn (1972)
• classified computer architectures based on the number of concurrent instruction and data streams available
  • single instruction, single data (SISD) – basic old PC
  • multiple instruction, single data (MISD) – redundant systems
  • single instruction, multiple data (SIMD) – vector (or array) processor
  • multiple instruction, multiple data (MIMD) – shared or distributed memory systems: symmetric multiprocessors and clusters
• common extension:
  • single program (or process), multiple data (SPMD)
Architectures
• we can also classify supercomputers according to how the processors and memory are connected
  • couple processors to a single large memory address space
  • couple computers, each with its own memory address space
Architectures
• Symmetric Multiprocessing (SMP)
  • Uniform Memory Access (UMA)
  • multiple CPUs, residing in one cabinet, share the same memory
  • processors and memory are tightly coupled
  • the processors share memory and the I/O bus or data path
Architectures
• SMP
  • a single copy of the operating system is in charge of all the processors
  • SMP systems range from two to as many as 32 or more processors
Architectures
• SMP
  • "capability computing"
  • one CPU can use all the memory
  • all the CPUs can work on a little memory
  • whatever you need
Architectures
• UMA-SMP negatives
  • as the number of CPUs gets large, the buses become saturated
  • long wires cause latency problems
Architectures
• Non-Uniform Memory Access (NUMA)
  • NUMA is similar to SMP – multiple CPUs share a single memory space
  • hardware support for shared memory
  • memory is separated into close and distant banks
    • basically a cluster of SMPs
  • memory on the same processor board as the CPU (local memory) is accessed faster than memory on other processor boards (shared memory)
    • hence "non-uniform"
  • NUMA architecture scales much better to higher numbers of CPUs than SMP
Architectures
[Photos: University of Alberta SGI Origin; SGI NUMA cables]
Architectures
• Cache Coherent NUMA (ccNUMA)
  • each CPU has an associated cache
  • ccNUMA machines use special-purpose hardware to maintain cache coherence
    • typically done by using inter-processor communication between cache controllers to keep a consistent memory image when the same memory location is stored in more than one cache
  • ccNUMA performs poorly when multiple processors attempt to access the same memory area in rapid succession
Architectures
• Distributed Memory Multiprocessor (DMMP)
  • each computer has its own memory address space
  • looks like NUMA but there is no hardware support for remote memory access
  • the special-purpose switched network is replaced by a general-purpose network such as Ethernet, or by more specialized interconnects:
    • Infiniband
    • Myrinet
[Photo: Lattice, Calgary’s HP ES40 and ES45 cluster – each node has 4 processors]
Architectures
• Massively Parallel Processing (MPP): cluster of commodity PCs
  • processors and memory are loosely coupled
  • "capacity computing"
  • each CPU contains its own memory and copy of the operating system and application
  • each subsystem communicates with the others via a high-speed interconnect
  • in order to use MPP effectively, a problem must be breakable into pieces that can all be solved simultaneously
Architectures
• lots of “how to build a cluster” tutorials on the web – just Google:
  • http://www.beowulf.org/
  • http://www.cacr.caltech.edu/beowulf/tutorial/building.html
Architectures
• Vector Processor or Array Processor
  • a CPU design that is able to run mathematical operations on multiple data elements simultaneously
    • a scalar processor operates on data elements one at a time
  • vector processors formed the basis of most supercomputers through the 1980s and into the 1990s
  • “pipeline” the data
Architectures
• Vector Processor or Array Processor
  • operate on many pieces of data simultaneously
  • consider the following add instruction:
    • C = A + B
  • on both scalar and vector machines this means:
    • add the contents of A to the contents of B and put the sum in C
  • on a scalar machine the operands are numbers
  • on a vector machine the operands are vectors and the instruction directs the machine to compute the pair-wise sum of each pair of vector elements (see the sketch below)
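An illustrative sketch, assuming nothing about any particular vector machine: here is C = A + B written as the scalar loop a scalar CPU executes one element at a time; a vector processor performs the whole pair-wise sum as a single vector instruction (or a few, pipelined in chunks):

    /* Scalar version of C = A + B: one element per instruction.
     * A vector machine issues one vector instruction for the
     * whole pair-wise sum instead of n scalar iterations. */
    void vec_add(const double *a, const double *b, double *c, int n)
    {
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }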
Architectures
• the University of Victoria has 4 NEC SX-6/8A vector processors
  • in the School of Earth and Ocean Sciences
  • each has 32 GB of RAM
  • 8 vector processors in the box
  • peak performance is 72 GFLOPS
Agenda
• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??
BlueGene/L
• the fastest machine on the Nov. 2007 Top 500 list:
  • http://www.top500.org/
• installed at the Lawrence Livermore National Laboratory (LLNL) (US Department of Energy)
  • Livermore, California
http://www.llnl.gov/asc/platforms/bluegenel/photogallery.html
BlueGene/L
• processors: 212,992
• memory: 72 TB
• 104 racks – each has 2048 processors
  • the first 64 racks had 512 GB of RAM (256 MB/processor)
  • the 40 new racks have 1 TB of RAM (512 MB/processor)
• a Linpack performance of 478.2 TFlop/s
  • in Nov. 2005 it was the only system ever to exceed the 100 TFlop/s mark
  • there are now 10 machines over 100 TFlop/s
The Fastest Five
Site                                                  Computer                                                        Cores     Year   Rmax (Gflops)   Rpeak (Gflops)
DOE/NNSA/LANL, United States                          Roadrunner – BladeCenter QS22/LS21 Cluster Cell/Opteron (IBM)   122,400   2008   1,026,000       1,375,780
DOE/NNSA/LLNL, United States                          BlueGene/L – eServer Blue Gene Solution (IBM)                   212,992   2007   478,200         596,378
Argonne National Laboratory, United States            BlueGene/P Solution (IBM)                                       163,840   2007   450,300         557,060
Texas Advanced Computing Center/Univ. of Texas, US    Ranger – SunBlade x6420, Opteron Quad 2 GHz (Sun)               62,976    2008   326,000         503,810
DOE/Oak Ridge National Laboratory, United States      Jaguar – Cray XT4 QuadCore Opteron 2.1 GHz (Cray)               30,976    2008   205,000         260,000
# of Processors with Time
The number of processors in the fastest machines has increased by about a factor of 200 in the last 15 years.
# of Gflops Increase with Time
One Petaflop!
Machine speed has increased by more than a factor of 15,000 since 1993.
“Roadrunner” tests at > 1 petaflop for June 2008.
Future BlueGene
Roadrunner
• cores: 122,400
  • 6,562 dual-core Opterons and 12,240 Cell processors
• memory: 98 TB
• 278 racks
• a Linpack performance of 1,026.00 TFlop/s
  • in June 2008 it was the only system ever to exceed the 1 PetaFlop/s mark
• cost: $100 million
• weight: 500,000 lbs
• power: 2.35 (or 3.9) megawatts
Roadrunner
Agenda
• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??
Speedup
• how can we measure how much faster our program runs when using more than one processor?
• define Speedup S as the ratio of two program execution times at constant problem size:

$$ S = \frac{T_1}{T_P} $$

  • T_1 is the execution time for the problem on a single processor (use the “best” serial time)
  • T_P is the execution time for the problem on P processors (a worked example follows)
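A quick illustration with made-up numbers: a job that takes 100 s on one processor and 20 s on 8 processors has

$$ S = \frac{T_1}{T_8} = \frac{100\ \mathrm{s}}{20\ \mathrm{s}} = 5 $$

out of an ideal (linear) speedup of 8.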
Speedup
• linear speedup
  • the time to execute the problem decreases by the number of processors
  • if a job requires 1 week with 1 processor, it will take less than 10 minutes with 1024 processors
Speedup
• sublinear speedup
  • the usual case
  • there are generally some limitations to the amount of speedup that you get
    • communication
Speedup
• superlinear speedup
  • very rare
  • memory access patterns may allow this for some algorithms
Speedup
• why do a speedup test?
  • it’s hard to tell how a program will behave
  • e.g. “strange” is actually fairly common behaviour for untuned code
  • in this case:
    • linear speedup to ~10 cpus
    • after 24 cpus speedup is starting to decrease
Speedup
• to use more processors efficiently, change this behaviour
  • change loop structure
  • adjust algorithms
  • ??
• run jobs with 10-20 processors so the machines are used efficiently
Speedup
• one class of jobs that have linear speedup are called “embarrassingly parallel”
  • a better name might be “perfectly parallel”
• it doesn’t take much effort to turn the problem into a bunch of parts that can be run in parallel:
  • parameter searches
  • rendering the frames in a computer animation
  • brute-force searches in cryptography
Speedup
• we have been discussing Strong Scaling
  • the problem size is fixed and we increase the number of processors
  • the goal is to decrease computational time (Amdahl scaling)
  • the amount of work available to each processor decreases as the number of processors increases
  • eventually, the processors are doing more communication than number crunching and the speedup curve flattens
  • it is difficult to have high efficiency for large numbers of processors
Speedup
• we are often interested in Weak Scaling
  • double the problem size when we double the number of processors
  • the goal is constant computational time (Gustafson scaling; formula below)
  • the amount of work for each processor stays roughly constant
  • parallel overhead is (hopefully) small compared to the real work the processor does
  • e.g. weather prediction
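For reference (the slide names Gustafson scaling but does not give the formula), the scaled speedup with serial fraction f is

$$ S_{\text{scaled}} = f + (1-f)\,P = P - f\,(P-1) $$

so with the per-processor work held constant, the speedup keeps growing nearly linearly in P.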
Amdahl’s Law
• Gene Amdahl (1967)
• parallelize some of the program – some must remain serial
  • f is the fraction of the calculation that is serial
  • 1 − f is the fraction of the calculation that is parallel
• the maximum speedup that can be obtained by using P processors is:

$$ S_{\max} = \frac{1}{f + \frac{1-f}{P}} $$
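Letting the number of processors grow without bound shows why the serial fraction dominates:

$$ \lim_{P\to\infty} S_{\max} = \lim_{P\to\infty} \frac{1}{f + \frac{1-f}{P}} = \frac{1}{f} $$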
Amdahl’s Law
• if 25% of the calculation must remain serial (f = 0.25), the best speedup you can obtain is 1/0.25 = 4, no matter how many processors you use
• need to parallelize as much of the program as possible to get the best advantage from multiple processors
Agenda
• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??
Parallel Programming
• need to do something to your program to use multiple processors
• need to incorporate commands into your program which allow multiple threads to run
  • one thread per processor
  • each thread gets a piece of the work
• there are several ways (APIs) to do this …
Parallel Programming
• OpenMP
  • introduce statements into your code
    • in C: #pragma
    • in FORTRAN: C$OMP or !$OMP
  • can compile serial and parallel executables from the same source code
  • restricted to shared memory machines
    • not clusters!
  • www.openmp.org
Parallel Programming
• OpenMP
  • demo: MatCrunch
    • mathematical operations on the elements of an array
  • introduce 2 OMP directives before a loop (see the sketch below)
    • #pragma omp parallel  // define a parallel section
    • #pragma omp for      // loop is to be parallel
  • serial section: 4.03 secs
  • parallel section – 1 cpu: 40.27 secs
  • parallel section – 2 cpu: 20.25 secs
    • speedup = 1.99  // not bad for adding 2 lines
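MatCrunch itself isn’t reproduced in the slides; this minimal C sketch just applies the same two-directive pattern to a stand-in computation:

    #include <math.h>

    #define N 10000000
    static double a[N];

    void crunch(void)
    {
        #pragma omp parallel        /* define a parallel section */
        {
            #pragma omp for         /* split the loop iterations across threads */
            for (int i = 0; i < N; i++)
                a[i] = sin((double)i) * exp(-(double)i / N);
        }
    }

Compile with an OpenMP flag (e.g. gcc -fopenmp) for the parallel executable; without the flag the pragmas are ignored, which is how the same source yields both serial and parallel builds.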
Parallel Programming
• for a larger number of processors the speedup for MatCrunch is not linear
• you need to do the speedup test to see how your program will behave
Parallel Programming
• MPI (Message Passing Interface)
  • a standard set of communication subroutine libraries
  • works for SMPs and clusters
  • programs written with MPI are highly portable
  • information and downloads:
    • http://www.mpi-forum.org/
    • MPICH: http://www-unix.mcs.anl.gov/mpi/mpich/
    • LAM/MPI: http://www.lam-mpi.org/
    • Open MPI: http://www.open-mpi.org/
Parallel Programming
• MPI (Message Passing Interface)
  • supports the SPMD, single program multiple data model
  • all processors use the same program
  • each processor has its own data
  • think of a cluster – each node is getting a copy of the program but running a specific portion of it with its own data
Parallel Programming
• starting MPI jobs is not standard
  • for MPICH2 use “mpiexec”
  • start a job with 6 processes
• the 6 copies of the program run in the default communicator group “MPI_COMM_WORLD”
• each process has an ID – its “rank” (see the sketch below)
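A minimal sketch of a program each of those 6 processes would run (the file name hello.c is hypothetical):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID (rank) */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes started */
        printf("process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Launch with, e.g., mpiexec -n 6 ./hello.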
Parallel Programming
• example: start N processes to calculate (N-1) factorial (one possible implementation follows)
  • 0! = 1
  • 1! = 1
  • 2! = 2 x 1 = 2
  • 3! = 3 x 2 x 1 = 6
  • …
  • n! = n x (n-1) x … x 2 x 1
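The slides don’t show the code; one hedged way to do it in C is to have every rank contribute one factor and let MPI_Reduce multiply the contributions together on the master:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        long term, fact;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        term = (rank > 1) ? rank : 1;   /* ranks 0 and 1 contribute a factor of 1 */
        MPI_Reduce(&term, &fact, 1, MPI_LONG, MPI_PROD, 0, MPI_COMM_WORLD);

        if (rank == 0)                  /* N processes yield (N-1)! on the master */
            printf("(N-1)! = %ld\n", fact);

        MPI_Finalize();
        return 0;
    }

With 6 processes this prints 5! = 120.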
Parallel Programming
• generally the master process will:
  • send work to other processes
  • receive results from processes that complete
  • send more work to those processes
  • do final calculations
  • output results
• designing an efficient algorithm for all this is up to you
Parallel Programming
• it’s possible to combine OpenMP and MPI for running on clusters of SMP machines
• the trick in parallel programming is to keep all the processors
  • working (“load balancing”)
  • working on data that no other processor needs to touch (so there aren’t any cache conflicts)
• parallel programming is generally harder than serial programming
Agenda
• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??
Grid Computing
• a computational grid:
  • is a large-scale distributed computing infrastructure
  • is composed of geographically distributed, autonomous resource providers
    • lots of computers joined together
  • requires excellent networking that supports resource sharing and distribution
  • offers access to all the resources that are part of the grid
    • compute cycles
    • storage capacity
    • visualization/collaboration
  • is intended for integrated and collaborative use by multiple organizations
Grids
• Ian Foster (the “Father of the Grid”) says that to be a Grid, three points must be met:
  • computing resources are not administered centrally
    • many sites connected
  • open standards are used
    • not a proprietary system
  • non-trivial quality of service is achieved
    • it is available most of the time
• CERN says a Grid is “a service for sharing computer power and data storage capacity over the Internet”
Canadian Academic Computing Sites in 2000
Canadian Grids
• some sites in Canada have tied their resources together to form 7 Canadian Grid Consortia:
  • ACENET – Atlantic Computational Excellence Network
  • CLUMEQ – Consortium Laval UQAM McGill and Eastern Quebec for High Performance Computing
  • SCINET – University of Toronto
  • HPCVL – High Performance Computing Virtual Laboratory
  • RQCHP – Reseau Quebecois de calcul de haute performance
  • SHARCNET – Shared Hierarchical Academic Research Computing Network
  • WESTGRID – Alberta, British Columbia
WestGrid
[Map: WestGrid sites – SFU Campus, UBC Campus, Edmonton, Calgary]
Grids
• the ultimate goal of the Grid idea is to have a system that you can submit a job to, so that:
  • your job uses resources that fit requirements you specify
    • 128 nodes on an SMP
    • 200 GB of RAM
  • or
    • 256 nodes on a PC cluster
    • 1 GB/processor
  • when done, the results come back to you
  • you don’t care where the job runs
    • Vancouver or St. John’s or in between
Sharing Resources
• HPC resources are not available quite as readily as your desktop computer
• the resources must be shared fairly
  • the idea is that each person gets as much of the resource as necessary to run their job for a “reasonable” time
  • if the job can’t finish in the allotted time, the job needs to “checkpoint”
    • save enough information to begin running again from where it left off
Sharing Resources
• Portable Batch System (Torque)
  • submit a job to PBS
  • the job is placed in a queue with other users’ jobs
  • jobs in the queue are prioritized by a scheduler
  • your job executes at some time in the future
[Figure: an HPC site]
Sharing Resources
• when connecting to a Grid we need a layer of “middleware” tools to securely access the resources
  • Globus is one example
    • http://www.globus.org/
[Figure: a Grid of HPC sites]
Questions?
Many details in other sessions of this seminar series!