Bull's view on the future of computing

Cartesius Opening
June 14th, 2013
Jean-Marc DENIS
International Business Director
Extreme Computing Business Unit
© Bull, 2013
SurfSara Opening – June 14th
Cartesius (Renatus, 1596 – 1650)(*)
René Descartes (French: [ʁəne dekaʁt]; Latinized: Renatus Cartesius; adjectival form:
"Cartesian"; 31 March 1596 – 11 February 1650) was a French philosopher,
mathematician, and writer who spent most of his adult life in the Dutch Republic. He
has been dubbed the 'Father of Modern Philosophy'. Descartes' influence in
mathematics is equally apparent; the Cartesian coordinate system — allowing
reference to a point in space as a set of numbers, and allowing algebraic equations to
be expressed as geometric shapes in a two-dimensional coordinate system (and
conversely, shapes to be described as equations) — was named after him. He is credited
as the father of analytical geometry, the bridge between algebra and geometry, crucial
to the discovery of infinitesimal calculus and analysis. Descartes was also one of the key
figures in the Scientific Revolution and has been described as an example of genius.
Descartes was a major figure in 17th-century continental rationalism, later advocated
by Baruch Spinoza and Gottfried Leibniz, and opposed by the empiricist school of
thought consisting of Hobbes, Locke, Berkeley, Jean-Jacques Rousseau, and Hume.
Leibniz, Spinoza and Descartes were all well versed in mathematics as well as
philosophy, and Descartes and Leibniz contributed greatly to science as well.
He is perhaps best known for the philosophical statement "Cogito ergo sum" (French:
Je pense, donc je suis; English: I think, therefore I am), found in part IV of Discourse on
the Method (1637) and §7 of part I of Principles of Philosophy (1644).
The town of La Haye en Touraine was the birthplace of the philosopher René Descartes
(1596–1650), although his family home was in nearby Châtellerault. Descartes left La
Haye in approximately 1606 to attend the College Henri IV at La Flèche. The town was
renamed La Haye-Descartes in 1802 in his honor, and then renamed again to Descartes
in 1967.
(*) http://en.wikipedia.org/wiki/Ren%C3%A9_Descartes
Cartesius (SurfSara, 2013 – … )
Phase 1 (2013)
271 TFlops
572 compute nodes
44800 GB memory
1071 TiB storage
IB FDR
Phase 2 (2014)
1349 TFlops (x5)
1652 compute nodes (32 Fat & 1620 Thin) (x3)
112512 GB memory (x2.5)
6964 TiB storage & 202 GB/s (x7)
IB FDR (no change)
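The multipliers quoted for Phase 2 can be checked against the Phase 1 figures. The sketch below is only an illustration: it assumes the dotted Phase 2 values on the slide are thousands-separated integers (1.349 read as 1349 TFlops, and so on), and the dictionary keys are names chosen for this example.

```python
# Figures as quoted on the Phase 1 and Phase 2 slides.
# Assumption: "1.349", "1.652", "112.512", "6.964" use "." as a thousands separator.
phase1 = {"tflops": 271, "nodes": 572, "memory_gb": 44_800, "storage_tib": 1_071}
phase2 = {"tflops": 1_349, "nodes": 32 + 1_620, "memory_gb": 112_512, "storage_tib": 6_964}


def scaling_factors(before, after):
    """Return the after/before ratio for every metric, rounded to 2 decimals."""
    return {key: round(after[key] / before[key], 2) for key in before}


factors = scaling_factors(phase1, phase2)
for metric, factor in factors.items():
    print(f"{metric:12s} x{factor}")
```

The computed ratios land close to the rounded multipliers on the slide (x5, x3, x2.5, x7), with storage capacity nearer to x6.5.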
Why ExaScale Computing?
Oil & Gas: better resource detection
[Figure: Industrial challenges in oil and gas: depth imaging roadmap, 1995-2020 (courtesy IESP).
Vertical axis: complexity of algorithm, from 0.1 to 1000 in units of 10^15 flops.
Compute milestones: 50 TF (50x10^12), 1 PF (10^15), 10 PF (10^16).
Algorithms, roughly in order of increasing complexity: asymptotic approximation imaging; paraxial isotropic/anisotropic imaging; isotropic/anisotropic modeling; isotropic/anisotropic RTM; elastic modeling/RTM; isotropic/anisotropic FWI; visco-elastic modeling; elastic FWI; petro-elastic inversion; visco-elastic FWI.
Image quality annotations: non-significant image, unclear image, oil reservoir discovered.]
Human brain project
Aircraft: complete multi-physics simulation
[Figure: Available computational capacity (Flop/s) versus year, 1980-2030 (courtesy AIRBUS France/IESP).
Capacity scale: 1 Giga (10^9), 1 Tera (10^12), 1 Peta (10^15), 1 Exa (10^18), 1 Zeta (10^21).
Capacity is also expressed as the number of overnight load cases run (10^2 to 10^6).
Methods: RANS low speed, RANS high speed, unsteady RANS, LES.
Milestones: HS design; data set; CFD-based loads & HQ; aero optimisation & CFD-CSM; full MDO; CFD-based noise simulation; real-time CFD-based in-flight simulation.
'Smart' use of HPC power: algorithms, data mining, knowledge.
Capability achieved during one night batch.]
(Some) Exascale challenges
2010: 1 PFlops
2015: 30 PFlops (x30)
2020: 1,000 PFlops (x30)
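A factor of 30 every five years can be restated as a yearly growth rate. The following sketch assumes constant compound growth between the points on the slide, which is a simplification; the function names are illustrative only.

```python
import math


def annual_growth(factor, years):
    """Constant yearly multiplier that compounds to `factor` over `years`."""
    return factor ** (1.0 / years)


def doubling_time_years(factor, years):
    """Time needed to double performance under the same constant growth."""
    return years * math.log(2) / math.log(factor)


rate = annual_growth(30, 5)
doubling = doubling_time_years(30, 5)
print(f"x30 in 5 years is about x{rate:.2f} per year")
print(f"performance doubles roughly every {doubling:.2f} years")
```

Under that assumption, x30 per five years corresponds to performance nearly doubling every year.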
Addressing the Exascale Challenges
Optimize system Power Consumption (minimize PUE)
Develop new HPC processors
Fix the memory wall → terabytes-per-second bandwidth
Terabit interconnect (optical links everywhere)
Non-Volatile Memory (NV-RAM) → storage and fast memory
SW complexity: manageability, programming models
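For reference, PUE (Power Usage Effectiveness) is the ratio of total facility power to the power drawn by the IT equipment itself, so minimizing PUE means shrinking the cooling and distribution overhead. The sketch below uses made-up power figures purely for illustration; they are not taken from any Bull or SURFsara installation.

```python
def pue(it_power_kw, overhead_power_kw):
    """PUE = total facility power / IT equipment power.

    `overhead_power_kw` covers everything that is not IT load
    (cooling, power distribution losses, lighting, ...).
    """
    if it_power_kw <= 0:
        raise ValueError("IT power must be positive")
    return (it_power_kw + overhead_power_kw) / it_power_kw


# Hypothetical example values, for illustration only.
air_cooled = pue(it_power_kw=1000, overhead_power_kw=500)
warm_water = pue(it_power_kw=1000, overhead_power_kw=50)
print(f"air-cooled room:  PUE = {air_cooled:.2f}")
print(f"warm-water cooled: PUE = {warm_water:.2f}")
```

A PUE of exactly 1 would mean that every watt entering the facility reaches the compute hardware.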
Bull focus for ExaScale Computing
Power Consumption
- 2012: 1 PF at 1 MW
- 2020: 1000 PF (exaflops) at 20 MW
- FLOPS x1000, power x20
Exponential increase in number of cores
- 100 million cores
- In 2011, 50% of CIOs reported that none of their compute tasks used more than 120 cores
[Figure: average number of cores per supercomputer (Top 20 of Top500), up to 2020 (exaflops)]
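The power figures on this slide imply a specific gain in energy efficiency. The short calculation below is a sketch using only the four numbers from the slide (1 PF at 1 MW in 2012, 1000 PF at 20 MW in 2020); the helper name is illustrative.

```python
PETA = 1e15  # flop/s in one PFlops
MEGA = 1e6   # watts in one MW


def gflops_per_watt(flops, watts):
    """Energy efficiency expressed in GFlops per watt."""
    return flops / watts / 1e9


eff_2012 = gflops_per_watt(1 * PETA, 1 * MEGA)
eff_2020 = gflops_per_watt(1000 * PETA, 20 * MEGA)
improvement = eff_2020 / eff_2012

print(f"2012: {eff_2012:.0f} GFlops/W")
print(f"2020: {eff_2020:.0f} GFlops/W")
print(f"required efficiency gain: x{improvement:.0f}")
```

In other words, x1000 in performance within a x20 power budget requires each watt to deliver about 50 times more computation.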
Bull research program for ExaScale Computing
Power Consumption
PUE optimization:
• Down to 1 + ε
• (very) hot water
• Adiabatic computer room
• Cogeneration
• No wasted energy: all heat is reused
Supercomputer management:
• Power monitoring tools
• Use the right HW for the right app
Application optimization:
• Save (a lot) on energy consumption with (very) limited performance degradation
Exponential increase in number of cores
SW stack:
• OS
• Communications (MPI but not only)
• Batch
• Affinity (cpu/mem/node/…)
• Data management (filesystems)
Overcome current interconnect limitations:
• Topology(ies)
• RDMA mechanisms
• Latency at large scale
Programming model:
• (many) different programming models: MIMD+SIMD
• Languages
Reliability:
• MTBF close to zero… → automatic recovery mechanisms
Opportunities for Collaborations
Manageability at ExaScale
"The processor is the new transistor" (Chris Rowen)
Raise level of abstraction
• MPI, OpenMP, Threads, CUDA, OpenCL, ...
• Message passing, shared memory
• Locality
Set of compute resources
Parallelism based compute resources
New high level programming languages
Optimize compute environment
• Describe key characteristics of applications
• Select the most appropriate set of node types
• Manage resources with heuristics predicting the future workload
Migrate Processes
• Resource fragmentation reduction
• Hardware failure prediction
Allow dynamic application frameworks
• Automatic application load balancing
• Mesh refinement optimization
• Restart lost processes in case of failure
Programmability at ExaScale
Parallelism / concurrency is easy to grasp
… but much more complex to express in an application program
Distribute tasks and the data they operate on
Old SMP approaches (bulk parallelism à la OpenMP) are making a comeback (cf. MIC)
Old SIMD approaches (bulk parallelism à la CM2/CM5) are making a comeback (cf. CUDA)
At the highest level: message passing (MPI-3) → data decomposition
With an increasing degree of parallelism, a hierarchical approach is necessary
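The hierarchical idea can be sketched in a few lines: decompose the data across nodes at the top level, then share each node's portion among its threads. The example below is a toy model in plain Python, not MPI or OpenMP code: the outer loop merely stands in for message-passing ranks, a thread pool stands in for shared-memory threads, and all function names are invented for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split(seq, parts):
    """Split seq into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(seq), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks


def node_sum(chunk, threads_per_node):
    """Inner level: share one node's chunk among its threads, then reduce locally."""
    with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
        return sum(pool.map(sum, split(chunk, threads_per_node)))


def hierarchical_sum(data, nodes, threads_per_node):
    """Outer level: data decomposition across nodes, then a global reduction.

    The nodes are processed one after another here; in a real system each
    chunk would live on a different node and be handled concurrently.
    """
    partials = [node_sum(chunk, threads_per_node) for chunk in split(data, nodes)]
    return sum(partials)


data = list(range(1_000))
total = hierarchical_sum(data, nodes=4, threads_per_node=8)
print(total)  # 499500, the same as sum(data)
```

The two-level structure keeps most of the reduction local to a node, so only one partial result per node has to cross the top level.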
2020: exascale downscales to departmental and embedded computing → SMEs' computing
By 2020
- PFlops in a rack
- TFlops in a chip

                              PetaFlop system   ExaFlop / data center    PetaFlop / departmental   TeraFlop / embedded
                              (2012)            (2020)                   (2020)                    (2020)
Number of nodes               3,000-8,000       50,000-200,000 (10x)     50-100                    1
Computation (Flops & Inst.)   1 PetaFlop        1 ExaFlop (1000x)        1 PetaFlop                1 TeraFlop
Memory Capacity (B)           100-200 TB        > 100 PB (1000x)         > 10^14                   > 10^11
Global Memory BW (B/s)        200-500 TB/s      > 100 PB/s (1000x)       > 100 TB/s                > 100 GB/s
Interconnect bisection BW     50-100 TB/s       ~50 PB/s (1000x)         ~10 TB/s                  N/A
Storage Capacity (B)          1-10 PB           > 1 EB (1000x)           > 1 PB                    > 1 TB
Storage BW (B/s)              10-500 GB/s       > 10 TB/s (1000x)        > 10 GB/s                 > 10 MB/s
IOP/s                         100,000           > 100 M (1000x)          > 100,000                 > 100
Power Cons. (W)               0.5-1 MW          < 20 MW (20x)            < 20 kW                   < 20 W
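One way to read the three 2020 columns is that they keep the same machine balance at every scale. The sketch below checks this for memory capacity and memory bandwidth per flop, treating the "greater than" entries of the table as exact values, which is an assumption made only for the sake of the calculation; the dictionary keys are illustrative names.

```python
# 2020 targets from the table; ">" lower bounds are treated as exact values here.
systems = {
    "ExaFlop / data center":   {"flops": 1e18, "memory_b": 100e15, "memory_bw": 100e15},
    "PetaFlop / departmental": {"flops": 1e15, "memory_b": 1e14,   "memory_bw": 100e12},
    "TeraFlop / embedded":     {"flops": 1e12, "memory_b": 1e11,   "memory_bw": 100e9},
}


def balance(spec):
    """Bytes of memory, and bytes/s of memory bandwidth, per flop/s of compute."""
    return {
        "bytes_per_flop": spec["memory_b"] / spec["flops"],
        "bw_bytes_per_flop": spec["memory_bw"] / spec["flops"],
    }


ratios = {name: balance(spec) for name, spec in systems.items()}
for name, r in ratios.items():
    print(f"{name:26s} {r['bytes_per_flop']:.2f} B/flop, "
          f"{r['bw_bytes_per_flop']:.2f} B/s per flop/s")
```

With these readings, all three tiers come out at about 0.1 byte of memory and 0.1 byte/s of memory bandwidth per flop/s.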
Cogito Ergo Sum
Computo Ergo Sum