SAME: air conditioning of an agricultural tractor

ANSYS HPC
for CFD Applications
Release 17.0
Agenda
• High-Performance Computing – Motivations
• ANSYS HPC solutions
• HPC performance improvements for ANSYS CFD R17.0
HPC – Motivations
• For a given model complexity, shorten design time
  - impact on time to market
• For a given time budget, study more complex models
  - more detailed knowledge of your products
• For a given time budget and model complexity, study more variants
  - parametric studies with analysis of input/output correlations
HPC – Motivations
• Need to study more accurate and/or more complex (high-fidelity) models
  - moving from component-level to system-level studies
  - increasingly complicated and detailed geometries
  - finer computational meshes
  - more detailed knowledge of your products
  - greater scope for product development
HPC – Motivations
• Need to apply more advanced numerical models
  - transients
  - turbulence
  - combustion
  - multiphase
  - etc.
HPC – Motivations
• Need to test different configurations
  - sensitivity analysis
  - optimization
  - robust design
HPC – Motivations
Financial ROI Results
The survey results indicate a substantial return on investment in HPC:
• $356 in revenue, on average, per dollar invested in HPC
• $38 in profit (or cost savings), on average, per dollar invested in HPC
Source: IDC report “Creating Economic Models Showing the Relationship Between Investments
in HPC and the Resulting Financial ROI and Innovation”; October 2013, IDC #243296, Volume: 1.
ANSYS HPC SOLUTIONS
Interdisciplinary: a single, multiphysics solution
• Whatever the simulation requirement, ANSYS HPC provides the parallel computing capacity needed to shorten time to solution and to solve problems with high accuracy (high fidelity).
• The ANSYS solvers for structural mechanics, explicit dynamics, fluid dynamics and electromagnetics, including:
  - ANSYS Mechanical
  - ANSYS Autodyn
  - ANSYS Fluent
  - ANSYS CFX
  - ANSYS Icepak
  - ANSYS Polyflow
  all use the same ANSYS HPC licenses to run in parallel.
Courtesy of FCA Italy
ANSYS HPC Solutions at Every Scale
Scalability on supercomputers
HPC cluster appliances
Efficiency on multi-core workstations
ANSYS HPC Solutions
HPC (per process)
HPC Pack
• For a single user running a simulation on their own workstation, a single ANSYS HPC Pack enables up to an 8x speed-up of the calculation.
• For users with access to large HPC resources, ANSYS HPC Packs can be combined to enable parallel computing on hundreds, or even thousands, of cores.
Cores enabled versus HPC Packs per simulation:
HPC Packs per simulation   1   2    3     4     5      6      7
Cores enabled              8   32   128   512   2048   8192   32768
HPC Workgroup
• Provides large volumes of parallel computing capacity to improve user productivity.
• Enables a maximum total number of compute cores (from 16 to 32768 on the same server) that a team can access.
HPC Parametric Pack
• Multiplies the availability of licenses for the individual applications, enabling several design points to run simultaneously while consuming only one set of application licenses at a time (ANSYS Workbench only).
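The chart values for HPC Packs follow a regular pattern: each additional pack multiplies the enabled cores by four. A minimal Python sketch of that relationship is given here; the function name `cores_enabled` and the closed form `8 * 4**(n - 1)` are assumptions fitted to the seven values on the slide, not a formula taken from ANSYS licensing documentation.

```python
def cores_enabled(hpc_packs: int) -> int:
    """Cores enabled for one simulation by a number of combined HPC Packs.

    Assumption: the closed form 8 * 4**(n - 1) is fitted to the seven
    values shown on the slide and is only checked for 1 to 7 packs.
    """
    if not 1 <= hpc_packs <= 7:
        raise ValueError("the slide only covers 1 to 7 HPC Packs")
    return 8 * 4 ** (hpc_packs - 1)


# Reproduce the chart: packs -> cores
for packs in range(1, 8):
    print(packs, cores_enabled(packs))
```

With two packs the sketch returns 32 cores and with seven packs 32768, matching the slide.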
ANSYS HPC Parametric Pack
• ANSYS HPC Parametric Pack licenses scale the user's ability to run several parametric analyses simultaneously within ANSYS Workbench.
• One ANSYS HPC Parametric Pack license allows up to 4 design points to be evaluated simultaneously, with no additional application licenses required (in effect, the "base" licenses are multiplied).
Figure (example: 4 design points): time taken by sequential execution versus simultaneous execution of dp1, dp2, dp3 and dp4, showing the reduction in computing time.
Number of Simultaneous Design Points Enabled versus Number of HPC Parametric Pack Licenses:
HPC Parametric Pack licenses          1   2   3    4    5
Simultaneous design points enabled    4   8   16   32   64
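To make the time saving concrete, the following Python sketch estimates the wall time of a parametric study from the number of simultaneous design points. It assumes that every design point takes the same time and that enough hardware is available to run all enabled design points at once; the doubling rule `4 * 2**(n - 1)` is fitted to the five chart values, and the 2-hour run time is a hypothetical figure.

```python
import math


def simultaneous_design_points(parametric_packs: int) -> int:
    """Design points that can run at once (fitted to the slide: 4, 8, 16, 32, 64)."""
    if not 1 <= parametric_packs <= 5:
        raise ValueError("the slide only covers 1 to 5 Parametric Packs")
    return 4 * 2 ** (parametric_packs - 1)


def study_wall_time(n_design_points: int, hours_per_point: float, slots: int) -> float:
    """Wall time when design points run in batches of `slots` at a time.

    Assumption: all design points take the same time and do not slow
    each other down.
    """
    batches = math.ceil(n_design_points / slots)
    return batches * hours_per_point


# Hypothetical study: 4 design points, 2 hours each
sequential = study_wall_time(4, 2.0, slots=1)
simultaneous = study_wall_time(4, 2.0, slots=simultaneous_design_points(1))
print(sequential, simultaneous)
```

Under these assumptions the four-design-point example from the slide drops from 8 hours run sequentially to 2 hours run simultaneously.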
HPC PERFORMANCE IMPROVEMENTS FOR
ANSYS CFD R17.0
Improved Parallel Performance & Scaling – CFX 17.0
ANSYS Application Example
Application: General flow
Case Details:
• Airfoil
• External Aerodynamic Flow
• 100 M hex elements
• Single Domain
• Turbulent Flow
R17 vs. R15:
>5X faster solution
@ 2048 cores
R17 vs. R16:
Solution time reduced by up to 39%
@ 4096 cores
Scaling to 25K nodes/core
Improved Parallel Performance & Scaling – CFX 17.0
ANSYS Application Example
Application: Mesh motion
Case Details:
• Automotive IC Engine Application
• 146 M nodes (380M elements: tet/prism/pyramid)
• Single Domain
• Turbulent Flow
R17 vs. R16:
32% faster @ 4096 cores
Improved Parallel Performance & Scaling – CFX 17.0
ANSYS Application Example
Application: Turbomachinery
Case Details:
• Full Turbine
• Steady (FR)
• 13 M nodes (hex)
• 256 cores → 50K nodes/core
• 4 Domains: casing, guide vanes, runner, draft tube
• Turbulent Flow
R17 vs. R16:
Absolute 5-10% faster
Minimal scaling change
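The "50K nodes/core" figure is simply the mesh size divided by the core count. A short Python check is shown here; the helper names are hypothetical, and reusing the 25K nodes/core level quoted for the airfoil case as a target for this mesh is an assumption made only for illustration.

```python
import math


def nodes_per_core(mesh_nodes: float, cores: int) -> float:
    """Average number of mesh nodes handled by each core."""
    return mesh_nodes / cores


def cores_for_target(mesh_nodes: float, target_nodes_per_core: float) -> int:
    """Smallest core count that brings the load down to the target."""
    return math.ceil(mesh_nodes / target_nodes_per_core)


turbine_load = nodes_per_core(13e6, 256)        # 13 M nodes on 256 cores
cores_at_25k = cores_for_target(13e6, 25_000)   # assumed 25K nodes/core target
print(round(turbine_load), cores_at_25k)
```

The first value is about 50.8K nodes per core, which the slide rounds to 50K.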
Improved Parallel Performance & Scaling – CFX 17.0
ANSYS Application Example
Application: Turbomachinery
Case Details:
• Full Turbine
• Unsteady (TRS)
• 13 M nodes (hex)
• 256 cores → 50K nodes/core
• 4 Domains: casing, guide vanes, runner, draft tube
• Turbulent Flow
R17 vs. R16:
Absolute 10-30% faster
Speed-up @ 16 compute nodes: 5.8X → 7X
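A speed-up figure can be turned into a parallel efficiency by dividing it by the number of compute units. The sketch below does this for the two speed-ups quoted at 16 compute nodes; it assumes the speed-up is measured relative to a single compute node, which the slide does not state.

```python
def parallel_efficiency(speedup: float, n_units: int) -> float:
    """Fraction of ideal (linear) speed-up actually achieved."""
    return speedup / n_units


# Assumption: both speed-ups are relative to one compute node
eff_r16 = parallel_efficiency(5.8, 16)
eff_r17 = parallel_efficiency(7.0, 16)
print(f"R16: {eff_r16:.1%}  R17: {eff_r17:.1%}")
```

Under that assumption the efficiency at 16 nodes rises from roughly 36% to roughly 44%.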
Improved Parallel Performance & Scaling – CFX 17.0
ANSYS Features & Capabilities
Application: Turbomachinery
Background:
• Particular parallel performance issue on large partition counts
Optimized source point performance
• Improved efficiency with large numbers of source points
Test case showing reduction in total CPU time when using large numbers of source points (reduction of additional computational cost of source points by as much as 70%)
Image credit: "GaTurbineBlade" by Tomeasy - Own work by uploader; produced with Adobe Illustrator. Licensed under CC BY-SA 3.0 via Commons https://commons.wikimedia.org/wiki/File:GaTurbineBlade.svg#/media/File:GaTurbineBlade.svg
Improved Parallel Performance & Scaling – CFX 17.0
ANSYS Features & Capabilities
Application: Radiation
Background:
• Problems modeling collimated radiation such as
headlights and solar irradiation use the Monte Carlo
solver. This solver needs to take full advantage of
HPC potential
Enhanced Monte Carlo Radiation model
• Optimized the model so that the total number of rays
(histories) remains consistent, independent of the
number of core partitions
ANSYS Application Example
Headlights, solar irradiation
• 2 spectral bands (multiband), participating media; 5 radiation domains (2 fluid, 3 solid); 3.5 million elements, of which 2.2 million are radiation elements
• Specified serial histories: 10 million
Complex headlamp case with 10 million ray histories. Comparison when solving only radiation and energy
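One simple way to keep the total number of ray histories fixed while the partition count changes is to divide the requested total across the partitions and hand out the remainder one history at a time. The Python sketch below illustrates only that bookkeeping idea; it is not the CFX implementation, and `split_histories` is a hypothetical helper.

```python
def split_histories(total_histories: int, n_partitions: int) -> list[int]:
    """Share a fixed number of ray histories across partitions.

    The first `extra` partitions get one additional history so that the
    shares always sum to exactly `total_histories`.
    """
    base, extra = divmod(total_histories, n_partitions)
    return [base + 1 if rank < extra else base for rank in range(n_partitions)]


# 10 million histories, as in the headlamp case, on different core counts
for n_partitions in (1, 3, 64, 1000):
    shares = split_histories(10_000_000, n_partitions)
    print(n_partitions, sum(shares), max(shares) - min(shares))
```

Whatever the partition count, the shares add up to the same 10 million histories and differ from each other by at most one.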
Improved Parallel Performance & Scaling – CFX 17.0
ANSYS Features & Capabilities
I/O
Background:
• Time to read and write files to HPC for large and
complex cases with many regions/face sets could
significantly lengthen overall solution time
Optimized HPC I/O speedup
• Optimization of CFX solver to HPC interface resulted in a substantial speed-up
• I/O time now nearly negligible even at 64 cores
Reduction in wall clock seconds for I/O on an example test case with many regions
HPC Performance Improvements for ANSYS Fluent
Improved Parallel Performance & Scaling – Fluent 17.0
Robustness
ANSYS Features & Capabilities
Background:
• Fluent's priority has been to deliver the best results, not the fastest convergence
Conservative Coarsening Method default for Pressure-based Coupled Solver:
• Especially helpful for native polyhedral meshes and/or highly stretched cells
Algebraic multigrid solver now automatically reorders the linear system
• Ensures proper ordering in multiple cell zones (was limited to within a single cell zone)
Convergence comparison:
• No reordering: not converged after more than 200 iterations
• RCM reordering: converged in 94 iterations
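RCM (reverse Cuthill-McKee) reordering renumbers the unknowns so that coupled cells end up close together in the matrix. The pure-Python sketch below is a textbook version of the idea applied to a tiny, hypothetical chain of cells; it is not Fluent's implementation, and it only demonstrates the reduction in matrix bandwidth, not the effect on multigrid convergence.

```python
from collections import deque


def reverse_cuthill_mckee(adjacency):
    """Return a node ordering computed with reverse Cuthill-McKee.

    adjacency: dict mapping each node to the set of its neighbours
    (the non-zero pattern of a symmetric matrix).
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    visited, order = set(), []
    # Start each connected component from a lowest-degree node.
    for start in sorted(adjacency, key=lambda n: (degree[n], n)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            # Visit unvisited neighbours in order of increasing degree.
            for nbr in sorted(adjacency[node] - visited, key=lambda n: (degree[n], n)):
                visited.add(nbr)
                queue.append(nbr)
    return order[::-1]


def bandwidth(adjacency, order):
    """Largest distance between two coupled nodes in the given ordering."""
    position = {node: i for i, node in enumerate(order)}
    return max(abs(position[a] - position[b]) for a in adjacency for b in adjacency[a])


# Hypothetical example: a 1D chain of 8 cells with scrambled labels
chain = [3, 7, 0, 5, 2, 6, 1, 4]
adjacency = {n: set() for n in chain}
for a, b in zip(chain, chain[1:]):
    adjacency[a].add(b)
    adjacency[b].add(a)

natural_order = sorted(adjacency)
rcm_order = reverse_cuthill_mckee(adjacency)
print(bandwidth(adjacency, natural_order), bandwidth(adjacency, rcm_order))
```

For this chain the bandwidth drops from 7 with the scrambled labels to 1 after reordering.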
Improved Parallel Performance & Scaling – Fluent 17.0
Partitioning
ANSYS Features & Capabilities
Faster METIS partitioning:
• Updated library and optimized algorithms deliver significant partitioning speed-up for many larger cases, particularly those with adapted meshes
• 64-bit indexing in METIS and for partition storage to enable larger models
• Future proofed: Tested up to 2 billion cells!
ANSYS Application Examples
Combustor:
• 40% faster to partition for 8192 cores
• Less than 3 minutes
Truck:
• 99% faster to partition for 512 cores
• Just 18 seconds (versus 36 minutes!!)
Partition Time (seconds), Combustor, 830M Cells, CRAY XE6
Cores     4096   8192
16.0.0    141    295
17.0.0    111    174
Auto Partition Time (seconds), Truck, 134M Cells
Cores     256     512     1024       2048   4096
16.0.0    923.1   2175    > 1 hour
17.0.0    18.2    15.8    18.5       27.4   51.7
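The "40% faster" and "99% faster" callouts can be checked against the tabulated partition times by computing the percentage reduction in time. The Python sketch below does so; the truck value for 16.0.0 at 512 cores is taken as 2175 seconds because its decimal part is cut off in the source.

```python
def time_reduction_pct(old_seconds: float, new_seconds: float) -> float:
    """Percentage of the original time that has been removed."""
    return 100.0 * (old_seconds - new_seconds) / old_seconds


# (16.0.0 seconds, 17.0.0 seconds) per core count, from the two charts
combustor = {4096: (141.0, 111.0), 8192: (295.0, 174.0)}
truck = {256: (923.1, 18.2), 512: (2175.0, 15.8)}  # 2175: decimals truncated in source

for name, table in (("Combustor", combustor), ("Truck", truck)):
    for cores, (old, new) in table.items():
        print(f"{name} @ {cores} cores: {time_reduction_pct(old, new):.1f}% less time")
```

The combustor at 8192 cores comes out at about 41% and the truck at 512 cores at about 99%, in line with the callouts.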
Improved Parallel Performance & Scaling – Fluent 17.0
Partitioning
ANSYS Features & Capabilities
Background:
• DPM and combustion models pose challenges to parallel performance as users attempt to load-balance flow and physics calculations
New Option: Model-Weighted Partitioning
• Automatically weights multiple physics models across the full set of processors within a specified load imbalance tolerance
• Users can select the factors and relative weightings
ANSYS Application Example
Oxy-Fuel Burner:
• Turbulence, combustion, radiation, detailed kinetic mechanism (25 species, 113 reactions)
• 60% faster for 128 cores (Just 82 seconds)
Time in Seconds, Oxy-fuel Burner, 1.9M hex cells
Cores           32       64       128      256      512     1024
Default         647.26   314.59   203.16   112.15   65.05   37.1
Load Balance    198.08   150.59   82.03    61.76    34.29   22.33
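Why weighting matters can be seen on a toy example: if some cells cost more to compute (for instance cells in a reacting zone), splitting by cell count alone leaves one partition with most of the work. The Python sketch below compares an equal-count split with a split that balances weighted cost on a hypothetical 1D mesh. It is a deliberately simple illustration, not Fluent's model-weighted partitioning algorithm, and the cost weights are invented.

```python
def imbalance(loads):
    """Load imbalance: heaviest partition load divided by the mean load."""
    return max(loads) / (sum(loads) / len(loads))


def split_by_count(weights, n_parts):
    """Contiguous partitions with (nearly) equal numbers of cells."""
    n = len(weights)
    bounds = [round(i * n / n_parts) for i in range(n_parts + 1)]
    return [sum(weights[bounds[i]:bounds[i + 1]]) for i in range(n_parts)]


def split_by_weight(weights, n_parts):
    """Contiguous partitions closed as soon as they reach the target load."""
    target = sum(weights) / n_parts
    loads, current = [], 0.0
    for w in weights:
        current += w
        if current >= target and len(loads) < n_parts - 1:
            loads.append(current)
            current = 0.0
    loads.append(current)
    return loads


# Hypothetical mesh: 400 cells, the first 100 cost 5x more (e.g. reacting zone)
weights = [5.0] * 100 + [1.0] * 300

count_loads = split_by_count(weights, 4)
weight_loads = split_by_weight(weights, 4)
print(count_loads, imbalance(count_loads))
print(weight_loads, imbalance(weight_loads))
```

In this toy case the equal-count split leaves one partition with 2.5 times the average load, while the weighted split is perfectly balanced.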
Improved Parallel Performance & Scaling – Fluent 17.0
Partitioning
ANSYS Features & Capabilities
Background:
• Partitions need to communicate with each other. Lack of optimization can slow performance, especially for moving/dynamic mesh cases where the neighborhood needs to be updated frequently
Neighborhood Creation Optimization:
• Optimized communication algorithms and improved interface identification for better performance and completeness
• Better identification of interfaces improves robustness
ANSYS Application Example
Exhaust System:
• Speed-up from 1X to 30X depending on case and number of cores
Time in seconds, Exhaust 33M Neighborhood Creation
Cores     128     256     512     1024    2048    4096    8192
16.0.0    7.828   4.75    6.219   7.882   17.07   52.63   156.4
17.0.0    3.844   2.539   1.866   1.838   2.346   2.793   5.749
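The speed-up at each core count is the ratio of the 16.0.0 time to the 17.0.0 time. The short Python sketch below computes it from the tabulated values of the exhaust case; the variable names are illustrative.

```python
cores = [128, 256, 512, 1024, 2048, 4096, 8192]
seconds_r16 = [7.828, 4.75, 6.219, 7.882, 17.07, 52.63, 156.4]
seconds_r17 = [3.844, 2.539, 1.866, 1.838, 2.346, 2.793, 5.749]

# Speed-up of 17.0.0 over 16.0.0 for neighborhood creation
speedup = {c: old / new for c, old, new in zip(cores, seconds_r16, seconds_r17)}

for c in cores:
    print(f"{c:>5} cores: {speedup[c]:.1f}X")
```

For this case the gain grows with the core count, from about 2X at 128 and 256 cores to about 27X at 8192 cores, because the 16.0.0 time increases steeply beyond 1024 cores while the 17.0.0 time stays below 6 seconds.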
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Application Example
Application: General flow
Case Details:
• External flow over a passenger sedan
• Number of cells: 4 Million
• Cell Type: Mixed
• Models used: Standard K-epsilon turbulence
General solver scalability improvements
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Application Example
Application: General flow
Case Details:
• Vehicle exhaust model
• Number of cells: 33 Million
• Cell Type: Mixed
• Models used: SST K-omega turbulence
Optimized Neighborhood Creation
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Application Example
Application: Mesh motion
Engine Crankcase Lubrication Model:
• 85% faster run time (<6 hours)
• Faster than recent competitive benchmark
• Crankshaft Rotation in a sliding mesh zone, Piston motion through dynamic mesh layering, Oil slosh modeled with VOF, 5M cell Poly Mesh
Big speed-ups for moving dynamic mesh due to:
• Neighborhood optimization
• Sliding interface optimization
• Parallel solver optimization
Total Run Time per One Cycle (hrs), Engine Crankcase Lubrication Model (Representative Illustration)
Cores     48      96     192
16.0.0    18      14     10.83
17.0.0    13.85   9.26   5.86
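"Percent faster" can mean either a higher rate or a shorter time, and the two give different numbers. The Python sketch below computes both for the 192-core crankcase runs, together with the strong-scaling efficiency from 48 to 192 cores; the function names are illustrative, and the reading of "85% faster" as a rate increase is an interpretation of the table rather than something the slide states.

```python
def pct_faster(old_hours: float, new_hours: float) -> float:
    """Increase in speed (work per unit time), in percent."""
    return 100.0 * (old_hours / new_hours - 1.0)


def pct_less_time(old_hours: float, new_hours: float) -> float:
    """Reduction in run time, in percent."""
    return 100.0 * (1.0 - new_hours / old_hours)


def scaling_efficiency(hours_base, hours, cores_base, cores):
    """Achieved speed-up divided by the ideal speed-up for the added cores."""
    return (hours_base / hours) / (cores / cores_base)


# 192-core run times from the chart (hours per cycle)
faster = pct_faster(10.83, 5.86)
less_time = pct_less_time(10.83, 5.86)

# Strong scaling from 48 to 192 cores in each release
eff_r16 = scaling_efficiency(18.0, 10.83, 48, 192)
eff_r17 = scaling_efficiency(13.85, 5.86, 48, 192)
print(f"{faster:.0f}% faster, {less_time:.0f}% less time")
print(f"scaling efficiency: R16 {eff_r16:.0%}, R17 {eff_r17:.0%}")
```

The 192-core figures correspond to about 85% higher speed, or equivalently about 46% less run time, which suggests the "85% faster" callout refers to the rate.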
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Application Example
Application: Mesh motion, combustion
Case Details:
• 4-stroke spray guided Gasoline Direct Injection
• Number of cells: 2 Million
• Cell Type: Mixed
• Models used: Standard K-epsilon turbulence
• Moving mesh, Spray, Combustion
Big speed-ups for moving dynamic mesh due to:
• Neighborhood optimization
• Sliding interface optimization
• Parallel solver optimization
• Combustion code refactoring
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Application Example
Application: Multiphase
Case Details:
• Circulating Fluidized Bed
• Number of cells: 2 Million
• Cell Type: Mixed
• Models used: Laminar
Figure labels: Solid inlet, Gas inlet
General solver scalability improvements
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Application Example
Application: Multiphase
Case Details:
• Wave loading on Oil Rig
• Number of cells: 7 Million
• Cell Type: Mixed
• Models used: SST K-omega turbulence
General solver scalability improvements
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Application Example
Application: Combustion
Case Details:
• Flow through a Combustor
• Number of cells: 12 Million
• Cell Type: Polyhedra
• Models used: Realizable K-epsilon turbulence
• Species transport
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Application Example
Application: Aeroacoustics
Case Details:
• External flow over aircraft landing gear
• Number of cells: 15 Million
• Cell Type: Mixed
• Models used: LES
General solver scalability improvements
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Application Example
Application: Turbomachinery
Case Details:
• Single-stage Transonic axial-flow Fan
• Number of cells: 3 Million
• Cell Type: Hexahedral
• Models used: SST K-omega turbulence
• Unsteady (sliding interfaces)
Figure label: Stator Row
Ref: NASA-103800
General solver scalability improvements
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Improved Parallel Performance & Scaling – Fluent 17.0
ANSYS Application Example
Application: Turbomachinery, multiphase
Case Details:
• Cavity flow in a centrifugal pump
• Number of cells: 2 Million
• Model used: Realizable K-epsilon turbulence
General solver scalability improvements
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Optimized for the Latest HPC Architectures – Fluent 17.0
ANSYS Application Example
GPU
Case Details:
• 1.2 million cell pipe benchmark
Hardware Configuration:
• One node of XL250Gen9s with E5-2690v3, 128 GB of 2133 MHz memory and 2 NVIDIA K80s
Optimized for the Latest HPC Architectures – Fluent 17.0
ANSYS Application Example
GPU
Case Details:
• 9.6 million cell pipe benchmark
Hardware Configuration:
• Cluster of XL250Gen9s with E5-2690v3, 128 GB of 2133 MHz memory and 2 NVIDIA K80s per node