ANSYS HPC for CFD Applications, Release 17.0

Agenda
• High-Performance Computing – motivation
• The ANSYS HPC solutions
• HPC performance improvements for ANSYS CFD R17.0

HPC – Motivation
• For the same model complexity, shorter design times: a direct impact on time to market.
• For the same turnaround time, the ability to study more complex models: deeper knowledge of your own products.
• For the same turnaround time and model complexity, the ability to study more design variants: parametric studies with input/output correlation analysis.

HPC – Motivation
The need to study more accurate and/or more complex (high-fidelity) models:
• moving from component-level studies to system-level studies
• increasingly complicated and detailed geometries
• finer computational meshes
• deeper knowledge of your own products
• greater scope for product development

HPC – Motivation
The need to apply more advanced numerical models: transient analyses, turbulence, combustion, multiphase flow, and so on.

HPC – Motivation
The need to test several configurations: sensitivity analysis, optimization, robust design.

HPC – Motivation: Financial ROI Results
The survey results indicate a remarkable return on investment in HPC:
• $356 of revenue, on average, per dollar invested in HPC
• $38 of profit (or cost savings), on average, per dollar invested in HPC
Source: IDC report "Creating Economic Models Showing the Relationship Between Investments in HPC and the Resulting Financial ROI and Innovation"; October 2013, IDC #243296, Volume: 1.

THE ANSYS HPC SOLUTIONS

Interdisciplinarity: a single, multiphysics solution
Whatever the simulation requirement, ANSYS HPC provides the parallel computing capacity needed to accelerate time to solution and to solve problems with high fidelity. The ANSYS structural, fluid dynamics and electromagnetics solvers, including explicit dynamics, ANSYS Mechanical, ANSYS Autodyn, ANSYS Fluent, ANSYS CFX, ANSYS Icepak and ANSYS Polyflow, all use the same ANSYS HPC licenses to run in parallel.
(Image courtesy of FCA Italy)

ANSYS HPC Solutions at Every Scale
• Efficiency on multi-core workstations
• HPC cluster appliances
• Scalability on supercomputers

The ANSYS HPC solutions
• HPC Pack: for a single user running a simulation on their own workstation, a single ANSYS HPC Pack accelerates the computation by up to 8 times. For users with access to large HPC resources, ANSYS HPC Packs can be combined to enable parallel computing on hundreds, or even thousands, of cores. HPC cores enabled per simulation (per solver process): 8, 32, 128, 512, 2048, 8192 and 32768 for 1 to 7 HPC Packs respectively; the progression is reproduced in the sketch after this list.
• HPC Workgroup: offers large volumes of parallel computing capacity to improve user productivity. It enables a total maximum number of compute cores (from 16 to 32768 on the same server) that a team has access to.
• HPC Parametric Pack: multiplies the available licenses of the individual applications, enabling several design points to run simultaneously while consuming only one set of application licenses at a time (available only through ANSYS Workbench).
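As a sizing aid, the following is a minimal Python sketch that reproduces the HPC Pack core counts listed above. The closed-form expression 8 · 4^(n−1) is inferred from that progression and is not an official licensing formula.

```python
# Minimal sketch: reproduce the "HPC cores enabled per simulation" progression
# quoted above (8, 32, 128, 512, 2048, 8192, 32768 for 1 to 7 HPC Packs).
# The closed form 8 * 4**(n - 1) is inferred from that list, not taken from
# official ANSYS licensing documentation.

def cores_enabled(hpc_packs: int) -> int:
    """HPC cores enabled per solver process for a given number of HPC Packs."""
    if not 1 <= hpc_packs <= 7:
        raise ValueError("the slide lists values for 1 to 7 HPC Packs only")
    return 8 * 4 ** (hpc_packs - 1)

if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} HPC Pack(s) -> {cores_enabled(n)} cores")
```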
ANSYS HPC Parametric Pack
ANSYS HPC Parametric Pack licenses scale the user's ability to run several parametric analyses simultaneously inside ANSYS Workbench. A single ANSYS HPC Parametric Pack license allows up to 4 design points to be evaluated at the same time, with no additional application licenses required (in effect, the "base" licenses are multiplied). Simultaneous design points enabled: 4, 8, 16, 32 and 64 for 1 to 5 HPC Parametric Pack licenses respectively. Running the design points simultaneously rather than sequentially (for example, design points dp1 to dp4 at once) reduces the overall computation time accordingly.

HPC PERFORMANCE IMPROVEMENTS FOR ANSYS CFD R17.0

Improved Parallel Performance & Scaling – CFX 17.0: ANSYS Application Example
Case details:
• Application: general flow
• Airfoil external aerodynamic flow
• 100M hex elements
• Single domain
• Turbulent flow
Results:
• R17 vs. R15: >5X faster solution at 2048 cores
• R17 vs. R16: solution time reduced by up to 39% at 4096 cores
• Scaling to 25K nodes/core

Improved Parallel Performance & Scaling – CFX 17.0: ANSYS Application Example
Case details:
• Application: mesh motion
• Automotive IC engine application
• 146M nodes (380M elements: tet/prism/pyramid)
• Single domain
• Turbulent flow
Results:
• R17 vs. R16: 32% faster at 4096 cores

Improved Parallel Performance & Scaling – CFX 17.0: ANSYS Application Example
Case details:
• Application: turbomachinery
• Full turbine, steady (FR)
• 13M nodes (hex)
• 256 cores → 50K nodes/core
• 4 domains: casing, guide vanes, runner, draft tube
• Turbulent flow
Results:
• R17 vs. R16: 5-10% faster in absolute terms
• Minimal scaling change

Improved Parallel Performance & Scaling – CFX 17.0: ANSYS Application Example
Case details:
• Application: turbomachinery
• Full turbine, unsteady (TRS)
• 13M nodes (hex)
• 256 cores → 50K nodes/core
• 4 domains: casing, guide vanes, runner, draft tube
• Turbulent flow
Results:
• R17 vs. R16: 10-30% faster in absolute terms
• Speed-up at 16 compute nodes: 5.8X → 7X

Improved Parallel Performance & Scaling – CFX 17.0: Features & Capabilities
Application: turbomachinery
Background: a particular parallel performance issue on large partition counts.
Optimized source point performance:
• Improved efficiency with large numbers of source points.
• A test case shows the reduction in total CPU time when using large numbers of source points (the additional computational cost of source points is reduced by as much as 70%).
(Image: "GaTurbineBlade" by Tomeasy, own work, produced with Adobe Illustrator; licensed under CC BY-SA 3.0 via Wikimedia Commons, https://commons.wikimedia.org/wiki/File:GaTurbineBlade.svg)

Improved Parallel Performance & Scaling – CFX 17.0: Features & Capabilities
Application: radiation
Background: problems that model collimated radiation, such as headlights and solar irradiation, use the Monte Carlo solver; this solver needs to take full advantage of the HPC potential.
Enhanced Monte Carlo radiation model:
• The model is optimized so that the total number of rays (histories) remains consistent, independent of the number of core partitions.
ANSYS Application Example: headlights, solar irradiation
• 2 spectral bands (multiband), participating media
• 5 radiation domains (2 fluid, 3 solid)
• 3.5 million elements, of which 2.2 million are radiation elements
• Specified serial histories: 10 million
Complex headlamp case with 10 million ray histories; the comparison is for solving only radiation and energy.
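To make the "consistent total number of rays" point above concrete, here is a minimal illustrative sketch (not the CFX implementation) of distributing a fixed number of Monte Carlo histories across an arbitrary number of partitions so that the total never changes:

```python
# Illustrative sketch only (not the CFX implementation): spread a fixed total
# number of Monte Carlo ray histories across partitions so that the total
# stays the same regardless of how many partitions are used.

def histories_per_partition(total_histories: int, n_partitions: int) -> list[int]:
    """Split `total_histories` as evenly as possible over `n_partitions`."""
    base, remainder = divmod(total_histories, n_partitions)
    # The first `remainder` partitions trace one extra ray each.
    return [base + (1 if rank < remainder else 0) for rank in range(n_partitions)]

if __name__ == "__main__":
    total = 10_000_000  # the 10 million histories of the headlamp case above
    for parts in (16, 64, 256):
        split = histories_per_partition(total, parts)
        assert sum(split) == total  # total ray count is independent of partition count
        print(f"{parts} partitions -> {split[0]} rays on rank 0, total {sum(split)}")
```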
Improved Parallel Performance & Scaling – CFX 17.0: Features & Capabilities
I/O
Background: the time to read and write files on HPC systems for large and complex cases with many regions/face sets could significantly lengthen the overall solution time.
Optimized HPC I/O speed-up:
• Optimization of the CFX solver-to-HPC interface resulted in a substantial speed-up.
• I/O time is now nearly negligible even at 64 cores.
(Chart: reduction in wall-clock seconds for I/O on an example test case with many regions.)

HPC performance improvements for ANSYS Fluent

Improved Parallel Performance & Scaling – Fluent 17.0: Features & Capabilities
Robustness
Background: Fluent's priority has been to deliver the best results, not the fastest convergence.
• The Conservative Coarsening Method is now the default for the pressure-based coupled solver: especially helpful for native polyhedral meshes and/or highly stretched cells.
• The algebraic multigrid (AMG) solver now automatically reorders the linear system: this ensures proper ordering across multiple cell zones (reordering was previously limited to within a single cell zone).
Example: with no reordering, the case had not converged after more than 200 iterations; with RCM reordering, it converged in 94 iterations (the reordering idea is illustrated in the sketch below).
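RCM here is reverse Cuthill-McKee reordering. Fluent applies it internally to the AMG linear system; as a standalone illustration of the technique (not Fluent's code), the sketch below uses SciPy to compute an RCM permutation and shows how it shrinks the matrix bandwidth, which is what makes the linear system better conditioned for the multigrid smoothers.

```python
# Standalone illustration of reverse Cuthill-McKee (RCM) reordering with SciPy.
# This is not Fluent's internal code; it only shows what "reordering the
# linear system" does: the permutation clusters nonzeros near the diagonal,
# reducing the matrix bandwidth.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(matrix) -> int:
    """Maximum distance of any nonzero entry from the diagonal."""
    coo = matrix.tocoo()
    return int(np.max(np.abs(coo.row - coo.col))) if coo.nnz else 0

if __name__ == "__main__":
    n = 200
    # A random symmetric sparsity pattern standing in for a cell-connectivity matrix.
    a = sp.random(n, n, density=0.02, random_state=0, format="csr")
    a = (a + a.T + sp.identity(n)).tocsr()

    perm = reverse_cuthill_mckee(a, symmetric_mode=True)
    reordered = a[perm, :][:, perm]

    print("bandwidth before RCM:", bandwidth(a))
    print("bandwidth after  RCM:", bandwidth(reordered))
```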
Improved Parallel Performance & Scaling – Fluent 17.0: Features & Capabilities
Partitioning
Faster METIS partitioning:
• An updated library and optimized algorithms deliver a significant partitioning speed-up for many larger cases, particularly those with adapted meshes.
• 64-bit indexing in METIS and for partition storage enables larger models.
• Future-proofed: tested up to 2 billion cells!
ANSYS Application Examples
• Combustor, 830M cells, Cray XE6: 40% faster to partition on 8192 cores, i.e. less than 3 minutes. Partition time in seconds: 141 (16.0.0) vs 111 (17.0.0) on 4096 cores; 295 vs 174 on 8192 cores.
• Truck, 134M cells: 99% faster to partition on 512 cores, just 18 seconds versus 36 minutes. At higher core counts, partitioning that took more than an hour in 16.0.0 stays below a minute in 17.0.0.

Improved Parallel Performance & Scaling – Fluent 17.0: Features & Capabilities
Partitioning
Background: DPM and combustion models pose challenges to parallel performance as users attempt to load-balance the flow and physics calculations.
New option: model-weighted partitioning
• Automatically weights multiple physics models across the full set of processors within a specified load-imbalance tolerance.
• Users can select the factors and their relative weightings.
ANSYS Application Example: oxy-fuel burner, 1.9M hex cells
• Turbulence, combustion, radiation, detailed kinetic mechanism (25 species, 113 reactions)
• 60% faster on 128 cores (just 82 seconds)
Time in seconds (default vs load-balanced partitioning):
Cores         32       64       128      256      512      1024
Default       647.26   314.59   203.16   112.15   65.05    37.10
Load balance  198.08   150.59   82.03    61.76    34.29    22.33

Improved Parallel Performance & Scaling – Fluent 17.0: Features & Capabilities
Partitioning
Background: partitions need to communicate with each other, and a lack of optimization can slow performance, especially for moving/dynamic mesh cases where the neighborhood needs to be updated frequently.
Neighborhood creation optimization:
• Optimized communication algorithms and improved interface identification for better performance and completeness.
• Better identification of interfaces improves robustness.
ANSYS Application Example: exhaust system, 33M cells
• Speed-up from 1X to 30X depending on the case and the number of cores (the ratios are worked out in the sketch below).
Neighborhood creation time in seconds:
Cores     128      256      512      1024     2048     4096     8192
16.0.0    7.828    4.750    6.219    7.882    17.07    52.63    156.4
17.0.0    3.844    2.539    1.866    1.838    2.346    2.793    5.749
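To make the "1X to 30X" range concrete, here is a short Python sketch that computes the 16.0-to-17.0 speed-up ratio at each core count from the neighborhood-creation timings listed above:

```python
# Compute the neighborhood-creation speed-up (R16.0 time / R17.0 time) at each
# core count, using the timings listed in the table above.

cores = [128, 256, 512, 1024, 2048, 4096, 8192]
r16 = [7.828, 4.750, 6.219, 7.882, 17.07, 52.63, 156.4]   # seconds, Fluent 16.0.0
r17 = [3.844, 2.539, 1.866, 1.838, 2.346, 2.793, 5.749]   # seconds, Fluent 17.0.0

for n, t_old, t_new in zip(cores, r16, r17):
    print(f"{n:5d} cores: {t_old:7.3f} s -> {t_new:6.3f} s  ({t_old / t_new:4.1f}X faster)")
# At 8192 cores the ratio is about 27X, consistent with the 1X-30X range quoted above.
```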
Improved Parallel Performance & Scaling – Fluent 17.0: ANSYS Application Example
Case details:
• Application: general flow
• External flow over a passenger sedan
• Number of cells: 4 million
• Cell type: mixed
• Models used: standard k-ε turbulence
General solver scalability improvements. Results obtained on Intel Xeon E5-2697 v3 nodes with a TrueScale InfiniBand fabric.

Improved Parallel Performance & Scaling – Fluent 17.0: ANSYS Application Example
Case details:
• Application: general flow
• Vehicle exhaust model
• Number of cells: 33 million
• Cell type: mixed
• Models used: SST k-omega turbulence
Optimized neighborhood creation. Results obtained on Intel Xeon E5-2697 v3 nodes with a TrueScale InfiniBand fabric.

Improved Parallel Performance & Scaling – Fluent 17.0: ANSYS Application Example
Application: mesh motion
Engine crankcase lubrication model:
• 85% faster run time (<6 hours)
• Faster than a recent competitive benchmark
• Crankshaft rotation in a sliding mesh zone, piston motion through dynamic mesh layering, oil slosh modeled with VOF, 5M cell polyhedral mesh
Total run time per one cycle, in hours:
Cores     48       96       192
16.0.0    18       14       10.83
17.0.0    13.85    9.26     5.86
Big speed-ups for moving/dynamic mesh due to: neighborhood optimization, sliding interface optimization, parallel solver optimization. (Representative illustration.)

Improved Parallel Performance & Scaling – Fluent 17.0: ANSYS Application Example
Case details:
• Application: mesh motion, combustion
• 4-stroke spray-guided gasoline direct injection engine
• Number of cells: 2 million
• Cell type: mixed
• Models used: standard k-ε turbulence; moving mesh, spray, combustion
Big speed-ups for moving/dynamic mesh due to: neighborhood optimization, sliding interface optimization, parallel solver optimization, combustion code refactoring. Results obtained on Intel Xeon E5-2697 v3 nodes with a TrueScale InfiniBand fabric.

Improved Parallel Performance & Scaling – Fluent 17.0: ANSYS Application Example
Case details:
• Application: multiphase
• Circulating fluidized bed (solid inlet, gas inlet)
• Number of cells: 2 million
• Cell type: mixed
• Models used: laminar
General solver scalability improvements. Results obtained on Intel Xeon E5-2697 v3 nodes with a TrueScale InfiniBand fabric.

Improved Parallel Performance & Scaling – Fluent 17.0: ANSYS Application Example
Case details:
• Application: multiphase
• Wave loading on an oil rig
• Number of cells: 7 million
• Cell type: mixed
• Models used: SST k-omega turbulence
General solver scalability improvements. Results obtained on Intel Xeon E5-2697 v3 nodes with a TrueScale InfiniBand fabric.

Improved Parallel Performance & Scaling – Fluent 17.0: ANSYS Application Example
Case details:
• Application: combustion
• Flow through a combustor
• Number of cells: 12 million
• Cell type: polyhedra
• Models used: realizable k-ε turbulence, species transport
Results obtained on Intel Xeon E5-2697 v3 nodes with a TrueScale InfiniBand fabric.

Improved Parallel Performance & Scaling – Fluent 17.0: ANSYS Application Example
Case details:
• Application: aeroacoustics
• External flow over aircraft landing gear
• Number of cells: 15 million
• Cell type: mixed
• Models used: LES
General solver scalability improvements. Results obtained on Intel Xeon E5-2697 v3 nodes with a TrueScale InfiniBand fabric.

Improved Parallel Performance & Scaling – Fluent 17.0: ANSYS Application Example
Case details:
• Application: turbomachinery
• Single-stage transonic axial-flow fan (stator row shown); Ref: NASA-103800
• Number of cells: 3 million
• Cell type: hexahedral
• Models used: SST k-omega turbulence; unsteady (sliding interfaces)
General solver scalability improvements. Results obtained on Intel Xeon E5-2697 v3 nodes with a TrueScale InfiniBand fabric.

Improved Parallel Performance & Scaling – Fluent 17.0: ANSYS Application Example
Case details:
• Application: turbomachinery, multiphase
• Cavity flow in a centrifugal pump
• Number of cells: 2 million
• Models used: realizable k-ε turbulence
General solver scalability improvements. Results obtained on Intel Xeon E5-2697 v3 nodes with a TrueScale InfiniBand fabric.

Optimized for the Latest HPC Architectures – Fluent 17.0: ANSYS Application Example
Case details:
• 1.2 million cell pipe benchmark
Hardware configuration:
• One XL250 Gen9 node with E5-2690 v3 processors, 128 GB of 2133 MHz memory and 2 NVIDIA K80 GPUs

Optimized for the Latest HPC Architectures – Fluent 17.0: ANSYS Application Example (GPU)
Case details:
• 9.6 million cell pipe benchmark
Hardware configuration:
• Cluster of XL250 Gen9 nodes with E5-2690 v3 processors, 128 GB of 2133 MHz memory and 2 NVIDIA K80s per node
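For context on how runs like the GPU benchmarks above are typically submitted, here is a hedged sketch that assembles a batch launch line for a distributed-parallel, GPU-accelerated Fluent run. The option spellings (-t, -cnf=, -gpgpu=, -g, -i) are given from memory for Fluent 17.x, so treat them as assumptions and verify them against the Fluent documentation for your release; the host file and journal names are placeholders.

```python
# Hedged sketch: compose a batch launch line for a distributed-parallel Fluent
# run with GPU-accelerated AMG, in the spirit of the GPU benchmarks above.
# Flag spellings are assumptions to be checked against the Fluent User's Guide.

import shutil
import subprocess

def fluent_command(processes: int, gpus_per_node: int,
                   hostfile: str = "hosts.txt",
                   journal: str = "run.jou") -> list[str]:
    """Build a no-GUI, double-precision, distributed-parallel Fluent command."""
    return [
        "fluent", "3ddp",            # 3-D, double-precision solver
        f"-t{processes}",            # number of solver processes
        f"-cnf={hostfile}",          # machine list for distributed-parallel runs
        f"-gpgpu={gpus_per_node}",   # GPUs per machine for AMG acceleration (assumed flag)
        "-g",                        # batch mode, no GUI
        "-i", journal,               # journal file that drives the run
    ]

if __name__ == "__main__":
    cmd = fluent_command(processes=48, gpus_per_node=2)
    print("Would launch:", " ".join(cmd))
    if shutil.which("fluent"):       # only attempt to run if Fluent is on PATH
        subprocess.run(cmd, check=True)
```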