Optimum System Balance for Systems of Finite Price
John D. McCalpin, Ph.D.
IBM Corporation, Austin, TX
SuperComputing 2004, Pittsburgh, PA
November 10, 2004

Overview
• The HPC Challenge Benchmark was announced last year at SuperComputing 2003
• The HPC Challenge Benchmark consists of:
  – LINPACK (HPL)
  – STREAM
  – PTRANS (transposing the array used by HPL)
  – RandomAccess (random read/modify/write)
  – FFT
  – Some low-level MPI latency & BW measurements
• No single figure of merit is defined

Overview (continued)
• Q: What is a "balanced" system?
• My answer: "A balanced system is one for which the primary applications are limited in performance by the most expensive component(s) of the system."

The Two Questions
• We need to understand both performance and cost in terms of low-level component metrics
• Performance
  – What performance model?
  – Use a harmonically weighted, time-based model
• Cost
  – What cost model?
  – Simple linear additive cost model

Performance Model
• Composite figures of merit for performance must be based on "time" rather than "rate"
  – i.e., weighted harmonic means of rates
• Why?
  – Combining "rates" in any other way fails to produce a "Law of Diminishing Returns"
• Time = Work/Rate
• Repeat for each component: T_i = W_i / R_i, and sum the T_i to get the total time

A Simple Composite Model
• Assume the time to solution is composed of a compute time inversely proportional to peak GFLOPS plus a memory transfer time inversely proportional to sustained memory bandwidth
• Assume "x Bytes/FLOP" to get:

$$\text{Balanced GFLOPS} = \frac{1\ \text{effective FP op}}{\dfrac{1\ \text{FP op}}{\text{Peak GFLOPS}} + \dfrac{x\ \text{Bytes}}{\text{Sustained GB/s}}}$$

• Target SPECfp_rate2000 as the workload

Does Peak GFLOPS predict SPECfp_rate2000?
[Figure: SPECfp_rate2000 vs Peak MFLOPS (y-axis: Peak MFLOPS; x-axis: SPECfp_rate2000/cpu)]

Does Sustained Memory Bandwidth predict SPECfp_rate2000?
[Figure: SPECfp_rate2000 vs Sustained BW (y-axis: GB/s per CPU; x-axis: SPECfp_rate2000/cpu)]

Optimized Model Results
• Results rounded to nearby round values:
  – Bytes/FLOP for large caches = 0.16
  – Bytes/FLOP for small caches = 0.80
  – Size of asymptotically large cache = ~12 MB
  – Coefficient of best fit = ~6.4
  – The units of the coefficient are SPECfp_rate2000 / Effective GFLOPS

Does this Revised Metric predict SPECfp_rate2000?
[Figure: Optimized SPECfp_rate2000 Estimates (y-axis: Estimated Rate/cpu; x-axis: SPECfp_rate2000/cpu)]

Cost Model
• Assume simple linear additive model
  – FLOPS cost some amount
  – Sustained BW costs a different amount
• Define:
  – β = R_mem / R_cpu (system characteristics)
  – γ = W_mem / W_cpu (application characteristics)
  – δ = ($/BW) / ($/FLOPS) (technology characteristics)

How to Optimize?
• For a given application γ (W_mem / W_cpu), what is the optimum system balance β?
• Many people seem to believe that the system should be "balanced" to the application:
  – β_optimal = γ, i.e.,
  – R_mem / R_cpu = W_mem / W_cpu
• This does not optimize price/performance

The Correct Optimization
• This is actually an easy optimization problem
• Minimize cost/performance
  – Same as minimizing cost × time
• Optimum cost/performance occurs at
  – β = sqrt(γ/δ)
• Definitely not intuitive!
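The composite performance model and the linear cost model above combine into a single cost × time figure of merit. Below is a minimal Python sketch under stated assumptions: (1) compute and memory transfer do not overlap, so time per FP op is 1/peak + x/BW; (2) cost is exactly linear, ($/GFLOPS) × peak + ($/(GB/s)) × BW; and (3) the composite model's x Bytes/FLOP plays the role of the application's γ. All function names are hypothetical. With peak and $/GFLOPS both normalized to 1, cost × time reduces to (1 + δβ)(1 + γ/β); only δβ + γ/β depends on β, and setting its derivative δ − γ/β² to zero gives β = sqrt(γ/δ). The grid search is only a numerical cross-check of that closed form.

```python
import math


def time_per_flop(peak_gflops, sustained_gbs, bytes_per_flop):
    """Composite time model (assumes no overlap): ns per FP op = 1/peak + x/BW."""
    return 1.0 / peak_gflops + bytes_per_flop / sustained_gbs


def balanced_gflops(peak_gflops, sustained_gbs, bytes_per_flop):
    """Effective ("balanced") GFLOPS: the reciprocal of the time per FP op."""
    return 1.0 / time_per_flop(peak_gflops, sustained_gbs, bytes_per_flop)


def cost_times_time(peak_gflops, sustained_gbs, bytes_per_flop,
                    price_per_gflops, price_per_gbs):
    """Linear additive cost multiplied by time per FP op (lower is better)."""
    cost = price_per_gflops * peak_gflops + price_per_gbs * sustained_gbs
    return cost * time_per_flop(peak_gflops, sustained_gbs, bytes_per_flop)


def optimum_beta(gamma, delta):
    """Closed-form optimum system balance: beta = sqrt(gamma / delta)."""
    return math.sqrt(gamma / delta)


def grid_search_beta(gamma, delta, lo=0.01, hi=100.0, steps=100_000):
    """Brute-force check: fix peak = 1 GFLOPS, $/GFLOPS = 1, $/(GB/s) = delta,
    and scan bandwidth (= beta, since peak is 1) over a log-spaced grid."""
    best_beta, best_value = lo, math.inf
    for i in range(steps + 1):
        beta = lo * (hi / lo) ** (i / steps)
        value = cost_times_time(1.0, beta, gamma, 1.0, delta)
        if value < best_value:
            best_beta, best_value = beta, value
    return best_beta


if __name__ == "__main__":
    # Hypothetical node: 8 peak GFLOPS, 4 GB/s sustained, x = 0.80 Bytes/FLOP.
    print(round(balanced_gflops(8.0, 4.0, 0.80), 3))
    # Closed form vs. brute force for two of the example (gamma, delta) pairs.
    for gamma, delta in [(3.0, 3.0), (1.0, 3.0)]:
        print(gamma, delta, round(optimum_beta(gamma, delta), 3),
              round(grid_search_beta(gamma, delta), 3))
```

Solving sqrt(γ/δ) = γ also shows that "balancing to the application" (β = γ) is the price/performance optimum only in the special case γδ = 1.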
Example: High BW, expensive BW (γ = 3, δ = 3)
[Figure: relative price/performance vs β, β from 0.1 to 10]
• γ = 3: relatively high BW
• δ = 3: relatively expensive BW
• Optimum price/performance is at β = 1, not β = 3
• Improvement in price/performance is ~30%

High BW, very expensive memory (γ = 3, δ = 10)
[Figure: relative price/performance vs β, β from 0.1 to 10]
• γ = 3: relatively high BW
• δ = 10: very expensive BW
• Optimum price/performance is at β ≈ 0.55 (sqrt(3/10)), not β = 3
• Improvement in price/performance is ~50%

Low BW, expensive BW (γ = 0.1, δ = 3)
[Figure: relative price/performance vs β, β from 0.1 to 10]
• γ = 0.1: low-BW application
• δ = 3: moderately expensive BW
• Optimum price/performance is at β = 0.18, not β = 0.1
• Improvement in price/performance is ~5%
• More BW helps here even though it is expensive, because the application does not need much.

Medium BW, expensive BW (γ = 1, δ = 3)
[Figure: relative price/performance vs β, β from 0.1 to 10]
• γ = 1: modest BW
• δ = 3: moderately expensive BW
• Optimum price/performance is at β = 0.58, not β = 1
• Improvement in price/performance is ~10%

Summary
• Balance is important to cost/performance
• You must understand performance
• You must understand cost
• Optimum cost/performance is not intuitive!
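The four example cases above can be recomputed from an assumed closed form of cost × time, (1 + δβ)(1 + γ/β), which follows from the linear cost model and the time-based performance model with peak and $/FLOPS normalized to 1. This sketch prints the optimum β and how much higher cost × time is at β = γ (the "balanced to the application" choice). Under this closed form the penalties come out to about 25%, 48%, 9%, and 7%; these differ somewhat from the percentages quoted on the slides, which may have been read off the plots, so treat both as approximate. The names `CASES`, `relative_cost_time`, and `summarize` are hypothetical.

```python
import math

# (gamma, delta) pairs taken from the four example slides.
CASES = [(3.0, 3.0), (3.0, 10.0), (0.1, 3.0), (1.0, 3.0)]


def relative_cost_time(beta, gamma, delta):
    # Assumed form of cost * time for the linear additive cost model,
    # up to a constant factor that does not depend on beta.
    return (1.0 + delta * beta) * (1.0 + gamma / beta)


def summarize(gamma, delta):
    beta_opt = math.sqrt(gamma / delta)
    # Ratio of cost * time at beta = gamma to cost * time at the optimum.
    penalty = (relative_cost_time(gamma, gamma, delta)
               / relative_cost_time(beta_opt, gamma, delta))
    return beta_opt, penalty


if __name__ == "__main__":
    for gamma, delta in CASES:
        beta_opt, penalty = summarize(gamma, delta)
        print(f"gamma={gamma:<4} delta={delta:<5} beta_opt={beta_opt:.2f} "
              f"cost*time at beta=gamma is {100 * (penalty - 1):.0f}% higher")
```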